diff --git a/README.md b/README.md index 0980219..40f31a6 100644 --- a/README.md +++ b/README.md @@ -32,6 +32,10 @@ Operational limits: - `MAX_BODY_BYTES` (default `2097152`) limits request payload size. - `MAX_REQUESTS_PER_MINUTE` (default `120`) applies per-client throttling for POST API endpoints. +Docs: +- `docs/release-checklist.md` +- `docs/operations-runbook.md` + ## REST API ### `POST /compile` diff --git a/docs/operations-runbook.md b/docs/operations-runbook.md new file mode 100644 index 0000000..2b35d0d --- /dev/null +++ b/docs/operations-runbook.md @@ -0,0 +1,76 @@ +# Schemeta Operations Runbook + +This runbook covers baseline production operation for Schemeta API + UI. + +## Runtime + +- Node.js 18+ recommended. +- Start command: `npm run start` +- Default bind: `0.0.0.0:8787` + +## Environment Variables + +- `PORT` (default `8787`) +- `MAX_BODY_BYTES` (default `2097152`) + - Hard limit for request body size on `POST` endpoints. +- `MAX_REQUESTS_PER_MINUTE` (default `120`) + - Per-client IP rate limit window for `POST` endpoints. +- `CORS_ORIGIN` (optional) + - If set, CORS is enabled for this origin only. + +## Endpoints + +- `GET /health` + - Liveness probe, returns process uptime and status. +- `GET /` + - Serves workspace UI. +- `POST /compile` + - Compile + render with ERC/diagnostics and layout metrics. +- `POST /analyze` + - Topology and diagnostics summary. +- `GET /mcp/ui-bundle` + - Metadata for MCP UI embedding. + +## Production Checks + +1. Verify process liveness: + - `curl -s http://localhost:${PORT:-8787}/health` +2. Verify compile endpoint: + - post `frontend/sample.schemeta.json` to `/compile`. +3. Verify analyze endpoint: + - post same sample to `/analyze`. +4. Verify rate limiting: + - exceed `MAX_REQUESTS_PER_MINUTE` with repeated `POST` and confirm `429`. + +## Incident Playbook + +## High error rate (5xx) + +1. Check process logs for stack traces and malformed payload spikes. +2. Validate request body sizes; lower/raise `MAX_BODY_BYTES` as appropriate. +3. Reproduce with `frontend/sample.schemeta.json` to isolate model-driven payload issues. +4. Roll back to previous known-good tag if regression confirmed. + +## Elevated 429 responses + +1. Confirm traffic source and whether bursts are expected. +2. If trusted internal clients are throttled, tune `MAX_REQUESTS_PER_MINUTE`. +3. Consider fronting with reverse proxy rate limit tiers for external users. + +## UI/compile mismatch reports + +1. Capture JSON from user (`Copy Repro` in workspace). +2. Re-run through `/compile` and inspect `warnings`, `errors`, and `layout_metrics`. +3. Compare with last release baseline for crossing/overlap regressions. + +## Release / Rollback + +1. Follow `docs/release-checklist.md`. +2. Tag releases after checklist completion and test pass. +3. Keep previous stable tag ready for fast rollback. + +## Observability Recommendations + +- Add structured request logs at reverse proxy layer. +- Track latency percentiles for `/compile` and `/analyze`. +- Track per-endpoint status code rates and top warning/error IDs. diff --git a/docs/release-checklist.md b/docs/release-checklist.md new file mode 100644 index 0000000..6b2b8bc --- /dev/null +++ b/docs/release-checklist.md @@ -0,0 +1,43 @@ +# Schemeta Release Checklist + +Use this checklist before cutting a release tag. + +## Pre-merge + +- [ ] All relevant issue acceptance criteria are reviewed. +- [ ] PRs are linked to tracked Gitea issues. +- [ ] No unresolved critical/major diagnostics regressions. + +## Validation Gates + +- [ ] `npm test` passes. +- [ ] Core smoke flow tested in UI: + - [ ] Load sample + - [ ] Edit component/pin/net/symbol + - [ ] Undo/redo works + - [ ] Export/import roundtrip works +- [ ] Compile/analyze API smoke checks return expected envelopes. +- [ ] MCP tool calls (`schemeta_compile`, `schemeta_analyze`, `schemeta_ui_bundle`) work. + +## Visual Quality + +- [ ] Representative circuits reviewed for routing readability. +- [ ] Labels remain legible at common zoom levels. +- [ ] No major overlap/crossing regressions vs previous release baseline. + +## Security / Operations + +- [ ] Deployment env vars verified: + - [ ] `PORT` + - [ ] `MAX_BODY_BYTES` + - [ ] `MAX_REQUESTS_PER_MINUTE` + - [ ] `CORS_ORIGIN` +- [ ] Rate limiting behavior manually validated. +- [ ] Health endpoint checked in target environment. + +## Release Artifacts + +- [ ] `README.md` updated for any interface/version changes. +- [ ] `api_version` and `schema_version` changes reviewed/documented. +- [ ] Changelog/release notes generated with notable UX/compat changes. +- [ ] Tag pushed and release announcement links to milestone/issues.