# Schemeta Operations Runbook

This runbook covers baseline production operation for the Schemeta API and UI.

## Runtime

- Node.js 18+ recommended.
- Start command: `npm run start`
- Default bind: `0.0.0.0:8787`

## Environment Variables

- `PORT` (default `8787`)
- `MAX_BODY_BYTES` (default `2097152`, i.e. 2 MiB)
  - Hard limit for request body size on `POST` endpoints.
- `MAX_REQUESTS_PER_MINUTE` (default `120`)
  - Per-client-IP rate limit for `POST` endpoints, measured per minute.
- `SCHEMETA_AUTH_TOKEN` (optional)
  - When set, all `POST` API routes require one of:
    - `Authorization: Bearer <token>`
    - `x-api-key: <token>`
- `CORS_ORIGIN` (optional)
  - If set, CORS is enabled for this origin only.

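The variables and defaults above can be checked before a deploy with a small preflight script. The following is a minimal sketch, not the server's own parsing logic: `SchemetaConfig` and `load_config` are hypothetical names, and treating empty values as unset and rejecting non-positive numbers are assumptions of this sketch.

```python
from typing import Mapping, NamedTuple, Optional


class SchemetaConfig(NamedTuple):
    port: int
    max_body_bytes: int
    max_requests_per_minute: int
    auth_token: Optional[str]
    cors_origin: Optional[str]


def _positive_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read a positive integer from env, falling back to the documented default."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


def load_config(env: Mapping[str, str]) -> SchemetaConfig:
    """Resolve the runbook's variables; empty optional values count as unset (assumption)."""
    return SchemetaConfig(
        port=_positive_int(env, "PORT", 8787),
        max_body_bytes=_positive_int(env, "MAX_BODY_BYTES", 2097152),
        max_requests_per_minute=_positive_int(env, "MAX_REQUESTS_PER_MINUTE", 120),
        auth_token=env.get("SCHEMETA_AUTH_TOKEN") or None,
        cors_origin=env.get("CORS_ORIGIN") or None,
    )
```

Calling `load_config(os.environ)` in a deploy script would surface a mistyped numeric value before the service starts.
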
## Endpoints

- `GET /health`
  - Liveness probe, returns process uptime and status.
- `GET /`
  - Serves workspace UI.
- `POST /compile`
  - Compile + render with ERC/diagnostics and layout metrics.
- `POST /analyze`
  - Topology and diagnostics summary.
- `GET /mcp/ui-bundle`
  - Metadata for MCP UI embedding.

## Request Correlation and Audit Logs

- Every response includes an `x-request-id` header.
- API envelopes include `request_id` for correlation in clients and logs.
- The server emits one JSON audit log entry per request when the response finishes, with these fields:
  - `request_id`
  - `method`
  - `path`
  - `status`
  - `duration_ms`
  - `client`

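Given a `request_id` reported by a client, the matching audit entry can be pulled out of the log stream. This is a minimal sketch that assumes each audit entry is written as a single JSON object on its own line, possibly interleaved with non-JSON output; `parse_audit_lines` and `find_request` are hypothetical helper names.

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def parse_audit_lines(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield audit entries, skipping lines that are not JSON objects with a request_id."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text log output, not an audit entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        if isinstance(entry, dict) and "request_id" in entry:
            yield entry


def find_request(lines: Iterable[str], request_id: str) -> Optional[Dict]:
    """Return the first audit entry matching request_id, or None if absent."""
    for entry in parse_audit_lines(lines):
        if entry["request_id"] == request_id:
            return entry
    return None
```

For example, `find_request(open("server.log"), rid)` would return the entry's `status`, `duration_ms`, and `client` for the request in question, where `server.log` stands in for wherever the process output is collected.
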
## Production Checks

1. Verify process liveness:
   - `curl -s http://localhost:${PORT:-8787}/health`
2. Verify the compile endpoint:
   - POST `frontend/sample.schemeta.json` to `/compile`.
3. Verify the analyze endpoint:
   - POST the same sample to `/analyze`.
4. Verify rate limiting:
   - Exceed `MAX_REQUESTS_PER_MINUTE` with repeated `POST` requests and confirm `429`.
5. Verify auth (if enabled):
   - Request `POST /compile` without a token and confirm `401`.
   - Request with a valid token and confirm `200`.

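Checks 1, 2, 3, and 5 above can be scripted with the standard library. This is a rough sketch under stated assumptions: the function names are hypothetical, the `Content-Type: application/json` header is assumed rather than confirmed by this runbook, and the rate-limit check is left out so the script does not exhaust the request budget.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_post(base_url: str, path: str, payload: bytes,
               token: Optional[str] = None) -> urllib.request.Request:
    """Build a JSON POST request, adding the bearer token when one is given."""
    headers = {"Content-Type": "application/json"}  # assumed content type
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=payload, headers=headers, method="POST"
    )


def status_of(request) -> int:
    """Send a request (or fetch a URL string) and return the HTTP status, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def run_checks(base_url: str, sample_path: str, token: Optional[str] = None) -> dict:
    """Run the liveness, compile, analyze, and auth checks; map check name to status."""
    with open(sample_path, "rb") as handle:
        payload = handle.read()
    results = {
        "health": status_of(base_url.rstrip("/") + "/health"),
        "compile": status_of(build_post(base_url, "/compile", payload, token)),
        "analyze": status_of(build_post(base_url, "/analyze", payload, token)),
    }
    if token:
        # With auth enabled, an unauthenticated POST should come back as 401.
        results["compile_no_token"] = status_of(build_post(base_url, "/compile", payload))
    return results
```

A call such as `run_checks("http://localhost:8787", "frontend/sample.schemeta.json", token)` should report `200` for the first three entries and `401` for `compile_no_token` when auth is enabled. Note that these requests count toward the caller's rate limit.
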
## Incident Playbook

## High error rate (5xx)

1. Check process logs for stack traces and spikes in malformed payloads.
2. Validate request body sizes; raise or lower `MAX_BODY_BYTES` as appropriate.
3. Reproduce with `frontend/sample.schemeta.json` to isolate payload-specific (model-driven) issues.
4. Roll back to the previous known-good tag if a regression is confirmed.

## Elevated 429 responses

1. Confirm the traffic source and whether bursts are expected.
2. If trusted internal clients are throttled, tune `MAX_REQUESTS_PER_MINUTE`.
3. Consider fronting the service with a reverse proxy that applies tiered rate limits for external users.

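To confirm the traffic source in step 1, the audit log can be reduced to a count of `429` responses per client. This is a minimal sketch that assumes line-delimited JSON audit entries with a numeric `status` field; `throttled_clients` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def throttled_clients(lines: Iterable[str], top: int = 5) -> List[Tuple[str, int]]:
    """Count 429 responses per `client` value and return the most throttled first."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not an audit entry
        if isinstance(entry, dict) and entry.get("status") == 429:
            counts[str(entry.get("client", "unknown"))] += 1
    return counts.most_common(top)
```

If one or two trusted internal addresses dominate the result, raising `MAX_REQUESTS_PER_MINUTE` is the likely fix; a long tail of unknown clients points toward proxy-level limits instead.
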
## UI/compile mismatch reports

1. Capture the JSON from the user (`Copy Repro` in the workspace).
2. Re-run it through `/compile` and inspect `warnings`, `errors`, and `layout_metrics`.
3. Compare with the last release baseline for crossing/overlap regressions.

## Release / Rollback

1. Follow `docs/release-checklist.md`.
2. Tag releases only after the checklist is complete and tests pass.
3. Keep the previous stable tag ready for fast rollback.

## Observability Recommendations

- The app emits structured request logs; keep proxy logs for edge-level traces.
- Track latency percentiles for `/compile` and `/analyze`.
- Track per-endpoint status code rates and the top warning/error IDs.

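Where no metrics pipeline is in place yet, latency percentiles can be approximated directly from parsed audit entries. This is a sketch using the nearest-rank method on the `duration_ms` field; `percentile` and `latency_summary` are hypothetical names, and the choice of p50/p95/p99 is an assumption rather than a stated target.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(entries: Iterable[Mapping],
                    paths=("/compile", "/analyze")) -> Dict[str, Dict[str, float]]:
    """Group duration_ms by path and report p50/p95/p99 for the tracked paths."""
    by_path: Dict[str, List[float]] = defaultdict(list)
    for entry in entries:
        if entry.get("path") in paths and "duration_ms" in entry:
            by_path[entry["path"]].append(float(entry["duration_ms"]))
    return {
        path: {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
        for path, samples in by_path.items()
    }
```

Paths with no samples are simply absent from the result, so a missing `/analyze` key means no matching entries were seen in the window.
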