# Schemeta Operations Runbook

This runbook covers baseline production operation for Schemeta API + UI.

## Runtime

- Node.js 18+ recommended.
- Start command: `npm run start`
- Default bind: `0.0.0.0:8787`

## Environment Variables

- `PORT` (default `8787`)
- `MAX_BODY_BYTES` (default `2097152`)
  - Hard limit for request body size on `POST` endpoints.
- `MAX_REQUESTS_PER_MINUTE` (default `120`)
  - Per-client IP rate limit window for `POST` endpoints.
- `SCHEMETA_AUTH_TOKEN` (optional)
  - When set, all `POST` API routes require either:
    - `Authorization: Bearer <token>`
    - `x-api-key: <token>`
- `CORS_ORIGIN` (optional)
  - If set, CORS is enabled for this origin only.

## Endpoints

- `GET /health`
  - Liveness probe, returns process uptime and status.
- `GET /`
  - Serves workspace UI.
- `POST /compile`
  - Compile + render with ERC/diagnostics and layout metrics.
- `POST /analyze`
  - Topology and diagnostics summary.
- `GET /mcp/ui-bundle`
  - Metadata for MCP UI embedding.

## Request Correlation and Audit Logs

- Every response includes `x-request-id`.
- API envelopes include `request_id` for correlation in clients and logs.
- Server emits one JSON audit log entry per request on response finish with:
  - `request_id`
  - `method`
  - `path`
  - `status`
  - `duration_ms`
  - `client`

## Production Checks

1. Verify process liveness:
   - `curl -s http://localhost:${PORT:-8787}/health`
2. Verify compile endpoint:
   - `POST` `frontend/sample.schemeta.json` to `/compile`.
3. Verify analyze endpoint:
   - `POST` the same sample to `/analyze`.
4. Verify rate limiting:
   - Exceed `MAX_REQUESTS_PER_MINUTE` with repeated `POST` requests and confirm `429`.
5. Verify auth (if enabled):
   - Request `POST /compile` without a token and confirm `401`.
   - Request with a valid token and confirm `200`.

## Incident Playbook

### High error rate (5xx)

1. Check process logs for stack traces and malformed payload spikes.
2. Validate request body sizes; lower/raise `MAX_BODY_BYTES` as appropriate.
3. Reproduce with `frontend/sample.schemeta.json` to isolate model-driven payload issues.
4. Roll back to previous known-good tag if regression confirmed.

### Elevated 429 responses

1. Confirm traffic source and whether bursts are expected.
2. If trusted internal clients are throttled, tune `MAX_REQUESTS_PER_MINUTE`.
3. Consider fronting with reverse proxy rate limit tiers for external users.

### UI/compile mismatch reports

1. Capture JSON from user (`Copy Repro` in workspace).
2. Re-run through `/compile` and inspect `warnings`, `errors`, and `layout_metrics`.
3. Compare with last release baseline for crossing/overlap regressions.

## Release / Rollback

1. Follow `docs/release-checklist.md`.
2. Tag releases after checklist completion and test pass.
3. Keep previous stable tag ready for fast rollback.

## Observability Recommendations

- Structured request logs are emitted by the app; keep proxy logs for edge-level traces.
- Track latency percentiles for `/compile` and `/analyze`.
- Track per-endpoint status code rates and top warning/error IDs.
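
The latency and status-rate tracking recommended above can be prototyped directly from the audit log before a metrics stack is in place. The following Python sketch is one possible approach, not part of Schemeta itself. It assumes each audit entry is a single JSON object on its own line, with the `path`, `status`, and `duration_ms` fields listed in this runbook, and that `status` and `duration_ms` are numeric. The function names `percentile` and `summarize` are hypothetical.

```python
import json
import math
from collections import Counter, defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1-100."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]


def summarize(lines, latency_paths=("/compile", "/analyze")):
    """Aggregate audit log lines into per-path status rates and latencies.

    Assumption: one JSON object per line with `path`, `status`, `duration_ms`.
    Lines that are not JSON objects (for example stack traces) are skipped.
    """
    durations = defaultdict(list)
    statuses = defaultdict(Counter)

    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict) or "path" not in entry or "status" not in entry:
            continue
        path = entry["path"]
        statuses[path][entry["status"]] += 1
        # Latency is only collected for the endpoints the runbook singles out.
        if path in latency_paths and "duration_ms" in entry:
            durations[path].append(entry["duration_ms"])

    report = {}
    for path, counts in statuses.items():
        total = sum(counts.values())
        item = {
            "total": total,
            "status_rates": {status: n / total for status, n in counts.items()},
        }
        values = sorted(durations.get(path, []))
        if values:
            item["p50_ms"] = percentile(values, 50)
            item["p95_ms"] = percentile(values, 95)
        report[path] = item
    return report
```

To use it, pass any iterable of log lines, such as an open file handle for wherever the process's stdout is captured. A rising share of `429` or 5xx entries in `status_rates` for `/compile` maps directly onto the incident playbook sections above.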