2026 OpenClaw Observability in Practice:
Token Usage, Model Auth State, and E2E Smoke Failure Summaries on a Remote Mac Gateway
Audience: teams running OpenClaw on a remote Mac with Playwright smoke tests, where token counts move but model auth appears stuck. You get shell steps, a vendor-free webhook summary, and fields that separate auth failures, gateway load, and UI flakes. Slug: 2026-openclaw-token-auth-e2e-summary-remote-mac.html. See also: /v1/models alignment, deploy hook smoke, trace and HAR summaries.
Traces help diagnose DOM regressions but hide gateway backpressure and stale bearer scopes. Per-phase OpenClaw fields show whether tokens were spent on health checks or on wedged auth retries.
Pain signals. Green shards while auth_state=degraded. Prompt tokens spike before browsers start. Clock skew makes JWT expiry disagree with the issuer.
Decision matrix.
| Evidence source | Best for | Blind spot |
|---|---|---|
| Vendor usage console | Monthly billing disputes. | Lag and no OPENCLAW_RUN_ID. |
| Gateway NDJSON plus webhook summary | Per deploy triage with idempotent digests. | Needs hostname redaction. |
| Playwright report.json only | Flaky selectors and trace links. | No model policy signal. |
Reproducible wiring steps.
- Export OPENCLAW_RUN_ID, GIT_SHA, BASE_URL, and the model alias your smoke profile expects.
- Append one NDJSON line per phase to .openclaw/reports/gateway.ndjson with phase, duration_ms, prompt_tokens, completion_tokens, auth_state, and http_status from the same process that calls the gateway.
- Run npx playwright test with --reporter=json and copy report.json beside the NDJSON file.
- Execute a small jq or node script that merges both artifacts into smoke_summary/v1 and prints nothing secret to stdout.
- POST the merged JSON to your team webhook using Idempotency-Key: ${GIT_SHA}:${OPENCLAW_RUN_ID}:smoke_summary so duplicate hooks collapse.
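The NDJSON step can be sketched as follows. This is a minimal Python illustration, not an OpenClaw API: the helper names append_phase and read_phases are hypothetical, and reading OPENCLAW_RUN_ID from the environment is an assumption. The field names come from the wiring steps above.

```python
import json
import os
from pathlib import Path

# Path taken from the wiring steps; the helpers below are illustrative only.
REPORT_PATH = Path(".openclaw/reports/gateway.ndjson")


def append_phase(phase, duration_ms, prompt_tokens, completion_tokens,
                 auth_state, http_status, path=REPORT_PATH, env=None):
    """Append one JSON object on its own line and return the record written."""
    env = os.environ if env is None else env
    record = {
        # Carry the run id on every line so one grep joins NDJSON and reports.
        "openclaw_run_id": env.get("OPENCLAW_RUN_ID", ""),
        "phase": phase,
        "duration_ms": duration_ms,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "auth_state": auth_state,
        "http_status": http_status,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier phases; one compact object per line is NDJSON.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")
    return record


def read_phases(path=REPORT_PATH):
    """Read the NDJSON file back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Calling append_phase from the same process that calls the gateway keeps duration_ms and http_status tied to the request that produced them.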
Summary template. Version the schema so chat parsers stay stable.
{
  "schema": "smoke_summary/v1",
  "openclaw_run_id": "run_…",
  "git_sha": "abc123…",
  "base_url": "https://staging.example",
  "gateway": {
    "auth_state": "ok|degraded|blocked",
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "last_http_status": 200,
    "last_latency_ms": 0
  },
  "playwright": {
    "status": "passed|failed|timedOut",
    "failed_spec": "tests/smoke/checkout.spec.ts",
    "shard": "webkit",
    "exit_code": 1
  },
  "failed_phase": "gateway_preflight|playwright_smoke|summary_post"
}
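One way to fill this template from the two artifacts, sketched in Python in place of the jq or node script. Several choices here are assumptions rather than anything the template prescribes: token counts are summed across NDJSON lines, last_latency_ms is taken from the last line's duration_ms, shard is read from the Playwright projectName, the last retry of each test decides its status, and failed_phase is inferred with a simple gateway-first rule. The summary_post phase can only be set by whatever performs the POST, so it is not derived here.

```python
def final_results(suites):
    """Yield (spec, test, last_result) per test, descending into nested suites."""
    for suite in suites or []:
        for spec in suite.get("specs", []):
            for test in spec.get("tests", []):
                results = test.get("results", [])
                if results:
                    yield spec, test, results[-1]  # last attempt wins over retries
        yield from final_results(suite.get("suites", []))


def build_summary(gateway_rows, report, run_id, git_sha, base_url, exit_code):
    """Merge gateway NDJSON rows and a Playwright JSON report into smoke_summary/v1."""
    last = gateway_rows[-1] if gateway_rows else {}
    gateway = {
        "auth_state": last.get("auth_state"),
        "prompt_tokens": sum(r.get("prompt_tokens", 0) for r in gateway_rows),
        "completion_tokens": sum(r.get("completion_tokens", 0) for r in gateway_rows),
        "last_http_status": last.get("http_status"),
        "last_latency_ms": last.get("duration_ms"),
    }
    playwright = {"status": "passed", "failed_spec": None,
                  "shard": None, "exit_code": exit_code}
    for spec, test, result in final_results(report.get("suites")):
        if result.get("status") in ("failed", "timedOut"):
            playwright.update(status=result["status"],
                              failed_spec=spec.get("file"),
                              shard=test.get("projectName"))
            break
    # Assumed rule: blame the gateway first, then the browser run.
    gateway_bad = (gateway["auth_state"] == "blocked"
                   or (gateway["last_http_status"] or 0) >= 400)
    if gateway_bad:
        failed_phase = "gateway_preflight"
    elif playwright["status"] != "passed":
        failed_phase = "playwright_smoke"
    else:
        failed_phase = None
    return {
        "schema": "smoke_summary/v1",
        "openclaw_run_id": run_id,
        "git_sha": git_sha,
        "base_url": base_url,
        "gateway": gateway,
        "playwright": playwright,
        "failed_phase": failed_phase,
    }
```

The function only reads counters, states, and file names, so serializing its result does not put tokens or headers on stdout.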
01 Gateway-side checklist
Run this on the remote Mac before blaming Playwright. It aligns with the models smoke checklist but logs fields, not clicks.
- Clock skew. Sync time and log offset ms beside auth_state.
- Token scope. Match OPENCLAW_GATEWAY_TOKEN to the least privilege runbook role.
- Sticky sessions. Log upstream_pod or echoed x-request-id when workers fan out.
- Aliases. Curl /v1/models via the same ingress as smoke; diff aliases to config.
- Degraded semantics. Document if auth_state=degraded still allows read only probes.
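The alias check from the list above could look like this in Python. The sketch assumes the gateway answers /v1/models with an OpenAI-style body ({"data": [{"id": ...}]}) and accepts the token as a bearer header; neither is confirmed by this article, so adjust both to what your gateway actually returns. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_models(base_url, token, timeout=10):
    """GET /v1/models through the same ingress the smoke run uses."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/models",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def diff_aliases(models_response, expected_aliases):
    """Compare served model ids with the aliases listed in config."""
    # Assumed response shape: {"data": [{"id": "<alias>"}, ...]}
    served = {m["id"] for m in models_response.get("data", []) if "id" in m}
    expected = set(expected_aliases)
    return {
        "missing": sorted(expected - served),      # in config, not served
        "unexpected": sorted(served - expected),   # served, not in config
    }
```

A non-empty missing list before the browsers start points at model policy or routing rather than at selectors.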
Pair every gateway line with OPENCLAW_RUN_ID for one grep across NDJSON and reports.
Report prompt_tokens and completion_tokens separately; rising prompt tokens with flat completion tokens suggest preflight retries before any assertion ran.
Truncate stderr to four kilobytes in chat; link to artifacts instead of pasting anything secret.
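A small Python sketch of the stderr cap, assuming "four kilobytes" means 4096 bytes of UTF-8 and that the head of the output is the part worth keeping. The function name and the marker text are illustrative.

```python
def truncate_utf8(text, limit=4096, marker="\n[truncated]"):
    """Cap text at `limit` bytes of UTF-8 without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    budget = limit - len(marker.encode("utf-8"))
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    head = data[:budget].decode("utf-8", errors="ignore")
    return head + marker
```

If the tail of stderr matters more for your runner, slice data[-budget:] instead and put the marker first.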
02 Mapping Playwright report fields
Copy stable keys from report.json into the summary. Reuse the normalizer pattern from build metrics PR summaries.
| Playwright source | Suggested summary field | Why it helps gateway triage |
|---|---|---|
| stats.expected versus stats.unexpected | playwright.assertion_delta | Split UI flakes from infra when tokens still move. |
| suites[].specs[].tests[].results[].workerIndex | playwright.worker_index | Tie shards to gateway bursts. |
| errors[].message | playwright.first_error | Human line for chat plus trace link. |
On timedOut, ship testTimeout alongside gateway.last_latency_ms. Low latency with auth_state=blocked means renew credentials before digging into the DOM.
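The mapping in the table can be sketched in Python as below. Some readings are assumptions: assertion_delta is emitted as a pair of counts, worker_index comes from the first result that neither passed nor was skipped, and first_error falls back from top-level errors[] to the failing result's own errors[] when the top level is empty. The function names are hypothetical.

```python
def iter_results(suites):
    """Yield every test result, descending into nested suites."""
    for suite in suites or []:
        for spec in suite.get("specs", []):
            for test in spec.get("tests", []):
                for result in test.get("results", []):
                    yield result
        yield from iter_results(suite.get("suites", []))


def map_report_fields(report):
    """Copy stable keys from a Playwright JSON report into summary fields."""
    stats = report.get("stats", {})
    failing = [r for r in iter_results(report.get("suites"))
               if r.get("status") not in ("passed", "skipped")]
    # Top-level errors first, then errors attached to failing results.
    messages = [e.get("message") for e in report.get("errors", [])]
    for result in failing:
        messages.extend(e.get("message") for e in result.get("errors", []))
    messages = [m for m in messages if m]
    return {
        "assertion_delta": {
            "expected": stats.get("expected", 0),
            "unexpected": stats.get("unexpected", 0),
        },
        "worker_index": failing[0].get("workerIndex") if failing else None,
        # Keep only the first line so the chat message stays readable.
        "first_error": messages[0].splitlines()[0] if messages else None,
    }
```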
03 Threshold alerts
Fire alerts only when signals disagree; paging on a single metric causes alert fatigue in staging.
- Token slope. Five minute prompt_tokens growth over twice the median for the profile while playwright.status stays passed.
- Auth churn. More than three auth_state flips per OPENCLAW_RUN_ID without a rotation window.
- Wall clock. Playwright duration over baseline by half while gateway latency is flat.
Store baselines with web ops monitoring notes.
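The three rules above translate into small predicates. This Python sketch makes a few assumptions the text leaves open: "over baseline by half" is read as more than 1.5 times the baseline, "flat" gateway latency is read as within 10 percent of its own baseline, and the caller supplies the five-minute growth figures and the rotation-window flag. All function names are illustrative.

```python
from statistics import median


def token_slope_alert(window_growth, baseline_growths, playwright_status):
    """Five-minute prompt_tokens growth above twice the profile median while tests pass."""
    if not baseline_growths:
        return False  # no baseline yet, so nothing to compare against
    return (playwright_status == "passed"
            and window_growth > 2 * median(baseline_growths))


def count_flips(auth_states):
    """Count transitions between consecutive auth_state values in one run."""
    return sum(1 for a, b in zip(auth_states, auth_states[1:]) if a != b)


def auth_churn_alert(auth_states, in_rotation_window=False):
    """More than three flips per run outside a credential rotation window."""
    return not in_rotation_window and count_flips(auth_states) > 3


def wall_clock_alert(duration_ms, baseline_ms, latency_ms,
                     baseline_latency_ms, flat_tolerance=0.10):
    """Playwright run 50% over baseline while gateway latency stays flat."""
    slow = duration_ms > 1.5 * baseline_ms
    flat = abs(latency_ms - baseline_latency_ms) <= flat_tolerance * baseline_latency_ms
    return slow and flat
```

Each predicate pairs two signals, so a token spike alone or a slow run alone does not page anyone.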
04 FAQ
Why does the token panel move while Playwright still fails?
Health checks and embeddings move tokens before checkout. Join NDJSON phases to Playwright project names, not only totals.
Can we avoid any cloud console entirely?
Yes, when ingress, gateway, and runner emit the same fields to stdout and webhooks. Omit console-only URLs.
What is a safe idempotency key for summaries?
Use ${GIT_SHA}:${OPENCLAW_RUN_ID}:${PROFILE} so redeploys dedupe yet second profiles still post.
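A Python sketch of building that key and attaching it to the webhook request. It assumes the receiving webhook actually deduplicates on the Idempotency-Key header, which depends on your endpoint. The function names are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def idempotency_key(git_sha, run_id, profile):
    """Join the three parts; refuse empty parts so keys cannot collide silently."""
    parts = (git_sha, run_id, profile)
    if not all(parts):
        raise ValueError("git_sha, run_id and profile must all be non-empty")
    return ":".join(parts)


def build_webhook_request(url, summary, key):
    """Prepare a JSON POST that carries the key in the Idempotency-Key header."""
    body = json.dumps(summary).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Idempotency-Key": key},
    )
```

Passing the returned request to urllib.request.urlopen sends it. A redeploy of the same commit and run reuses the key, while a second profile gets its own.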