2026 OpenClaw Observability in Practice:
Token Usage, Model Auth State, and E2E Smoke Failure Summaries on a Remote Mac Gateway

April 21, 2026 · Web automation / Release gates · 11 min read

Audience: teams running OpenClaw on a remote Mac with Playwright smoke tests, where token counts keep moving but model auth appears stuck. You get shell steps, a vendor-free webhook summary, and fields that separate auth failures, gateway load, and UI flakes. Related posts cover /v1/models alignment, deploy hook smoke, and trace and HAR summaries.

Traces help diagnose DOM regressions but hide gateway backpressure and stale bearer scopes. Per-phase OpenClaw fields show whether tokens were spent on health checks or on wedged auth retries.

Pain signals. Shards report green while auth_state=degraded. Prompt tokens spike before any browser starts. Clock skew makes the locally computed JWT expiry disagree with the issuer.

Decision matrix.

Evidence source                     | Best for                                    | Blind spot
Vendor usage console                | Monthly billing disputes.                   | Lag and no OPENCLAW_RUN_ID.
Gateway NDJSON plus webhook summary | Per deploy triage with idempotent digests.  | Needs hostname redaction.
Playwright report.json only         | Flaky selectors and trace links.            | No model policy signal.

Reproducible wiring steps.

  1. Export OPENCLAW_RUN_ID, GIT_SHA, BASE_URL, and the model alias your smoke profile expects.
  2. Append one NDJSON line per phase to .openclaw/reports/gateway.ndjson with phase, duration_ms, prompt_tokens, completion_tokens, auth_state, and http_status from the same process that calls the gateway.
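  3. Run npx playwright test with --reporter=json and write the report to a file (the JSON reporter prints to stdout by default; setting PLAYWRIGHT_JSON_OUTPUT_NAME=report.json is one way), then keep report.json beside the NDJSON file.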
  4. Execute a small jq or node script that merges both artifacts into smoke_summary/v1 and prints nothing secret to stdout.
  5. POST the merged JSON to your team webhook using Idempotency-Key: ${GIT_SHA}:${OPENCLAW_RUN_ID}:smoke_summary so duplicate hooks collapse.
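
Step 2 can be sketched as follows. This is a minimal sketch rather than an OpenClaw API: the field names come from step 2, while the function names and the openclaw_run_id key on each line are assumptions chosen to match the summary template.

```python
import json
import os


def format_phase_line(phase, duration_ms, prompt_tokens, completion_tokens,
                      auth_state, http_status, run_id=None):
    """Return one compact NDJSON line (no trailing newline) for a gateway phase."""
    record = {
        # Assumption: every line carries the run id so one grep finds all phases.
        "openclaw_run_id": run_id or os.environ.get("OPENCLAW_RUN_ID", ""),
        "phase": phase,
        "duration_ms": duration_ms,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "auth_state": auth_state,
        "http_status": http_status,
    }
    # Compact separators keep each record on a single short line.
    return json.dumps(record, separators=(",", ":"))


def append_phase(path, **fields):
    """Append one phase line to the NDJSON file, creating its directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_phase_line(**fields) + "\n")
```

Calling append_phase(".openclaw/reports/gateway.ndjson", phase="gateway_preflight", duration_ms=120, prompt_tokens=40, completion_tokens=0, auth_state="ok", http_status=200) from the process that talks to the gateway keeps the file append-only and greppable.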

Summary template. Version the schema so chat parsers stay stable.

{
  "schema": "smoke_summary/v1",
  "openclaw_run_id": "run_…",
  "git_sha": "abc123…",
  "base_url": "https://staging.example",
  "gateway": {
    "auth_state": "ok|degraded|blocked",
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "last_http_status": 200,
    "last_latency_ms": 0
  },
  "playwright": {
    "status": "passed|failed|timedOut",
    "failed_spec": "tests/smoke/checkout.spec.ts",
    "shard": "webkit",
    "exit_code": 1
  },
  "failed_phase": "gateway_preflight|playwright_smoke|summary_post"
}
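
One possible merge script for step 4 is sketched below. The template above does not say how each field is derived, so several choices here are assumptions: token counts are summed across phases, last_latency_ms is taken from the last line's duration_ms, shard is read from the Playwright projectName, and failed_phase prefers a gateway failure over a Playwright one.

```python
import json


def iter_tests(suites):
    """Yield (spec, test) pairs from a Playwright JSON report, including nested suites."""
    for suite in suites or []:
        for spec in suite.get("specs", []):
            for test in spec.get("tests", []):
                yield spec, test
        yield from iter_tests(suite.get("suites", []))


def build_summary(ndjson_text, report, run_id, git_sha, base_url, exit_code):
    """Merge gateway NDJSON text and a parsed report.json into smoke_summary/v1."""
    phases = [json.loads(l) for l in ndjson_text.splitlines() if l.strip()]
    last = phases[-1] if phases else {}

    status, failed_spec, shard = "passed", None, None
    for spec, test in iter_tests(report.get("suites")):
        if test.get("status") != "unexpected":
            continue
        results = test.get("results", [])
        final = results[-1].get("status") if results else "failed"
        status = "timedOut" if final == "timedOut" else "failed"
        failed_spec = spec.get("file")
        shard = test.get("projectName")  # assumption: shard == project name
        break  # the first unexpected test is enough for a chat summary

    # Assumption: a blocked auth state or an HTTP error marks the failing gateway phase.
    bad = next((p for p in phases
                if p.get("auth_state") == "blocked"
                or (p.get("http_status") or 0) >= 400), None)
    if bad is not None:
        failed_phase = bad.get("phase", "gateway_preflight")
    elif status != "passed":
        failed_phase = "playwright_smoke"
    else:
        failed_phase = None

    return {
        "schema": "smoke_summary/v1",
        "openclaw_run_id": run_id,
        "git_sha": git_sha,
        "base_url": base_url,
        "gateway": {
            "auth_state": last.get("auth_state"),
            "prompt_tokens": sum(p.get("prompt_tokens", 0) for p in phases),
            "completion_tokens": sum(p.get("completion_tokens", 0) for p in phases),
            "last_http_status": last.get("http_status"),
            "last_latency_ms": last.get("duration_ms"),
        },
        "playwright": {"status": status, "failed_spec": failed_spec,
                       "shard": shard, "exit_code": exit_code},
        "failed_phase": failed_phase,
    }
```

The function only reads values already present in the two artifacts, so nothing secret reaches stdout unless it was logged upstream.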

01 Gateway-side checklist

Run this on the remote Mac before blaming Playwright. It aligns with the models smoke checklist but logs fields rather than clicks.

  • Clock skew. Sync time and log the offset in milliseconds beside auth_state.
  • Token scope. Match OPENCLAW_GATEWAY_TOKEN to the least-privilege role in the runbook.
  • Sticky sessions. Log upstream_pod or the echoed x-request-id when workers fan out.
  • Aliases. Curl /v1/models through the same ingress as smoke, then diff the returned aliases against config.
  • Degraded semantics. Document whether auth_state=degraded still allows read-only probes.
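
Two of these checks reduce to small pure functions, sketched here under stated assumptions. The clock offset is estimated from an HTTP Date response header, which only has one-second resolution, and the /v1/models body is assumed to have an OpenAI-style {"data": [{"id": ...}]} shape. Neither detail is confirmed by the gateway documentation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_offset_ms(date_header, local_now):
    """Local clock minus the server's Date header, in milliseconds.

    Positive means the remote Mac runs ahead of the server.
    """
    remote = parsedate_to_datetime(date_header)
    return int((local_now - remote).total_seconds() * 1000)


def diff_aliases(models_response, expected_aliases):
    """Compare served model ids against the aliases the smoke config expects."""
    # Assumption: the response lists models under "data", each with an "id".
    served = {m["id"] for m in models_response.get("data", [])}
    expected = set(expected_aliases)
    return {
        "missing": sorted(expected - served),
        "unexpected": sorted(served - expected),
    }
```

Logging the offset next to auth_state makes it easy to see whether a degraded state coincides with drift. A non-empty "missing" list points at alias config rather than at credentials.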

Citable fact. Pair every gateway line with OPENCLAW_RUN_ID so that one grep covers both the NDJSON and the reports.

Citable fact. Split prompt_tokens and completion_tokens; rising prompt tokens with flat completion tokens suggest preflight retries before any assertion runs.

Citable fact. Truncate stderr to four kilobytes in chat, and link to artifacts instead of pasting secrets.
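
The four-kilobyte cap is easiest to enforce on bytes rather than characters. The helper below is a minimal sketch; its name is illustrative, and keeping the head of the output instead of the tail is an assumption.

```python
def truncate_for_chat(stderr_text, limit_bytes=4096):
    """Cap text at limit_bytes of UTF-8 without splitting a multi-byte character."""
    raw = stderr_text.encode("utf-8")
    if len(raw) <= limit_bytes:
        return stderr_text
    # errors="ignore" drops a character that was cut in half at the boundary.
    return raw[:limit_bytes].decode("utf-8", errors="ignore")
```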

02 Mapping Playwright report fields

Copy stable keys from report.json into the summary. Reuse the normalizer pattern from build metrics PR summaries.

Playwright source                              | Suggested summary field    | Why it helps gateway triage
stats.expected versus stats.unexpected         | playwright.assertion_delta | Split UI flakes from infra when tokens still move.
suites[].specs[].tests[].results[].workerIndex | playwright.worker_index    | Tie shards to gateway bursts.
errors[].message                               | playwright.first_error     | Human line for chat plus trace link.
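
The mapping in the table could be implemented roughly as below. The exact shape of each summary field is not specified in this article, so this sketch assumes assertion_delta holds both counts, worker_index lists the workers that produced failing results, and first_error is the first line of the first available message.

```python
def iter_results(suites):
    """Yield every test result in a Playwright JSON report, including nested suites."""
    for suite in suites or []:
        for spec in suite.get("specs", []):
            for test in spec.get("tests", []):
                yield from test.get("results", [])
        yield from iter_results(suite.get("suites", []))


def map_report_fields(report):
    """Extract the three summary fields from a parsed report.json."""
    stats = report.get("stats", {})
    failing = [r for r in iter_results(report.get("suites"))
               if r.get("status") in ("failed", "timedOut")]

    # Prefer top-level errors, then fall back to errors on failing results.
    messages = [e.get("message") for e in report.get("errors", [])]
    for result in failing:
        messages += [e.get("message") for e in result.get("errors", [])]
    messages = [m for m in messages if m]

    return {
        "assertion_delta": {
            "expected": stats.get("expected", 0),
            "unexpected": stats.get("unexpected", 0),
        },
        "worker_index": sorted({r["workerIndex"] for r in failing
                                if "workerIndex" in r}),
        # Keep only the first line so the chat message stays short.
        "first_error": messages[0].splitlines()[0] if messages else None,
    }
```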

When a test ends as timedOut, ship testTimeout together with gateway.last_latency_ms. Low gateway latency combined with auth_state=blocked means credentials should be renewed before anyone digs into the DOM.

03 Threshold alerts

Fire alerts only when signals disagree; paging on a single metric causes alert fatigue in staging.

  • Token slope. Five-minute prompt_tokens growth exceeds twice the median for the profile while playwright.status stays passed.
  • Auth churn. More than three auth_state flips per OPENCLAW_RUN_ID outside a rotation window.
  • Wall clock. Playwright duration exceeds baseline by half or more while gateway latency stays flat.
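
The three rules can be expressed as small predicates. This is a sketch with assumed details: "over baseline by half" is read as more than 1.5 times the baseline, and "flat" gateway latency is read as within 10 percent of its own baseline. Both numbers are placeholders to tune per profile.

```python
from statistics import median


def token_slope_alert(window_growth, profile_growth_history, playwright_status):
    """Five-minute prompt_tokens growth above twice the profile median while tests pass."""
    if not profile_growth_history:
        return False  # no baseline yet, so stay quiet
    threshold = 2 * median(profile_growth_history)
    return playwright_status == "passed" and window_growth > threshold


def auth_churn_alert(auth_states, in_rotation_window=False, max_flips=3):
    """More than max_flips auth_state changes within one run, outside a rotation window."""
    flips = sum(1 for a, b in zip(auth_states, auth_states[1:]) if a != b)
    return not in_rotation_window and flips > max_flips


def wall_clock_alert(duration_ms, baseline_ms, gateway_latency_ms,
                     gateway_baseline_ms, flat_tolerance=0.1):
    """Playwright slower than 1.5x baseline while gateway latency stays near its baseline."""
    slow = duration_ms > 1.5 * baseline_ms
    flat = abs(gateway_latency_ms - gateway_baseline_ms) <= flat_tolerance * gateway_baseline_ms
    return slow and flat
```

Each predicate pairs two signals, so a token spike during a failing run or a slow run with a slow gateway does not page by itself.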

Store baselines with web ops monitoring notes.

04 FAQ

Why does the token panel move while Playwright still fails?

Health checks and embeddings consume tokens before the checkout spec runs. Join NDJSON phases to Playwright project names instead of comparing only totals.
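
A per-phase breakdown makes that visible. The sketch below groups the gateway NDJSON by its phase field; the function name is illustrative, and mapping phases to Playwright project names is left to whatever naming the team already uses.

```python
import json
from collections import defaultdict


def tokens_by_phase(ndjson_text):
    """Sum prompt and completion tokens per phase from gateway NDJSON text."""
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        bucket = totals[record.get("phase", "unknown")]
        bucket["prompt_tokens"] += record.get("prompt_tokens", 0)
        bucket["completion_tokens"] += record.get("completion_tokens", 0)
    return dict(totals)
```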

Can we avoid any cloud console entirely?

Yes, when ingress, gateway, and runner emit the same fields to stdout and webhooks. Omit console-only URLs from the summary.

What is a safe idempotency key for summaries?

Use ${GIT_SHA}:${OPENCLAW_RUN_ID}:${PROFILE} so redeploys dedupe while a second profile still posts its own summary.
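
Building the key and the request might look like this. It is a sketch using only the Python standard library: the webhook URL and profile name are hypothetical, and deduplication only works if the receiving webhook actually honors the Idempotency-Key header.

```python
import json
import urllib.request


def idempotency_key(git_sha, run_id, profile):
    """Join the three parts with colons, matching the key format above."""
    return f"{git_sha}:{run_id}:{profile}"


def build_webhook_request(url, summary, profile):
    """Prepare (but do not send) a POST carrying the merged summary."""
    body = json.dumps(summary, sort_keys=True).encode("utf-8")
    key = idempotency_key(summary["git_sha"], summary["openclaw_run_id"], profile)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Idempotency-Key": key},
    )
```

Sending is then a single urllib.request.urlopen(request, timeout=10) call, which keeps the network step separate from key construction and easy to retry with the same key.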
