2026 OpenClaw Frontend QA in Practice:
Playwright Shards, Merged Reports, Flake Retries, and PR Summaries on a Remote Mac
Audience: frontend QA engineers who already run Playwright but need parallel shards, one merged report, honest flake handling, and a PR summary that humans can read in under a minute. This HowTo chains a remote Mac runner to the OpenClaw gateway so orchestration stays reproducible. You get parameter tables, directory contracts, gateway boundaries, retry math, and timeout triage. Pair it with the playbooks on token and auth observability, trace and HAR min-repro, and regression log triage. It is distinct from AI auto-fix flows, which focus on repair loops instead of shard economics.
Pain points: (1) Four green HTML reports that nobody merges into one signal. (2) Branch protection reads JUnit from the first shard only. (3) Flaky cases retry forever and hide real regressions. (4) OpenClaw posts walls of JSON instead of reviewer-ready Markdown.
01 Playwright shard parameter table
| Parameter | Typical value | Notes for remote Mac |
|---|---|---|
| PLAYWRIGHT_SHARD / PLAYWRIGHT_TOTAL | 3 of 8 | Job-level pair that feeds --shard. Playwright's --shard index is one-based (1..total), so add one if your CI index is zero-based, and keep it stable across reruns. Total equals the number of shard jobs you provision. |
| --shard=current/total | CLI mirror of env pair | Use the same tuple in logs OpenClaw echoes so support can grep one string. |
| --workers | 1 inside each shard job | Prefer one browser fleet per shard on Apple Silicon to avoid GPU contention with WebKit. |
| Blob output directory | blob-report/shard-3 | Unique per shard before merge; never two writers on one path. |

| Approach | Wall clock | Reviewer UX | Flake visibility |
|---|---|---|---|
| Single fat runner | Slowest | One report, easy | Noisy retries stack on one timeline |
| Sharded without merge | Fastest raw | Fragmented tabs | Hard to compare flake rates |
| Sharded plus merge | Fast with one fan-in step | Single HTML and junit | Central flake_stats.json |
02 Report directory conventions
Treat .openclaw/reports/$GIT_SHA/ as the contract root on the remote Mac. Each shard writes raw/shard-$i/blob/. After copy, run merge-reports into merged/html and merged/junit.xml. Store flake_stats.json beside them so OpenClaw templates can read deterministic paths without scanning the tree.
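# Per-shard job. SHARD is one-based (1..TOTAL), matching Playwright's --shard flag.
BLOB_DIR=".openclaw/reports/${GIT_SHA}/raw/shard-${SHARD}/blob"
mkdir -p "${BLOB_DIR}"
PLAYWRIGHT_BLOB_OUTPUT_DIR="${BLOB_DIR}" npx playwright test --shard="${SHARD}/${TOTAL}" --reporter=blob
# Fan-in host. merge-reports reads one directory, so gather every shard's zip first.
ROOT=".openclaw/reports/${GIT_SHA}"
mkdir -p "${ROOT}/raw/all" && cp "${ROOT}"/raw/shard-*/blob/*.zip "${ROOT}/raw/all/"
# Output env vars below are from recent Playwright releases; check your version.
PLAYWRIGHT_HTML_OUTPUT_DIR="${ROOT}/merged/html" PLAYWRIGHT_JUNIT_OUTPUT_FILE="${ROOT}/merged/junit.xml" \
  npx playwright merge-reports --reporter=html,junit "${ROOT}/raw/all"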
Citeable facts: junit gates on GitHub read one file path. Blob reporter is the supported merge input in modern Playwright. WebKit cold starts routinely add tens of seconds on first launch.
03 OpenClaw invocation boundaries
Keep browsers and Playwright entirely on the Mac. Let the OpenClaw gateway ingest only structured artifacts: merged junit counts, top failing test titles, shard wall skew, and signed artifact URLs. Do not stream raw traces through the model unless triage needs them; link out like the trace summary playbook. POST summaries with Idempotency-Key: ${GIT_SHA}:${CI_PIPELINE_ID}:pw and embed <!-- openclaw-pw:${GIT_SHA} --> so comment updates stay idempotent.
- Gateway validates JSON schema version pw_summary/v1 before touching Git.
- Secrets never leave the runner; OpenClaw receives presigned GET URLs with short TTL.
- When auth fails mid-run, reuse the observability fields from the token auth article instead of guessing.
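The POST described above can be sketched as a few shell helpers. This is a minimal sketch, not the gateway's documented API: only the Idempotency-Key format and the comment marker come from the text, while OPENCLAW_GATEWAY_URL, the payload file, and the function names are hypothetical.

```shell
# Idempotency key in the format given above: ${GIT_SHA}:${CI_PIPELINE_ID}:pw
pw_idempotency_key() {
  printf '%s:%s:pw' "$1" "$2"
}

# Hidden HTML marker that lets a later run find and update the same PR comment.
pw_comment_marker() {
  printf '<!-- openclaw-pw:%s -->' "$1"
}

# POST a pw_summary/v1 JSON file to the gateway.
# ASSUMPTION: OPENCLAW_GATEWAY_URL and the payload path are placeholders;
# the real endpoint and auth scheme are not specified in this article.
pw_post_summary() {
  git_sha="$1"; pipeline_id="$2"; payload="$3"
  curl --fail --silent --show-error \
    -X POST "${OPENCLAW_GATEWAY_URL}" \
    -H "Content-Type: application/json" \
    -H "Idempotency-Key: $(pw_idempotency_key "${git_sha}" "${pipeline_id}")" \
    --data-binary "@${payload}"
}
```

Because the key depends only on the SHA and pipeline ID, rerunning the same job resends the same key, which is what lets the gateway treat the second POST as a repeat rather than a new comment.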
04 Flake retry threshold formula examples
Let p be the per-attempt failure probability for a flaky case and ε your residual risk budget after automated retries, assuming attempts are independent as a first approximation. A case with r retries fails every attempt with probability p^(r+1), so require p^(r+1) ≤ ε. Taking logs (ln(p) is negative, which flips the inequality) gives r ≥ ln(ε)/ln(p) - 1, and capping at three retries yields r_max = min(3, ceil(ln(ε)/ln(p) - 1)). Example: p = 0.2 and ε = 0.01 needs r+1 ≥ 3 because 0.2^2 = 0.04 > 0.01 while 0.2^3 = 0.008 ≤ 0.01, giving two retries after the first attempt. For suites, cap total flake reruns with flake_budget = floor(0.15 * total_tests) so one noisy module cannot consume the whole pool.
Tag known flakes, run them in a slower quarantine project, and let merge reports show both first-attempt and retry columns so reviewers trust the green.
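The retry formula and the suite budget can be checked with two small awk-backed helpers. This is a sketch under the same independence assumption; the function names are hypothetical, and the loop multiplies p step by step rather than calling a logarithm, which gives the same r for the capped range.

```shell
# retries_needed P EPSILON [CAP]
# Smallest r with P^(r+1) <= EPSILON, capped at CAP (default 3).
retries_needed() {
  awk -v p="$1" -v eps="$2" -v cap="${3:-3}" 'BEGIN {
    r = 0
    q = p                                  # q tracks p^(r+1)
    # The tiny tolerance keeps exact boundaries such as 0.1^2 = 0.01
    # from being pushed over by floating-point rounding.
    while (q > eps * (1 + 1e-9) && r < cap) { q *= p; r++ }
    print r
  }'
}

# flake_budget TOTAL_TESTS
# floor(0.15 * total_tests), computed as 15*n/100 to avoid rounding surprises.
flake_budget() {
  awk -v n="$1" 'BEGIN { print int(15 * n / 100) }'
}
```

For the worked example, `retries_needed 0.2 0.01` prints 2, and a 200-test suite gets `flake_budget 200` reruns, which is 30.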
05 Common timeout troubleshooting
- Fixture timeout on WebKit: raise expect timeouts only after you raise actionTimeout and confirm the Mac is not thermal-throttling parallel shards.
- Navigation timeout to preview URLs: align cold-start probes with your deployment hook smoke cadence so tests start after the edge returns 200.
- Merge step stuck: verify every shard uploaded a complete blob; partial copies produce silent merge hangs. Checksum each folder before fan-in.
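One way to checksum each shard folder before fan-in is to have the shard job record a manifest and the merge host compare against it. The sketch below assumes that workflow; the function names and the manifest location are hypothetical, and it uses POSIX cksum so the same script runs on macOS and Linux (swap in a SHA-256 tool if you need stronger guarantees).

```shell
# blob_manifest DIR
# Print "checksum size name" for every zip in DIR, sorted by file name.
blob_manifest() {
  ( cd "$1" && for f in *.zip; do
      [ -f "$f" ] || continue        # glob matched nothing
      cksum "$f"
    done | sort -k3 )
}

# verify_blob DIR MANIFEST_FILE
# Succeed only if DIR holds at least one zip and its current checksums
# match the manifest the shard job recorded before upload.
verify_blob() {
  current="$(blob_manifest "$1")"
  [ -n "${current}" ] && [ "${current}" = "$(cat "$2")" ]
}
```

On the shard job, run `blob_manifest "$BLOB_DIR" > manifest.txt` and ship the manifest beside the blob folder; on the fan-in host, refuse to merge when `verify_blob` fails for any shard.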
06 Reproducible HowTo steps
- Export GIT_SHA, TOTAL, and per-job SHARD; refuse to start if the tuple is incomplete.
- Install browsers once per machine image; cache PLAYWRIGHT_BROWSERS_PATH outside the workspace copy.
- Run tests with blob reporter paths under .openclaw/reports/$GIT_SHA/raw/shard-$SHARD/blob.
- On the fan-in runner, merge blobs, emit HTML and junit, then compute flake deltas from junit timestamps.
- Build pr_playwright_summary.md with sections for failures, flakes, shard skew, and artifact links.
- POST through OpenClaw with the idempotency key and update the existing PR comment marker when the SHA matches.
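The first and fifth steps can be sketched as shell helpers. This is an illustration under assumptions: the function names, the argument order, and the exact wording of the summary sections are hypothetical, and SHARD is treated as one-based to match Playwright's --shard flag.

```shell
# require_shard_tuple
# Fail unless GIT_SHA, TOTAL, and SHARD are all set and SHARD is in 1..TOTAL.
require_shard_tuple() {
  if [ -z "${GIT_SHA:-}" ] || [ -z "${TOTAL:-}" ] || [ -z "${SHARD:-}" ]; then
    echo "GIT_SHA, TOTAL, and SHARD must all be set; refusing to start" >&2
    return 1
  fi
  case "${TOTAL}${SHARD}" in
    *[!0-9]*) echo "TOTAL and SHARD must be positive integers" >&2; return 1 ;;
  esac
  if [ "${SHARD}" -lt 1 ] || [ "${SHARD}" -gt "${TOTAL}" ]; then
    echo "SHARD=${SHARD} is outside 1..${TOTAL}" >&2
    return 1
  fi
}

# write_pr_summary OUT FAILED FLAKY SKEW_SECONDS REPORT_URL
# Write a short Markdown summary that starts with the idempotent comment marker.
write_pr_summary() {
  out="$1"; failed="$2"; flaky="$3"; skew="$4"; url="$5"
  {
    printf '<!-- openclaw-pw:%s -->\n' "${GIT_SHA}"
    printf '## Playwright summary for %s\n\n' "${GIT_SHA}"
    printf '### Failures\n%s failed tests\n\n' "${failed}"
    printf '### Flakes\n%s tests passed only on retry\n\n' "${flaky}"
    printf '### Shard skew\n%s s between fastest and slowest shard\n\n' "${skew}"
    printf '### Artifacts\n%s\n' "${url}"
  } > "${out}"
}
```

Calling `require_shard_tuple || exit 1` at the top of each shard job stops a misconfigured runner before it launches a browser, and keeping the marker on the first line of the summary makes it easy for the comment updater to find.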
Shard for speed, merge for signal, cap flakes with explicit math, and let OpenClaw ship Markdown instead of raw logs. A remote Mac keeps WebKit honest for continuous runs.