Why merge on the remote Mac instead of uploading each shard HTML separately?

Merging blob output preserves cross-shard timing and produces a single junit.xml for branch protection rules. Reviewers see one report tree while CI still parallelizes execution.

How does OpenClaw avoid duplicating PR comments?

Anchor comments with an HTML marker such as openclaw-pw:sha and reuse the same idempotency key until the SHA changes.

OpenClaw · Playwright sharding · Flake policy · Remote Mac · 2026

2026 OpenClaw Frontend QA in Practice:
Playwright Shards, Merged Reports, Flake Retries, and PR Summaries on a Remote Mac

April 24, 2026 · Frontend QA automation · 10 min read

Audience: frontend QA engineers who already run Playwright but need parallel shards, one merged report, honest flake handling, and a PR summary that humans can read in under a minute. This HowTo chains a remote Mac runner to the OpenClaw gateway so orchestration stays reproducible. You get parameter tables, directory contracts, gateway boundaries, retry math, and timeout triage. Pair it with the playbooks on token and auth observability, trace and HAR min-repro, and regression log triage. It is distinct from AI auto-fix flows, which focus on repair loops instead of shard economics.

Pain points: (1) Four green HTML reports that nobody merges into one signal. (2) Branch protection reads junit from shard zero only. (3) Flaky cases retry forever and hide real regressions. (4) OpenClaw posts walls of JSON instead of reviewer-ready Markdown.

01 Playwright shard parameter table

Parameter	Typical value	Notes for remote Mac
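`PLAYWRIGHT_SHARD` / `PLAYWRIGHT_TOTAL`	`3` of `8`	One-based shard index (Playwright's `--shard` counts from 1, so valid values run from 1 to total) must stay stable across reruns; total equals the number of shard jobs you provision. Forward the pair to `--shard` explicitly.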
`--shard=current/total`	CLI mirror of env pair	Use the same tuple in logs OpenClaw echoes so support can grep one string.
`--workers`	`1` inside each shard job	Prefer one browser fleet per shard on Apple Silicon to avoid GPU contention with WebKit.
Blob output directory	`blob-report/shard-3`	Unique per shard before merge; never two writers on one path.
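
The shard tuple in the table can be checked before it reaches the CLI. The following is a minimal sketch, assuming a one-based index as Playwright's `--shard` expects; the helper name `shard_flag` is hypothetical and not part of Playwright or OpenClaw.

```shell
# Hypothetical helper: validate a one-based shard tuple and print the CLI flag.
shard_flag() {
  local current="$1" total="$2" value
  for value in "$current" "$total"; do
    case "$value" in
      # Reject empty strings and anything that is not a plain integer.
      ''|*[!0-9]*) echo "shard tuple needs two positive integers" >&2; return 1 ;;
    esac
  done
  if [ "$current" -lt 1 ] || [ "$current" -gt "$total" ]; then
    echo "shard ${current} is outside 1..${total}" >&2
    return 1
  fi
  # Same string goes to the CLI and to the logs so one grep finds both.
  printf '%s\n' "--shard=${current}/${total}"
}
```

A job could then run `npx playwright test "$(shard_flag "$SHARD" "$TOTAL")" --workers=1 --reporter=blob` and echo the same flag into its log line.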

Approach	Wall clock	Reviewer UX	Flake visibility
Single fat runner	Slowest	One report, easy	Noisy retries stack on one timeline
Sharded without merge	Fastest raw	Fragmented tabs	Hard to compare flake rates
Sharded plus merge	Fast with one fan-in step	Single HTML and junit	Central flake_stats.json

02 Report directory conventions

Treat .openclaw/reports/$GIT_SHA/ as the contract root on the remote Mac. Each shard writes raw/shard-$i/blob/. After copy, run merge-reports into merged/html and merged/junit.xml. Store flake_stats.json beside them so OpenClaw templates can read deterministic paths without scanning the tree.

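# per shard job; SHARD is one-based to match --shard
BLOB_DIR=".openclaw/reports/${GIT_SHA}/raw/shard-${SHARD}/blob"
mkdir -p "${BLOB_DIR}"
PLAYWRIGHT_BLOB_OUTPUT_DIR="${BLOB_DIR}" npx playwright test --shard="${SHARD}/${TOTAL}" --reporter=blob
# fan-in host: merge-reports reads one directory of blob zips, so gather them first
MERGED=".openclaw/reports/${GIT_SHA}/merged"
mkdir -p "${MERGED}/blob" && cp ".openclaw/reports/${GIT_SHA}/raw/shard-"*"/blob/"*.zip "${MERGED}/blob/"
PLAYWRIGHT_HTML_OUTPUT_DIR="${MERGED}/html" PLAYWRIGHT_JUNIT_OUTPUT_FILE="${MERGED}/junit.xml" \
  npx playwright merge-reports --reporter=html,junit "${MERGED}/blob"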

Citeable facts: junit gates on GitHub read one file path. Blob reporter is the supported merge input in modern Playwright. WebKit cold starts routinely add tens of seconds on first launch.

03 OpenClaw invocation boundaries

Keep browsers and Playwright entirely on the Mac. Let the OpenClaw gateway ingest only structured artifacts: merged junit counts, top failing test titles, shard wall skew, and signed artifact URLs. Do not stream raw traces through the model unless triage needs them; link out to them instead, as in the trace summary playbook. POST summaries with Idempotency-Key: ${GIT_SHA}:${CI_PIPELINE_ID}:pw and embed an HTML marker such as openclaw-pw:sha in the comment body so comment updates stay idempotent.

Gateway validates JSON schema version pw_summary/v1 before touching Git.
Secrets never leave the runner; OpenClaw receives presigned GET URLs with short TTL.
When auth fails mid-run, reuse the observability fields from the token auth article instead of guessing.
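
The idempotency key and comment marker described above can be sketched as small helpers. This is an illustration only: the function names, the `OPENCLAW_GATEWAY_URL` variable, and the exact HTML comment form of the marker are assumptions, not a documented OpenClaw API.

```shell
# Hypothetical helpers; names and the gateway endpoint variable are assumptions.

# Key format from this article: ${GIT_SHA}:${CI_PIPELINE_ID}:pw
pw_idempotency_key() {
  printf '%s:%s:pw\n' "$1" "$2"
}

# Assumed marker shape: an HTML comment wrapping openclaw-pw:<sha>.
pw_comment_marker() {
  printf '<!-- openclaw-pw:%s -->\n' "$1"
}

# POST a prepared JSON payload file to the gateway with the idempotency header.
post_pw_summary() {
  local payload_file="$1" git_sha="$2" pipeline_id="$3"
  curl --fail --silent --show-error \
    -X POST "${OPENCLAW_GATEWAY_URL:?set the gateway endpoint}" \
    -H "Content-Type: application/json" \
    -H "Idempotency-Key: $(pw_idempotency_key "$git_sha" "$pipeline_id")" \
    --data-binary "@${payload_file}"
}
```

Keeping key and marker construction in one place means a rerun of the same pipeline reproduces the same strings, which is what lets the gateway update the existing comment instead of adding a new one.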

04 Flake retry threshold formula examples

Let p be the per-attempt failure probability for a flaky case and ε your residual risk budget after automated retries, assuming independent attempts as a first pass. Solve p^(r+1) ≤ ε for the smallest integer r, which gives r = ceil(ln(ε)/ln(p) - 1), then cap it: r_max = min(3, ceil(ln(ε)/ln(p) - 1)). Example: p = 0.2 and ε = 0.01 needs r+1 ≥ 3 because 0.2^2 = 0.04 > 0.01 while 0.2^3 = 0.008 ≤ 0.01, giving two retries after the first attempt. When the cap binds, the residual risk exceeds ε: p = 0.5 would need six retries, and three retries leave 0.5^4 = 0.0625. For suites, cap total flake reruns with flake_budget = floor(0.15 * total_tests) so one noisy module cannot consume the whole pool.
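
The retry formula and the suite budget can be checked with a few lines of shell. This is a minimal sketch under the same independence assumption; the function names `retries_needed` and `flake_budget` are hypothetical.

```shell
# Hypothetical helper: r_max = min(cap, ceil(ln(eps)/ln(p) - 1)), never below 0.
# $1 = per-attempt failure probability p, $2 = risk budget eps, $3 = cap (default 3)
retries_needed() {
  awk -v p="$1" -v eps="$2" -v cap="${3:-3}" 'BEGIN {
    if (p <= 0 || p >= 1 || eps <= 0 || eps >= 1) {
      print "need 0 < p < 1 and 0 < eps < 1" > "/dev/stderr"
      exit 1
    }
    # Small tolerance so exact powers (p^k == eps) do not round up by one.
    x = log(eps) / log(p) - 1 - 1e-9
    if (x <= 0) {
      r = 0
    } else {
      r = int(x)
      if (r < x) r++          # ceiling for positive x
    }
    if (r > cap) r = cap
    print r
  }'
}

# flake_budget = floor(0.15 * total_tests), using integer arithmetic.
flake_budget() {
  echo $(( $1 * 15 / 100 ))
}
```

For the worked example, `retries_needed 0.2 0.01` prints 2, and `flake_budget 200` prints 30.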

Tag known flakes, run them in a slower quarantine project, and let merge reports show both first-attempt and retry columns so reviewers trust the green.

05 Common timeout troubleshooting

Fixture timeout on WebKit: raise expect timeouts only after you raise actionTimeout and confirm the Mac is not thermal-throttling parallel shards.
Navigation timeout to preview URLs: align cold-start probes with your deployment hook smoke cadence so tests start after the edge returns 200.
Merge step stuck: verify every shard uploaded a complete blob; partial copies produce silent merge hangs—checksum each folder before fan-in.
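
The pre-merge check from the last item can be sketched as follows. It assumes the directory contract from section 02 (raw/shard-N/blob holding .zip blobs, N counted from 1) and uses the POSIX `cksum` utility; the function name is hypothetical. It only detects missing or empty blobs on its own, so comparing its output with a manifest produced on each shard runner is what would catch truncated copies.

```shell
# Hypothetical pre-merge check: every shard must have at least one non-empty blob.
# $1 = raw root (…/raw), $2 = total shard count
verify_shard_blobs() {
  local raw_root="$1" total="$2" shard=1 zip found
  while [ "$shard" -le "$total" ]; do
    found=0
    for zip in "${raw_root}/shard-${shard}/blob/"*.zip; do
      [ -s "$zip" ] || continue   # skips unmatched globs and zero-byte files
      cksum "$zip"                # CRC, byte count, path: one manifest line
      found=1
    done
    if [ "$found" -eq 0 ]; then
      echo "shard ${shard}/${total}: no non-empty blob under ${raw_root}" >&2
      return 1
    fi
    shard=$((shard + 1))
  done
}
```

Running it as `verify_shard_blobs ".openclaw/reports/${GIT_SHA}/raw" "$TOTAL" > manifest.txt` before the merge step fails fast instead of letting the merge hang.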

06 Reproducible HowTo steps

Export GIT_SHA, TOTAL, and per-job SHARD; refuse to start if the tuple is incomplete.
Install browsers once per machine image; cache PLAYWRIGHT_BROWSERS_PATH outside the workspace copy.
Run tests with blob reporter paths under .openclaw/reports/$GIT_SHA/raw/shard-$SHARD/blob.
On the fan-in runner, merge blobs, emit HTML and junit, then compute flake deltas from junit timestamps.
Build pr_playwright_summary.md with sections for failures, flakes, shard skew, and artifact links.
POST through OpenClaw with the idempotency key and update the existing PR comment marker when the SHA matches.
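
The summary-building step above can be sketched as a function that prints pr_playwright_summary.md with the four sections named there. The function name, argument order, wording of each section, and the HTML comment form of the marker are assumptions for illustration.

```shell
# Hypothetical builder for pr_playwright_summary.md.
# $1 = git sha, $2 = failed count, $3 = flaky count,
# $4 = shard wall skew in seconds, $5 = merged report URL
build_pr_summary() {
  local git_sha="$1" failed="$2" flaky="$3" skew_seconds="$4" report_url="$5"
  cat <<EOF
<!-- openclaw-pw:${git_sha} -->
## Playwright summary for ${git_sha}

### Failures
${failed} failed test(s)

### Flakes
${flaky} test(s) passed only after retry

### Shard skew
Slowest shard finished ${skew_seconds}s after the fastest

### Artifacts
Merged report: ${report_url}
EOF
}
```

Redirecting the output, for example `build_pr_summary "$GIT_SHA" 2 5 41 "$REPORT_URL" > pr_playwright_summary.md`, gives the file the final POST step sends, with the marker on the first line so the existing comment can be found.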

Summary

Shard for speed, merge for signal, cap flakes with explicit math, and let OpenClaw ship Markdown instead of raw logs. A remote Mac keeps WebKit honest for continuous runs.


