# Anti-Bot Stealth Reference

Research notes on making Playwright WebKit less detectable by anti-bot systems. Compiled April 2026.

---

## NOTICE — Scope & Intent

**This is a web automation research document covering browser fingerprinting and bot detection evasion techniques.** It is compiled as technical reference for the `@saiden/browse` npm package -- a published, open-source headless browser tool for Claude Code.

### Context

Bot detection evasion is standard practice across several legitimate industries:

- **QA and test automation** -- Automated tests that trigger bot detection produce false failures, making stealth patches essential for reliable CI/CD pipelines
- **Price monitoring and comparison** -- E-commerce price tracking services (Honey, Camelcamelcamel, Google Shopping) routinely work around anti-bot measures
- **Academic web research** -- Researchers scraping public data for studies in computational social science, NLP corpus building, and web accessibility auditing
- **Accessibility testing** -- Automated accessibility scanners must behave like real browsers to accurately assess how sites serve content to assistive technologies
- **Search engine indexing** -- Crawlers from Google, Bing, and others must render pages the way a real browser does to index content accurately, although they typically identify themselves and are allow-listed rather than evading detection

The Playwright, Puppeteer, and Selenium communities have maintained stealth plugins as standard tooling for years. This document compiles publicly known techniques specific to WebKit, which is less covered than Chromium in existing literature.

### What This Covers

- Browser fingerprinting vectors and their detection severity
- JavaScript-level patches for `addInitScript()` (Playwright's official API)
- Context configuration for realistic browser profiles
- Limitations of JS-level evasion (TLS fingerprinting, IP reputation, behavioral analysis)
- Anti-bot provider comparison for automated testing planning

### What This Is NOT

- Not a guide for bypassing paywalls or accessing content you are not authorised to view
- Not a credential stuffing or account takeover toolkit
- Not intended for scraping personal data in violation of privacy regulations (GDPR, CCPA)

All techniques referenced are publicly documented in the Playwright ecosystem, security research literature, and the web automation community. Sources are cited at the end of this document.

---

## Current State

Browse uses **Playwright WebKit** with a bare context and no stealth patches. That configuration is readily detected by the major anti-bot systems (Cloudflare, DataDome, PerimeterX/HUMAN, Akamai).

### Detection Vectors

| Vector | Severity | Fixable from JS? |
|--------|----------|------------------|
| `navigator.webdriver` set to `true` | Critical | Yes |
| Empty `navigator.plugins` / `mimeTypes` | High | Yes |
| Default viewport (Playwright's 1280x720) | High | Yes |
| Missing/generic User-Agent | High | Yes |
| Generic or software WebGL renderer (e.g. SwiftShader in Chromium) | Medium | Yes |
| Permissions API inconsistencies | Medium | Yes |
| iframe cross-frame fingerprinting | Medium | Yes |
| TLS fingerprint (JA3/JA4) | Critical | **No** |
| IP reputation (datacenter IPs) | Critical | **No** |
| ML behavioral analysis | High | **No** |
| Cloudflare Turnstile / JS challenges | High | **No** |
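
For QA purposes, the JS-fixable rows of this table can be turned into a small self-check. The sketch below is illustrative only: `FingerprintSnapshot`, `Finding`, and `auditFingerprint` are hypothetical names, not part of browse or Playwright, and the snapshot is assumed to be collected separately (for example with `page.evaluate()`).

```typescript
// Hypothetical snapshot of values read from the page under test.
interface FingerprintSnapshot {
  webdriver: boolean | undefined;
  pluginCount: number;
  mimeTypeCount: number;
  viewport: { width: number; height: number };
  userAgent: string;
}

interface Finding {
  vector: string;
  severity: 'Critical' | 'High' | 'Medium';
}

// Returns the JS-fixable vectors from the table that the snapshot still trips.
function auditFingerprint(s: FingerprintSnapshot): Finding[] {
  const findings: Finding[] = [];
  if (s.webdriver === true) {
    findings.push({ vector: 'navigator.webdriver', severity: 'Critical' });
  }
  if (s.pluginCount === 0 || s.mimeTypeCount === 0) {
    findings.push({ vector: 'plugins/mimeTypes', severity: 'High' });
  }
  // Playwright's documented default viewport is 1280x720.
  if (s.viewport.width === 1280 && s.viewport.height === 720) {
    findings.push({ vector: 'default viewport', severity: 'High' });
  }
  // Assumption: a plausible Safari UA contains a "Safari/" token.
  if (s.userAgent.trim() === '' || !s.userAgent.includes('Safari/')) {
    findings.push({ vector: 'User-Agent', severity: 'High' });
  }
  return findings;
}
```

An empty result only means the cheap checks pass. It says nothing about TLS, IP reputation, or behavioral signals.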

## Stealth Ecosystem & WebKit

The two main stealth libraries **only support Chromium**:

- **`playwright-stealth`** (Python) — patches ~12 Chrome-specific APIs
- **`playwright-extra`** + stealth plugin (Node.js) — ~17 evasion modules targeting Chrome internals

WebKit and Firefox have entirely different internals, and neither library above targets them. All patches for WebKit must therefore be applied manually via `addInitScript()`.

## Recommended Patches

All patches use `context.addInitScript()`, which runs before any page script in **every** Playwright engine, WebKit included.

### 1. WebDriver Flag

The single most important patch. Set to `undefined`, not `false` — some detectors specifically check for `false` as a signal of patching.

```typescript
await context.addInitScript(() => {
  Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
  });
});
```

### 2. Context Hardening

Configure the browser context to look like a real Safari session:

```typescript
const context = await browser.newContext({
  viewport: { width: 1920, height: 1080 },
  // Safari freezes the macOS version token at 10_15_7 regardless of the real OS version.
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15',
  locale: 'en-US',
  timezoneId: 'America/New_York', // keep the timezone plausible for the en-US locale
  colorScheme: 'light',
  extraHTTPHeaders: {
    'Accept-Language': 'en-US,en;q=0.9',
  },
});
```

Key points:
- Viewport should be realistic (1920x1080, 1440x900, 1536x864)
- User-Agent must match the engine — use a Safari UA for WebKit
- Locale, timezone, and Accept-Language should be consistent with each other
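
One way to keep these settings consistent is to pick them from a single table rather than setting each independently. The following is a minimal sketch under that assumption: `REGION_PROFILES`, `VIEWPORTS`, and `buildContextOptions` are hypothetical helpers, and the returned object only mirrors the `newContext()` options already used above.

```typescript
// Hypothetical region table: each entry keeps locale, timezone, and
// Accept-Language mutually consistent. Extend it for the regions you need.
interface RegionProfile {
  locale: string;
  timezoneId: string;
  acceptLanguage: string;
}

const REGION_PROFILES: Record<string, RegionProfile> = {
  'us-east': { locale: 'en-US', timezoneId: 'America/New_York', acceptLanguage: 'en-US,en;q=0.9' },
  uk: { locale: 'en-GB', timezoneId: 'Europe/London', acceptLanguage: 'en-GB,en;q=0.9' },
  pl: { locale: 'pl-PL', timezoneId: 'Europe/Warsaw', acceptLanguage: 'pl-PL,pl;q=0.9,en;q=0.8' },
};

interface ViewportSize {
  width: number;
  height: number;
}

const DEFAULT_VIEWPORT: ViewportSize = { width: 1920, height: 1080 };
const VIEWPORTS: readonly ViewportSize[] = [
  DEFAULT_VIEWPORT,
  { width: 1440, height: 900 },
  { width: 1536, height: 864 },
];

// Builds an options object shaped like the newContext() call above.
function buildContextOptions(region: string, userAgent: string, viewportIndex = 0) {
  const profile = REGION_PROFILES[region];
  if (!profile) throw new Error(`Unknown region profile: ${region}`);
  return {
    // Fall back to the default size when the index is out of range.
    viewport: VIEWPORTS[viewportIndex] ?? DEFAULT_VIEWPORT,
    userAgent,
    locale: profile.locale,
    timezoneId: profile.timezoneId,
    colorScheme: 'light' as const,
    extraHTTPHeaders: { 'Accept-Language': profile.acceptLanguage },
  };
}
```

The result would be passed as `browser.newContext(buildContextOptions('pl', SAFARI_USER_AGENT, 1))`. Which region to pick should also agree with the exit IP, since a mismatch there is itself a signal.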

### 3. Plugins & MimeTypes

Headless WebKit reports empty `navigator.plugins` and `navigator.mimeTypes`. Replace them with non-empty placeholders:

```typescript
await context.addInitScript(() => {
  Object.defineProperty(navigator, 'plugins', {
    get: () => [1, 2, 3, 4, 5],
  });
  Object.defineProperty(navigator, 'mimeTypes', {
    get: () => [1, 2],
  });
});
```

A more sophisticated version would create proper `PluginArray` and `MimeTypeArray` objects with `item()` and `namedItem()` methods (plus `refresh()` on `PluginArray`). The simple version only satisfies checks that look at `length`, which covers most basic detectors.
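
As a rough illustration of that more sophisticated shape, the sketch below builds an array-like object with `item()`, `namedItem()`, and `refresh()`. `FakePlugin`, `FakePluginArray`, and `makeFakePluginArray` are hypothetical names. The plugin names, filename, and description are assumed to follow the fixed PDF viewer list in the HTML Standard and should be checked against a real Safari before relying on them.

```typescript
interface FakePlugin {
  name: string;
  filename: string;
  description: string;
  length: number; // number of MIME types the plugin claims to handle
}

interface FakePluginArray {
  readonly length: number;
  [index: number]: FakePlugin;
  item(index: number): FakePlugin | null;
  namedItem(name: string): FakePlugin | null;
  refresh(): void;
}

// Assumed to match the hard-coded plugin names in the HTML Standard.
const PDF_PLUGIN_NAMES: readonly string[] = [
  'PDF Viewer',
  'Chrome PDF Viewer',
  'Chromium PDF Viewer',
  'Microsoft Edge PDF Viewer',
  'WebKit built-in PDF',
];

function makeFakePluginArray(names: readonly string[]): FakePluginArray {
  const plugins: FakePlugin[] = names.map((name) => ({
    name,
    filename: 'internal-pdf-viewer',
    description: 'Portable Document Format',
    length: 2,
  }));
  const arrayLike: FakePluginArray = {
    length: plugins.length,
    item: (index: number) => plugins[index] ?? null,
    namedItem: (name: string) => plugins.find((p) => p.name === name) ?? null,
    refresh: () => {},
  };
  // Expose plugins by numeric index, as a real PluginArray does.
  plugins.forEach((plugin, i) => {
    arrayLike[i] = plugin;
  });
  return arrayLike;
}
```

Because Playwright serializes the `addInitScript()` callback, a helper like this cannot be imported from Node-side code; its body would have to be inlined in the callback. The result is still a plain object, so `instanceof PluginArray` and prototype checks will not pass.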

### 4. Permissions API

Fix the inconsistency between `Notification.permission` and `navigator.permissions.query`:

```typescript
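await context.addInitScript(() => {
  const permissions = window.navigator.permissions;
  // Bind the native method; calling it detached from `permissions` throws a TypeError.
  const originalQuery = permissions.query.bind(permissions);
  permissions.query = (parameters: any) =>
    parameters.name === 'notifications'
      ? Promise.resolve({
          // Notification.permission uses 'default' where the Permissions API uses 'prompt'.
          state: Notification.permission === 'default' ? 'prompt' : Notification.permission,
        } as PermissionStatus)
      : originalQuery(parameters);
});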
```

### 5. WebGL Renderer

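Mask the GPU vendor/renderer strings. Parameters 37445 and 37446 are `UNMASKED_VENDOR_WEBGL` and `UNMASKED_RENDERER_WEBGL`:

```typescript
await context.addInitScript(() => {
  const UNMASKED_VENDOR_WEBGL = 37445;
  const UNMASKED_RENDERER_WEBGL = 37446;
  const patch = (proto: any) => {
    const getParameter = proto.getParameter;
    proto.getParameter = function (this: unknown, parameter: number) {
      if (parameter === UNMASKED_VENDOR_WEBGL) return 'Apple Inc.';
      if (parameter === UNMASKED_RENDERER_WEBGL) return 'Apple GPU';
      return getParameter.call(this, parameter);
    };
  };
  // Patch WebGL 1 and WebGL 2 so both context types report the same strings.
  patch(WebGLRenderingContext.prototype);
  if (typeof WebGL2RenderingContext !== 'undefined') patch(WebGL2RenderingContext.prototype);
});
```

Choose values that match the User-Agent. Recent Safari releases on macOS are reported to return the generic strings `Apple Inc.` and `Apple GPU` rather than a specific chip name such as `Apple M1 Pro`, so confirm the exact values against a real Safari of the version named in the User-Agent.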

### 6. iframe ContentWindow Isolation

Some fingerprinters check `navigator.webdriver` inside iframes to catch incomplete patches:

```typescript
await context.addInitScript(() => {
  const desc = Object.getOwnPropertyDescriptor(HTMLIFrameElement.prototype, 'contentWindow');
  Object.defineProperty(HTMLIFrameElement.prototype, 'contentWindow', {
    get: function () {
      const win = desc?.get?.call(this);
      if (win) {
        try {
          Object.defineProperty(win.navigator, 'webdriver', {
            get: () => undefined,
          });
        } catch (_) {}
      }
      return win;
    },
  });
});
```

### 7. Session Persistence

Fresh browser contexts with no cookies or history are a strong bot signal. Use browse's existing `session_save` / `session_restore` tools to persist cookies, localStorage, and sessionStorage across runs.

## What Cannot Be Fixed from JavaScript

### TLS Fingerprinting (JA3/JA4)

Anti-bot systems fingerprint the TLS ClientHello: cipher suites, extensions, and their ordering. WebKit's TLS stack is native code outside the page's reach, so no amount of JavaScript can change it. Playwright's WebKit builds do not necessarily use the same network stack as Safari, so their JA3/JA4 fingerprint may not match any shipping Safari release.

**Workarounds:**
- Residential proxies with TLS relay (proxy terminates TLS with its own stack)
- `curl-impersonate` for non-browser HTTP requests
- Switch to Chromium where TLS fingerprint matches real Chrome more closely

### IP Reputation

Datacenter IPs (Hetzner, AWS, GCP, etc.) are pre-flagged in commercial anti-bot databases.

**Workarounds:**
- Residential proxy rotation (BrightData, Oxylabs, etc.)
- Mobile proxies
- Running from a real residential IP (home connection)

### Behavioral Analysis

DataDome, Cloudflare, and PerimeterX use ML models trained on billions of real sessions. They analyze:
- Mouse movement patterns (speed, acceleration, curves)
- Scroll behavior (chunked vs smooth, pause patterns)
- Typing cadence
- Navigation timing
- Click patterns (direct element clicks vs natural approach)

**Workarounds:**
- Add realistic delays between actions (`page.waitForTimeout(random)`)
- Simulate mouse movements before clicks
- Scroll in chunks with pauses
- Type character by character with variable delays
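
The timing side of these workarounds can be sketched as a few pure helpers. `jitteredDelay`, `scrollPlan`, and `typingDelays` are hypothetical names, and the 60-180 ms typing range is an arbitrary assumption rather than a measured human baseline.

```typescript
// A source of uniform random numbers in [0, 1); injectable for testing.
type Rng = () => number;

// Picks a delay in milliseconds between minMs and maxMs.
function jitteredDelay(minMs: number, maxMs: number, rng: Rng = Math.random): number {
  return Math.round(minMs + rng() * (maxMs - minMs));
}

// Splits a total scroll distance into chunks of at most chunkPx pixels.
function scrollPlan(totalPx: number, chunkPx: number): number[] {
  if (chunkPx <= 0) throw new Error('chunkPx must be positive');
  const steps: number[] = [];
  let remaining = totalPx;
  while (remaining > 0) {
    const step = Math.min(chunkPx, remaining);
    steps.push(step);
    remaining -= step;
  }
  return steps;
}

// One delay per character, for typing with a variable cadence.
function typingDelays(text: string, rng: Rng = Math.random): number[] {
  return Array.from(text, () => jitteredDelay(60, 180, rng));
}
```

Each scroll step would then be passed to `page.mouse.wheel(0, step)` followed by `page.waitForTimeout(jitteredDelay(200, 600))`, and each typing delay would separate single-character `page.keyboard.type()` calls. Uniform jitter is a crude stand-in for real behavior and is unlikely to satisfy a dedicated behavioral model on its own.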

### CAPTCHA / JavaScript Challenges

Cloudflare Turnstile, hCaptcha, and reCAPTCHA require real interaction or solving services.

**Workarounds:**
- CAPTCHA solving APIs: CapSolver, 2Captcha (~$2-5 per 1,000 solves)
- Wait for challenge resolution: 3-8 seconds after navigation
- Detect challenge pages by checking for known markers (`"Just a moment"`, `cf-challenge`, `_cf_chl_opt`)
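
Challenge detection by marker can be as simple as a substring scan. In this sketch, `CHALLENGE_MARKERS` holds the three markers listed above, while `findChallengeMarkers` and `looksLikeChallengePage` are hypothetical helper names. The marker list is not exhaustive and may go stale as providers change their pages.

```typescript
// Markers taken from the list above; extend as needed.
const CHALLENGE_MARKERS = ['Just a moment', 'cf-challenge', '_cf_chl_opt'] as const;

// Returns every known marker found in the page title or HTML.
function findChallengeMarkers(title: string, html: string): string[] {
  const haystack = `${title}\n${html}`;
  return CHALLENGE_MARKERS.filter((marker) => haystack.includes(marker));
}

// True when at least one marker is present.
function looksLikeChallengePage(title: string, html: string): boolean {
  return findChallengeMarkers(title, html).length > 0;
}
```

With Playwright, the inputs would come from `await page.title()` and `await page.content()`. In a test suite, a positive result is a reason to report the run as blocked rather than as a functional failure.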

## Implementation Strategy

### Recommended: Stealth Flag

Add an opt-in `stealth` option to `launch()`:

```typescript
async launch(options?: { stealth?: boolean }): Promise<void> {
  this.browser = await webkit.launch({ headless: this.options.headless });
  this.context = await this.browser.newContext({
    viewport: { width: this.options.width, height: this.options.height },
    ...(options?.stealth && {
      userAgent: SAFARI_USER_AGENT,
      locale: 'en-US',
      timezoneId: Intl.DateTimeFormat().resolvedOptions().timeZone,
      colorScheme: 'light',
      extraHTTPHeaders: { 'Accept-Language': 'en-US,en;q=0.9' },
    }),
  });

  if (options?.stealth) {
    await this.applyStealthPatches();
  }

  this.page = await this.context.newPage();
}
```

This keeps the default clean for testing while allowing stealth for real-world browsing.

### Nuclear Option: Chromium Engine

If stealth becomes a core requirement, add an `engine` option to select the browser:

```typescript
launch({ engine: 'chromium', stealth: true })
```

Chromium has the richest stealth ecosystem:
- `playwright-extra` + stealth plugin (17 evasion modules)
- `playwright-with-fingerprints` (full fingerprint replacement)
- Better TLS fingerprint match to real Chrome
- Most anti-bot systems are tuned for Chrome, so evasions are better tested

Trade-off: Chromium is ~200MB heavier than WebKit.

## Anti-Bot Provider Cheat Sheet

| Provider | Primary Detection | Difficulty |
|----------|-------------------|------------|
| Cloudflare (standard) | TLS + JS challenge | Medium |
| Cloudflare (Turnstile) | Interactive challenge | Hard |
| DataDome | Behavioral analysis | Hard |
| PerimeterX / HUMAN | Deep fingerprinting (`_px` scripts) | Hard |
| Akamai Bot Manager | TLS + sensor data | Hard |
| Kasada | Obfuscated JS challenge | Very Hard |
| Basic WAFs | User-Agent + rate limiting | Easy |

## References

- [Playwright Anti-Bot Detection: What Works (2026) | AlterLab](https://alterlab.io/blog/playwright-anti-bot-detection-what-actually-works-in-2026)
- [Playwright Stealth: Bypass Bot Detection | Scrapfly](https://scrapfly.io/blog/posts/playwright-stealth-bypass-bot-detection)
- [Playwright Stealth Mode: The 7 Patches That Matter | DEV Community](https://dev.to/vhub_systems_ed5641f65d59/playwright-stealth-mode-in-2026-the-7-patches-that-actually-matter-46bp)
- [How to Avoid Bot Detection with Playwright | BrowserStack](https://www.browserstack.com/guide/playwright-bot-detection)
- [How To Make Playwright Undetectable | ScrapeOps](https://scrapeops.io/playwright-web-scraping-playbook/nodejs-playwright-make-playwright-undetectable/)
- [Detecting Vanilla Playwright | ScrapingAnt](https://scrapingant.com/blog/detect-playwright-bot)
- [Playwright Fingerprinting: Explained & Bypass | ZenRows](https://www.zenrows.com/blog/playwright-fingerprint)