Anti-Bot Stealth Reference

Research notes on making Playwright WebKit less detectable by anti-bot systems. Compiled April 2026.

NOTICE — Scope & Intent

This is a web automation research document covering browser fingerprinting and bot detection evasion techniques. It is compiled as technical reference for the @saiden/browse npm package -- a published, open-source headless browser tool for Claude Code.

Context

Bot detection evasion is standard practice across several legitimate industries:

QA and test automation -- Automated tests that trigger bot detection produce false failures, making stealth patches essential for reliable CI/CD pipelines
Price monitoring and comparison -- E-commerce price tracking services (Honey, Camelcamelcamel, Google Shopping) routinely work around anti-bot measures
Academic web research -- Researchers scraping public data for studies in computational social science, NLP corpus building, and web accessibility auditing
Accessibility testing -- Automated accessibility scanners must behave like real browsers to accurately assess how sites serve content to assistive technologies
Search engine indexing -- Crawlers from Google, Bing, and others implement similar techniques to access content behind bot detection layers

The Playwright, Puppeteer, and Selenium communities have maintained stealth plugins as standard tooling for years. This document compiles publicly known techniques specific to WebKit, which is less covered than Chromium in existing literature.

What This Covers

Browser fingerprinting vectors and their detection severity
JavaScript-level patches for addInitScript() (Playwright's official API)
Context configuration for realistic browser profiles
Limitations of JS-level evasion (TLS fingerprinting, IP reputation, behavioral analysis)
Anti-bot provider comparison for automated testing planning

What This Is NOT

Not a guide for bypassing paywalls or accessing content you are not authorised to view
Not a credential stuffing or account takeover toolkit
Not intended for scraping personal data in violation of privacy regulations (GDPR, CCPA)

All techniques referenced are publicly documented in the Playwright ecosystem, security research literature, and the web automation community. Sources are cited at the end of this document.

Current State

Browse uses Playwright WebKit with a bare context and no stealth patches. That configuration is readily detected by the major anti-bot systems (Cloudflare, DataDome, PerimeterX/HUMAN, Akamai).

Detection Vectors

Vector	Severity	Fixable from JS?
`navigator.webdriver` set to `true`	Critical	Yes
Empty `navigator.plugins` / `mimeTypes`	High	Yes
Default viewport (1280x720 in Playwright)	High	Yes
Missing/generic User-Agent	High	Yes
WebGL renderer = SwiftShader / generic	Medium	Yes
Permissions API inconsistencies	Medium	Yes
iframe cross-frame fingerprinting	Medium	Yes
TLS fingerprint (JA3/JA4)	Critical	No
IP reputation (datacenter IPs)	Critical	No
ML behavioral analysis	High	No
Cloudflare Turnstile / JS challenges	High	No
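
The JS-fixable rows above can be checked from inside the page before and after patching. The following is a minimal sketch of such a self-audit; `auditFingerprint`, `FingerprintSnapshot`, and the viewport thresholds are hypothetical names and assumptions for illustration, not part of browse or Playwright.

```typescript
// Hypothetical self-audit helper; names and thresholds are illustrative assumptions.
type Severity = 'Critical' | 'High' | 'Medium';

interface FingerprintSnapshot {
  webdriver: boolean | undefined; // value of navigator.webdriver
  pluginCount: number; // value of navigator.plugins.length
  userAgent: string; // value of navigator.userAgent
  viewport: { width: number; height: number };
}

interface Finding {
  vector: string;
  severity: Severity;
}

function auditFingerprint(snap: FingerprintSnapshot): Finding[] {
  const findings: Finding[] = [];
  if (snap.webdriver === true) {
    findings.push({ vector: 'navigator.webdriver is true', severity: 'Critical' });
  }
  if (snap.pluginCount === 0) {
    findings.push({ vector: 'navigator.plugins is empty', severity: 'High' });
  }
  if (snap.userAgent.trim() === '' || /Headless|Playwright/i.test(snap.userAgent)) {
    findings.push({ vector: 'missing or generic User-Agent', severity: 'High' });
  }
  // Assumed heuristic: desktop windows this small are uncommon.
  if (snap.viewport.width < 1024 || snap.viewport.height < 600) {
    findings.push({ vector: 'unrealistic viewport', severity: 'High' });
  }
  return findings;
}
```

The snapshot could be collected with `page.evaluate()` returning the four values, then passed to `auditFingerprint` on the Node side so the check itself stays easy to unit test.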

Stealth Ecosystem & WebKit

The two main stealth libraries only support Chromium:

playwright-stealth (Python) — patches ~12 Chrome-specific APIs
playwright-extra + stealth plugin (Node.js) — ~17 evasion modules targeting Chrome internals

WebKit and Firefox have entirely different internals, and no maintained stealth plugin for either was found while compiling these notes. WebKit patches must therefore be applied manually via addInitScript().

Recommended Patches

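The JavaScript patches below use context.addInitScript(), which Playwright evaluates after the document is created but before any of the page's own scripts run. This holds in every Playwright engine, WebKit included. Context Hardening and Session Persistence are configuration steps rather than injected scripts.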

1. WebDriver Flag

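The single most important patch. The getter below returns undefined rather than false, on the reasoning that some detectors treat a patched false as a signal. Note that the WebDriver specification has ordinary, non-automated browsers report false, so compare against a real Safari before settling on a value.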

await context.addInitScript(() => {
  Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
  });
});

2. Context Hardening

Configure the browser context to look like a real Safari session:

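const context = await browser.newContext({
  viewport: { width: 1920, height: 1080 },
  // Safari freezes the macOS version token at 10_15_7 regardless of the real OS version.
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15',
  locale: 'en-US',
  // Choose a timezone that is plausible for the locale and the egress IP's region.
  timezoneId: 'Europe/Warsaw',
  colorScheme: 'light',
  // Playwright also derives Accept-Language from `locale`; set explicitly here to control q-values.
  extraHTTPHeaders: {
    'Accept-Language': 'en-US,en;q=0.9',
  },
});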

Key points:

Viewport should be realistic (1920x1080, 1440x900, 1536x864)
User-Agent must match the engine — use a Safari UA for WebKit
Locale, timezone, and Accept-Language should be consistent with each other
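
The key points above lend themselves to a pre-flight check that runs before the context is created. This is a rough sketch under stated assumptions: `ContextProfile` and `checkProfile` are hypothetical names, and the rules are a loose reading of the bullet points rather than a complete consistency model.

```typescript
// Hypothetical pre-flight check; the rules are assumptions drawn from the key points above.
interface ContextProfile {
  engine: 'webkit' | 'chromium' | 'firefox';
  userAgent: string;
  locale: string; // e.g. 'en-US'
  acceptLanguage: string; // e.g. 'en-US,en;q=0.9'
  viewport: { width: number; height: number };
}

function checkProfile(profile: ContextProfile): string[] {
  const warnings: string[] = [];

  // A Safari UA carries "Version/x Safari/y" and none of the Chrome or Firefox tokens.
  const looksLikeSafari =
    /Version\/[\d.]+ Safari\//.test(profile.userAgent) &&
    !/Chrome|Chromium|Firefox/.test(profile.userAgent);
  if (profile.engine === 'webkit' && !looksLikeSafari) {
    warnings.push('User-Agent does not look like Safari');
  }

  // The first Accept-Language entry (without its q-value) should equal the locale.
  const [firstEntry = ''] = profile.acceptLanguage.split(',');
  const primaryLanguage = firstEntry.split(';')[0].trim();
  if (primaryLanguage.toLowerCase() !== profile.locale.toLowerCase()) {
    warnings.push('Accept-Language does not start with the locale');
  }

  // Assumed heuristic: desktop windows this small are uncommon.
  if (profile.viewport.width < 1024 || profile.viewport.height < 600) {
    warnings.push('viewport is smaller than a typical desktop window');
  }
  return warnings;
}
```

An empty result means the profile passed these particular rules only; timezone plausibility depends on the egress IP and is not checked here.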

3. Plugins & MimeTypes

Headless reports empty arrays. Fake them:

await context.addInitScript(() => {
  Object.defineProperty(navigator, 'plugins', {
    get: () => [1, 2, 3, 4, 5],
  });
  Object.defineProperty(navigator, 'mimeTypes', {
    get: () => [1, 2],
  });
});

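A more sophisticated version would create proper PluginArray and MimeTypeArray objects with item(), namedItem(), and refresh() methods. The simple version satisfies checks that only read length, but it fails any check that inspects individual entries or tests instanceof PluginArray.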
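
As a rough illustration of the more sophisticated variant, the sketch below builds an array-like object that exposes the PluginArray method names. `FakePlugin` and `makePluginArrayLike` are hypothetical, the entry values in the usage note are placeholders, and the result is still a plain object, so it does not pass `instanceof PluginArray`.

```typescript
// Illustrative only: an array-like object exposing the PluginArray method names.
// It remains a plain object and does not pass `instanceof PluginArray`.
interface FakePlugin {
  name: string;
  filename: string;
  description: string;
}

function makePluginArrayLike(plugins: FakePlugin[]) {
  const arrayLike = {
    length: plugins.length,
    // item() and namedItem() return null for misses, like the real collection.
    item: (index: number): FakePlugin | null => plugins[index] ?? null,
    namedItem: (name: string): FakePlugin | null =>
      plugins.find((plugin) => plugin.name === name) ?? null,
    refresh: (): void => {},
  };
  // Expose numeric indices so that arrayLike[0] works as well as arrayLike.item(0).
  plugins.forEach((plugin, index) => {
    Object.defineProperty(arrayLike, index, { value: plugin, enumerable: true });
  });
  return arrayLike;
}
```

Because Playwright serializes the function passed to addInitScript(), a helper like this would have to be defined inside the injected function rather than imported from the Node side. A usage might look like `makePluginArrayLike([{ name: 'PDF Viewer', filename: 'internal-pdf-viewer', description: 'Portable Document Format' }])`, with the entries copied from what a real Safari of the target version reports.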

4. Permissions API

Fix the inconsistency between Notification.permission and navigator.permissions.query:

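await context.addInitScript(() => {
  // The Permissions API is missing in older WebKit builds; skip the patch there.
  if (!navigator.permissions?.query) return;
  // Bind the native method to its receiver; an unbound call throws "Illegal invocation".
  const originalQuery = navigator.permissions.query.bind(navigator.permissions);
  navigator.permissions.query = (parameters: PermissionDescriptor) =>
    parameters.name === 'notifications'
      ? // Notification.permission says 'default' where the Permissions API says 'prompt'.
        Promise.resolve({
          state: Notification.permission === 'default' ? 'prompt' : Notification.permission,
        } as PermissionStatus)
      : originalQuery(parameters);
});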

5. WebGL Renderer

Mask the GPU vendor/renderer strings. Parameters 37445 and 37446 are UNMASKED_VENDOR_WEBGL and UNMASKED_RENDERER_WEBGL:

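await context.addInitScript(() => {
  const patch = (proto: WebGLRenderingContext | WebGL2RenderingContext) => {
    const getParameter = proto.getParameter;
    proto.getParameter = function (parameter: number) {
      if (parameter === 37445) return 'Apple GPU'; // UNMASKED_VENDOR_WEBGL
      if (parameter === 37446) return 'Apple M1 Pro'; // UNMASKED_RENDERER_WEBGL
      return getParameter.call(this, parameter);
    };
  };
  patch(WebGLRenderingContext.prototype);
  // WebGL2 contexts have their own prototype and expose the same two parameters.
  if (typeof WebGL2RenderingContext !== 'undefined') patch(WebGL2RenderingContext.prototype);
});

Choose values that match the User-Agent: Apple GPU strings for Safari on macOS. The strings above are examples, so confirm them against what a real Safari of the same version reports.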

6. iframe ContentWindow Isolation

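Some fingerprinters check navigator.webdriver inside iframes to catch incomplete patches. Playwright already evaluates init scripts in each child frame as it is attached or navigated, so the getter below is mostly a fallback for windows that are read before the init script has run: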

await context.addInitScript(() => {
  const desc = Object.getOwnPropertyDescriptor(HTMLIFrameElement.prototype, 'contentWindow');
  Object.defineProperty(HTMLIFrameElement.prototype, 'contentWindow', {
    get: function () {
      const win = desc?.get?.call(this);
      if (win) {
        try {
          Object.defineProperty(win.navigator, 'webdriver', {
            get: () => undefined,
          });
        } catch (_) {}
      }
      return win;
    },
  });
});

7. Session Persistence

Fresh browser contexts with no cookies or history are a strong bot signal. Use browse's existing session_save / session_restore tools to persist cookies, localStorage, and sessionStorage across runs.
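
Restoring stale state can be its own inconsistency, so saved cookies are worth pruning before reuse. The sketch below assumes the saved session has the same shape as the JSON returned by Playwright's `context.storageState()` (cookies with `expires` in Unix seconds, `-1` for session cookies); whether browse's session_save uses that exact format is an assumption, and `pruneExpiredCookies` is a hypothetical helper.

```typescript
// Assumed shape: cookies as in Playwright's storageState(), where `expires`
// is a Unix timestamp in seconds and -1 marks a session cookie.
interface CookieLike {
  expires: number;
}

function pruneExpiredCookies<S extends { cookies: CookieLike[] }>(
  state: S,
  nowSeconds: number = Date.now() / 1000,
): S {
  // Keep session cookies and anything that has not expired yet.
  const cookies = state.cookies.filter(
    (cookie) => cookie.expires === -1 || cookie.expires > nowSeconds,
  );
  return { ...state, cookies } as S;
}
```

With Playwright directly, the pruned object could be passed as the `storageState` option of `browser.newContext()`.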

What Cannot Be Fixed from JavaScript

TLS Fingerprinting (JA3/JA4)

Anti-bot systems fingerprint the TLS ClientHello: cipher suites, extensions, and their ordering. The handshake is produced by the engine's native, compiled networking stack, so no JavaScript running in the page can change it. Playwright WebKit's JA3 hash doesn't match any shipping Safari release.

Workarounds:

Residential proxies with TLS relay (proxy terminates TLS with its own stack)
curl-impersonate for non-browser HTTP requests
Switch to Chromium where TLS fingerprint matches real Chrome more closely

IP Reputation

Datacenter IPs (Hetzner, AWS, GCP, etc.) are pre-flagged in commercial anti-bot databases.

Workarounds:

Residential proxy rotation (BrightData, Oxylabs, etc.)
Mobile proxies
Running from a real residential IP (home connection)

Behavioral Analysis

DataDome, Cloudflare, and PerimeterX use ML models trained on billions of real sessions. They analyze:

Mouse movement patterns (speed, acceleration, curves)
Scroll behavior (chunked vs smooth, pause patterns)
Typing cadence
Navigation timing
Click patterns (direct element clicks vs natural approach)

Workarounds:

Add realistic delays between actions (page.waitForTimeout() with a randomized duration)
Simulate mouse movements before clicks
Scroll in chunks with pauses
Type character by character with variable delays
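
The pacing workarounds above reduce to two small building blocks: a randomized delay and a curved pointer path. The sketch below is illustrative only; `jitteredDelay` and `curvedPath` are hypothetical helpers, and the ranges a caller picks are guesses rather than measured human timings.

```typescript
// Illustrative pacing helpers; callers choose the ranges, which are not measured human timings.
type Rng = () => number; // returns a float in [0, 1), like Math.random

// A delay in milliseconds drawn uniformly from [minMs, maxMs].
function jitteredDelay(minMs: number, maxMs: number, rng: Rng = Math.random): number {
  return Math.round(minMs + rng() * (maxMs - minMs));
}

interface Point {
  x: number;
  y: number;
}

// Points along a quadratic Bezier curve from `from` to `to`, bowed toward `control`.
// The start point is omitted because the pointer is assumed to be there already.
function curvedPath(from: Point, to: Point, control: Point, steps: number): Point[] {
  const points: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    const u = 1 - t;
    points.push({
      x: u * u * from.x + 2 * u * t * control.x + t * t * to.x,
      y: u * u * from.y + 2 * u * t * control.y + t * t * to.y,
    });
  }
  return points;
}
```

In a Playwright script, each point would be fed to `page.mouse.move(x, y)` with a short `page.waitForTimeout(jitteredDelay(...))` in between, and `page.keyboard.type(text, { delay })` covers per-character typing delays. Passing a seeded `rng` keeps test runs reproducible.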

CAPTCHA / JavaScript Challenges

Cloudflare Turnstile, hCaptcha, and reCAPTCHA require real interaction or solving services.

Workarounds:

CAPTCHA solving APIs: CapSolver, 2Captcha (~$2-5 per 1,000 solves)
Wait for challenge resolution: 3-8 seconds after navigation
Detect challenge pages by checking for known markers ("Just a moment", cf-challenge, _cf_chl_opt)
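
Challenge detection from the last item can be sketched as a plain string check. The markers are the ones listed above; `looksLikeChallengePage` is a hypothetical helper, and real challenge pages vary, so treat the result as a heuristic.

```typescript
// Markers taken from the list above; this is a heuristic, not an exhaustive detector.
const CHALLENGE_MARKERS = ['Just a moment', 'cf-challenge', '_cf_chl_opt'];

function looksLikeChallengePage(title: string, html: string): boolean {
  const haystack = `${title}\n${html}`;
  return CHALLENGE_MARKERS.some((marker) => haystack.includes(marker));
}
```

With Playwright, the inputs would come from `await page.title()` and `await page.content()`, checked once after navigation and again after the 3-8 second wait.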

Implementation Strategy

Recommended: Stealth Flag

Add an opt-in stealth option to launch():

async launch(options?: { stealth?: boolean }): Promise<void> {
  this.browser = await webkit.launch({ headless: this.options.headless });
  this.context = await this.browser.newContext({
    viewport: { width: this.options.width, height: this.options.height },
    ...(options?.stealth && {
      userAgent: SAFARI_USER_AGENT,
      locale: 'en-US',
      timezoneId: Intl.DateTimeFormat().resolvedOptions().timeZone,
      colorScheme: 'light',
      extraHTTPHeaders: { 'Accept-Language': 'en-US,en;q=0.9' },
    }),
  });

  if (options?.stealth) {
    await this.applyStealthPatches();
  }

  this.page = await this.context.newPage();
}

This keeps the default clean for testing while allowing stealth for real-world browsing.
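
The snippet above calls applyStealthPatches() without showing it. One possible shape is sketched below as a free function; `InitScriptTarget` and `STEALTH_PATCHES` are hypothetical names, and only two of the patches from Recommended Patches are included for brevity.

```typescript
// Minimal structural type; Playwright's BrowserContext is expected to satisfy it.
interface InitScriptTarget {
  addInitScript(script: () => void): Promise<void>;
}

// Each patch must be self-contained: Playwright serializes the function source and
// evaluates it in the page, so variables captured from the Node side are not available.
const STEALTH_PATCHES: Array<() => void> = [
  function hideWebdriver() {
    const nav = (globalThis as any).navigator;
    Object.defineProperty(nav, 'webdriver', { get: () => undefined, configurable: true });
  },
  function fakePlugins() {
    const nav = (globalThis as any).navigator;
    Object.defineProperty(nav, 'plugins', { get: () => [1, 2, 3, 4, 5], configurable: true });
  },
  // The remaining patches from Recommended Patches would be added here.
];

// Registers every patch on the context; must run before the first page is created.
async function applyStealthPatches(context: InitScriptTarget): Promise<void> {
  await Promise.all(STEALTH_PATCHES.map((patch) => context.addInitScript(patch)));
}
```

Keeping the patches in one array makes it straightforward to unit test registration with a fake context, and to disable a single patch when it turns out to cause more detection than it prevents.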

Nuclear Option: Chromium Engine

If stealth becomes a core requirement, add a browser engine option:

launch({ engine: 'chromium', stealth: true })

Chromium has the richest stealth ecosystem:

playwright-extra + stealth plugin (17 evasion modules)
playwright-with-fingerprints (full fingerprint replacement)
Better TLS fingerprint match to real Chrome
Most anti-bot systems are tuned for Chrome, so evasions are better tested

Trade-off: Chromium is ~200MB heavier than WebKit.

Anti-Bot Provider Cheat Sheet

Provider	Primary Detection	Difficulty
Cloudflare (standard)	TLS + JS challenge	Medium
Cloudflare (Turnstile)	Interactive challenge	Hard
DataDome	Behavioral analysis	Hard
PerimeterX / HUMAN	Deep fingerprinting (`_px` scripts)	Hard
Akamai Bot Manager	TLS + sensor data	Hard
Kasada	Obfuscated JS challenge	Very Hard
Basic WAFs	User-Agent + rate limiting	Easy
