browse/STEALTH.md
aladac 1d3192cffd Add Firefox cookie import and stealth mode
- Firefox cookie importer: reads cookies.sqlite with WAL-safe copy,
  profile detection via profiles.ini, cross-platform paths, domain filtering
- Stealth mode: opt-in via launch(stealth: true), patches navigator.webdriver,
  plugins/mimeTypes, permissions API, WebGL renderer, iframe isolation,
  languages, plus realistic Safari UA and context hardening
- Import tool now accepts 'safari' | 'firefox' source
- STEALTH.md reference documentation
- Upgraded @types/node to v25 for node:sqlite support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:03:15 +02:00


Anti-Bot Stealth Reference

Research notes on making Playwright WebKit less detectable by anti-bot systems. Compiled April 2026.

Current State

By default, Browse launches Playwright WebKit with a bare context and no stealth patches. Every major anti-bot system (Cloudflare, DataDome, PerimeterX/HUMAN, Akamai) detects this configuration trivially.

Detection Vectors

| Vector | Severity | Fixable from JS? |
| --- | --- | --- |
| navigator.webdriver set to true | Critical | Yes |
| Empty navigator.plugins / mimeTypes | High | Yes |
| Default viewport (1280x720 in Playwright) | High | Yes |
| Missing/generic User-Agent | High | Yes |
| WebGL renderer = SwiftShader / generic | Medium | Yes |
| Permissions API inconsistencies | Medium | Yes |
| iframe cross-frame fingerprinting | Medium | Yes |
| TLS fingerprint (JA3/JA4) | Critical | No |
| IP reputation (datacenter IPs) | Critical | No |
| ML behavioral analysis | High | No |
| Cloudflare Turnstile / JS challenges | High | No |
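
As a rough illustration of how the JS-visible rows above might be checked, here is a minimal sketch. The `BrowserSnapshot` shape, the `flagAutomationSignals` name, and the list of default viewport sizes are assumptions made for this example; real detectors are far more elaborate.

```typescript
// Hypothetical snapshot of the values a detection script could read from a page.
interface BrowserSnapshot {
  webdriver: boolean | undefined;
  pluginCount: number;
  userAgent: string;
  viewport: { width: number; height: number };
  webglRenderer: string;
}

// Assumed list of viewport sizes that automation tools use when none is configured.
const AUTOMATION_DEFAULT_VIEWPORTS = ['800x600', '1280x720'];

// Returns the names of the JS-visible vectors that the snapshot trips.
function flagAutomationSignals(s: BrowserSnapshot): string[] {
  const flags: string[] = [];
  if (s.webdriver === true) flags.push('navigator.webdriver');
  if (s.pluginCount === 0) flags.push('empty plugins');
  if (AUTOMATION_DEFAULT_VIEWPORTS.includes(`${s.viewport.width}x${s.viewport.height}`)) {
    flags.push('default viewport');
  }
  if (s.userAgent === '' || /Headless|Playwright/i.test(s.userAgent)) {
    flags.push('generic user agent');
  }
  if (/SwiftShader|llvmpipe/i.test(s.webglRenderer)) {
    flags.push('software WebGL renderer');
  }
  return flags;
}
```

Running a check like this against a snapshot taken from a stealth-enabled page can be a quick way to see which patches are still missing. It says nothing about the non-JS vectors (TLS, IP reputation, behavior).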

Stealth Ecosystem & WebKit

The two main stealth libraries only support Chromium:

  • playwright-stealth (Python) — patches ~12 Chrome-specific APIs
  • playwright-extra + stealth plugin (Node.js) — ~17 evasion modules targeting Chrome internals

WebKit and Firefox have entirely different internals, and no stealth plugin exists for either. Every WebKit patch below is therefore applied manually through context.addInitScript(), which runs before any page script in every Playwright engine, WebKit included.

1. WebDriver Flag

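The single most important patch. The getter returns undefined rather than false, on the reasoning that some detectors treat a patched false as a signal. Note that an unautomated browser reports false for this property, so a strict check could flag undefined as well.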

await context.addInitScript(() => {
  Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
  });
});

2. Context Hardening

Configure the browser context to look like a real Safari session:

const context = await browser.newContext({
  viewport: { width: 1920, height: 1080 },
  // Safari freezes the macOS token at 10_15_7 regardless of the real OS version.
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15',
  locale: 'en-US',
  timezoneId: 'Europe/Warsaw',
  colorScheme: 'light',
  extraHTTPHeaders: {
    'Accept-Language': 'en-US,en;q=0.9',
  },
});

Key points:

  • Viewport should be realistic (1920x1080, 1440x900, 1536x864)
  • User-Agent must match the engine — use a Safari UA for WebKit
  • Locale, timezone, and Accept-Language should be consistent with each other
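
The consistency rules above can be checked mechanically before a context is created. The following is a minimal sketch. The names `acceptLanguageFor`, `ContextProfile`, and `profileProblems` are invented for this example, and the Safari test is a simple heuristic rather than a full User-Agent parser.

```typescript
// Builds an Accept-Language value from a locale, e.g. 'en-US' -> 'en-US,en;q=0.9'.
function acceptLanguageFor(locale: string): string {
  const base = locale.split('-')[0];
  return base === locale ? locale : `${locale},${base};q=0.9`;
}

// Hypothetical bundle of the context options that need to agree with each other.
interface ContextProfile {
  engine: 'webkit' | 'chromium' | 'firefox';
  userAgent: string;
  locale: string;
  acceptLanguage: string;
}

// Returns a list of human-readable mismatches; an empty list means the profile is coherent.
function profileProblems(p: ContextProfile): string[] {
  const problems: string[] = [];
  if (p.acceptLanguage !== acceptLanguageFor(p.locale)) {
    problems.push('Accept-Language does not match locale');
  }
  // Heuristic: Safari UAs carry "Version/x Safari/" and no Chrome or Firefox token.
  const isSafariUA =
    /Version\/[\d.]+ Safari\//.test(p.userAgent) && !/Chrome|Chromium|Firefox/.test(p.userAgent);
  if (p.engine === 'webkit' && !isSafariUA) {
    problems.push('User-Agent is not a Safari UA');
  }
  return problems;
}
```

Timezone is left out of the check because a mismatch between locale and timezone is plausible for real users (travelers, expatriates), so it is harder to validate automatically.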

3. Plugins & MimeTypes

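A headless browser reports empty navigator.plugins and navigator.mimeTypes. Fake them with arrays of the lengths a desktop browser with a built-in PDF viewer reports (five plugins, two MIME types):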

await context.addInitScript(() => {
  Object.defineProperty(navigator, 'plugins', {
    get: () => [1, 2, 3, 4, 5],
  });
  Object.defineProperty(navigator, 'mimeTypes', {
    get: () => [1, 2],
  });
});

A more sophisticated version would create proper PluginArray and MimeTypeArray objects with item(), namedItem(), and refresh() methods, but the simple version passes most checks.
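
A sketch of what that more sophisticated version could look like is shown below. It builds a plain object exposing length, indexed access, item(), namedItem(), and refresh(). The `makePluginArray` name and the `FakePlugin` types are assumptions for this example, and the result is not a genuine PluginArray instance, so an `instanceof` or `Object.prototype.toString` check would still expose it.

```typescript
interface FakePlugin {
  name: string;
  filename: string;
  description: string;
}

interface FakePluginArray {
  length: number;
  item(index: number): FakePlugin | null;
  namedItem(name: string): FakePlugin | null;
  refresh(): void;
  [index: number]: FakePlugin;
}

// Builds an array-like object that mimics the PluginArray surface most checks touch.
function makePluginArray(names: readonly string[]): FakePluginArray {
  const plugins: FakePlugin[] = names.map((name) => ({
    name,
    filename: 'internal-pdf-viewer',
    description: 'Portable Document Format',
  }));
  const array: FakePluginArray = {
    length: plugins.length,
    item: (index: number) => plugins[index] ?? null,
    namedItem: (name: string) => plugins.find((p) => p.name === name) ?? null,
    refresh: () => {},
  };
  // Copy entries onto numeric keys so plugins[0]-style access also works.
  plugins.forEach((plugin, i) => {
    array[i] = plugin;
  });
  return array;
}

// The five PDF viewer names that current desktop browsers expose.
const PDF_PLUGIN_NAMES = [
  'PDF Viewer',
  'Chrome PDF Viewer',
  'Chromium PDF Viewer',
  'Microsoft Edge PDF Viewer',
  'WebKit built-in PDF',
];
```

To use this inside addInitScript the helper has to be defined within the injected function, since Playwright serializes the function and evaluates it in the page. The getter should also return one cached object so that `navigator.plugins === navigator.plugins` holds.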

4. Permissions API

Fix the inconsistency between Notification.permission and navigator.permissions.query:

await context.addInitScript(() => {
  const permissions = window.navigator.permissions;
  // Bind the original so it keeps its `this`; an unbound call throws a TypeError.
  const originalQuery = permissions.query.bind(permissions);
  permissions.query = (parameters: any) =>
    parameters.name === 'notifications'
      ? Promise.resolve({ state: Notification.permission } as PermissionStatus)
      : originalQuery(parameters);
});

5. WebGL Renderer

Mask the GPU vendor/renderer strings. Parameters 37445 and 37446 are UNMASKED_VENDOR_WEBGL and UNMASKED_RENDERER_WEBGL:

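await context.addInitScript(() => {
  // Patch WebGL1 and, where present, WebGL2: fingerprinters may query either.
  const protos = [
    WebGLRenderingContext.prototype,
    ...(typeof WebGL2RenderingContext !== 'undefined' ? [WebGL2RenderingContext.prototype] : []),
  ];
  for (const proto of protos) {
    const getParameter = proto.getParameter;
    proto.getParameter = function (parameter) {
      if (parameter === 37445) return 'Apple GPU'; // UNMASKED_VENDOR_WEBGL
      if (parameter === 37446) return 'Apple M1 Pro'; // UNMASKED_RENDERER_WEBGL
      return getParameter.call(this, parameter);
    };
  }
});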

Choose values that match the User-Agent: for Safari on macOS, an Apple GPU on Apple Silicon.

6. iframe ContentWindow Isolation

Some fingerprinters check navigator.webdriver inside iframes to catch incomplete patches:

await context.addInitScript(() => {
  const desc = Object.getOwnPropertyDescriptor(HTMLIFrameElement.prototype, 'contentWindow');
  Object.defineProperty(HTMLIFrameElement.prototype, 'contentWindow', {
    get: function () {
      const win = desc?.get?.call(this);
      if (win) {
        try {
          Object.defineProperty(win.navigator, 'webdriver', {
            get: () => undefined,
          });
        } catch (_) {}
      }
      return win;
    },
  });
});

7. Session Persistence

Fresh browser contexts with no cookies or history are a strong bot signal. Use browse's existing session_save / session_restore tools to persist cookies, localStorage, and sessionStorage across runs.

What Cannot Be Fixed from JavaScript

TLS Fingerprinting (JA3/JA4)

Anti-bot systems fingerprint the TLS ClientHello: its cipher suites, extensions, and their ordering. The handshake is produced by the browser's native networking stack before any page script runs, so no amount of JavaScript can change it. Playwright WebKit's JA3 hash doesn't match any shipping Safari release.

Workarounds:

  • Residential proxies with TLS relay (proxy terminates TLS with its own stack)
  • curl-impersonate for non-browser HTTP requests
  • Switch to Chromium where TLS fingerprint matches real Chrome more closely

IP Reputation

Datacenter IPs (Hetzner, AWS, GCP, etc.) are pre-flagged in commercial anti-bot databases.

Workarounds:

  • Residential proxy rotation (BrightData, Oxylabs, etc.)
  • Mobile proxies
  • Running from a real residential IP (home connection)
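
Proxy rotation can be as simple as picking a different entry per session. The sketch below assumes a pool of proxies is already available. The `ProxyConfig` fields mirror the `server`, `username`, and `password` keys of Playwright's `proxy` launch option, while `proxyForSession` and the example hosts are hypothetical.

```typescript
// Same field names as Playwright's `proxy` launch option.
interface ProxyConfig {
  server: string;
  username?: string;
  password?: string;
}

// Round-robin selection: session n uses entry n modulo the pool size.
// sessionIndex is expected to be a non-negative integer.
function proxyForSession(pool: readonly ProxyConfig[], sessionIndex: number): ProxyConfig {
  if (pool.length === 0) throw new Error('proxy pool is empty');
  return pool[sessionIndex % pool.length];
}

// Placeholder hosts for illustration only.
const examplePool: ProxyConfig[] = [
  { server: 'http://proxy-a.example:8080' },
  { server: 'http://proxy-b.example:8080', username: 'user', password: 'secret' },
];
```

The selected entry could then be passed at launch, for example `webkit.launch({ proxy: proxyForSession(examplePool, n) })`.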

Behavioral Analysis

DataDome, Cloudflare, and PerimeterX use ML models trained on billions of real sessions. They analyze:

  • Mouse movement patterns (speed, acceleration, curves)
  • Scroll behavior (chunked vs smooth, pause patterns)
  • Typing cadence
  • Navigation timing
  • Click patterns (direct element clicks vs natural approach)

Workarounds:

  • Add realistic delays between actions (page.waitForTimeout(random))
  • Simulate mouse movements before clicks
  • Scroll in chunks with pauses
  • Type character by character with variable delays
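
Two of these workarounds, variable delays and curved mouse movement, can be sketched as pure helpers. The names `humanDelay` and `curvedPath`, the default bend of 40 pixels, and the choice of a quadratic Bezier curve are all assumptions for this example. Nothing here is claimed to defeat any particular behavioral model.

```typescript
type Point = { x: number; y: number };

// Delay in milliseconds between minMs and maxMs (inclusive after rounding).
// The random source is injectable so the function can be tested deterministically.
function humanDelay(minMs: number, maxMs: number, rng: () => number = Math.random): number {
  return Math.round(minMs + rng() * (maxMs - minMs));
}

// Points along a quadratic Bezier curve from `from` to `to`, bowed sideways by `bend` pixels.
// The final point is exactly `to`; the starting point is not included.
function curvedPath(from: Point, to: Point, steps: number, bend = 40): Point[] {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const len = Math.hypot(dx, dy) || 1;
  // Control point: the midpoint pushed along the perpendicular of the straight line.
  const ctrl = {
    x: (from.x + to.x) / 2 - (dy / len) * bend,
    y: (from.y + to.y) / 2 + (dx / len) * bend,
  };
  const points: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    const u = 1 - t;
    points.push({
      x: u * u * from.x + 2 * u * t * ctrl.x + t * t * to.x,
      y: u * u * from.y + 2 * u * t * ctrl.y + t * t * to.y,
    });
  }
  return points;
}
```

With Playwright, each point could be fed to `page.mouse.move(p.x, p.y)` with a short `page.waitForTimeout(humanDelay(5, 25))` between steps, followed by the click at the final position.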

CAPTCHA / JavaScript Challenges

Cloudflare Turnstile, hCaptcha, and reCAPTCHA require real interaction or solving services.

Workarounds:

  • CAPTCHA solving APIs: CapSolver, 2Captcha (~$2-5 per 1,000 solves)
  • Wait for challenge resolution: 3-8 seconds after navigation
  • Detect challenge pages by checking for known markers ("Just a moment", cf-challenge, _cf_chl_opt)
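
The marker check and the wait can be combined into a small polling helper. This is a sketch under the assumption that the three markers listed above are sufficient. `looksLikeChallenge`, `waitUntilClear`, and the default timings are names and values chosen for this example.

```typescript
// Markers taken from the list above.
const CHALLENGE_MARKERS = ['Just a moment', 'cf-challenge', '_cf_chl_opt'];

// True when the page title or HTML contains any known challenge marker.
function looksLikeChallenge(title: string, html: string): boolean {
  const haystack = `${title}\n${html}`;
  return CHALLENGE_MARKERS.some((marker) => haystack.includes(marker));
}

// Polls `read` until the challenge markers disappear or the timeout elapses.
// Resolves to true if the page cleared, false if it still looks like a challenge.
async function waitUntilClear(
  read: () => Promise<{ title: string; html: string }>,
  timeoutMs = 8000,
  intervalMs = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { title, html } = await read();
    if (!looksLikeChallenge(title, html)) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a Playwright session, `read` could be `async () => ({ title: await page.title(), html: await page.content() })`. A false result is the point at which a solving service or a manual fallback would come into play.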

Implementation Strategy

Add an opt-in stealth option to launch():

async launch(options?: { stealth?: boolean }): Promise<void> {
  this.browser = await webkit.launch({ headless: this.options.headless });
  this.context = await this.browser.newContext({
    viewport: { width: this.options.width, height: this.options.height },
    ...(options?.stealth && {
      userAgent: SAFARI_USER_AGENT,
      locale: 'en-US',
      timezoneId: Intl.DateTimeFormat().resolvedOptions().timeZone,
      colorScheme: 'light',
      extraHTTPHeaders: { 'Accept-Language': 'en-US,en;q=0.9' },
    }),
  });

  if (options?.stealth) {
    await this.applyStealthPatches();
  }

  this.page = await this.context.newPage();
}

This keeps the default clean for testing while allowing stealth for real-world browsing.
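
The body of applyStealthPatches() is not shown above. One way it might be organized is as a named registry of init scripts, sketched here as a free function. `InitScriptTarget` is a minimal stand-in for the part of a Playwright BrowserContext that the function needs, and the registry layout and patch names are assumptions, not the project's actual implementation.

```typescript
// Minimal stand-in for the one BrowserContext method used here.
interface InitScriptTarget {
  addInitScript(script: () => void): Promise<unknown>;
}

// Each patch must be self-contained: the function is serialized and evaluated
// inside the page, so it cannot close over variables from this module.
const STEALTH_PATCHES: Record<string, () => void> = {
  webdriver: () => {
    // The globalThis cast keeps this sketch compilable without DOM typings.
    const nav = (globalThis as { navigator?: object }).navigator;
    if (nav) Object.defineProperty(nav, 'webdriver', { get: () => undefined });
  },
  languages: () => {
    const nav = (globalThis as { navigator?: object }).navigator;
    if (nav) Object.defineProperty(nav, 'languages', { get: () => ['en-US', 'en'] });
  },
};

// Registers the selected patches in order and returns the names that were applied.
async function applyStealthPatches(
  context: InitScriptTarget,
  enabled: readonly string[] = Object.keys(STEALTH_PATCHES),
): Promise<string[]> {
  const applied: string[] = [];
  for (const name of enabled) {
    const patch = STEALTH_PATCHES[name];
    if (!patch) throw new Error(`unknown stealth patch: ${name}`);
    await context.addInitScript(patch);
    applied.push(name);
  }
  return applied;
}
```

Keeping patches in a registry makes it possible to enable them individually, which helps when one patch turns out to be the detectable one on a given site.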

Nuclear Option: Chromium Engine

If stealth becomes a core requirement, add a browser engine option:

launch({ engine: 'chromium', stealth: true })

Chromium has the richest stealth ecosystem:

  • playwright-extra + stealth plugin (17 evasion modules)
  • playwright-with-fingerprints (full fingerprint replacement)
  • Better TLS fingerprint match to real Chrome
  • Most anti-bot systems are tuned for Chrome, so evasions are better tested

Trade-off: Chromium is ~200MB heavier than WebKit.

Anti-Bot Provider Cheat Sheet

| Provider | Primary Detection | Difficulty |
| --- | --- | --- |
| Cloudflare (standard) | TLS + JS challenge | Medium |
| Cloudflare (Turnstile) | Interactive challenge | Hard |
| DataDome | Behavioral analysis | Hard |
| PerimeterX / HUMAN | Deep fingerprinting (_px scripts) | Hard |
| Akamai Bot Manager | TLS + sensor data | Hard |
| Kasada | Obfuscated JS challenge | Very Hard |
| Basic WAFs | User-Agent + rate limiting | Easy |

References