aladac 1d3192cffd Add Firefox cookie import and stealth mode

- Firefox cookie importer: reads cookies.sqlite with WAL-safe copy,
  profile detection via profiles.ini, cross-platform paths, domain filtering
- Stealth mode: opt-in via launch(stealth: true), patches navigator.webdriver,
  plugins/mimeTypes, permissions API, WebGL renderer, iframe isolation,
  languages, plus realistic Safari UA and context hardening
- Import tool now accepts 'safari' | 'firefox' source
- STEALTH.md reference documentation
- Upgraded @types/node to v25 for node:sqlite support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-12 23:03:15 +02:00

Anti-Bot Stealth Reference

Research notes on making Playwright WebKit less detectable by anti-bot systems. Compiled April 2026.

Current State

Browse uses Playwright WebKit with a bare context — no stealth patches. This is trivially detected by every major anti-bot system (Cloudflare, DataDome, PerimeterX/HUMAN, Akamai).

Detection Vectors

| Vector | Severity | Fixable from JS? |
| --- | --- | --- |
| `navigator.webdriver` set to `true` | Critical | Yes |
| Empty `navigator.plugins` / `mimeTypes` | High | Yes |
| Default viewport (Playwright defaults to 1280x720) | High | Yes |
| Missing/generic User-Agent | High | Yes |
| WebGL renderer = SwiftShader / generic | Medium | Yes |
| Permissions API inconsistencies | Medium | Yes |
| iframe cross-frame fingerprinting | Medium | Yes |
| TLS fingerprint (JA3/JA4) | Critical | No |
| IP reputation (datacenter IPs) | Critical | No |
| ML behavioral analysis | High | No |
| Cloudflare Turnstile / JS challenges | High | No |

Stealth Ecosystem & WebKit

The two main stealth libraries only support Chromium:

playwright-stealth (Python) — patches ~12 Chrome-specific APIs
playwright-extra + stealth plugin (Node.js) — ~17 evasion modules targeting Chrome internals

WebKit and Firefox have entirely different internals, and neither library ships evasions for them. All patches for WebKit must be applied manually via addInitScript().

Recommended Patches

Most patches below use context.addInitScript(), which runs before any page script in every Playwright engine (WebKit included). Context Hardening and Session Persistence are configured on the context itself.

1. WebDriver Flag

The single most important patch. Set to undefined, not false — some detectors specifically check for false as a signal of patching.

await context.addInitScript(() => {
  Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
  });
});

2. Context Hardening

Configure the browser context to look like a real Safari session:

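const context = await browser.newContext({
  viewport: { width: 1920, height: 1080 },
  // Real Safari reports a frozen macOS version (10_15_7) in its UA, whatever the actual OS version is.
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15',
  locale: 'en-US',
  // Pick a timezone that is plausible for the exit IP and the chosen locale.
  timezoneId: 'Europe/Warsaw',
  colorScheme: 'light',
  extraHTTPHeaders: {
    'Accept-Language': 'en-US,en;q=0.9',
  },
});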

Key points:

Viewport should be realistic (1920x1080, 1440x900, 1536x864)
User-Agent must match the engine — use a Safari UA for WebKit
Locale, timezone, and Accept-Language should be consistent with each other
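The consistency rule in the key points can be checked mechanically before a context is created. The sketch below is illustrative only: `acceptLanguageFor` and `isValidTimeZone` are hypothetical helper names, not part of browse or Playwright, and the `q=0.9` weight simply mirrors the header used above.

```typescript
// Hypothetical helpers (not part of browse or Playwright) for keeping
// locale, Accept-Language, and timezone settings consistent.

// Derive an Accept-Language header from a locale such as 'en-US'.
function acceptLanguageFor(locale: string): string {
  const [language] = locale.split('-');
  // A bare language tag ('pl') has no separate fallback entry.
  return language === locale ? locale : `${locale},${language};q=0.9`;
}

// Check that a timezone ID is a real IANA zone known to the runtime.
function isValidTimeZone(timezoneId: string): boolean {
  try {
    new Intl.DateTimeFormat('en-US', { timeZone: timezoneId });
    return true;
  } catch {
    return false; // Intl throws a RangeError for unknown zones
  }
}

const locale = 'en-US';
const header = acceptLanguageFor(locale);
const zoneOk = isValidTimeZone('Europe/Warsaw');
```

The results can be passed to `browser.newContext()` as `locale`, `extraHTTPHeaders['Accept-Language']`, and `timezoneId`. Whether the timezone is plausible for the exit IP still has to be judged separately.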

3. Plugins & MimeTypes

Headless reports empty arrays. Fake them:

await context.addInitScript(() => {
  Object.defineProperty(navigator, 'plugins', {
    get: () => [1, 2, 3, 4, 5],
  });
  Object.defineProperty(navigator, 'mimeTypes', {
    get: () => [1, 2],
  });
});

A more sophisticated version would create proper PluginArray and MimeTypeArray objects with item(), namedItem(), and refresh() methods, but the simple version passes most checks.
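A rough sketch of what that more sophisticated version could look like follows. `makeFakePluginArray` is a hypothetical helper, and the plugin names, filename, and description are assumptions based on the fixed PDF viewer list that current browsers expose. The result is a plain object, so checks such as `instanceof PluginArray` would still fail.

```typescript
// Hypothetical helper: builds an array-like object that mimics the
// PluginArray surface (length, indexed access, item, namedItem, refresh).
interface FakePlugin {
  name: string;
  filename: string;
  description: string;
  length: number; // number of MIME types the plugin claims to handle
}

interface FakePluginArray {
  readonly length: number;
  item(index: number): FakePlugin | null;
  namedItem(name: string): FakePlugin | null;
  refresh(): void;
  [index: number]: FakePlugin;
}

function makeFakePluginArray(names: string[]): FakePluginArray {
  const plugins: FakePlugin[] = names.map((name) => ({
    name,
    filename: 'internal-pdf-viewer', // assumed value
    description: 'Portable Document Format', // assumed value
    length: 2,
  }));

  const arrayLike = {
    length: plugins.length,
    item: (index: number) => plugins[index] ?? null,
    namedItem: (name: string) => plugins.find((p) => p.name === name) ?? null,
    refresh: () => {}, // no-op, as in modern browsers
  } as FakePluginArray;

  // Expose each plugin under its numeric index, like a real PluginArray.
  plugins.forEach((plugin, index) => {
    arrayLike[index] = plugin;
  });
  return arrayLike;
}

// Assumed plugin names; verify against a real Safari before relying on them.
const fakePlugins = makeFakePluginArray([
  'PDF Viewer',
  'Chrome PDF Viewer',
  'Chromium PDF Viewer',
  'Microsoft Edge PDF Viewer',
  'WebKit built-in PDF',
]);
```

Playwright serializes the function passed to addInitScript(), so a helper like this has to be defined inside that function rather than captured from the surrounding module.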

4. Permissions API

Fix the inconsistency between Notification.permission and navigator.permissions.query:

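await context.addInitScript(() => {
  const permissions = window.navigator.permissions;
  // Bind the native method; calling it unbound throws a TypeError.
  const originalQuery = permissions.query.bind(permissions);
  permissions.query = (parameters: any) => {
    if (parameters.name !== 'notifications') return originalQuery(parameters);
    // Notification.permission uses 'default' where PermissionStatus.state uses 'prompt'.
    const state = Notification.permission === 'default' ? 'prompt' : Notification.permission;
    return Promise.resolve({ state } as PermissionStatus);
  };
});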

5. WebGL Renderer

Mask the GPU vendor/renderer strings. Parameters 37445 and 37446 are UNMASKED_VENDOR_WEBGL and UNMASKED_RENDERER_WEBGL:

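await context.addInitScript(() => {
  // Patch WebGL1 and WebGL2 alike so both context types report the same strings.
  for (const proto of [WebGLRenderingContext.prototype, WebGL2RenderingContext.prototype]) {
    const getParameter = proto.getParameter;
    proto.getParameter = function (parameter: number) {
      if (parameter === 37445) return 'Apple GPU'; // UNMASKED_VENDOR_WEBGL
      if (parameter === 37446) return 'Apple M1 Pro'; // UNMASKED_RENDERER_WEBGL
      return getParameter.call(this, parameter);
    };
  }
});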

Choose values that match the User-Agent, for example an Apple GPU on Apple Silicon for Safari on macOS.

6. iframe ContentWindow Isolation

Some fingerprinters check navigator.webdriver inside iframes to catch incomplete patches:

await context.addInitScript(() => {
  const desc = Object.getOwnPropertyDescriptor(HTMLIFrameElement.prototype, 'contentWindow');
  Object.defineProperty(HTMLIFrameElement.prototype, 'contentWindow', {
    get: function () {
      const win = desc?.get?.call(this);
      if (win) {
        try {
          Object.defineProperty(win.navigator, 'webdriver', {
            get: () => undefined,
          });
        } catch (_) {}
      }
      return win;
    },
  });
});

7. Session Persistence

Fresh browser contexts with no cookies or history are a strong bot signal. Use browse's existing session_save / session_restore tools to persist cookies, localStorage, and sessionStorage across runs.

What Cannot Be Fixed from JavaScript

TLS Fingerprinting (JA3/JA4)

Anti-bot systems fingerprint the TLS Client Hello handshake — cipher suites, extensions, and their ordering. WebKit's TLS stack is compiled C++; no amount of JavaScript can change it. Playwright WebKit's JA3 hash doesn't match any shipping Safari release.

Workarounds:

Residential proxies with TLS relay (proxy terminates TLS with its own stack)
curl-impersonate for non-browser HTTP requests
Switch to Chromium where TLS fingerprint matches real Chrome more closely

IP Reputation

Datacenter IPs (Hetzner, AWS, GCP, etc.) are pre-flagged in commercial anti-bot databases.

Workarounds:

Residential proxy rotation (BrightData, Oxylabs, etc.)
Mobile proxies
Running from a real residential IP (home connection)

Behavioral Analysis

DataDome, Cloudflare, and PerimeterX use ML models trained on billions of real sessions. They analyze:

Mouse movement patterns (speed, acceleration, curves)
Scroll behavior (chunked vs smooth, pause patterns)
Typing cadence
Navigation timing
Click patterns (direct element clicks vs natural approach)

Workarounds:

Add realistic delays between actions (page.waitForTimeout(random))
Simulate mouse movements before clicks
Scroll in chunks with pauses
Type character by character with variable delays
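The workarounds above can be built from a few small, engine-independent helpers. This is a minimal sketch, assuming simple uniform randomness and smoothstep easing; `randomDelay`, `mousePath`, and `scrollChunks` are hypothetical names, and nothing here is tuned against a real detector.

```typescript
// Hypothetical helpers for pacing actions; none of this is Playwright API.
type Rng = () => number; // returns a value in [0, 1), like Math.random

// Uniform random delay in [minMs, maxMs).
function randomDelay(minMs: number, maxMs: number, rng: Rng = Math.random): number {
  return Math.floor(minMs + rng() * (maxMs - minMs));
}

interface Point {
  x: number;
  y: number;
}

// Intermediate points along an eased path from `from` to `to`.
// Every point except the last gets a few pixels of jitter.
function mousePath(from: Point, to: Point, steps: number, rng: Rng = Math.random): Point[] {
  const points: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    const eased = t * t * (3 - 2 * t); // smoothstep: slow start, slow finish
    const jitter = () => (i === steps ? 0 : (rng() - 0.5) * 4);
    points.push({
      x: from.x + (to.x - from.x) * eased + jitter(),
      y: from.y + (to.y - from.y) * eased + jitter(),
    });
  }
  return points;
}

// Split a scroll distance into chunks of at most chunkPx pixels.
function scrollChunks(totalPx: number, chunkPx: number): number[] {
  if (chunkPx <= 0) throw new Error('chunkPx must be positive');
  const chunks: number[] = [];
  for (let remaining = totalPx; remaining > 0; remaining -= chunkPx) {
    chunks.push(Math.min(chunkPx, remaining));
  }
  return chunks;
}
```

With Playwright, each point can be passed to `page.mouse.move(x, y)`, each chunk to `page.mouse.wheel(0, chunk)`, and `randomDelay()` to `page.waitForTimeout()` or to the `delay` option of `page.keyboard.type()`.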

CAPTCHA / JavaScript Challenges

Cloudflare Turnstile, hCaptcha, and reCAPTCHA require real interaction or solving services.

Workarounds:

CAPTCHA solving APIs: CapSolver, 2Captcha (~$2-5 per 1,000 solves)
Wait for challenge resolution: 3-8 seconds after navigation
Detect challenge pages by checking for known markers ("Just a moment", cf-challenge, _cf_chl_opt)
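Marker-based detection and the wait-for-resolution step might be combined as sketched below. `looksLikeChallengePage` and `waitForChallengeToClear` are hypothetical names; the marker list is only the three strings mentioned above, and the 8-second default mirrors the upper end of the suggested wait.

```typescript
// Markers taken from the list above; real challenge pages may use others.
const CHALLENGE_MARKERS = ['Just a moment', 'cf-challenge', '_cf_chl_opt'];

// True if the HTML contains any known challenge marker.
function looksLikeChallengePage(html: string): boolean {
  return CHALLENGE_MARKERS.some((marker) => html.includes(marker));
}

// Poll until the page no longer looks like a challenge, or the timeout passes.
// `getHtml` is injected so the helper does not depend on a browser library.
async function waitForChallengeToClear(
  getHtml: () => Promise<string>,
  timeoutMs = 8000,
  intervalMs = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!looksLikeChallengePage(await getHtml())) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // still on a challenge page after the timeout
}
```

With Playwright, the call could look like `await waitForChallengeToClear(() => page.content())`. A `false` result would be the point to fall back to a solving service or give up.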

Implementation Strategy

Recommended: Stealth Flag

Add an opt-in stealth option to launch():

async launch(options?: { stealth?: boolean }): Promise<void> {
  this.browser = await webkit.launch({ headless: this.options.headless });
  this.context = await this.browser.newContext({
    viewport: { width: this.options.width, height: this.options.height },
    ...(options?.stealth && {
      userAgent: SAFARI_USER_AGENT,
      locale: 'en-US',
      timezoneId: Intl.DateTimeFormat().resolvedOptions().timeZone,
      colorScheme: 'light',
      extraHTTPHeaders: { 'Accept-Language': 'en-US,en;q=0.9' },
    }),
  });

  if (options?.stealth) {
    await this.applyStealthPatches();
  }

  this.page = await this.context.newPage();
}

This keeps the default clean for testing while allowing stealth for real-world browsing.
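The `applyStealthPatches()` call above is not spelled out in these notes. One possible shape, shown here as a standalone function rather than a method, is a list of init scripts registered in order. Everything in this sketch is an assumption: `InitScriptTarget` is a hypothetical minimal interface that a Playwright `BrowserContext` is expected to satisfy, and only two of the patches are included.

```typescript
// Hypothetical structure for applyStealthPatches(); not the actual implementation.
type InitScript = () => void;

// Minimal surface needed from a browser context (assumed to match
// the relevant part of Playwright's BrowserContext).
interface InitScriptTarget {
  addInitScript(script: InitScript): Promise<void>;
}

// Each function runs inside the page, so it must not capture outer variables.
const hideWebdriver: InitScript = () => {
  const nav = (globalThis as any).navigator;
  Object.defineProperty(nav, 'webdriver', { get: () => undefined });
};

const fakePluginCounts: InitScript = () => {
  const nav = (globalThis as any).navigator;
  Object.defineProperty(nav, 'plugins', { get: () => [1, 2, 3, 4, 5] });
  Object.defineProperty(nav, 'mimeTypes', { get: () => [1, 2] });
};

// Remaining patches (permissions, WebGL, iframe) would be appended here.
const STEALTH_PATCHES: InitScript[] = [hideWebdriver, fakePluginCounts];

// Register every patch in order and report how many were applied.
async function applyStealthPatches(target: InitScriptTarget): Promise<number> {
  for (const patch of STEALTH_PATCHES) {
    await target.addInitScript(patch);
  }
  return STEALTH_PATCHES.length;
}
```

Keeping the patches in a plain array makes it easy to test registration with a fake target and to switch individual patches off when one of them causes breakage on a particular site.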

Nuclear Option: Chromium Engine

If stealth becomes a core requirement, add a browser engine option:

launch({ engine: 'chromium', stealth: true })

Chromium has the richest stealth ecosystem:

playwright-extra + stealth plugin (17 evasion modules)
playwright-with-fingerprints (full fingerprint replacement)
Better TLS fingerprint match to real Chrome
Most anti-bot systems are tuned for Chrome, so evasions are better tested

Trade-off: Chromium is ~200MB heavier than WebKit.

Anti-Bot Provider Cheat Sheet

| Provider | Primary Detection | Difficulty |
| --- | --- | --- |
| Cloudflare (standard) | TLS + JS challenge | Medium |
| Cloudflare (Turnstile) | Interactive challenge | Hard |
| DataDome | Behavioral analysis | Hard |
| PerimeterX / HUMAN | Deep fingerprinting (`_px` scripts) | Hard |
| Akamai Bot Manager | TLS + sensor data | Hard |
| Kasada | Obfuscated JS challenge | Very Hard |
| Basic WAFs | User-Agent + rate limiting | Easy |
