1d3192cffd
- Firefox cookie importer: reads cookies.sqlite with WAL-safe copy, profile detection via profiles.ini, cross-platform paths, domain filtering - Stealth mode: opt-in via launch(stealth: true), patches navigator.webdriver, plugins/mimeTypes, permissions API, WebGL renderer, iframe isolation, languages, plus realistic Safari UA and context hardening - Import tool now accepts 'safari' | 'firefox' source - STEALTH.md reference documentation - Upgraded @types/node to v25 for node:sqlite support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
258 lines
9.3 KiB
Markdown
258 lines
9.3 KiB
Markdown
# Anti-Bot Stealth Reference
|
|
|
|
Research notes on making Playwright WebKit less detectable by anti-bot systems. Compiled April 2026.
|
|
|
|
## Current State
|
|
|
|
Browse uses **Playwright WebKit** with a bare context — no stealth patches. This is trivially detected by every major anti-bot system (Cloudflare, DataDome, PerimeterX/HUMAN, Akamai).
|
|
|
|
### Detection Vectors
|
|
|
|
| Vector | Severity | Fixable from JS? |
|
|
|--------|----------|------------------|
|
|
| `navigator.webdriver` set to `true` | Critical | Yes |
|
|
| Empty `navigator.plugins` / `mimeTypes` | High | Yes |
|
|
| Default viewport (800x600-ish) | High | Yes |
|
|
| Missing/generic User-Agent | High | Yes |
|
|
| WebGL renderer = SwiftShader / generic | Medium | Yes |
|
|
| Permissions API inconsistencies | Medium | Yes |
|
|
| iframe cross-frame fingerprinting | Medium | Yes |
|
|
| TLS fingerprint (JA3/JA4) | Critical | **No** |
|
|
| IP reputation (datacenter IPs) | Critical | **No** |
|
|
| ML behavioral analysis | High | **No** |
|
|
| Cloudflare Turnstile / JS challenges | High | **No** |
|
|
|
|
## Stealth Ecosystem & WebKit
|
|
|
|
The two main stealth libraries **only support Chromium**:
|
|
|
|
- **`playwright-stealth`** (Python) — patches ~12 Chrome-specific APIs
|
|
- **`playwright-extra`** + stealth plugin (Node.js) — ~17 evasion modules targeting Chrome internals
|
|
|
|
WebKit and Firefox have entirely different internals. No stealth plugin exists for either. All patches for WebKit must be applied manually via `addInitScript()`.
|
|
|
|
## Recommended Patches
|
|
|
|
All patches use `context.addInitScript()` which runs before any page script in **any** Playwright engine (WebKit included).
|
|
|
|
### 1. WebDriver Flag
|
|
|
|
The single most important patch. Set to `undefined`, not `false` — some detectors specifically check for `false` as a signal of patching.
|
|
|
|
```typescript
|
|
await context.addInitScript(() => {
|
|
Object.defineProperty(navigator, 'webdriver', {
|
|
get: () => undefined,
|
|
});
|
|
});
|
|
```
|
|
|
|
### 2. Context Hardening
|
|
|
|
Configure the browser context to look like a real Safari session:
|
|
|
|
```typescript
|
|
const context = await browser.newContext({
|
|
viewport: { width: 1920, height: 1080 },
|
|
userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15',
|
|
locale: 'en-US',
|
|
timezoneId: 'Europe/Warsaw',
|
|
colorScheme: 'light',
|
|
extraHTTPHeaders: {
|
|
'Accept-Language': 'en-US,en;q=0.9',
|
|
},
|
|
});
|
|
```
|
|
|
|
Key points:
|
|
- Viewport should be realistic (1920x1080, 1440x900, 1536x864)
|
|
- User-Agent must match the engine — use a Safari UA for WebKit
|
|
- Locale, timezone, and Accept-Language should be consistent with each other
|
|
|
|
### 3. Plugins & MimeTypes
|
|
|
|
Headless reports empty arrays. Fake them:
|
|
|
|
```typescript
|
|
await context.addInitScript(() => {
|
|
Object.defineProperty(navigator, 'plugins', {
|
|
get: () => [1, 2, 3, 4, 5],
|
|
});
|
|
Object.defineProperty(navigator, 'mimeTypes', {
|
|
get: () => [1, 2],
|
|
});
|
|
});
|
|
```
|
|
|
|
A more sophisticated version would create proper `PluginArray` and `MimeTypeArray` objects with `item()`, `namedItem()`, and `refresh()` methods, but the simple version passes most checks.
|
|
|
|
### 4. Permissions API
|
|
|
|
Fix the inconsistency between `Notification.permission` and `navigator.permissions.query`:
|
|
|
|
```typescript
|
|
await context.addInitScript(() => {
|
|
const originalQuery = window.navigator.permissions.query;
|
|
window.navigator.permissions.query = (parameters: any) =>
|
|
parameters.name === 'notifications'
|
|
? Promise.resolve({ state: Notification.permission } as PermissionStatus)
|
|
: originalQuery(parameters);
|
|
});
|
|
```
|
|
|
|
### 5. WebGL Renderer
|
|
|
|
Mask the GPU vendor/renderer strings. Parameters 37445 and 37446 are `UNMASKED_VENDOR_WEBGL` and `UNMASKED_RENDERER_WEBGL`:
|
|
|
|
```typescript
|
|
await context.addInitScript(() => {
|
|
const getParameter = WebGLRenderingContext.prototype.getParameter;
|
|
WebGLRenderingContext.prototype.getParameter = function (parameter) {
|
|
if (parameter === 37445) return 'Apple GPU';
|
|
if (parameter === 37446) return 'Apple M1 Pro';
|
|
return getParameter.call(this, parameter);
|
|
};
|
|
});
|
|
```
|
|
|
|
Choose values that match the User-Agent. Apple GPU + Apple Silicon for Safari on macOS.
|
|
|
|
### 6. iframe ContentWindow Isolation
|
|
|
|
Some fingerprinters check `navigator.webdriver` inside iframes to catch incomplete patches:
|
|
|
|
```typescript
|
|
await context.addInitScript(() => {
|
|
const desc = Object.getOwnPropertyDescriptor(HTMLIFrameElement.prototype, 'contentWindow');
|
|
Object.defineProperty(HTMLIFrameElement.prototype, 'contentWindow', {
|
|
get: function () {
|
|
const win = desc?.get?.call(this);
|
|
if (win) {
|
|
try {
|
|
Object.defineProperty(win.navigator, 'webdriver', {
|
|
get: () => undefined,
|
|
});
|
|
} catch (_) {}
|
|
}
|
|
return win;
|
|
},
|
|
});
|
|
});
|
|
```
|
|
|
|
### 7. Session Persistence
|
|
|
|
Fresh browser contexts with no cookies or history are a strong bot signal. Use browse's existing `session_save` / `session_restore` tools to persist cookies, localStorage, and sessionStorage across runs.
|
|
|
|
## What Cannot Be Fixed from JavaScript
|
|
|
|
### TLS Fingerprinting (JA3/JA4)
|
|
|
|
Anti-bot systems fingerprint the TLS Client Hello handshake — cipher suites, extensions, and their ordering. WebKit's TLS stack is compiled C++; no amount of JavaScript can change it. Playwright WebKit's JA3 hash doesn't match any shipping Safari release.
|
|
|
|
**Workarounds:**
|
|
- Residential proxies with TLS relay (proxy terminates TLS with its own stack)
|
|
- `curl-impersonate` for non-browser HTTP requests
|
|
- Switch to Chromium where TLS fingerprint matches real Chrome more closely
|
|
|
|
### IP Reputation
|
|
|
|
Datacenter IPs (Hetzner, AWS, GCP, etc.) are pre-flagged in commercial anti-bot databases.
|
|
|
|
**Workarounds:**
|
|
- Residential proxy rotation (BrightData, Oxylabs, etc.)
|
|
- Mobile proxies
|
|
- Running from a real residential IP (home connection)
|
|
|
|
### Behavioral Analysis
|
|
|
|
DataDome, Cloudflare, and PerimeterX use ML models trained on billions of real sessions. They analyze:
|
|
- Mouse movement patterns (speed, acceleration, curves)
|
|
- Scroll behavior (chunked vs smooth, pause patterns)
|
|
- Typing cadence
|
|
- Navigation timing
|
|
- Click patterns (direct element clicks vs natural approach)
|
|
|
|
**Workarounds:**
|
|
- Add realistic delays between actions (`page.waitForTimeout(random)`)
|
|
- Simulate mouse movements before clicks
|
|
- Scroll in chunks with pauses
|
|
- Type character by character with variable delays
|
|
|
|
### CAPTCHA / JavaScript Challenges
|
|
|
|
Cloudflare Turnstile, hCaptcha, and reCAPTCHA require real interaction or solving services.
|
|
|
|
**Workarounds:**
|
|
- CAPTCHA solving APIs: CapSolver, 2Captcha (~$2-5 per 1,000 solves)
|
|
- Wait for challenge resolution: 3-8 seconds after navigation
|
|
- Detect challenge pages by checking for known markers (`"Just a moment"`, `cf-challenge`, `_cf_chl_opt`)
|
|
|
|
## Implementation Strategy
|
|
|
|
### Recommended: Stealth Flag
|
|
|
|
Add an opt-in `stealth` option to `launch()`:
|
|
|
|
```typescript
|
|
async launch(options?: { stealth?: boolean }): Promise<void> {
|
|
this.browser = await webkit.launch({ headless: this.options.headless });
|
|
this.context = await this.browser.newContext({
|
|
viewport: { width: this.options.width, height: this.options.height },
|
|
...(options?.stealth && {
|
|
userAgent: SAFARI_USER_AGENT,
|
|
locale: 'en-US',
|
|
timezoneId: Intl.DateTimeFormat().resolvedOptions().timeZone,
|
|
colorScheme: 'light',
|
|
extraHTTPHeaders: { 'Accept-Language': 'en-US,en;q=0.9' },
|
|
}),
|
|
});
|
|
|
|
if (options?.stealth) {
|
|
await this.applyStealthPatches();
|
|
}
|
|
|
|
this.page = await this.context.newPage();
|
|
}
|
|
```
|
|
|
|
This keeps the default clean for testing while allowing stealth for real-world browsing.
|
|
|
|
### Nuclear Option: Chromium Engine
|
|
|
|
If stealth becomes a core requirement, add a `browser` engine option:
|
|
|
|
```typescript
|
|
launch({ engine: 'chromium', stealth: true })
|
|
```
|
|
|
|
Chromium has the richest stealth ecosystem:
|
|
- `playwright-extra` + stealth plugin (17 evasion modules)
|
|
- `playwright-with-fingerprints` (full fingerprint replacement)
|
|
- Better TLS fingerprint match to real Chrome
|
|
- Most anti-bot systems are tuned for Chrome, so evasions are better tested
|
|
|
|
Trade-off: Chromium is ~200MB heavier than WebKit.
|
|
|
|
## Anti-Bot Provider Cheat Sheet
|
|
|
|
| Provider | Primary Detection | Difficulty |
|
|
|----------|-------------------|------------|
|
|
| Cloudflare (standard) | TLS + JS challenge | Medium |
|
|
| Cloudflare (Turnstile) | Interactive challenge | Hard |
|
|
| DataDome | Behavioral analysis | Hard |
|
|
| PerimeterX / HUMAN | Deep fingerprinting (`_px` scripts) | Hard |
|
|
| Akamai Bot Manager | TLS + sensor data | Hard |
|
|
| Kasada | Obfuscated JS challenge | Very Hard |
|
|
| Basic WAFs | User-Agent + rate limiting | Easy |
|
|
|
|
## References
|
|
|
|
- [Playwright Anti-Bot Detection: What Works (2026) | AlterLab](https://alterlab.io/blog/playwright-anti-bot-detection-what-actually-works-in-2026)
|
|
- [Playwright Stealth: Bypass Bot Detection | Scrapfly](https://scrapfly.io/blog/posts/playwright-stealth-bypass-bot-detection)
|
|
- [Playwright Stealth Mode: The 7 Patches That Matter | DEV Community](https://dev.to/vhub_systems_ed5641f65d59/playwright-stealth-mode-in-2026-the-7-patches-that-actually-matter-46bp)
|
|
- [How to Avoid Bot Detection with Playwright | BrowserStack](https://www.browserstack.com/guide/playwright-bot-detection)
|
|
- [How To Make Playwright Undetectable | ScrapeOps](https://scrapeops.io/playwright-web-scraping-playbook/nodejs-playwright-make-playwright-undetectable/)
|
|
- [Detecting Vanilla Playwright | ScrapingAnt](https://scrapingant.com/blog/detect-playwright-bot)
|
|
- [Playwright Fingerprinting: Explained & Bypass | ZenRows](https://www.zenrows.com/blog/playwright-fingerprint)
|