Files
browse/STEALTH.md
T
aladac 1d3192cffd Add Firefox cookie import and stealth mode
- Firefox cookie importer: reads cookies.sqlite with WAL-safe copy,
  profile detection via profiles.ini, cross-platform paths, domain filtering
- Stealth mode: opt-in via launch(stealth: true), patches navigator.webdriver,
  plugins/mimeTypes, permissions API, WebGL renderer, iframe isolation,
  languages, plus realistic Safari UA and context hardening
- Import tool now accepts 'safari' | 'firefox' source
- STEALTH.md reference documentation
- Upgraded @types/node to v25 for node:sqlite support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:03:15 +02:00

258 lines
9.3 KiB
Markdown

# Anti-Bot Stealth Reference
Research notes on making Playwright WebKit less detectable by anti-bot systems. Compiled April 2026.
## Current State
Browse uses **Playwright WebKit** with a bare context — no stealth patches. This is trivially detected by every major anti-bot system (Cloudflare, DataDome, PerimeterX/HUMAN, Akamai).
### Detection Vectors
| Vector | Severity | Fixable from JS? |
|--------|----------|------------------|
| `navigator.webdriver` set to `true` | Critical | Yes |
| Empty `navigator.plugins` / `mimeTypes` | High | Yes |
| Default viewport (800x600-ish) | High | Yes |
| Missing/generic User-Agent | High | Yes |
| WebGL renderer = SwiftShader / generic | Medium | Yes |
| Permissions API inconsistencies | Medium | Yes |
| iframe cross-frame fingerprinting | Medium | Yes |
| TLS fingerprint (JA3/JA4) | Critical | **No** |
| IP reputation (datacenter IPs) | Critical | **No** |
| ML behavioral analysis | High | **No** |
| Cloudflare Turnstile / JS challenges | High | **No** |
## Stealth Ecosystem & WebKit
The two main stealth libraries **only support Chromium**:
- **`playwright-stealth`** (Python) — patches ~12 Chrome-specific APIs
- **`playwright-extra`** + stealth plugin (Node.js) — ~17 evasion modules targeting Chrome internals
WebKit and Firefox have entirely different internals. No stealth plugin exists for either. All patches for WebKit must be applied manually via `addInitScript()`.
## Recommended Patches
All patches use `context.addInitScript()` which runs before any page script in **any** Playwright engine (WebKit included).
### 1. WebDriver Flag
The single most important patch. Set to `undefined`, not `false` — some detectors specifically check for `false` as a signal of patching.
```typescript
await context.addInitScript(() => {
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined,
});
});
```
### 2. Context Hardening
Configure the browser context to look like a real Safari session:
```typescript
const context = await browser.newContext({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15',
locale: 'en-US',
timezoneId: 'Europe/Warsaw',
colorScheme: 'light',
extraHTTPHeaders: {
'Accept-Language': 'en-US,en;q=0.9',
},
});
```
Key points:
- Viewport should be realistic (1920x1080, 1440x900, 1536x864)
- User-Agent must match the engine — use a Safari UA for WebKit
- Locale, timezone, and Accept-Language should be consistent with each other
### 3. Plugins & MimeTypes
Headless reports empty arrays. Fake them:
```typescript
await context.addInitScript(() => {
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
Object.defineProperty(navigator, 'mimeTypes', {
get: () => [1, 2],
});
});
```
A more sophisticated version would create proper `PluginArray` and `MimeTypeArray` objects with `item()`, `namedItem()`, and `refresh()` methods, but the simple version passes most checks.
### 4. Permissions API
Fix the inconsistency between `Notification.permission` and `navigator.permissions.query`:
```typescript
await context.addInitScript(() => {
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters: any) =>
parameters.name === 'notifications'
? Promise.resolve({ state: Notification.permission } as PermissionStatus)
: originalQuery(parameters);
});
```
### 5. WebGL Renderer
Mask the GPU vendor/renderer strings. Parameters 37445 and 37446 are `UNMASKED_VENDOR_WEBGL` and `UNMASKED_RENDERER_WEBGL`:
```typescript
await context.addInitScript(() => {
const getParameter = WebGLRenderingContext.prototype.getParameter;
WebGLRenderingContext.prototype.getParameter = function (parameter) {
if (parameter === 37445) return 'Apple GPU';
if (parameter === 37446) return 'Apple M1 Pro';
return getParameter.call(this, parameter);
};
});
```
Choose values that match the User-Agent. Apple GPU + Apple Silicon for Safari on macOS.
### 6. iframe ContentWindow Isolation
Some fingerprinters check `navigator.webdriver` inside iframes to catch incomplete patches:
```typescript
await context.addInitScript(() => {
const desc = Object.getOwnPropertyDescriptor(HTMLIFrameElement.prototype, 'contentWindow');
Object.defineProperty(HTMLIFrameElement.prototype, 'contentWindow', {
get: function () {
const win = desc?.get?.call(this);
if (win) {
try {
Object.defineProperty(win.navigator, 'webdriver', {
get: () => undefined,
});
} catch (_) {}
}
return win;
},
});
});
```
### 7. Session Persistence
Fresh browser contexts with no cookies or history are a strong bot signal. Use browse's existing `session_save` / `session_restore` tools to persist cookies, localStorage, and sessionStorage across runs.
## What Cannot Be Fixed from JavaScript
### TLS Fingerprinting (JA3/JA4)
Anti-bot systems fingerprint the TLS Client Hello handshake — cipher suites, extensions, and their ordering. WebKit's TLS stack is compiled C++; no amount of JavaScript can change it. Playwright WebKit's JA3 hash doesn't match any shipping Safari release.
**Workarounds:**
- Residential proxies with TLS relay (proxy terminates TLS with its own stack)
- `curl-impersonate` for non-browser HTTP requests
- Switch to Chromium where TLS fingerprint matches real Chrome more closely
### IP Reputation
Datacenter IPs (Hetzner, AWS, GCP, etc.) are pre-flagged in commercial anti-bot databases.
**Workarounds:**
- Residential proxy rotation (BrightData, Oxylabs, etc.)
- Mobile proxies
- Running from a real residential IP (home connection)
### Behavioral Analysis
DataDome, Cloudflare, and PerimeterX use ML models trained on billions of real sessions. They analyze:
- Mouse movement patterns (speed, acceleration, curves)
- Scroll behavior (chunked vs smooth, pause patterns)
- Typing cadence
- Navigation timing
- Click patterns (direct element clicks vs natural approach)
**Workarounds:**
- Add realistic delays between actions (`page.waitForTimeout(random)`)
- Simulate mouse movements before clicks
- Scroll in chunks with pauses
- Type character by character with variable delays
### CAPTCHA / JavaScript Challenges
Cloudflare Turnstile, hCaptcha, and reCAPTCHA require real interaction or solving services.
**Workarounds:**
- CAPTCHA solving APIs: CapSolver, 2Captcha (~$2-5 per 1,000 solves)
- Wait for challenge resolution: 3-8 seconds after navigation
- Detect challenge pages by checking for known markers (`"Just a moment"`, `cf-challenge`, `_cf_chl_opt`)
## Implementation Strategy
### Recommended: Stealth Flag
Add an opt-in `stealth` option to `launch()`:
```typescript
async launch(options?: { stealth?: boolean }): Promise<void> {
this.browser = await webkit.launch({ headless: this.options.headless });
this.context = await this.browser.newContext({
viewport: { width: this.options.width, height: this.options.height },
...(options?.stealth && {
userAgent: SAFARI_USER_AGENT,
locale: 'en-US',
timezoneId: Intl.DateTimeFormat().resolvedOptions().timeZone,
colorScheme: 'light',
extraHTTPHeaders: { 'Accept-Language': 'en-US,en;q=0.9' },
}),
});
if (options?.stealth) {
await this.applyStealthPatches();
}
this.page = await this.context.newPage();
}
```
This keeps the default clean for testing while allowing stealth for real-world browsing.
### Nuclear Option: Chromium Engine
If stealth becomes a core requirement, add a `browser` engine option:
```typescript
launch({ engine: 'chromium', stealth: true })
```
Chromium has the richest stealth ecosystem:
- `playwright-extra` + stealth plugin (17 evasion modules)
- `playwright-with-fingerprints` (full fingerprint replacement)
- Better TLS fingerprint match to real Chrome
- Most anti-bot systems are tuned for Chrome, so evasions are better tested
Trade-off: Chromium is ~200MB heavier than WebKit.
## Anti-Bot Provider Cheat Sheet
| Provider | Primary Detection | Difficulty |
|----------|-------------------|------------|
| Cloudflare (standard) | TLS + JS challenge | Medium |
| Cloudflare (Turnstile) | Interactive challenge | Hard |
| DataDome | Behavioral analysis | Hard |
| PerimeterX / HUMAN | Deep fingerprinting (`_px` scripts) | Hard |
| Akamai Bot Manager | TLS + sensor data | Hard |
| Kasada | Obfuscated JS challenge | Very Hard |
| Basic WAFs | User-Agent + rate limiting | Easy |
## References
- [Playwright Anti-Bot Detection: What Works (2026) | AlterLab](https://alterlab.io/blog/playwright-anti-bot-detection-what-actually-works-in-2026)
- [Playwright Stealth: Bypass Bot Detection | Scrapfly](https://scrapfly.io/blog/posts/playwright-stealth-bypass-bot-detection)
- [Playwright Stealth Mode: The 7 Patches That Matter | DEV Community](https://dev.to/vhub_systems_ed5641f65d59/playwright-stealth-mode-in-2026-the-7-patches-that-actually-matter-46bp)
- [How to Avoid Bot Detection with Playwright | BrowserStack](https://www.browserstack.com/guide/playwright-bot-detection)
- [How To Make Playwright Undetectable | ScrapeOps](https://scrapeops.io/playwright-web-scraping-playbook/nodejs-playwright-make-playwright-undetectable/)
- [Detecting Vanilla Playwright | ScrapingAnt](https://scrapingant.com/blog/detect-playwright-bot)
- [Playwright Fingerprinting: Explained & Bypass | ZenRows](https://www.zenrows.com/blog/playwright-fingerprint)