POST /api/v1/platform/scrapers/web/scrape/jobs
Queue a deep web scrape job
curl --request POST \
  --url https://developer.thehog.ai/api/v1/platform/scrapers/web/scrape/jobs \
  --header 'Content-Type: application/json' \
  --header 'X-Access-Key: <api-key>' \
  --header 'X-Secret-Key: <api-secret>' \
  --data '
{
  "url": "https://example.com/page",
  "formats": [
    "text",
    "metadata"
  ],
  "jsonSchema": {},
  "instructions": "<string>",
  "maxAgeMs": 0,
  "maxDurationMs": 120000,
  "maxScrolls": 40,
  "scrollWaitMs": 500,
  "contentStableRounds": 3,
  "expandClickableContent": true,
  "maxExpansionClicks": 25
}
'

Example response:

{
  "data": {
    "id": "<string>",
    "operationId": "<string>",
    "status": "queued",
    "pollUrl": "<string>"
  },
  "meta": {
    "requestId": "<string>"
  }
}

POST /api/v1/platform/scrapers/web/scrape/jobs

Queue a deep web scrape for long or dynamic pages and poll for the result.
Use deep scrape when a page needs more rendering time than the synchronous scrape endpoint should spend: long comment threads, lazy-loaded content, or pages with repeated “load more” controls. The endpoint returns an operation ID immediately; poll GET /api/operations/:id with that ID until the operation reaches a terminal status.

The result is returned in the formats you requested, which are the same formats single-page scrape supports. Metadata may include capture details such as how many scrolls were completed and why capture stopped. A deep scrape is still bounded by the limits you send (for example maxDurationMs, maxScrolls, and maxExpansionClicks), so check the capture metadata to decide whether to run the job again with higher limits.
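The queue-then-poll flow described above can be sketched as a small loop. This is a minimal sketch, not the official client: `fetch_operation` is a hypothetical callable that performs GET /api/operations/:id and returns the decoded operation, the sketch assumes that object exposes a top-level `status` field, and the terminal status names are assumptions because this page only shows `queued`.

```python
import time
from typing import Callable, Dict, Iterable

# Assumed terminal status names. The page only documents "queued", so adjust
# this set to whatever the operations endpoint actually returns.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def poll_until_terminal(
    fetch_operation: Callable[[], Dict],
    terminal_statuses: Iterable[str] = TERMINAL_STATUSES,
    interval_s: float = 2.0,
    timeout_s: float = 660.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_operation until its status is terminal or the timeout passes.

    fetch_operation is a hypothetical helper supplied by the caller; it should
    issue GET /api/operations/:id and return the decoded JSON object.
    """
    terminal = set(terminal_statuses)
    deadline = clock() + timeout_s
    while True:
        operation = fetch_operation()
        if operation.get("status") in terminal:
            return operation
        if clock() >= deadline:
            raise TimeoutError("operation did not reach a terminal status in time")
        sleep(interval_s)
```

The default timeout is set slightly above the largest allowed maxDurationMs (600000 ms) so that a job running at the upper limit can still finish before the caller gives up. Injecting `sleep` and `clock` keeps the loop easy to exercise without real waiting.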

Example

curl -X POST https://developer.thehog.ai/api/v1/platform/scrapers/web/scrape/jobs \
  -H "X-Access-Key: ak_xxxxxxxxxxxxxxxx" \
  -H "X-Secret-Key: sk_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: deep-scrape-2026-06-24-001" \
  -d '{
    "url": "https://example.com/thread",
    "formats": ["markdown", "metadata"],
    "maxDurationMs": 120000,
    "maxScrolls": 40,
    "contentStableRounds": 3,
    "expandClickableContent": true
  }'

Use formats: ["json"] with jsonSchema when you want schema-guided extraction from the captured page content. The schema defines the returned keys and shape.
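As an illustration of schema-guided extraction, the sketch below builds a request body that pairs formats: ["json"] with a jsonSchema. The helper name `build_json_extraction_body` and the thread schema (its `title`, `comments`, `author`, and `text` fields) are hypothetical; the client-side rejection of an empty schema is a choice made for this example, not documented API behavior.

```python
import json
from typing import Any, Dict, Optional

MAX_INSTRUCTIONS_LENGTH = 8000  # documented maximum length for "instructions"


def build_json_extraction_body(
    url: str,
    json_schema: Dict[str, Any],
    instructions: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a deep scrape body that requests schema-guided JSON output."""
    if not json_schema:
        # Client-side guard chosen for this sketch only.
        raise ValueError('formats ["json"] needs a non-empty jsonSchema')
    body: Dict[str, Any] = {
        "url": url,
        "formats": ["json"],
        "jsonSchema": json_schema,
    }
    if instructions is not None:
        if len(instructions) > MAX_INSTRUCTIONS_LENGTH:
            raise ValueError("instructions exceeds 8000 characters")
        body["instructions"] = instructions
    return body


# Hypothetical schema for a comment thread; the field names are illustrative.
thread_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "comments": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "author": {"type": "string"},
                    "text": {"type": "string"},
                },
                "required": ["author", "text"],
            },
        },
    },
    "required": ["title", "comments"],
}

body = build_json_extraction_body(
    "https://example.com/thread",
    thread_schema,
    instructions="Include nested replies.",
)
print(json.dumps(body, indent=2))
```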

Authorizations

X-Access-Key (string, header, required)

The public API key from the Credentials page.

X-Secret-Key (string, header, required)

The API secret shown when the credential is created.

Headers

Idempotency-Key (string)

Optional. Reusing the same key for the same organization returns the existing queued deep scrape operation.
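One way to make retries safe is to derive the Idempotency-Key from the request body, so a resend of the same request reuses the same key. This is only a sketch of one possible scheme: the helper names `idempotency_key_for` and `build_headers` are hypothetical, and the API does not prescribe how keys are generated (the curl example on this page uses a dated, hand-written key instead).

```python
import hashlib
import json
from typing import Any, Dict


def idempotency_key_for(body: Dict[str, Any], prefix: str = "deep-scrape") -> str:
    """Derive a stable key from the body so identical requests share a key."""
    # Sorting keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]
    return f"{prefix}-{digest}"


def build_headers(access_key: str, secret_key: str, body: Dict[str, Any]) -> Dict[str, str]:
    """Assemble the documented request headers for a deep scrape job."""
    return {
        "X-Access-Key": access_key,
        "X-Secret-Key": secret_key,
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key_for(body),
    }


headers = build_headers(
    "ak_xxxxxxxxxxxxxxxx",
    "sk_xxxxxxxxxxxxxxxx",
    {"url": "https://example.com/thread", "formats": ["markdown", "metadata"]},
)
print(headers["Idempotency-Key"])
```

A body-derived key also means that an intentional rerun of an identical request may return the existing operation instead of starting a new one. Add a date or run counter to the prefix when a fresh scrape is wanted.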

Body

application/json

url (string, required)

URL to scrape with an async deep browser acquisition job.

Example:

"https://example.com/page"

formats (enum<string>[])

Output formats to store on the completed operation. Defaults to ["text", "metadata"]. Request "json" only with jsonSchema.

Available options: text, markdown, html, links, metadata, json

jsonSchema (object)

JSON Schema used for schema-guided extraction when formats includes "json".

instructions (string)

Additional extraction instructions used only when formats includes "json".

Maximum string length: 8000

maxAgeMs (number, default: 0)

Maximum accepted cache age in milliseconds. Defaults to 0 for deep jobs so dynamic pages are fetched fresh.

Required range: 0 <= x <= 2592000000 (up to 30 days)

maxDurationMs (number, default: 120000)

Maximum deep render wall-clock duration in milliseconds.

Required range: 5000 <= x <= 600000 (5 seconds to 10 minutes)

maxScrolls (number, default: 40)

Maximum viewport scroll steps during deep render.

Required range: 0 <= x <= 200

scrollWaitMs (number, default: 500)

Delay after each deep-render scroll step.

Required range: 100 <= x <= 5000

contentStableRounds (number, default: 3)

Stop after this many consecutive scroll rounds without meaningful content growth.

Required range: 1 <= x <= 20

expandClickableContent (boolean, default: true)

Click generic visible "load more" or "show more" controls during deep render.

maxExpansionClicks (number, default: 25)

Maximum generic expansion clicks during deep render.

Required range: 0 <= x <= 200
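The documented ranges can be checked on the client before a job is queued, which avoids a round trip for an obviously invalid body. The sketch below mirrors the limits listed above; `validate_body` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, List

# Inclusive (min, max) ranges copied from the Body section of this page.
LIMITS = {
    "maxAgeMs": (0, 2_592_000_000),
    "maxDurationMs": (5_000, 600_000),
    "maxScrolls": (0, 200),
    "scrollWaitMs": (100, 5_000),
    "contentStableRounds": (1, 20),
    "maxExpansionClicks": (0, 200),
}
ALLOWED_FORMATS = {"text", "markdown", "html", "links", "metadata", "json"}
DEFAULT_FORMATS = ["text", "metadata"]


def validate_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems: List[str] = []
    if not isinstance(body.get("url"), str) or not body["url"]:
        problems.append("url is required and must be a non-empty string")
    formats = body.get("formats", DEFAULT_FORMATS)
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        problems.append(f"unknown formats: {unknown}")
    if "json" in formats and "jsonSchema" not in body:
        problems.append('formats includes "json" but jsonSchema is missing')
    for name, (low, high) in LIMITS.items():
        if name not in body:
            continue  # omitted fields fall back to the server defaults
        value = body[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            problems.append(f"{name} must be a number in [{low}, {high}]")
    return problems


print(validate_body({"url": "https://example.com/thread", "maxDurationMs": 1000}))
```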

Response

Deep scrape accepted. Poll the returned operation URL for results.

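data (object, required)

The queued job: id, operationId, status ("queued" when accepted), and pollUrl.

meta (object, required)

Request metadata: requestId.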
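The response envelope shown in the sample at the top of this page can be decoded into a typed value before polling. This is a minimal sketch: `QueuedScrape` and `parse_queue_response` are hypothetical names, and the sample values in the usage lines are made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedScrape:
    id: str
    operation_id: str
    status: str
    poll_url: str
    request_id: str


def parse_queue_response(raw: str) -> QueuedScrape:
    """Decode the {"data": {...}, "meta": {...}} envelope from the queue call."""
    payload = json.loads(raw)
    data, meta = payload["data"], payload["meta"]
    return QueuedScrape(
        id=data["id"],
        operation_id=data["operationId"],
        status=data["status"],
        poll_url=data["pollUrl"],
        request_id=meta["requestId"],
    )


# Made-up values in the documented shape.
sample = json.dumps({
    "data": {
        "id": "job_123",
        "operationId": "op_456",
        "status": "queued",
        "pollUrl": "/api/operations/op_456",
    },
    "meta": {"requestId": "req_789"},
})
queued = parse_queue_response(sample)
print(queued.operation_id, queued.poll_url)
```

A missing field raises KeyError here, which surfaces an unexpected response shape early instead of failing later during polling.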