Crawl website
curl --request POST \
  --url https://developer.thehog.ai/api/v1/platform/scrapers/web/crawl \
  --header 'Content-Type: application/json' \
  --header 'X-Access-Key: <api-key>' \
  --header 'X-Secret-Key: <api-secret>' \
  --data '
{
  "url": "https://example.com",
  "limit": 5,
  "maxPages": 5,
  "maxDepth": 1,
  "sameDomainOnly": true,
  "instructions": "<string>"
}
'
{
  "data": {
    "id": "<string>",
    "operationId": "<string>",
    "pollUrl": "<string>"
  },
  "meta": {
    "requestId": "<string>"
  }
}

POST /api/v1/platform/scrapers/web/crawl

Crawl a website from a starting URL, with an optional page limit. The crawl is queued: the response identifies an operation to poll for the discovered page content.

Example

curl -X POST https://developer.thehog.ai/api/v1/platform/scrapers/web/crawl \
  -H "X-Access-Key: ak_xxxxxxxxxxxxxxxx" \
  -H "X-Secret-Key: sk_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "limit": 5}'
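The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names build_crawl_request and submit_crawl are hypothetical, and the keys passed in are placeholders. The URL, headers, and body fields are the ones shown in the curl example.

```python
import json
import urllib.request

CRAWL_URL = "https://developer.thehog.ai/api/v1/platform/scrapers/web/crawl"


def build_crawl_request(access_key, secret_key, body):
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        CRAWL_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Access-Key": access_key,
            "X-Secret-Key": secret_key,
            "Content-Type": "application/json",
        },
    )


def submit_crawl(request, opener=urllib.request.urlopen):
    """Send the request and return the decoded JSON response body."""
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage (placeholder credentials, performs a real network call):
# request = build_crawl_request("ak_xxxxxxxxxxxxxxxx", "sk_xxxxxxxxxxxxxxxx",
#                               {"url": "https://example.com", "limit": 5})
# accepted = submit_crawl(request)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without touching the network.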

Authorizations

X-Access-Key
string
header
required

The public API key from the Credentials page.

X-Secret-Key
string
header
required

The API secret shown when the credential is created.

Headers

Idempotency-Key
string

Optional. Reusing the same key for the same organization returns the existing queued crawl operation.
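One way to use this header is to generate a key once per logical crawl and resend it on every retry, so that a retried request maps to the already queued operation instead of starting a new one. The sketch below assumes a UUID is an acceptable key format (the reference does not specify one) and takes a hypothetical send callable that performs the actual HTTP request.

```python
import uuid


def submit_with_retry(send, headers, attempts=3):
    """Call send(headers) up to `attempts` times, reusing one Idempotency-Key.

    `send` is a hypothetical callable that performs the POST and returns the
    parsed response. Network failures are assumed to surface as OSError
    (urllib.error.URLError and socket timeouts are OSError subclasses).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    keyed = dict(headers)
    # Generated once, then reused for every attempt below.
    keyed["Idempotency-Key"] = str(uuid.uuid4())
    last_error = None
    for _ in range(attempts):
        try:
            return send(keyed)
        except OSError as error:
            last_error = error
    raise last_error
```

Generating the key outside the retry loop is the point of the design: a fresh key per attempt would defeat the deduplication described above.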

Body

application/json
url
string
required

Domain or URL to crawl

Example:

"https://example.com"

limit
number
default:5

Page limit for the crawl. Overridden by maxPages when that is set.

Required range: 1 <= x <= 50
maxPages
number
default:5

Maximum number of pages to crawl. Overrides limit when set.

Required range: 1 <= x <= 50
maxDepth
number
default:1

Maximum link depth from the starting URL.

Required range: 0 <= x <= 5
sameDomainOnly
boolean
default:true

Restrict discovered links to the starting hostname.

instructions
string

Instructions for content extraction
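The documented ranges can be checked on the client before a request is sent. The helper below is a hypothetical sketch, not part of the API: it validates limit, maxPages, and maxDepth against the ranges listed above and omits any field left unset so that the server-side defaults apply.

```python
def build_crawl_body(url, limit=None, max_pages=None, max_depth=None,
                     same_domain_only=None, instructions=None):
    """Return a request body dict, enforcing the documented numeric ranges."""
    ranges = {
        "limit": (limit, 1, 50),
        "maxPages": (max_pages, 1, 50),
        "maxDepth": (max_depth, 0, 5),
    }
    body = {"url": url}
    for name, (value, low, high) in ranges.items():
        if value is None:
            continue  # leave unset so the documented default is used
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        body[name] = value
    if same_domain_only is not None:
        body["sameDomainOnly"] = same_domain_only
    if instructions is not None:
        body["instructions"] = instructions
    return body
```

For example, build_crawl_body("https://example.com", limit=5) produces the same body as the curl example, while max_depth=6 raises ValueError before any request is made.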

Response

Crawl accepted. Poll the returned operation URL for results.

data
object
required

Contains id, operationId, and pollUrl (all strings).

meta
object
required

Contains requestId (string).
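Because the crawl is only accepted at this point, a client needs a polling loop around data.pollUrl. The shape of the poll response is not documented here, so the sketch below makes no assumption about it: the caller supplies a hypothetical fetch callable (which requests the poll URL) and an is_finished predicate (which decides when the result is final). It is also an assumption that pollUrl may be either absolute or relative to https://developer.thehog.ai; urljoin handles both cases.

```python
import time
from urllib.parse import urljoin

BASE_URL = "https://developer.thehog.ai"


def resolve_poll_url(accepted):
    """Return an absolute poll URL from the accepted-crawl response."""
    return urljoin(BASE_URL, accepted["data"]["pollUrl"])


def poll_until(fetch, is_finished, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until is_finished(result) is true, then return the result.

    `fetch` and `is_finished` are supplied by the caller because the poll
    response format is not specified in this reference.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if is_finished(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"operation not finished after {max_attempts} polls")
```

The interval and attempt count are arbitrary placeholders; suitable values depend on how long crawls take in practice.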