API Reference
The Spider API is based on REST. Our API is predictable, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs. Set your API secret key in the authorization header to get started. You can use the content-type header with application/json, application/xml, text/csv, or application/jsonl to shape the response.
The Spider API supports multi-domain actions. You can work with multiple domains per request by passing the URLs comma-separated.
The Spider API differs for every account as we release new versions and tailor functionality. You can add v1 before any path to pin to that version.
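To illustrate, here is a minimal Python sketch that ties these pieces together. The requests library, the YOUR_SECRET_KEY placeholder, and the example domains are assumptions for illustration, not part of the API:

import requests

# Placeholder key; use the secret key from your Spider account,
# set in the authorization header as described above.
headers = {
    "Authorization": "YOUR_SECRET_KEY",
    "Content-Type": "application/json",  # or application/xml, text/csv, application/jsonl
}

# Comma-separated urls target multiple domains in one request;
# the optional v1 path prefix pins the API version.
response = requests.post(
    "https://api.spider.cloud/v1/crawl",
    headers=headers,
    json={"url": "https://example.com,https://spider.cloud"},
)
print(response.status_code)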
Just getting started?
Check out our development quickstart guide.
Not a developer?
Use Spider's no-code options or apps to get started with Spider and to do more with your Spider account, no code required.
Base URL: https://api.spider.cloud
Crawl websites
Start crawling one or more websites to collect resources.
POST https://api.spider.cloud/crawl
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
request string
The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, switching to JavaScript rendering only when the HTML requires it.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.
depth number
The maximum crawl depth. If zero, no depth limit is applied.
[ { "content": "<html>...", "error": null, "status": 200, "url": "http://www.example.com" }, // more content... ]
Crawl websites and get links
Start crawling one or more websites to collect the links found.
POST https://api.spider.cloud/links
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
request string
The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, switching to JavaScript rendering only when the HTML requires it.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.
depth number
The maximum crawl depth. If zero, no depth limit is applied.
[ { "content": "", "error": null, "status": 200, "url": "http://www.example.com" }, // more content... ]
Screenshot websites
Start taking screenshots of one or more websites and collect the images as base64 or binary.
POST https://api.spider.cloud/screenshot
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
request string
The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, switching to JavaScript rendering only when the HTML requires it.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.
depth number
The maximum crawl depth. If zero, no depth limit is applied.
[ { "content": "base64...", "error": null, "status": 200, "url": "http://www.example.com" }, // more content... ]
Pipelines
Create powerful workflows with our pipeline API endpoints. Use AI to extract contacts from any website or to filter links with prompts, all with ease.
Crawl websites and extract contacts
Start crawling one or more websites to collect all contacts using AI. A minimum of $25 in credits is required for extraction.
POST https://api.spider.cloud/pipeline/extract-contacts
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
request string
The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, switching to JavaScript rendering only when the HTML requires it.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.
depth number
The maximum crawl depth. If zero, no depth limit is applied.
[ { "content": [{ "full_name": "John Doe", "email": "johndoe@gmail.com", "phone": "555-555-555", "title": "Baker"}, ...], "error": null, "status": 200, "url": "http://www.example.com" }, // more content... ]
Label website
Crawl a website and accurately categorize it using AI.
POST https://api.spider.cloud/pipeline/label
Request body
url required string
The URI resource to crawl. This can be a comma-separated list for multiple URLs.
request string
The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default, switching to JavaScript rendering only when the HTML requires it.
limit number
The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.
depth number
The maximum crawl depth. If zero, no depth limit is applied.
[ { "content": ["Government"], "error": null, "status": 200, "url": "http://www.example.com" }, // more content... ]
Queries
Query the data that you collect. Add dynamic filters to extract exactly what you need.
Crawl State
Get the state of the crawl for the domain.
POST https://api.spider.cloud/crawl/status
Request body
url required string
The URI resource to check. This can be a comma-separated list for multiple URLs.
{ "data": { "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "domain": "example.com", "url": "https://example.com/", "links": 1, "credits_used": 3, "mode": 2, "crawl_duration": 340, "message": null, "request_user_agent": "Spider", "level": "info", "status_code": 0, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" }, "error": "" }
Credits Available
Get the remaining credits available.
GET https://api.spider.cloud/data/credits
{ "data": { "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg", "credits": 53334, "created_at": "2024-04-21T01:21:32.886863+00:00", "updated_at": "2024-04-21T01:21:32.886863+00:00" } }
Websites Collection
Get the stored websites.
GET https://api.spider.cloud/data/websites
{ "data": [ { "id": "2a503c02-f161-444b-b1fa-03a3914667b6", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfd", "url": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfd/example.com/index.html", "domain": "example.com", "created_at": "2024-04-18T15:40:25.667063+00:00", "updated_at": "2024-04-18T15:40:25.667063+00:00", "pathname": "/", "fts": "", "scheme": "https:", "last_checked_at": "2024-05-10T13:39:32.293017+00:00", "screenshot": null } ] }
Pages Collection
Get the stored pages/resources.
GET https://api.spider.cloud/data/pages
{ "data": [ { "id": "733b0d0f-e406-4229-949d-8068ade54752", "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfd", "url": "https://www.example.com", "domain": "www.example.com", "created_at": "2024-04-17T01:28:15.016975+00:00", "updated_at": "2024-04-17T01:28:15.016975+00:00", "proxy": true, "headless": true, "crawl_budget": null, "scheme": "https:", "last_checked_at": "2024-04-17T01:28:15.016975+00:00", "full_resources": false, "metadata": true, "gpt_config": null, "smart_mode": false, "fts": "'www.example.com':1" } ] }