API Reference

The Spider API is based on REST. Our API is predictable, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs. Set your API secret key in the Authorization header to get started. You can set the Content-Type header to application/json, application/xml, text/csv, or application/jsonl to shape the response format.

The Spider API supports multi-domain actions. You can work with multiple domains per request by passing the URLs as a comma-separated list.

The Spider API differs for every account as we release new versions and tailor functionality. You can add v1 before any path to pin your requests to that version.
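As a minimal sketch of these options together, the request below crawls two domains in one call and pins the path to v1. The domains are placeholders; per the note above, swap the Content-Type value for application/xml, text/csv, or application/jsonl to change the response format.

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    # Swap for application/xml, text/csv, or application/jsonl to shape the response.
    'Content-Type': 'application/json',
}

# Two domains in one request, comma separated (placeholder URLs).
json_data = {"limit": 5, "url": "https://www.example.com,https://www.example.org"}

# The v1 prefix pins the request to that API version.
response = requests.post('https://api.spider.cloud/v1/crawl',
  headers=headers,
  json=json_data)

print(response.json())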

Just getting started?

Check out our development quickstart guide.

Not a developer?

Use Spider's no-code options or apps to get started with Spider and do more with your Spider account, no code required.

Base URL
https://api.spider.cloud

Crawl websites

Start crawling one or more websites to collect resources.

POST https://api.spider.cloud/crawl

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

  • request string

    The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default and switch to JavaScript rendering when the HTML requires it.

  • limit number

    The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

  • depth number

    The maximum crawl depth. If zero, no limit is applied.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/crawl', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": "<html>...",
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]
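The response is a list of page objects, so post-processing is straightforward. Below is a minimal sketch that saves the HTML of each successfully crawled page; the filename scheme is purely illustrative.

import re

# Assuming `response` is the requests.Response from the example above.
pages = response.json()

for page in pages:
    if page["status"] == 200 and page["content"]:
        # Derive a crude filename from the URL (illustrative only).
        name = re.sub(r"[^a-zA-Z0-9]+", "_", page["url"]).strip("_") or "page"
        with open(f"{name}.html", "w", encoding="utf-8") as f:
            f.write(page["content"])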

Crawl websites and get links

Start crawling one or more websites to collect the links found.

POST https://api.spider.cloud/links

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

  • request string

    The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default and switch to JavaScript rendering when the HTML requires it.

  • limit number

    The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

  • depth number

    The maximum crawl depth. If zero, no limit is applied.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/links', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": "",
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]
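To gather the discovered links, iterate the response items. A minimal sketch, assuming each item's url field holds a link found during the crawl:

# Assuming `response` is the requests.Response from the example above,
# and that each item's "url" field is a link found during the crawl.
items = response.json()

links = sorted({item["url"] for item in items if item.get("error") is None})
for link in links:
    print(link)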

Screenshot websites

Start taking screenshots of one or more websites to collect images as base64 or binary.

POST https://api.spider.cloud/screenshot

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

  • request string

    The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default and switch to JavaScript rendering when the HTML requires it.

  • limit number

    The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

  • depth number

    The maximum crawl depth. If zero, no limit is applied.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/screenshot', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": "base64...",
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]
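Since content is returned as base64 by default, decoding it yields the raw image bytes. A minimal sketch that writes each screenshot to disk; the .png extension is an assumption about the image format.

import base64

# Assuming `response` is the requests.Response from the example above.
shots = response.json()

for i, shot in enumerate(shots):
    if shot["status"] == 200 and shot["content"]:
        image_bytes = base64.b64decode(shot["content"])
        # The PNG extension is an assumption; adjust to the format actually returned.
        with open(f"screenshot_{i}.png", "wb") as f:
            f.write(image_bytes)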

Pipelines

Create powerful workflows with our pipeline API endpoints. Use AI to easily extract contacts from any website or filter links with prompts.

Crawl websites and extract contacts

Start crawling one or more websites to collect contacts using AI. A minimum of $25 in credits is required for extraction.

POST https://api.spider.cloud/pipeline/extract-contacts

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

  • request string

    The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default and switch to JavaScript rendering when the HTML requires it.

  • limit number

    The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

  • depth number

    The maximum crawl depth. If zero, no limit is applied.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/pipeline/extract-contacts', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": [{ "full_name": "John Doe", "email": "johndoe@gmail.com", "phone": "555-555-555", "title": "Baker"}, ...],
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]
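The content field holds the extracted contact records, so they can be aggregated directly. A minimal sketch that collects unique email addresses across all crawled pages, assuming the record shape shown in the sample response:

# Assuming `response` is the requests.Response from the example above.
results = response.json()

emails = set()
for result in results:
    for contact in result.get("content") or []:
        if contact.get("email"):
            emails.add(contact["email"].lower())

print(sorted(emails))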

Label website

Crawl a website and accurately categorize it using AI.

POST https://api.spider.cloud/pipeline/label

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

  • request string

    The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default and switch to JavaScript rendering when the HTML requires it.

  • limit number

    The maximum number of pages allowed to crawl per website. Remove the value or set it to 0 to crawl all pages.

  • depth number

    The maximum crawl depth. If zero, no limit is applied.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":50,"url":"http://www.example.com"}

response = requests.post('https://api.spider.cloud/pipeline/label', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
    "content": ["Government"],
    "error": null,
    "status": 200,
    "url": "http://www.example.com"
  },
  // more content...
]

Queries

Query the data that you collect. Add dynamic filters for extracting exactly what is needed.

Crawl State

Get the state of the crawl for the domain.

POST https://api.spider.cloud/crawl/status

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list for multiple URLs.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"url": "http://www.example.com"}

response = requests.post('https://api.spider.cloud/crawl/status',
  headers=headers,
  json=json_data)

print(response.json())
Response
{
    "data": {
      "id": "195bf2f2-2821-421d-b89c-f27e57ca71fh",
      "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
      "domain": "example.com",
      "url": "https://example.com/",
      "links": 1,
      "credits_used": 3,
      "mode": 2,
      "crawl_duration": 340,
      "message": null,
      "request_user_agent": "Spider",
      "level": "info",
      "status_code": 0,
      "created_at": "2024-04-21T01:21:32.886863+00:00",
      "updated_at": "2024-04-21T01:21:32.886863+00:00"
    },
    "error": ""
  }

Credits Available

Get the remaining credits available.

GET https://api.spider.cloud/data/credits

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/credits',
  headers=headers)

print(response.json())
Response
{
    "data": {
      "id": "8d662167-5a5f-41aa-9cb8-0cbb7d536891",
      "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfg",
      "credits": 53334,
      "created_at": "2024-04-21T01:21:32.886863+00:00",
      "updated_at": "2024-04-21T01:21:32.886863+00:00"
    }
  }

Websites Collection

Get the websites stored.

GET https://api.spider.cloud/data/websites

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/websites',
  headers=headers)

print(response.json())
Response
{
 "data": [
  {
    "id": "2a503c02-f161-444b-b1fa-03a3914667b6",
    "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfd",
    "url": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfd/example.com/index.html",
    "domain": "example.com",
    "created_at": "2024-04-18T15:40:25.667063+00:00",
    "updated_at": "2024-04-18T15:40:25.667063+00:00",
    "pathname": "/",
    "fts": "",
    "scheme": "https:",
    "last_checked_at": "2024-05-10T13:39:32.293017+00:00",
    "screenshot": null
  }
 ] 
} 

Pages Collection

Get the pages/resources stored.

GET https://api.spider.cloud/data/pages

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://api.spider.cloud/data/pages',
  headers=headers)

print(response.json())
Response
{
  "data": [
    {
      "id": "733b0d0f-e406-4229-949d-8068ade54752",
      "user_id": "6bd06efa-bb0a-4f1f-a29f-05db0c4b1bfd",
      "url": "https://www.example.com",
      "domain": "www.example.com",
      "created_at": "2024-04-17T01:28:15.016975+00:00",
      "updated_at": "2024-04-17T01:28:15.016975+00:00",
      "proxy": true,
      "headless": true,
      "crawl_budget": null,
      "scheme": "https:",
      "last_checked_at": "2024-04-17T01:28:15.016975+00:00",
      "full_resources": false,
      "metadata": true,
      "gpt_config": null,
      "smart_mode": false,
      "fts": "'www.example.com':1"
    }
  ]
}