API Reference

The Spider API is based on REST. It is predictable, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs. To get started, set your API secret key in the Authorization header. You can set the Content-Type header to application/json, application/xml, text/csv, or application/jsonl to shape the response format.
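
For example, a minimal sketch requesting a CSV-shaped response from the crawl endpoint, assuming the endpoint honors the Content-Type header as described:

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    # Assumption: text/csv here asks the API to shape the response as CSV.
    'Content-Type': 'text/csv',
}

response = requests.post('https://spider.a11ywatch.com/crawl', 
  headers=headers, 
  json={"limit": 1, "url": "http://www.example.com"})

print(response.text)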

The Spider API supports multi-domain actions. You can work with multiple domains per request by passing the URLs as a comma-separated list.
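
For example, a single crawl request can target two domains at once; a minimal sketch (the second domain is a placeholder):

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

# Two domains in one request, passed as a comma-separated string.
json_data = {"limit": 1, "url": "http://www.example.com,http://www.example.org"}

response = requests.post('https://spider.a11ywatch.com/crawl', 
  headers=headers, 
  json=json_data)

print(response.json())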

The Spider API can differ per account as we release new versions and tailor functionality. You can add v1 before any path to pin your requests to that version.
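
For example, a pinned crawl request differs only in the path prefix (a minimal sketch):

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

# The v1 prefix before the path pins this request to version 1 of the API.
response = requests.post('https://spider.a11ywatch.com/v1/crawl', 
  headers=headers, 
  json={"limit": 1, "url": "http://www.example.com"})

print(response.json())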

Just getting started?

Check out our development quickstart guide.

Not a developer?

Use Spider's no-code options or apps to get started with Spider and do more with your Spider account, no code required.

Base URL
https://spider.a11ywatch.com

Crawl websites

Start crawling one or more websites to collect resources.

POST https://spider.a11ywatch.com/crawl

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list to crawl multiple URLs.

  • limit number

    The maximum number of pages to crawl per website. Omit the value or set it to 0 to crawl all pages.

  • request string

    The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default and fall back to JavaScript rendering when the HTML requires it.

  • return_format string

    The format to return the data in. Possible values are markdown, raw, text, and html2text. Use raw to return the page's default format, such as HTML.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":1,"url":"http://www.example.com"}

response = requests.post('https://spider.a11ywatch.com/crawl', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
      "content": "<html>...",
      "error": null,
      "status": 200,
      "url": "http://www.example.com"
  },
  {
      "content": "<html>...",
      "error": null,
      "status": 200,
      "url": "http://www.example.com/some-page"
  },
  // more content...
]
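
A minimal sketch combining the optional request and return_format parameters described above, assuming both are passed in the JSON body alongside url and limit:

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

# Use the smart request strategy and return Markdown instead of raw HTML.
json_data = {
    "limit": 5,
    "url": "http://www.example.com",
    "request": "smart",
    "return_format": "markdown",
}

response = requests.post('https://spider.a11ywatch.com/crawl', 
  headers=headers, 
  json=json_data)

print(response.json())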

Crawl websites and get links

Start crawling one or more websites to collect the links found.

POST https://spider.a11ywatch.com/links

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list to crawl multiple URLs.

  • limit number

    The maximum number of pages to crawl per website. Omit the value or set it to 0 to crawl all pages.

  • request string

    The request type to perform. Possible values are http, chrome, and smart. Use smart to perform an HTTP request by default and fall back to JavaScript rendering when the HTML requires it.

  • return_format string

    The format to return the data in. Possible values are markdown, raw, text, and html2text. Use raw to return the page's default format, such as HTML.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":1,"url":"http://www.example.com"}

response = requests.post('https://spider.a11ywatch.com/links', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
      "content": "",
      "error": null,
      "status": 200,
      "url": "http://www.example.com"
  },
  {
      "content": "",
      "error": null,
      "status": 200,
      "url": "http://www.example.com/some-page"
  },
  // more content...
]
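
Because each item in the response carries the url of a discovered page, collecting the links client-side is straightforward; a minimal sketch based on the response shape shown above:

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.post('https://spider.a11ywatch.com/links', 
  headers=headers, 
  json={"limit": 0, "url": "http://www.example.com"})

# Each item in the response list carries the url of a discovered page.
links = [page["url"] for page in response.json()]
print(links)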

Screenshot websites

Start taking screenshots of one or more websites to collect images as base64 or binary.

POST https://spider.a11ywatch.com/screenshot

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list to crawl multiple URLs.

  • limit number

    The maximum number of pages to crawl per website. Omit the value or set it to 0 to crawl all pages.

  • subdomains boolean

    Allow subdomains to be included.

  • depth number

    The maximum crawl depth. If zero, no depth limit is applied.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":1,"url":"http://www.example.com"}

response = requests.post('https://spider.a11ywatch.com/screenshot', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
      "content": "base64...",
      "error": null,
      "status": 200,
      "url": "http://www.example.com"
  },
  {
      "content": "base64...",
      "error": null,
      "status": 200,
      "url": "http://www.example.com/some-page"
  },
  // more content...
]
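
A minimal sketch that decodes the base64 content of each screenshot to a local file; the .png extension is an assumption for illustration:

import base64
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.post('https://spider.a11ywatch.com/screenshot', 
  headers=headers, 
  json={"limit": 1, "url": "http://www.example.com"})

for i, page in enumerate(response.json()):
    if page["error"] is None:
        # The content field holds the base64-encoded image for the page.
        with open(f"screenshot-{i}.png", "wb") as f:  # extension is an assumption
            f.write(base64.b64decode(page["content"]))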

Pipelines

Create powerful workflows with our pipeline API endpoints. Use AI to extract contacts from any website or filter links with custom prompts.

Crawl websites and extract contacts

Start crawling one or more websites and use AI to collect all contacts found.

POST https://spider.a11ywatch.com/pipeline/extract-contacts

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list to crawl multiple URLs.

  • limit number

    The maximum number of pages to crawl per website. Omit the value or set it to 0 to crawl all pages.

  • store_data boolean

    Boolean that determines whether storage should be used. If set, this takes precedence over storageless. Defaults to false.

  • model string

    The AI model to use, such as gpt-4-1106-preview or gpt-3.5-turbo-16k. Open-source models coming soon.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":1,"url":"http://www.example.com"}

response = requests.post('https://spider.a11ywatch.com/pipeline/extract-contacts', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
      "content": [{ "full_name": "John Doe", "email": "johndoe@gmail.com", "phone": "555-555-555", "title": "Baker"}, ...],
      "error": null,
      "status": 200,
      "url": "http://www.example.com"
  },
  {
      "content": [{ "full_name": "John Doe", "email": "johndoe@gmail.com", "phone": "555-555-555", "title": "Baker"}, ...],
      "error": null,
      "status": 200,
      "url": "http://www.example.com/some-page"
  },
  // more content...
]
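
A minimal sketch that sets the optional model and store_data parameters described above and prints the extracted contacts (the model choice is illustrative):

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {
    "limit": 1,
    "url": "http://www.example.com",
    "model": "gpt-3.5-turbo-16k",  # illustrative model choice
    "store_data": False,
}

response = requests.post('https://spider.a11ywatch.com/pipeline/extract-contacts', 
  headers=headers, 
  json=json_data)

for page in response.json():
    if page["error"] is None:
        # content is a list of contact objects as shown in the response above.
        for contact in page["content"]:
            print(contact["full_name"], contact["email"])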

Filter links

Filter a set of URLs using AI.

POST https://spider.a11ywatch.com/pipeline/filter-links

Request body

  • url required string

    The URI resource to crawl. This can be a comma-separated list to crawl multiple URLs.

  • store_data boolean

    Boolean that determines whether storage should be used. If set, this takes precedence over storageless. Defaults to false.

  • model string

    The AI model to use, such as gpt-4-1106-preview or gpt-3.5-turbo-16k. Open-source models coming soon.

  • prompt string

    A custom prompt to pass to the chat model.

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

json_data = {"limit":1,"url":"http://www.example.com"}

response = requests.post('https://spider.a11ywatch.com/pipeline/filter-links', 
  headers=headers, 
  json=json_data)

print(response.json())
Response
[
  {
      "content": "<html>...",
      "error": null,
      "status": 200,
      "url": "http://www.example.com"
  },
  {
      "content": "<html>...",
      "error": null,
      "status": 200,
      "url": "http://www.example.com/some-page"
  },
  // more content...
]
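
A minimal sketch that passes a custom prompt to steer the filtering; the prompt text is illustrative:

import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

# The prompt below only illustrates the prompt parameter.
json_data = {
    "url": "http://www.example.com",
    "prompt": "Keep only links that point to documentation pages.",
}

response = requests.post('https://spider.a11ywatch.com/pipeline/filter-links', 
  headers=headers, 
  json=json_data)

print(response.json())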

Credits Available

Get the remaining credits available.

GET https://spider.a11ywatch.com/credits

Example request
import requests, os

headers = {
    'Authorization': os.environ["SPIDER_API_KEY"],
    'Content-Type': 'application/json',
}

response = requests.get('https://spider.a11ywatch.com/credits', 
  headers=headers)

print(response.json())
Response
{ "credits": 52566 }