Diffbot

Diffbot provides AI-powered tools to extract and structure data from web pages, transforming unstructured web content into structured, linked data.

Install MCP Server

Paste and run this command in your terminal to set up Cursor with the Diffbot MCP server:

npx @composio/cli add cursor --app diffbot

After running the command, restart Cursor to start using the MCP Server.

Available Tools

Diffbot Search

Tool to search data extracted by crawl or bulk jobs using DQL (Diffbot Query Language) queries. Use after data extraction jobs complete to retrieve search results.

Get Diffbot Account Details

Tool to retrieve account details, including plan information and usage statistics. Use after authenticating to verify subscription and daily quota status.

Diffbot Analyze

Tool to automatically determine a page's content type and route it to the appropriate extraction API. Use when you have only a URL and need Diffbot to choose the right extractor.
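Outside the MCP tool, the same "URL in, structured data out" idea can be sketched as a plain HTTP request. The minimal Python sketch below only builds the request URL and makes no network call. The endpoint `https://api.diffbot.com/v3/analyze` and the `token` and `url` query parameters are assumptions that do not appear on this page, and `build_analyze_url` is a hypothetical helper name; check Diffbot's API reference before relying on any of them.

```python
from urllib.parse import urlencode, urlsplit

# Assumed endpoint, not stated on this page; verify against Diffbot's API reference.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_analyze_url(token: str, page_url: str) -> str:
    """Build a GET URL asking Diffbot to classify and extract page_url.

    The parameter names `token` and `url` are assumptions for illustration.
    """
    if not token:
        raise ValueError("a Diffbot token is required")
    if urlsplit(page_url).scheme not in ("http", "https"):
        raise ValueError("page_url must be an absolute http(s) URL")
    # urlencode percent-escapes the target URL so that its own query
    # string is not mistaken for parameters of the Diffbot request.
    query = urlencode({"token": token, "url": page_url})
    return f"{ANALYZE_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_analyze_url("YOUR_TOKEN", "https://example.com/post?id=7"))
```

Escaping the target URL matters most for pages that carry their own query string; without it, everything after the first `&` in the page URL would be read as a separate parameter.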

Get Article Data

Tool to extract information from articles, including authors, publication dates, and images. Use when you need structured metadata from a web article URL.

Get Discussion Thread

Tool to extract threads of content from forums, comment sections, and review pages. Use when you have identified the discussion URL and need structured discussion data from it.

Diffbot Get Event

Tool to extract event details from web pages. Use when you need structured event data such as venue, date, and description.

Diffbot Get Image

Tool to extract detailed information about images, including dimensions and recognition data. Use after confirming the image URL is publicly accessible.

Diffbot Get Product

Tool to extract product information such as specifications, prices, availability, and reviews. Use when you need this product data in structured form from a web page.

Get Video Data

Tool to extract information from videos, including titles, descriptions, and embedded HTML. Use when you need structured video metadata from any web page.

List Bulk Jobs

Tool to list all bulk jobs associated with a specific token. Use after authenticating to retrieve the statuses of all jobs for the account.

Resolve Lost Id

Tool to resolve lost IDs in the Knowledge Graph. Use when you need to map a lost identifier to its canonical counterpart for data consistency.

Start Bulk Job

Tool to start a bulk extract job. Use when processing large numbers of URLs asynchronously.

Start Crawl Job

Tool to spider a site for links and process them with the Extract API into a single collection. Use when you have seed URLs and want to collect structured data across a site. Requires a Plus plan for Crawl API access.

Stop Bulk Job

Tool to stop a running bulk job. Use when you need to halt further processing of URLs in a job in progress. Invoke only after confirming the job ID to avoid accidental stoppage.
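One way to apply the "confirm the job ID first" advice is to check the ID against the output of List Bulk Jobs before calling Stop Bulk Job. The Python sketch below is illustrative only: `confirm_job_id` is a hypothetical helper, and the `id` and `status` keys and the `"running"` value are assumed field names, not the tool's documented response shape.

```python
from typing import Iterable, Mapping


def confirm_job_id(jobs: Iterable[Mapping[str, str]], job_id: str) -> bool:
    """Return True only if exactly one listed job has this ID and is running.

    `jobs` stands in for the result of List Bulk Jobs; the "id" and
    "status" keys are assumptions made for this sketch.
    """
    matches = [job for job in jobs if job.get("id") == job_id]
    # Refuse on zero matches (typo in the ID) and on duplicates (ambiguous).
    if len(matches) != 1:
        return False
    return matches[0].get("status") == "running"


if __name__ == "__main__":
    listed = [
        {"id": "nightly-products", "status": "running"},
        {"id": "archive-2023", "status": "finished"},
    ]
    for candidate in ("nightly-products", "archive-2023", "nightly-product"):
        print(candidate, confirm_job_id(listed, candidate))
```

The check requires an exact match, so a near-miss ID is rejected rather than resolved to the closest job.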

14 actions available