
n8n web scraping — diff-driven, webhook-first

Point DiffHook at the pages you want to scrape, isolate the part you care about with a CSS selector, and n8n receives the HTML diff every time that content changes: already cached, already deduped, already signed.

The usual n8n scraper is a Schedule trigger + HTTP Request + HTML Extract + some Code-node diffing against a datastore. It works, but it scrapes on every tick even when nothing has changed, and the "is this different from last time" logic ends up reinventing a tiny database inside n8n. DiffHook moves the fetch, the HTML parsing, and the diff out of n8n so the workflow runs once per real change — nothing more.
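For comparison, this is roughly the "is this different from last time" logic that tends to accumulate in a Code node. It is a minimal Python sketch that assumes a plain dict stands in for whatever datastore the workflow persists to. The names content_fingerprint and has_changed are hypothetical, and this is not how DiffHook works internally.

```python
import hashlib

def content_fingerprint(fragment: str) -> str:
    """SHA-256 of the whitespace-normalized fragment, so reflowed markup doesn't count as a change."""
    normalized = " ".join(fragment.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def has_changed(store: dict, url: str, fragment: str) -> bool:
    """Compare against the last fingerprint stored for this URL, then remember the new one.

    The first sighting of a URL only records a baseline and returns False.
    """
    new_fp = content_fingerprint(fragment)
    old_fp = store.get(url)
    store[url] = new_fp
    return old_fp is not None and old_fp != new_fp
```

Persisting that dict between executions, and keeping it consistent, is the part that grows into storage plumbing.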


Workflow

Scrape, diff, and deliver to n8n in 5 steps

Five steps, no Code node for diffing, no storage plumbing. The monitor itself is fully declarative.

01

Define what to scrape

Pick the target URL and the CSS selector that isolates the block you care about: a product card, a pricing table, a changelog entry. DiffHook fetches the page (or renders it, for JavaScript-heavy sites) and keeps only the HTML that matches the selector.

02

Choose what counts as a change

Use a text-only diff to ignore styling tweaks, or a full HTML diff to catch every attribute change. Set include_html: true when you want n8n to see the raw markup alongside the extracted text.
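Here is a rough Python illustration of why the two modes behave differently, built on difflib and html.parser from the standard library. It assumes the text-only mode behaves like comparing visible text nodes. DiffHook's actual diff algorithm isn't described on this page, and the helper names are hypothetical.

```python
import difflib
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML fragment, ignoring tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

def visible_text(fragment: str) -> list[str]:
    parser = _TextCollector()
    parser.feed(fragment)
    parser.close()
    return parser.chunks

def text_diff(previous: str, current: str) -> list[str]:
    """Unified diff of visible text only: class, style, and attribute edits produce no output."""
    return list(difflib.unified_diff(visible_text(previous), visible_text(current), lineterm=""))

def html_diff(previous: str, current: str) -> list[str]:
    """Unified diff of the raw markup, line by line: any attribute edit shows up."""
    return list(difflib.unified_diff(previous.splitlines(), current.splitlines(), lineterm=""))
```

Changing only a card's class attribute yields an empty text_diff but a non-empty html_diff. Changing the price shows up in both.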

03

Create the monitor

POST once to /v1/monitors with the URL, selector, interval, and a webhook delivery pointing at your n8n workflow. No cron, no storage, and no duplicate detection on your side: DiffHook handles all of that.

04

Receive the diff in n8n

n8n's Webhook trigger fires with a signed JSON body containing previous_html, current_html, and extracted_text. Verify the HMAC signature in a Crypto node before passing the payload to downstream steps.
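The check the Crypto node performs can be sketched in Python. This assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, keyed with a shared secret. The header that carries it and DiffHook's exact signing scheme aren't specified on this page, so treat verify_signature as a hypothetical helper.

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_hex: str, secret: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare it in constant time.

    Assumption: hex-encoded SHA-256 HMAC over the exact bytes received.
    Check DiffHook's docs for the real header name, algorithm, and encoding.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Normalize case so an uppercase hex signature still matches.
    return hmac.compare_digest(expected, signature_hex.strip().lower())
```

Compute the HMAC over the bytes exactly as received. Re-serializing parsed JSON can reorder keys or change whitespace, which makes a valid signature fail. compare_digest avoids leaking timing information during the comparison.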

05

Parse, enrich, route

Use n8n's HTML Extract or Code node to pull structured fields out of the diff, enrich with an AI node if needed, and send the result to Slack, Airtable, Notion, or a database.
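Here is a minimal sketch of step 05 in plain Python. It assumes each card in current_html looks like <div class="product-card"><h3>Widget</h3><span class="price">$12</span></div>. That markup, ProductCardParser, and parse_product_cards are all hypothetical, so adapt the tag and class checks to the real page.

```python
from html.parser import HTMLParser

class ProductCardParser(HTMLParser):
    """Collects {"name", "price"} rows from the hypothetical .product-card markup.

    Assumes the fragment is already scoped to product cards (as the CSS selector
    does) and that name and price are plain text without nested tags.
    """
    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which row field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product-card" in classes:
            self.rows.append({"name": "", "price": ""})
        elif self.rows and tag == "h3":
            self._field = "name"
        elif self.rows and "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()

def parse_product_cards(fragment: str) -> list[dict]:
    parser = ProductCardParser()
    parser.feed(fragment)
    parser.close()
    return parser.rows
```

Each returned dict maps naturally onto one row in a sheet or database table.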

API example

Scrape and diff in one request

Declarative monitor definition — include_html surfaces the raw markup so n8n can parse it further downstream.

POST https://api.diffhook.com/v1/monitors
Authorization: Bearer $DIFFHOOK_API_KEY
Content-Type: application/json

{
  "type": "html_css",
  "url": "https://competitor.example.com/products",
  "css_selector": "main .product-card",
  "include_html": true,
  "interval_seconds": 900,
  "deliveries": [
    {
      "type": "webhook",
      "url": "https://n8n.yourdomain.com/webhook/scrape-products"
    }
  ]
}
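The same request can be built with Python's standard library. In this sketch, the payload mirrors the example above and build_monitor_request is a hypothetical helper name. Only the fields shown above are used. Anything else DiffHook accepts is outside this sketch.

```python
import json
import urllib.request

API_URL = "https://api.diffhook.com/v1/monitors"

def build_monitor_request(api_key: str, n8n_webhook_url: str) -> urllib.request.Request:
    """Build (but don't send) the POST /v1/monitors request from the example above."""
    payload = {
        "type": "html_css",
        "url": "https://competitor.example.com/products",
        "css_selector": "main .product-card",
        "include_html": True,
        "interval_seconds": 900,
        "deliveries": [{"type": "webhook", "url": n8n_webhook_url}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen with the key read from the DIFFHOOK_API_KEY environment variable. The response format isn't documented here, so inspect it before relying on specific fields.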

Importable workflow

Drop-in n8n scraping workflow

The template parses the diff, extracts product cards with HTML Extract, and ships the structured rows to a Google Sheet. Swap the destination node for your own.

FAQ

n8n web scraping — common questions

Why offload scraping from n8n to DiffHook?
Three reasons: compute, correctness, and complexity. Compute, because polling a dozen URLs from n8n on a 5-minute cron burns executions whether anything changed or not. Correctness, because rolling your own "is this different" logic in a Code node is how you get duplicate alerts at 3am. Complexity, because DiffHook's monitor is a single POST while the n8n equivalent is 4+ nodes plus a datastore.
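Here is the compute point in numbers, assuming one fetch per URL per tick:

```latex
\text{fetches per day} = 12 \times \frac{24 \times 60}{5} = 12 \times 288 = 3456
```

With DiffHook, the n8n workflow runs only when one of those checks finds a real change.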
Can DiffHook scrape JavaScript-rendered pages?
Yes. Set type to html_rendered and pick the engine (Playwright or Puppeteer). DiffHook waits for a selector or a network-idle signal before snapshotting, so SPAs and client-rendered React apps work out of the box. See the dedicated n8n Playwright and n8n Puppeteer pages for engine-specific examples.
How do I get structured data, not just raw HTML?
Two ways. Use a tight CSS selector to isolate one element (DiffHook returns the extracted text directly in extracted_text), or set include_html: true and parse the HTML fragment inside n8n with the HTML Extract node. Both paths are in the importable template.
Does the scraper respect robots.txt and rate limits?
Yes. Every monitor has a configurable interval (down to 60 seconds on paid plans), and DiffHook adds jitter and exponential backoff on 429 and 503 responses. If a target's robots.txt disallows the page, the monitor is flagged as blocked in the dashboard and stops fetching.
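DiffHook's backoff parameters aren't specified here, so the following is only an illustration of the general technique, not its implementation. It is a "full jitter" exponential backoff in Python. The base and cap values and the helper names are assumptions.

```python
import random
from typing import Optional

# Statuses the FAQ says trigger backoff.
RETRYABLE_STATUSES = {429, 503}

def should_retry(status: int) -> bool:
    """True for responses that call for backing off and retrying later."""
    return status in RETRYABLE_STATUSES

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff, in seconds.

    The ceiling doubles with each attempt (base * 2**attempt) up to cap, and the
    actual delay is drawn uniformly from [0, ceiling] so retries don't line up.
    """
    if rng is None:
        rng = random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

The jitter spreads retries out across the backoff window. Without it, many blocked checks would retry in lockstep and hit the target at the same moment.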
What if the site adds a CAPTCHA or blocks the scraper?
DiffHook surfaces the last fetch status on every monitor. When a site starts blocking, you see it as a red status and the last-good cache is kept so n8n doesn't receive a spurious diff. You can then rotate the user agent, switch to the rendered engine, or add request headers per monitor without touching n8n.


Stop scraping on a cron. Scrape on change.

Free tier, 60-second checks, HMAC-signed payloads, Playwright and Puppeteer engines included. No credit card, no commitment.