n8n web scraping — diff-driven, webhook-first
Point DiffHook at the pages you want to scrape, describe what to extract with a CSS selector, and n8n receives the HTML diff every time the matched content changes: already cached, already deduped, already signed.
The usual n8n scraper is a Schedule trigger + HTTP Request + HTML Extract + some Code-node diffing against a datastore. It works, but it scrapes on every tick even when nothing has changed, and the "is this different from last time" logic ends up reinventing a tiny database inside n8n. DiffHook moves the fetch, the HTML parsing, and the diff out of n8n so the workflow runs once per real change — nothing more.
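To make the comparison concrete, here is a minimal sketch of the hand-rolled "is this different from last time" check that tends to end up in a Code node. The `seen` dictionary stands in for whatever datastore the workflow uses, and `has_changed` is a hypothetical helper name, not part of n8n or DiffHook.

```python
import hashlib

# Stand-in for the datastore a scheduled workflow has to maintain itself.
seen: dict[str, str] = {}


def has_changed(url: str, extracted_html: str) -> bool:
    """Return True when the extracted block differs from the last stored one."""
    digest = hashlib.sha256(extracted_html.encode("utf-8")).hexdigest()
    if seen.get(url) == digest:
        return False  # same content as the previous tick, nothing to do
    seen[url] = digest  # remember the new state for the next tick
    return True
```

Even this small version still runs on every tick, needs persistent storage, and treats the first fetch as a change. That is the plumbing the monitor is meant to take over.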
The complete n8n + DiffHook hub
See every n8n recipe, template, and pricing tier in one place.
Workflow
Scrape, diff, and deliver to n8n in 5 steps
Five steps, no Code-node diffing, no storage plumbing. The monitor itself is declarative.
Define what to scrape
Pick the target URL and the CSS selector that isolates the block you care about — a product card, a pricing table, a changelog entry. DiffHook renders the page and keeps only the matching HTML.
Choose what counts as a change
Choose a text-only diff to ignore style tweaks, or a full HTML diff to catch every attribute change. Set include_html: true when you want n8n to see the raw markup alongside the extracted text.
Create the monitor
POST once to /v1/monitors with the URL, selector, interval, and a webhook delivery pointing at your n8n workflow. No cron, no storage, no duplicate detection — DiffHook owns all of that.
Receive the diff in n8n
n8n's Webhook trigger fires with a signed JSON body containing previous_html, current_html, and the extracted text. Verify the HMAC in a Crypto node, then move to downstream steps.
Parse, enrich, route
Use n8n's HTML Extract or Code node to pull structured fields out of the diff, enrich with an AI node if needed, and send the result to Slack, Airtable, Notion, or a database.
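The HMAC check from the "Receive the diff in n8n" step can be sketched as follows, to show what the Crypto node comparison computes. This page does not state the signing scheme, so the sketch assumes HMAC-SHA256, hex-encoded, over the unmodified request body. The function name `verify_signature` and the shared `secret` are illustrative, and the header that carries the signature should be taken from DiffHook's documentation.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, received_signature: str, secret: str) -> bool:
    """Check an HMAC signature over the raw webhook body.

    Assumption: HMAC-SHA256, hex-encoded, computed over the exact bytes received.
    Confirm the algorithm and encoding against the provider's documentation.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))
```

The signature has to be computed over the raw bytes as received. Re-serializing the parsed JSON can reorder keys or change whitespace, which would make a valid payload fail verification.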
API example
Scrape and diff in one request
Declarative monitor definition — include_html surfaces the raw markup so n8n can parse it further downstream.
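For readers who would rather script the call than use a REST client, the sketch below builds the request shown in this section using only the Python standard library. The endpoint, headers, and body fields are copied from the example. The helper name `build_monitor_request` is hypothetical, and nothing here is an official client.

```python
import json
import urllib.request


def build_monitor_request(api_key: str, webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/monitors request from the example."""
    payload = {
        "type": "html_css",
        "url": "https://competitor.example.com/products",
        "css_selector": "main .product-card",
        "include_html": True,
        "interval_seconds": 900,
        "deliveries": [{"type": "webhook", "url": webhook_url}],
    }
    return urllib.request.Request(
        "https://api.diffhook.com/v1/monitors",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen`. Keeping the build step separate makes the payload easy to inspect before anything goes over the network.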
POST https://api.diffhook.com/v1/monitors
Authorization: Bearer $DIFFHOOK_API_KEY
Content-Type: application/json

{
  "type": "html_css",
  "url": "https://competitor.example.com/products",
  "css_selector": "main .product-card",
  "include_html": true,
  "interval_seconds": 900,
  "deliveries": [
    {
      "type": "webhook",
      "url": "https://n8n.yourdomain.com/webhook/scrape-products"
    }
  ]
}
Importable workflow
Drop-in n8n scraping workflow
The template parses the diff, extracts product cards with HTML Extract, and ships the structured rows to a Google Sheet. Swap the destination node for your own.
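As a rough illustration of the extraction step outside n8n, the sketch below pulls the visible text of each product card out of the payload's current_html field using only the standard library. The `product-card` class comes from the example selector. The markup inside each card and the names `CardTextParser` and `extract_cards` are assumptions made for the example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class CardTextParser(HTMLParser):
    """Collect the visible text of each element whose class list has `product-card`."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: list[str] = []
        self._depth = 0  # nesting depth inside the current card, 0 means outside
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "product-card" in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # the card's own closing tag
            self.cards.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._chunks.append(data.strip())


def extract_cards(payload: dict) -> list[str]:
    """Return one text string per product card found in payload['current_html']."""
    parser = CardTextParser()
    parser.feed(payload.get("current_html", ""))
    parser.close()
    return parser.cards
```

Each returned string can then be split into columns and mapped to a sheet row. Real pages with unclosed tags would need a more forgiving parser than this depth counter.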
FAQ
n8n web scraping — common questions
Why offload scraping from n8n to DiffHook?
Can DiffHook scrape JavaScript-rendered pages?
How do I get structured data, not just raw HTML?
Does the scraper respect robots.txt and rate limits?
What if the site adds a CAPTCHA or blocks the scraper?
Related workflows
Also great with DiffHook
n8n webhook
Use the same webhook delivery pattern without the HTML scraping side — pure change-to-trigger.
n8n + Playwright
Scrape client-rendered SPAs with DiffHook's Playwright engine and push the post-render HTML into n8n.
n8n + Puppeteer
Same SPA scraping, Chromium-Puppeteer flavour — pick whichever engine fits the target site.
Zapier web scraping
Run the same pattern into Zapier Catch Hooks — no Code by Zapier required.
Make.com web scraping
Scrape and diff into a Make.com webhook module — identical monitor shape, different destination.
Replace a scraping script
Migrating from a Python/Node scraper? See how the managed monitor compares side-by-side.
Stop scraping on a cron. Scrape on change.
Free tier, 60-second checks, HMAC-signed payloads, Playwright and Puppeteer engines included. No credit card, no commitment.