This Automation Turns Any URL into AI-Ready Markdown (In Seconds)
Markdown: The Secret Language of LLMs
If you’ve ever tried to feed a website into ChatGPT or Gemini, you know the struggle: messy HTML, broken formatting, and missing links. The leftover markup wastes tokens and buries the content your AI actually needs.
That’s where this workflow comes in. With one API call per page, you can turn a web page into clean, structured Markdown and pull out every link for follow-up crawling.
The Use Case
Need to process articles or blog posts for LLM analysis? ✅
Want your AI to read content like a human, not a browser? ✅
Need both text and outbound links neatly extracted? ✅
Don’t want to hit API rate limits while crawling? ✅
How the Workflow Works
We used Firecrawl.dev (a dead-simple but powerful API for web extraction) and wired it into n8n:
1️⃣ Convert HTML → Markdown
The Firecrawl API strips out the markup, leaving only structured, AI-friendly text.
2️⃣ Extract All Links
Every internal and external link on the page gets captured for further crawling.
3️⃣ Respect Rate Limits
No more API bans: requests are throttled automatically so the workflow stays under the rate limit.
4️⃣ Batch Processing
Pulls multiple URLs from your database (or a simple input array).
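If you want to see the logic outside n8n, the four steps above can be sketched roughly like this in Python. It is an illustration, not the workflow itself: `split_links`, `crawl_batch`, and the `scrape` callable are hypothetical names, and the fixed `min_interval` delay is an assumption, since the throttling in the actual workflow may work differently.

```python
import time
from urllib.parse import urljoin, urlparse


def split_links(page_url, links):
    """Sort links into internal and external, relative to page_url's host."""
    host = urlparse(page_url).netloc.lower()
    result = {"internal": [], "external": []}
    for link in links:
        absolute = urljoin(page_url, link)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        key = "internal" if parsed.netloc.lower() == host else "external"
        if absolute not in result[key]:
            result[key].append(absolute)
    return result


def crawl_batch(urls, scrape, min_interval=1.0,
                sleep=time.sleep, clock=time.monotonic):
    """Scrape each URL in turn, leaving at least min_interval seconds
    between consecutive calls (a simple fixed-delay throttle).

    `scrape` is any callable that takes a URL and returns a dict with
    "markdown" and "links" keys (hypothetical interface).
    """
    pages = []
    last_call = None
    for url in urls:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        data = scrape(url)
        pages.append({
            "url": url,
            "markdown": data.get("markdown", ""),
            "links": split_links(url, data.get("links", [])),
        })
    return pages
```

The `urls` argument can be a hard-coded test list or rows pulled from a database. Passing `sleep` and `clock` as parameters keeps the throttle easy to test without real waiting.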
Setup in 3 Steps
1. Get your Firecrawl API key (free to start).
2. Plug it into the HTTP Request node in n8n (Authorization header).
3. Connect your URL database or edit the test array in the workflow.
Done. Each page now comes out as clean Markdown plus an array of its links.
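For reference, here is roughly what the HTTP Request node is doing, written as a standalone Python sketch. The endpoint URL, the `formats` field, and the `success`/`data` response shape are assumptions based on Firecrawl's v1 scrape API, so check them against the current Firecrawl docs before relying on them. The function names are made up for illustration.

```python
import json
import urllib.request

# Assumption: Firecrawl's v1 scrape endpoint. Verify against the current docs.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(page_url, api_key):
    """Build the POST request, with the API key in the Authorization header."""
    payload = {"url": page_url, "formats": ["markdown", "links"]}
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_scrape_response(body):
    """Pull the Markdown text and the link array out of the JSON response.

    Assumes a body shaped like {"success": true, "data": {...}}.
    """
    doc = json.loads(body)
    if not doc.get("success", False):
        raise RuntimeError(f"scrape failed: {doc.get('error', 'unknown error')}")
    data = doc.get("data", {})
    return {"markdown": data.get("markdown", ""), "links": data.get("links", [])}


def scrape(page_url, api_key, timeout=30.0):
    """Send one scrape request and return {"markdown": ..., "links": ...}."""
    request = build_scrape_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_scrape_response(response.read())
```

Calling `scrape("https://example.com", api_key)` with your own key should return a dict with a `markdown` string and a `links` list, provided the API behaves as assumed above.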
Why this is game-changing:
LLMs can read raw HTML, but the markup is noisy and eats tokens. Markdown keeps the structure without the clutter. With this, you’re turning every webpage into a dataset your AI can actually use.
📩 In the paid version of the newsletter, I’ll share the ready-to-import n8n workflow JSON.