Creating fresh discount content manually is a time sink. You find deals, scrape product details, format everything, and write descriptions—only to repeat the process tomorrow. This n8n workflow eliminates that grind by automating the entire pipeline from URL list to published content. You'll learn how to build a self-sustaining content engine that scrapes discount URLs, stores structured data in Supabase, and leverages AI agents to produce ready-to-publish content. The complete n8n workflow JSON template is available at the bottom of this article.
The Problem: Manual Discount Content Creation Doesn't Scale
E-commerce sites, deal aggregators, and affiliate marketers face the same bottleneck. Finding discount opportunities is easy—turning them into compelling content is not.
Current challenges:
- Manually visiting each discount URL to extract product details, pricing, and descriptions
- Copying data into spreadsheets or databases without standardization
- Writing unique content for each deal while maintaining brand voice
- Repeating this process daily as new deals emerge
Business impact:
- Time spent: 15-20 hours per week on content production for 50-100 deals
- Opportunity cost: Missing time-sensitive deals because manual processing is too slow
- Consistency issues: Content quality varies based on writer availability and fatigue
The real problem isn't finding deals. It's transforming raw URLs into structured, publishable content fast enough to capitalize on limited-time offers.
The Solution Overview
This n8n workflow creates a three-stage automation pipeline. First, it ingests a list of discount URLs. Second, it uses self-hosted Firecrawl to scrape product data from those URLs and stores everything in Supabase. Third, it retrieves stored data and feeds it to AI agents that generate optimized discount content.
The workflow leverages SearXNG for additional research, Supabase for persistent storage, and AI models to produce content that matches your brand voice. You maintain full control over data quality through validation nodes and can customize content templates for different product categories.
What You'll Build
This automation handles the complete discount content lifecycle with zero manual intervention after initial setup.
| Component | Technology | Purpose |
|---|---|---|
| URL Ingestion | Manual Trigger/Webhook | Accept lists of discount URLs |
| Web Scraping | Self-hosted Firecrawl | Extract product data, pricing, images |
| Data Storage | Supabase (PostgreSQL) | Store structured discount data |
| Search Enhancement | SearXNG | Gather additional product context |
| Content Generation | AI Agents (OpenAI/Claude) | Produce unique discount descriptions |
| Output Delivery | Database/API | Store finished content for publishing |
Key capabilities:
- Batch process 50-100 URLs per execution
- Extract product titles, prices, discount percentages, descriptions, and images
- Store normalized data with timestamps and metadata
- Query stored data to avoid duplicate processing
- Generate SEO-optimized content with AI agents
- Handle errors gracefully with retry logic
Prerequisites
Before starting, ensure you have:
- n8n instance (self-hosted recommended for Firecrawl integration)
- Supabase account with PostgreSQL database configured
- Self-hosted Firecrawl instance with API access
- SearXNG instance (self-hosted or public endpoint)
- OpenAI or Anthropic API key for content generation
- Basic SQL knowledge for database schema design
- JavaScript familiarity for Function nodes
Step 1: Set Up Supabase Database Schema
Your database structure determines how efficiently you can query and reference discount data later.
Create the discounts table:
-- uuid_generate_v4() requires the uuid-ossp extension (enable it under Database → Extensions in Supabase)
CREATE TABLE discounts (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
url TEXT UNIQUE NOT NULL,
product_title TEXT,
original_price DECIMAL(10,2),
discount_price DECIMAL(10,2),
discount_percentage INTEGER,
description TEXT,
image_url TEXT,
scraped_at TIMESTAMP DEFAULT NOW(),
content_generated BOOLEAN DEFAULT FALSE,
generated_content TEXT,
metadata JSONB
);
CREATE INDEX idx_url ON discounts(url);
CREATE INDEX idx_content_generated ON discounts(content_generated);
Why this schema works:
- `url` with a UNIQUE constraint prevents duplicate scraping
- `content_generated` flag tracks which records need AI processing
- `metadata` JSONB field stores flexible additional data (category, merchant, expiration)
- Indexes on `url` and `content_generated` speed up lookups and batch queries
Configure Supabase credentials in n8n:
- Add Supabase credential in n8n
- Enter your project URL: `https://[project-id].supabase.co`
- Add the service role API key (not the anon key—you need write access)
- Test connection with a simple SELECT query
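If you prefer to sanity-check the key before wiring it into n8n, a short Node 18+ script (save as `check.mjs`, run with `node check.mjs`) can hit Supabase's REST endpoint directly. This is a minimal sketch; the `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` environment variable names are placeholders, not part of the original workflow.

```javascript
// Verify the service role key can read the discounts table via Supabase's REST API.
const url = `${process.env.SUPABASE_URL}/rest/v1/discounts?select=id&limit=1`;

const res = await fetch(url, {
  headers: {
    apikey: process.env.SUPABASE_SERVICE_KEY,
    Authorization: `Bearer ${process.env.SUPABASE_SERVICE_KEY}`,
  },
});

// Expect HTTP 200 and a (possibly empty) JSON array.
console.log(res.status, await res.json());
```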
Step 2: Configure Firecrawl for Web Scraping
Firecrawl handles JavaScript-heavy sites better than basic HTTP requests. Self-hosting gives you control over rate limits and costs.
Set up Firecrawl HTTP Request node:
{
"method": "POST",
"url": "http://your-firecrawl-instance:3000/scrape",
"headers": {
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_FIRECRAWL_API_KEY"
},
"body": {
"url": "{{$json.discount_url}}",
"formats": ["markdown", "html"],
"onlyMainContent": true,
"waitFor": 2000
}
}
Critical settings:
- `waitFor: 2000` ensures JavaScript-rendered prices load
- `onlyMainContent: true` filters out navigation and footer noise
- Request both markdown and HTML formats for flexible parsing
Common scraping issues:
- Rate limiting → Add 2-3 second delays between requests with Wait node
- Dynamic pricing → Increase `waitFor` to 3000-5000ms for slow sites
- Anti-bot measures → Rotate user agents in HTTP Request headers (see the sketch below)
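One way to implement that rotation is a small Code node (run once per item) placed before the scraping request that attaches a randomly chosen user agent to each item. The agent list and the `user_agent` field name below are illustrative; whether the header reaches the target site depends on whether your Firecrawl version forwards custom request headers.

```javascript
// Pick a user agent at random and pass it along with the item so the downstream
// HTTP Request node can reference it via an expression, e.g. {{$json.user_agent}}.
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36',
];

const pick = userAgents[Math.floor(Math.random() * userAgents.length)];

return {
  json: {
    ...$input.item.json,
    user_agent: pick,
  },
};
```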
Step 3: Extract and Normalize Product Data
Raw HTML from Firecrawl needs transformation into structured database records.
Add Function node for data extraction:
// Extract product data from Firecrawl response
const html = $input.item.json.html;
const markdown = $input.item.json.markdown;
// Price extraction with regex
const priceRegex = /\$(\d+\.?\d*)/g;
const prices = markdown.match(priceRegex);
// Calculate discount
const originalPrice = prices && prices[0] ? parseFloat(prices[0].replace('$', '')) : null;
const discountPrice = prices && prices[1] ? parseFloat(prices[1].replace('$', '')) : null;
const discountPercentage = originalPrice && discountPrice
? Math.round(((originalPrice - discountPrice) / originalPrice) * 100)
: null;
// Extract title (usually first H1 in markdown)
const titleMatch = markdown.match(/^#\s+(.+)$/m);
const productTitle = titleMatch ? titleMatch[1] : null;
// Image extraction
const imageMatch = html.match(/<img[^>]+src="([^">]+)"/);
const imageUrl = imageMatch ? imageMatch[1] : null;
return {
json: {
url: $input.item.json.original_url,
product_title: productTitle,
original_price: originalPrice,
discount_price: discountPrice,
discount_percentage: discountPercentage,
description: markdown.substring(0, 500), // First 500 chars
image_url: imageUrl,
metadata: {
scraped_html_length: html.length,
markdown_length: markdown.length
}
}
};
Why this approach:
Think of this node as a data translator. Firecrawl gives you a messy pile of HTML and markdown—like dumping a toolbox on the floor. This function sorts everything into labeled drawers. It hunts for dollar signs to find prices, grabs the biggest heading for the product name, and calculates discount percentages automatically. The regex patterns act like magnets, pulling specific data types from the chaos.
Variables to customize:
- `priceRegex`: Adjust for international currencies (€, £, ¥)
- `description` length: Change 500 to match your content requirements
- Add category detection based on URL patterns or keywords (see the sketch below)
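A hedged sketch of both customizations follows: a broader currency regex and simple keyword-based category detection. The keyword lists and the `category` field are placeholders you would tune to your catalog; they are not part of the original workflow.

```javascript
// Broader price regex: matches $, €, £, ¥ and thousands separators (e.g. $1,299.99)
const priceRegex = /[$€£¥]\s?(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)/g;

// Naive category detection from URL and title keywords (placeholder lists)
const categoryKeywords = {
  electronics: ['laptop', 'tv', 'headphone', 'phone'],
  home: ['vacuum', 'blender', 'mattress'],
  fashion: ['shoe', 'jacket', 'dress'],
};

function detectCategory(url, title) {
  const haystack = `${url} ${title || ''}`.toLowerCase();
  for (const [category, words] of Object.entries(categoryKeywords)) {
    if (words.some((w) => haystack.includes(w))) return category;
  }
  return 'general';
}

// Example: merge the category into the metadata object built in the extraction node
// metadata.category = detectCategory($input.item.json.original_url, productTitle);
```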
Step 4: Store Data in Supabase with Upsert Logic
Prevent duplicate entries while updating existing records when URLs are re-scraped.
Configure Supabase node (Upsert operation):
{
"operation": "upsert",
"table": "discounts",
"conflictColumns": ["url"],
"updateColumns": [
"product_title",
"original_price",
"discount_price",
"discount_percentage",
"description",
"image_url",
"scraped_at",
"metadata"
]
}
Why upsert matters:
If you scrape the same URL twice, you want to update the price (it might have changed), not create a duplicate row. The conflictColumns: ["url"] tells Supabase "if this URL exists, update it; if not, insert it." This keeps your database clean and ensures you always have the latest discount data.
Error handling:
Add an Error Trigger node after Supabase to catch failed inserts. Common failures include null values in NOT NULL columns or data type mismatches. Log errors to a separate table for debugging.
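A minimal sketch of that logging step, assuming you have created a separate `scrape_errors` table (not part of the schema above) with `url`, `error_message`, and `failed_at` columns. The Function node shapes the failed item for a follow-up Supabase Insert node:

```javascript
// Shape a failed item into a row for a hypothetical scrape_errors table.
// The error branch provides the failing item and its error details.
const failed = $input.item.json;

return {
  json: {
    url: failed.url || failed.discount_url || 'unknown',
    error_message: failed.error?.message || JSON.stringify(failed.error || failed),
    failed_at: new Date().toISOString(),
  },
};
```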
Step 5: Enhance Data with SearXNG Research
AI agents produce better content when they have context beyond the product page.
Configure SearXNG HTTP Request:
{
"method": "GET",
"url": "http://your-searxng-instance:8080/search",
"qs": {
"q": "{{$json.product_title}} reviews",
"format": "json",
"engines": "google,duckduckgo",
"safesearch": 1
}
}
Extract relevant snippets:
// Function node to process SearXNG results
const results = $input.item.json.results;
const topSnippets = results
.slice(0, 5)
.map(r => r.content)
  .join('\n'); // join snippets with newlines (the original literal was broken across lines)
return {
json: {
product_title: $input.item.json.product_title,
research_context: topSnippets
}
};
When to use SearXNG:
- Product categories where reviews matter (electronics, appliances)
- Comparing similar products to highlight unique value
- Finding trending keywords for SEO optimization
Skip SearXNG for time-sensitive flash deals where speed matters more than depth.
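If you want the workflow itself to make that call, one option is a tiny Code node before the SearXNG request that flags flash deals, which an IF node can then route around. The `is_flash_deal` flag, the expiry field, and the 50% threshold below are assumptions for illustration, not part of the original workflow.

```javascript
// Flag items that should skip SearXNG research (e.g. deals expiring within 24 hours
// or very deep discounts where speed matters more than extra context).
const item = $input.item.json;
const expiresAt = item.metadata?.expires_at ? new Date(item.metadata.expires_at) : null;
const hoursLeft = expiresAt ? (expiresAt - Date.now()) / 36e5 : null; // 36e5 ms per hour

return {
  json: {
    ...item,
    is_flash_deal: (hoursLeft !== null && hoursLeft < 24) || item.discount_percentage >= 50,
  },
};
```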
Step 6: Generate Content with AI Agents
This is where stored data transforms into publishable content.
Query Supabase for unprocessed discounts:
SELECT * FROM discounts
WHERE content_generated = FALSE
LIMIT 10;
Configure OpenAI/Claude node:
{
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "You are a discount content writer. Create compelling, SEO-optimized product descriptions that highlight savings and value. Use an enthusiastic but trustworthy tone. Include the discount percentage prominently."
},
{
"role": "user",
"content": `Product: {{$json.product_title}}
Original Price: ${{$json.original_price}}
Discount Price: ${{$json.discount_price}}
Discount: {{$json.discount_percentage}}% off
Description: {{$json.description}}
Research Context: {{$json.research_context}}
Write a 150-word discount content piece that emphasizes the value and urgency.`
}
],
"temperature": 0.7,
"max_tokens": 300
}
Update Supabase with generated content:
UPDATE discounts
SET generated_content = '{{$json.choices[0].message.content}}',
content_generated = TRUE
WHERE id = '{{$json.id}}';
Content quality controls:
- Set `temperature: 0.7` for creative but consistent output
- Use `max_tokens: 300` to control length (150 words ≈ 200-250 tokens)
- Add a Function node to validate output length before storing (see the sketch below)
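A minimal sketch of that validation node, assuming the discount's `id` is carried through alongside the AI response. It also doubles single quotes so apostrophes in the generated text do not break the raw SQL UPDATE above; if you use the Supabase node's Update operation with field mapping instead, you can drop that step. The 100-200 word bounds are illustrative.

```javascript
// Validate AI output length before storing; escape quotes for the raw SQL update.
const content = $input.item.json.choices[0].message.content.trim();
const wordCount = content.split(/\s+/).length;

if (wordCount < 100 || wordCount > 200) {
  // Fail the item so it can be routed to a retry/regeneration branch.
  throw new Error(`Generated content is ${wordCount} words, expected 100-200`);
}

return {
  json: {
    id: $input.item.json.id,
    generated_content: content.replace(/'/g, "''"), // double single quotes for SQL
    word_count: wordCount,
  },
};
```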
Workflow Architecture Overview
This workflow consists of 12-15 nodes organized into 3 main sections:
- Data ingestion and scraping (Nodes 1-5): Manual trigger accepts URL list, Firecrawl scrapes each URL, Function node extracts structured data
- Storage and enhancement (Nodes 6-9): Supabase upsert stores data, SearXNG adds research context, data merges for AI processing
- Content generation (Nodes 10-15): Query unprocessed records, AI generates content, update database with results
Execution flow:
- Trigger: Manual execution with URL array or scheduled daily run
- Average run time: 45-90 seconds for 10 URLs (depends on Firecrawl response time)
- Key dependencies: Firecrawl must be running, Supabase credentials valid, AI API key active
Critical nodes:
- HTTP Request (Firecrawl): Handles all web scraping with retry logic
- Function (Data Extraction): Normalizes scraped HTML into database schema
- Supabase (Upsert): Prevents duplicates while updating changed data
- OpenAI/Claude: Generates final content from stored data
The complete n8n workflow JSON template is available at the bottom of this article.
Critical Configuration Settings
Firecrawl Integration
Required fields:
- API Endpoint: `http://your-firecrawl-instance:3000/scrape`
- Authorization: Bearer token from Firecrawl setup
- Timeout: 30 seconds (increase to 60 for slow-loading sites)
Common issues:
- Using public Firecrawl endpoints → Rate limits hit quickly with batch processing
- Not setting `waitFor` → JavaScript-rendered prices missing from scraped data
- Always use self-hosted Firecrawl for production workflows with >100 URLs/day
Supabase Connection
Variables to customize:
- `batch_size`: Process 10-50 URLs per execution (balance speed vs. API limits)
- `content_generated` flag: Add additional statuses like "pending_review" or "published"
- Database indexes: Add indexes on `discount_percentage` or `scraped_at` for complex queries
Testing & Validation
Test each component independently:
- Firecrawl scraping: Run with 3-5 test URLs, verify HTML and markdown outputs contain prices
- Data extraction: Check Function node output—all fields should have values (null is acceptable for missing data)
- Supabase storage: Query database directly to confirm records inserted with correct data types
- AI generation: Review 5-10 generated content pieces for tone, accuracy, and length
Common troubleshooting:
| Issue | Cause | Fix |
|---|---|---|
| Prices not extracted | Regex doesn't match site's format | Update priceRegex to match currency symbols and decimal formats |
| Duplicate database entries | Upsert not configured | Verify conflictColumns: ["url"] in Supabase node |
| AI content too short/long | Token limits misconfigured | Adjust max_tokens and validate with word count Function node |
| Workflow times out | Too many URLs processed at once | Reduce batch size or add Split In Batches node |
Deployment Considerations
Production Deployment Checklist
| Area | Requirement | Why It Matters |
|---|---|---|
| Error Handling | Retry logic with 3 attempts, 5-second delays | Firecrawl occasionally times out—retries prevent data loss |
| Monitoring | Webhook to Slack/Discord on workflow failure | Detect scraping failures within minutes vs. discovering stale data days later |
| Rate Limiting | 2-second delay between Firecrawl requests | Prevents IP bans and respects server resources |
| Data Validation | Function node checks for null prices before storage | Catches scraping failures early, prevents bad data in database |
| Backup Strategy | Daily Supabase backups via pg_dump | Protects against accidental data deletion or corruption |
Scaling considerations:
For 500+ URLs per day, implement these optimizations:
- Split workflow into separate scraping and content generation workflows
- Use Supabase's batch insert API (insert 100 records at once)
- Add Redis caching layer for frequently accessed product data
- Deploy multiple Firecrawl instances behind a load balancer
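For the batch-insert optimization mentioned above, one approach is a Code node (run once for all items) that groups records into chunks of 100 so a single HTTP Request node can POST each chunk to Supabase's REST endpoint (`/rest/v1/discounts`) as a JSON array. To keep upsert semantics on the bulk call, the request can add `?on_conflict=url` and the PostgREST header `Prefer: resolution=merge-duplicates`. A minimal sketch:

```javascript
// Group all incoming items into chunks of 100 rows for bulk insertion.
// Run this Code node in "Run Once for All Items" mode.
const rows = $input.all().map((item) => item.json);
const chunkSize = 100;
const chunks = [];

for (let i = 0; i < rows.length; i += chunkSize) {
  // Each output item carries one chunk; the next HTTP Request node posts {{$json.rows}}.
  chunks.push({ json: { rows: rows.slice(i, i + chunkSize) } });
}

return chunks;
```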
Real-World Use Cases
Use Case 1: Affiliate Deal Site
- Industry: E-commerce affiliate marketing
- Scale: 200 new deals per day across 10 product categories
- Modifications needed: Add category classification Function node, create separate AI prompts per category, integrate with WordPress API for auto-publishing
Use Case 2: Price Comparison Platform
- Industry: Consumer electronics
- Scale: 50 products tracked continuously for price changes
- Modifications needed: Schedule workflow to run every 6 hours, add price change detection logic, send alerts when discounts exceed 20%
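A sketch of the price-change detection mentioned above, assuming the previously stored price is fetched from Supabase before the new scrape (the `previous_price` field name is a placeholder):

```javascript
// Compare the freshly scraped price against the stored one and flag meaningful drops.
const current = $input.item.json.discount_price;
const previous = $input.item.json.previous_price; // looked up from Supabase beforehand

const dropPercent = previous && current
  ? Math.round(((previous - current) / previous) * 100)
  : 0;

return {
  json: {
    ...$input.item.json,
    price_dropped: dropPercent > 0,
    alert_worthy: dropPercent >= 20, // matches the 20% alert threshold above
  },
};
```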
Use Case 3: Email Newsletter Automation
- Industry: Daily deals newsletter
- Scale: 30-50 curated deals sent to 10,000 subscribers
- Modifications needed: Add filtering logic for minimum discount percentage (>15%), integrate with Mailchimp API, generate HTML email templates with AI
Customizing This Workflow
Alternative Integrations
Instead of Firecrawl:
- Apify: Best for sites with complex anti-bot measures—requires swapping HTTP Request node with Apify API calls
- Puppeteer (via n8n Execute Command): Better control over browser automation—add 8-10 nodes for full implementation
- Browserless: Use when you need headless Chrome at scale—similar API to Firecrawl but different response format
Workflow Extensions
Add automated content publishing:
- Connect to WordPress REST API with HTTP Request node
- Map generated content to post title, body, featured image
- Set post status to "draft" for manual review or "publish" for full automation
- Nodes needed: +3 (HTTP Request for auth, HTTP Request for post creation, Set node for field mapping)
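A sketch of the field-mapping step that prepares the payload the WordPress REST API expects at `/wp-json/wp/v2/posts`. Authentication lives in the HTTP Request node's credentials; the category ID below is a placeholder.

```javascript
// Map a finished discount record onto the WordPress REST API post payload.
const deal = $input.item.json;

return {
  json: {
    title: `${deal.product_title} - ${deal.discount_percentage}% off`,
    content: deal.generated_content,
    status: 'draft',   // switch to 'publish' for full automation
    categories: [12],  // placeholder WordPress category ID
  },
};
```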
Implement content quality scoring:
- Add Function node after AI generation to analyze readability, keyword density, sentiment
- Use TextRazor or similar NLP API for advanced scoring
- Store quality scores in Supabase for performance tracking
- Reject low-scoring content and regenerate with adjusted prompts
- Nodes needed: +5 (HTTP Request to NLP API, Function for scoring logic, IF node for quality gate, Loop back to AI node)
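As a simplified stand-in for an external NLP API, the quality gate can start with local heuristics such as word count, discount mention, and sales keywords. The thresholds and keyword list below are illustrative:

```javascript
// Rough local quality score; an IF node can route low scores back for regeneration.
const content = ($input.item.json.generated_content || '').toLowerCase();
const words = content.split(/\s+/).filter(Boolean);

let score = 0;
if (words.length >= 100 && words.length <= 200) score += 40; // length in range
if (/\d+%/.test(content)) score += 30;                        // discount mentioned
if (/(deal|save|offer|discount)/.test(content)) score += 30;  // sales keywords present

return {
  json: {
    ...$input.item.json,
    quality_score: score,
    passes_quality_gate: score >= 70,
  },
};
```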
Integration possibilities:
| Add This | To Get This | Complexity |
|---|---|---|
| Airtable sync | Visual content calendar with approval workflow | Easy (3 nodes) |
| Google Sheets export | Daily deal reports for non-technical team members | Easy (2 nodes) |
| Telegram bot | Real-time deal alerts to mobile devices | Medium (6 nodes) |
| Shopify integration | Auto-create discount products in your store | Medium (8 nodes) |
| Instagram API | Auto-post deals as Instagram stories | Hard (12+ nodes, requires Meta approval) |
Scale to handle more data:
- Replace manual trigger with Webhook node for continuous URL ingestion
- Implement queue system using Redis or RabbitMQ for URL processing
- Add Split In Batches node to process 1000 URLs in chunks of 50
- Performance improvement: 20x faster for >1000 URLs with parallel processing
Get Started Today
Ready to automate your discount content production?
- Download the template: Scroll to the bottom of this article to copy the n8n workflow JSON
- Import to n8n: Go to Workflows → Import from File, paste the JSON
- Configure your services: Add credentials for Supabase, Firecrawl, SearXNG, and OpenAI
- Set up database: Run the SQL schema creation script in your Supabase dashboard
- Test with sample data: Start with 5-10 URLs to verify scraping and content generation
- Deploy to production: Schedule the workflow or set up webhook triggers for continuous processing
Need help customizing this workflow for your specific discount content needs? Schedule an intro call with Atherial.
