Research takes time. You find sources, read through content, synthesize information, and write coherent reports. This AI research agent automates that entire process. Give it a URL and research question, and it extracts the content, conducts AI-powered research, and generates a complete report. You'll learn how to build this agent using n8n, Perplexity AI, and OpenAI—with a working JSON template at the end.
The Problem: Manual Research Is a Time Sink
Research workflows consume hours of productive time across industries. Content teams spend 3-5 hours per article researching competitors and industry trends. Analysts manually compile market intelligence from dozens of sources. Consultants synthesize client data into actionable reports.
Current challenges:
- Manual content extraction from multiple URLs wastes 30-45 minutes per source
- Context switching between research and writing breaks focus and reduces quality
- Inconsistent report formatting requires additional editing time
- Scaling research operations requires hiring more analysts
Business impact:
- Time spent: 4-8 hours per comprehensive research report
- Cost: $200-400 per report at standard consulting rates
- Opportunity cost: Research time that could be spent on strategic analysis
The Solution Overview
This n8n workflow creates an AI research agent that operates in three phases: content extraction, AI research, and report generation. You provide a URL and research question through a webhook. The agent extracts content using Jina AI's Reader API, sends the extracted content to Perplexity AI for deep research, and uses OpenAI to generate a structured report. The entire process runs automatically, delivering comprehensive research reports in minutes instead of hours.
What You'll Build
This research agent handles the complete research-to-report pipeline with minimal configuration.
| Component | Technology | Purpose |
|---|---|---|
| Input Interface | Webhook (POST) | Accepts URL and research question |
| Content Extraction | Jina AI Reader API | Converts web pages to clean markdown |
| Research Engine | Perplexity AI (sonar-pro) | Conducts AI-powered research with citations |
| Report Generation | OpenAI (gpt-4o-mini) | Synthesizes findings into structured reports |
| Output Delivery | HTTP Response | Returns markdown report via webhook |
Key capabilities:
- Extracts content from any public URL automatically
- Conducts multi-source research with Perplexity's web search
- Generates reports with proper citations and structure
- Handles errors gracefully with detailed error messages
- Returns results in clean markdown format
Prerequisites
Before starting, ensure you have:
- n8n instance (cloud or self-hosted version 1.0+)
- Jina AI account with API key (free tier available at jina.ai)
- Perplexity AI API access with credits (api.perplexity.ai)
- OpenAI API key with GPT-4 access (platform.openai.com)
- Basic understanding of webhook testing (Postman or curl)
- JSON editing capability for API credential configuration
Step 1: Set Up the Webhook Trigger
The workflow starts with a webhook that accepts POST requests containing your research parameters. This creates a programmable endpoint you can call from any application.
Configure the Webhook node:
- Add a Webhook node as your trigger
- Set HTTP Method to POST
- Set Path to research-agent (or your preferred endpoint name)
- Set Respond to "Using 'Respond to Webhook' Node" so the reply is sent only after processing completes
- Set Authentication to None (add authentication in production)
Expected input format:
{
"url": "https://example.com/article",
"research_question": "What are the key trends in AI automation?"
}
Why this works:
The webhook creates a REST API endpoint that accepts structured data. Using POST instead of GET lets you send long, complex research questions without URL-encoding issues. Deferring the response until the Respond to Webhook node runs ensures the workflow finishes all processing before returning results, so callers receive the complete report rather than an empty acknowledgment.
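Before spending API credits, it helps to validate the incoming payload. Below is a minimal Python sketch of that check; the helper name `validate_research_request` is illustrative, not part of n8n (in the workflow itself this logic would live in a Code node):

```python
# Illustrative pre-flight validation for the webhook payload.
# Field names ("url", "research_question") match the expected input format above.
from urllib.parse import urlparse

def validate_research_request(body: dict) -> list:
    """Return a list of validation errors; an empty list means the request is usable."""
    errors = []
    parsed = urlparse(body.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("url must be a valid http(s) URL")
    if len(body.get("research_question", "").strip()) < 10:
        errors.append("research_question should be a specific question, not a bare keyword")
    return errors

print(validate_research_request({
    "url": "https://example.com/article",
    "research_question": "What are the key trends in AI automation?",
}))  # → []
```

Rejecting malformed requests up front keeps the three downstream API calls from running against garbage input.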
Step 2: Extract Content with Jina AI
The Jina AI Reader API converts any web page into clean, LLM-ready markdown. This eliminates HTML parsing complexity and provides consistent content formatting.
Configure the HTTP Request node:
- Add an HTTP Request node after the webhook
- Set Method to GET
- Set URL to https://r.jina.ai/{{ $json.body.url }}
- Add header: Authorization: Bearer YOUR_JINA_API_KEY
- Set Response Format to String (not JSON)
Node configuration:
{
"method": "GET",
"url": "=https://r.jina.ai/{{ $json.body.url }}",
"authentication": "genericCredentialType",
"genericAuthType": "httpHeaderAuth",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{
"name": "Authorization",
"value": "Bearer YOUR_JINA_API_KEY"
}
]
},
"options": {
"response": {
"response": {
"responseFormat": "string"
}
}
}
}
Why this approach:
Jina's Reader API handles JavaScript rendering, removes navigation elements, and extracts main content automatically. The markdown output includes proper heading hierarchy and preserves important formatting—perfect for LLM consumption. Using string response format prevents n8n from trying to parse the markdown as JSON.
Common issues:
- 403 errors indicate missing or invalid API key
- Empty responses mean the URL is behind authentication
- Timeout errors suggest the page takes >30 seconds to load (increase timeout in node options)
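The Reader call itself is just a URL prefix plus an auth header. A short sketch (the helper name is illustrative) shows exactly what the HTTP Request node sends:

```python
# How the Jina Reader request is composed: the target URL is appended
# verbatim after https://r.jina.ai/, and the API key goes in a Bearer header.
def build_jina_request(target_url: str, api_key: str):
    """Return (request_url, headers) matching the node configuration above."""
    request_url = f"https://r.jina.ai/{target_url}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return request_url, headers

url, headers = build_jina_request("https://example.com/article", "YOUR_JINA_API_KEY")
print(url)  # → https://r.jina.ai/https://example.com/article
```

Note the full target URL, scheme included, rides after the prefix; shortening it to the bare domain is what produces the 404s mentioned later in the configuration section.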
Step 3: Conduct AI Research with Perplexity
Perplexity AI searches the web and synthesizes information with citations. This node sends the extracted content and research question to Perplexity's sonar-pro model.
Configure the Perplexity HTTP Request node:
- Add an HTTP Request node for Perplexity
- Set Method to POST
- Set URL to https://api.perplexity.ai/chat/completions
- Add header: Authorization: Bearer YOUR_PERPLEXITY_API_KEY
- Add header: Content-Type: application/json
- Set Body to JSON with this structure:
Request body:
{
"model": "sonar-pro",
"messages": [
{
"role": "system",
"content": "You are a research assistant. Analyze the provided content and research question, then conduct additional research to provide comprehensive insights with citations."
},
{
"role": "user",
"content": "Content: {{ $('HTTP Request').item.json.data }}
Research Question: {{ $('Webhook').item.json.body.research_question }}
Conduct thorough research and provide detailed findings with sources."
}
],
"temperature": 0.2,
"max_tokens": 4000
}
Why this works:
The sonar-pro model combines Perplexity's web search with advanced reasoning. Setting temperature to 0.2 produces focused, factual research rather than creative speculation. The 4000 token limit ensures comprehensive responses while staying within API constraints. The system message primes the model for research-focused output with proper citations.
Variables to customize:
- temperature: Increase to 0.4-0.6 for more exploratory research
- max_tokens: Reduce to 2000 for faster, more concise research
- model: Switch to sonar (not sonar-pro) for faster, lower-cost research
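If you script against the Perplexity endpoint outside n8n, the same request body can be built with those tuning knobs exposed as parameters. This is a sketch under that assumption; the function name is illustrative:

```python
# Build the Perplexity chat-completions body used in Step 3.
# model, temperature, and max_tokens mirror the customization variables above.
def build_perplexity_body(page_content: str, question: str,
                          model: str = "sonar-pro",
                          temperature: float = 0.2,
                          max_tokens: int = 4000) -> dict:
    system = ("You are a research assistant. Analyze the provided content and "
              "research question, then conduct additional research to provide "
              "comprehensive insights with citations.")
    user = (f"Content: {page_content}\n\n"
            f"Research Question: {question}\n\n"
            "Conduct thorough research and provide detailed findings with sources.")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

Dropping this dict into any HTTP client as the JSON body reproduces what the n8n node sends.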
Step 4: Generate the Final Report with OpenAI
OpenAI's GPT-4o-mini synthesizes Perplexity's research into a structured, readable report. This node formats the findings with proper markdown structure.
Configure the OpenAI HTTP Request node:
- Add an HTTP Request node for OpenAI
- Set Method to POST
- Set URL to https://api.openai.com/v1/chat/completions
- Add header: Authorization: Bearer YOUR_OPENAI_API_KEY
- Add header: Content-Type: application/json
Request body:
{
"model": "gpt-4o-mini",
"messages": [
{
"role": "system",
"content": "You are a professional report writer. Transform research findings into a well-structured report with clear sections, bullet points, and proper citations. Use markdown formatting."
},
{
"role": "user",
"content": "Research findings:
{{ $('HTTP Request1').item.json.choices[0].message.content }}
Create a comprehensive report with:
1. Executive Summary
2. Key Findings
3. Detailed Analysis
4. Conclusions
5. Sources"
}
],
"temperature": 0.3,
"max_tokens": 3000
}
Why this approach:
GPT-4o-mini balances quality and cost—perfect for report formatting tasks. The structured prompt ensures consistent report formatting across all runs. Temperature 0.3 maintains factual accuracy while allowing natural language flow. The explicit section requirements in the prompt guarantee every report follows the same structure.
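The report text lives at choices[0].message.content in the completion response. A defensive extraction sketch (the helper name is illustrative) turns a shape mismatch into a readable error instead of a silently empty report field:

```python
# Safely extract the report text from an OpenAI-style chat completion dict.
def extract_report(response: dict) -> str:
    try:
        return response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(
            f"Unexpected completion shape at 'choices[0].message.content': {exc!r}"
        ) from exc

sample = {"choices": [{"message": {"role": "assistant",
                                   "content": "## Executive Summary\n..."}}]}
print(extract_report(sample))
```

The same path applies when mapping the expression in the n8n node, which is why a changed or truncated API response shows up as an empty report rather than an obvious failure.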
Step 5: Return the Report via Webhook Response
The final node sends the generated report back through the webhook response. This completes the API request-response cycle.
Configure the Respond to Webhook node:
- Add a Respond to Webhook node at the end
- Set Respond With to JSON
- Map the response body:
Response configuration:
{
"status": "success",
"report": "={{ $('HTTP Request2').item.json.choices[0].message.content }}",
"metadata": {
"url_analyzed": "={{ $('Webhook').item.json.body.url }}",
"research_question": "={{ $('Webhook').item.json.body.research_question }}",
"timestamp": "={{ $now.toISO() }}"
}
}
Why this structure:
Wrapping the report in a JSON response with metadata makes the API response self-documenting. The status field enables programmatic success checking. Including the original URL and question in metadata helps with logging and debugging. The ISO timestamp provides audit trail capability.
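In plain Python, the envelope assembled by the Respond to Webhook node looks like this (a sketch; the function name is illustrative):

```python
# Reproduce the Step 5 response envelope: status flag, report body,
# and self-documenting metadata with an ISO-8601 timestamp.
from datetime import datetime, timezone

def build_response(report: str, url: str, question: str) -> dict:
    return {
        "status": "success",
        "report": report,
        "metadata": {
            "url_analyzed": url,
            "research_question": question,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    }
```

Clients can branch on the status field and log the metadata block without parsing the markdown report itself.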
Workflow Architecture Overview
This workflow consists of 5 nodes organized into 3 main processing phases:
- Input handling (Node 1): Webhook receives URL and research question via POST request
- Research pipeline (Nodes 2-4): Sequential content extraction, AI research, and report generation
- Output delivery (Node 5): Structured JSON response with complete report
Execution flow:
- Trigger: POST request to webhook endpoint with URL and research question
- Average run time: 45-90 seconds depending on content length and API response times
- Key dependencies: Jina AI, Perplexity AI, and OpenAI APIs must all be configured with valid credentials
Critical nodes:
- HTTP Request (Jina): Extracts clean markdown from target URL—fails if URL is inaccessible
- HTTP Request (Perplexity): Conducts web research with citations—quality depends on research question specificity
- HTTP Request (OpenAI): Formats final report—structure consistency depends on prompt engineering
The complete n8n workflow JSON template is available at the bottom of this article.
Critical Configuration Settings
Jina AI Integration
Required fields:
- API Key: Your Jina AI API key from jina.ai/reader
- Endpoint: https://r.jina.ai/ (prepended to the target URL)
- Response format: String (not JSON or Auto-detect)
Common issues:
- Using jina.ai instead of r.jina.ai → Results in 404 errors
- Setting response format to JSON → Causes parsing errors on markdown content
- Missing Bearer prefix in Authorization header → Returns 401 unauthorized
Perplexity AI Configuration
Required fields:
- Model: sonar-pro for comprehensive research (or sonar for faster results)
- Temperature: 0.2 (range: 0.0-1.0, lower = more factual)
- Max tokens: 4000 (adjust based on research depth needed)
Why this approach:
Perplexity's sonar-pro model searches the web in real-time and includes citations automatically. This eliminates the need for separate web scraping infrastructure. The low temperature setting prioritizes factual accuracy over creative interpretation—critical for research applications.
OpenAI Report Generation
Required fields:
- Model: gpt-4o-mini (a balance of quality and cost)
- Temperature: 0.3 (slightly higher than the research phase for natural language flow)
- Max tokens: 3000 (sufficient for detailed reports)
Variables to customize:
- model: Use gpt-4o for higher quality reports (4x cost increase)
- temperature: Increase to 0.5 for more engaging writing style
- System prompt: Modify to change report structure and tone
Testing & Validation
Test the complete workflow:
- Activate the workflow in n8n (toggle switch in top-right)
- Get your webhook URL from the Webhook node settings
- Send a test request using curl:
curl -X POST https://your-n8n-instance.com/webhook/research-agent \
-H "Content-Type: application/json" \
-d '{
"url": "https://n8n.io/blog/",
"research_question": "What are the latest features in n8n?"
}'
Validate each stage:
- Content extraction: Check that Jina returns clean markdown without HTML tags
- Research quality: Verify Perplexity includes citations and covers the research question
- Report structure: Confirm OpenAI output includes all required sections
- Response format: Ensure JSON response is valid and includes metadata
Common troubleshooting:
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Verify API key is correct and active |
| Timeout after 60s | Long content processing | Increase timeout in node settings to 120s |
| Empty report field | OpenAI response parsing error | Check that response path matches API structure |
| Missing citations | Perplexity temperature too high | Reduce temperature to 0.2 or lower |
Deployment Considerations
Production Deployment Checklist
| Area | Requirement | Why It Matters |
|---|---|---|
| Error Handling | Add Error Trigger node with retry logic | Prevents workflow failures from API timeouts or rate limits |
| Authentication | Implement webhook authentication (API key or OAuth) | Prevents unauthorized usage and protects API credits |
| Rate Limiting | Add Queue node for high-volume scenarios | Prevents API rate limit errors when processing multiple requests |
| Monitoring | Configure workflow execution alerts in n8n | Detect failures within minutes instead of discovering them days later |
| Logging | Add Set nodes to log inputs/outputs | Enables debugging and quality auditing of research outputs |
| Cost Control | Set monthly budget alerts in API dashboards | Prevents unexpected bills from runaway usage |
Error handling strategy:
Add an Error Trigger node that catches failures and implements exponential backoff:
- First retry: Immediate
- Second retry: 5 seconds delay
- Third retry: 25 seconds delay
- After 3 failures: Send alert via email or Slack
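The schedule above (immediate, then 5 s, then 25 s, i.e. powers of 5) can be sketched as a generic retry wrapper; `on_failure` stands in for the email or Slack alert, and the function name is illustrative:

```python
import time

# Retry wrapper implementing the backoff schedule above.
# `delays` are the waits before each retry; once they are exhausted,
# one final attempt is made, then the failure is reported and re-raised.
def run_with_backoff(task, delays=(0, 5, 25), on_failure=print):
    for delay in delays:
        try:
            return task()
        except Exception:
            time.sleep(delay)
    try:
        return task()
    except Exception as exc:
        on_failure(f"Workflow failed after {len(delays)} retries: {exc}")
        raise
```

In n8n the equivalent behavior comes from the node's retry settings plus an Error Trigger workflow; the sketch just makes the timing explicit.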
Monitoring recommendations:
- Track average execution time (baseline: 60 seconds)
- Monitor API error rates by provider
- Set alerts for execution times >120 seconds
- Log all research questions for quality review
Real-World Use Cases
Use Case 1: Competitive Intelligence Reports
- Industry: SaaS companies tracking competitor features
- Scale: 20-30 competitor URLs analyzed weekly
- Modifications needed: Add scheduling trigger to run weekly, store reports in Airtable or Google Sheets
- Time savings: 6 hours per week (from 8 hours manual to 2 hours review)
Use Case 2: Content Research for Writers
- Industry: Content marketing agencies and freelance writers
- Scale: 5-10 articles researched daily
- Modifications needed: Add multiple URL inputs, aggregate findings across sources
- Output enhancement: Include keyword density analysis and content gap identification
Use Case 3: Market Research Synthesis
- Industry: Consulting firms and investment analysts
- Scale: 50-100 sources per research project
- Modifications needed: Add batch processing, integrate with CRM for client delivery
- Quality improvement: Include sentiment analysis and trend identification
Use Case 4: Academic Literature Review
- Industry: Researchers and graduate students
- Scale: 30-50 papers per literature review
- Modifications needed: Add PDF extraction (replace Jina with PDF parser), include citation formatting
- Time savings: 12-15 hours per literature review
Customizing This Workflow
Alternative Integrations
Instead of Jina AI:
- Firecrawl API: Better for JavaScript-heavy sites; change the HTTP Request URL to https://api.firecrawl.dev/v0/scrape
- Apify Web Scraper: Best for sites requiring authentication; swap the HTTP Request node for an Apify node
- Custom Python script: Use when you need specific extraction logic—add Code node with BeautifulSoup
Instead of Perplexity AI:
- OpenAI with web browsing: Use GPT-4 with browsing enabled—requires OpenAI node configuration change
- Anthropic Claude with search: Better for longer context—change HTTP Request endpoint to Anthropic API
- Google Gemini: Use when you need multimodal research—requires Google AI node
Workflow Extensions
Add automated report distribution:
- Connect Gmail or SendGrid node after report generation
- Schedule daily/weekly research runs with Cron trigger
- Store reports in Google Drive with automatic folder organization
- Nodes needed: +4 (Gmail/SendGrid, Google Drive, Schedule Trigger, Set)
Scale to handle multiple URLs:
- Add Loop Over Items node to process URL arrays
- Implement parallel processing with Split In Batches node
- Aggregate findings from multiple sources into single report
- Performance improvement: Process 10 URLs in 3 minutes vs 30 minutes sequentially
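The batching step is simple list slicing. This sketch mirrors what Split In Batches does to a URL array, with each sub-list becoming one parallel research run:

```python
# Split a list of URLs into fixed-size batches, as the Split In Batches
# node does, so each batch can be processed as one workflow execution.
def split_in_batches(items, batch_size):
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

urls = ["https://a.com", "https://b.com", "https://c.com",
        "https://d.com", "https://e.com"]
print(split_in_batches(urls, 2))  # → three batches of sizes 2, 2, and 1
```

Batch size is the main lever: larger batches mean fewer executions but longer individual runs and a higher chance of hitting per-request API rate limits.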
Add quality scoring:
- Integrate additional OpenAI call to score report completeness
- Check for citation count and source diversity
- Flag reports below quality threshold for human review
- Nodes needed: +3 (HTTP Request for scoring, IF node for threshold, Set for metadata)
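Before spending a second OpenAI call on scoring, a cheap heuristic can pre-screen reports. This sketch approximates "citation count and source diversity" by counting unique domains in markdown links; the threshold of 3 domains and the function name are illustrative assumptions:

```python
import re
from urllib.parse import urlparse

# Crude quality pre-screen: flag reports that cite fewer than
# `min_unique_domains` distinct sources via markdown-style links.
def needs_human_review(report_md: str, min_unique_domains: int = 3) -> bool:
    links = re.findall(r"\((https?://[^)\s]+)\)", report_md)
    domains = {urlparse(link).netloc for link in links}
    return len(domains) < min_unique_domains
```

Reports that fail this check would be routed through the IF node to the human-review path; the rest proceed to the LLM-based completeness score.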
Integration possibilities:
| Add This | To Get This | Complexity |
|---|---|---|
| Slack integration | Post reports to team channels automatically | Easy (2 nodes) |
| Airtable database | Store and organize all research reports | Easy (3 nodes) |
| Notion integration | Create formatted pages with reports | Medium (5 nodes) |
| Email digest | Weekly summary of all research conducted | Medium (6 nodes) |
| Zapier webhook | Connect to 5000+ apps without custom code | Easy (1 node) |
Get Started Today
Ready to automate your research workflow?
- Download the template: Scroll to the bottom of this article to copy the complete n8n workflow JSON
- Import to n8n: Go to Workflows → Add Workflow → Import from File, paste the JSON
- Configure your API keys: Add credentials for Jina AI, Perplexity AI, and OpenAI in the respective HTTP Request nodes
- Test with sample data: Send a POST request with a test URL and research question
- Deploy to production: Add error handling, authentication, and activate the workflow
Next steps for customization:
- Modify the report structure prompt to match your formatting preferences
- Add your company's research guidelines to the system prompts
- Integrate with your existing content management or CRM systems
- Set up monitoring and alerts for production usage
Need help customizing this workflow for your specific research needs? Schedule an intro call with Atherial.
