Research takes time. You find sources, read through content, synthesize information, and write coherent reports. This AI research agent automates that entire process. Give it a URL and research question, and it extracts the content, conducts AI-powered research, and generates a complete report. You'll learn how to build this agent using n8n, Perplexity AI, and OpenAI—with a working JSON template at the end.
The Problem: Manual Research Is a Time Sink
Research workflows consume hours of productive time across industries. Content teams spend 3-5 hours per article researching competitors and industry trends. Analysts manually compile market intelligence from dozens of sources. Consultants synthesize client data into actionable reports.
Current challenges:
- Manual content extraction from multiple URLs wastes 30-45 minutes per source
- Context switching between research and writing breaks focus and reduces quality
- Inconsistent report formatting requires additional editing time
- Scaling research operations requires hiring more analysts
Business impact:
- Time spent: 4-8 hours per comprehensive research report
- Cost: $200-400 per report at standard consulting rates
- Opportunity cost: Research time that could be spent on strategic analysis
The Solution Overview
This n8n workflow creates an AI research agent that operates in three phases: content extraction, AI research, and report generation. You provide a URL and research question through a webhook. The agent extracts content using Jina AI's Reader API, sends the extracted content to Perplexity AI for deep research, and uses OpenAI to generate a structured report. The entire process runs automatically, delivering comprehensive research reports in minutes instead of hours.
What You'll Build
This research agent handles the complete research-to-report pipeline with minimal configuration.
| Component | Technology | Purpose |
|---|---|---|
| Input Interface | Webhook (POST) | Accepts URL and research question |
| Content Extraction | Jina AI Reader API | Converts web pages to clean markdown |
| Research Engine | Perplexity AI (sonar-pro) | Conducts AI-powered research with citations |
| Report Generation | OpenAI (gpt-4o-mini) | Synthesizes findings into structured reports |
| Output Delivery | HTTP Response | Returns markdown report via webhook |
Key capabilities:
- Extracts content from any public URL automatically
- Conducts multi-source research with Perplexity's web search
- Generates reports with proper citations and structure
- Handles errors gracefully with detailed error messages
- Returns results in clean markdown format
Prerequisites
Before starting, ensure you have:
- n8n instance (cloud or self-hosted version 1.0+)
- Jina AI account with API key (free tier available at jina.ai)
- Perplexity AI API access with credits (api.perplexity.ai)
- OpenAI API key with GPT-4 access (platform.openai.com)
- Basic understanding of webhook testing (Postman or curl)
- JSON editing capability for API credential configuration
Step 1: Set Up the Webhook Trigger
The workflow starts with a webhook that accepts POST requests containing your research parameters. This creates a programmable endpoint you can call from any application.
Configure the Webhook node:
- Add a Webhook node as your trigger
- Set HTTP Method to POST
- Set Path to research-agent (or your preferred endpoint name)
- Set Respond to "Using 'Respond to Webhook' Node" so the reply is sent only after processing completes
- Set Authentication to None (add authentication in production)
Expected input format:
{
"url": "https://example.com/article",
"research_question": "What are the key trends in AI automation?"
}
Why this works:
The webhook creates a REST API endpoint that accepts structured data. Using POST instead of GET lets you send long, complex research questions without URL-encoding issues. Deferring the response until the Respond to Webhook node runs ensures the workflow finishes all processing before returning results, so callers receive the complete report rather than an empty acknowledgment.
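Before spending API credits, it helps to validate the incoming payload. Below is a minimal Python sketch of that check; the helper name `validate_research_request` is illustrative, not part of n8n (in the workflow itself this logic would live in a Code node):

```python
# Illustrative pre-flight validation for the webhook payload.
# Field names ("url", "research_question") match the expected input format above.
from urllib.parse import urlparse

def validate_research_request(body: dict) -> list:
    """Return a list of validation errors; an empty list means the request is usable."""
    errors = []
    parsed = urlparse(body.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("url must be a valid http(s) URL")
    if len(body.get("research_question", "").strip()) < 10:
        errors.append("research_question should be a specific question, not a bare keyword")
    return errors

print(validate_research_request({
    "url": "https://example.com/article",
    "research_question": "What are the key trends in AI automation?",
}))  # → []
```

Rejecting malformed requests up front keeps the three downstream API calls from running against garbage input.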
Step 2: Extract Content with Jina AI
The Jina AI Reader API converts any web page into clean, LLM-ready markdown. This eliminates HTML parsing complexity and provides consistent content formatting.
Configure the HTTP Request node:
- Add an HTTP Request node after the webhook
- Set Method to GET
- Set URL to https://r.jina.ai/{{ $json.body.url }}
- Add header: Authorization: Bearer YOUR_JINA_API_KEY
- Set Response Format to String (not JSON)
Node configuration:
{
"method": "GET",
"url": "=https://r.jina.ai/{{ $json.body.url }}",
"authentication": "genericCredentialType",
"genericAuthType": "httpHeaderAuth",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{
"name": "Authorization",
"value": "Bearer YOUR_JINA_API_KEY"
}
]
},
"options": {
"response": {
"response": {
"responseFormat": "string"
}
}
}
}
Why this approach:
Jina's Reader API handles JavaScript rendering, removes navigation elements, and extracts main content automatically. The markdown output includes proper heading hierarchy and preserves important formatting—perfect for LLM consumption. Using string response format prevents n8n from trying to parse the markdown as JSON.
Common issues:
- 403 errors indicate missing or invalid API key
- Empty responses mean the URL is behind authentication
- Timeout errors suggest the page takes >30 seconds to load (increase timeout in node options)
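The Reader call itself is just a URL prefix plus an auth header. A short sketch (the helper name is illustrative) shows exactly what the HTTP Request node sends:

```python
# How the Jina Reader request is composed: the target URL is appended
# verbatim after https://r.jina.ai/, and the API key goes in a Bearer header.
def build_jina_request(target_url: str, api_key: str):
    """Return (request_url, headers) matching the node configuration above."""
    request_url = f"https://r.jina.ai/{target_url}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return request_url, headers

url, headers = build_jina_request("https://example.com/article", "YOUR_JINA_API_KEY")
print(url)  # → https://r.jina.ai/https://example.com/article
```

Note the full target URL, scheme included, rides after the prefix; shortening it to the bare domain is what produces the 404s mentioned later in the configuration section.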
Step 3: Conduct AI Research with Perplexity
Perplexity AI searches the web and synthesizes information with citations. This node sends the extracted content and research question to Perplexity's sonar-pro model.
Configure the Perplexity HTTP Request node:
- Add an HTTP Request node for Perplexity
- Set Method to POST
- Set URL to https://api.perplexity.ai/chat/completions
- Add header: Authorization: Bearer YOUR_PERPLEXITY_API_KEY
- Add header: Content-Type: application/json
- Set Body to JSON with this structure:
Request body:
{
"model": "sonar-pro",
"messages": [
{
"role": "system",
"content": "You are a research assistant. Analyze the provided content and research question, then conduct additional research to provide comprehensive insights with citations."
},
{
"role": "user",
"content": "Content: {{ $('HTTP Request').item.json.data }}
Research Question: {{ $('Webhook').item.json.body.research_question }}
Conduct thorough research and provide detailed findings with sources."
}
],
"temperature": 0.2,
"max_tokens": 4000
}
Why this works:
The sonar-pro model combines Perplexity's web search with advanced reasoning. Setting temperature to 0.2 produces focused, factual research rather than creative speculation. The 4000 token limit ensures comprehensive responses while staying within API constraints. The system message primes the model for research-focused output with proper citations.
Variables to customize:
- temperature: Increase to 0.4-0.6 for more exploratory research
- max_tokens: Reduce to 2000 for faster, more concise research
- model: Switch to sonar (not sonar-pro) for faster, lower-cost research
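If you script against the Perplexity endpoint outside n8n, the same request body can be built with those tuning knobs exposed as parameters. This is a sketch under that assumption; the function name is illustrative:

```python
# Build the Perplexity chat-completions body used in Step 3.
# model, temperature, and max_tokens mirror the customization variables above.
def build_perplexity_body(page_content: str, question: str,
                          model: str = "sonar-pro",
                          temperature: float = 0.2,
                          max_tokens: int = 4000) -> dict:
    system = ("You are a research assistant. Analyze the provided content and "
              "research question, then conduct additional research to provide "
              "comprehensive insights with citations.")
    user = (f"Content: {page_content}\n\n"
            f"Research Question: {question}\n\n"
            "Conduct thorough research and provide detailed findings with sources.")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

Dropping this dict into any HTTP client as the JSON body reproduces what the n8n node sends.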
Step 4: Generate the Final Report with OpenAI
OpenAI's GPT-4o-mini synthesizes Perplexity's research into a structured, readable report. This node formats the findings with proper markdown structure.
Configure the OpenAI HTTP Request node:
- Add an HTTP Request node for OpenAI
- Set Method to POST
- Set URL to https://api.openai.com/v1/chat/completions
- Add header: Authorization: Bearer YOUR_OPENAI_API_KEY
- Add header: Content-Type: application/json
Request body:
{
"model": "gpt-4o-mini",
"messages": [
{
"role": "system",
"content": "You are a professional report writer. Transform research findings into a well-structured report with clear sections, bullet points, and proper citations. Use markdown formatting."
},
{
"role": "user",
"content": "Research findings:
{{ $('HTTP Request1').item.json.choices[0].message.content }}
Create a comprehensive report with:
1. Executive Summary
2. Key Findings
3. Detailed Analysis
4. Conclusions
5. Sources"
}
],
"temperature": 0.3,
"max_tokens": 3000
}
Why this approach:
GPT-4o-mini balances quality and cost—perfect for report formatting tasks. The structured prompt ensures consistent report formatting across all runs. Temperature 0.3 maintains factual accuracy while allowing natural language flow. The explicit section requirements in the prompt guarantee every report follows the same structure.
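The report text lives at choices[0].message.content in the completion response. A defensive extraction sketch (the helper name is illustrative) turns a shape mismatch into a readable error instead of a silently empty report field:

```python
# Safely extract the report text from an OpenAI-style chat completion dict.
def extract_report(response: dict) -> str:
    try:
        return response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(
            f"Unexpected completion shape at 'choices[0].message.content': {exc!r}"
        ) from exc

sample = {"choices": [{"message": {"role": "assistant",
                                   "content": "## Executive Summary\n..."}}]}
print(extract_report(sample))
```

The same path applies when mapping the expression in the n8n node, which is why a changed or truncated API response shows up as an empty report rather than an obvious failure.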
Step 5: Return the Report via Webhook Response
The final node sends the generated report back through the webhook response. This completes the API request-response cycle.
Configure the Respond to Webhook node:
- Add a Respond to Webhook node at the end
- Set Respond With to JSON
- Map the response body:
Response configuration:
{
"status": "success",
"report": "={{ $('HTTP Request2').item.json.choices[0].message.content }}",
"metadata": {
"url_analyzed": "={{ $('Webhook').item.json.body.url }}",
"research_question": "={{ $('Webhook').item.json.body.research_question }}",
"timestamp": "={{ $now.toISO() }}"
}
}
Why this structure:
Wrapping the report in a JSON response with metadata makes the API response self-documenting. The status field enables programmatic success checking. Including the original URL and question in metadata helps with logging and debugging. The ISO timestamp provides audit trail capability.
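In plain Python, the envelope assembled by the Respond to Webhook node looks like this (a sketch; the function name is illustrative):

```python
# Reproduce the Step 5 response envelope: status flag, report body,
# and self-documenting metadata with an ISO-8601 timestamp.
from datetime import datetime, timezone

def build_response(report: str, url: str, question: str) -> dict:
    return {
        "status": "success",
        "report": report,
        "metadata": {
            "url_analyzed": url,
            "research_question": question,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    }
```

Clients can branch on the status field and log the metadata block without parsing the markdown report itself.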
Workflow Architecture Overview
This workflow consists of 5 nodes organized into 3 main processing phases:
- Input handling (Node 1): Webhook receives URL and research question via POST request
- Research pipeline (Nodes 2-4): Sequential content extraction, AI research, and report generation
- Output delivery (Node 5): Structured JSON response with complete report
Execution flow:
- Trigger: POST request to webhook endpoint with URL and research question
- Average run time: 45-90 seconds depending on content length and API response times
- Key dependencies: Jina AI, Perplexity AI, and OpenAI APIs must all be configured with valid credentials
Critical nodes:
- HTTP Request (Jina): Extracts clean markdown from target URL—fails if URL is inaccessible
- HTTP Request (Perplexity): Conducts web research with citations—quality depends on research question specificity
- HTTP Request (OpenAI): Formats final report—structure consistency depends on prompt engineering
The complete n8n workflow JSON template is available at the bottom of this article.
Critical Configuration Settings
Jina AI Integration
Required fields:
- API Key: Your Jina AI API key from jina.ai/reader
- Endpoint: https://r.jina.ai/ (prepended to the target URL)
- Response format: String (not JSON or Auto-detect)
Common issues:
- Using jina.ai instead of r.jina.ai → Results in 404 errors
- Setting response format to JSON → Causes parsing errors on markdown content
- Missing Bearer prefix in Authorization header → Returns 401 unauthorized
Perplexity AI Configuration
Required fields:
- Model: sonar-pro for comprehensive research (or sonar for faster results)
- Temperature: 0.2 (range: 0.0-1.0, lower = more factual)
- Max tokens: 4000 (adjust based on research depth needed)
Why this approach:
Perplexity's sonar-pro model searches the web in real-time and includes citations automatically. This eliminates the need for separate web scraping infrastructure. The low temperature setting prioritizes factual accuracy over creative interpretation—critical for research applications.
OpenAI Report Generation
Required fields:
- Model: gpt-4o-mini (a balance of quality and cost)
- Temperature: 0.3 (slightly higher than the research phase for natural language flow)
- Max tokens: 3000 (sufficient for detailed reports)
Variables to customize:
- model: Use gpt-4o for higher quality reports (4x cost increase)
- temperature: Increase to 0.5 for more engaging writing style
- System prompt: Modify to change report structure and tone
Testing & Validation
Test the complete workflow:
- Activate the workflow in n8n (toggle switch in top-right)
- Get your webhook URL from the Webhook node settings
- Send a test request using curl:
curl -X POST https://your-n8n-instance.com/webhook/research-agent \
-H "Content-Type: application/json" \
-d '{
"url": "https://n8n.io/blog/",
"research_question": "What are the latest features in n8n?"
}'
Validate each stage:
- Content extraction: Check that Jina returns clean markdown without HTML tags
- Research quality: Verify Perplexity includes citations and covers the research question
- Report structure: Confirm OpenAI output includes all required sections
- Response format: Ensure JSON response is valid and includes metadata
Common troubleshooting:
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Verify API key is correct and active |
| Timeout after 60s | Long content processing | Increase timeout in node settings to 120s |
| Empty report field | OpenAI response parsing error | Check that response path matches API structure |
| Missing citations | Perplexity temperature too high | Reduce temperature to 0.2 or lower |
Deployment Considerations
Production Deployment Checklist
| Area | Requirement | Why It Matters |
|---|---|---|
| Error Handling | Add Error Trigger node with retry logic | Prevents workflow failures from API timeouts or rate limits |
| Authentication | Implement webhook authentication (API key or OAuth) | Prevents unauthorized usage and protects API credits |
| Rate Limiting | Add Queue node for high-volume scenarios | Prevents API rate limit errors when processing multiple requests |
| Monitoring | Configure workflow execution alerts in n8n | Detect failures within minutes instead of discovering them days later |
| Logging | Add Set nodes to log inputs/outputs | Enables debugging and quality auditing of research outputs |
| Cost Control | Set monthly budget alerts in API dashboards | Prevents unexpected bills from runaway usage |
Error handling strategy:
Add an Error Trigger node that catches failures and implements exponential backoff:
- First retry: Immediate
- Second retry: 5 seconds delay
- Third retry: 25 seconds delay
- After 3 failures: Send alert via email or Slack
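The schedule above (immediate, then 5 s, then 25 s, i.e. powers of 5) can be sketched as a generic retry wrapper; `on_failure` stands in for the email or Slack alert, and the function name is illustrative:

```python
import time

# Retry wrapper implementing the backoff schedule above.
# `delays` are the waits before each retry; once they are exhausted,
# one final attempt is made, then the failure is reported and re-raised.
def run_with_backoff(task, delays=(0, 5, 25), on_failure=print):
    for delay in delays:
        try:
            return task()
        except Exception:
            time.sleep(delay)
    try:
        return task()
    except Exception as exc:
        on_failure(f"Workflow failed after {len(delays)} retries: {exc}")
        raise
```

In n8n the equivalent behavior comes from the node's retry settings plus an Error Trigger workflow; the sketch just makes the timing explicit.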
Monitoring recommendations:
- Track average execution time (baseline: 60 seconds)
- Monitor API error rates by provider
- Set alerts for execution times >120 seconds
- Log all research questions for quality review
Real-World Use Cases
Use Case 1: Competitive Intelligence Reports
- Industry: SaaS companies tracking competitor features
- Scale: 20-30 competitor URLs analyzed weekly
- Modifications needed: Add scheduling trigger to run weekly, store reports in Airtable or Google Sheets
- Time savings: 6 hours per week (from 8 hours manual to 2 hours review)
Use Case 2: Content Research for Writers
- Industry: Content marketing agencies and freelance writers
- Scale: 5-10 articles researched daily
- Modifications needed: Add multiple URL inputs, aggregate findings across sources
- Output enhancement: Include keyword density analysis and content gap identification
Use Case 3: Market Research Synthesis
- Industry: Consulting firms and investment analysts
- Scale: 50-100 sources per research project
- Modifications needed: Add batch processing, integrate with CRM for client delivery
- Quality improvement: Include sentiment analysis and trend identification
Use Case 4: Academic Literature Review
- Industry: Researchers and graduate students
- Scale: 30-50 papers per literature review
- Modifications needed: Add PDF extraction (replace Jina with PDF parser), include citation formatting
- Time savings: 12-15 hours per literature review
Customizing This Workflow
Alternative Integrations
Instead of Jina AI:
- Firecrawl API: Better for JavaScript-heavy sites; change the HTTP Request URL to https://api.firecrawl.dev/v0/scrape
- Apify Web Scraper: Best for sites requiring authentication; swap the HTTP Request node for an Apify node
- Custom Python script: Use when you need specific extraction logic—add Code node with BeautifulSoup
Instead of Perplexity AI:
- OpenAI with web browsing: Use GPT-4 with browsing enabled—requires OpenAI node configuration change
- Anthropic Claude with search: Better for longer context—change HTTP Request endpoint to Anthropic API
- Google Gemini: Use when you need multimodal research—requires Google AI node
Workflow Extensions
Add automated report distribution:
- Connect Gmail or SendGrid node after report generation
- Schedule daily/weekly research runs with Cron trigger
- Store reports in Google Drive with automatic folder organization
- Nodes needed: +4 (Gmail/SendGrid, Google Drive, Schedule Trigger, Set)
Scale to handle multiple URLs:
- Add Loop Over Items node to process URL arrays
- Implement parallel processing with Split In Batches node
- Aggregate findings from multiple sources into single report
- Performance improvement: Process 10 URLs in 3 minutes vs 30 minutes sequentially
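The batching step is simple list slicing. This sketch mirrors what Split In Batches does to a URL array, with each sub-list becoming one parallel research run:

```python
# Split a list of URLs into fixed-size batches, as the Split In Batches
# node does, so each batch can be processed as one workflow execution.
def split_in_batches(items, batch_size):
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

urls = ["https://a.com", "https://b.com", "https://c.com",
        "https://d.com", "https://e.com"]
print(split_in_batches(urls, 2))  # → three batches of sizes 2, 2, and 1
```

Batch size is the main lever: larger batches mean fewer executions but longer individual runs and a higher chance of hitting per-request API rate limits.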
Add quality scoring:
- Integrate additional OpenAI call to score report completeness
- Check for citation count and source diversity
- Flag reports below quality threshold for human review
- Nodes needed: +3 (HTTP Request for scoring, IF node for threshold, Set for metadata)
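Before spending a second OpenAI call on scoring, a cheap heuristic can pre-screen reports. This sketch approximates "citation count and source diversity" by counting unique domains in markdown links; the threshold of 3 domains and the function name are illustrative assumptions:

```python
import re
from urllib.parse import urlparse

# Crude quality pre-screen: flag reports that cite fewer than
# `min_unique_domains` distinct sources via markdown-style links.
def needs_human_review(report_md: str, min_unique_domains: int = 3) -> bool:
    links = re.findall(r"\((https?://[^)\s]+)\)", report_md)
    domains = {urlparse(link).netloc for link in links}
    return len(domains) < min_unique_domains
```

Reports that fail this check would be routed through the IF node to the human-review path; the rest proceed to the LLM-based completeness score.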
Integration possibilities:
| Add This | To Get This | Complexity |
|---|---|---|
| Slack integration | Post reports to team channels automatically | Easy (2 nodes) |
| Airtable database | Store and organize all research reports | Easy (3 nodes) |
| Notion integration | Create formatted pages with reports | Medium (5 nodes) |
| Email digest | Weekly summary of all research conducted | Medium (6 nodes) |
| Zapier webhook | Connect to 5000+ apps without custom code | Easy (1 node) |
Get Started Today
Ready to automate your research workflow?
- Download the template: Scroll to the bottom of this article to copy the complete n8n workflow JSON
- Import to n8n: Go to Workflows → Add Workflow → Import from File, paste the JSON
- Configure your API keys: Add credentials for Jina AI, Perplexity AI, and OpenAI in the respective HTTP Request nodes
- Test with sample data: Send a POST request with a test URL and research question
- Deploy to production: Add error handling, authentication, and activate the workflow
Next steps for customization:
- Modify the report structure prompt to match your formatting preferences
- Add your company's research guidelines to the system prompts
- Integrate with your existing content management or CRM systems
- Set up monitoring and alerts for production usage
Need help customizing this workflow for your specific research needs? Schedule an intro call with Atherial.
