Most automation tools execute single tasks. This n8n agent operates as a complete synthetic employee—checking its own task list, using browser automation to complete work, seeing images, hearing voice commands, and learning from documented procedures. You'll build an autonomous loop that runs every 15 minutes, pulling tasks from Notion, executing them with Puppeteer browser control, and consulting vector memory for SOPs before taking action.
The Problem: Manual Task Execution Doesn't Scale
Your team drowns in repetitive browser-based tasks. Someone must log into dashboards, extract data, screenshot pages, and compile reports. Each task requires human attention, even when the steps are identical every time.
Current challenges:
- Employees spend 8-12 hours weekly on repetitive browser tasks
- Voice notes and images require manual processing before action
- Documented SOPs sit unused in Notion—no one checks them before starting work
- Tasks pile up when team members are unavailable
- No system remembers context between related tasks
Business impact:
- Time spent: 40+ hours per month on automatable work
- Error rate: 15-20% when procedures aren't followed exactly
- Response delay: 4-8 hours between task assignment and completion
- Knowledge loss: Procedures exist but aren't consulted consistently
The Solution Overview
This n8n workflow creates an autonomous agent that operates on a 15-minute loop. It checks Notion for assigned tasks, retrieves relevant SOPs from Qdrant vector memory, executes browser automation with Puppeteer, processes images through vision APIs, and transcribes voice commands via Groq Whisper. The entire system runs self-hosted on Elestio, using a custom high-reasoning LLM that's OpenAI-compatible. When the agent encounters obstacles, it requests human help through Slack or Discord. This architecture separates the "brain" (your custom reasoning API) from the "body" (the n8n execution layer), making the intelligence layer completely swappable.
What You'll Build
This autonomous agent system delivers complete multimodal task execution with memory and learning capabilities.
| Component | Technology | Purpose |
|---|---|---|
| Task Queue | Notion Database | Centralized task assignment and status tracking |
| Execution Loop | n8n Schedule Trigger | 15-minute autonomous check-and-execute cycle |
| Browser Automation | Puppeteer/Playwright | Login, navigation, data extraction, screenshots |
| Vision Processing | Custom Vision API | Image analysis and visual data extraction |
| Voice Transcription | Groq Whisper API | Voice note to text conversion |
| Vector Memory | Qdrant | SOP storage and contextual retrieval |
| Reasoning Engine | Custom OpenAI-compatible LLM | Task planning and decision-making |
| Error Handling | Slack/Discord Webhooks | Human escalation when stuck |
| Hosting Infrastructure | Elestio | Self-hosted n8n instance with full control |
Prerequisites
Before starting, ensure you have:
- n8n instance on Elestio (or self-hosted with Docker)
- Notion workspace with API integration enabled
- Qdrant vector database instance (cloud or self-hosted)
- Groq API account with Whisper model access
- Custom vision API endpoint and credentials
- Your OpenAI-compatible reasoning model URL and API key
- Slack or Discord webhook URL for notifications
- Basic JavaScript knowledge for Function nodes
- Understanding of REST API authentication
Step 1: Configure the Autonomous Task Loop
The agent's "body" starts with a Schedule Trigger that fires every 15 minutes. This creates the autonomous loop—the agent doesn't wait for human commands.
Configure the Schedule Trigger:
- Add a Schedule Trigger node to your workflow
- Set interval to "Every 15 Minutes"
- Configure timezone to match your operation hours
- Add execution conditions to prevent off-hours runs if needed
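If you enable the off-hours guard, here is a minimal sketch for a Code node placed between the Schedule Trigger and the Notion query. The Monday-Friday, 09:00-18:00 window is an assumption; adjust it to your operating hours:
// Code node: let items through only during business hours.
// The Mon-Fri, 09:00-18:00 window is illustrative; match it to your operation.
const now = new Date(); // interpreted in the n8n instance timezone
const hour = now.getHours();
const day = now.getDay(); // 0 = Sunday, 6 = Saturday

const withinHours = day >= 1 && day <= 5 && hour >= 9 && hour < 18;

// Returning an empty array ends this cycle without executing any tasks.
return withinHours ? $input.all() : [];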
Connect to Notion Database:
- Add a Notion node after the Schedule Trigger
- Select operation: "Get Database Items"
- Configure filters to retrieve only tasks with status "Ready" or "Assigned"
- Sort by priority field (descending) to handle urgent tasks first
Node configuration:
{
"databaseId": "{{$env.NOTION_DATABASE_ID}}",
"filters": {
"and": [
{
"property": "Status",
"select": {
"equals": "Ready"
}
}
]
},
"sorts": [
{
"property": "Priority",
"direction": "descending"
}
]
}
Why this works:
The 15-minute interval balances responsiveness with API rate limits. Notion's filter system ensures the agent only sees actionable tasks, preventing wasted execution cycles. Priority sorting means urgent work gets handled first, even when multiple tasks queue up.
Step 2: Implement Vector Memory with Qdrant
Before executing any task, the agent must check Qdrant for relevant SOPs. This is the "memory" component—the agent learns from documented procedures.
Set Up Qdrant Connection:
- Add an HTTP Request node after the Notion retrieval
- Configure authentication with your Qdrant API key
- Set method to POST for vector search
- Build the search query using task description as context
Query construction:
// In a Function (Code) node before the Qdrant HTTP Request.
// generateEmbedding() is a placeholder for your embedding model call
// (one possible implementation is sketched below).
const taskDescription =
  $input.item.json.properties.Description.rich_text[0]?.plain_text ?? '';

return {
  json: {
    vector: await generateEmbedding(taskDescription),
    limit: 3,
    score_threshold: 0.7,
    with_payload: true
  }
};
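The generateEmbedding() call above is a stand-in. One way to implement it, assuming your embedding model exposes an OpenAI-compatible /embeddings endpoint (the EMBEDDING_* environment variables are illustrative, not part of the original workflow). Paste this above the return in the same node:
// Hypothetical embedding helper; relies on n8n's this.helpers.httpRequest.
const generateEmbedding = async (text) => {
  const response = await this.helpers.httpRequest({
    method: 'POST',
    url: `${$env.EMBEDDING_API_URL}/v1/embeddings`,
    headers: {
      Authorization: `Bearer ${$env.EMBEDDING_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: { model: $env.EMBEDDING_MODEL, input: text },
    json: true,
  });
  return response.data[0].embedding; // OpenAI-style response shape
};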
Qdrant HTTP Request configuration:
{
"method": "POST",
"url": "{{$env.QDRANT_URL}}/collections/sops/points/search",
"authentication": "headerAuth",
"headerAuth": {
"name": "api-key",
"value": "={{$env.QDRANT_API_KEY}}"
},
"body": {
"vector": "={{$json.vector}}",
"limit": 3,
"score_threshold": 0.7,
"with_payload": true
}
}
Why this approach:
Vector search retrieves SOPs semantically related to the task, not just keyword matches. A score threshold of 0.7 filters out irrelevant procedures. Limiting to 3 results prevents context overload while providing enough guidance. The agent now has "institutional memory" without hardcoded rules.
Variables to customize:
- `limit`: Increase to 5 for complex tasks requiring multiple procedures
- `score_threshold`: Lower to 0.6 if you're getting too few results, raise to 0.8 for stricter matching
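Downstream steps reference the retrieved procedure as {{$json.sop}}, so it helps to flatten Qdrant's hits into that single field. Here is a sketch for a Code node after the search request; it assumes each point's payload was stored with title and content keys, so adjust to your own schema:
// Merge Qdrant search hits into one SOP context string for the LLM.
const hits = $input.item.json.result ?? []; // Qdrant returns matches under "result"

const sop = hits
  .map((hit, i) => `SOP ${i + 1} (score ${hit.score.toFixed(2)}): ${hit.payload.title}\n${hit.payload.content}`)
  .join('\n\n');

return {
  json: {
    sop: sop || 'No relevant SOP found. Proceed cautiously and escalate if unsure.',
    sopCount: hits.length,
  },
};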
Step 3: Build Browser Automation with Puppeteer
The agent's "hands" use Puppeteer to control a headless browser. This test case demonstrates login, screenshot, and data extraction.
Install Puppeteer in n8n:
Your Elestio n8n instance needs Puppeteer installed. Add this to your Docker configuration or run in the container:
npm install puppeteer
Configure the Execute Command Node:
- Add an Execute Command node after retrieving the SOP
- Set command to run a Node.js script
- Pass task parameters as environment variables
Puppeteer automation script:
// In a Function node that generates the Puppeteer script
const puppeteerScript = `
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
// Navigate to target URL
await page.goto('${$json.targetUrl}', { waitUntil: 'networkidle2' });
// Login sequence
await page.type('#username', '${$env.TARGET_USERNAME}');
await page.type('#password', '${$env.TARGET_PASSWORD}');
await page.click('button[type="submit"]');
await page.waitForNavigation({ waitUntil: 'networkidle2' });
// Take screenshot
const screenshot = await page.screenshot({
encoding: 'base64',
fullPage: true
});
// Extract data
const data = await page.evaluate(() => {
return {
title: document.querySelector('h1')?.innerText ?? '', // avoid crashing if the page has no h1
stats: Array.from(document.querySelectorAll('.stat-value')).map(el => el.innerText)
};
});
await browser.close();
console.log(JSON.stringify({ screenshot, data }));
})();
`;
return { json: { script: puppeteerScript } };
Execute the browser automation:
Add an Execute Command node with:
- Command: `node`
- Arguments: Pass the script via stdin or a temp file (see the sketch below)
- Capture stdout to retrieve the screenshot and extracted data
Why this works:
Puppeteer runs in headless mode, consuming minimal resources. The waitUntil: 'networkidle2' ensures pages fully load before interaction. Base64 screenshot encoding allows direct storage in Notion or transmission via API without file system dependencies. The evaluate() method runs JavaScript in the browser context, enabling complex data extraction.
Common issues:
- Timeout errors → Increase the `waitUntil` timeout or add an explicit `page.waitForSelector()` call
- Login failures → Add `await page.waitForTimeout(2000)` after form submission
- Missing data → Use browser DevTools to verify selectors match the actual DOM structure
Step 4: Integrate Vision API for Image Processing
The agent's "eyes" process images through a custom vision API. This handles screenshots from Puppeteer or images attached to Notion tasks.
Configure Vision API Node:
- Add an HTTP Request node after Puppeteer execution
- Set method to POST
- Configure custom authentication headers
- Send base64-encoded image in request body
Vision API request configuration:
{
"method": "POST",
"url": "{{$env.VISION_API_URL}}/analyze",
"authentication": "headerAuth",
"headerAuth": {
"name": "Authorization",
"value": "Bearer {{$env.VISION_API_KEY}}"
},
"body": {
"image": "={{$json.screenshot}}",
"tasks": ["ocr", "object_detection", "scene_understanding"],
"detail": "high"
},
"options": {
"timeout": 30000
}
}
Process vision results:
// Function node to parse vision API response
const visionResults = $input.item.json;
return {
json: {
extractedText: visionResults.ocr.text,
detectedObjects: visionResults.objects.map(obj => obj.label),
sceneDescription: visionResults.scene.description,
confidence: visionResults.confidence_score
}
};
Why this approach:
Requesting multiple analysis tasks (OCR, object detection, scene understanding) in one API call reduces latency. The 30-second timeout accommodates large images. High detail mode improves accuracy for dashboard screenshots with small text. The confidence score lets you flag low-quality results for human review.
Step 5: Add Voice Transcription with Groq Whisper
The agent's "ears" transcribe voice notes instantly using Groq's Whisper API. This allows verbal task assignment.
Configure Groq Whisper Node:
- Add an HTTP Request node triggered by voice note upload
- Set method to POST with multipart/form-data
- Configure Groq API authentication
- Send audio file for transcription
Groq Whisper configuration:
{
"method": "POST",
"url": "https://api.groq.com/openai/v1/audio/transcriptions",
"authentication": "headerAuth",
"headerAuth": {
"name": "Authorization",
"value": "Bearer {{$env.GROQ_API_KEY}}"
},
"body": {
"file": "={{$binary.audio}}",
"model": "whisper-large-v3",
"language": "en",
"response_format": "json",
"temperature": 0.0
}
}
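Before wiring this into n8n, you can sanity-check the endpoint with a standalone script. This sketch needs Node 18+ for the global fetch, FormData, and Blob; save it as an .mjs file so top-level await works. The audio filename is an assumption:
// Verify Groq transcription outside n8n before integrating.
import { readFile } from 'node:fs/promises';

const form = new FormData();
form.append('file', new Blob([await readFile('./voice-note.m4a')]), 'voice-note.m4a');
form.append('model', 'whisper-large-v3');
form.append('language', 'en');
form.append('response_format', 'json');
form.append('temperature', '0');

const res = await fetch('https://api.groq.com/openai/v1/audio/transcriptions', {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.GROQ_API_KEY}` },
  body: form,
});

console.log((await res.json()).text);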
Process transcription and create task:
// Function node to convert transcription to Notion task
const transcription = $input.item.json.text;
// Extract task components using simple parsing
const taskMatch = transcription.match(/create a task to (.+)/i);
const priorityMatch = transcription.match(/priority (high|medium|low)/i);
return {
json: {
taskDescription: taskMatch ? taskMatch[1] : transcription,
priority: priorityMatch ? priorityMatch[1] : "medium",
status: "Ready",
source: "voice_note",
timestamp: new Date().toISOString()
}
};
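To push that output into the queue, follow with a Notion node (Create Database Page operation) or a raw POST to https://api.notion.com/v1/pages. Here is a request-body sketch; the property names and types are assumptions that must match your database schema:
{
  "parent": { "database_id": "{{$env.NOTION_DATABASE_ID}}" },
  "properties": {
    "Name": { "title": [{ "text": { "content": "{{$json.taskDescription}}" } }] },
    "Status": { "select": { "name": "{{$json.status}}" } },
    "Priority": { "select": { "name": "{{$json.priority}}" } },
    "Source": { "rich_text": [{ "text": { "content": "{{$json.source}}" } }] }
  }
}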
Why this works:
Groq's Whisper model delivers transcription in under 2 seconds for typical voice notes. Temperature 0.0 ensures deterministic output—the same audio always produces identical text. JSON response format simplifies parsing. The language parameter optimizes for English, improving accuracy. Simple regex parsing extracts task details without requiring LLM processing, reducing latency and cost.
Step 6: Connect Your Custom Reasoning Model
The agent's "brain" uses your custom high-reasoning LLM. This step makes the intelligence layer completely swappable.
Configure OpenAI-Compatible Node:
- Add an OpenAI node (or HTTP Request node)
- Use environment variables for base URL and API key
- Structure prompts to include task context, SOP guidance, and execution results
OpenAI node configuration:
{
"resource": "chat",
"operation": "create",
"options": {
"baseURL": "={{$env.CUSTOM_LLM_BASE_URL}}",
"apiKey": "={{$env.CUSTOM_LLM_API_KEY}}"
},
"messages": [
{
"role": "system",
"content": "You are an autonomous task execution agent. Follow SOPs exactly. If you encounter errors, explain what went wrong and what help you need."
},
{
"role": "user",
"content": "Task: {{$json.taskDescription}}
Relevant SOP: {{$json.sop}}
Browser automation result: {{$json.browserResult}}
Vision analysis: {{$json.visionResult}}
What is the next action?"
}
],
"model": "{{$env.CUSTOM_LLM_MODEL}}",
"temperature": 0.3,
"max_tokens": 1000
}
Why this approach:
Using environment variables for baseURL and apiKey means you swap models by changing two values—no workflow editing required. The OpenAI-compatible format works with any provider (OpenRouter, Together AI, Anyscale, or your private deployment). Low temperature (0.3) ensures consistent reasoning. The system prompt establishes agent behavior. The user prompt provides complete context: what to do (task), how to do it (SOP), what happened (results), and what was seen (vision).
Variables to customize:
- `temperature`: Increase to 0.5-0.7 for creative tasks, keep at 0.1-0.3 for procedural work
- `max_tokens`: Increase to 2000 for complex reasoning chains
- `model`: Point to different model versions without workflow changes
Step 7: Implement Error Handling and Human Escalation
When the agent gets stuck, it requests help through Slack or Discord. This prevents silent failures.
Configure Error Detection:
// Function node to evaluate if agent is stuck
const llmResponse = $input.item.json.choices[0].message.content;
const browserSuccess = $input.item.json.browserResult.success;
const confidenceScore = $input.item.json.visionResult.confidence;
const isStuck =
llmResponse.toLowerCase().includes("i need help") ||
llmResponse.toLowerCase().includes("unable to") ||
!browserSuccess ||
confidenceScore < 0.6;
return {
json: {
stuck: isStuck,
reason: isStuck ? determineReason(llmResponse, browserSuccess, confidenceScore) : null,
originalTask: $input.item.json.taskDescription
}
};
function determineReason(response, browserSuccess, confidence) {
if (!browserSuccess) return "Browser automation failed";
if (confidence < 0.6) return "Vision analysis uncertain";
if (response.toLowerCase().includes("unable to")) return "LLM cannot proceed"; // case-insensitive, matching the check above
return "General execution error";
}
Slack/Discord notification:
{
"method": "POST",
"url": "{{$env.SLACK_WEBHOOK_URL}}",
"body": {
"text": "🚨 Agent needs help",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Task:* {{$json.originalTask}}
*Reason:* {{$json.reason}}
*Status:* Paused and awaiting human input"
}
},
{
"type": "actions",
"elements": [
{
"type": "button",
"text": {
"type": "plain_text",
"text": "View in Notion"
},
"url": "{{$json.notionTaskUrl}}"
}
]
}
]
}
}
Why this works:
Multiple failure detection methods catch different error types. Browser failures indicate technical issues. Low vision confidence suggests unclear screenshots. LLM responses containing "I need help" show reasoning limitations. The notification includes context (what task, why stuck) and a direct link to Notion for quick human intervention. This prevents the agent from repeatedly attempting impossible tasks.
Workflow Architecture Overview
This workflow consists of 18 nodes organized into 5 main sections:
- Task retrieval and memory (Nodes 1-5): Schedule trigger fires every 15 minutes, queries Notion for ready tasks, retrieves relevant SOPs from Qdrant vector memory
- Execution layer (Nodes 6-11): Puppeteer browser automation, screenshot capture, data extraction, vision API processing
- Reasoning engine (Nodes 12-14): Custom LLM analyzes results, consults SOPs, determines next actions
- Multimodal input (Nodes 15-16): Groq Whisper transcription for voice notes, separate trigger for audio file uploads
- Error handling (Nodes 17-18): Stuck detection logic, Slack/Discord human escalation
Execution flow:
- Trigger: Schedule (every 15 minutes) or webhook (for voice notes)
- Average run time: 45-90 seconds per task
- Key dependencies: Notion API, Qdrant, Groq, custom LLM endpoint, Puppeteer
Critical nodes:
- Schedule Trigger: Creates autonomous loop—agent doesn't wait for commands
- Qdrant HTTP Request: Retrieves SOPs before execution—this is the "learning" component
- Execute Command (Puppeteer): Browser automation—the agent's "hands"
- Custom LLM HTTP Request: Reasoning and decision-making—the swappable "brain"
- IF Node (Stuck Detection): Routes to human escalation when agent cannot proceed
The complete n8n workflow JSON template is available at the bottom of this article.
Critical Configuration Settings
Custom LLM Integration
Required environment variables:
- `CUSTOM_LLM_BASE_URL`: Your OpenAI-compatible endpoint (e.g., https://api.your-model.com/v1)
- `CUSTOM_LLM_API_KEY`: Authentication token for your model
- `CUSTOM_LLM_MODEL`: Model identifier (e.g., your-reasoning-model-v2)
Common issues:
- Using wrong API version → Check whether your endpoint requires a `/v1` or `/v2` suffix
- Authentication failures → Verify the API key format (some providers require a `Bearer` prefix, others don't)
- Model not found errors → Confirm the model name matches exactly what your provider expects

Why this approach:
Separating the reasoning model from the workflow means you can upgrade your "brain" without touching the "body." Testing a new model? Change one environment variable. Your custom model goes down? Swap in OpenAI's API as backup. This architecture treats intelligence as a pluggable component.
Qdrant Vector Memory
Required fields:
- Collection name: `sops` (create this in Qdrant before first run)
- Vector dimensions: Must match your embedding model (typically 1536 for OpenAI, 768 for sentence-transformers)
- Distance metric: Cosine similarity (best for semantic search)
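If the collection doesn't exist yet, a one-time HTTP Request node can create it. This sketch assumes 1536-dimension OpenAI-style embeddings; change `size` to match your model:
{
  "method": "PUT",
  "url": "{{$env.QDRANT_URL}}/collections/sops",
  "headerAuth": {
    "name": "api-key",
    "value": "={{$env.QDRANT_API_KEY}}"
  },
  "body": {
    "vectors": {
      "size": 1536,
      "distance": "Cosine"
    }
  }
}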
Puppeteer Configuration
Docker considerations for Elestio:
- Install Chromium dependencies: `apt-get install -y chromium-browser`
- Set the `--no-sandbox` flag (required in Docker containers)
- Allocate 2GB+ RAM for browser instances
- Use `--disable-dev-shm-usage` if you encounter shared memory errors
Variables to customize:
- `viewport`: Adjust width/height for different screen sizes (mobile: 375x667, desktop: 1920x1080)
- `waitUntil`: Change from `networkidle2` to `load` for faster execution on simple pages
- `timeout`: Increase from the default 30s to 60s for slow-loading dashboards
Testing & Validation
Test each component independently:
- Task retrieval: Manually trigger the Schedule node, verify Notion returns expected tasks
- Vector memory: Query Qdrant directly with a test embedding, confirm SOP retrieval
- Browser automation: Run Puppeteer script outside n8n first, validate login and screenshot
- Vision API: Send a test image, review OCR and object detection accuracy
- Voice transcription: Upload a sample audio file, check transcription quality
- LLM reasoning: Test your custom model endpoint with curl before integrating
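A quick Node-based equivalent of that curl check (Node 18+, saved as an .mjs file so top-level await works; assumes an OpenAI-compatible /chat/completions route):
// Sanity-check the custom reasoning endpoint before integrating.
const res = await fetch(`${process.env.CUSTOM_LLM_BASE_URL}/chat/completions`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.CUSTOM_LLM_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: process.env.CUSTOM_LLM_MODEL,
    messages: [{ role: 'user', content: 'Reply with the single word: ready' }],
    max_tokens: 10,
  }),
});

console.log((await res.json()).choices[0].message.content);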
Run end-to-end validation:
Create a test task in Notion with known requirements:
- Task: "Log into example.com and extract the dashboard title"
- Expected SOP: Should retrieve "Dashboard Login Procedure" from Qdrant
- Expected result: Screenshot of logged-in page + extracted title text
Monitor execution in n8n:
- Check each node's output for expected data structure
- Verify Puppeteer completes without timeout errors
- Confirm vision API returns confidence >0.7
- Review LLM response for correct next action
Troubleshooting common issues:
| Issue | Cause | Solution |
|---|---|---|
| "No tasks found" every cycle | Notion filter too restrictive | Check Status field values match exactly |
| Puppeteer timeout | Page load too slow | Increase timeout to 60s, add explicit waits |
| Vision API low confidence | Screenshot quality poor | Increase viewport size, use PNG format |
| LLM gives generic responses | Insufficient context | Include full SOP text and all execution results |
| Qdrant returns no SOPs | Embedding mismatch | Verify vector dimensions match collection config |
Deployment Considerations
Production Deployment Checklist
| Area | Requirement | Why It Matters |
|---|---|---|
| Error Handling | Retry logic with exponential backoff | Prevents data loss on temporary API failures |
| Monitoring | Webhook health checks every 5 min | Detect failures within 5 minutes vs hours |
| Rate Limiting | Implement token bucket for APIs | Avoid hitting provider limits during burst activity |
| Logging | Store full execution logs for 30 days | Debug issues that only appear in production |
| Secrets Management | Use n8n credentials, never hardcode | Rotate API keys without workflow changes |
| Resource Limits | Set max concurrent executions to 3 | Prevent memory exhaustion from parallel browser instances |
| Backup Strategy | Export workflow JSON weekly | Recover quickly from accidental deletions |
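The retry requirement in the first row can live in one shared helper reused by any Code node. Here is a minimal sketch; the attempt count and base delay are assumptions to tune against your providers' rate limits:
// Generic retry with exponential backoff for flaky API calls.
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === attempts - 1) throw error; // out of retries: surface the error
      const delayMs = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: wrap any fragile call, e.g. the Qdrant search or the LLM request.
// const result = await withRetry(() => this.helpers.httpRequest(options));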
Customization ideas:
- Add task prioritization: Implement urgency scoring based on task age and priority field
- Create execution reports: Send daily summaries of completed tasks, success rate, and stuck instances
- Implement learning feedback: Store successful execution patterns back to Qdrant for future reference
- Add multi-language support: Configure Whisper for multiple languages, route to appropriate LLM prompts
- Scale browser automation: Use BrowserBase or Browserless for managed browser infrastructure
Use Cases & Variations
Use Case 1: Automated Competitive Intelligence
- Industry: SaaS, E-commerce
- Scale: 50+ competitor sites monitored daily
- Modifications needed: Add price extraction logic, store historical data in PostgreSQL, generate comparison reports
- Task example: "Check competitor pricing page, screenshot changes, extract new features"
Use Case 2: Customer Support Ticket Processing
- Industry: Support operations
- Scale: 200+ tickets/day
- Modifications needed: Replace Notion with Zendesk API, add sentiment analysis to vision results, route to appropriate team
- Task example: "Review support ticket screenshot, extract issue type, suggest SOP-based response"
Use Case 3: Data Entry from Invoices
- Industry: Accounting, Finance
- Scale: 500+ invoices/month
- Modifications needed: Add OCR validation, implement double-entry verification, connect to QuickBooks API
- Task example: "Extract invoice data from PDF screenshot, validate against PO, create accounting entry"
Use Case 4: Social Media Content Moderation
- Industry: Community management
- Scale: 1000+ posts/day
- Modifications needed: Add content policy SOPs to Qdrant, implement confidence-based auto-approval, flag edge cases
- Task example: "Review flagged post screenshot, check against community guidelines, approve or escalate"
Use Case 5: Research Report Generation
- Industry: Market research, Consulting
- Scale: 20+ reports/week
- Modifications needed: Add web scraping nodes, implement citation tracking, generate formatted documents
- Task example: "Research topic from voice note, gather data from 10 sources, compile findings into report"
Customizing This Workflow
Alternative Integrations
Instead of Notion:
- Airtable: Better for complex relational data - requires changing API endpoints in nodes 2-3, same filter logic applies
- Google Sheets: Simplest option for small teams - swap Notion node for Google Sheets node, use row numbers as task IDs
- Linear: Best for engineering teams - requires OAuth setup, provides better task dependencies
Instead of Qdrant:
- Pinecone: Managed vector DB with better scaling - change HTTP Request URLs, same query structure
- Weaviate: Better for hybrid search (vector + keyword) - requires GraphQL queries instead of REST
- Supabase pgvector: Best if you already use Supabase - use SQL queries, simpler setup
Instead of Puppeteer:
- Playwright: Better cross-browser support - nearly identical API, change require statement
- Browser Use library: Higher-level abstractions - reduces code but less control
- BrowserBase: Managed browser infrastructure - eliminates Docker setup, costs $0.01/minute
Workflow Extensions
Add automated reporting:
- Add a Schedule node to run daily at 6 PM
- Connect to Google Slides API or Notion page creation
- Generate executive summary with task completion stats, error rates, time saved
- Nodes needed: +6 (Schedule, HTTP Request for data aggregation, Function for calculations, Google Slides/Notion nodes)
Scale to handle more data:
- Replace Notion with PostgreSQL for >1000 tasks/day
- Add batch processing (process 10 tasks per cycle instead of 1; see the sketch after this list)
- Implement Redis caching for frequently accessed SOPs
- Performance improvement: 5x faster for high-volume scenarios
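For the batch-processing item, a Code node between the Notion query and the execution branch can cap throughput per cycle. A sketch using the suggested batch size of 10:
// Process up to 10 ready tasks per 15-minute cycle, highest priority first
// (the Notion query already sorts by priority).
const BATCH_SIZE = 10;

return $input.all().slice(0, BATCH_SIZE);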
Add human-in-the-loop approval:
- Insert an approval step before browser automation executes
- Send Slack message with task preview and "Approve/Reject" buttons
- Pause workflow execution until human responds
- Nodes needed: +4 (Slack send, Webhook wait, IF condition, Notion status update)
Integration possibilities:
| Add This | To Get This | Complexity |
|---|---|---|
| Slack integration | Real-time task notifications in channels | Easy (2 nodes) |
| Zapier webhook | Connect to 5000+ apps without custom code | Easy (1 node) |
| PostgreSQL | Store execution history and analytics | Medium (5 nodes) |
| Google Drive | Save screenshots and reports automatically | Medium (3 nodes) |
| Stripe API | Process payment-related tasks | Medium (6 nodes) |
| Twilio | SMS notifications for critical errors | Easy (2 nodes) |
| Airtable sync | Better data visualization and sharing | Medium (4 nodes) |
| Power BI connector | Executive dashboards and BI reports | Advanced (8 nodes) |
Get Started Today
Ready to build your autonomous agent?
- Download the template: The complete n8n workflow JSON is available at the bottom of this article
- Set up your infrastructure: Deploy n8n on Elestio, create Notion database, set up Qdrant collection
- Configure environment variables: Add all API keys and URLs to n8n credentials
- Install Puppeteer: Run `npm install puppeteer` in your n8n Docker container
- Import the workflow: Go to Workflows → Import from File, paste the JSON
- Test each component: Validate Notion connection, Qdrant retrieval, browser automation independently
- Run end-to-end test: Create a simple test task and watch the agent execute it
- Deploy to production: Set the schedule to active and monitor the first few cycles
Need help customizing this workflow for your specific needs? Want to integrate with proprietary systems or scale to handle thousands of tasks? Schedule an intro call with Atherial at https://atherial.ai/contact—we'll help you build a synthetic employee that actually works.
