What You'll Build
The finished system is an autonomous agent that picks up tasks from Notion, executes them with browser, vision, and voice tools, and consults stored SOPs in vector memory before acting.
| Component | Technology | Purpose |
| Task Queue | Notion Database | Centralized task assignment and status tracking |
| Execution Loop | n8n Schedule Trigger | 15-minute autonomous check-and-execute cycle |
| Browser Automation | Puppeteer/Playwright | Login, navigation, data extraction, screenshots |
| Vision Processing | Custom Vision API | Image analysis and visual data extraction |
| Voice Transcription | Groq Whisper API | Voice note to text conversion |
| Vector Memory | Qdrant | SOP storage and contextual retrieval |
| Reasoning Engine | Custom OpenAI-compatible LLM | Task planning and decision-making |
| Error Handling | Slack/Discord Webhooks | Human escalation when stuck |
| Hosting Infrastructure | Elestio | Self-hosted n8n instance with full control |
Prerequisites
Before starting, ensure you have:
Step 1: Configure the Autonomous Task Loop
The agent's "body" starts with a Schedule Trigger that fires every 15 minutes. This creates the autonomous loop—the agent doesn't wait for human commands.
Configure the Schedule Trigger:
- Add a Schedule Trigger node to your workflow
- Set interval to "Every 15 Minutes"
- Configure timezone to match your operation hours
- Add execution conditions to prevent off-hours runs if needed
Connect to Notion Database:
- Add a Notion node after the Schedule Trigger
- Select operation: "Get Database Items"
- Configure filters to retrieve only tasks with status "Ready" or "Assigned"
- Sort by priority field (descending) to handle urgent tasks first
Node configuration:
{
  "databaseId": "{{$env.NOTION_DATABASE_ID}}",
  "filters": {
    "or": [
      { "property": "Status", "select": { "equals": "Ready" } },
      { "property": "Status", "select": { "equals": "Assigned" } }
    ]
  },
  "sorts": [
    { "property": "Priority", "direction": "descending" }
  ]
}
Why this works:
The 15-minute interval balances responsiveness with API rate limits. Notion's filter system ensures the agent only sees actionable tasks, preventing wasted execution cycles. Priority sorting means urgent work gets handled first, even when multiple tasks queue up.
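If you would rather enforce the status filter and priority order in a Code node instead of relying on the Notion query alone, the logic can be sketched as below. The simplified task shape (`id`, `status`, `priority`) and the High/Medium/Low ranking are assumptions for illustration, not the raw Notion response format.

```javascript
// Hypothetical simplified task shape: { id, status, priority }
const PRIORITY_RANK = { high: 3, medium: 2, low: 1 };

// Keep only actionable tasks and put the most urgent first
function selectActionableTasks(tasks, statuses = ["Ready", "Assigned"]) {
  return tasks
    .filter((task) => statuses.includes(task.status))
    // Unknown priorities rank lowest so they never jump the queue
    .sort(
      (a, b) =>
        (PRIORITY_RANK[String(b.priority).toLowerCase()] || 0) -
        (PRIORITY_RANK[String(a.priority).toLowerCase()] || 0)
    );
}

const queue = selectActionableTasks([
  { id: "t1", status: "Ready", priority: "Low" },
  { id: "t2", status: "Done", priority: "High" },
  { id: "t3", status: "Assigned", priority: "High" },
]);
console.log(queue.map((t) => t.id)); // [ 't3', 't1' ]
```

The completed task is dropped and the high-priority task moves to the front, which is the behavior the filter and sort settings above aim for.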
Step 2: Implement Vector Memory with Qdrant
Before executing any task, the agent must check Qdrant for relevant SOPs. This is the "memory" component—the agent learns from documented procedures.
Set Up Qdrant Connection:
- Add an HTTP Request node after the Notion retrieval
- Configure authentication with your Qdrant API key
- Set method to POST for vector search
- Build the search query using task description as context
Query construction:
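// In a Function node before the Qdrant HTTP Request
// Fall back to an empty string if the Description property has no text
const richText = $input.item.json.properties.Description.rich_text;
const taskDescription = richText.length > 0 ? richText[0].plain_text : "";
// generateEmbedding is a placeholder: replace it with a call to your embedding model
const vector = await generateEmbedding(taskDescription);
return {
  json: {
    vector,
    limit: 3,
    score_threshold: 0.7,
    with_payload: true
  }
};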
Qdrant HTTP Request configuration:
{
"method": "POST",
"url": "{{$env.QDRANT_URL}}/collections/sops/points/search",
"authentication": "headerAuth",
"headerAuth": {
"name": "api-key",
"value": "={{$env.QDRANT_API_KEY}}"
},
"body": {
"vector": "={{$json.vector}}",
"limit": 3,
"score_threshold": 0.7,
"with_payload": true
}
}
Why this approach:
Vector search retrieves SOPs semantically related to the task, not just keyword matches. A score threshold of 0.7 filters out irrelevant procedures. Limiting to 3 results prevents context overload while providing enough guidance. The agent now has "institutional memory" without hardcoded rules.
Variables to customize:
- limit: Increase to 5 for complex tasks requiring multiple procedures
- score_threshold: Lower to 0.6 if you're getting too few results, raise to 0.8 for stricter matching
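To make the request and response handling concrete, here is a minimal sketch of building the search body and turning Qdrant's response into prompt context. The `result` array with `score` and `payload` follows Qdrant's search response; the payload field names `title` and `text` are hypothetical and depend on how you stored your SOPs.

```javascript
// Build the request body for the points/search endpoint
function buildSearchBody(vector, { limit = 3, scoreThreshold = 0.7 } = {}) {
  return { vector, limit, score_threshold: scoreThreshold, with_payload: true };
}

// Turn a search response into a plain-text block for the LLM prompt.
// payload.title and payload.text are hypothetical field names.
function formatSopContext(response) {
  const hits = (response && response.result) || [];
  if (hits.length === 0) return "No SOP found";
  return hits
    .map(
      (hit, i) =>
        `SOP ${i + 1} (score ${hit.score.toFixed(2)}): ${hit.payload.title}\n${hit.payload.text}`
    )
    .join("\n\n");
}

const sample = {
  result: [
    {
      id: 1,
      score: 0.83,
      payload: { title: "Dashboard Login Procedure", text: "Open the login page, then sign in." },
    },
  ],
};
console.log(formatSopContext(sample));
```

Returning a fixed "No SOP found" string when nothing clears the threshold lets the reasoning step see explicitly that it is working without guidance.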
Step 3: Build Browser Automation with Puppeteer
The agent's "hands" use Puppeteer to control a headless browser. This test case demonstrates login, screenshot, and data extraction.
Install Puppeteer in n8n:
Your Elestio n8n instance needs Puppeteer installed. Add this to your Docker configuration or run in the container:
npm install puppeteer
Configure the Execute Command Node:
- Add an Execute Command node after retrieving the SOP
- Set command to run a Node.js script
- Pass task parameters as environment variables
Puppeteer automation script:
// In a Function node that generates the Puppeteer script
// JSON.stringify keeps the interpolated URL safely quoted inside the generated script
const targetUrl = JSON.stringify($json.targetUrl);
const puppeteerScript = `
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1920, height: 1080 });
    // Navigate to target URL
    await page.goto(${targetUrl}, { waitUntil: 'networkidle2' });
    // Login sequence; credentials are read from the environment at run time
    // so they are never written into the generated script
    await page.type('#username', process.env.TARGET_USERNAME);
    await page.type('#password', process.env.TARGET_PASSWORD);
    // Start waiting for navigation before clicking to avoid a race
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle2' }),
      page.click('button[type="submit"]')
    ]);
    // Take screenshot
    const screenshot = await page.screenshot({ encoding: 'base64', fullPage: true });
    // Extract data; a missing h1 yields null instead of throwing
    const data = await page.evaluate(() => ({
      title: document.querySelector('h1')?.innerText ?? null,
      stats: Array.from(document.querySelectorAll('.stat-value')).map(el => el.innerText)
    }));
    console.log(JSON.stringify({ success: true, screenshot, data }));
  } catch (err) {
    // Report failures as JSON so later nodes can check the success flag
    console.log(JSON.stringify({ success: false, error: err.message }));
  } finally {
    await browser.close();
  }
})();
`;
return { json: { script: puppeteerScript } };
Execute the browser automation:
Add an Execute Command node with:
- Command: node
- Arguments: Pass the script via stdin or temp file
- Capture stdout to retrieve screenshot and extracted data
Why this works:
Puppeteer runs Chromium in headless mode, which avoids the overhead of rendering a visible window. The waitUntil: 'networkidle2' option waits until there have been no more than two open network connections for at least 500 ms, a practical signal that the page has finished loading. Base64 screenshot encoding lets you pass the image straight to an API without file system dependencies. The evaluate() method runs JavaScript in the browser context, enabling complex data extraction.
Common issues:
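- Timeout errors → Increase the timeout option on page.goto() or page.waitForNavigation(), or add an explicit page.waitForSelector()
- Login failures → Wait for an element that only appears after login with page.waitForSelector() (page.waitForTimeout() has been removed from recent Puppeteer releases)
- Missing data → Use browser DevTools to verify selectors match the actual DOM structure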
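Once the Execute Command node has captured stdout, a following Code node needs to turn it back into structured data. The sketch below assumes the script prints one JSON object as its last non-empty line, as the console.log call in the script above does; the function name and the error messages are illustrative.

```javascript
// Parse the stdout captured from the Execute Command node.
// Assumes the script prints one JSON object as its last non-empty line.
function parseBrowserOutput(stdout) {
  const lines = String(stdout)
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
  const last = lines[lines.length - 1];
  if (!last) return { success: false, error: "Empty stdout" };
  try {
    // A success flag printed by the script overrides the default of true
    return { success: true, ...JSON.parse(last) };
  } catch (err) {
    return { success: false, error: `Unparseable stdout: ${err.message}` };
  }
}

const parsed = parseBrowserOutput(
  'warning: something\n{"screenshot":"abc","data":{"title":"Home"}}\n'
);
console.log(parsed.success, parsed.data.title); // true Home
```

Reading only the last line tolerates warnings that Chromium or Node may print before the result.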
Step 4: Integrate Vision API for Image Processing
The agent's "eyes" process images through a custom vision API. This handles screenshots from Puppeteer or images attached to Notion tasks.
Configure Vision API Node:
- Add an HTTP Request node after Puppeteer execution
- Set method to POST
- Configure custom authentication headers
- Send base64-encoded image in request body
Vision API request configuration:
{
"method": "POST",
"url": "{{$env.VISION_API_URL}}/analyze",
"authentication": "headerAuth",
"headerAuth": {
"name": "Authorization",
"value": "Bearer {{$env.VISION_API_KEY}}"
},
"body": {
"image": "={{$json.screenshot}}",
"tasks": ["ocr", "object_detection", "scene_understanding"],
"detail": "high"
},
"options": {
"timeout": 30000
}
}
Process vision results:
// Function node to parse vision API response
const visionResults = $input.item.json;
return {
json: {
extractedText: visionResults.ocr.text,
detectedObjects: visionResults.objects.map(obj => obj.label),
sceneDescription: visionResults.scene.description,
confidence: visionResults.confidence_score
}
};
Why this approach:
Requesting multiple analysis tasks (OCR, object detection, scene understanding) in one API call reduces latency. The 30-second timeout accommodates large images. High detail mode improves accuracy for dashboard screenshots with small text. The confidence score lets you flag low-quality results for human review.
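A slightly more defensive version of the parsing step can also compute the review flag mentioned above. This is a sketch that assumes the same response shape as the parsing example; the 0.6 default mirrors the threshold used for stuck detection in Step 7, and the `needsReview` field name is hypothetical.

```javascript
// Normalize a vision API response and flag low-confidence results.
// Missing fields fall back to empty values instead of throwing.
function summarizeVision(visionResults, minConfidence = 0.6) {
  const confidence = Number(visionResults?.confidence_score ?? 0);
  return {
    extractedText: visionResults?.ocr?.text ?? "",
    detectedObjects: (visionResults?.objects ?? []).map((obj) => obj.label),
    sceneDescription: visionResults?.scene?.description ?? "",
    confidence,
    needsReview: confidence < minConfidence,
  };
}

const summary = summarizeVision({
  ocr: { text: "Total: 42" },
  objects: [{ label: "chart" }],
  scene: { description: "analytics dashboard" },
  confidence_score: 0.55,
});
console.log(summary.needsReview); // true
```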
Step 5: Add Voice Transcription with Groq Whisper
The agent's "ears" transcribe voice notes instantly using Groq's Whisper API. This allows verbal task assignment.
Configure Groq Whisper Node:
- Add an HTTP Request node triggered by voice note upload
- Set method to POST with multipart/form-data
- Configure Groq API authentication
- Send audio file for transcription
Groq Whisper configuration:
{
"method": "POST",
"url": "https://api.groq.com/openai/v1/audio/transcriptions",
"authentication": "headerAuth",
"headerAuth": {
"name": "Authorization",
"value": "Bearer {{$env.GROQ_API_KEY}}"
},
"body": {
"file": "={{$binary.audio}}",
"model": "whisper-large-v3",
"language": "en",
"response_format": "json",
"temperature": 0.0
}
}
Process transcription and create task:
// Function node to convert transcription to Notion task
const transcription = $input.item.json.text;
// Extract task components using simple parsing
const taskMatch = transcription.match(/create a task to (.+)/i);
const priorityMatch = transcription.match(/priority (high|medium|low)/i);
return {
json: {
taskDescription: taskMatch ? taskMatch[1] : transcription,
priority: priorityMatch ? priorityMatch[1] : "medium",
status: "Ready",
source: "voice_note",
timestamp: new Date().toISOString()
}
};
Why this works:
Groq's Whisper model delivers transcription in under 2 seconds for typical voice notes. Temperature 0.0 makes the output as repeatable as possible, so the same audio should produce the same text. JSON response format simplifies parsing. Setting the language parameter to English skips language detection, which can improve accuracy. Simple regex parsing extracts task details without requiring LLM processing, reducing latency and cost.
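To see what the regex parsing produces, here is a standalone variant you can run outside n8n. It differs from the node code in two ways that are assumptions of this sketch: it strips the spoken priority phrase from the description, and it lowercases the priority value.

```javascript
// Parse a transcription into task fields using the same regex approach
function parseVoiceTask(transcription) {
  const text = String(transcription).trim();
  const priorityMatch = text.match(/\bpriority\s+(high|medium|low)\b/i);
  const taskMatch = text.match(/create a task to (.+)/i);
  // Drop the priority phrase and trailing punctuation from the description
  const description = (taskMatch ? taskMatch[1] : text)
    .replace(/[,.]?\s*(with\s+)?priority\s+(high|medium|low)\b/i, "")
    .replace(/[.\s]+$/, "");
  return {
    taskDescription: description,
    priority: priorityMatch ? priorityMatch[1].toLowerCase() : "medium",
    status: "Ready",
    source: "voice_note",
  };
}

console.log(parseVoiceTask("Create a task to check the sales dashboard with priority High."));
// { taskDescription: 'check the sales dashboard', priority: 'high', status: 'Ready', source: 'voice_note' }
```

Voice notes that do not follow the "create a task to" pattern fall through with the full transcription as the description and medium priority.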
Step 6: Connect Your Custom Reasoning Model
The agent's "brain" uses your custom high-reasoning LLM. This step makes the intelligence layer completely swappable.
Configure OpenAI-Compatible Node:
- Add an OpenAI node (or HTTP Request node)
- Use environment variables for base URL and API key
- Structure prompts to include task context, SOP guidance, and execution results
OpenAI node configuration:
{
"resource": "chat",
"operation": "create",
"options": {
"baseURL": "={{$env.CUSTOM_LLM_BASE_URL}}",
"apiKey": "={{$env.CUSTOM_LLM_API_KEY}}"
},
"messages": [
{
"role": "system",
"content": "You are an autonomous task execution agent. Follow SOPs exactly. If you encounter errors, explain what went wrong and what help you need."
},
{
"role": "user",
"content": "Task: {{$json.taskDescription}}\nRelevant SOP: {{$json.sop}}\nBrowser automation result: {{$json.browserResult}}\nVision analysis: {{$json.visionResult}}\nWhat is the next action?"
}
],
"model": "{{$env.CUSTOM_LLM_MODEL}}",
"temperature": 0.3,
"max_tokens": 1000
}
Why this approach:
Using environment variables for the base URL, API key, and model name means you swap models by changing configuration values, with no workflow editing required. The OpenAI-compatible format works with providers that expose an OpenAI-style chat completions endpoint (OpenRouter, Together AI, Anyscale, or your private deployment). Low temperature (0.3) keeps reasoning more consistent between runs. The system prompt establishes agent behavior. The user prompt provides complete context: what to do (task), how to do it (SOP), what happened (results), and what was seen (vision).
Variables to customize:
- temperature: Increase to 0.5-0.7 for creative tasks, keep at 0.1-0.3 for procedural work
- max_tokens: Increase to 2000 for complex reasoning chains
- model: Point to different model versions without workflow changes
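If you use a plain HTTP Request node instead of the OpenAI node, the request can be assembled in a Code node. The sketch below assumes the base URL already ends in the version suffix (for example /v1) and that the endpoint follows the OpenAI chat completions format; `buildChatRequest` and the shape of the `context` object are illustrative names, not part of n8n.

```javascript
// Build a chat-completions request from environment-style settings.
// The env object mirrors the CUSTOM_LLM_* variables used in this guide.
function buildChatRequest(env, context) {
  const baseUrl = env.CUSTOM_LLM_BASE_URL.replace(/\/+$/, ""); // drop trailing slashes
  const userPrompt = [
    `Task: ${context.taskDescription}`,
    `Relevant SOP: ${context.sop || "No SOP found"}`,
    `Browser automation result: ${JSON.stringify(context.browserResult ?? null)}`,
    `Vision analysis: ${JSON.stringify(context.visionResult ?? null)}`,
    "What is the next action?",
  ].join("\n");
  return {
    url: `${baseUrl}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${env.CUSTOM_LLM_API_KEY}`,
    },
    body: {
      model: env.CUSTOM_LLM_MODEL,
      temperature: 0.3,
      max_tokens: 1000,
      messages: [
        {
          role: "system",
          content:
            "You are an autonomous task execution agent. Follow SOPs exactly. If you encounter errors, explain what went wrong and what help you need.",
        },
        { role: "user", content: userPrompt },
      ],
    },
  };
}

const request = buildChatRequest(
  {
    CUSTOM_LLM_BASE_URL: "https://api.your-model.com/v1/",
    CUSTOM_LLM_API_KEY: "test-key",
    CUSTOM_LLM_MODEL: "your-reasoning-model-v2",
  },
  { taskDescription: "Extract the dashboard title" }
);
console.log(request.url); // https://api.your-model.com/v1/chat/completions
```

Serializing the browser and vision results with JSON.stringify avoids the "[object Object]" text that appears when objects are interpolated directly into a prompt.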
Step 7: Implement Error Handling and Human Escalation
When the agent gets stuck, it requests help through Slack or Discord. This prevents silent failures.
Configure Error Detection:
// Function node to evaluate if agent is stuck
const llmResponse = $input.item.json.choices[0].message.content;
const browserSuccess = $input.item.json.browserResult.success;
const confidenceScore = $input.item.json.visionResult.confidence;
const isStuck =
llmResponse.toLowerCase().includes("i need help") ||
llmResponse.toLowerCase().includes("unable to") ||
!browserSuccess ||
confidenceScore < 0.6;
return {
json: {
stuck: isStuck,
reason: isStuck ? determineReason(llmResponse, browserSuccess, confidenceScore) : null,
originalTask: $input.item.json.taskDescription
}
};
function determineReason(response, browserSuccess, confidence) {
if (!browserSuccess) return "Browser automation failed";
if (confidence < 0.6) return "Vision analysis uncertain";
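// Compare in lowercase so the reason matches the case-insensitive detection above
const lower = response.toLowerCase();
if (lower.includes("unable to")) return "LLM cannot proceed";
if (lower.includes("i need help")) return "LLM requested help";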
return "General execution error";
}
Slack/Discord notification:
{
"method": "POST",
"url": "{{$env.SLACK_WEBHOOK_URL}}",
"body": {
"text": "🚨 Agent needs help",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Task:* {{$json.originalTask}}\n*Reason:* {{$json.reason}}\n*Status:* Paused and awaiting human input"
}
},
{
"type": "actions",
"elements": [
{
"type": "button",
"text": {
"type": "plain_text",
"text": "View in Notion"
},
"url": "{{$json.notionTaskUrl}}"
}
]
}
]
}
}
Why this works:
Multiple failure detection methods catch different error types. Browser failures indicate technical issues. Low vision confidence suggests unclear screenshots. LLM responses containing "I need help" show reasoning limitations. The notification includes context (what task, why stuck) and a direct link to Notion for quick human intervention. This prevents the agent from repeatedly attempting impossible tasks.
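Building the notification payload in a Code node avoids hand-escaping newlines inside JSON. The following is a sketch of that step using the same Block Kit structure as the configuration above; the function name is illustrative, and omitting the button when no Notion URL is available is a design assumption of this sketch.

```javascript
// Build a Slack incoming-webhook payload for a stuck task
function buildEscalationMessage({ originalTask, reason, notionTaskUrl }) {
  const blocks = [
    {
      type: "section",
      text: {
        type: "mrkdwn",
        text: `*Task:* ${originalTask}\n*Reason:* ${reason}\n*Status:* Paused and awaiting human input`,
      },
    },
  ];
  // Only add the button when a URL is available
  if (notionTaskUrl) {
    blocks.push({
      type: "actions",
      elements: [
        {
          type: "button",
          text: { type: "plain_text", text: "View in Notion" },
          url: notionTaskUrl,
        },
      ],
    });
  }
  // The top-level text is the fallback shown in notifications
  return { text: `Agent needs help: ${originalTask}`, blocks };
}

const message = buildEscalationMessage({
  originalTask: "Extract the dashboard title",
  reason: "Browser automation failed",
  notionTaskUrl: "https://www.notion.so/example-task",
});
console.log(JSON.stringify(message, null, 2));
```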
Workflow Architecture Overview
This workflow consists of 18 nodes organized into 5 main sections:
- Task retrieval and memory (Nodes 1-5): Schedule trigger fires every 15 minutes, queries Notion for ready tasks, retrieves relevant SOPs from Qdrant vector memory
- Execution layer (Nodes 6-11): Puppeteer browser automation, screenshot capture, data extraction, vision API processing
- Reasoning engine (Nodes 12-14): Custom LLM analyzes results, consults SOPs, determines next actions
- Multimodal input (Nodes 15-16): Groq Whisper transcription for voice notes, separate trigger for audio file uploads
- Error handling (Nodes 17-18): Stuck detection logic, Slack/Discord human escalation
Execution flow:
- Trigger: Schedule (every 15 minutes) or webhook (for voice notes)
- Average run time: 45-90 seconds per task
- Key dependencies: Notion API, Qdrant, Groq, custom LLM endpoint, Puppeteer
Critical nodes:
- Schedule Trigger: Creates autonomous loop—agent doesn't wait for commands
- Qdrant HTTP Request: Retrieves SOPs before execution—this is the "learning" component
- Execute Command (Puppeteer): Browser automation—the agent's "hands"
- Custom LLM HTTP Request: Reasoning and decision-making—the swappable "brain"
- IF Node (Stuck Detection): Routes to human escalation when agent cannot proceed
The complete n8n workflow JSON template is available at the bottom of this article.
Critical Configuration Settings
Custom LLM Integration
Required environment variables:
- CUSTOM_LLM_BASE_URL: Your OpenAI-compatible endpoint (e.g., https://api.your-model.com/v1)
- CUSTOM_LLM_API_KEY: Authentication token for your model
- CUSTOM_LLM_MODEL: Model identifier (e.g., your-reasoning-model-v2)
Common issues:
- Using wrong API version → Check if your endpoint requires /v1 or /v2 suffix
- Authentication failures → Verify API key format (some require Bearer prefix, others don't)
- Model not found errors → Confirm model name matches exactly what your provider expects
Qdrant Vector Memory
Required fields:
- Collection name: sops (create this in Qdrant before first run)
- Vector dimensions: Must match your embedding model (for example, 1536 for OpenAI's text-embedding-ada-002, 768 for many sentence-transformers models)
- Distance metric: Cosine similarity (a common choice for semantic search over text embeddings)
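Creating the collection is a single PUT request against Qdrant's collections endpoint. The sketch below only builds that request so you can send it from an HTTP Request node or any HTTP client; the function name is illustrative and the default vector size of 1536 is an assumption that must be changed to match your embedding model.

```javascript
// Build the request that creates the SOP collection in Qdrant
function buildCreateCollectionRequest(qdrantUrl, apiKey, name = "sops", vectorSize = 1536) {
  return {
    method: "PUT",
    url: `${qdrantUrl.replace(/\/+$/, "")}/collections/${name}`,
    headers: { "api-key": apiKey, "Content-Type": "application/json" },
    // size must equal the embedding dimension; Cosine matches the metric above
    body: { vectors: { size: vectorSize, distance: "Cosine" } },
  };
}

const createRequest = buildCreateCollectionRequest("https://qdrant.example.com/", "test-key");
console.log(createRequest.url); // https://qdrant.example.com/collections/sops
```

If the collection's vector size differs from the length of the vectors you later search with, Qdrant rejects the query, which is the embedding mismatch described in the troubleshooting section.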
Why this approach:
Separating the reasoning model from the workflow means you can upgrade your "brain" without touching the "body." Testing a new model? Change one environment variable. Your custom model goes down? Swap in OpenAI's API as backup. This architecture treats intelligence as a pluggable component.
Puppeteer Configuration
Docker considerations for Elestio:
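- Install Chromium dependencies: apt-get install -y chromium-browser (the package name and package manager depend on your container's base image)
- Set the --no-sandbox flag (typically required when Chromium runs as root inside a Docker container)
- Allocate 2GB+ RAM for browser instances
- Use --disable-dev-shm-usage if you encounter shared memory errors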
Variables to customize:
- viewport: Adjust width/height for different screen sizes (mobile: 375x667, desktop: 1920x1080)
- waitUntil: Change from networkidle2 to load for faster execution on simple pages
- timeout: Increase from default 30s to 60s for slow-loading dashboards
Testing & Validation
Test each component independently:
- Task retrieval: Manually trigger the Schedule node, verify Notion returns expected tasks
- Vector memory: Query Qdrant directly with a test embedding, confirm SOP retrieval
- Browser automation: Run Puppeteer script outside n8n first, validate login and screenshot
- Vision API: Send a test image, review OCR and object detection accuracy
- Voice transcription: Upload a sample audio file, check transcription quality
- LLM reasoning: Test your custom model endpoint with curl before integrating
Run end-to-end validation:
Create a test task in Notion with known requirements:
- Task: "Log into example.com and extract the dashboard title"
- Expected SOP: Should retrieve "Dashboard Login Procedure" from Qdrant
- Expected result: Screenshot of logged-in page + extracted title text
Monitor execution in n8n:
- Check each node's output for expected data structure
- Verify Puppeteer completes without timeout errors
- Confirm vision API returns confidence >0.7
- Review LLM response for correct next action
Troubleshooting common issues:
| Issue | Cause | Solution |
| "No tasks found" every cycle | Notion filter too restrictive | Check Status field values match exactly |
| Puppeteer timeout | Page load too slow | Increase timeout to 60s, add explicit waits |
| Vision API low confidence | Screenshot quality poor | Increase viewport size, use PNG format |
| LLM gives generic responses | Insufficient context | Include full SOP text and all execution results |
| Qdrant returns no SOPs | Embedding mismatch | Verify vector dimensions match collection config |
Deployment Considerations
Production Deployment Checklist
| Area | Requirement | Why It Matters |
| Error Handling | Retry logic with exponential backoff | Prevents data loss on temporary API failures |
| Monitoring | Webhook health checks every 5 min | Detect failures within 5 minutes vs hours |
| Rate Limiting | Implement token bucket for APIs | Avoid hitting provider limits during burst activity |
| Logging | Store full execution logs for 30 days | Debug issues that only appear in production |
| Secrets Management | Use n8n credentials, never hardcode | Rotate API keys without workflow changes |
| Resource Limits | Set max concurrent executions to 3 | Prevent memory exhaustion from parallel browser instances |
| Backup Strategy | Export workflow JSON weekly | Recover quickly from accidental deletions |
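The retry requirement in the checklist can be sketched as a small helper for Code nodes or external scripts. The base delay, cap, and retry count below are illustrative assumptions, not values from this workflow.

```javascript
// Delay schedule: baseDelayMs * 2^attempt, capped at maxDelayMs
function backoffDelays(retries, baseDelayMs = 500, maxDelayMs = 8000) {
  return Array.from({ length: retries }, (_, attempt) =>
    Math.min(baseDelayMs * 2 ** attempt, maxDelayMs)
  );
}

// Retry an async function, sleeping between failed attempts.
// Makes one initial attempt plus up to `retries` retries.
async function retryWithBackoff(fn, retries = 4, baseDelayMs = 500) {
  const delays = backoffDelays(retries, baseDelayMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= delays.length) throw err; // out of retries
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}

console.log(backoffDelays(5)); // [ 500, 1000, 2000, 4000, 8000 ]
```

A call such as `retryWithBackoff(() => callVisionApi(image))` would wrap a hypothetical API helper; adding random jitter to each delay is a common refinement when many executions retry at once.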
Customization ideas:
- Add task prioritization: Implement urgency scoring based on task age and priority field
- Create execution reports: Send daily summaries of completed tasks, success rate, and stuck instances
- Implement learning feedback: Store successful execution patterns back to Qdrant for future reference
- Add multi-language support: Configure Whisper for multiple languages, route to appropriate LLM prompts
- Scale browser automation: Use BrowserBase or Browserless for managed browser infrastructure
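As one possible shape for the urgency scoring idea, the sketch below combines the priority field with task age. The weights, the seven-day cap, and the `createdAt` field name are all assumptions chosen for illustration; tune them to your own queue.

```javascript
// Illustrative weights: levels are 10 apart so capped age cannot cross a level
const PRIORITY_WEIGHT = { high: 30, medium: 20, low: 10 };
const MS_PER_DAY = 86400000;

function urgencyScore(task, now = new Date()) {
  const ageDays = Math.max(0, (now - new Date(task.createdAt)) / MS_PER_DAY);
  const priority = PRIORITY_WEIGHT[String(task.priority).toLowerCase()] ?? 0;
  // One point per day waiting, capped at 7: age breaks ties within a
  // priority level but never outranks a higher level
  return priority + Math.min(ageDays, 7);
}

const now = new Date("2024-01-10T00:00:00Z");
console.log(urgencyScore({ priority: "High", createdAt: "2024-01-09T00:00:00Z" }, now)); // 31
console.log(urgencyScore({ priority: "Low", createdAt: "2023-12-01T00:00:00Z" }, now)); // 17
```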
Use Cases & Variations
Use Case 1: Automated Competitive Intelligence
- Industry: SaaS, E-commerce
- Scale: 50+ competitor sites monitored daily
- Modifications needed: Add price extraction logic, store historical data in PostgreSQL, generate comparison reports
- Task example: "Check competitor pricing page, screenshot changes, extract new features"
Use Case 2: Customer Support Ticket Processing
- Industry: Support operations
- Scale: 200+ tickets/day
- Modifications needed: Replace Notion with Zendesk API, add sentiment analysis to vision results, route to appropriate team
- Task example: "Review support ticket screenshot, extract issue type, suggest SOP-based response"
Use Case 3: Data Entry from Invoices
- Industry: Accounting, Finance
- Scale: 500+ invoices/month
- Modifications needed: Add OCR validation, implement double-entry verification, connect to QuickBooks API
- Task example: "Extract invoice data from PDF screenshot, validate against PO, create accounting entry"
Use Case 4: Social Media Content Moderation
- Industry: Community management
- Scale: 1000+ posts/day
- Modifications needed: Add content policy SOPs to Qdrant, implement confidence-based auto-approval, flag edge cases
- Task example: "Review flagged post screenshot, check against community guidelines, approve or escalate"
Use Case 5: Research Report Generation
- Industry: Market research, Consulting
- Scale: 20+ reports/week
- Modifications needed: Add web scraping nodes, implement citation tracking, generate formatted documents
- Task example: "Research topic from voice note, gather data from 10 sources, compile findings into report"
Customizing This Workflow
Alternative Integrations
Instead of Notion:
- Airtable: Better for complex relational data - requires changing API endpoints in nodes 2-3, same filter logic applies
- Google Sheets: Simplest option for small teams - swap Notion node for Google Sheets node, use row numbers as task IDs
- Linear: Best for engineering teams - requires OAuth setup, provides better task dependencies
Instead of Qdrant:
- Pinecone: Managed vector DB with better scaling - change HTTP Request URLs, same query structure
- Weaviate: Better for hybrid search (vector + keyword) - requires GraphQL queries instead of REST
- Supabase pgvector: Best if you already use Supabase - use SQL queries, simpler setup
Instead of Puppeteer:
- Playwright: Better cross-browser support - nearly identical API, change require statement
- Browser Use library: Higher-level abstractions - reduces code but less control
- BrowserBase: Managed browser infrastructure - eliminates Docker setup, costs $0.01/minute
Workflow Extensions
Add automated reporting:
- Add a Schedule node to run daily at 6 PM
- Connect to Google Slides API or Notion page creation
- Generate executive summary with task completion stats, error rates, time saved
- Nodes needed: +6 (Schedule, HTTP Request for data aggregation, Function for calculations, Google Slides/Notion nodes)
Scale to handle more data:
- Replace Notion with PostgreSQL for >1000 tasks/day
- Add batch processing (process 10 tasks per cycle instead of 1)
- Implement Redis caching for frequently accessed SOPs
- Performance improvement: 5x faster for high-volume scenarios
Add human-in-the-loop approval:
- Insert an approval step before browser automation executes
- Send Slack message with task preview and "Approve/Reject" buttons
- Pause workflow execution until human responds
- Nodes needed: +4 (Slack send, Webhook wait, IF condition, Notion status update)
Integration possibilities:
| Add This | To Get This | Complexity |
| Slack integration | Real-time task notifications in channels | Easy (2 nodes) |
| Zapier webhook | Connect to 5000+ apps without custom code | Easy (1 node) |
| PostgreSQL | Store execution history and analytics | Medium (5 nodes) |
| Google Drive | Save screenshots and reports automatically | Medium (3 nodes) |
| Stripe API | Process payment-related tasks | Medium (6 nodes) |
| Twilio | SMS notifications for critical errors | Easy (2 nodes) |
| Airtable sync | Better data visualization and sharing | Medium (4 nodes) |
| Power BI connector | Executive dashboards and BI reports | Advanced (8 nodes) |
Get Started Today
Ready to build your autonomous agent?
- Download the template: The complete n8n workflow JSON is available at the bottom of this article
- Set up your infrastructure: Deploy n8n on Elestio, create Notion database, set up Qdrant collection
- Configure environment variables: Add all API keys and URLs to n8n credentials
- Install Puppeteer: Run npm install puppeteer in your n8n Docker container
- Import the workflow: Go to Workflows → Import from File, paste the JSON
- Test each component: Validate Notion connection, Qdrant retrieval, browser automation independently
- Run end-to-end test: Create a simple test task and watch the agent execute it
- Deploy to production: Set the schedule to active and monitor the first few cycles
Need help customizing this workflow for your specific needs? Want to integrate with proprietary systems or scale to handle thousands of tasks? Schedule an intro call with Atherial at https://atherial.ai/contact—we'll help you build a synthetic employee that actually works.
Complete n8n Workflow Template
Copy the JSON below and import it into your n8n instance via Workflows → Import from File.
{
"name": "Autonomous AI Agent with Multimodal Senses",
"nodes": [
{
"id": "interval-trigger",
"name": "Schedule Trigger (15min)",
"type": "n8n-nodes-base.interval",
"position": [
100,
100
],
"parameters": {
"unit": "minutes",
"interval": 15
},
"typeVersion": 1
},
{
"id": "fetch-notion-tasks",
"name": "Fetch Notion Tasks",
"type": "n8n-nodes-base.notion",
"position": [
300,
100
],
"parameters": {
"simple": true,
"resource": "databasePage",
"operation": "getAll"
},
"typeVersion": 2.2
},
{
"id": "filter-pending-tasks",
"name": "Filter Pending Tasks",
"type": "n8n-nodes-base.filter",
"position": [
500,
100
],
"parameters": {
"conditions": {
"options": {
"operator": {
"name": "filter.operator.equals",
"value": "=="
},
"leftValue": "{{ $json.Status }}",
"rightValue": "Pending",
"caseSensitive": false
}
}
},
"typeVersion": 1
},
{
"id": "process-task-input",
"name": "Prepare Task Input",
"type": "n8n-nodes-base.code",
"position": [
700,
100
],
"parameters": {
"mode": "runOnceForAllItems",
"jsCode": "return items.map(item => ({\n taskId: item.json.ID,\n title: item.json.Title,\n description: item.json.Description,\n priority: item.json.Priority || 'Normal',\n dueDate: item.json['Due Date'],\n voiceInput: item.json['Voice Input'] || null,\n imageInput: item.json['Image Input'] || null,\n status: 'Processing'\n}));",
"language": "javaScript"
},
"typeVersion": 2
},
{
"id": "check-voice-input",
"name": "Check Voice Input",
"type": "n8n-nodes-base.if",
"position": [
900,
50
],
"parameters": {
"conditions": {
"options": {
"operator": {
"name": "filter.operator.notEmpty",
"value": "notEmpty"
},
"leftValue": "{{ $json.voiceInput }}",
"caseSensitive": false
}
}
},
"typeVersion": 2
},
{
"id": "transcribe-voice-groq",
"name": "Transcribe Voice (Groq)",
"type": "n8n-nodes-base.httpRequest",
"position": [
1100,
10
],
"parameters": {
"url": "https://api.groq.com/openai/v1/audio/transcriptions",
"method": "POST",
"headers": {
"Authorization": "Bearer {{ $env.GROQ_API_KEY }}"
},
"sendBody": true,
"contentType": "multipart-form-data",
"authentication": "genericCredentialType",
"bodyParameters": {
"parameters": [
{
"name": "file",
"value": "{{ $json.voiceInput }}"
},
{
"name": "model",
"value": "whisper-large-v3-turbo"
}
]
}
},
"typeVersion": 4.3
},
{
"id": "extract-voice-text",
"name": "Extract Transcription",
"type": "n8n-nodes-base.code",
"position": [
1300,
10
],
"parameters": {
"mode": "runOnceForAllItems",
"jsCode": "return items.map(item => ({\n ...item.json,\n voiceTranscription: item.json.text || item.json.transcription || 'Unable to transcribe'\n}));",
"language": "javaScript"
},
"typeVersion": 2
},
{
"id": "process-vision-input",
"name": "Check Image Input",
"type": "n8n-nodes-base.if",
"position": [
900,
150
],
"parameters": {
"conditions": {
"options": {
"operator": {
"name": "filter.operator.notEmpty",
"value": "notEmpty"
},
"leftValue": "{{ $json.imageInput }}",
"caseSensitive": false
}
}
},
"typeVersion": 2
},
{
"id": "analyze-image-custom-api",
"name": "Analyze Image (Custom Vision)",
"type": "n8n-nodes-base.httpRequest",
"position": [
1100,
150
],
"parameters": {
"url": "{{ $env.CUSTOM_VISION_API_URL }}/analyze",
"body": {
"image": "{{ $json.imageInput }}",
"taskContext": "{{ $json.description }}"
},
"method": "POST",
"headers": {
"Authorization": "Bearer {{ $env.CUSTOM_API_KEY }}"
},
"sendBody": true,
"contentType": "json",
"authentication": "genericCredentialType"
},
"typeVersion": 4.3
},
{
"id": "merge-multimodal-inputs",
"name": "Merge Multimodal Inputs",
"type": "n8n-nodes-base.merge",
"position": [
1400,
100
],
"parameters": {
"mode": "merge"
},
"typeVersion": 3
},
{
"id": "retrieve-sop-from-qdrant",
"name": "Retrieve SOP from Qdrant",
"type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
"position": [
1600,
100
],
"parameters": {
"mode": "load",
"topK": 3,
"prompt": "{{ $json.title + ' ' + ($json.description || '') + ' ' + ($json.voiceTranscription || '') }}"
},
"typeVersion": 1.3
},
{
"id": "embeddings-for-context",
"name": "Generate Embeddings",
"type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
"position": [
1500,
50
],
"parameters": {
"model": "text-embedding-ada-002"
},
"typeVersion": 1.2
},
{
"id": "build-reasoning-prompt",
"name": "Build Reasoning Prompt",
"type": "n8n-nodes-base.code",
"position": [
1800,
100
],
"parameters": {
"mode": "runOnceForAllItems",
"jsCode": "return items.map(item => ({\n ...item.json,\n reasoningPrompt: `Task: ${item.json.title}\\nDescription: ${item.json.description}\\nVoice Input: ${item.json.voiceTranscription || 'None'}\\nImage Analysis: ${item.json.imageAnalysis || 'None'}\\nStandard Operating Procedure Context: ${item.json.sopContext || 'No SOP found'}\\n\\nPlease analyze this task thoroughly and provide a detailed action plan.`\n}));",
"language": "javaScript"
},
"typeVersion": 2
},
{
"id": "route-to-custom-llm",
"name": "Route to Custom LLM",
"type": "n8n-nodes-base.httpRequest",
"position": [
2000,
100
],
"parameters": {
"url": "{{ $env.CUSTOM_LLM_BASE_URL }}/chat/completions",
"body": {
"model": "{{ $env.CUSTOM_LLM_MODEL || 'gpt-4-turbo' }}",
"messages": [
{
"role": "system",
"content": "You are an autonomous AI agent that executes tasks with high-reasoning capability. You have access to standard operating procedures and multimodal context. Provide detailed, actionable analysis."
},
{
"role": "user",
"content": "{{ $json.reasoningPrompt }}"
}
],
"max_tokens": 2000,
"temperature": 0.7
},
"method": "POST",
"headers": {
"Content-Type": "application/json",
"Authorization": "Bearer {{ $env.CUSTOM_LLM_API_KEY }}"
},
"sendBody": true,
"contentType": "json",
"authentication": "genericCredentialType"
},
"typeVersion": 4.3
},
{
"id": "execute-task-action",
"name": "Execute Task Action",
"type": "n8n-nodes-base.code",
"position": [
2200,
100
],
"parameters": {
"mode": "runOnceForAllItems",
"jsCode": "return items.map(item => {\n const reasoning = item.json.choices?.[0]?.message?.content || item.json.content || '';\n return {\n ...item.json,\n taskId: item.json.taskId,\n aiReasoning: reasoning,\n executionStatus: 'ready',\n executedAt: new Date().toISOString()\n };\n});",
"language": "javaScript"
},
"typeVersion": 2
},
{
"id": "check-execution-success",
"name": "Check Execution Result",
"type": "n8n-nodes-base.if",
"position": [
2400,
100
],
"parameters": {
"conditions": {
"options": {
"operator": {
"name": "filter.operator.notEmpty",
"value": "notEmpty"
},
"leftValue": "{{ $json.aiReasoning }}",
"caseSensitive": false
}
}
},
"typeVersion": 2
},
{
"id": "update-task-status-success",
"name": "Update Task - Success",
"type": "n8n-nodes-base.notion",
"position": [
2550,
50
],
"parameters": {
"pageId": "{{ $json.taskId }}",
"resource": "databasePage",
"operation": "update",
"properties": {
"Result": "{{ $json.aiReasoning }}",
"Status": "Completed",
"Completed At": "{{ new Date().toISOString() }}"
}
},
"typeVersion": 2.2
},
{
"id": "handle-execution-error",
"name": "Handle Error - Request Help",
"type": "n8n-nodes-base.if",
"position": [
2550,
150
],
"parameters": {
"conditions": {
"options": {
"operator": {
"name": "filter.operator.equals",
"value": "=="
},
"leftValue": "{{ $json.executionStatus }}",
"rightValue": "error",
"caseSensitive": false
}
}
},
"typeVersion": 2
},
{
"id": "send-slack-alert",
"name": "Send Slack Alert",
"type": "n8n-nodes-base.slack",
"position": [
2700,
50
],
"parameters": {
"text": "🤖 AI Agent Task Completed\n\n*Task:* {{ $json.title }}\n*Result:* Task has been processed successfully\n\n_Reasoning applied:_\n{{ $json.aiReasoning }}",
"channel": "{{ $env.SLACK_CHANNEL || '#ai-agent-logs' }}",
"resource": "message",
"operation": "create"
},
"typeVersion": 2.3
},
{
"id": "send-discord-alert",
"name": "Send Discord Alert",
"type": "n8n-nodes-base.discord",
"position": [
2700,
150
],
"parameters": {
"text": "⚠️ AI Agent Help Request\n\nTask: {{ $json.title }}\nError: Unable to complete task\nReason: {{ $json.errorMessage || 'Unknown' }}\n\nPlease review and provide guidance.",
"guildId": "{{ $env.DISCORD_GUILD_ID }}",
"resource": "message",
"channelId": "{{ $env.DISCORD_CHANNEL_ID }}",
"operation": "send"
},
"typeVersion": 2
},
{
"id": "log-execution-metrics",
"name": "Log Execution Metrics",
"type": "n8n-nodes-base.code",
"position": [
2800,
100
],
"parameters": {
"mode": "runOnceForAllItems",
"jsCode": "const metrics = {\n timestamp: new Date().toISOString(),\n tasksProcessed: $input.all().length,\n successCount: $input.all().filter(t => t.json.executionStatus === 'ready').length,\n failureCount: $input.all().filter(t => t.json.executionStatus === 'error').length,\n avgProcessingTime: 0\n};\n\nreturn [{ json: metrics }];",
"language": "javaScript"
},
"typeVersion": 2
}
],
"connections": {
"Send Slack Alert": {
"main": [
[
{
"node": "Log Execution Metrics",
"type": "main",
"index": 0
}
]
]
},
"Check Image Input": {
"main": [
[
{
"node": "Analyze Image (Custom Vision)",
"type": "main",
"index": 0
}
],
[
{
"node": "Merge Multimodal Inputs",
"type": "main",
"index": 1
}
]
]
},
"Check Voice Input": {
"main": [
[
{
"node": "Transcribe Voice (Groq)",
"type": "main",
"index": 0
}
],
[
{
"node": "Check Image Input",
"type": "main",
"index": 0
}
]
]
},
"Fetch Notion Tasks": {
"main": [
[
{
"node": "Filter Pending Tasks",
"type": "main",
"index": 0
}
]
]
},
"Prepare Task Input": {
"main": [
[
{
"node": "Check Voice Input",
"type": "main",
"index": 0
}
]
]
},
"Send Discord Alert": {
"main": [
[
{
"node": "Log Execution Metrics",
"type": "main",
"index": 0
}
]
]
},
"Execute Task Action": {
"main": [
[
{
"node": "Check Execution Result",
"type": "main",
"index": 0
}
]
]
},
"Route to Custom LLM": {
"main": [
[
{
"node": "Execute Task Action",
"type": "main",
"index": 0
}
]
]
},
"Filter Pending Tasks": {
"main": [
[
{
"node": "Prepare Task Input",
"type": "main",
"index": 0
}
]
]
},
"Extract Transcription": {
"main": [
[
{
"node": "Merge Multimodal Inputs",
"type": "main",
"index": 0
}
]
]
},
"Update Task - Success": {
"main": [
[
{
"node": "Send Slack Alert",
"type": "main",
"index": 0
}
]
]
},
"Build Reasoning Prompt": {
"main": [
[
{
"node": "Route to Custom LLM",
"type": "main",
"index": 0
}
]
]
},
"Check Execution Result": {
"main": [
[
{
"node": "Update Task - Success",
"type": "main",
"index": 0
}
],
[
{
"node": "Handle Error - Request Help",
"type": "main",
"index": 0
}
]
]
},
"Merge Multimodal Inputs": {
"main": [
[
{
"node": "Retrieve SOP from Qdrant",
"type": "main",
"index": 0
}
]
]
},
"Transcribe Voice (Groq)": {
"main": [
[
{
"node": "Extract Transcription",
"type": "main",
"index": 0
}
]
]
},
"Retrieve SOP from Qdrant": {
"main": [
[
{
"node": "Build Reasoning Prompt",
"type": "main",
"index": 0
}
]
]
},
"Schedule Trigger (15min)": {
"main": [
[
{
"node": "Fetch Notion Tasks",
"type": "main",
"index": 0
}
]
]
},
"Handle Error - Request Help": {
"main": [
[
{
"node": "Send Discord Alert",
"type": "main",
"index": 0
}
],
[
{
"node": "Log Execution Metrics",
"type": "main",
"index": 0
}
]
]
},
"Analyze Image (Custom Vision)": {
"main": [
[
{
"node": "Merge Multimodal Inputs",
"type": "main",
"index": 1
}
]
]
}
}
}