Most automation tools execute single tasks. This n8n agent operates as a complete synthetic employee—checking its own task list, using browser automation to complete work, seeing images, hearing voice commands, and learning from documented procedures. You'll build an autonomous loop that runs every 15 minutes, pulling tasks from Notion, executing them with Puppeteer browser control, and consulting vector memory for SOPs before taking action.
The Problem: Manual Task Execution Doesn't Scale
Your team drowns in repetitive browser-based tasks. Someone must log into dashboards, extract data, screenshot pages, and compile reports. Each task requires human attention, even when the steps are identical every time.
Current challenges:
- Employees spend 8-12 hours weekly on repetitive browser tasks
- Voice notes and images require manual processing before action
- Documented SOPs sit unused in Notion—no one checks them before starting work
- Tasks pile up when team members are unavailable
- No system remembers context between related tasks
Business impact:
- Time spent: 40+ hours per month on automatable work
- Error rate: 15-20% when procedures aren't followed exactly
- Response delay: 4-8 hours between task assignment and completion
- Knowledge loss: Procedures exist but aren't consulted consistently
The Solution Overview
This n8n workflow creates an autonomous agent that operates on a 15-minute loop. It checks Notion for assigned tasks, retrieves relevant SOPs from Qdrant vector memory, executes browser automation with Puppeteer, processes images through vision APIs, and transcribes voice commands via Groq Whisper. The entire system runs self-hosted on Elestio, using a custom high-reasoning LLM that's OpenAI-compatible. When the agent encounters obstacles, it requests human help through Slack or Discord. This architecture separates the "brain" (your custom reasoning API) from the "body" (the n8n execution layer), making the intelligence layer completely swappable.
What You'll Build
This autonomous agent system delivers complete multimodal task execution with memory and learning capabilities.
| Component | Technology | Purpose |
|---|---|---|
| Task Queue | Notion Database | Centralized task assignment and status tracking |
| Execution Loop | n8n Schedule Trigger | 15-minute autonomous check-and-execute cycle |
| Browser Automation | Puppeteer/Playwright | Login, navigation, data extraction, screenshots |
| Vision Processing | Custom Vision API | Image analysis and visual data extraction |
| Voice Transcription | Groq Whisper API | Voice note to text conversion |
| Vector Memory | Qdrant | SOP storage and contextual retrieval |
| Reasoning Engine | Custom OpenAI-compatible LLM | Task planning and decision-making |
| Error Handling | Slack/Discord Webhooks | Human escalation when stuck |
| Hosting Infrastructure | Elestio | Self-hosted n8n instance with full control |
Prerequisites
Before starting, ensure you have:
- n8n instance on Elestio (or self-hosted with Docker)
- Notion workspace with API integration enabled
- Qdrant vector database instance (cloud or self-hosted)
- Groq API account with Whisper model access
- Custom vision API endpoint and credentials
- Your OpenAI-compatible reasoning model URL and API key
- Slack or Discord webhook URL for notifications
- Basic JavaScript knowledge for Function nodes
- Understanding of REST API authentication
Step 1: Configure the Autonomous Task Loop
The agent's "body" starts with a Schedule Trigger that fires every 15 minutes. This creates the autonomous loop—the agent doesn't wait for human commands.
Configure the Schedule Trigger:
- Add a Schedule Trigger node to your workflow
- Set interval to "Every 15 Minutes"
- Configure timezone to match your operation hours
- Add execution conditions to prevent off-hours runs if needed
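If you enable the off-hours guard, here is a minimal sketch for a Code node placed between the Schedule Trigger and the Notion query. The Monday-Friday, 09:00-18:00 window is an assumption; adjust it to your operating hours:
// Code node: let items through only during business hours.
// The Mon-Fri, 09:00-18:00 window is illustrative; match it to your operation.
const now = new Date(); // interpreted in the n8n instance timezone
const hour = now.getHours();
const day = now.getDay(); // 0 = Sunday, 6 = Saturday

const withinHours = day >= 1 && day <= 5 && hour >= 9 && hour < 18;

// Returning an empty array ends this cycle without executing any tasks.
return withinHours ? $input.all() : [];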
Connect to Notion Database:
- Add a Notion node after the Schedule Trigger
- Select operation: "Get Database Items"
- Configure filters to retrieve only tasks with status "Ready" or "Assigned"
- Sort by priority field (descending) to handle urgent tasks first
Node configuration:
{
"databaseId": "{{$env.NOTION_DATABASE_ID}}",
"filters": {
"and": [
{
"property": "Status",
"select": {
"equals": "Ready"
}
}
]
},
"sorts": [
{
"property": "Priority",
"direction": "descending"
}
]
}
Why this works:
The 15-minute interval balances responsiveness with API rate limits. Notion's filter system ensures the agent only sees actionable tasks, preventing wasted execution cycles. Priority sorting means urgent work gets handled first, even when multiple tasks queue up.
Step 2: Implement Vector Memory with Qdrant
Before executing any task, the agent must check Qdrant for relevant SOPs. This is the "memory" component—the agent learns from documented procedures.
Set Up Qdrant Connection:
- Add an HTTP Request node after the Notion retrieval
- Configure authentication with your Qdrant API key
- Set method to POST for vector search
- Build the search query using task description as context
Query construction:
// In a Function (Code) node before the Qdrant HTTP Request.
// generateEmbedding() is a placeholder for your embedding model call
// (one possible implementation is sketched below).
const taskDescription =
  $input.item.json.properties.Description.rich_text[0]?.plain_text ?? '';

return {
  json: {
    vector: await generateEmbedding(taskDescription),
    limit: 3,
    score_threshold: 0.7,
    with_payload: true
  }
};
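The generateEmbedding() call above is a stand-in. One way to implement it, assuming your embedding model exposes an OpenAI-compatible /embeddings endpoint (the EMBEDDING_* environment variables are illustrative, not part of the original workflow). Paste this above the return in the same node:
// Hypothetical embedding helper; relies on n8n's this.helpers.httpRequest.
const generateEmbedding = async (text) => {
  const response = await this.helpers.httpRequest({
    method: 'POST',
    url: `${$env.EMBEDDING_API_URL}/v1/embeddings`,
    headers: {
      Authorization: `Bearer ${$env.EMBEDDING_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: { model: $env.EMBEDDING_MODEL, input: text },
    json: true,
  });
  return response.data[0].embedding; // OpenAI-style response shape
};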
Qdrant HTTP Request configuration:
{
"method": "POST",
"url": "{{$env.QDRANT_URL}}/collections/sops/points/search",
"authentication": "headerAuth",
"headerAuth": {
"name": "api-key",
"value": "={{$env.QDRANT_API_KEY}}"
},
"body": {
"vector": "={{$json.vector}}",
"limit": 3,
"score_threshold": 0.7,
"with_payload": true
}
}
Why this approach:
Vector search retrieves SOPs semantically related to the task, not just keyword matches. A score threshold of 0.7 filters out irrelevant procedures. Limiting to 3 results prevents context overload while providing enough guidance. The agent now has "institutional memory" without hardcoded rules.
Variables to customize:
- `limit`: Increase to 5 for complex tasks requiring multiple procedures
- `score_threshold`: Lower to 0.6 if you're getting too few results, raise to 0.8 for stricter matching
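Downstream steps reference the retrieved procedure as {{$json.sop}}, so it helps to flatten Qdrant's hits into that single field. Here is a sketch for a Code node after the search request; it assumes each point's payload was stored with title and content keys, so adjust to your own schema:
// Merge Qdrant search hits into one SOP context string for the LLM.
const hits = $input.item.json.result ?? []; // Qdrant returns matches under "result"

const sop = hits
  .map((hit, i) => `SOP ${i + 1} (score ${hit.score.toFixed(2)}): ${hit.payload.title}\n${hit.payload.content}`)
  .join('\n\n');

return {
  json: {
    sop: sop || 'No relevant SOP found. Proceed cautiously and escalate if unsure.',
    sopCount: hits.length,
  },
};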
Step 3: Build Browser Automation with Puppeteer
The agent's "hands" use Puppeteer to control a headless browser. This test case demonstrates login, screenshot, and data extraction.
Install Puppeteer in n8n:
Your Elestio n8n instance needs Puppeteer installed. Add this to your Docker configuration or run in the container:
npm install puppeteer
Configure the Execute Command Node:
- Add an Execute Command node after retrieving the SOP
- Set command to run a Node.js script
- Pass task parameters as environment variables
Puppeteer automation script:
// In a Function node that generates the Puppeteer script
const puppeteerScript = `
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
// Navigate to target URL
await page.goto('${$json.targetUrl}', { waitUntil: 'networkidle2' });
// Login sequence
await page.type('#username', '${$env.TARGET_USERNAME}');
await page.type('#password', '${$env.TARGET_PASSWORD}');
await page.click('button[type="submit"]');
await page.waitForNavigation({ waitUntil: 'networkidle2' });
// Take screenshot
const screenshot = await page.screenshot({
encoding: 'base64',
fullPage: true
});
// Extract data
const data = await page.evaluate(() => {
return {
title: document.querySelector('h1')?.innerText ?? '', // avoid crashing if the page has no h1
stats: Array.from(document.querySelectorAll('.stat-value')).map(el => el.innerText)
};
});
await browser.close();
console.log(JSON.stringify({ screenshot, data }));
})();
`;
return { json: { script: puppeteerScript } };
Execute the browser automation:
Add an Execute Command node with:
- Command: `node`
- Arguments: Pass the script via stdin or a temp file (see the sketch below)
- Capture stdout to retrieve the screenshot and extracted data
Why this works:
Puppeteer runs in headless mode, consuming minimal resources. The waitUntil: 'networkidle2' ensures pages fully load before interaction. Base64 screenshot encoding allows direct storage in Notion or transmission via API without file system dependencies. The evaluate() method runs JavaScript in the browser context, enabling complex data extraction.
Common issues:
- Timeout errors → Increase the `waitUntil` timeout or add an explicit `page.waitForSelector()` call
- Login failures → Add `await page.waitForTimeout(2000)` after form submission
- Missing data → Use browser DevTools to verify selectors match the actual DOM structure
Step 4: Integrate Vision API for Image Processing
The agent's "eyes" process images through a custom vision API. This handles screenshots from Puppeteer or images attached to Notion tasks.
Configure Vision API Node:
- Add an HTTP Request node after Puppeteer execution
- Set method to POST
- Configure custom authentication headers
- Send base64-encoded image in request body
Vision API request configuration:
{
"method": "POST",
"url": "{{$env.VISION_API_URL}}/analyze",
"authentication": "headerAuth",
"headerAuth": {
"name": "Authorization",
"value": "Bearer {{$env.VISION_API_KEY}}"
},
"body": {
"image": "={{$json.screenshot}}",
"tasks": ["ocr", "object_detection", "scene_understanding"],
"detail": "high"
},
"options": {
"timeout": 30000
}
}
Process vision results:
// Function node to parse vision API response
const visionResults = $input.item.json;
return {
json: {
extractedText: visionResults.ocr.text,
detectedObjects: visionResults.objects.map(obj => obj.label),
sceneDescription: visionResults.scene.description,
confidence: visionResults.confidence_score
}
};
Why this approach:
Requesting multiple analysis tasks (OCR, object detection, scene understanding) in one API call reduces latency. The 30-second timeout accommodates large images. High detail mode improves accuracy for dashboard screenshots with small text. The confidence score lets you flag low-quality results for human review.
Step 5: Add Voice Transcription with Groq Whisper
The agent's "ears" transcribe voice notes instantly using Groq's Whisper API. This allows verbal task assignment.
Configure Groq Whisper Node:
- Add an HTTP Request node triggered by voice note upload
- Set method to POST with multipart/form-data
- Configure Groq API authentication
- Send audio file for transcription
Groq Whisper configuration:
{
"method": "POST",
"url": "https://api.groq.com/openai/v1/audio/transcriptions",
"authentication": "headerAuth",
"headerAuth": {
"name": "Authorization",
"value": "Bearer {{$env.GROQ_API_KEY}}"
},
"body": {
"file": "={{$binary.audio}}",
"model": "whisper-large-v3",
"language": "en",
"response_format": "json",
"temperature": 0.0
}
}
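Before wiring this into n8n, you can sanity-check the endpoint with a standalone script. This sketch needs Node 18+ for the global fetch, FormData, and Blob; save it as an .mjs file so top-level await works. The audio filename is an assumption:
// Verify Groq transcription outside n8n before integrating.
import { readFile } from 'node:fs/promises';

const form = new FormData();
form.append('file', new Blob([await readFile('./voice-note.m4a')]), 'voice-note.m4a');
form.append('model', 'whisper-large-v3');
form.append('language', 'en');
form.append('response_format', 'json');
form.append('temperature', '0');

const res = await fetch('https://api.groq.com/openai/v1/audio/transcriptions', {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.GROQ_API_KEY}` },
  body: form,
});

console.log((await res.json()).text);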
Process transcription and create task:
// Function node to convert transcription to Notion task
const transcription = $input.item.json.text;
// Extract task components using simple parsing
const taskMatch = transcription.match(/create a task to (.+)/i);
const priorityMatch = transcription.match(/priority (high|medium|low)/i);
return {
json: {
taskDescription: taskMatch ? taskMatch[1] : transcription,
priority: priorityMatch ? priorityMatch[1] : "medium",
status: "Ready",
source: "voice_note",
timestamp: new Date().toISOString()
}
};
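To push that output into the queue, follow with a Notion node (Create Database Page operation) or a raw POST to https://api.notion.com/v1/pages. Here is a request-body sketch; the property names and types are assumptions that must match your database schema:
{
  "parent": { "database_id": "{{$env.NOTION_DATABASE_ID}}" },
  "properties": {
    "Name": { "title": [{ "text": { "content": "{{$json.taskDescription}}" } }] },
    "Status": { "select": { "name": "{{$json.status}}" } },
    "Priority": { "select": { "name": "{{$json.priority}}" } },
    "Source": { "rich_text": [{ "text": { "content": "{{$json.source}}" } }] }
  }
}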
Why this works:
Groq's Whisper model delivers transcription in under 2 seconds for typical voice notes. Temperature 0.0 ensures deterministic output—the same audio always produces identical text. JSON response format simplifies parsing. The language parameter optimizes for English, improving accuracy. Simple regex parsing extracts task details without requiring LLM processing, reducing latency and cost.
Step 6: Connect Your Custom Reasoning Model
The agent's "brain" uses your custom high-reasoning LLM. This step makes the intelligence layer completely swappable.
Configure OpenAI-Compatible Node:
- Add an OpenAI node (or HTTP Request node)
- Use environment variables for base URL and API key
- Structure prompts to include task context, SOP guidance, and execution results
OpenAI node configuration:
{
"resource": "chat",
"operation": "create",
"options": {
"baseURL": "={{$env.CUSTOM_LLM_BASE_URL}}",
"apiKey": "={{$env.CUSTOM_LLM_API_KEY}}"
},
"messages": [
{
"role": "system",
"content": "You are an autonomous task execution agent. Follow SOPs exactly. If you encounter errors, explain what went wrong and what help you need."
},
{
"role": "user",
"content": "Task: {{$json.taskDescription}}
Relevant SOP: {{$json.sop}}
Browser automation result: {{$json.browserResult}}
Vision analysis: {{$json.visionResult}}
What is the next action?"
}
],
"model": "{{$env.CUSTOM_LLM_MODEL}}",
"temperature": 0.3,
"max_tokens": 1000
}
Why this approach:
Using environment variables for baseURL and apiKey means you swap models by changing two values—no workflow editing required. The OpenAI-compatible format works with any provider (OpenRouter, Together AI, Anyscale, or your private deployment). Low temperature (0.3) ensures consistent reasoning. The system prompt establishes agent behavior. The user prompt provides complete context: what to do (task), how to do it (SOP), what happened (results), and what was seen (vision).
Variables to customize:
- `temperature`: Increase to 0.5-0.7 for creative tasks, keep at 0.1-0.3 for procedural work
- `max_tokens`: Increase to 2000 for complex reasoning chains
- `model`: Point to different model versions without workflow changes
Step 7: Implement Error Handling and Human Escalation
When the agent gets stuck, it requests help through Slack or Discord. This prevents silent failures.
Configure Error Detection:
// Function node to evaluate if agent is stuck
const llmResponse = $input.item.json.choices[0].message.content;
const browserSuccess = $input.item.json.browserResult.success;
const confidenceScore = $input.item.json.visionResult.confidence;
const isStuck =
llmResponse.toLowerCase().includes("i need help") ||
llmResponse.toLowerCase().includes("unable to") ||
!browserSuccess ||
confidenceScore < 0.6;
return {
json: {
stuck: isStuck,
reason: isStuck ? determineReason(llmResponse, browserSuccess, confidenceScore) : null,
originalTask: $input.item.json.taskDescription
}
};
function determineReason(response, browserSuccess, confidence) {
if (!browserSuccess) return "Browser automation failed";
if (confidence < 0.6) return "Vision analysis uncertain";
if (response.toLowerCase().includes("unable to")) return "LLM cannot proceed"; // case-insensitive, matching the check above
return "General execution error";
}
Slack/Discord notification:
{
"method": "POST",
"url": "{{$env.SLACK_WEBHOOK_URL}}",
"body": {
"text": "🚨 Agent needs help",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Task:* {{$json.originalTask}}
*Reason:* {{$json.reason}}
*Status:* Paused and awaiting human input"
}
},
{
"type": "actions",
"elements": [
{
"type": "button",
"text": {
"type": "plain_text",
"text": "View in Notion"
},
"url": "{{$json.notionTaskUrl}}"
}
]
}
]
}
}
Why this works:
Multiple failure detection methods catch different error types. Browser failures indicate technical issues. Low vision confidence suggests unclear screenshots. LLM responses containing "I need help" show reasoning limitations. The notification includes context (what task, why stuck) and a direct link to Notion for quick human intervention. This prevents the agent from repeatedly attempting impossible tasks.
Workflow Architecture Overview
This workflow consists of 18 nodes organized into 5 main sections:
- Task retrieval and memory (Nodes 1-5): Schedule trigger fires every 15 minutes, queries Notion for ready tasks, retrieves relevant SOPs from Qdrant vector memory
- Execution layer (Nodes 6-11): Puppeteer browser automation, screenshot capture, data extraction, vision API processing
- Reasoning engine (Nodes 12-14): Custom LLM analyzes results, consults SOPs, determines next actions
- Multimodal input (Nodes 15-16): Groq Whisper transcription for voice notes, separate trigger for audio file uploads
- Error handling (Nodes 17-18): Stuck detection logic, Slack/Discord human escalation
Execution flow:
- Trigger: Schedule (every 15 minutes) or webhook (for voice notes)
- Average run time: 45-90 seconds per task
- Key dependencies: Notion API, Qdrant, Groq, custom LLM endpoint, Puppeteer
Critical nodes:
- Schedule Trigger: Creates autonomous loop—agent doesn't wait for commands
- Qdrant HTTP Request: Retrieves SOPs before execution—this is the "learning" component
- Execute Command (Puppeteer): Browser automation—the agent's "hands"
- Custom LLM HTTP Request: Reasoning and decision-making—the swappable "brain"
- IF Node (Stuck Detection): Routes to human escalation when agent cannot proceed
The complete n8n workflow JSON template is available at the bottom of this article.
Critical Configuration Settings
Custom LLM Integration
Required environment variables:
- `CUSTOM_LLM_BASE_URL`: Your OpenAI-compatible endpoint (e.g., https://api.your-model.com/v1)
- `CUSTOM_LLM_API_KEY`: Authentication token for your model
- `CUSTOM_LLM_MODEL`: Model identifier (e.g., your-reasoning-model-v2)
Common issues:
- Using wrong API version → Check whether your endpoint requires a `/v1` or `/v2` suffix
- Authentication failures → Verify the API key format (some providers require a `Bearer` prefix, others don't)
- Model not found errors → Confirm the model name matches exactly what your provider expects

Why this approach:
Separating the reasoning model from the workflow means you can upgrade your "brain" without touching the "body." Testing a new model? Change one environment variable. Your custom model goes down? Swap in OpenAI's API as backup. This architecture treats intelligence as a pluggable component.
Qdrant Vector Memory
Required fields:
- Collection name: `sops` (create this in Qdrant before first run)
- Vector dimensions: Must match your embedding model (typically 1536 for OpenAI, 768 for sentence-transformers)
- Distance metric: Cosine similarity (best for semantic search)
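If the collection doesn't exist yet, a one-time HTTP Request node can create it. This sketch assumes 1536-dimension OpenAI-style embeddings; change `size` to match your model:
{
  "method": "PUT",
  "url": "{{$env.QDRANT_URL}}/collections/sops",
  "headerAuth": {
    "name": "api-key",
    "value": "={{$env.QDRANT_API_KEY}}"
  },
  "body": {
    "vectors": {
      "size": 1536,
      "distance": "Cosine"
    }
  }
}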
Puppeteer Configuration
Docker considerations for Elestio:
- Install Chromium dependencies: `apt-get install -y chromium-browser`
- Set the `--no-sandbox` flag (required in Docker containers)
- Allocate 2GB+ RAM for browser instances
- Use `--disable-dev-shm-usage` if you encounter shared memory errors
Variables to customize:
- `viewport`: Adjust width/height for different screen sizes (mobile: 375x667, desktop: 1920x1080)
- `waitUntil`: Change from `networkidle2` to `load` for faster execution on simple pages
- `timeout`: Increase from the default 30s to 60s for slow-loading dashboards
Testing & Validation
Test each component independently:
- Task retrieval: Manually trigger the Schedule node, verify Notion returns expected tasks
- Vector memory: Query Qdrant directly with a test embedding, confirm SOP retrieval
- Browser automation: Run Puppeteer script outside n8n first, validate login and screenshot
- Vision API: Send a test image, review OCR and object detection accuracy
- Voice transcription: Upload a sample audio file, check transcription quality
- LLM reasoning: Test your custom model endpoint with curl before integrating
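A quick Node-based equivalent of that curl check (Node 18+, saved as an .mjs file so top-level await works; assumes an OpenAI-compatible /chat/completions route):
// Sanity-check the custom reasoning endpoint before integrating.
const res = await fetch(`${process.env.CUSTOM_LLM_BASE_URL}/chat/completions`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.CUSTOM_LLM_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: process.env.CUSTOM_LLM_MODEL,
    messages: [{ role: 'user', content: 'Reply with the single word: ready' }],
    max_tokens: 10,
  }),
});

console.log((await res.json()).choices[0].message.content);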
Run end-to-end validation:
Create a test task in Notion with known requirements:
- Task: "Log into example.com and extract the dashboard title"
- Expected SOP: Should retrieve "Dashboard Login Procedure" from Qdrant
- Expected result: Screenshot of logged-in page + extracted title text
Monitor execution in n8n:
- Check each node's output for expected data structure
- Verify Puppeteer completes without timeout errors
- Confirm vision API returns confidence >0.7
- Review LLM response for correct next action
Troubleshooting common issues:
| Issue | Cause | Solution |
|---|---|---|
| "No tasks found" every cycle | Notion filter too restrictive | Check Status field values match exactly |
| Puppeteer timeout | Page load too slow | Increase timeout to 60s, add explicit waits |
| Vision API low confidence | Screenshot quality poor | Increase viewport size, use PNG format |
| LLM gives generic responses | Insufficient context | Include full SOP text and all execution results |
| Qdrant returns no SOPs | Embedding mismatch | Verify vector dimensions match collection config |
Deployment Considerations
Production Deployment Checklist
| Area | Requirement | Why It Matters |
|---|---|---|
| Error Handling | Retry logic with exponential backoff | Prevents data loss on temporary API failures |
| Monitoring | Webhook health checks every 5 min | Detect failures within 5 minutes vs hours |
| Rate Limiting | Implement token bucket for APIs | Avoid hitting provider limits during burst activity |
| Logging | Store full execution logs for 30 days | Debug issues that only appear in production |
| Secrets Management | Use n8n credentials, never hardcode | Rotate API keys without workflow changes |
| Resource Limits | Set max concurrent executions to 3 | Prevent memory exhaustion from parallel browser instances |
| Backup Strategy | Export workflow JSON weekly | Recover quickly from accidental deletions |
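The retry requirement in the first row can live in one shared helper reused by any Code node. Here is a minimal sketch; the attempt count and base delay are assumptions to tune against your providers' rate limits:
// Generic retry with exponential backoff for flaky API calls.
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === attempts - 1) throw error; // out of retries: surface the error
      const delayMs = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: wrap any fragile call, e.g. the Qdrant search or the LLM request.
// const result = await withRetry(() => this.helpers.httpRequest(options));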
Customization ideas:
- Add task prioritization: Implement urgency scoring based on task age and priority field
- Create execution reports: Send daily summaries of completed tasks, success rate, and stuck instances
- Implement learning feedback: Store successful execution patterns back to Qdrant for future reference
- Add multi-language support: Configure Whisper for multiple languages, route to appropriate LLM prompts
- Scale browser automation: Use BrowserBase or Browserless for managed browser infrastructure
Use Cases & Variations
Use Case 1: Automated Competitive Intelligence
- Industry: SaaS, E-commerce
- Scale: 50+ competitor sites monitored daily
- Modifications needed: Add price extraction logic, store historical data in PostgreSQL, generate comparison reports
- Task example: "Check competitor pricing page, screenshot changes, extract new features"
Use Case 2: Customer Support Ticket Processing
- Industry: Support operations
- Scale: 200+ tickets/day
- Modifications needed: Replace Notion with Zendesk API, add sentiment analysis to vision results, route to appropriate team
- Task example: "Review support ticket screenshot, extract issue type, suggest SOP-based response"
Use Case 3: Data Entry from Invoices
- Industry: Accounting, Finance
- Scale: 500+ invoices/month
- Modifications needed: Add OCR validation, implement double-entry verification, connect to QuickBooks API
- Task example: "Extract invoice data from PDF screenshot, validate against PO, create accounting entry"
Use Case 4: Social Media Content Moderation
- Industry: Community management
- Scale: 1000+ posts/day
- Modifications needed: Add content policy SOPs to Qdrant, implement confidence-based auto-approval, flag edge cases
- Task example: "Review flagged post screenshot, check against community guidelines, approve or escalate"
Use Case 5: Research Report Generation
- Industry: Market research, Consulting
- Scale: 20+ reports/week
- Modifications needed: Add web scraping nodes, implement citation tracking, generate formatted documents
- Task example: "Research topic from voice note, gather data from 10 sources, compile findings into report"
Customizing This Workflow
Alternative Integrations
Instead of Notion:
- Airtable: Better for complex relational data - requires changing API endpoints in nodes 2-3, same filter logic applies
- Google Sheets: Simplest option for small teams - swap Notion node for Google Sheets node, use row numbers as task IDs
- Linear: Best for engineering teams - requires OAuth setup, provides better task dependencies
Instead of Qdrant:
- Pinecone: Managed vector DB with better scaling - change HTTP Request URLs, same query structure
- Weaviate: Better for hybrid search (vector + keyword) - requires GraphQL queries instead of REST
- Supabase pgvector: Best if you already use Supabase - use SQL queries, simpler setup
Instead of Puppeteer:
- Playwright: Better cross-browser support - nearly identical API, change require statement
- Browser Use library: Higher-level abstractions - reduces code but less control
- BrowserBase: Managed browser infrastructure - eliminates Docker setup, costs $0.01/minute
Workflow Extensions
Add automated reporting:
- Add a Schedule node to run daily at 6 PM
- Connect to Google Slides API or Notion page creation
- Generate executive summary with task completion stats, error rates, time saved
- Nodes needed: +6 (Schedule, HTTP Request for data aggregation, Function for calculations, Google Slides/Notion nodes)
Scale to handle more data:
- Replace Notion with PostgreSQL for >1000 tasks/day
- Add batch processing (process 10 tasks per cycle instead of 1; see the sketch after this list)
- Implement Redis caching for frequently accessed SOPs
- Performance improvement: 5x faster for high-volume scenarios
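For the batch-processing item, a Code node between the Notion query and the execution branch can cap throughput per cycle. A sketch using the suggested batch size of 10:
// Process up to 10 ready tasks per 15-minute cycle, highest priority first
// (the Notion query already sorts by priority).
const BATCH_SIZE = 10;

return $input.all().slice(0, BATCH_SIZE);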
Add human-in-the-loop approval:
- Insert an approval step before browser automation executes
- Send Slack message with task preview and "Approve/Reject" buttons
- Pause workflow execution until human responds
- Nodes needed: +4 (Slack send, Webhook wait, IF condition, Notion status update)
Integration possibilities:
| Add This | To Get This | Complexity |
|---|---|---|
| Slack integration | Real-time task notifications in channels | Easy (2 nodes) |
| Zapier webhook | Connect to 5000+ apps without custom code | Easy (1 node) |
| PostgreSQL | Store execution history and analytics | Medium (5 nodes) |
| Google Drive | Save screenshots and reports automatically | Medium (3 nodes) |
| Stripe API | Process payment-related tasks | Medium (6 nodes) |
| Twilio | SMS notifications for critical errors | Easy (2 nodes) |
| Airtable sync | Better data visualization and sharing | Medium (4 nodes) |
| Power BI connector | Executive dashboards and BI reports | Advanced (8 nodes) |
Get Started Today
Ready to build your autonomous agent?
- Download the template: The complete n8n workflow JSON is available at the bottom of this article
- Set up your infrastructure: Deploy n8n on Elestio, create Notion database, set up Qdrant collection
- Configure environment variables: Add all API keys and URLs to n8n credentials
- Install Puppeteer: Run `npm install puppeteer` in your n8n Docker container
- Import the workflow: Go to Workflows → Import from File, paste the JSON
- Test each component: Validate Notion connection, Qdrant retrieval, browser automation independently
- Run end-to-end test: Create a simple test task and watch the agent execute it
- Deploy to production: Set the schedule to active and monitor the first few cycles
Need help customizing this workflow for your specific needs? Want to integrate with proprietary systems or scale to handle thousands of tasks? Schedule an intro call with Atherial at https://atherial.ai/contact—we'll help you build a synthetic employee that actually works.
