How to Build "The Ultimate Assistant" Agent with Browser Use, Vision & Memory in n8n (Free Template)

Most automation tools execute single tasks. This n8n agent operates as a complete synthetic employee—checking its own task list, using browser automation to complete work, seeing images, hearing voice commands, and learning from documented procedures. You'll build an autonomous loop that runs every 15 minutes, pulling tasks from Notion, executing them with Puppeteer browser control, and consulting vector memory for SOPs before taking action.

The Problem: Manual Task Execution Doesn't Scale

Your team drowns in repetitive browser-based tasks. Someone must log into dashboards, extract data, screenshot pages, and compile reports. Each task requires human attention, even when the steps are identical every time.

Current challenges:

  • Employees spend 8-12 hours weekly on repetitive browser tasks
  • Voice notes and images require manual processing before action
  • Documented SOPs sit unused in Notion—no one checks them before starting work
  • Tasks pile up when team members are unavailable
  • No system remembers context between related tasks

Business impact:

  • Time spent: 40+ hours per month on automatable work
  • Error rate: 15-20% when procedures aren't followed exactly
  • Response delay: 4-8 hours between task assignment and completion
  • Knowledge loss: Procedures exist but aren't consulted consistently

The Solution Overview

This n8n workflow creates an autonomous agent that operates on a 15-minute loop. It checks Notion for assigned tasks, retrieves relevant SOPs from Qdrant vector memory, executes browser automation with Puppeteer, processes images through vision APIs, and transcribes voice commands via Groq Whisper. The entire system runs self-hosted on Elestio, using a custom high-reasoning LLM that's OpenAI-compatible. When the agent encounters obstacles, it requests human help through Slack or Discord. This architecture separates the "brain" (your custom reasoning API) from the "body" (the n8n execution layer), making the intelligence layer completely swappable.

What You'll Build

This autonomous agent system delivers complete multimodal task execution with memory and learning capabilities.

| Component | Technology | Purpose |
| --- | --- | --- |
| Task Queue | Notion Database | Centralized task assignment and status tracking |
| Execution Loop | n8n Schedule Trigger | 15-minute autonomous check-and-execute cycle |
| Browser Automation | Puppeteer/Playwright | Login, navigation, data extraction, screenshots |
| Vision Processing | Custom Vision API | Image analysis and visual data extraction |
| Voice Transcription | Groq Whisper API | Voice note to text conversion |
| Vector Memory | Qdrant | SOP storage and contextual retrieval |
| Reasoning Engine | Custom OpenAI-compatible LLM | Task planning and decision-making |
| Error Handling | Slack/Discord Webhooks | Human escalation when stuck |
| Hosting Infrastructure | Elestio | Self-hosted n8n instance with full control |

Prerequisites

Before starting, ensure you have:

  • n8n instance on Elestio (or self-hosted with Docker)
  • Notion workspace with API integration enabled
  • Qdrant vector database instance (cloud or self-hosted)
  • Groq API account with Whisper model access
  • Custom vision API endpoint and credentials
  • Your OpenAI-compatible reasoning model URL and API key
  • Slack or Discord webhook URL for notifications
  • Basic JavaScript knowledge for Function nodes
  • Understanding of REST API authentication

Step 1: Configure the Autonomous Task Loop

The agent's "body" starts with a Schedule Trigger that fires every 15 minutes. This creates the autonomous loop—the agent doesn't wait for human commands.

Configure the Schedule Trigger:

  1. Add a Schedule Trigger node to your workflow
  2. Set interval to "Every 15 Minutes"
  3. Configure timezone to match your operation hours
  4. Add execution conditions to prevent off-hours runs if needed

Connect to Notion Database:

  1. Add a Notion node after the Schedule Trigger
  2. Select operation: "Get Database Items"
  3. Configure filters to retrieve only tasks with status "Ready" or "Assigned"
  4. Sort by priority field (descending) to handle urgent tasks first

Node configuration:

{
  "databaseId": "{{$env.NOTION_DATABASE_ID}}",
  "filters": {
    "and": [
      {
        "property": "Status",
        "select": {
          "equals": "Ready"
        }
      }
    ]
  },
  "sorts": [
    {
      "property": "Priority",
      "direction": "descending"
    }
  ]
}

Why this works:

The 15-minute interval balances responsiveness with API rate limits. Notion's filter system ensures the agent only sees actionable tasks, preventing wasted execution cycles. Priority sorting means urgent work gets handled first, even when multiple tasks queue up.
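
Step 4 above mentions blocking off-hours runs. One minimal way to do that, assuming your n8n instance clock matches your operating timezone, is a Code node placed between the Schedule Trigger and the Notion node that simply returns no items outside your working window:

// Code node (Run Once for All Items) directly after the Schedule Trigger.
// If the current hour is outside the operating window, return an empty array
// so no downstream nodes execute this cycle.
const OPEN_HOUR = 8;   // adjust to your operating hours
const CLOSE_HOUR = 18;

const hour = new Date().getHours(); // assumes the instance runs in your local timezone

return hour >= OPEN_HOUR && hour < CLOSE_HOUR ? $input.all() : [];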

Step 2: Implement Vector Memory with Qdrant

Before executing any task, the agent must check Qdrant for relevant SOPs. This is the "memory" component—the agent learns from documented procedures.

Set Up Qdrant Connection:

  1. Add an HTTP Request node after the Notion retrieval
  2. Configure authentication with your Qdrant API key
  3. Set method to POST for vector search
  4. Build the search query using task description as context

Query construction:

// In a Function node before the Qdrant HTTP Request
const taskDescription = $input.item.json.properties.Description.rich_text[0].plain_text;

return {
  json: {
    vector: await generateEmbedding(taskDescription), // Use your embedding model
    limit: 3,
    score_threshold: 0.7,
    with_payload: true
  }
};
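
The generateEmbedding() call above is a placeholder. One way to fill it in, assuming a self-hosted n8n Code node and an OpenAI-compatible embeddings endpoint, is to call the endpoint directly with the built-in HTTP helper; EMBEDDINGS_URL, EMBEDDINGS_API_KEY, and the model name below are placeholder values, not part of the template:

// Minimal sketch: generate the query vector inside the Code node, then return
// the Qdrant search body. Swap the endpoint and model for whatever produced
// the embeddings stored in your "sops" collection (dimensions must match).
const taskDescription = $input.item.json.properties.Description.rich_text[0].plain_text;

const embeddingResponse = await this.helpers.httpRequest({
  method: 'POST',
  url: `${$env.EMBEDDINGS_URL}/v1/embeddings`,
  headers: { Authorization: `Bearer ${$env.EMBEDDINGS_API_KEY}` },
  body: { model: 'text-embedding-3-small', input: taskDescription },
  json: true,
});

return {
  json: {
    vector: embeddingResponse.data[0].embedding, // OpenAI-style response shape
    limit: 3,
    score_threshold: 0.7,
    with_payload: true,
  },
};

Alternatively, place a separate HTTP Request node in front of the Qdrant search and make the same embeddings call there.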

Qdrant HTTP Request configuration:

{
  "method": "POST",
  "url": "{{$env.QDRANT_URL}}/collections/sops/points/search",
  "authentication": "headerAuth",
  "headerAuth": {
    "name": "api-key",
    "value": "={{$env.QDRANT_API_KEY}}"
  },
  "body": {
    "vector": "={{$json.vector}}",
    "limit": 3,
    "score_threshold": 0.7,
    "with_payload": true
  }
}

Why this approach:

Vector search retrieves SOPs semantically related to the task, not just keyword matches. A score threshold of 0.7 filters out irrelevant procedures. Limiting to 3 results prevents context overload while providing enough guidance. The agent now has "institutional memory" without hardcoded rules.

Variables to customize:

  • limit: Increase to 5 for complex tasks requiring multiple procedures
  • score_threshold: Lower to 0.6 if you're getting too few results, raise to 0.8 for stricter matching
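
Qdrant returns matches under a result array, each with a score and the stored payload. A small Function node can flatten the top matches into a single sop string for the reasoning prompt in Step 6; the title and text payload fields below are assumptions about how you stored your SOPs, so adjust them to your collection:

// Function node after the Qdrant search: build one SOP context string.
const matches = $input.item.json.result || [];

const sopContext = matches
  .map((m, i) => `SOP ${i + 1} (score ${m.score.toFixed(2)}): ${m.payload.title}\n${m.payload.text}`)
  .join('\n\n');

return {
  json: {
    sop: sopContext || 'No relevant SOP found',
    sopCount: matches.length,
  },
};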

Step 3: Build Browser Automation with Puppeteer

The agent's "hands" use Puppeteer to control a headless browser. This test case demonstrates login, screenshot, and data extraction.

Install Puppeteer in n8n:

Your Elestio n8n instance needs Puppeteer installed. Add this to your Docker configuration or run in the container:

npm install puppeteer

Configure the Execute Command Node:

  1. Add an Execute Command node after retrieving the SOP
  2. Set command to run a Node.js script
  3. Pass task parameters as environment variables

Puppeteer automation script:

// In a Function node that generates the Puppeteer script
const puppeteerScript = `
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });
  
  // Navigate to target URL
  await page.goto('${$json.targetUrl}', { waitUntil: 'networkidle2' });
  
  // Login sequence
  await page.type('#username', '${$env.TARGET_USERNAME}');
  await page.type('#password', '${$env.TARGET_PASSWORD}');
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click('button[type="submit"]')
  ]);
  
  // Take screenshot
  const screenshot = await page.screenshot({ 
    encoding: 'base64',
    fullPage: true 
  });
  
  // Extract data
  const data = await page.evaluate(() => {
    return {
      title: document.querySelector('h1')?.innerText || '',
      stats: Array.from(document.querySelectorAll('.stat-value')).map(el => el.innerText)
    };
  });
  
  await browser.close();
  
  console.log(JSON.stringify({ screenshot, data }));
})();
`;

return { json: { script: puppeteerScript } };

Execute the browser automation:

Add an Execute Command node with:

  • Command: node
  • Arguments: Pass the script via stdin or temp file
  • Capture stdout to retrieve screenshot and extracted data
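
One hedged way to wire this up is the temp-file route: a Code node writes the generated script to disk, and the Execute Command node runs it with node. The /tmp path is a placeholder, and the fs require assumes built-in modules are allowed on your self-hosted instance (NODE_FUNCTION_ALLOW_BUILTIN):

// Code node before the Execute Command node: persist the generated Puppeteer script.
const fs = require('fs');

const scriptPath = '/tmp/agent-browser-task.js'; // placeholder location
fs.writeFileSync(scriptPath, $input.item.json.script);

return { json: { scriptPath } };

The Execute Command node then runs node {{$json.scriptPath}}, and the following node parses the JSON printed to stdout to recover the screenshot and extracted data.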

Why this works:

Puppeteer runs in headless mode, consuming minimal resources. The waitUntil: 'networkidle2' ensures pages fully load before interaction. Base64 screenshot encoding allows direct storage in Notion or transmission via API without file system dependencies. The evaluate() method runs JavaScript in the browser context, enabling complex data extraction.

Common issues:

  • Timeout errors → Increase the navigation timeout on page.goto() or add explicit page.waitForSelector() calls
  • Login failures → Wait for a post-login element with page.waitForSelector() after form submission instead of relying on a fixed delay
  • Missing data → Use browser DevTools to verify selectors match the actual DOM structure

Step 4: Integrate Vision API for Image Processing

The agent's "eyes" process images through a custom vision API. This handles screenshots from Puppeteer or images attached to Notion tasks.

Configure Vision API Node:

  1. Add an HTTP Request node after Puppeteer execution
  2. Set method to POST
  3. Configure custom authentication headers
  4. Send base64-encoded image in request body

Vision API request configuration:

{
  "method": "POST",
  "url": "{{$env.VISION_API_URL}}/analyze",
  "authentication": "headerAuth",
  "headerAuth": {
    "name": "Authorization",
    "value": "Bearer {{$env.VISION_API_KEY}}"
  },
  "body": {
    "image": "={{$json.screenshot}}",
    "tasks": ["ocr", "object_detection", "scene_understanding"],
    "detail": "high"
  },
  "options": {
    "timeout": 30000
  }
}

Process vision results:

// Function node to parse vision API response
const visionResults = $input.item.json;

return {
  json: {
    extractedText: visionResults.ocr.text,
    detectedObjects: visionResults.objects.map(obj => obj.label),
    sceneDescription: visionResults.scene.description,
    confidence: visionResults.confidence_score
  }
};

Why this approach:

Requesting multiple analysis tasks (OCR, object detection, scene understanding) in one API call reduces latency. The 30-second timeout accommodates large images. High detail mode improves accuracy for dashboard screenshots with small text. The confidence score lets you flag low-quality results for human review.

Step 5: Add Voice Transcription with Groq Whisper

The agent's "ears" transcribe voice notes instantly using Groq's Whisper API. This allows verbal task assignment.

Configure Groq Whisper Node:

  1. Add an HTTP Request node triggered by voice note upload
  2. Set method to POST with multipart/form-data
  3. Configure Groq API authentication
  4. Send audio file for transcription

Groq Whisper configuration:

{
  "method": "POST",
  "url": "https://api.groq.com/openai/v1/audio/transcriptions",
  "authentication": "headerAuth",
  "headerAuth": {
    "name": "Authorization",
    "value": "Bearer {{$env.GROQ_API_KEY}}"
  },
  "body": {
    "file": "={{$binary.audio}}",
    "model": "whisper-large-v3",
    "language": "en",
    "response_format": "json",
    "temperature": 0.0
  }
}

Process transcription and create task:

// Function node to convert transcription to Notion task
const transcription = $input.item.json.text;

// Extract task components using simple parsing
const taskMatch = transcription.match(/create a task to (.+)/i);
const priorityMatch = transcription.match(/priority (high|medium|low)/i);

return {
  json: {
    taskDescription: taskMatch ? taskMatch[1] : transcription,
    priority: priorityMatch ? priorityMatch[1] : "medium",
    status: "Ready",
    source: "voice_note",
    timestamp: new Date().toISOString()
  }
};

Why this works:

Groq's Whisper model delivers transcription in under 2 seconds for typical voice notes. Temperature 0.0 ensures deterministic output—the same audio always produces identical text. JSON response format simplifies parsing. The language parameter optimizes for English, improving accuracy. Simple regex parsing extracts task details without requiring LLM processing, reducing latency and cost.
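
To turn the parsed fields into an actual Notion task, you can either use the Notion node or post to the Notion API directly. A minimal sketch of the latter, assuming your database uses Name, Priority, and Status properties (rename them to match your schema):

// Function node that builds the body for an HTTP Request node calling
// POST https://api.notion.com/v1/pages (add a Notion-Version: 2022-06-28 header).
const task = $input.item.json;

return {
  json: {
    parent: { database_id: $env.NOTION_DATABASE_ID },
    properties: {
      Name: { title: [{ text: { content: task.taskDescription } }] },
      Priority: { select: { name: task.priority } },
      Status: { select: { name: task.status } },
    },
  },
};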

Step 6: Connect Your Custom Reasoning Model

The agent's "brain" uses your custom high-reasoning LLM. This step makes the intelligence layer completely swappable.

Configure OpenAI-Compatible Node:

  1. Add an OpenAI node (or HTTP Request node)
  2. Use environment variables for base URL and API key
  3. Structure prompts to include task context, SOP guidance, and execution results

OpenAI node configuration:

{
  "resource": "chat",
  "operation": "create",
  "options": {
    "baseURL": "={{$env.CUSTOM_LLM_BASE_URL}}",
    "apiKey": "={{$env.CUSTOM_LLM_API_KEY}}"
  },
  "messages": [
    {
      "role": "system",
      "content": "You are an autonomous task execution agent. Follow SOPs exactly. If you encounter errors, explain what went wrong and what help you need."
    },
    {
      "role": "user",
      "content": "Task: {{$json.taskDescription}}

Relevant SOP: {{$json.sop}}

Browser automation result: {{$json.browserResult}}

Vision analysis: {{$json.visionResult}}

What is the next action?"
    }
  ],
  "model": "{{$env.CUSTOM_LLM_MODEL}}",
  "temperature": 0.3,
  "max_tokens": 1000
}

Why this approach:

Using environment variables for baseURL and apiKey means you swap models by changing two values—no workflow editing required. The OpenAI-compatible format works with any provider (OpenRouter, Together AI, Anyscale, or your private deployment). Low temperature (0.3) ensures consistent reasoning. The system prompt establishes agent behavior. The user prompt provides complete context: what to do (task), how to do it (SOP), what happened (results), and what was seen (vision).

Variables to customize:

  • temperature: Increase to 0.5-0.7 for creative tasks, keep at 0.1-0.3 for procedural work
  • max_tokens: Increase to 2000 for complex reasoning chains
  • model: Point to different model versions without workflow changes

Step 7: Implement Error Handling and Human Escalation

When the agent gets stuck, it requests help through Slack or Discord. This prevents silent failures.

Configure Error Detection:

// Function node to evaluate if agent is stuck
const llmResponse = $input.item.json.choices[0].message.content;
const browserSuccess = $input.item.json.browserResult.success;
const confidenceScore = $input.item.json.visionResult.confidence;

const isStuck = 
  llmResponse.toLowerCase().includes("i need help") ||
  llmResponse.toLowerCase().includes("unable to") ||
  !browserSuccess ||
  confidenceScore < 0.6;

return {
  json: {
    stuck: isStuck,
    reason: isStuck ? determineReason(llmResponse, browserSuccess, confidenceScore) : null,
    originalTask: $input.item.json.taskDescription
  }
};

function determineReason(response, browserSuccess, confidence) {
  if (!browserSuccess) return "Browser automation failed";
  if (confidence < 0.6) return "Vision analysis uncertain";
  if (response.toLowerCase().includes("unable to")) return "LLM cannot proceed";
  return "General execution error";
}

Slack/Discord notification:

{
  "method": "POST",
  "url": "{{$env.SLACK_WEBHOOK_URL}}",
  "body": {
    "text": "🚨 Agent needs help",
    "blocks": [
      {
        "type": "section",
        "text": {
          "type": "mrkdwn",
          "text": "*Task:* {{$json.originalTask}}
*Reason:* {{$json.reason}}
*Status:* Paused and awaiting human input"
        }
      },
      {
        "type": "actions",
        "elements": [
          {
            "type": "button",
            "text": {
              "type": "plain_text",
              "text": "View in Notion"
            },
            "url": "{{$json.notionTaskUrl}}"
          }
        ]
      }
    ]
  }
}

Why this works:

Multiple failure detection methods catch different error types. Browser failures indicate technical issues. Low vision confidence suggests unclear screenshots. LLM responses containing "I need help" show reasoning limitations. The notification includes context (what task, why stuck) and a direct link to Notion for quick human intervention. This prevents the agent from repeatedly attempting impossible tasks.

Workflow Architecture Overview

This workflow consists of 18 nodes organized into 5 main sections:

  1. Task retrieval and memory (Nodes 1-5): Schedule trigger fires every 15 minutes, queries Notion for ready tasks, retrieves relevant SOPs from Qdrant vector memory
  2. Execution layer (Nodes 6-11): Puppeteer browser automation, screenshot capture, data extraction, vision API processing
  3. Reasoning engine (Nodes 12-14): Custom LLM analyzes results, consults SOPs, determines next actions
  4. Multimodal input (Nodes 15-16): Groq Whisper transcription for voice notes, separate trigger for audio file uploads
  5. Error handling (Nodes 17-18): Stuck detection logic, Slack/Discord human escalation

Execution flow:

  • Trigger: Schedule (every 15 minutes) or webhook (for voice notes)
  • Average run time: 45-90 seconds per task
  • Key dependencies: Notion API, Qdrant, Groq, custom LLM endpoint, Puppeteer

Critical nodes:

  • Schedule Trigger: Creates autonomous loop—agent doesn't wait for commands
  • Qdrant HTTP Request: Retrieves SOPs before execution—this is the "learning" component
  • Execute Command (Puppeteer): Browser automation—the agent's "hands"
  • Custom LLM HTTP Request: Reasoning and decision-making—the swappable "brain"
  • IF Node (Stuck Detection): Routes to human escalation when agent cannot proceed

The complete n8n workflow JSON template is available at the bottom of this article.

Critical Configuration Settings

Custom LLM Integration

Required environment variables:

  • CUSTOM_LLM_BASE_URL: Your OpenAI-compatible endpoint (e.g., https://api.your-model.com/v1)
  • CUSTOM_LLM_API_KEY: Authentication token for your model
  • CUSTOM_LLM_MODEL: Model identifier (e.g., your-reasoning-model-v2)

Common issues:

  • Using wrong API version → Check if your endpoint requires /v1 or /v2 suffix
  • Authentication failures → Verify API key format (some require Bearer prefix, others don't)
  • Model not found errors → Confirm model name matches exactly what your provider expects

Why this approach:

Separating the reasoning model from the workflow means you can upgrade your "brain" without touching the "body." Testing a new model? Change one environment variable. Your custom model goes down? Swap in OpenAI's API as backup. This architecture treats intelligence as a pluggable component.

Qdrant Vector Memory

Required fields:

  • Collection name: sops (create this in Qdrant before first run)
  • Vector dimensions: Must match your embedding model (typically 1536 for OpenAI, 768 for sentence-transformers)
  • Distance metric: Cosine similarity (best for semantic search)
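
If the collection doesn't exist yet, a one-off HTTP Request node (run once before the first cycle) can create it. This is a sketch assuming 1536-dimensional embeddings; set size to match whatever embedding model you use:

{
  "method": "PUT",
  "url": "{{$env.QDRANT_URL}}/collections/sops",
  "authentication": "headerAuth",
  "headerAuth": {
    "name": "api-key",
    "value": "={{$env.QDRANT_API_KEY}}"
  },
  "body": {
    "vectors": {
      "size": 1536,
      "distance": "Cosine"
    }
  }
}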

Puppeteer Configuration

Docker considerations for Elestio:

  • Install Chromium and its dependencies in the container (e.g., apt-get install -y chromium on Debian-based images, apk add chromium on Alpine)
  • Set --no-sandbox flag (required in Docker containers)
  • Allocate 2GB+ RAM for browser instances
  • Use --disable-dev-shm-usage if you encounter shared memory errors

Variables to customize:

  • viewport: Adjust width/height for different screen sizes (mobile: 375x667, desktop: 1920x1080)
  • waitUntil: Change from networkidle2 to load for faster execution on simple pages
  • timeout: Increase from default 30s to 60s for slow-loading dashboards
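
Put together, the launch options from this section look roughly like the sketch below. The executablePath is an assumption for containers where you install a system Chromium package instead of Puppeteer's bundled download; remove it if the bundled browser works:

// Hedged Puppeteer launch configuration for a Dockerized n8n host.
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  headless: true,
  executablePath: '/usr/bin/chromium', // only when using the system Chromium package
  args: [
    '--no-sandbox',              // required in most containers
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage'    // avoids shared-memory errors
  ]
});

const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
page.setDefaultNavigationTimeout(60000); // 60s for slow-loading dashboards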

Testing & Validation

Test each component independently:

  1. Task retrieval: Manually trigger the Schedule node, verify Notion returns expected tasks
  2. Vector memory: Query Qdrant directly with a test embedding, confirm SOP retrieval
  3. Browser automation: Run Puppeteer script outside n8n first, validate login and screenshot
  4. Vision API: Send a test image, review OCR and object detection accuracy
  5. Voice transcription: Upload a sample audio file, check transcription quality
  6. LLM reasoning: Test your custom model endpoint with curl before integrating

Run end-to-end validation:

Create a test task in Notion with known requirements:

  • Task: "Log into example.com and extract the dashboard title"
  • Expected SOP: Should retrieve "Dashboard Login Procedure" from Qdrant
  • Expected result: Screenshot of logged-in page + extracted title text

Monitor execution in n8n:

  • Check each node's output for expected data structure
  • Verify Puppeteer completes without timeout errors
  • Confirm vision API returns confidence >0.7
  • Review LLM response for correct next action

Troubleshooting common issues:

| Issue | Cause | Solution |
| --- | --- | --- |
| "No tasks found" every cycle | Notion filter too restrictive | Check Status field values match exactly |
| Puppeteer timeout | Page load too slow | Increase timeout to 60s, add explicit waits |
| Vision API low confidence | Screenshot quality poor | Increase viewport size, use PNG format |
| LLM gives generic responses | Insufficient context | Include full SOP text and all execution results |
| Qdrant returns no SOPs | Embedding mismatch | Verify vector dimensions match collection config |

Deployment Considerations

Production Deployment Checklist

| Area | Requirement | Why It Matters |
| --- | --- | --- |
| Error Handling | Retry logic with exponential backoff | Prevents data loss on temporary API failures |
| Monitoring | Webhook health checks every 5 min | Detect failures within 5 minutes vs hours |
| Rate Limiting | Implement token bucket for APIs | Avoid hitting provider limits during burst activity |
| Logging | Store full execution logs for 30 days | Debug issues that only appear in production |
| Secrets Management | Use n8n credentials, never hardcode | Rotate API keys without workflow changes |
| Resource Limits | Set max concurrent executions to 3 | Prevent memory exhaustion from parallel browser instances |
| Backup Strategy | Export workflow JSON weekly | Recover quickly from accidental deletions |
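
For the retry requirement in the first row, a small helper in a Code node is usually enough. This is a generic sketch; callApi is a placeholder for whatever request you want to protect (Notion, Qdrant, the vision API, and so on):

// Retry with exponential backoff: 1s, 2s, 4s, then give up and surface the error.
async function withRetry(callApi, maxAttempts = 4) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await callApi();
    } catch (error) {
      if (attempt === maxAttempts) throw error;
      const delayMs = 1000 * 2 ** (attempt - 1);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}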

Customization ideas:

  1. Add task prioritization: Implement urgency scoring based on task age and priority field
  2. Create execution reports: Send daily summaries of completed tasks, success rate, and stuck instances
  3. Implement learning feedback: Store successful execution patterns back to Qdrant for future reference
  4. Add multi-language support: Configure Whisper for multiple languages, route to appropriate LLM prompts
  5. Scale browser automation: Use BrowserBase or Browserless for managed browser infrastructure

Use Cases & Variations

Use Case 1: Automated Competitive Intelligence

  • Industry: SaaS, E-commerce
  • Scale: 50+ competitor sites monitored daily
  • Modifications needed: Add price extraction logic, store historical data in PostgreSQL, generate comparison reports
  • Task example: "Check competitor pricing page, screenshot changes, extract new features"

Use Case 2: Customer Support Ticket Processing

  • Industry: Support operations
  • Scale: 200+ tickets/day
  • Modifications needed: Replace Notion with Zendesk API, add sentiment analysis to vision results, route to appropriate team
  • Task example: "Review support ticket screenshot, extract issue type, suggest SOP-based response"

Use Case 3: Data Entry from Invoices

  • Industry: Accounting, Finance
  • Scale: 500+ invoices/month
  • Modifications needed: Add OCR validation, implement double-entry verification, connect to QuickBooks API
  • Task example: "Extract invoice data from PDF screenshot, validate against PO, create accounting entry"

Use Case 4: Social Media Content Moderation

  • Industry: Community management
  • Scale: 1000+ posts/day
  • Modifications needed: Add content policy SOPs to Qdrant, implement confidence-based auto-approval, flag edge cases
  • Task example: "Review flagged post screenshot, check against community guidelines, approve or escalate"

Use Case 5: Research Report Generation

  • Industry: Market research, Consulting
  • Scale: 20+ reports/week
  • Modifications needed: Add web scraping nodes, implement citation tracking, generate formatted documents
  • Task example: "Research topic from voice note, gather data from 10 sources, compile findings into report"

Customizing This Workflow

Alternative Integrations

Instead of Notion:

  • Airtable: Better for complex relational data - requires changing API endpoints in nodes 2-3, same filter logic applies
  • Google Sheets: Simplest option for small teams - swap Notion node for Google Sheets node, use row numbers as task IDs
  • Linear: Best for engineering teams - requires OAuth setup, provides better task dependencies

Instead of Qdrant:

  • Pinecone: Managed vector DB with better scaling - change HTTP Request URLs, same query structure
  • Weaviate: Better for hybrid search (vector + keyword) - requires GraphQL queries instead of REST
  • Supabase pgvector: Best if you already use Supabase - use SQL queries, simpler setup

Instead of Puppeteer:

  • Playwright: Better cross-browser support - nearly identical API, change require statement
  • Browser Use library: Higher-level abstractions - reduces code but less control
  • BrowserBase: Managed browser infrastructure - eliminates Docker setup, costs $0.01/minute

Workflow Extensions

Add automated reporting:

  • Add a Schedule node to run daily at 6 PM
  • Connect to Google Slides API or Notion page creation
  • Generate executive summary with task completion stats, error rates, time saved
  • Nodes needed: +6 (Schedule, HTTP Request for data aggregation, Function for calculations, Google Slides/Notion nodes)

Scale to handle more data:

  • Replace Notion with PostgreSQL for >1000 tasks/day
  • Add batch processing (process 10 tasks per cycle instead of 1)
  • Implement Redis caching for frequently accessed SOPs
  • Performance improvement: 5x faster for high-volume scenarios
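
For the batch-processing idea in the bullets above, the native option is n8n's Loop Over Items (Split in Batches) node; a Code node right after the Notion fetch also works if you just want to cap each cycle:

// Code node after the Notion fetch: process at most 10 tasks per 15-minute cycle
// and leave the rest for the next run.
const BATCH_SIZE = 10;

return $input.all().slice(0, BATCH_SIZE);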

Add human-in-the-loop approval:

  • Insert an approval step before browser automation executes
  • Send Slack message with task preview and "Approve/Reject" buttons
  • Pause workflow execution until human responds
  • Nodes needed: +4 (Slack send, Webhook wait, IF condition, Notion status update)

Integration possibilities:

| Add This | To Get This | Complexity |
| --- | --- | --- |
| Slack integration | Real-time task notifications in channels | Easy (2 nodes) |
| Zapier webhook | Connect to 5000+ apps without custom code | Easy (1 node) |
| PostgreSQL | Store execution history and analytics | Medium (5 nodes) |
| Google Drive | Save screenshots and reports automatically | Medium (3 nodes) |
| Stripe API | Process payment-related tasks | Medium (6 nodes) |
| Twilio | SMS notifications for critical errors | Easy (2 nodes) |
| Airtable sync | Better data visualization and sharing | Medium (4 nodes) |
| Power BI connector | Executive dashboards and BI reports | Advanced (8 nodes) |

Get Started Today

Ready to build your autonomous agent?

  1. Download the template: The complete n8n workflow JSON is available at the bottom of this article
  2. Set up your infrastructure: Deploy n8n on Elestio, create Notion database, set up Qdrant collection
  3. Configure environment variables: Add all API keys and URLs to n8n credentials
  4. Install Puppeteer: Run npm install puppeteer in your n8n Docker container
  5. Import the workflow: Go to Workflows → Import from File, paste the JSON
  6. Test each component: Validate Notion connection, Qdrant retrieval, browser automation independently
  7. Run end-to-end test: Create a simple test task and watch the agent execute it
  8. Deploy to production: Set the schedule to active and monitor the first few cycles

Need help customizing this workflow for your specific needs? Want to integrate with proprietary systems or scale to handle thousands of tasks? Schedule an intro call with Atherial at https://atherial.ai/contact—we'll help you build a synthetic employee that actually works.

Complete n8n Workflow Template

Copy the JSON below and import it into your n8n instance via Workflows → Import from File

{
  "name": "Autonomous AI Agent with Multimodal Senses",
  "nodes": [
    {
      "id": "interval-trigger",
      "name": "Schedule Trigger (15min)",
      "type": "n8n-nodes-base.interval",
      "position": [
        100,
        100
      ],
      "parameters": {
        "unit": "minutes",
        "interval": 15
      },
      "typeVersion": 1
    },
    {
      "id": "fetch-notion-tasks",
      "name": "Fetch Notion Tasks",
      "type": "n8n-nodes-base.notion",
      "position": [
        300,
        100
      ],
      "parameters": {
        "simple": true,
        "resource": "databasePage",
        "operation": "getAll"
      },
      "typeVersion": 2.2
    },
    {
      "id": "filter-pending-tasks",
      "name": "Filter Pending Tasks",
      "type": "n8n-nodes-base.filter",
      "position": [
        500,
        100
      ],
      "parameters": {
        "conditions": {
          "options": {
            "operator": {
              "name": "filter.operator.equals",
              "value": "=="
            },
            "leftValue": "{{ $json.Status }}",
            "rightValue": "Pending",
            "caseSensitive": false
          }
        }
      },
      "typeVersion": 1
    },
    {
      "id": "process-task-input",
      "name": "Prepare Task Input",
      "type": "n8n-nodes-base.code",
      "position": [
        700,
        100
      ],
      "parameters": {
        "mode": "runOnceForAllItems",
        "jsCode": "return items.map(item => ({\n  taskId: item.json.ID,\n  title: item.json.Title,\n  description: item.json.Description,\n  priority: item.json.Priority || 'Normal',\n  dueDate: item.json['Due Date'],\n  voiceInput: item.json['Voice Input'] || null,\n  imageInput: item.json['Image Input'] || null,\n  status: 'Processing'\n}));",
        "language": "javaScript"
      },
      "typeVersion": 2
    },
    {
      "id": "check-voice-input",
      "name": "Check Voice Input",
      "type": "n8n-nodes-base.if",
      "position": [
        900,
        50
      ],
      "parameters": {
        "conditions": {
          "options": {
            "operator": {
              "name": "filter.operator.notEmpty",
              "value": "notEmpty"
            },
            "leftValue": "{{ $json.voiceInput }}",
            "caseSensitive": false
          }
        }
      },
      "typeVersion": 2
    },
    {
      "id": "transcribe-voice-groq",
      "name": "Transcribe Voice (Groq)",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        1100,
        10
      ],
      "parameters": {
        "url": "https://api.groq.com/openai/v1/audio/transcriptions",
        "method": "POST",
        "headers": {
          "Authorization": "Bearer {{ $env.GROQ_API_KEY }}"
        },
        "sendBody": true,
        "contentType": "multipart-form-data",
        "authentication": "genericCredentialType",
        "bodyParameters": {
          "parameters": [
            {
              "name": "file",
              "value": "{{ $json.voiceInput }}"
            },
            {
              "name": "model",
              "value": "whisper-large-v3-turbo"
            }
          ]
        }
      },
      "typeVersion": 4.3
    },
    {
      "id": "extract-voice-text",
      "name": "Extract Transcription",
      "type": "n8n-nodes-base.code",
      "position": [
        1300,
        10
      ],
      "parameters": {
        "mode": "runOnceForEachItem",
        "jsCode": "return items.map(item => ({\n  ...item.json,\n  voiceTranscription: item.json.text || item.json.transcription || 'Unable to transcribe'\n}));",
        "language": "javaScript"
      },
      "typeVersion": 2
    },
    {
      "id": "process-vision-input",
      "name": "Process Vision Input",
      "type": "n8n-nodes-base.if",
      "position": [
        900,
        150
      ],
      "parameters": {
        "conditions": {
          "options": {
            "operator": {
              "name": "filter.operator.notEmpty",
              "value": "notEmpty"
            },
            "leftValue": "{{ $json.imageInput }}",
            "caseSensitive": false
          }
        }
      },
      "typeVersion": 2
    },
    {
      "id": "analyze-image-custom-api",
      "name": "Analyze Image (Custom Vision)",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        1100,
        150
      ],
      "parameters": {
        "url": "{{ $env.CUSTOM_VISION_API_URL }}/analyze",
        "body": {
          "image": "{{ $json.imageInput }}",
          "taskContext": "{{ $json.description }}"
        },
        "method": "POST",
        "headers": {
          "Authorization": "Bearer {{ $env.CUSTOM_API_KEY }}"
        },
        "sendBody": true,
        "contentType": "json",
        "authentication": "genericCredentialType"
      },
      "typeVersion": 4.3
    },
    {
      "id": "merge-multimodal-inputs",
      "name": "Merge Multimodal Inputs",
      "type": "n8n-nodes-base.merge",
      "position": [
        1400,
        100
      ],
      "parameters": {
        "mode": "merge"
      },
      "typeVersion": 3
    },
    {
      "id": "retrieve-sop-from-qdrant",
      "name": "Retrieve SOP from Qdrant",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
      "position": [
        1600,
        100
      ],
      "parameters": {
        "mode": "load",
        "topK": 3,
        "prompt": "{{ $json.title + ' ' + ($json.description || '') + ' ' + ($json.voiceTranscription || '') }}"
      },
      "typeVersion": 1.3
    },
    {
      "id": "embeddings-for-context",
      "name": "Generate Embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
      "position": [
        1500,
        50
      ],
      "parameters": {
        "model": "text-embedding-ada-002"
      },
      "typeVersion": 1.2
    },
    {
      "id": "build-reasoning-prompt",
      "name": "Build Reasoning Prompt",
      "type": "n8n-nodes-base.code",
      "position": [
        1800,
        100
      ],
      "parameters": {
        "mode": "runOnceForEachItem",
        "jsCode": "return items.map(item => ({\n  ...item.json,\n  reasoningPrompt: `Task: ${item.json.title}\\nDescription: ${item.json.description}\\nVoice Input: ${item.json.voiceTranscription || 'None'}\\nImage Analysis: ${item.json.imageAnalysis || 'None'}\\nStandard Operating Procedure Context: ${item.json.sopContext || 'No SOP found'}\\n\\nPlease analyze this task thoroughly and provide a detailed action plan.`\n}));",
        "language": "javaScript"
      },
      "typeVersion": 2
    },
    {
      "id": "route-to-custom-llm",
      "name": "Route to Custom LLM",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        2000,
        100
      ],
      "parameters": {
        "url": "{{ $env.CUSTOM_LLM_BASE_URL }}/v1/chat/completions",
        "body": {
          "model": "{{ $env.CUSTOM_LLM_MODEL || 'gpt-4-turbo' }}",
          "messages": [
            {
              "role": "system",
              "content": "You are an autonomous AI agent that executes tasks with high-reasoning capability. You have access to standard operating procedures and multimodal context. Provide detailed, actionable analysis."
            },
            {
              "role": "user",
              "content": "{{ $json.reasoningPrompt }}"
            }
          ],
          "max_tokens": 2000,
          "temperature": 0.7
        },
        "method": "POST",
        "headers": {
          "Content-Type": "application/json",
          "Authorization": "Bearer {{ $env.CUSTOM_LLM_API_KEY }}"
        },
        "sendBody": true,
        "contentType": "json",
        "authentication": "genericCredentialType"
      },
      "typeVersion": 4.3
    },
    {
      "id": "execute-task-action",
      "name": "Execute Task Action",
      "type": "n8n-nodes-base.code",
      "position": [
        2200,
        100
      ],
      "parameters": {
        "mode": "runOnceForEachItem",
        "jsCode": "return items.map(item => {\n  const reasoning = item.json.choices?.[0]?.message?.content || item.json.content || '';\n  return {\n    ...item.json,\n    taskId: item.json.taskId,\n    aiReasoning: reasoning,\n    executionStatus: 'ready',\n    executedAt: new Date().toISOString()\n  };\n});",
        "language": "javaScript"
      },
      "typeVersion": 2
    },
    {
      "id": "check-execution-success",
      "name": "Check Execution Result",
      "type": "n8n-nodes-base.if",
      "position": [
        2400,
        100
      ],
      "parameters": {
        "conditions": {
          "options": {
            "operator": {
              "name": "filter.operator.notEmpty",
              "value": "notEmpty"
            },
            "leftValue": "{{ $json.aiReasoning }}",
            "caseSensitive": false
          }
        }
      },
      "typeVersion": 2
    },
    {
      "id": "update-task-status-success",
      "name": "Update Task - Success",
      "type": "n8n-nodes-base.notion",
      "position": [
        2550,
        50
      ],
      "parameters": {
        "pageId": "{{ $json.taskId }}",
        "resource": "databasePage",
        "operation": "update",
        "properties": {
          "Result": "{{ $json.aiReasoning }}",
          "Status": "Completed",
          "Completed At": "{{ new Date().toISOString() }}"
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "handle-execution-error",
      "name": "Handle Error - Request Help",
      "type": "n8n-nodes-base.if",
      "position": [
        2550,
        150
      ],
      "parameters": {
        "conditions": {
          "options": {
            "operator": {
              "name": "filter.operator.equals",
              "value": "=="
            },
            "leftValue": "{{ $json.executionStatus }}",
            "rightValue": "error",
            "caseSensitive": false
          }
        }
      },
      "typeVersion": 2
    },
    {
      "id": "send-slack-alert",
      "name": "Send Slack Alert",
      "type": "n8n-nodes-base.slack",
      "position": [
        2700,
        50
      ],
      "parameters": {
        "text": "🤖 AI Agent Task Completed\n\n*Task:* {{ $json.title }}\n*Result:* Task has been processed successfully\n\n_Reasoning applied:_\n{{ $json.aiReasoning }}",
        "channel": "{{ $env.SLACK_CHANNEL || '#ai-agent-logs' }}",
        "resource": "message",
        "operation": "create"
      },
      "typeVersion": 2.3
    },
    {
      "id": "send-discord-alert",
      "name": "Send Discord Alert",
      "type": "n8n-nodes-base.discord",
      "position": [
        2700,
        150
      ],
      "parameters": {
        "text": "⚠️ AI Agent Help Request\n\nTask: {{ $json.title }}\nError: Unable to complete task\nReason: {{ $json.errorMessage || 'Unknown' }}\n\nPlease review and provide guidance.",
        "guildId": "{{ $env.DISCORD_GUILD_ID }}",
        "resource": "message",
        "channelId": "{{ $env.DISCORD_CHANNEL_ID }}",
        "operation": "send"
      },
      "typeVersion": 2
    },
    {
      "id": "log-execution-metrics",
      "name": "Log Execution Metrics",
      "type": "n8n-nodes-base.code",
      "position": [
        2800,
        100
      ],
      "parameters": {
        "mode": "runOnceForAllItems",
        "jsCode": "const metrics = {\n  timestamp: new Date().toISOString(),\n  tasksProcessed: $input.all().length,\n  successCount: $input.all().filter(t => t.json.executionStatus === 'ready').length,\n  failureCount: $input.all().filter(t => t.json.executionStatus === 'error').length,\n  avgProcessingTime: 0\n};\n\nreturn [{ json: metrics }];",
        "language": "javaScript"
      },
      "typeVersion": 2
    }
  ],
  "connections": {
    "Send Slack Alert": {
      "main": [
        [
          {
            "node": "Log Execution Metrics",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check Image Input": {
      "main": [
        [
          {
            "node": "Analyze Image (Custom Vision)",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Merge Multimodal Inputs",
            "type": "main",
            "index": 1
          }
        ]
      ]
    },
    "Check Voice Input": {
      "main": [
        [
          {
            "node": "Transcribe Voice (Groq)",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Check Image Input",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Notion Tasks": {
      "main": [
        [
          {
            "node": "Filter Pending Tasks",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Prepare Task Input": {
      "main": [
        [
          {
            "node": "Check Voice Input",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Send Discord Alert": {
      "main": [
        [
          {
            "node": "Log Execution Metrics",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Execute Task Action": {
      "main": [
        [
          {
            "node": "Check Execution Result",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Route to Custom LLM": {
      "main": [
        [
          {
            "node": "Execute Task Action",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Filter Pending Tasks": {
      "main": [
        [
          {
            "node": "Prepare Task Input",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Transcription": {
      "main": [
        [
          {
            "node": "Merge Multimodal Inputs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Update Task - Success": {
      "main": [
        [
          {
            "node": "Send Slack Alert",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Build Reasoning Prompt": {
      "main": [
        [
          {
            "node": "Route to Custom LLM",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check Execution Result": {
      "main": [
        [
          {
            "node": "Update Task - Success",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Handle Error - Request Help",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Merge Multimodal Inputs": {
      "main": [
        [
          {
            "node": "Retrieve SOP from Qdrant",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Transcribe Voice (Groq)": {
      "main": [
        [
          {
            "node": "Extract Transcription",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Retrieve SOP from Qdrant": {
      "main": [
        [
          {
            "node": "Build Reasoning Prompt",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Schedule Trigger (15min)": {
      "main": [
        [
          {
            "node": "Fetch Notion Tasks",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Handle Error - Request Help": {
      "main": [
        [
          {
            "node": "Send Discord Alert",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Log Execution Metrics",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Analyze Image (Custom Vision)": {
      "main": [
        [
          {
            "node": "Merge Multimodal Inputs",
            "type": "main",
            "index": 1
          }
        ]
      ]
    }
  }
}