Finding the right startup prospects is time-consuming. You need to track funding announcements, research each company's team structure, evaluate founder quality, and prioritize who to contact first. This n8n workflow automates the entire pipeline—from sourcing newly funded startups to scoring them against your ideal customer profile. You'll learn how to build a system that continuously monitors prospects, enriches company data, and alerts you when high-value targets appear.
The Problem: Manual Prospect Research Doesn't Scale
Finance and strategy consulting firms serving venture-backed startups face a critical challenge: identifying which newly funded companies need their services before competitors do.
Current challenges:
- Manually checking Crunchbase for funding announcements takes 5-10 hours per week
- Researching each company's team structure requires visiting LinkedIn and careers pages individually
- No systematic way to prioritize which prospects match your ideal customer profile
- Missing time-sensitive opportunities when startups raise capital and need immediate support
- Inconsistent scoring criteria across team members leads to misaligned outreach efforts
Business impact:
- Time spent: 15-20 hours weekly on research that could be automated
- Opportunity cost: Missing 30-40% of qualified prospects due to delayed discovery
- Revenue impact: Competitors reach high-value targets first because they have better intelligence systems
Existing solutions like basic Crunchbase alerts or LinkedIn Sales Navigator provide raw data but lack the intelligence layer needed to automatically classify, score, and prioritize prospects based on your specific ICP criteria.
The Solution Overview
This n8n workflow creates an intelligent prospecting engine that runs continuously in the background. It sources startups that raised ≥$4M from Crunchbase, enriches each company using LinkedIn and web scraping, uses LLMs to detect whether they have an in-house finance team, scores them based on founder and investor quality, and stores everything in Airtable with automated Slack alerts for top prospects.
The system combines deterministic logic (funding thresholds, geographic filters) with AI-powered classification (team structure detection, founder quality assessment) to create a maintainable, scalable lead generation machine. You'll build this using n8n's visual workflow editor, requiring no backend infrastructure.
What You'll Build
This system delivers a complete prospecting automation pipeline with these capabilities:
| Component | Technology | Purpose |
|---|---|---|
| Data Source | Crunchbase CSV/API | Daily import of newly funded startups (≥$4M) |
| Company Enrichment | LinkedIn + Web Scraping | Extract team structure, careers pages, founder profiles |
| Finance Team Detection | LLM (GPT-4/Claude) | Classify whether company has in-house finance function |
| Scoring Engine | n8n Function Nodes | Calculate founder quality and investor quality scores |
| ICP Classification | Business Logic + LLM | Assign ICP-1, ICP-2, or ICP-3 tier based on signals |
| System of Record | Airtable | Searchable database with filtering by sector, geography, score |
| Change Detection | Scheduled Re-scans | Monitor companies for team changes, new hires, website updates |
| Alert System | Slack + Email | Real-time notifications for ICP-1 prospects, daily/weekly digests |
Key features:
- Automatic deduplication to prevent processing the same company twice
- Confidence scoring for each data point (high/medium/low)
- Subsector filtering (robotics, fintech, healthcare, etc.)
- Geographic filtering for regional targeting
- Historical change tracking to detect when companies become sales-ready
- Batch processing to handle 50-100 new companies daily
Prerequisites
Before starting, ensure you have:
- n8n instance (cloud or self-hosted with at least 2GB RAM)
- Airtable account with API access (free tier works initially)
- Crunchbase Pro account or CSV export access
- OpenAI API key or Anthropic Claude API access
- LinkedIn Sales Navigator (optional but recommended for enrichment)
- Slack workspace with webhook permissions
- Basic JavaScript knowledge for Function nodes and scoring logic
- Understanding of API rate limits and error handling
Step 1: Set Up Data Source and Import Pipeline
This phase establishes the foundation by connecting to Crunchbase and creating the initial data ingestion workflow.
Configure Crunchbase Data Import
- Create a new n8n workflow and add a Schedule Trigger node set to run daily at 6 AM
- Add an HTTP Request node to fetch Crunchbase data (or use CSV import via Google Sheets)
- Configure filtering logic to only process companies with funding_total ≥ $4,000,000
Node configuration for Crunchbase filtering:
// Function node: Filter by funding amount
const items = $input.all();
const filtered = items.filter(item => {
const funding = parseFloat(item.json.funding_total_usd);
return funding >= 4000000;
});
return filtered;
Why this works:
The Schedule Trigger ensures you check for new companies daily without manual intervention. Filtering at the source prevents wasting API calls and processing time on companies that don't meet your minimum criteria. Think of this as a bouncer at a club—only companies with the right credentials (funding level) get through the door.
Set up Airtable as system of record
- Create an Airtable base with tables: Companies, Enrichment_Data, Scores, Change_Log
- Add an Airtable node in n8n configured for "Create" operation
- Map Crunchbase fields to Airtable columns: company_name, website, funding_total, founded_date, location, crunchbase_url
Critical field mappings:
unique_id: Use Crunchbase permalink or domain as deduplication keylast_updated: Timestamp for change detectionprocessing_status: Track enrichment pipeline progress (pending/enriching/scored/complete)
Deduplication logic:
// Function node: Check if company exists
const domain = $json.website.replace(/^https?:\/\//, '').split('/')[0];
const existing = await $('Airtable').getAll({
filterByFormula: `{domain} = '${domain}'`
});
if (existing.length > 0) {
return { skip: true, reason: 'duplicate' };
}
return { skip: false, company: $json };
Why this approach:
Using domain as the deduplication key is more reliable than company name (which can have variations like "Acme Inc." vs "Acme"). The processing_status field lets you resume interrupted workflows without reprocessing completed companies. This is like a factory assembly line—each station knows exactly which products have already been processed.
Step 2: Build Company Enrichment Layer
This phase extracts detailed information about each company's team structure, leadership, and operational signals.
Configure LinkedIn Enrichment
- Add an HTTP Request node for LinkedIn Sales Navigator API (or use a scraping service like Apify)
- Extract founder profiles, C-suite executives, and finance team members
- Store results in Airtable Enrichment_Data table with confidence scores
LinkedIn data extraction:
// Function node: Parse LinkedIn profiles
const profiles = $json.people;
const founders = profiles.filter(p =>
p.title.toLowerCase().includes('founder') ||
p.title.toLowerCase().includes('ceo')
);
const financeTeam = profiles.filter(p =>
p.title.toLowerCase().includes('cfo') ||
p.title.toLowerCase().includes('controller') ||
p.title.toLowerCase().includes('finance')
);
return {
founder_count: founders.length,
founder_names: founders.map(f => f.name),
has_finance_team: financeTeam.length > 0,
finance_team_size: financeTeam.length,
confidence: profiles.length > 5 ? 'high' : 'medium'
};
Web scraping for careers pages
- Add an HTTP Request node to fetch company website
- Use a Function node with regex to find careers/team page URLs
- Extract job postings related to finance roles (CFO, Controller, Accountant)
Scraping logic:
// Function node: Detect finance hiring signals
const html = $json.body;
const financeKeywords = ['cfo', 'controller', 'accountant', 'finance manager', 'fp&a'];
const hasFinanceJobs = financeKeywords.some(keyword =>
html.toLowerCase().includes(keyword)
);
const teamPageUrl = html.match(/href=["'](.*?(?:team|about|careers)[^"']*)/i);
return {
has_finance_job_posting: hasFinanceJobs,
team_page_url: teamPageUrl ? teamPageUrl[1] : null,
scraped_at: new Date().toISOString()
};
Why this works:
LinkedIn provides structured data about current employees, while careers pages reveal hiring intent. A company posting for a CFO is a different signal than one that already has a full finance team. You're looking for the Goldilocks zone—companies big enough to need help but without an established finance function yet.
Step 3: Implement LLM-Based Classification
This phase uses large language models to make intelligent judgments about whether a company matches your ideal customer profile.
Configure OpenAI/Claude Integration
- Add an OpenAI node (or HTTP Request for Claude API)
- Set model to GPT-4 or Claude 3 Opus for best reasoning
- Configure for JSON output mode to ensure structured responses
LLM prompt for finance team detection:
// Function node: Prepare LLM prompt
const prompt = `Analyze this startup and determine if they have an in-house finance team.
Company: ${$json.company_name}
Employee count: ${$json.employee_count}
LinkedIn profiles: ${JSON.stringify($json.linkedin_profiles)}
Careers page data: ${$json.careers_data}
Website content: ${$json.website_excerpt}
Return JSON with this exact structure:
{
"has_finance_team": boolean,
"confidence": "high" | "medium" | "low",
"reasoning": "brief explanation",
"finance_team_size": number,
"key_signals": ["signal1", "signal2"]
}
Consider:
- Job titles containing CFO, Controller, Finance Manager
- Team pages listing finance department
- Job postings for finance roles
- Company size (>50 employees more likely to have finance team)`;
return { prompt };
OpenAI node configuration:
- Model:
gpt-4-turbo-preview - Temperature: 0.2 (lower = more consistent)
- Response format:
json_object - Max tokens: 500
Why this approach:
LLMs excel at synthesizing multiple weak signals into a confident judgment. A human would look at LinkedIn profiles, careers pages, and team descriptions to make this call—the LLM does the same reasoning at scale. The low temperature (0.2) ensures consistent classifications across similar companies. Think of this as hiring an analyst who reads all the data and gives you a clear yes/no answer with their reasoning.
Error handling for LLM failures:
// Function node: Handle LLM errors
try {
const result = JSON.parse($json.response);
return result;
} catch (error) {
return {
has_finance_team: null,
confidence: 'low',
reasoning: 'LLM parsing failed',
error: error.message,
requires_manual_review: true
};
}
Step 4: Build Scoring and ICP Classification Engine
This phase calculates numerical scores for founder quality and investor quality, then assigns each company to an ICP tier.
Configure Founder Quality Scoring
- Add a Function node to calculate founder score (0-100)
- Weight factors: previous exits, education (top universities), prior experience at notable companies, LinkedIn connections
Founder scoring logic:
// Function node: Calculate founder score
const founders = $json.founders;
let score = 0;
founders.forEach(founder => {
// Previous exits (30 points max)
if (founder.previous_exits > 0) score += Math.min(founder.previous_exits * 15, 30);
// Top university (20 points)
const topSchools = ['Stanford', 'MIT', 'Harvard', 'Berkeley', 'Carnegie Mellon'];
if (topSchools.some(school => founder.education.includes(school))) score += 20;
// Notable company experience (25 points)
const notableCompanies = ['Google', 'Meta', 'Amazon', 'Microsoft', 'Apple', 'Tesla'];
if (notableCompanies.some(co => founder.experience.includes(co))) score += 25;
// LinkedIn connections as proxy for network (25 points max)
score += Math.min(founder.linkedin_connections / 200, 25);
});
// Average across founders
const finalScore = Math.min(Math.round(score / founders.length), 100);
return {
founder_score: finalScore,
founder_tier: finalScore >= 70 ? 'A' : finalScore >= 50 ? 'B' : 'C'
};
Configure Investor Quality Scoring
- Add a Function node to score investors (0-100)
- Weight factors: tier-1 VCs (Sequoia, a16z, etc.), total AUM, successful exits from portfolio
Investor scoring logic:
// Function node: Calculate investor score
const investors = $json.investors;
let score = 0;
const tier1VCs = ['Sequoia', 'Andreessen Horowitz', 'Benchmark', 'Accel', 'Greylock'];
const tier2VCs = ['First Round', 'NEA', 'Lightspeed', 'Bessemer'];
investors.forEach(investor => {
if (tier1VCs.some(vc => investor.name.includes(vc))) {
score += 40;
} else if (tier2VCs.some(vc => investor.name.includes(vc))) {
score += 25;
} else {
score += 10; // Other institutional investors
}
});
const finalScore = Math.min(score, 100);
return {
investor_score: finalScore,
investor_tier: finalScore >= 70 ? 'A' : finalScore >= 40 ? 'B' : 'C',
lead_investors: investors.slice(0, 3).map(i => i.name)
};
ICP Classification Logic
- Add a Function node that combines all signals to assign ICP tier
- ICP-1: No finance team + high founder/investor scores + right subsector
- ICP-2: Partial finance team or medium scores
- ICP-3: Full finance team or low scores
ICP assignment:
// Function node: Assign ICP tier
const hasFinanceTeam = $json.has_finance_team;
const founderScore = $json.founder_score;
const investorScore = $json.investor_score;
const fundingAmount = $json.funding_total_usd;
const subsector = $json.subsector;
// Priority subsectors
const targetSubsectors = ['robotics', 'climate tech', 'enterprise software', 'fintech'];
const isTargetSubsector = targetSubsectors.includes(subsector.toLowerCase());
// ICP-1: Perfect fit
if (!hasFinanceTeam &&
founderScore >= 60 &&
investorScore >= 50 &&
fundingAmount >= 10000000 &&
isTargetSubsector) {
return { icp_tier: 'ICP-1', priority: 'high' };
}
// ICP-2: Good fit
if ((!hasFinanceTeam || $json.finance_team_size <= 1) &&
(founderScore >= 50 || investorScore >= 40) &&
fundingAmount >= 5000000) {
return { icp_tier: 'ICP-2', priority: 'medium' };
}
// ICP-3: Low priority
return { icp_tier: 'ICP-3', priority: 'low' };
Why this approach:
Scoring combines objective metrics (funding amount, investor names) with subjective signals (founder quality) into a single prioritization framework. The tiered system lets your sales team focus energy on ICP-1 prospects first. This is like a triage system in an emergency room—you treat the most critical patients first based on multiple vital signs, not just one symptom.
Variables to customize:
tier1VCs: Add your preferred top-tier investorstargetSubsectors: Adjust based on your firm's expertisefundingAmountthresholds: Change based on your typical deal size- Score weights: Increase/decrease importance of founder vs investor quality
Step 5: Implement Change Detection and Monitoring
This phase creates a system that periodically re-scans companies to detect when they become more sales-ready.
Configure Scheduled Re-scans
- Add a separate Schedule Trigger node set to run weekly
- Query Airtable for companies that haven't been checked in 7+ days
- Re-run enrichment and scoring workflows
Change detection logic:
// Function node: Detect meaningful changes
const previous = $json.previous_data;
const current = $json.current_data;
const changes = [];
// Finance team changes
if (!previous.has_finance_team && current.has_finance_team) {
changes.push({
type: 'finance_team_added',
significance: 'high',
message: 'Company hired finance team - may no longer need services'
});
}
// New job postings
if (current.finance_job_postings > previous.finance_job_postings) {
changes.push({
type: 'hiring_finance_roles',
significance: 'medium',
message: 'Actively hiring finance roles - window closing'
});
}
// Funding round
if (current.funding_total > previous.funding_total) {
const newFunding = current.funding_total - previous.funding_total;
changes.push({
type: 'new_funding',
significance: 'high',
message: `Raised $${(newFunding / 1000000).toFixed(1)}M - prime time to reach out`
});
}
// ICP tier improvement
if (previous.icp_tier === 'ICP-2' && current.icp_tier === 'ICP-1') {
changes.push({
type: 'icp_upgrade',
significance: 'high',
message: 'Moved to ICP-1 - prioritize outreach'
});
}
return {
has_changes: changes.length > 0,
changes: changes,
last_checked: new Date().toISOString()
};
Why this works:
Companies evolve rapidly. A startup without a finance team today might hire a CFO next month, closing your sales window. Change detection acts like a motion sensor—it only alerts you when something meaningful happens, not just because time passed. This prevents your team from reaching out too late or wasting time on companies that are no longer good fits.
Step 6: Build Alert and Digest System
This phase creates intelligent notifications that surface the right prospects at the right time.
Configure Slack Real-Time Alerts
- Add a Slack node for immediate notifications on ICP-1 prospects
- Include company name, scores, key signals, and direct link to Airtable record
Slack message format:
// Function node: Format Slack alert
const company = $json;
const message = {
text: `🎯 New ICP-1 Prospect: ${company.company_name}`,
blocks: [
{
type: 'section',
text: {
type: 'mrkdwn',
text: `*${company.company_name}* just raised *$${(company.funding_total_usd / 1000000).toFixed(1)}M*`
}
},
{
type: 'section',
fields: [
{ type: 'mrkdwn', text: `*Founder Score:*
${company.founder_score}/100` },
{ type: 'mrkdwn', text: `*Investor Score:*
${company.investor_score}/100` },
{ type: 'mrkdwn', text: `*Finance Team:*
${company.has_finance_team ? '❌ Yes' : '✅ No'}` },
{ type: 'mrkdwn', text: `*Subsector:*
${company.subsector}` }
]
},
{
type: 'section',
text: {
type: 'mrkdwn',
text: `*Why this matters:*
${company.icp_reasoning}`
}
},
{
type: 'actions',
elements: [
{
type: 'button',
text: { type: 'plain_text', text: 'View in Airtable' },
url: company.airtable_url
},
{
type: 'button',
text: { type: 'plain_text', text: 'LinkedIn' },
url: company.linkedin_url
}
]
}
]
};
return { message };
Configure Daily and Weekly Digests
- Add Schedule Triggers for daily (8 AM) and weekly (Monday 8 AM) digests
- Query Airtable for new ICP-1 and ICP-2 prospects from the period
- Send formatted email with summary statistics and top prospects
Daily digest logic:
// Function node: Generate daily digest
const yesterday = new Date();
yesterday.setDate(yesterday.getDate() - 1);
const newProspects = $json.companies.filter(c =>
new Date(c.created_at) >= yesterday
);
const icp1 = newProspects.filter(c => c.icp_tier === 'ICP-1');
const icp2 = newProspects.filter(c => c.icp_tier === 'ICP-2');
const summary = {
total_new: newProspects.length,
icp1_count: icp1.length,
icp2_count: icp2.length,
avg_funding: newProspects.reduce((sum, c) => sum + c.funding_total_usd, 0) / newProspects.length,
top_subsectors: [...new Set(newProspects.map(c => c.subsector))].slice(0, 5)
};
const emailBody = `
<h2>Daily Prospect Digest - ${new Date().toLocaleDateString()}</h2>
<h3>Summary</h3>
<ul>
<li>New prospects: ${summary.total_new}</li>
<li>ICP-1 (high priority): ${summary.icp1_count}</li>
<li>ICP-2 (medium priority): ${summary.icp2_count}</li>
<li>Avg funding: $${(summary.avg_funding / 1000000).toFixed(1)}M</li>
</ul>
<h3>Top ICP-1 Prospects to Contact Today</h3>
${icp1.slice(0, 5).map(c => `
<div style="border: 1px solid #ccc; padding: 10px; margin: 10px 0;">
<h4>${c.company_name}</h4>
<p><strong>Funding:</strong> $${(c.funding_total_usd / 1000000).toFixed(1)}M |
<strong>Founder Score:</strong> ${c.founder_score}/100 |
<strong>Investor Score:</strong> ${c.investor_score}/100</p>
<p><strong>Why now:</strong> ${c.icp_reasoning}</p>
<p><a href="${c.airtable_url}">View in Airtable</a> |
<a href="${c.linkedin_url}">LinkedIn</a> |
<a href="${c.website}">Website</a></p>
</div>
`).join('')}
`;
return { subject: `Daily Digest: ${summary.icp1_count} High-Priority Prospects`, body: emailBody };
Why this approach:
Real-time Slack alerts ensure you never miss a hot prospect, while digests prevent notification fatigue. The digest format lets your team plan their day—they see the top 5 prospects to focus on rather than being overwhelmed by 50 notifications. This is like the difference between a fire alarm (immediate Slack alert for ICP-1) and a daily weather report (digest showing overall trends).
Workflow Architecture Overview
This workflow consists of 47 nodes organized into 6 main sections:
- Data ingestion (Nodes 1-8): Schedule trigger fetches Crunchbase data, filters by funding amount, deduplicates against Airtable, and initiates enrichment pipeline
- Enrichment layer (Nodes 9-24): Parallel branches for LinkedIn scraping, website analysis, careers page extraction, and team structure detection
- LLM classification (Nodes 25-32): OpenAI nodes analyze enriched data to determine finance team presence with confidence scoring
- Scoring engine (Nodes 33-39): Function nodes calculate founder quality, investor quality, and assign ICP tiers
- Storage and change detection (Nodes 40-43): Airtable updates with new scores, comparison logic detects meaningful changes
- Alert system (Nodes 44-47): Conditional routing sends Slack alerts for ICP-1, queues daily/weekly digests
Execution flow:
- Trigger: Daily at 6 AM (new prospects) + Weekly on Mondays (re-scans)
- Average run time: 45-90 seconds per company (depending on API response times)
- Key dependencies: Crunchbase data access, OpenAI API, Airtable base configured with proper tables
Critical nodes:
- HTTP Request (Crunchbase): Handles rate limiting with retry logic, 3 attempts with exponential backoff
- OpenAI (Finance Team Detection): Uses GPT-4 with JSON mode, temperature 0.2 for consistency
- Function (ICP Classification): Combines 8+ signals into final tier assignment with reasoning
- Airtable (System of Record): Stores all data with timestamps for change detection and audit trail
The complete n8n workflow JSON template is available at the bottom of this article.
Key Configuration Details
Critical Configuration Settings
OpenAI Integration
Required fields:
- API Key: Your OpenAI API key from platform.openai.com
- Model:
gpt-4-turbo-preview(orgpt-4for better reasoning) - Temperature: 0.2 (lower = more consistent classifications)
- Response format:
json_object(ensures structured output) - Max tokens: 500 (sufficient for classification tasks)
Common issues:
- Using GPT-3.5 → Results in less accurate finance team detection (use GPT-4)
- Temperature > 0.5 → Inconsistent classifications for similar companies
- Not setting
response_format: json_object→ Parsing errors when LLM returns text
Airtable Configuration
Required tables and fields:
| Table | Key Fields | Purpose |
|---|---|---|
| Companies | company_name, domain, funding_total_usd, icp_tier, created_at | Main prospect database |
| Enrichment_Data | company_id (linked), linkedin_profiles, careers_data, confidence | Raw enrichment results |
| Scores | company_id (linked), founder_score, investor_score, reasoning | Scoring calculations |
| Change_Log | company_id (linked), change_type, detected_at, significance | Historical tracking |
API settings:
- Rate limit: 5 requests/second (Airtable free tier)
- Batch size: 10 records per request (for bulk updates)
- Retry logic: 3 attempts with 2-second delays
Why this approach:
Separating enrichment data from scores into different tables keeps your main Companies table clean and queryable. The Change_Log table acts as an audit trail—you can see exactly when a company moved from ICP-2 to ICP-1 and why. This structure mirrors how a data warehouse separates fact tables (Companies) from dimension tables (Enrichment_Data, Scores).
Crunchbase Data Handling
CSV import method:
- Export daily funding announcements from Crunchbase Pro
- Upload CSV to Google Sheets
- Use Google Sheets node in n8n to read new rows
- Trigger workflow on new entries
API method (if available):
- Endpoint:
https://api.crunchbase.com/api/v4/searches/organizations - Filter:
funding_total: {gte: 4000000}andlast_funding_at: {gte: [yesterday]} - Authentication: Bearer token in header
Rate limits:
- CSV method: No rate limits, but manual upload required
- API method: 200 requests/day on basic plan, 5000/day on enterprise
Scoring Customization
Variables to customize:
// Adjust these constants in Function nodes
const SCORING_WEIGHTS = {
founder_previous_exits: 15, // Points per previous exit (max 30)
founder_top_university: 20, // Points for elite education
founder_notable_company: 25, // Points for FAANG experience
founder_network_size: 25, // Max points for LinkedIn connections
investor_tier1: 40, // Points for Sequoia, a16z, etc.
investor_tier2: 25, // Points for First Round, NEA, etc.
investor_other: 10, // Points for other institutional VCs
funding_minimum: 4000000, // Minimum funding to process
funding_ideal: 10000000 // Funding amount for ICP-1 consideration
};
const ICP_THRESHOLDS = {
icp1_founder_min: 60, // Minimum founder score for ICP-1
icp1_investor_min: 50, // Minimum investor score for ICP-1
icp2_founder_min: 50, // Minimum founder score for ICP-2
icp2_investor_min: 40 // Minimum investor score for ICP-2
};
Why these numbers:
These weights reflect typical venture capital evaluation criteria. Previous exits are the strongest signal of founder quality (hence 15 points each), while education and experience provide supporting evidence. Tier-1 VCs get 40 points because their due diligence is rigorous—if Sequoia invested, the company likely has strong fundamentals. Adjust these based on your firm's specific ICP definition.
Testing & Validation
Component Testing Strategy
Test each workflow section independently before connecting them:
1. Data ingestion testing:
- Manually trigger workflow with 5 test companies from Crunchbase
- Verify deduplication logic catches existing companies
- Check Airtable records are created with all required fields
- Expected result: 5 new records in Companies table, no duplicates
2. Enrichment testing:
- Use a known company (e.g., a recent Series A startup)
- Verify LinkedIn data extraction returns founder profiles
- Check website scraping finds careers page
- Validate data quality: confidence scores should be "high" for companies with >10 employees
- Expected result: Enrichment_Data table populated with LinkedIn profiles, careers URLs, team size
3. LLM classification testing:
- Test with 3 companies: one with clear finance team, one without, one ambiguous
- Review LLM reasoning field for each classification
- Verify confidence scores match data quality (high confidence when LinkedIn shows CFO, low when data is sparse)
- Expected result: Accurate classifications with clear reasoning
4. Scoring validation:
- Calculate scores manually for 2 test companies
- Compare manual calculations to workflow output
- Verify ICP tier assignments match your expectations
- Expected result: Scores within 5 points of manual calculation, correct ICP tier
5. Alert testing:
- Create a test company that meets ICP-1 criteria
- Verify Slack message appears in correct channel
- Check email digest formatting and links
- Expected result: Slack alert within 60 seconds, properly formatted with working links
Common Issues and Fixes
| Issue | Cause | Solution |
|---|---|---|
| "OpenAI API rate limit exceeded" | Too many concurrent requests | Add 2-second delay between LLM calls using Wait node |
| "Airtable record not found" | Deduplication logic failing | Change unique key from company_name to domain |
| "LinkedIn scraping returns empty" | Rate limiting or blocked IP | Use residential proxy or LinkedIn API alternative |
| "Scores always return 0" | Function node syntax error | Check console logs, verify variable names match input data |
| "Slack alerts not sending" | Webhook URL expired | Regenerate webhook in Slack app settings |
Evaluation Metrics
Track these metrics to validate system performance:
// Function node: Calculate system metrics
const metrics = {
// Data quality
enrichment_success_rate: (successful_enrichments / total_companies) * 100,
avg_confidence_score: enrichment_data.reduce((sum, d) => sum + d.confidence_score, 0) / enrichment_data.length,
// Classification accuracy (requires manual review of sample)
finance_team_accuracy: correct_classifications / total_reviewed,
// Scoring distribution
icp1_percentage: (icp1_count / total_companies) * 100,
icp2_percentage: (icp2_count / total_companies) * 100,
// Alert effectiveness
alert_response_time: avg_time_to_first_contact_after_alert,
false_positive_rate: (incorrect_alerts / total_alerts) * 100
};
// Target benchmarks
const targets = {
enrichment_success_rate: 85, // 85% of companies successfully enriched
avg_confidence_score: 0.75, // Average confidence of 75%
finance_team_accuracy: 90, // 90% accurate classifications
icp1_percentage: 15, // 15% of companies are ICP-1 (adjust based on market)
alert_response_time: 24, // Contact within 24 hours of alert
false_positive_rate: 10 // <10% of alerts are not actually good fits
};
Manual review process:
- Weekly: Review 10 random ICP-1 classifications to verify accuracy
- Monthly: Audit scoring logic by manually calculating scores for 20 companies
- Quarterly: Survey sales team on alert quality and adjust thresholds
Deployment Considerations
Production Deployment Checklist
| Area | Requirement | Why It Matters |
|---|---|---|
| Error Handling | Retry logic with exponential backoff on all API nodes | Prevents data loss when LinkedIn or OpenAI APIs have temporary outages |
| Monitoring | Webhook health checks every 5 minutes | Detect workflow failures within 5 minutes vs discovering 3 days later when sales team asks why no alerts |
| Rate Limiting | Implement delays between API calls (2-5 seconds) | Avoid getting IP banned by LinkedIn or hitting OpenAI rate limits |
| Data Backup | Daily Airtable exports to Google Drive | Recover from accidental deletions or data corruption |
| Documentation | Node-by-node comments explaining logic | Reduces modification time from 4 hours to 30 minutes when adjusting scoring |
| Access Control | Separate API keys for dev/prod environments | Prevent test data from polluting production database |
| Cost Monitoring | Track OpenAI API usage daily | Catch runaway costs before $500 bill (GPT-4 is $0.03/1K tokens) |
| Version Control | Export workflow JSON weekly | Roll back to previous version if changes break system |
Error Handling Strategy
Implement these error handlers at critical points:
// Function node: Comprehensive error handler
try {
// Main workflow logic here
const result = await processCompany($json);
return result;
} catch (error) {
// Log error details
const errorLog = {
company_id: $json.id,
error_type: error.name,
error_message: error.message,
timestamp: new Date().toISOString(),
node_name: 'LinkedIn Enrichment',
input_data: JSON.stringify($json)
};
// Store in Airtable Error_Log table
await $('Airtable').create('Error_Log', errorLog);
// Send Slack alert for critical errors
if (error.name === 'APIRateLimitError' || error.name === 'AuthenticationError') {
await $('Slack').send({
channel: '#automation-alerts',
text: `🚨 Critical error in lead sourcing workflow: ${error.message}`
});
}
// Return partial data to continue workflow
return {
...($json),
enrichment_status: 'failed',
error: error.message,
requires_manual_review: true
};
}
Monitoring Setup
Create a separate monitoring workflow:
- Schedule Trigger: Every 5 minutes
- HTTP Request: Ping main workflow webhook to verify it's responsive
- Airtable Query: Check for companies stuck in "processing" status >30 minutes
- Function: Calculate key metrics (companies processed today, error rate, avg processing time)
- Slack: Send alert if any metric exceeds threshold
Performance Optimization
For high-volume processing (100+ companies/day):
// Function node: Batch processing
const companies = $input.all();
const batchSize = 10;
const batches = [];
for (let i = 0; i < companies.length; i += batchSize) {
batches.push(companies.slice(i, i + batchSize));
}
// Process batches with delays
for (const batch of batches) {
await processBatch(batch);
await sleep(5000); // 5-second delay between batches
}
Caching strategy:
- Cache LinkedIn profile data for 30 days (profiles don't change often)
- Cache investor scores indefinitely (Sequoia's quality doesn't change)
- Re-fetch website data weekly (careers pages update frequently)
Customization Ideas
Geographic filtering:
Add a Function node after Crunchbase import:
// Filter by geography
const targetRegions = ['San Francisco Bay Area', 'New York', 'Boston', 'Austin'];
const location = $json.location;
const isTargetRegion = targetRegions.some(region =>
location.toLowerCase().includes(region.toLowerCase())
);
if (!isTargetRegion) {
return { skip: true, reason: 'outside_target_geography' };
}
return { skip: false };
Subsector prioritization:
Modify ICP classification to boost scores for priority subsectors:
// Subsector boost
const prioritySubsectors = {
'robotics': 15, // +15 points
'climate tech': 12, // +12 points
'enterprise software': 10 // +10 points
};
const boost = prioritySubsectors[$json.subsector] || 0;
const adjustedScore = $json.total_score + boost;
Use Cases & Variations
Real-World Use Cases
Use Case 1: Venture Capital Firm Deal Sourcing
- Industry: Venture Capital
- Scale: 200 new companies/week across all funding stages
- Modifications needed:
- Add Series filter (only Series A and B)
- Score based on market size and competitive moat instead of finance team
- Integrate with Pitchbook for deeper company data
- Alert partners via email digest with investment memo template
Use Case 2: SaaS Sales Intelligence
- Industry: B2B SaaS selling to startups
- Scale: 500 companies/month, focus on Series A-C
- Modifications needed:
- Replace "finance team" detection with "uses competitor product" detection
- Add technographic data (current tech stack from BuiltWith or Clearbit)
- Score based on tech stack compatibility and growth rate
- Integrate with Salesforce to auto-create leads
Use Case 3: Recruiting Firm Candidate Sourcing
- Industry: Executive Search
- Scale: 50 high-growth companies/week
- Modifications needed:
- Focus on companies actively hiring (careers page analysis)
- Score based on funding velocity (raised 2+ rounds in 18 months)
- Extract hiring manager names from LinkedIn
- Alert recruiters when target companies post C-suite roles
Use Case 4: Market Research & Competitive Intelligence
- Industry: Strategy Consulting
- Scale: 1000+ companies tracked continuously
- Modifications needed:
- Add competitor tracking (monitor when competitors raise funding)
- Track product launches via website change detection
- Score based on market positioning and differentiation
- Generate weekly market landscape reports by subsector
Use Case 5: Partnership Development
- Industry: Enterprise Software Company
- Scale: 100 potential partners/quarter
- Modifications needed:
- Score based on complementary product fit
- Track integration partnerships (via press releases and blog scraping)
- Identify companies using similar tech stack
- Alert BD team when target companies announce new features
Customizations & Extensions
Alternative Integrations
Instead of Crunchbase:
- PitchBook: Best for deeper financial data and exit history - requires swapping HTTP Request node to PitchBook API, adding authentication header
- Dealroom: Better European startup coverage - change data source URL, adjust field mappings for different JSON structure
- AngelList: Use when targeting earlier-stage companies - API returns different funding data format, requires new parsing logic
Instead of Airtable:
- Notion Database: Better for team collaboration - use Notion API node, create linked databases for Companies/Enrichment/Scores
- PostgreSQL + Retool: Better for high-volume (10K+ companies) - replace Airtable nodes with Postgres nodes, build Retool interface for filtering
- Google Sheets: Simplest option for small teams - use Google Sheets node, but performance degrades above 5000 rows
Instead of OpenAI:
- Anthropic Claude: Better reasoning for complex classifications - swap OpenAI node for HTTP Request to Claude API, adjust prompt format
- Open-source LLMs (Llama 3): Best for cost control at scale - requires self-hosted inference server, add HTTP Request node to your endpoint
Workflow Extensions
Add automated reporting:
- Add a Schedule node to run every Monday at 8 AM
- Query Airtable for weekly metrics (new ICP-1 count, avg scores, top subsectors)
- Connect to Google Slides API to generate executive summary presentations
- Email PDF report to leadership team
- Nodes needed: +7 (Schedule, Airtable query, Function for calculations, HTTP Request to Slides API, PDF conversion, Email)
Scale to handle more data:
- Replace Google Sheets with PostgreSQL for Crunchbase data (handles 100K+ rows)
- Add batch processing: process 50 companies at a time instead of one-by-one
- Implement Redis caching layer for LinkedIn profiles (reduce API calls by 60%)
- Use n8n's Queue mode for parallel processing
- Performance improvement: 10x faster for >5000 companies (from 2 hours to 12 minutes)
Add outbound sequencing:
- Connect to Instantly.ai or Lemlist for email campaigns
- Auto-generate personalized first lines using GPT-4 based on company data
- Create separate sequences for ICP-1 (aggressive 5-touch) vs ICP-2 (gentle 3-touch)
- Track email opens and replies in Airtable
- Nodes needed: +12 (Instantly.ai API, GPT-4 for personalization, Airtable updates for tracking)
Integration possibilities:
| Add This | To Get This | Complexity |
|---|---|---|
| Clearbit Enrichment | Technographic data (tech stack, employee count) | Easy (3 nodes: HTTP Request, Function to parse, Airtable update) |
| BuiltWith API | Detect competitor product usage | Easy (2 nodes: HTTP Request, conditional logic) |
| HubSpot CRM | Auto-create deals for ICP-1 prospects | Medium (5 nodes: Airtable query, HubSpot create deal, custom properties mapping) |
| Zapier Webhooks | Connect to 1000+ other tools | Easy (1 node: Webhook trigger sends data to Zapier) |
| Meilisearch | Fast full-text search across all companies | Medium (8 nodes: Meilisearch index creation, document sync, search API) |
| Salesforce | Enterprise CRM integration | Hard (12 nodes: OAuth flow, field mapping, duplicate detection, opportunity creation) |
Advanced customization: Multi-stage scoring
Replace simple scoring with a funnel-based approach:
// Function node: Multi-stage scoring
const stages = {
awareness: {
weight: 0.2,
signals: ['website_traffic', 'social_media_presence', 'press_mentions']
},
consideration: {
weight: 0.3,
signals: ['funding_amount', 'investor_quality', 'team_size']
},
decision: {
weight: 0.5,
signals: ['no_finance_team', 'active_hiring', 'recent_funding']
}
};
let totalScore = 0;
for (const [stage, config] of Object.entries(stages)) {
const stageScore = config.signals.reduce((sum, signal) =>
sum + ($json[signal] || 0), 0
) / config.signals.length;
totalScore += stageScore * config.weight;
}
return {
total_score: Math.round(totalScore),
stage_scores: {
awareness: awarenessScore,
consideration: considerationScore,
decision: decisionScore
},
recommended_action: totalScore >= 70 ? 'immediate_outreach' :
totalScore >= 50 ? 'nurture_sequence' : 'monitor'
};
This approach weights signals differently based on where the prospect is in their buying journey. A company with recent funding and no finance team (decision stage signals) scores higher than one with great investors but a full finance team (consideration stage only).
Get Started Today
Ready to automate your startup prospecting?
- Download the template: Scroll to the bottom of this article to copy the n8n workflow JSON
- Import to n8n: Go to Workflows → Import from URL or File, paste the JSON
- Configure your services: Add your API credentials for Crunchbase, OpenAI, Airtable, and Slack
- Customize scoring logic: Adjust the founder/investor scoring weights in Function nodes to match your ICP
- Test with sample data: Run the workflow with 5 test companies to verify enrichment and classification
- Deploy to production: Set your schedule to daily at 6 AM and activate the workflow
Expected setup time: 2-4 hours for initial configuration, 1-2 hours for customization and testing.
Need help customizing this workflow for your specific needs? Schedule an intro call with Atherial at atherial.ai. We specialize in building custom automation systems for consulting firms, VCs, and B2B companies.
n8n Workflow JSON Template
{
"name": "AI Lead Sourcing & Scoring System",
"nodes": [
{
"parameters": {
"rule": {
"interval": [
{
"field": "cronExpression",
"expression": "0 6 * * *"
}
]
}
},
"name
