AI/ML · 12 min read · 15 March 2026

Building a Production Multi-Agent AI Pipeline with LangChain & FastAPI

How I built ContentForge AI's 6-agent content repurposing pipeline — from YouTube transcript extraction to 27-page deep research reports — and the lessons learned shipping it to real users.

LangChain · FastAPI · Python · Agents · Production

The Problem With Simple LLM Wrappers

Most tutorials show you how to call GPT-4 and get a response. That's fine for demos. But when you're building a product that needs to reliably process YouTube videos, research topics across 30 web sources, apply brand voice, and export polished Word documents — a single LLM call doesn't cut it.

This is what I learned building ContentForge AI's content repurposing pipeline.

The Architecture

The pipeline uses 6 specialised agents chained together:

```python
from langchain.agents import AgentExecutor
from langchain_openai import ChatOpenAI

class ContentPipeline:
    def __init__(self):
        self.transcript_agent = TranscriptAgent()
        self.research_agent = DeepResearchAgent()
        self.writer_agent = WriterAgent()
        self.qa_agent = QAAgent()
        self.brand_agent = BrandVaultAgent()
        self.competitor_agent = CompetitorAgent()

    async def run(self, youtube_url: str, platform: str) -> dict:
        # Stage 1: Extract transcript
        transcript = await self.transcript_agent.extract(youtube_url)

        # Stage 2: Parallel research (6 web searches simultaneously)
        research = await self.research_agent.research(transcript.topics)

        # Stage 3: Write platform-specific content
        draft = await self.writer_agent.write(transcript, research, platform)

        # Stage 4: QA check
        reviewed = await self.qa_agent.review(draft)

        # Stage 5: Apply brand voice
        branded = await self.brand_agent.apply(reviewed)

        return branded
```
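Stage 2's fan-out is the classic `asyncio.gather` pattern. A minimal sketch of how the parallel research can work — `search_topic` is a stand-in for the real web-search call, and all names here are illustrative:

```python
import asyncio

async def search_topic(topic: str) -> dict:
    # Placeholder for a real web-search call (an HTTP request in production).
    await asyncio.sleep(0.01)
    return {"topic": topic, "sources": []}

async def research(topics: list[str]) -> list[dict]:
    # Launch all searches concurrently; gather preserves input order.
    # return_exceptions=True keeps one failed search from sinking the batch.
    results = await asyncio.gather(
        *(search_topic(t) for t in topics),
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, Exception)]

results = asyncio.run(research(["agents", "fastapi", "proxies"]))
```

Because the searches overlap instead of running back to back, total wall time is roughly that of the slowest search rather than the sum of all of them.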

The YouTube Proxy Problem

Railway's IPs are blocked by YouTube. I spent 3 days debugging this before discovering Webshare residential proxies:

```python
import yt_dlp

def extract_transcript(url: str) -> dict:
    ydl_opts = {
        'proxy': f'http://{WEBSHARE_USER}:{WEBSHARE_PASS}@proxy.webshare.io:80',
        'writeautomaticsub': True,
        'subtitleslangs': ['en'],
        'skip_download': True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=False)
        # Manually uploaded subtitles live under 'subtitles'; the auto-generated
        # captions that 'writeautomaticsub' requests land under 'automatic_captions'.
        return info.get('subtitles') or info.get('automatic_captions', {})
```
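Re-processing the same URL is wasted money, so a cache sits in front of the extractor, keyed on a hash of the URL. A sketch of the pattern — the `store` dict stands in for a Redis client here (swap the dict get/set for `redis.get` / `redis.setex` with a TTL in production), and `cached_transcript` is an illustrative wrapper, not the actual ContentForge code:

```python
import hashlib
import json

def cache_key(url: str) -> str:
    # Stable cache key: a namespaced SHA-256 hash of the video URL.
    return "transcript:" + hashlib.sha256(url.encode()).hexdigest()

def cached_transcript(url: str, store: dict) -> dict:
    # `store` stands in for a Redis client; values are JSON strings so the
    # same code works against a real key-value store.
    key = cache_key(url)
    if key in store:
        return json.loads(store[key])
    result = {"url": url, "subtitles": {}}  # placeholder for extract_transcript(url)
    store[key] = json.dumps(result)
    return result

store = {}
first = cached_transcript("https://youtu.be/example", store)
second = cached_transcript("https://youtu.be/example", store)  # served from cache
```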

WebSocket Stability

The pipeline takes 2-4 minutes. Users need real-time progress updates. I use FastAPI WebSockets with heartbeats:

```python
from datetime import datetime, timezone

from fastapi import WebSocket

@app.websocket("/ws/pipeline/{task_id}")
async def pipeline_ws(websocket: WebSocket, task_id: str):
    await websocket.accept()

    async def send_progress(stage: str, pct: int):
        await websocket.send_json({
            "stage": stage,
            "progress": pct,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    try:
        result = await pipeline.run(task_id, progress_callback=send_progress)
        await websocket.send_json({"status": "complete", "result": result})
    except Exception as e:
        await websocket.send_json({"status": "error", "message": str(e)})
    finally:
        await websocket.close()
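The handler above doesn't show the heartbeat itself. The idea is a background task that pings the client on an interval so idle proxies and load balancers don't drop the socket during long pipeline stages. A minimal sketch — `ws` is anything with an async `send_json` method (a FastAPI WebSocket in production; a stub here so the pattern can run offline):

```python
import asyncio

async def heartbeat(ws, interval: float = 15.0) -> None:
    # Ping periodically until cancelled; cancel this task once the
    # pipeline run finishes.
    try:
        while True:
            await asyncio.sleep(interval)
            await ws.send_json({"type": "ping"})
    except asyncio.CancelledError:
        pass

class _StubSocket:
    # Stand-in for a WebSocket so the pattern can be exercised offline.
    def __init__(self):
        self.sent = []
    async def send_json(self, msg):
        self.sent.append(msg)

async def demo() -> list:
    ws = _StubSocket()
    task = asyncio.create_task(heartbeat(ws, interval=0.01))
    await asyncio.sleep(0.05)   # pretend the pipeline is running
    task.cancel()
    await task
    return ws.sent

pings = asyncio.run(demo())
```

In the real handler you'd start the heartbeat with `asyncio.create_task(...)` right after `websocket.accept()` and cancel it in the `finally` block.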

Key Lessons

  1. Always add retry logic — LLM APIs fail. Wrap every call with exponential backoff.
  2. Parallel where possible — Research across 6 topics simultaneously cuts time by 70%.
  3. Cache aggressively — Same YouTube URL shouldn't re-process. Use Redis with a URL hash as key.
  4. Separate concerns — Each agent has one job. Don't let the writer agent also do research.
  5. Log everything — When something fails at 3am, you need to know exactly which agent failed and why.
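Lesson 1 is simple to implement generically. A sketch of exponential backoff with jitter for async calls — `with_backoff` and `flaky` are illustrative names, with `flaky` standing in for any LLM API call that fails transiently:

```python
import asyncio
import random

async def with_backoff(call, retries: int = 4, base: float = 0.5):
    # Retry an async callable, doubling the delay each attempt and adding
    # jitter so concurrent retries don't stampede the API at once.
    for attempt in range(retries):
        try:
            return await call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the real error
            delay = base * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

attempts = {"n": 0}

async def flaky() -> str:
    # Fails twice, then succeeds -- mimicking a transient API outage.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

result = asyncio.run(with_backoff(flaky, base=0.01))
```

Libraries like `tenacity` give you the same behaviour declaratively, but the pattern is small enough to own outright.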

The full ContentForge AI pipeline is live at contentforge.net, processing hundreds of videos weekly with a ~94% success rate.

Mahmudul Hassan Mithun
AI SaaS Builder · BSc Data Science & AI, UEL · Building ContentForge AI

Related Posts

Deploying FastAPI to Railway: The Production Checklist
8 min read →

Achieving 94% Accuracy with YOLOv8 for Real-Time Traffic Management
11 min read →