AI/ML · 12 min read · 15 March 2026
# Building a Production Multi-Agent AI Pipeline with LangChain & FastAPI
How I built ContentForge AI's 6-agent content repurposing pipeline — from YouTube transcript extraction to 27-page deep research reports — and the lessons learned shipping it to real users.
Tags: LangChain · FastAPI · Python · Agents · Production
## The Problem With Simple LLM Wrappers
Most tutorials show you how to call GPT-4 and get a response. That's fine for demos. But when you're building a product that needs to reliably process YouTube videos, research topics across 30 web sources, apply brand voice, and export polished Word documents — a single LLM call doesn't cut it.
This is what I learned building ContentForge AI's content repurposing pipeline.
## The Architecture
The pipeline uses 6 specialised agents chained together:
```python
from langchain.agents import AgentExecutor
from langchain_openai import ChatOpenAI


class ContentPipeline:
    def __init__(self):
        self.transcript_agent = TranscriptAgent()
        self.research_agent = DeepResearchAgent()
        self.writer_agent = WriterAgent()
        self.qa_agent = QAAgent()
        self.brand_agent = BrandVaultAgent()
        self.competitor_agent = CompetitorAgent()

    async def run(self, youtube_url: str, platform: str) -> dict:
        # Stage 1: Extract transcript
        transcript = await self.transcript_agent.extract(youtube_url)

        # Stage 2: Parallel research (6 web searches simultaneously)
        research = await self.research_agent.research(transcript.topics)

        # Stage 3: Write platform-specific content
        draft = await self.writer_agent.write(transcript, research, platform)

        # Stage 4: QA check
        reviewed = await self.qa_agent.review(draft)

        # Stage 5: Apply brand voice
        branded = await self.brand_agent.apply(reviewed)
        return branded
```
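Stage 2's fan-out is where most of the latency savings come from. A minimal sketch of how the research agent can run its searches concurrently with `asyncio.gather` (here `search_web` is a hypothetical stand-in for a real web-search call, not ContentForge's actual implementation):

```python
import asyncio


async def search_web(topic: str) -> dict:
    # Placeholder for a real web-search API call
    await asyncio.sleep(0.01)
    return {"topic": topic, "sources": []}


async def research(topics: list[str]) -> list[dict]:
    # Fan out one search per topic; gather awaits them concurrently,
    # so total latency is roughly the slowest search, not the sum
    results = await asyncio.gather(*(search_web(t) for t in topics))
    return list(results)


results = asyncio.run(research(["AI agents", "LangChain", "FastAPI"]))
```

Because each search is I/O-bound, six of them cost about as much wall-clock time as one.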
## The YouTube Proxy Problem
YouTube blocks Railway's IPs. I spent three days debugging this before discovering Webshare residential proxies:
```python
import os

import yt_dlp


def extract_transcript(url: str) -> dict:
    """Fetch available English caption tracks without downloading the video."""
    ydl_opts = {
        # Route the request through a residential proxy so YouTube
        # doesn't block the datacenter IP
        'proxy': (
            f"http://{os.environ['WEBSHARE_USER']}:{os.environ['WEBSHARE_PASS']}"
            "@proxy.webshare.io:80"
        ),
        'writeautomaticsub': True,
        'subtitleslangs': ['en'],
        'skip_download': True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=False)
    # Manual subtitles live under 'subtitles'; auto-generated ones
    # under 'automatic_captions' -- prefer the manual track
    return info.get('subtitles') or info.get('automatic_captions', {})
```
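The caption tracks yt-dlp returns point at subtitle payloads, and auto-generated ones typically arrive as WebVTT. Turning a downloaded VTT payload into plain transcript text can look like this; a rough sketch (the `vtt_to_text` helper is illustrative, not part of the pipeline above):

```python
import re


def vtt_to_text(vtt: str) -> str:
    """Strip WEBVTT headers, timestamps and cue numbers, keep spoken text."""
    lines = []
    for line in vtt.splitlines():
        line = line.strip()
        if not line or line.startswith("WEBVTT") or "-->" in line or line.isdigit():
            continue
        lines.append(re.sub(r"<[^>]+>", "", line))  # drop inline tags like <c>
    # Collapse consecutive duplicates (auto-subs repeat lines as they scroll)
    deduped = [l for i, l in enumerate(lines) if i == 0 or l != lines[i - 1]]
    return " ".join(deduped)
```

The duplicate-collapsing step matters for auto-generated captions, which re-emit each line as the overlay scrolls.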
## WebSocket Stability
The pipeline takes 2-4 minutes. Users need real-time progress updates. I use FastAPI WebSockets with heartbeats:
```python
import asyncio
from datetime import datetime, timezone

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


@app.websocket("/ws/pipeline/{task_id}")
async def pipeline_ws(websocket: WebSocket, task_id: str):
    await websocket.accept()

    async def heartbeat():
        # Keep the connection alive through proxies that drop idle sockets
        while True:
            await asyncio.sleep(15)
            await websocket.send_json({"type": "ping"})

    async def send_progress(stage: str, pct: int):
        await websocket.send_json({
            "stage": stage,
            "progress": pct,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    hb = asyncio.create_task(heartbeat())
    try:
        result = await pipeline.run(task_id, progress_callback=send_progress)
        await websocket.send_json({"status": "complete", "result": result})
    except WebSocketDisconnect:
        pass  # client went away; nothing left to report
    except Exception as e:
        await websocket.send_json({"status": "error", "message": str(e)})
    finally:
        hb.cancel()
        await websocket.close()
```
## Key Lessons
- **Always add retry logic** — LLM APIs fail. Wrap every call with exponential backoff.
- **Parallelise where possible** — Researching 6 topics simultaneously cuts pipeline time by 70%.
- **Cache aggressively** — The same YouTube URL shouldn't be re-processed. Use Redis with a hash of the URL as the key.
- **Separate concerns** — Each agent has one job. Don't let the writer agent also do research.
- **Log everything** — When something fails at 3am, you need to know exactly which agent failed and why.
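The retry lesson above can be sketched as a small helper (a generic sketch, not the exact ContentForge implementation; `with_backoff` and its parameters are illustrative):

```python
import asyncio
import random


async def with_backoff(coro_factory, *, retries: int = 4, base: float = 0.5):
    """Retry an async call with exponential backoff and jitter."""
    for attempt in range(retries):
        try:
            return await coro_factory()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts; surface the real error
            # 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
            await asyncio.sleep(base * 2 ** attempt + random.uniform(0, 0.1))
```

Usage looks like `await with_backoff(lambda: llm.ainvoke(prompt))`: a factory rather than a bare coroutine, because a coroutine can only be awaited once and each retry needs a fresh one.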
The full ContentForge AI pipeline is live at contentforge.net. It's processing hundreds of videos weekly with a ~94% success rate.