Knowledge Search with RAG & Bedrock
Unified search across 21+ enterprise data sources (Slack, Teams, Confluence, Google Drive, Jira) with hybrid RAG, cited AI answers, and automatic failover across 100+ LLMs.
Lead architect and infrastructure engineer. Hub-and-spoke AWS deployment, cross-account Bedrock access, Terraform + Azure DevOps CI/CD across DEV/STG/PRD.
ARCHITECTURE

HIGHLIGHTS
Vector similarity (pgvector) and full-text search (tsvector) run in parallel, merged with Reciprocal Rank Fusion. Optional reranking via Pinecone, Cohere, or Flashrank.
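A minimal sketch of the fusion step, assuming each search path returns a best-first list of chunk IDs; the k = 60 constant and the example IDs are illustrative, not production values.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first; a chunk's fused score is the sum of
    1 / (k + rank) over every list it appears in.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked outputs from the two parallel searches
vector_hits = ["c12", "c07", "c33", "c81"]   # pgvector similarity order
keyword_hits = ["c33", "c12", "c54"]         # tsvector full-text order

merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(merged)  # chunks ranked highly in both lists rise to the top
```

RRF only looks at ranks, so the similarity scores from pgvector and the text-rank scores from tsvector never need to be normalised against each other.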
LiteLLM sits in front of 100+ models. If Claude is rate-limited, the request goes to GPT-4o without the user noticing. Cooldown logic prevents hammering failing providers.
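A hedged sketch of that failover using litellm's Router; the model IDs, fallback mapping, and cooldown numbers are placeholders rather than the deployment's real configuration.

```python
from litellm import Router

router = Router(
    model_list=[
        {   # primary: Claude served through Bedrock
            "model_name": "claude-sonnet",
            "litellm_params": {"model": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0"},
        },
        {   # fallback: GPT-4o
            "model_name": "gpt-4o",
            "litellm_params": {"model": "gpt-4o"},
        },
    ],
    # If claude-sonnet errors or hits a rate limit, retry on gpt-4o instead
    fallbacks=[{"claude-sonnet": ["gpt-4o"]}],
    allowed_fails=3,    # failures tolerated before a deployment is cooled down
    cooldown_time=60,   # seconds a failing deployment is skipped
)

response = router.completion(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Summarise the onboarding guide."}],
)
print(response.choices[0].message.content)
```

The allowed_fails / cooldown_time pair is what keeps a struggling provider from being hammered: after a few consecutive failures the deployment is skipped for a while instead of retried immediately.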
ECS tasks in spoke accounts assume IAM roles in a separate AI account via STS with external ID validation. Credentials rotate hourly. Four AWS accounts in total.
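The cross-account hop can be sketched with boto3; the role ARN, external ID, and session name below are placeholders, and the one-hour credential lifetime matches the hourly rotation.

```python
import boto3

# Placeholder identifiers -- the real ARN and external ID live in the spoke's config
AI_ACCOUNT_ROLE_ARN = "arn:aws:iam::111111111111:role/bedrock-invoke-role"
EXTERNAL_ID = "example-external-id"

def bedrock_client_for_ai_account(region="us-east-1"):
    """Assume the Bedrock role in the central AI account from a spoke ECS task."""
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=AI_ACCOUNT_ROLE_ARN,
        RoleSessionName="knowledge-search-ecs-task",
        ExternalId=EXTERNAL_ID,   # rejected unless it matches the role's trust policy
        DurationSeconds=3600,     # temporary credentials expire after an hour
    )["Credentials"]
    return boto3.client(
        "bedrock-runtime",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```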
Answers stream in with source citations. Chunk IDs are preserved through the pipeline so users can click through to the original document.
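One way the chunk IDs can ride along with the streamed answer, sketched as plain dataclasses; the field names and event shape are illustrative assumptions, not the service's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    chunk_id: str    # same ID assigned at ingestion, carried through retrieval
    source_url: str  # deep link back to the original Slack/Confluence/Drive document
    snippet: str

@dataclass
class AnswerEvent:
    """One event in the streamed response."""
    delta: str                              # next slice of generated text
    citations: list[Citation] | None = None # populated on the final event

def render(events):
    """Assemble the streamed deltas and append clickable source references."""
    answer, citations = "", []
    for event in events:
        answer += event.delta
        if event.citations:
            citations.extend(event.citations)
    refs = ", ".join(f"[{c.chunk_id}]({c.source_url})" for c in citations)
    return f"{answer}\n\nSources: {refs}"
```

Because the chunk_id travels from ingestion through retrieval into the final citation, the UI can resolve each reference straight back to the original document.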