Independent Project · Nov 2023 – Dec 2024

Channel AI

A conversational business intelligence platform enabling non-technical users to query enterprise data through natural language

LangGraph · OpenAI Agents SDK · Apache Iceberg · SQL · RAG · LlamaIndex · Qdrant · WhatsApp Business API · AWS · Azure

The Problem

In enterprise environments, accessing business intelligence typically requires either technical SQL expertise or waiting days for analysts to generate reports. Small businesses face similar challenges without the luxury of dedicated analytics teams. This creates a fundamental barrier between decision-makers and their data.

I identified this gap during my final undergraduate year and continued developing the solution alongside my full-time role at Baxter. The goal was simple: make enterprise data accessible through conversation, reducing what typically takes days into minutes.

Technical Architecture

Multi-Agent Orchestration

At the core of Channel AI is a multi-agent orchestration system built using LangGraph and the OpenAI Agents SDK. Rather than relying on a single LLM to handle all tasks, I designed a system of collaborating specialized agents, each responsible for one step such as clarification, SQL generation, or validation.

This architecture allows each agent to specialize in its domain while maintaining contextual awareness through the LangGraph orchestration layer.
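The supervisor-and-specialists pattern can be sketched in plain Python. This is an illustration of the routing idea only, not the actual LangGraph graph; the agent functions and the routing rule are invented stand-ins:

```python
# Minimal sketch of the orchestration pattern: a router inspects the
# request and dispatches to a specialized agent. In the real system this
# is a LangGraph state graph with LLM-backed nodes, not keyword rules.

def clarifier_agent(state):
    # Handles ambiguous requests by asking a follow-up question.
    state["response"] = f"Did you mean revenue or units for '{state['query']}'?"
    return state

def sql_agent(state):
    # Generates SQL for a well-specified request (placeholder logic).
    state["sql"] = f"SELECT * FROM sales  -- for: {state['query']}"
    return state

def route(state):
    # Toy routing rule standing in for the supervisor's LLM decision:
    # a bare one-word query is treated as ambiguous.
    if state["query"].strip().lower() == "sales":
        return clarifier_agent
    return sql_agent

def run(query):
    state = {"query": query}
    return route(state)(state)
```

Each agent only reads and writes the shared state dict, which is what lets the orchestration layer keep contextual awareness across agents.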

Unified Data Layer with Apache Iceberg

One of the most significant technical challenges was handling heterogeneous data sources across different enterprises. I chose Apache Iceberg as the unified data layer because it provides schema evolution, ACID guarantees, and efficient partitioning over large analytical tables.

This foundation enabled the system to support multi-million row OLAP workloads with sub-20s query latency, even when dealing with complex joins and aggregations.

RAG-Enhanced Schema Understanding

Understanding enterprise schemas goes beyond just knowing table structures. I implemented a RAG (Retrieval-Augmented Generation) system using LlamaIndex and Qdrant to index schema documentation and business context, retrieving the most relevant pieces at query time to ground SQL generation.

This approach dramatically improved the accuracy of SQL generation, especially for domain-specific queries that required understanding business context beyond raw schema information.
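The retrieval step can be illustrated with a toy scorer. The schema documents below are invented examples, and term overlap stands in for the embedding similarity search that LlamaIndex and Qdrant perform in the real pipeline:

```python
import re

# Toy retrieval standing in for vector search: score each schema document
# by term overlap with the question and return the best matches to include
# in the SQL-generation prompt. Documents are invented examples.

SCHEMA_DOCS = {
    "orders": "orders table: order id, customer id, total amount, order date",
    "products": "products table: product id, name, category, unit price",
    "customers": "customers table: customer id, region, signup date",
}

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, k=2):
    q = tokens(question)
    scored = sorted(SCHEMA_DOCS.items(),
                    key=lambda kv: -len(q & tokens(kv[1])))
    return [name for name, _ in scored[:k]]
```

Swapping the overlap score for cosine similarity over embeddings gives the production behavior without changing the surrounding interface.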

Performance Optimizations

Semantic Caching

Rather than using exact-match caching, I implemented semantic caching that recognizes when queries are functionally equivalent, even if phrased differently. This reduced redundant database hits and LLM calls by ~60% in production.
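The cache lookup can be sketched as follows. The bag-of-words embedding and the 0.8 threshold are illustrative placeholders; the production system uses a real sentence-embedding model:

```python
import math

# Sketch of semantic caching: embed each query and serve a cached result
# when a new query's embedding is close enough to a previous one.

def embed(text):
    # Toy bag-of-words vector; a real system would call an embedding model.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []          # list of (embedding, result) pairs
        self.threshold = threshold

    def get(self, query):
        q = embed(query)
        for emb, result in self.entries:
            if cosine(q, emb) >= self.threshold:
                return result      # close enough: reuse the cached answer
        return None

    def put(self, query, result):
        self.entries.append((embed(query), result))
```

The threshold is the key tuning knob: too low and users get stale or wrong answers for genuinely different questions, too high and the cache degenerates to exact matching.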

Optimized SQL Templates

Through analysis of thousands of generated queries across pilot deployments, I identified common patterns and created optimized SQL templates covering the most frequent query shapes.
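One such shape is "top-N by metric over a date range". The template below is an invented example of the idea, with the free variables filled in after validation rather than generated from scratch by the LLM:

```python
from string import Template

# Illustrative template: a recurring "top-N by metric over a date range"
# query shape. The actual template library is not shown in this write-up.

TOP_N = Template("""
SELECT $dimension, SUM($metric) AS total
FROM $table
WHERE order_date BETWEEN '$start' AND '$end'
GROUP BY $dimension
ORDER BY total DESC
LIMIT $n
""".strip())

def render_top_n(table, dimension, metric, start, end, n=10):
    # In the real system each value is validated against the schema and
    # an allow-list before substitution, to prevent SQL injection.
    return TOP_N.substitute(table=table, dimension=dimension,
                            metric=metric, start=start, end=end, n=n)
```

Because the query shape is fixed, the database can reuse plans and the LLM only has to choose parameters, which helps both latency and correctness.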

Persona-Based Access Control

Security was critical, especially for enterprise deployments. I implemented persona-based access control that ensures users only see data they're authorized to access, with row-level security enforced at the SQL generation stage.
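The enforcement idea can be sketched by wrapping every generated query with a persona-specific predicate. The personas and predicates below are invented examples:

```python
# Sketch of row-level security applied at SQL-generation time: each
# persona maps to a predicate that is ANDed into every generated query.
# Persona names and predicates here are illustrative.

PERSONA_FILTERS = {
    "store_manager": "store_id = {store_id}",
    "regional_lead": "region = '{region}'",
    "executive": None,  # unrestricted access
}

def apply_rls(sql, persona, **attrs):
    predicate = PERSONA_FILTERS[persona]
    if predicate is None:
        return sql
    # Wrap the generated query so the filter applies to its full result.
    return f"SELECT * FROM ({sql}) scoped WHERE {predicate.format(**attrs)}"
```

Enforcing the filter in the generation layer, rather than trusting the LLM to include it, means a hallucinated or adversarial prompt cannot widen a user's data access.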

Deployment & Validation

Enterprise Pilots

I conducted four enterprise pilot evaluations across the retail and hospitality sectors. The learnings from these pilots drove many of the refinements described in the challenges section below.

WhatsApp Business API Integration

For small businesses, I deployed 12 lightweight instances accessible via WhatsApp. This removed the friction of learning new software: business owners could ask questions about their sales or inventory through a familiar messaging interface.

The WhatsApp integration proved particularly successful in emerging markets where business owners are comfortable with messaging but resistant to complex software interfaces.
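Sending a reply through the WhatsApp Business (Cloud) API is a POST of a small JSON payload to the Graph API's messages endpoint. The helper below only builds that payload; endpoint URL, authentication, and error handling are omitted, and the payload shape reflects recent Cloud API versions:

```python
# Builds the JSON body for a text reply via the WhatsApp Cloud API
# (POSTed to /{phone_number_id}/messages on the Graph API). Transport,
# auth tokens, and retries are intentionally left out of this sketch.

def build_text_message(to_number, body):
    return {
        "messaging_product": "whatsapp",
        "to": to_number,
        "type": "text",
        "text": {"body": body},
    }
```

Keeping the answer as plain text is deliberate: the whole value of the channel is that replies look like any other chat message.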

Impact & Results

Key Metrics

  • Reporting turnaround: Reduced from ~2 days to under 30 minutes
  • Query latency: Sub-20s for multi-million row OLAP workloads
  • Cache hit rate: ~60% through semantic caching
  • Accuracy: Cross-domain schema generalization validated across 16 pilot instances

Technical Challenges & Solutions

Challenge 1: Schema Complexity

Enterprise schemas are rarely clean. I encountered scenarios with hundreds of tables, ambiguous naming conventions, and undocumented relationships.

Solution: Built a schema profiling system that automatically discovers relationships through foreign key analysis, naming pattern matching, and statistical correlation. This metadata feeds into the RAG system, improving query accuracy over time.
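The naming-pattern half of that discovery can be sketched with a simple heuristic: a column named `<singular-table>_id` is proposed as a foreign key to that table. The table layout below is an invented example, and the real profiler combines this with FK metadata and value-overlap statistics:

```python
# Heuristic foreign-key proposal from naming patterns alone. Deliberately
# crude (e.g. naive singularization by stripping trailing "s"); proposals
# are confirmed by statistical checks in the real profiler.

def propose_foreign_keys(tables):
    # tables: {table_name: [column_name, ...]}
    proposals = []
    for table, columns in tables.items():
        for col in columns:
            if not col.endswith("_id"):
                continue
            stem = col[:-3]                  # "customer_id" -> "customer"
            for target in tables:
                if target != table and target.rstrip("s") == stem:
                    proposals.append((table, col, target))
    return proposals
```

Each accepted proposal is written into the metadata store that feeds the RAG index, which is how accuracy improves over time.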

Challenge 2: Natural Language Ambiguity

Business users often ask ambiguous questions. "Show me sales" could mean total revenue, unit sales, specific products, or trending data over time.

Solution: Implemented a clarification agent that identifies ambiguous queries and asks targeted follow-up questions before generating SQL. This reduced incorrect query generations by ~75%.
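The gate in front of SQL generation can be illustrated with a toy check: if the question names a subject but no concrete metric, ask a follow-up instead of generating SQL. The keyword list and follow-up wording are invented; the real agent makes this decision with an LLM:

```python
# Toy ambiguity gate standing in for the clarification agent.

METRICS = {"revenue", "units", "count", "margin", "profit"}

def clarify_or_pass(question):
    words = set(question.lower().split())
    if words & METRICS:
        return None          # specific enough: proceed to SQL generation
    return ("Do you want total revenue, units sold, "
            "or something else for that?")
```

Returning a question instead of a best-guess query trades one extra conversational turn for a large drop in wrong answers, which users tolerate far better than silently incorrect numbers.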

Challenge 3: Cost Management

Running multiple LLM agents for every query can become expensive at scale.

Solution: Hybrid approach combining smaller, fine-tuned models for routine tasks (schema lookup, validation) with frontier models (GPT-4) only for complex reasoning and generation. This reduced per-query cost by ~80% while maintaining quality.
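The routing rule itself is simple; the task names, complexity score, and model labels below are illustrative stand-ins for the real configuration:

```python
# Sketch of hybrid model routing: routine tasks and simple queries go to
# a small fine-tuned model, everything else to a frontier model.

ROUTINE_TASKS = {"schema_lookup", "sql_validation", "cache_check"}

def pick_model(task, query_complexity):
    # query_complexity: a 0..1 score from a cheap upstream classifier.
    if task in ROUTINE_TASKS or query_complexity < 0.3:
        return "small-finetuned"
    return "frontier"
```

Because routine tasks dominate per-query volume, even a coarse rule like this shifts most calls to the cheap model, which is where the bulk of the cost savings comes from.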

Lessons Learned

Building Channel AI while working full-time taught me valuable lessons about product development and technical architecture:

  1. Start with the data layer: Investing time in a robust data foundation (Apache Iceberg) paid massive dividends later in query performance and reliability.
  2. User feedback is gold: The WhatsApp interface wasn't in the original design. It emerged from observing how small business owners actually wanted to interact with their data.
  3. Specialization over generalization: Multi-agent architecture outperformed single-model approaches because each agent could focus on doing one thing exceptionally well.
  4. Performance is a feature: Sub-20s query latency was critical. If users had to wait 60+ seconds, they'd go back to requesting reports from analysts.

Future Direction

While the pilot deployments validated the core concept, several areas remain open for future development.

Technical Deep Dive Resources

Detailed implementation notes are available on request for those who want to go deeper.

INTERESTED IN LEARNING MORE?

I'm happy to discuss the technical architecture, share lessons learned, or explore potential collaborations.

Get in Touch