Building the AI Ingredient Scanner: A Multi-Agent Approach
The AI Ingredient Scanner started as an exploration into multi-agent LLM architectures and evolved into a full-stack application with mobile support and multi-language OCR.
Project Vision
Create an application that analyzes food and cosmetic ingredient labels, providing personalized safety assessments based on user profiles (allergies, skin type, dietary restrictions).
Phase 1: Multi-Agent Architecture
The Agent Design
Built a three-agent system, each with a specific role:
- Research Agent: Retrieves ingredient safety data
  - Primary: Qdrant vector database with pre-indexed safety information
  - Fallback: Google Search for unknown ingredients
  - Caches results for performance
- Analysis Agent: Generates comprehensive reports
  - Powered by Gemini 2.0 Flash
  - Considers user profile for personalization
  - Produces structured safety assessments
- Critic Agent: Quality validation
  - 5-gate validation system
  - Checks for accuracy, completeness, and relevance
  - Can request re-analysis if quality thresholds aren't met (see the orchestration sketch below)
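Conceptually, the orchestration is a research, analyze, critique loop in which the critic can bounce a draft back for another pass. Here is a minimal TypeScript sketch of that control flow; the actual pipeline is built with LangGraph, and every type and name below is illustrative rather than the real API:

interface SafetyData { ingredient: string; notes: string }
interface Report { text: string; score: number }
interface UserProfile { allergies: string[]; skinType: string }

// Illustrative agent interface; each method wraps an LLM or retrieval call.
interface Agents {
  research(ingredients: string[]): Promise<SafetyData[]>;
  analyze(data: SafetyData[], profile: UserProfile, feedback?: string): Promise<Report>;
  critique(report: Report): Promise<{ passed: boolean; feedback: string }>;
}

const MAX_RETRIES = 2; // assumed cap on critic-triggered re-analysis

async function runPipeline(
  agents: Agents,
  ingredients: string[],
  profile: UserProfile,
): Promise<Report> {
  const data = await agents.research(ingredients);  // vector DB first, web fallback
  let report = await agents.analyze(data, profile); // first draft
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    const verdict = await agents.critique(report);  // 5-gate validation
    if (verdict.passed) break;
    // Critic rejected the draft: re-run analysis with its feedback folded in.
    report = await agents.analyze(data, profile, verdict.feedback);
  }
  return report; // best effort after the retry budget is spent
}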
Tech Stack (Phase 1)
- LLM: Google Gemini 2.0 Flash
- Vector DB: Qdrant Cloud
- Framework: LangChain + LangGraph
- UI: Streamlit
- Observability: LangSmith tracing
Key Features
- PDF export with colored safety bars
- Share via Email/WhatsApp/Twitter
- User profiles for personalized analysis
- Ingredient-by-ingredient breakdown
Phase 2: Mobile App & OCR
The Mobile Challenge
Users wanted to scan labels directly from products. This required:
- Native camera integration
- OCR for text extraction
- Multi-language support (labels aren't always in English)
Solution Architecture
[Mobile App] --> [FastAPI Backend] --> [Multi-Agent System]
     |                    |
     v                    v
[Camera/Gallery]  [OCR + Translation]
React Native/Expo Implementation
Built the mobile app with Expo for cross-platform support:
- ImageCapture: Camera interface with gallery picker
- IngredientCard: Expandable details with safety metrics
- ProfileSelector: Allergies, skin type, preferences
- Dark/Light theme toggle (sketched below)
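The theme toggle follows the standard React context pattern. A minimal sketch, assuming hook and provider names of my own invention rather than the app's actual code:

import React, { createContext, useContext, useState } from 'react';

type Theme = 'light' | 'dark';

const ThemeContext = createContext<{ theme: Theme; toggle: () => void }>({
  theme: 'light',
  toggle: () => {},
});

export const ThemeProvider = ({ children }: { children: React.ReactNode }) => {
  const [theme, setTheme] = useState<Theme>('light');
  const toggle = () => setTheme((t) => (t === 'light' ? 'dark' : 'light'));
  return <ThemeContext.Provider value={{ theme, toggle }}>{children}</ThemeContext.Provider>;
};

// Any component can read or flip the theme via this hook.
export const useTheme = () => useContext(ThemeContext);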
Multi-Language OCR
Implemented support for 9+ languages:
- Auto-detection of source language
- Translation to English for analysis
- Original text preserved in results (response shape sketched below)
Languages supported: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese
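Keeping the original text alongside the translation lets the UI show both. A sketch of the shape such a result might take; the field names here are assumptions, not the actual API contract:

// Illustrative OCR result shape; real field names may differ.
interface OcrResult {
  detectedLanguage: string; // e.g. 'ja', from auto-detection
  originalText: string;     // text exactly as extracted from the label
  translatedText: string;   // English translation fed to the agents
}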
FastAPI REST Backend
Created dedicated endpoints for mobile:
- POST /ocr - Extract text from images
- POST /analyze - Run ingredient analysis
- Swagger docs at /docs
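From the mobile side, both endpoints are plain HTTP. A hedged sketch of the client calls, assuming a multipart upload for the image and JSON for analysis; the request and response fields are my assumptions, not the documented contract:

const API = 'https://api.zeroleaf.dev';

// Upload a label photo to /ocr. React Native's FormData accepts a
// { uri, name, type } file descriptor for local images.
async function extractText(imageUri: string): Promise<{ text: string }> {
  const form = new FormData();
  form.append('file', { uri: imageUri, name: 'label.jpg', type: 'image/jpeg' } as any);
  const res = await fetch(`${API}/ocr`, { method: 'POST', body: form });
  if (!res.ok) throw new Error(`OCR failed: ${res.status}`);
  return res.json();
}

// Send the extracted ingredient text plus the user profile to /analyze.
async function analyze(ingredients: string, profile: object): Promise<unknown> {
  const res = await fetch(`${API}/analyze`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ingredients, profile }),
  });
  if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
  return res.json();
}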
Phase 3: Web Platform Support
The Web Export Challenge
After building the mobile app, the next step was making it accessible via web browsers. Expo provides web support through react-native-web, but some components needed platform-specific implementations.
Platform-Specific Components
Created dual implementations for components that differ between native and web:
ImageCapture.tsx # Native: expo-camera, expo-image-picker
ImageCapture.web.tsx # Web: MediaDevices API, file input
React Native's bundler automatically selects the correct file based on platform.
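Both files export the same props, so call sites stay platform-agnostic. A sketch of what that shared contract might look like; the prop names are illustrative, except mode, which is the prop discussed under Browser-Specific Challenges below:

// Shared contract both ImageCapture implementations satisfy.
export interface ImageCaptureProps {
  mode?: 'camera' | 'gallery';           // open the camera directly or the gallery picker
  onCapture: (imageUri: string) => void; // receives the captured/selected image URI
}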
Web Camera Implementation
The web version uses browser APIs:
- navigator.mediaDevices.getUserMedia() for camera access
- Falls back to file picker if camera unavailable
- Canvas API for image capture from video stream
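A condensed sketch of that flow, assuming a mounted <video> element and with error handling trimmed down to the fallback path:

// Request the camera, then grab a frame via a canvas. Returns null when
// the caller should fall back to a native <input type="file"> picker.
async function captureFromCamera(video: HTMLVideoElement): Promise<string | null> {
  if (!navigator.mediaDevices?.getUserMedia) return null; // no camera API available
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { facingMode: 'environment' }, // prefer the rear camera on phones
    });
    video.srcObject = stream;
    await video.play();

    const canvas = document.createElement('canvas');
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext('2d')!.drawImage(video, 0, 0);

    stream.getTracks().forEach((t) => t.stop()); // release the camera
    return canvas.toDataURL('image/jpeg');       // data URI handed to the OCR upload
  } catch {
    return null; // permission denied: fall back to the file picker
  }
}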
API Environment Detection
Updated the API service to auto-detect environment:
import { Platform } from 'react-native';

// Placeholder values for illustration; the real ones live in app config.
const LOCAL_IP = 'http://192.168.1.100:8000';      // dev machine on the LAN
const PRODUCTION_API = 'https://api.zeroleaf.dev'; // Railway deployment

const getApiBaseUrl = (): string => {
  if (Platform.OS === 'web') {
    return PRODUCTION_API; // web builds always hit production
  }
  return __DEV__ ? LOCAL_IP : PRODUCTION_API; // Expo sets __DEV__ at build time
};
Testing Suite
Added comprehensive Jest tests:
- Type validation tests for API contracts (example below)
- Component rendering tests
- Theme context behavior tests
- API service tests
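For example, a contract test can be as simple as a type guard exercised with good and bad payloads. The guard and field names here are illustrative, not the app's actual code:

// analysis.contract.test.ts - illustrative Jest contract test.
interface AnalysisResult {
  ingredients: { name: string; safety: 'safe' | 'caution' | 'avoid' }[];
}

// Hypothetical type guard mirroring how /analyze responses get validated.
function isAnalysisResult(x: any): x is AnalysisResult {
  return Array.isArray(x?.ingredients) &&
    x.ingredients.every((i: any) =>
      typeof i.name === 'string' && ['safe', 'caution', 'avoid'].includes(i.safety));
}

describe('/analyze contract', () => {
  it('accepts a well-formed payload', () => {
    expect(isAnalysisResult({ ingredients: [{ name: 'water', safety: 'safe' }] })).toBe(true);
  });

  it('rejects a payload with a bad safety value', () => {
    expect(isAnalysisResult({ ingredients: [{ name: 'water', safety: 'ok' }] })).toBe(false);
  });
});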
Browser-Specific Challenges
Building for web uncovered platform differences:
- Camera initialization: Browser camera requires an async permission flow with loading states
- File picker: Web uses a native <input type="file"> instead of expo-image-picker
- Mode switching: Added a mode prop to ImageCapture for direct camera vs. gallery access
Deployment
| Service | Platform | URL |
|---|---|---|
| Backend API | Railway | api.zeroleaf.dev |
| Streamlit UI | Railway | ingredient-analyzer.zeroleaf.dev |
| Web App | Cloudflare Pages | scanner.zeroleaf.dev |
| Mobile | Expo Go / Native | - |
Lessons Learned
- Agent orchestration matters: The critic agent catches errors that would slip through a single-agent approach.
- Vector DB as primary source: Faster and more reliable than web search for known ingredients.
- Mobile-first considerations: Camera permissions, image sizing, and network handling add complexity.
- Multi-language is hard: OCR accuracy varies by language and image quality.
- Platform abstractions help: React Native Web makes cross-platform development feasible, but platform-specific components still need careful handling.
- Environment detection is crucial: Automatically switching between development and production APIs reduces configuration errors.
What's Next
- App store deployment (iOS/Android)
- Barcode scanning for product lookup
- Ingredient history and favorites
- Community-contributed safety data
