Prathamesh's Portfolio

Python, FastAPI, LLM, Vector Search, Document Analysis, OCR, Prompt Engineering, CUDA, Parameter Optimization, WebSockets

Project Overview

AlyaAloft is a PDF document analysis and question-answering application that I built from the ground up. It combines a Large Language Model with vector similarity search to answer user queries about PDF documents. The system extracts, processes, and analyzes PDF content so that users can ask natural language questions and receive detailed, context-aware answers.

Key Features

Document Processing Pipeline:
- Implemented text extraction from PDFs with OCR support for scanned documents
- Developed document structure analysis to identify sections and hierarchies
- Created semantic chunking algorithm for context preservation during retrieval
- Built vector embedding storage system for efficient similarity search
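
The chunking step can be illustrated with a minimal sketch. It assumes a simplified strategy: sentences are grouped up to a rough character limit, and the last sentence of each chunk is repeated at the start of the next to preserve context. The function name `chunk_text`, its parameters, and the regex-based sentence split are illustrative assumptions, not the AlyaAloft implementation.

```python
import re


def chunk_text(text: str, max_chars: int = 500, overlap_sentences: int = 1) -> list[str]:
    """Group whole sentences into chunks of roughly max_chars characters.

    The last `overlap_sentences` sentences of each chunk are repeated at the
    start of the next one, so a chunk may slightly exceed max_chars.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk into the next one.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```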

AI Model Optimization:
- Evaluated several models, including Mistral-7B, before selecting Flan-T5-Base for its balance of answer quality and resource usage
- Tuned model parameters to improve Flan-T5 performance on document QA tasks
- Designed and refined specialized prompt templates for different query types
- Optimized GPU acceleration to reduce VRAM usage by over 60% compared to larger models

System Architecture:
- Designed modular, layered architecture with clear separation of concerns
- Built real-time WebSocket communication for interactive responses
- Developed asynchronous processing pipeline with FastAPI for concurrency
- Implemented comprehensive error handling and fallback mechanisms for system resilience
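
The idea behind the asynchronous pipeline can be sketched with the standard library alone. This example assumes that PDF extraction is a blocking call and moves it to worker threads with `asyncio.to_thread`, so the event loop stays free for other requests. `extract_text_blocking` and `process_documents` are hypothetical stand-ins, not functions from the project.

```python
import asyncio
import time


def extract_text_blocking(path: str) -> str:
    """Hypothetical stand-in for slow, blocking PDF text extraction."""
    time.sleep(0.05)  # simulate heavy work
    return f"text of {path}"


async def process_documents(paths: list[str]) -> list[str]:
    """Extract several documents concurrently without blocking the event loop."""
    # Each blocking call runs in a worker thread; gather keeps input order.
    tasks = [asyncio.to_thread(extract_text_blocking, p) for p in paths]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(process_documents(["a.pdf", "b.pdf"])))
```

Inside a FastAPI application, a coroutine like `process_documents` could be awaited directly from an `async def` endpoint, since FastAPI already runs an event loop.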

Technical Implementation

I built AlyaAloft on a four-layer architecture that separates concerns and gives each layer a clear responsibility:

- Frontend Layer: Web UI and REST API clients for user interaction
- API Layer: FastAPI endpoints for document upload, queries, and WebSocket connections
- Service Layer: Document processing, AI response generation, and structure analysis components
- Storage Layer: JSON-based storage for documents, vector embeddings, and chat history
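
To make the storage layer more concrete, here is a minimal sketch of a JSON-backed embedding store with cosine-similarity search. It assumes embeddings are plain lists of floats saved in a single JSON file. The class name `JsonVectorStore`, the record fields, and the brute-force search are illustrative assumptions and not the project's actual schema.

```python
import json
import math
from pathlib import Path


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class JsonVectorStore:
    """Tiny JSON-backed store of text chunks and their embeddings (hypothetical)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.records: list[dict] = []
        if self.path.exists():
            self.records = json.loads(self.path.read_text(encoding="utf-8"))

    def add(self, doc_id: str, text: str, embedding: list[float]) -> None:
        self.records.append({"doc_id": doc_id, "text": text, "embedding": embedding})

    def save(self) -> None:
        self.path.write_text(json.dumps(self.records), encoding="utf-8")

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[float, str]]:
        """Return the top_k (score, text) pairs, best match first."""
        scored = [(cosine(query, r["embedding"]), r["text"]) for r in self.records]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

A linear scan like this is only reasonable for small collections; larger document sets would normally call for a dedicated vector index.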

AI Model Integration

I integrated and tuned separate AI models for different stages of document processing. The key decisions concerned model selection and optimization:

- Model Selection Journey: I initially evaluated larger models such as Mistral-7B, but found that Flan-T5-Base offered a better performance-to-resource ratio for my document QA tasks, reducing GPU memory usage from 4 GB+ to under 2 GB.
- Flan-T5-Base: Tuned this model's parameters for question answering while keeping resource requirements modest.
- Sentence Transformers: Used to generate semantic text embeddings for efficient retrieval.
- Advanced Query Recognition: Developed algorithms that automatically detect the query type and select an appropriate processing strategy.
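
Query recognition paired with per-type prompt templates might look like the sketch below. It assumes a simple keyword-based classifier. The three categories, the regular expressions, and the template wording are hypothetical examples chosen for illustration, not the templates used in AlyaAloft.

```python
import re

# Hypothetical prompt templates, one per query type.
PROMPT_TEMPLATES = {
    "summary": "Summarize the following context.\n\nContext: {context}\n\nSummary:",
    "comparison": (
        "Using only the context, compare the items named in the question.\n\n"
        "Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    ),
    "factual": (
        "Answer the question using only the context.\n\n"
        "Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    ),
}

# Keyword patterns checked in order; the first match wins.
QUERY_PATTERNS = [
    ("summary", re.compile(r"\b(summar(y|ize|ise)|overview|gist)\b", re.IGNORECASE)),
    ("comparison", re.compile(r"\b(compare|differences?|versus|vs)\b", re.IGNORECASE)),
]


def detect_query_type(question: str) -> str:
    """Return the first matching query type, falling back to 'factual'."""
    for query_type, pattern in QUERY_PATTERNS:
        if pattern.search(question):
            return query_type
    return "factual"


def build_prompt(question: str, context: str) -> str:
    """Fill the template that matches the detected query type."""
    template = PROMPT_TEMPLATES[detect_query_type(question)]
    return template.format(context=context, question=question)
```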

Technical Challenges Solved

While developing AlyaAloft, I addressed the following technical challenges:

- Memory Optimization: Reduced the memory footprint by over 60% through model selection and parameter optimization, enabling operation on systems with limited GPU resources.
- Context Window Management: Developed algorithms that select and merge document chunks to stay within the model's context window while preserving semantic meaning.
- Prompt Engineering: Created prompt templates that use chain-of-thought reasoning to improve response quality and factual accuracy.
- Concurrent Processing: Implemented asynchronous operations for document processing and response generation to keep the UI responsive.
- Hardware Adaptation: Designed the system to adapt to different hardware configurations, from CPU-only to GPU-accelerated environments.
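
Context window management can be sketched as a greedy selection under a token budget. The example assumes each retrieved chunk carries a similarity score and its position in the document, and it approximates token counts by splitting on whitespace. The function `select_chunks` and the tuple layout are illustrative assumptions; a real system would count tokens with the model's own tokenizer.

```python
from typing import Callable


def select_chunks(
    scored_chunks: list[tuple[float, int, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> str:
    """Pick the best-scoring chunks that fit the budget, then merge them.

    Each item is (score, position_in_document, text). Chunks are chosen by
    descending score and joined in document order so the merged context
    reads naturally.
    """
    ranked = sorted(scored_chunks, key=lambda c: c[0], reverse=True)
    selected: list[tuple[int, str]] = []
    used = 0
    for _score, position, text in ranked:
        cost = count_tokens(text)
        if used + cost <= max_tokens:  # skip chunks that would overflow
            selected.append((position, text))
            used += cost
    selected.sort(key=lambda item: item[0])  # restore document order
    return "\n\n".join(text for _, text in selected)
```

Restoring document order after selection is a design choice: it keeps the merged context readable even when the highest-scoring chunk appears late in the document.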

Results & Innovations

This project produced the following results in document processing and AI-driven information retrieval:

- Created a unified document understanding system that combines OCR, structure analysis, and semantic search.
- Developed prompt engineering techniques that improve reasoning and answer quality.
- Built fallback mechanisms that handle edge cases and error conditions.
- Achieved good performance on consumer hardware through careful model selection and optimization.
- Demonstrated end-to-end AI system development, from data processing through model optimization to the user interface.
