# Cross-Reference Tab Implementation Plan

## Overview

Create a standalone tab that lets users upload a document, process it, and find related documents in the existing database.

## Current System Analysis

### Backend (FastAPI)

- ✅ `/search` endpoint exists: can find related documents
- ✅ `/documents` endpoint exists: can retrieve documents
- ❌ No document upload endpoint
- ❌ No document processing for uploaded files

### Frontend

- ✅ Tab system exists
- ✅ Basic cross-reference function exists (hardcoded)
- ❌ No file upload functionality
- ❌ No dedicated cross-reference tab

## Implementation Plan

### Phase 1: Backend API Extensions

#### New Endpoints Needed

1. **`POST /upload-document`**
   - Accept a file upload (PDF, DOCX, TXT). Legacy `.doc` files cannot be read by python-docx and would need a separate conversion step.
   - Extract text content from the uploaded file
   - Return the processed text and document metadata
   - **No database storage**: temporary processing only
2. **`POST /find-cross-references`**
   - Accept the processed document text
   - Use the existing search functionality internally
   - Return related documents with similarity scores
   - Include cross-reference analysis

#### Leverage Existing APIs

- Use the existing `/search` endpoint logic to find related documents
- Use the existing `/documents` endpoint to fetch the full related documents
- Use the existing database connection and document retrieval functions

### Phase 2: Frontend Implementation

#### New Tab Structure

1. **Upload Section**
   - File drop zone
   - File type validation (PDF, DOCX, TXT, matching the formats the backend accepts)
   - Upload progress indicator
   - File preview/summary
2. **Processing Section**
   - Processing status indicator
   - Document analysis summary
   - Key terms extraction display
3. **Results Section**
   - Related documents list
   - Similarity scores
   - Cross-reference details
   - Document preview capability

#### UI Components Needed

- File upload widget
- Progress bars
- Results grid/list
- Document preview modal
- Cross-reference visualization

### Phase 3: Processing Logic

#### Document Processing Pipeline

1. **File Upload & Validation**
   - Validate file type and size
   - Extract text content using the appropriate library for each format
   - Clean and normalize text
2. **Content Analysis**
   - Extract key terms and phrases
   - Identify legal concepts
   - Generate search queries from the content
3. **Cross-Reference Matching**
   - Use the existing search service (`enhanced_rag_service` or `simple_search_service`)
   - Combine multiple search strategies:
     - Full-text similarity
     - Key-term matching
     - Legal-concept matching
   - Rank results by relevance
4. **Results Processing**
   - Format cross-reference results
   - Include similarity metrics
   - Group by document type or relevance

## Technical Approach

### Backend Dependencies

```text
# New libraries needed (requirements.txt)
python-multipart   # Required by FastAPI to parse file uploads (UploadFile / form data)
pdfplumber         # PDF text extraction (alternative: pypdf, the maintained successor to PyPDF2)
python-docx        # Word (.docx) document processing
```

### API Strategy

**Recommendation: create new endpoints**, because:

- The current `/search` endpoint expects a text query, not full document content
- Specialized document processing logic is needed
- Cross-references need a different response format
- Upload functionality is entirely new

### Frontend Strategy

- Add a new tab to the existing tab system
- Reuse existing styling and components where possible
- Implement file upload using the HTML5 File API
- Follow the existing API calling patterns

## File Structure

### New Backend Files

```
embedding/
├── document_processor.py       # Handle file uploads and text extraction
└── cross_reference_service.py  # Cross-reference logic
```

### New Frontend Components

```
frontend/
├── js/
│   ├── cross-reference.js   # Cross-reference tab logic
│   └── file-upload.js       # File upload utilities
└── css/
    └── cross-reference.css  # Tab-specific styling
```

### API Endpoints Summary

1. **`POST /upload-document`**: new endpoint needed
2. **`POST /find-cross-references`**: new endpoint needed
3. **`GET /documents`**: use existing
4. **`GET /documents/{id}`**: use existing

## Development Priority

1. Backend document upload and processing
2. Cross-reference matching logic
3. Frontend tab and upload interface
4. Results display and formatting
5. Error handling and validation

## Benefits of This Approach

- Leverages the existing search infrastructure
- Maintains separation of concerns: text extraction lives in `document_processor.py` and matching logic in `cross_reference_service.py`
- Maintainable: new behavior is isolated in new modules rather than added to existing endpoints
- Consistent with current API patterns
- No database changes needed
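For the `document_processor.py` module behind `POST /upload-document` (Phase 3, step 1: File Upload & Validation), a minimal sketch could look like the following. All function names, the 10 MB size cap, and the returned fields are assumptions for illustration, not existing code. The sketch imports pdfplumber and python-docx only when a PDF or DOCX actually arrives, so TXT handling needs nothing beyond the standard library.

```python
"""Hypothetical sketch of embedding/document_processor.py; all names are illustrative."""
import io
import re
import unicodedata
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}  # legacy .doc deliberately excluded
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap; tune for the deployment


class UnsupportedDocumentError(ValueError):
    """Raised when an upload fails type or size validation."""


def validate_upload(filename: str, data: bytes) -> str:
    """Return the lowercased extension if the upload passes type and size checks."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise UnsupportedDocumentError(f"unsupported file type: {ext or '(none)'}")
    if not data:
        raise UnsupportedDocumentError("empty file")
    if len(data) > MAX_FILE_BYTES:
        raise UnsupportedDocumentError("file exceeds size limit")
    return ext


def extract_text(ext: str, data: bytes) -> str:
    """Extract raw text; third-party libraries are imported lazily per format."""
    if ext == ".txt":
        return data.decode("utf-8", errors="replace")
    if ext == ".pdf":
        import pdfplumber  # pip install pdfplumber
        with pdfplumber.open(io.BytesIO(data)) as pdf:
            # extract_text() returns None for pages without a text layer
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if ext == ".docx":
        import docx  # pip install python-docx
        document = docx.Document(io.BytesIO(data))
        return "\n".join(p.text for p in document.paragraphs)
    raise UnsupportedDocumentError(f"no extractor for {ext}")


def normalize_text(text: str) -> str:
    """NFKC-normalize, unify line endings, drop control chars, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Keep newlines and tabs; drop other Unicode "C*" (control/format) characters
    text = "".join(ch for ch in text if ch in "\n\t" or not unicodedata.category(ch).startswith("C"))
    text = re.sub(r"[ \t]+", " ", text)    # runs of spaces/tabs -> one space
    text = re.sub(r" *\n *", "\n", text)   # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()


def process_upload(filename: str, data: bytes) -> dict:
    """Validate, extract, and normalize; returns the text plus basic metadata."""
    ext = validate_upload(filename, data)
    text = normalize_text(extract_text(ext, data))
    return {
        "filename": filename,
        "file_type": ext.lstrip("."),
        "size_bytes": len(data),
        "char_count": len(text),
        "text": text,
    }
```

Keeping this logic free of FastAPI types means the endpoint handler only has to read the `UploadFile` bytes and call `process_upload`. That matches the "no database storage" requirement, since nothing is persisted.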
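For Phase 3, step 2 (Content Analysis), one simple baseline combines term-frequency ranking of words and word pairs with a keyword lexicon for legal concepts. The sketch below assumes that approach. The stopword list, the `LEGAL_CONCEPTS` lexicon, and every function name are hypothetical, and a production version would likely swap in a proper NLP or embedding-based extractor.

```python
"""Hypothetical content-analysis helpers (Phase 3, step 2); names are illustrative."""
import re
from collections import Counter

# Small illustrative stopword list; a real deployment would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "by", "with",
    "is", "are", "was", "be", "this", "that", "shall", "may", "any", "as", "at",
}

# Hypothetical concept lexicon: concept label -> lowercase trigger phrases.
LEGAL_CONCEPTS = {
    "breach of contract": ["breach", "material breach"],
    "indemnification": ["indemnify", "indemnification", "hold harmless"],
    "termination": ["terminate", "termination"],
}

TOKEN_RE = re.compile(r"[a-z][a-z'-]+")  # words of 2+ letters


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]


def extract_key_terms(text: str, top_k: int = 10) -> list[str]:
    """Rank unigrams and bigrams (adjacent after stopword removal) by raw frequency."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    # Ties break alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def identify_legal_concepts(text: str) -> list[str]:
    """Return lexicon concepts whose trigger phrases appear as whole words."""
    lowered = text.lower()
    return [
        concept
        for concept, triggers in LEGAL_CONCEPTS.items()
        if any(re.search(rf"\b{re.escape(t)}\b", lowered) for t in triggers)
    ]


def build_search_queries(text: str, max_chars: int = 1000) -> dict[str, list[str]]:
    """One list of query strings per matching strategy from the plan."""
    return {
        "full_text": [text[:max_chars]],
        "key_terms": [" ".join(extract_key_terms(text, top_k=8))],
        "legal_concepts": identify_legal_concepts(text),
    }
```

Truncating the full-text query (`max_chars`) reflects the plan's observation that `/search` expects a query, not a whole document. The right limit depends on the existing search service.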
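For Phase 3, step 3 (Cross-Reference Matching), the rankings from the full-text, key-term, and legal-concept strategies have to be merged into one list. The sketch below uses weighted reciprocal rank fusion (RRF), a swapped-in technique that the plan does not prescribe. It assumes the existing search service is wrapped in a hypothetical adapter `search(query, limit) -> list[document_id]`. The real `enhanced_rag_service` / `simple_search_service` interfaces may differ.

```python
"""Hypothetical sketch of embedding/cross_reference_service.py; names are illustrative."""
from __future__ import annotations

from collections import defaultdict
from typing import Callable

# Assumed adapter around the existing search service: (query, limit) -> ranked doc ids.
SearchFn = Callable[[str, int], list[str]]

RRF_K = 60  # commonly used reciprocal-rank-fusion constant


def find_cross_references(
    queries: dict[str, list[str]],
    search: SearchFn,
    limit: int = 10,
    per_query: int = 20,
    weights: dict[str, float] | None = None,
) -> list[dict]:
    """Fuse per-strategy rankings: each hit adds weight / (RRF_K + rank)."""
    weights = weights or {}
    scores: dict[str, float] = defaultdict(float)
    matched: dict[str, set[str]] = defaultdict(set)
    for strategy, strategy_queries in queries.items():
        w = weights.get(strategy, 1.0)  # unweighted strategies count as 1.0
        for query in strategy_queries:
            if not query.strip():
                continue  # skip empty queries (e.g. no concepts detected)
            for rank, doc_id in enumerate(search(query, per_query), start=1):
                scores[doc_id] += w / (RRF_K + rank)
                matched[doc_id].add(strategy)
    # Highest fused score first; ties break on document id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    return [
        {
            "document_id": doc_id,
            "score": round(score, 6),
            "matched_strategies": sorted(matched[doc_id]),
        }
        for doc_id, score in ranked
    ]
```

RRF uses only ranks, so it avoids comparing raw scores from strategies that live on different scales. The optional per-strategy `weights` would let, for example, legal-concept matches count more than key-term matches. The `matched_strategies` field gives the results view something concrete to show under "cross-reference details".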
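For Phase 3, step 4 (Results Processing), fused rank scores are hard for users to interpret, so the results view may want a separate similarity figure between 0 and 1. This sketch assumes plain bag-of-words cosine similarity between the uploaded text and each candidate's full text. It groups results by a `document_type` field that is assumed, not confirmed, to exist on documents returned by `GET /documents/{id}`. The function names are hypothetical.

```python
"""Hypothetical display-similarity and grouping helpers (Phase 3, step 4)."""
import math
import re
from collections import Counter, defaultdict

WORD_RE = re.compile(r"\w+")


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (lowercased)."""
    return Counter(WORD_RE.findall(text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors; 0.0 if either text is empty."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(count * vb[term] for term, count in va.items())  # missing terms count as 0
    norm_sq = sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def format_results(uploaded_text: str, candidates: list[dict]) -> dict[str, list[dict]]:
    """Group candidates by 'document_type' and sort each group by similarity, descending.

    Each candidate is assumed to carry 'document_id', 'text', and optionally
    'document_type'; documents without a type go into an "unknown" group.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for doc in candidates:
        groups[doc.get("document_type", "unknown")].append({
            "document_id": doc["document_id"],
            "similarity": round(cosine_similarity(uploaded_text, doc["text"]), 3),
        })
    for items in groups.values():
        items.sort(key=lambda item: item["similarity"], reverse=True)
    return dict(groups)
```

Raw term-frequency cosine is only a transparent baseline. If the existing search service already produces embedding similarities, surfacing those instead would likely be more meaningful.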