agc-chatbot/cf-plan.md

# Cross-Reference Tab Implementation Plan

## Overview

Create a standalone tab that allows users to upload a document, process it, and find related documents in the existing database.

## Current System Analysis

### Backend (FastAPI)

- ✅ `/search` endpoint exists - can find related documents
- ✅ `/documents` endpoint exists - can retrieve documents
- ❌ No document upload endpoint
- ❌ No document processing for uploaded files

### Frontend

- ✅ Tab system exists
- ✅ Basic cross-reference function exists (hardcoded)
- ❌ No file upload functionality
- ❌ No dedicated cross-reference tab

## Implementation Plan

### Phase 1: Backend API Extensions

#### New Endpoints Needed

1. **`POST /upload-document`**

   - Accept file upload (PDF, DOC, DOCX, TXT)
   - Extract text content from uploaded file
   - Return processed text and document metadata
   - **No database storage** - temporary processing only

2. **`POST /find-cross-references`**
   - Accept processed document text
   - Use existing search functionality internally
   - Return related documents with similarity scores
   - Include cross-reference analysis
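
The payloads for the two new endpoints could look roughly like the sketch below. All class and field names (`UploadDocumentResponse`, `top_k`, `matched_terms`, and so on) are assumptions for illustration and are not taken from the existing codebase. Plain dataclasses are used here to keep the sketch dependency-free; in the FastAPI app these would more likely be Pydantic models.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class UploadDocumentResponse:
    """Hypothetical response body for POST /upload-document."""
    filename: str
    content_type: str
    char_count: int
    text: str  # extracted text is returned to the client, not stored


@dataclass
class CrossReferenceRequest:
    """Hypothetical request body for POST /find-cross-references."""
    text: str
    top_k: int = 10  # assumed default for the number of results


@dataclass
class RelatedDocument:
    """One related document with its similarity score."""
    document_id: str
    title: str
    similarity: float
    matched_terms: List[str] = field(default_factory=list)


@dataclass
class CrossReferenceResponse:
    """Hypothetical response body for POST /find-cross-references."""
    results: List[RelatedDocument] = field(default_factory=list)


# Example: build a response and convert it to a JSON-ready dict.
response = CrossReferenceResponse(
    results=[RelatedDocument("doc-1", "Example title", 0.82, ["contract"])]
)
payload = asdict(response)
```

Returning the extracted text from `/upload-document` and passing it back in `/find-cross-references` keeps the server stateless, which matches the "no database storage" constraint.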

#### Leverage Existing APIs

- Use existing `/search` endpoint logic for finding related documents
- Use existing `/documents` endpoint to fetch full related documents
- Use existing database connection and document retrieval functions

### Phase 2: Frontend Implementation

#### New Tab Structure

1. **Upload Section**

   - File drop zone
   - File type validation (PDF, DOC, DOCX, TXT)
   - Upload progress indicator
   - File preview/summary

2. **Processing Section**

   - Processing status indicator
   - Document analysis summary
   - Key terms extraction display

3. **Results Section**
   - Related documents list
   - Similarity scores
   - Cross-reference details
   - Document preview capability

#### UI Components Needed

- File upload widget
- Progress bars
- Results grid/list
- Document preview modal
- Cross-reference visualization

### Phase 3: Processing Logic

#### Document Processing Pipeline

1. **File Upload & Validation**

   - Validate file type and size
   - Extract text content using appropriate libraries
   - Clean and normalize text

2. **Content Analysis**

   - Extract key terms and phrases
   - Identify legal concepts
   - Generate search queries from content

3. **Cross-Reference Matching**

   - Use existing search service (`enhanced_rag_service` or `simple_search_service`)
   - Multiple search strategies:
     - Full text similarity
     - Key terms matching
     - Legal concept matching
   - Rank results by relevance

4. **Results Processing**
   - Format cross-reference results
   - Include similarity metrics
   - Group by document type or relevance
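
Step 1 of the pipeline above (validation and text normalization) can be sketched with the standard library alone. The 10 MB limit, the `UploadValidationError` class, and the function names are assumptions chosen for illustration; the plan does not specify them.

```python
import re
import unicodedata
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed limit, not specified in the plan


class UploadValidationError(ValueError):
    """Raised when an upload fails type or size validation."""


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the lowercase extension if the upload is acceptable."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise UploadValidationError(f"Unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0:
        raise UploadValidationError("File is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadValidationError("File exceeds the size limit")
    return ext


def normalize_text(raw: str) -> str:
    """Clean extracted text before analysis."""
    text = unicodedata.normalize("NFKC", raw)           # unify Unicode forms
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    text = re.sub(r"[ \t]+", " ", text)                 # collapse spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)                # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)              # keep at most one blank line
    return text.strip()
```

Checking only the extension is a weak guarantee, since a client can rename any file. A stricter version might also inspect the file's leading bytes.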

## Technical Approach

### Backend Dependencies

```text
# New libraries needed
python-multipart      # Required by FastAPI for multipart file uploads
PyPDF2 or pdfplumber  # PDF text extraction (choose one)
python-docx           # Word (.docx) processing; does not read legacy .doc files
```
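
A minimal sketch of how text extraction might dispatch on file type with these libraries. It assumes `pdfplumber` is the PDF library chosen, and the function name `extract_text` is hypothetical. The third-party imports are deferred into their branches, so plain-text handling works even before the other packages are installed.

```python
import io
from pathlib import Path


def extract_text(filename: str, data: bytes) -> str:
    """Extract plain text from an uploaded file held in memory."""
    ext = Path(filename).suffix.lower()

    if ext == ".txt":
        # Replace undecodable bytes instead of failing the whole upload.
        return data.decode("utf-8", errors="replace")

    if ext == ".pdf":
        import pdfplumber  # third-party: pip install pdfplumber

        with pdfplumber.open(io.BytesIO(data)) as pdf:
            # Pages without a text layer may yield nothing, hence the fallback.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)

    if ext == ".docx":
        import docx  # third-party: pip install python-docx

        document = docx.Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)

    # Legacy .doc lands here: python-docx cannot read it, so it would need
    # a separate converter or be dropped from the accepted types.
    raise ValueError(f"No text extractor available for {ext!r}")
```

Working on in-memory bytes rather than a saved file fits the "temporary processing only" requirement, since nothing has to be written to disk.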

### API Strategy

**Recommendation: Create new endpoints** because:

- Current `/search` expects a text query, not document content
- Need specialized document processing logic
- Need different response format for cross-references
- Upload functionality is entirely new
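
Because `/search` expects a short text query, the new endpoint has to condense an uploaded document into queries first. The sketch below uses simple term frequency as a stand-in; the stopword list, function names, and parameters are assumptions, and a real implementation might use the embedding model or a legal-term dictionary instead.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {
    "the", "and", "for", "that", "this", "with", "not", "are",
    "was", "shall", "from", "have", "has",
}


def extract_key_terms(text: str, max_terms: int = 10) -> List[str]:
    """Return the most frequent non-stopword terms, ties broken alphabetically."""
    words = re.findall(r"[a-z][a-z'-]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]


def build_search_queries(
    text: str, terms_per_query: int = 3, max_queries: int = 3
) -> List[str]:
    """Group the top key terms into short queries suitable for a text search."""
    terms = extract_key_terms(text, terms_per_query * max_queries)
    return [
        " ".join(terms[i : i + terms_per_query])
        for i in range(0, len(terms), terms_per_query)
    ]


sample = "The tenant shall pay rent. The tenant shall not sublet. Rent is due monthly."
key_terms = extract_key_terms(sample, max_terms=3)
queries = build_search_queries(sample, terms_per_query=2, max_queries=2)
```

Sorting explicitly by count and then by term makes the output deterministic, which helps when comparing results across runs.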

### Frontend Strategy

- Add new tab to existing tab system
- Use existing styling and components where possible
- Implement file upload using HTML5 File API
- Use existing API calling patterns

## File Structure

### New Backend Files

```
embedding/
├── document_processor.py       # Handle file uploads and text extraction
└── cross_reference_service.py  # Cross-reference logic
```
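
One possible shape for the core of `cross_reference_service.py` is sketched below. The search function is injected as a parameter because the signatures of the existing search services are not shown in this plan; `SearchFn`, `find_cross_references`, the strategy names, and the weighted-average scoring are all assumptions.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumed interface: a search call takes a query string and yields
# (document_id, score) pairs. The real services may differ.
SearchFn = Callable[[str], Iterable[Tuple[str, float]]]


def find_cross_references(
    queries_by_strategy: Dict[str, List[str]],
    search: SearchFn,
    weights: Optional[Dict[str, float]] = None,
    top_k: int = 10,
) -> List[dict]:
    """Run every strategy's queries and merge the hits into one ranked list."""
    weights = weights or {}
    best: Dict[str, Dict[str, float]] = {}

    # Keep the best score each document achieved under each strategy.
    for strategy, queries in queries_by_strategy.items():
        for query in queries:
            for doc_id, score in search(query):
                per_doc = best.setdefault(doc_id, {})
                per_doc[strategy] = max(per_doc.get(strategy, 0.0), score)

    # Weighted average over all strategies; a strategy that missed counts as 0.
    total_weight = sum(weights.get(s, 1.0) for s in queries_by_strategy)
    ranked = []
    for doc_id, per_strategy in best.items():
        combined = sum(weights.get(s, 1.0) * v for s, v in per_strategy.items())
        ranked.append({
            "document_id": doc_id,
            "similarity": round(combined / total_weight, 4),
            "strategies": sorted(per_strategy),
        })

    ranked.sort(key=lambda r: (-r["similarity"], r["document_id"]))
    return ranked[:top_k]


def fake_search(query: str) -> List[Tuple[str, float]]:
    """Stand-in for the real search service, used only for this example."""
    index = {
        "lease termination": [("doc-a", 0.9), ("doc-b", 0.4)],
        "tenant rent": [("doc-a", 0.5), ("doc-c", 0.6)],
    }
    return index.get(query, [])


results = find_cross_references(
    {"full_text": ["lease termination"], "key_terms": ["tenant rent"]},
    fake_search,
)
```

Injecting the search function also makes the service easy to unit test without a database, as the `fake_search` stand-in shows.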

### New Frontend Components

```
frontend/
├── js/
│   ├── cross-reference.js    # Cross-reference tab logic
│   └── file-upload.js        # File upload utilities
└── css/
    └── cross-reference.css   # Specific styling
```

## API Endpoints Summary

1. **`POST /upload-document`** - New endpoint needed
2. **`POST /find-cross-references`** - New endpoint needed
3. **`GET /documents`** - Use existing
4. **`GET /documents/{id}`** - Use existing

## Development Priority

1. Backend document upload and processing
2. Cross-reference matching logic
3. Frontend tab and upload interface
4. Results display and formatting
5. Error handling and validation

## Benefits of This Approach

- Leverages existing search infrastructure
- Maintains separation of concerns
- Maintainable: new logic lives in two dedicated modules (`document_processor.py`, `cross_reference_service.py`)
- Consistent with current API patterns
- No database changes needed