ammarhamzi2019278344 cb2a56f70d env file

2025-06-04 15:04:53 +08:00


Cross-Reference Tab Implementation Plan

Overview

Create a standalone tab that allows users to upload a document, process it, and find related documents in the existing database.

Current System Analysis

Backend (FastAPI)

✅ /search endpoint exists - can find related documents
✅ /documents endpoint exists - can retrieve documents
❌ No document upload endpoint
❌ No document processing for uploaded files

Frontend

✅ Tab system exists
✅ Basic cross-reference function exists (hardcoded)
❌ No file upload functionality
❌ No dedicated cross-reference tab

Implementation Plan

Phase 1: Backend API Extensions

New Endpoints Needed

POST /upload-document
- Accept file upload (PDF, DOC, DOCX, TXT)
- Extract text content from uploaded file
- Return processed text and document metadata
- No database storage - temporary processing only
POST /find-cross-references
- Accept processed document text
- Use existing search functionality internally
- Return related documents with similarity scores
- Include cross-reference analysis
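
A minimal sketch of the validation and response shape that POST /upload-document could use. The extension whitelist mirrors the upload formats named in this plan; the 10 MiB limit, the names `validate_upload` and `build_upload_response`, and the payload fields are illustrative assumptions, not part of the existing codebase. Wiring this into a FastAPI route is left out.

```python
from dataclasses import dataclass, asdict
from pathlib import PurePath

# Assumptions for illustration: extension whitelist and a 10 MiB size cap.
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


class UploadRejected(ValueError):
    """Raised when an upload fails validation."""


@dataclass
class UploadResponse:
    filename: str
    file_type: str
    size_bytes: int
    text: str
    word_count: int


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the lowercase extension, or raise UploadRejected."""
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise UploadRejected(f"unsupported file type: {extension or '(none)'}")
    if size_bytes == 0:
        raise UploadRejected("file is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadRejected("file exceeds the size limit")
    return extension


def build_upload_response(filename: str, raw: bytes, text: str) -> dict:
    """Build the JSON-ready payload; nothing is written to the database."""
    extension = validate_upload(filename, len(raw))
    return asdict(UploadResponse(
        filename=filename,
        file_type=extension.lstrip("."),
        size_bytes=len(raw),
        text=text,
        word_count=len(text.split()),
    ))
```

Rejecting bad uploads before any text extraction runs keeps the temporary-processing path cheap and gives the frontend a single error type to report.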

Leverage Existing APIs

Use existing /search endpoint logic for finding related documents
Use existing /documents endpoint to fetch full related documents
Use existing database connection and document retrieval functions
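
One way the reuse described above might look, as a sketch only. The plan does not show the signature of the existing search logic, so it is passed in here as `search_fn`, a hypothetical callable that takes a query string and a limit and returns dictionaries with `id` and `score` keys. The fixed-size slicing of the document into queries is also an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical signature: the real search function is not shown in this plan,
# so it is injected as a callable (query, limit) -> list of {"id", "score", ...}.
SearchFn = Callable[[str, int], List[dict]]


def find_cross_references(text: str, search_fn: SearchFn,
                          limit: int = 10, max_query_chars: int = 500) -> List[dict]:
    """Query the existing search logic with slices of the document and merge hits."""
    # The search logic expects a short query, so cut the document into
    # naive fixed-size slices (a real version would split on sentences).
    queries = [text[i:i + max_query_chars]
               for i in range(0, len(text), max_query_chars)]
    best: Dict[str, dict] = {}
    for query in queries:
        if not query.strip():
            continue
        for hit in search_fn(query, limit):
            doc_id = hit["id"]
            # Keep the highest score seen for each document.
            if doc_id not in best or hit["score"] > best[doc_id]["score"]:
                best[doc_id] = hit
    ranked = sorted(best.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:limit]
```

Injecting the search function keeps this logic independent of whichever search service is active and makes it easy to test with a stub.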

Phase 2: Frontend Implementation

New Tab Structure

Upload Section
- File drop zone
- File type validation (PDF, DOC, DOCX, TXT)
- Upload progress indicator
- File preview/summary
Processing Section
- Processing status indicator
- Document analysis summary
- Key terms extraction display
Results Section
- Related documents list
- Similarity scores
- Cross-reference details
- Document preview capability

UI Components Needed

File upload widget
Progress bars
Results grid/list
Document preview modal
Cross-reference visualization

Phase 3: Processing Logic

Document Processing Pipeline

File Upload & Validation
- Validate file type and size
- Extract text content using appropriate libraries
- Clean and normalize text
Content Analysis
- Extract key terms and phrases
- Identify legal concepts
- Generate search queries from content
Cross-Reference Matching
- Use existing search service (enhanced_rag_service or simple_search_service)
- Multiple search strategies:
  - Full text similarity
  - Key terms matching
  - Legal concept matching
- Rank results by relevance
Results Processing
- Format cross-reference results
- Include similarity metrics
- Group by document type or relevance
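
The pipeline steps above can be sketched roughly as follows. This is an illustration under stated assumptions: key terms are picked by plain word frequency, the stop-word list is deliberately tiny, and the strategy names and weights passed to `combine_strategies` are made up for the example. Legal concept identification is not attempted here.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# A tiny stop-word list for illustration; a real deployment would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
              "on", "by", "with", "that", "this", "be", "as", "are", "shall"}


def normalize_text(raw: str) -> str:
    """Collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_key_terms(text: str, top_n: int = 5) -> List[str]:
    """Return the most frequent non-stop-word tokens (a simple frequency heuristic)."""
    tokens = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_n)]


def combine_strategies(results: Dict[str, Dict[str, float]],
                       weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Merge per-strategy scores into one weighted ranking.

    `results` maps strategy name -> {document id -> score in [0, 1]}.
    Strategies without a weight contribute nothing.
    """
    combined: Dict[str, float] = {}
    for strategy, scores in results.items():
        weight = weights.get(strategy, 0.0)
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

A weighted sum is only one way to rank across strategies; it has the advantage that a document found by several strategies naturally rises above one found by a single strategy.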

Technical Approach

Backend Dependencies

# New libraries needed
- python-multipart  # For file uploads
- pypdf or pdfplumber  # PDF text extraction (pypdf is the maintained successor to PyPDF2)
- python-docx  # Word document processing (.docx only; legacy .doc needs a separate converter)
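
A hedged sketch of how text extraction could dispatch on file type with these libraries. It assumes pypdf (the maintained successor to PyPDF2) for PDFs and python-docx for .docx files; the function name `extract_text` is hypothetical. The third-party imports are done lazily, so the plain-text path works even when those packages are not installed.

```python
import io
from pathlib import PurePath


def extract_text(filename: str, data: bytes) -> str:
    """Dispatch on file extension and return the document's plain text."""
    extension = PurePath(filename).suffix.lower()
    if extension == ".txt":
        # Assumes UTF-8; undecodable bytes are replaced rather than raising.
        return data.decode("utf-8", errors="replace")
    if extension == ".pdf":
        from pypdf import PdfReader  # imported lazily so .txt works without it
        reader = PdfReader(io.BytesIO(data))
        # extract_text() can return an empty result for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if extension == ".docx":
        import docx  # provided by the python-docx package
        document = docx.Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    # Legacy .doc is not handled here: python-docx reads only .docx files.
    raise ValueError(f"no text extractor for {extension or 'files without an extension'}")
```

Working on in-memory bytes rather than a saved file fits the "temporary processing only" requirement, since nothing has to be written to disk.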

API Strategy

Recommendation: Create new endpoints because:

Current /search expects a text query, not document content
Need specialized document processing logic
Need different response format for cross-references
Upload functionality is entirely new
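
To make the "different response format" point concrete, here is one possible shape for a cross-reference response. Every field name (`document_id`, `similarity`, `matched_terms`, and so on) is an assumption chosen for illustration; the plan does not define the actual schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class CrossReferenceMatch:
    document_id: str
    title: str
    similarity: float  # assumed to be in [0, 1]
    matched_terms: List[str] = field(default_factory=list)


@dataclass
class CrossReferenceResponse:
    source_filename: str
    total_matches: int
    matches: List[CrossReferenceMatch]


def build_response(source_filename: str, matches: List[CrossReferenceMatch]) -> str:
    """Serialize matches, highest similarity first, as a JSON string."""
    ordered = sorted(matches, key=lambda m: m.similarity, reverse=True)
    response = CrossReferenceResponse(source_filename, len(ordered), ordered)
    return json.dumps(asdict(response))
```

Unlike a plain search result, this envelope carries the uploaded file's name and per-match detail, which is the information the Results Section of the tab needs.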

Frontend Strategy

Add new tab to existing tab system
Use existing styling and components where possible
Implement file upload using HTML5 File API
Use existing API calling patterns

File Structure

New Backend Files

embedding/
├── document_processor.py     # Handle file uploads and text extraction
└── cross_reference_service.py  # Cross-reference logic

New Frontend Components

frontend/
├── js/
│   ├── cross-reference.js    # Cross-reference tab logic
│   └── file-upload.js        # File upload utilities
└── css/
    └── cross-reference.css   # Specific styling

API Endpoints Summary

POST /upload-document - New endpoint needed
POST /find-cross-references - New endpoint needed
GET /documents - Use existing
GET /documents/{id} - Use existing

Development Priority

Backend document upload and processing
Cross-reference matching logic
Frontend tab and upload interface
Results display and formatting
Error handling and validation

Benefits of This Approach

Leverages existing search infrastructure
Maintains separation of concerns
Scalable and maintainable
Consistent with current API patterns
No database changes needed
