# Cross-Reference Tab Implementation Plan

## Overview

Create a standalone tab that lets users upload a document, process it, and find related documents in the existing database.

## Current System Analysis

### Backend (FastAPI)

- ✅ `/search` endpoint exists - can find related documents
- ✅ `/documents` endpoint exists - can retrieve documents
- ❌ No document upload endpoint
- ❌ No document processing for uploaded files

### Frontend

- ✅ Tab system exists
- ✅ Basic cross-reference function exists (hardcoded)
- ❌ No file upload functionality
- ❌ No dedicated cross-reference tab

## Implementation Plan

### Phase 1: Backend API Extensions

#### New Endpoints Needed

1. **`POST /upload-document`**

   - Accept file upload (PDF, DOC, DOCX, TXT)
   - Extract text content from uploaded file
   - Return processed text and document metadata
   - **No database storage** - temporary processing only

2. **`POST /find-cross-references`**

   - Accept processed document text
   - Use existing search functionality internally
   - Return related documents with similarity scores
   - Include cross-reference analysis

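The upload contract above can be sketched with standard-library types before any FastAPI wiring is written. This is a minimal sketch under stated assumptions: `UploadResult`, `validate_upload`, `build_upload_result`, and the 10 MiB size cap are hypothetical names and values chosen for illustration, not part of the existing codebase.

```python
from dataclasses import dataclass, asdict
from pathlib import PurePath

# Extensions come from the plan; the size cap is an assumed placeholder.
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB limit


@dataclass
class UploadResult:
    """Hypothetical response body for POST /upload-document."""

    filename: str
    file_type: str
    size_bytes: int
    text: str  # extracted text, returned to the client and never stored


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the normalized extension, or raise ValueError if rejected."""
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or 'none'}")
    if not 0 < size_bytes <= MAX_UPLOAD_BYTES:
        raise ValueError(f"File size {size_bytes} is outside the allowed range")
    return extension


def build_upload_result(filename: str, raw: bytes, text: str) -> dict:
    """Validate the upload and assemble a JSON-serializable response."""
    extension = validate_upload(filename, len(raw))
    result = UploadResult(filename, extension.lstrip("."), len(raw), text)
    return asdict(result)
```

Because nothing is persisted, the response carries the extracted text itself, and the client would send that text back in the body of `POST /find-cross-references`.
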
#### Leverage Existing APIs

- Use existing `/search` endpoint logic for finding related documents
- Use existing `/documents` endpoint to fetch full related documents
- Use existing database connection and document retrieval functions

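One way to reuse the search logic without an extra HTTP round trip is to pass the existing search function into the new service. The sketch below assumes that function takes `(query, limit)` and returns hits as dicts with `id` and `score` keys; that signature, the `find_cross_references` name, and the 2000-character query cap are all assumptions to adjust against the real service.

```python
from typing import Callable, Iterable

# Assumed shape of the existing search logic: (query, limit) -> hits, where
# each hit is a dict with at least "id" and "score".
SearchFn = Callable[[str, int], Iterable[dict]]

MAX_QUERY_CHARS = 2000  # hypothetical cap, since /search expects a text query


def find_cross_references(text: str, search: SearchFn, limit: int = 10) -> list[dict]:
    """Query the existing search logic with the uploaded document's text."""
    # Collapse whitespace and truncate so the document fits a query-sized input.
    query = " ".join(text.split())[:MAX_QUERY_CHARS]
    if not query:
        return []
    hits = search(query, limit)
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    return [
        {"document_id": hit["id"], "similarity": round(hit["score"], 3)}
        for hit in ranked[:limit]
    ]
```

Injecting the search function keeps the new service testable with a stub and leaves the choice of search backend to the caller.
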
### Phase 2: Frontend Implementation

#### New Tab Structure

1. **Upload Section**

   - File drop zone
   - File type validation (PDF, DOC, DOCX, TXT)
   - Upload progress indicator
   - File preview/summary

2. **Processing Section**

   - Processing status indicator
   - Document analysis summary
   - Key terms extraction display

3. **Results Section**

   - Related documents list
   - Similarity scores
   - Cross-reference details
   - Document preview capability

#### UI Components Needed

- File upload widget
- Progress bars
- Results grid/list
- Document preview modal
- Cross-reference visualization

### Phase 3: Processing Logic

#### Document Processing Pipeline

1. **File Upload & Validation**

   - Validate file type and size
   - Extract text content using appropriate libraries
   - Clean and normalize text

2. **Content Analysis**

   - Extract key terms and phrases
   - Identify legal concepts
   - Generate search queries from content

3. **Cross-Reference Matching**

   - Use existing search service (enhanced_rag_service or simple_search_service)
   - Multiple search strategies:
     - Full text similarity
     - Key terms matching
     - Legal concept matching
   - Rank results by relevance

4. **Results Processing**

   - Format cross-reference results
   - Include similarity metrics
   - Group by document type or relevance

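Steps 2 through 4 of the pipeline can be sketched as a frequency-based key-term extractor plus a merge over several search strategies. This is an illustrative sketch, not the planned implementation: the stop-word list is a tiny placeholder, the search callable's `(query, limit)` signature and `id`/`score` hit keys are assumptions, and legal concept matching is left out because the plan does not yet say how concepts are identified.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# A tiny illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"the", "and", "for", "that", "with", "this", "shall", "from", "are", "was"}

SearchFn = Callable[[str, int], Iterable[dict]]  # assumed: hits with "id" and "score"


def extract_key_terms(text: str, top_n: int = 5) -> list[str]:
    """Return the most frequent non-stop-word terms of three or more letters."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_n)]


def cross_reference(text: str, search: SearchFn, limit: int = 10) -> list[dict]:
    """Run full-text and key-term strategies, then merge and rank the hits."""
    queries = {
        "full_text": " ".join(text.split())[:2000],
        "key_terms": " ".join(extract_key_terms(text)),
    }
    best: dict[str, dict] = {}
    for strategy, query in queries.items():
        if not query:
            continue
        for hit in search(query, limit):
            current = best.get(hit["id"])
            # Keep the highest score seen for each document across strategies.
            if current is None or hit["score"] > current["similarity"]:
                best[hit["id"]] = {
                    "document_id": hit["id"],
                    "similarity": hit["score"],
                    "matched_by": strategy,
                }
    return sorted(best.values(), key=lambda r: r["similarity"], reverse=True)[:limit]
```

Taking the maximum score per document is only one merging rule; a weighted sum across strategies would reward documents that match in several ways, at the cost of needing comparable score scales.
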
## Technical Approach

### Backend Dependencies

```text
# New libraries needed
python-multipart       # For file uploads
PyPDF2 or pdfplumber   # PDF text extraction
python-docx            # Word (.docx) document processing
# Note: python-docx reads .docx only; legacy .doc files need a separate converter
```

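As a rough sketch of how these libraries might be wired together, the functions below dispatch on the file extension and then normalize whitespace. It assumes `pypdf` (the maintained successor to PyPDF2; with PyPDF2 3.x the import would be `from PyPDF2 import PdfReader`) and `python-docx`. The function names `extract_text` and `normalize_text` are hypothetical, and legacy `.doc` input is rejected here because neither library reads it.

```python
import io
import re


def extract_text(filename: str, data: bytes) -> str:
    """Extract raw text from an uploaded file based on its extension."""
    name = filename.lower()
    if name.endswith(".txt"):
        return data.decode("utf-8", errors="replace")
    if name.endswith(".pdf"):
        from pypdf import PdfReader  # third-party, imported lazily

        reader = PdfReader(io.BytesIO(data))
        # extract_text() can yield an empty string for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if name.endswith(".docx"):
        from docx import Document  # provided by the python-docx package

        document = Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    raise ValueError(f"No text extractor configured for {filename!r}")


def normalize_text(text: str) -> str:
    """Unify line endings, collapse spaces and tabs, and trim blank-line runs."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

The lazy imports keep plain-text uploads working even when the PDF or Word libraries are not installed, which can simplify local testing.
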
### API Strategy

**Recommendation: Create new endpoints** because:

- Current `/search` expects a text query, not document content
- Need specialized document processing logic
- Need different response format for cross-references
- Upload functionality is entirely new

### Frontend Strategy

- Add new tab to existing tab system
- Use existing styling and components where possible
- Implement file upload using HTML5 File API
- Use existing API calling patterns

## File Structure

### New Backend Files

```
embedding/
├── document_processor.py        # Handle file uploads and text extraction
└── cross_reference_service.py   # Cross-reference logic
```

### New Frontend Components

```
frontend/
├── js/
│   ├── cross-reference.js     # Cross-reference tab logic
│   └── file-upload.js         # File upload utilities
└── css/
    └── cross-reference.css    # Specific styling
```

### API Endpoints Summary

1. **`POST /upload-document`** - New endpoint needed
2. **`POST /find-cross-references`** - New endpoint needed
3. **`GET /documents`** - Use existing
4. **`GET /documents/{id}`** - Use existing

## Development Priority

1. Backend document upload and processing
2. Cross-reference matching logic
3. Frontend tab and upload interface
4. Results display and formatting
5. Error handling and validation

## Benefits of This Approach

- Leverages existing search infrastructure
- Maintains separation of concerns
- Scalable and maintainable
- Consistent with current API patterns
- No database changes needed