ammarhamzi2019278344 2025-06-04 15:04:53 +08:00
parent b84bdf17f0
commit cb2a56f70d
3 changed files with 199 additions and 1 deletions

cf-plan.md Normal file

@@ -0,0 +1,178 @@
# Cross-Reference Tab Implementation Plan
## Overview
Create a standalone tab that allows users to upload a document, process it, and find related documents in the existing database.
## Current System Analysis
### Backend (FastAPI)
- ✅ `/search` endpoint exists - can find related documents
- ✅ `/documents` endpoint exists - can retrieve documents
- ❌ No document upload endpoint
- ❌ No document processing for uploaded files
### Frontend
- ✅ Tab system exists
- ✅ Basic cross-reference function exists (hardcoded)
- ❌ No file upload functionality
- ❌ No dedicated cross-reference tab
## Implementation Plan
### Phase 1: Backend API Extensions
#### New Endpoints Needed
1. **`POST /upload-document`**
- Accept file upload (PDF, DOC, DOCX, TXT)
- Extract text content from uploaded file
- Return processed text and document metadata
- **No database storage** - temporary processing only
2. **`POST /find-cross-references`**
- Accept processed document text
- Use existing search functionality internally
- Return related documents with similarity scores
- Include cross-reference analysis
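
The two endpoint contracts above could be sketched as plain functions that a route handler would wrap. This is a minimal sketch, not the project's implementation: `UploadResult`, `CrossReference`, `SearchFn`, and the hit keys `id`, `title`, and `score` are all hypothetical names, and the existing `/search` logic is represented by an injected callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical payload shapes; the plan does not define them.
@dataclass
class UploadResult:
    filename: str
    text: str
    char_count: int

@dataclass
class CrossReference:
    document_id: str
    title: str
    similarity: float

# Stand-in for the existing /search logic: (query, limit) -> list of hit dicts.
SearchFn = Callable[[str, int], List[Dict]]

def process_upload(filename: str, text: str) -> UploadResult:
    """Build the /upload-document response; nothing is written to the database."""
    cleaned = text.strip()
    return UploadResult(filename=filename, text=cleaned, char_count=len(cleaned))

def find_cross_references(text: str, search: SearchFn, limit: int = 5) -> List[CrossReference]:
    """Build the /find-cross-references response by delegating to the search callable."""
    hits = search(text, limit)
    # Assumed hit keys: "id", "title", "score".
    refs = [CrossReference(str(h["id"]), h["title"], float(h["score"])) for h in hits]
    # Highest similarity first, capped at the requested limit.
    return sorted(refs, key=lambda r: r.similarity, reverse=True)[:limit]
```

Passing the search function in as an argument keeps the cross-reference logic independent of whichever search service ends up being used.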
#### Leverage Existing APIs
- Use existing `/search` endpoint logic for finding related documents
- Use existing `/documents` endpoint to fetch full related documents
- Use existing database connection and document retrieval functions
### Phase 2: Frontend Implementation
#### New Tab Structure
1. **Upload Section**
- File drop zone
- File type validation (PDF, DOC, DOCX, TXT)
- Upload progress indicator
- File preview/summary
2. **Processing Section**
- Processing status indicator
- Document analysis summary
- Key terms extraction display
3. **Results Section**
- Related documents list
- Similarity scores
- Cross-reference details
- Document preview capability
#### UI Components Needed
- File upload widget
- Progress bars
- Results grid/list
- Document preview modal
- Cross-reference visualization
### Phase 3: Processing Logic
#### Document Processing Pipeline
1. **File Upload & Validation**
- Validate file type and size
- Extract text content using appropriate libraries
- Clean and normalize text
2. **Content Analysis**
- Extract key terms and phrases
- Identify legal concepts
- Generate search queries from content
3. **Cross-Reference Matching**
- Use existing search service (`enhanced_rag_service` or `simple_search_service`)
- Multiple search strategies:
  - Full text similarity
  - Key terms matching
  - Legal concept matching
- Rank results by relevance
4. **Results Processing**
- Format cross-reference results
- Include similarity metrics
- Group by document type or relevance
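
The validation, normalization, key-term, and ranking steps above might look roughly like the following. This is a sketch under stated assumptions: the size limit, the stopword list, and every function name are illustrative, and the term extraction is simple word counting rather than any legal-concept model.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt"}  # from the plan's upload section
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed limit; the plan does not set one
STOPWORDS = {"the", "and", "for", "that", "with", "this", "shall", "any"}  # illustrative only

def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0 or size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file is empty or exceeds the size limit")

def normalize_text(raw: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()

def extract_key_terms(text: str, top_n: int = 10) -> list[str]:
    """Return the most frequent non-stopword terms of three or more letters."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]

def merge_ranked(*result_lists: list[tuple[str, float]], limit: int = 5) -> list[tuple[str, float]]:
    """Merge (doc_id, score) lists from several strategies, keeping each document's best score."""
    best: dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in results:
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:limit]
```

Keeping the best score per document is only one way to combine strategies; a weighted sum would be an equally plausible choice if some strategies prove more reliable than others.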
## Technical Approach
### Backend Dependencies
```text
# New libraries needed (requirements.txt style)
python-multipart  # For file uploads
pdfplumber  # PDF text extraction (PyPDF2 is an alternative)
python-docx  # Word document processing (.docx only; legacy .doc needs a separate converter)
```
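
As a rough illustration of how these libraries could be wired together, the sketch below dispatches on file extension. The function name `extract_text` is hypothetical, the third-party imports are done lazily so plain-text handling works without them, and legacy `.doc` files are deliberately rejected because `python-docx` does not read them.

```python
import io
from pathlib import Path

def extract_text(filename: str, data: bytes) -> str:
    """Return the text content of an uploaded file, chosen by its extension."""
    ext = Path(filename).suffix.lower()
    if ext == ".txt":
        # Replace undecodable bytes instead of failing the whole upload.
        return data.decode("utf-8", errors="replace")
    if ext == ".pdf":
        import pdfplumber  # third-party; imported only when a PDF arrives
        with pdfplumber.open(io.BytesIO(data)) as pdf:
            # Pages without a text layer may yield nothing, hence the `or ""`.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if ext == ".docx":
        import docx  # provided by the python-docx package
        document = docx.Document(io.BytesIO(data))
        return "\n".join(p.text for p in document.paragraphs)
    raise ValueError(f"no text extractor for {ext or 'files without an extension'}")
```

Scanned PDFs without a text layer would come back empty from this sketch; handling them would need OCR, which the plan does not mention.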
### API Strategy
**Recommendation: Create new endpoints** because:
- Current `/search` expects a text query, not document content
- Need specialized document processing logic
- Need different response format for cross-references
- Upload functionality is entirely new
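
Because `/search` takes a short text query rather than a whole document, the new endpoint needs some way to turn document content into queries. One possible approach, shown here purely as an assumption, is to split the text into fixed-size word windows; `build_queries` and its defaults are hypothetical.

```python
from typing import List

def build_queries(document_text: str, max_words: int = 40, max_queries: int = 5) -> List[str]:
    """Split document text into consecutive word windows usable as search queries."""
    words = document_text.split()
    queries: List[str] = []
    for start in range(0, len(words), max_words):
        if len(queries) == max_queries:
            break  # cap the number of searches issued per upload
        queries.append(" ".join(words[start:start + max_words]))
    return queries
```

For example, `build_queries("a b c d e", max_words=2, max_queries=2)` returns `["a b", "c d"]`. Each query would then be passed to the existing search logic and the results merged.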
### Frontend Strategy
- Add new tab to existing tab system
- Use existing styling and components where possible
- Implement file upload using HTML5 File API
- Use existing API calling patterns
## File Structure
### New Backend Files
```
embedding/
├── document_processor.py # Handle file uploads and text extraction
└── cross_reference_service.py # Cross-reference logic
```
### New Frontend Components
```
frontend/
├── js/
│ ├── cross-reference.js # Cross-reference tab logic
│ └── file-upload.js # File upload utilities
└── css/
  └── cross-reference.css # Specific styling
```
## API Endpoints Summary
1. **`POST /upload-document`** - New endpoint needed
2. **`POST /find-cross-references`** - New endpoint needed
3. **`GET /documents`** - Use existing
4. **`GET /documents/{id}`** - Use existing
## Development Priority
1. Backend document upload and processing
2. Cross-reference matching logic
3. Frontend tab and upload interface
4. Results display and formatting
5. Error handling and validation
## Benefits of This Approach
- Leverages existing search infrastructure
- Maintains separation of concerns
- Scalable and maintainable
- Consistent with current API patterns
- No database changes needed

demoEnv.txt Normal file

@@ -0,0 +1,20 @@
# MySQL Database Configuration
MYSQL_HOST=47.130.80.140
MYSQL_PORT=3333
MYSQL_USER=root
MYSQL_PASSWORD=<redacted>
MYSQL_DATABASE=agc
# App Configuration
DEBUG=True
STREAMLIT_SERVER_PORT=8501
# OpenAI API Configuration
OPENAI_API_KEY=<redacted>
OPENAI_CHAT_MODEL=gpt-4o
# Application Settings
MAX_SEARCH_RESULTS=5
SIMILARITY_THRESHOLD=0.7


@@ -2267,7 +2267,7 @@
<div class="footer-bottom">
<p>
-&copy; 2023 Attorney General's Chambers of Malaysia. All Rights
+&copy; 2025 AGC Malaysia. All Rights
Reserved.
</p>
</div>