# AGC Document Search System

A comprehensive AI-powered document search and chat system for the Attorney General's Chambers (AGC) of Malaysia. This system provides intelligent document discovery, natural language search capabilities, and an AI assistant for legal research and analysis.

## 🏗️ Project Overview

The AGC Document Search System is designed for Malaysian government legal staff, prosecutors, and legal researchers to efficiently search, browse, and interact with legal documents including:

- **LKK (Laporan Keputusan Kes)** - Case Decision Reports
- **Legal Cases** - Various legal precedents and case law
- **Criminal Cases** - Criminal law documents and decisions
- **Government Legal Documents** - Various AGC documents and legal resources

## 🚀 Key Features

### 🔍 Advanced Search Capabilities

- **AI-Enhanced Search**: OpenAI-powered semantic search with query enhancement
- **Natural Language Processing**: Ask questions in plain language
- **Relevancy Scoring**: Documents ranked by similarity and relevance
- **Query Enhancement**: AI automatically improves search queries for better results
- **Search History**: Track and revisit previous searches

### 📚 Document Management

- **Smart Document Browser**: Grid and list view options with advanced filtering
- **Document Types Filtering**: Filter by Legal, Criminal, LKK, and other types
- **Metadata Display**: Comprehensive document information and classification
- **Document Viewer**: Rich document display with structured content presentation
- **Bookmark System**: Save and organize important documents

### 🤖 AI Assistant

- **Legal Chat Interface**: Interactive AI assistant for legal queries
- **Document Analysis**: AI-powered document summaries and key point extraction
- **Legal Concept Explanations**: Get definitions and explanations of legal terms
- **Cross-Reference Analysis**: Find related documents and cases
- **Precedent Search**: Locate relevant legal precedents

### 🎨 Modern User Interface

- **Responsive Design**: Optimized for desktop, tablet, and mobile devices
- **Government-Appropriate Styling**: Professional navy and gold color scheme
- **Accessibility Compliant**: WCAG 2.1 AA standards
- **Smooth Animations**: Modern micro-interactions and transitions
- **Inter Font Family**: Modern, readable typography

## 🏛️ System Architecture

```
AGC Document Search System
├── Frontend (Web Interface)
│   ├── HTML5 with modern CSS3
│   ├── Vanilla JavaScript (ES6+)
│   └── Responsive design with animations
├── Backend API (FastAPI)
│   ├── RESTful API endpoints
│   ├── Document retrieval and search
│   └── AI integration layer
├── Database (MySQL)
│   ├── Document storage and metadata
│   ├── Search history and logs
│   └── Vector embeddings for AI search
├── AI/ML Components
│   ├── OpenAI integration for enhanced search
│   ├── Text embeddings and similarity matching
│   └── Fallback simple search service
└── Additional Interfaces
    ├── Streamlit dashboard (app.py)
    └── Direct API testing tools
```

## 📁 Project Structure

```
agc-chatbot/
├── 📄 Frontend
│   ├── index.html                   # Main web interface
│   ├── js/
│   │   ├── main.js                  # Core application logic
│   │   └── api.js                   # API service layer
│   ├── css/                         # Styling (embedded in HTML)
│   └── assets/                      # Images, fonts, static files
├── 🔧 Backend API
│   ├── api.py                       # FastAPI REST API
│   ├── serve.py                     # Alternative server implementation
│   └── config.py                    # Configuration settings
├── 🗄️ Database
│   └── db/
│       ├── db_utils.py              # Database utilities and connections
│       ├── schema.sql               # Database schema definition
│       └── import_lkk_data.py       # Data import utilities
├── 🤖 AI/ML Components
│   ├── embedding/
│   │   ├── enhanced_rag_service.py  # OpenAI-powered search
│   │   └── simple_search_service.py # Fallback keyword search
│   └── utils/
│       └── text_processing.py       # Text utilities
├── 📊 Streamlit Interface
│   └── app.py                       # Alternative Streamlit dashboard
├── 📁 Data
│   └── [Various legal document folders organized by type]
├── 📋 Documentation
│   ├── plan.md                      # Comprehensive project planning
│   └── templates/                   # Template files and examples
└── ⚙️ Configuration
    ├── requirements.txt             # Python dependencies
    ├── .gitignore                   # Git ignore rules
    └── test_api.py                  # API testing utilities
```

## 🛠️ Installation & Setup

### Prerequisites

- **Python 3.8+**
- **MySQL Database**
- **OpenAI API Key** (optional, for enhanced AI features)
- **Modern Web Browser**

### 1. Clone the Repository

```bash
git clone https://github.com/your-repo/agc-chatbot.git
cd agc-chatbot
```

### 2. Install Python Dependencies

```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate      # Linux/Mac
# venv\Scripts\activate       # Windows (use instead of the line above)

# Install dependencies
pip install -r requirements.txt
```

### 3. Database Setup

```bash
# Create MySQL database
mysql -u root -p < db/schema.sql

# Import sample data (optional)
python db/import_lkk_data.py
```

### 4. Environment Configuration

Create a `.env` file in the root directory:

```env
# Database Configuration
MYSQL_HOST=localhost
MYSQL_USER=root
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=agc
MYSQL_PORT=3306

# OpenAI Configuration (optional)
OPENAI_API_KEY=your_openai_api_key
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
OPENAI_CHAT_MODEL=gpt-3.5-turbo

# Application Settings
MAX_SEARCH_RESULTS=10
SIMILARITY_THRESHOLD=0.7
```

### 5. Start the Backend API

```bash
# Method 1: Using uvicorn to serve the FastAPI app directly
uvicorn api:app --reload --host 0.0.0.0 --port 8000

# Method 2: Using the custom server
python serve.py

# Method 3: Using the Streamlit interface
streamlit run app.py
```

### 6. Access the Frontend

- **Web Interface**: Open `frontend/index.html` in a web browser
- **API Documentation**: Visit `http://localhost:8000/docs`
- **Streamlit Dashboard**: Visit `http://localhost:8501` (if using Streamlit)

## 🔌 API Endpoints

### Document Management

- `GET /documents` - List all documents with optional filtering
- `GET /documents/{id}` - Get specific document by ID
- `GET /document-types` - Get available document types

### Search

- `POST /search` - Perform AI-enhanced or simple search
- `GET /ping` - Health check endpoint

### Example API Usage

```javascript
// Search for documents
const searchResponse = await fetch("http://localhost:8000/search", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    query: "cross-border financial fraud investigations",
    profile_search: false,
  }),
});
// fetch() resolves to a Response; parse the JSON body to get the results
const searchResults = await searchResponse.json();

// Get document by ID (named `doc` to avoid clashing with the browser's global `document`)
const docResponse = await fetch("http://localhost:8000/documents/1");
const doc = await docResponse.json();
```

## 🎯 Usage Guide

### Web Interface

1. **Browse Documents**: Use the "Browse Documents" tab to explore all available documents with filtering options
2. **Search**: Use the "Search" tab for AI-powered document search with natural language queries
3. **AI Assistant**: Use the "AI Assistant" tab to chat with the AI about legal concepts and documents

### Search Features

- **Natural Language**: "Find cases about money laundering in Malaysia"
- **Specific Terms**: "AMLA 2001 cross-border investigations"
- **Question Format**: "What is the procedure for international evidence collection?"
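
The three query styles listed under Search Features can also be sent straight to the `POST /search` endpoint from a script. The following is a minimal sketch using only the Python standard library. The `query` and `profile_search` fields come from the JavaScript example under Example API Usage; the helper names `build_search_request` and `run_search`, the `EXAMPLE_QUERIES` list, and the assumption that the endpoint answers with a JSON body are illustrative and not part of the project code.

```python
import json
import urllib.request

# Assumes the backend from step 5 is running locally on port 8000.
API_BASE_URL = "http://localhost:8000"

# The sample queries from the Search Features list.
EXAMPLE_QUERIES = [
    "Find cases about money laundering in Malaysia",
    "AMLA 2001 cross-border investigations",
    "What is the procedure for international evidence collection?",
]


def build_search_request(query, profile_search=False, base_url=API_BASE_URL):
    """Build (but do not send) a POST /search request for one query string."""
    payload = {"query": query, "profile_search": profile_search}
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(query, timeout=30):
    """Send the query and return the decoded JSON body.

    The shape of the returned data is not documented here, so callers
    should inspect it rather than rely on specific keys.
    """
    request = build_search_request(query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `run_search(EXAMPLE_QUERIES[0])` sends the first sample query. Keeping request construction separate from the network call makes it possible to check the payload without a live server.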
### AI Assistant Tools

- **Analyze Document**: Get detailed analysis of legal documents
- **Summarize**: Create concise summaries of lengthy documents
- **Legal Concepts**: Explain complex legal terminology
- **Cross Reference**: Find related documents and precedents
- **Case Analysis**: Comprehensive case law analysis

## 🔧 Technical Details

### Frontend Technologies

- **HTML5** with semantic markup
- **CSS3** with modern features (Grid, Flexbox, Custom Properties)
- **Vanilla JavaScript** (ES6+) with modular architecture
- **Font Awesome** icons
- **Inter** font family for modern typography

### Backend Technologies

- **FastAPI** - Modern Python web framework
- **Pydantic** - Data validation and serialization
- **MySQL** - Relational database for document storage
- **OpenAI API** - For enhanced search and AI chat features
- **LangChain** - For AI/ML pipeline management

### AI/ML Features

- **Text Embeddings** - Vector representations for semantic search
- **Similarity Matching** - Cosine similarity for relevance scoring
- **Query Enhancement** - AI-powered query improvement
- **RAG (Retrieval-Augmented Generation)** - Context-aware AI responses

## 🧪 Testing

```bash
# Test API endpoints
python test_api.py

# Manual testing using the web interface
# Open frontend/index.html in a browser

# API documentation and testing
# Visit http://localhost:8000/docs
```

## 🚀 Deployment

### Production Deployment

1. **Environment Setup**:
   - Configure production database
   - Set up environment variables
   - Configure CORS settings for production domains

2. **Backend Deployment**:

   ```bash
   # Using gunicorn for production
   gunicorn -w 4 -k uvicorn.workers.UvicornWorker api:app
   ```

3. **Frontend Deployment**:
   - Serve static files through a web server (nginx, Apache)
   - Update API base URL in `frontend/js/api.js`

4. **Database**:
   - Set up production MySQL instance
   - Run migrations and import data
   - Configure backup strategies

## 🔐 Security Considerations

- **API Security**: Implement authentication and authorization
- **Database Security**: Use secure connections and proper credentials
- **Environment Variables**: Keep sensitive data in environment files
- **CORS Configuration**: Restrict to allowed domains in production
- **Input Validation**: All user inputs are validated and sanitized

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/new-feature`)
3. Commit your changes (`git commit -am 'Add new feature'`)
4. Push to the branch (`git push origin feature/new-feature`)
5. Create a Pull Request

## 📄 License

This project is developed for the Attorney General's Chambers of Malaysia. All rights reserved.

## 📞 Support

For technical support or questions about the AGC Document Search System, please contact the development team or refer to the project documentation.

---

**Built with ❤️ for the Attorney General's Chambers of Malaysia**