AGC Document Search System

A comprehensive AI-powered document search and chat system for the Attorney General's Chambers (AGC) of Malaysia. This system provides intelligent document discovery, natural language search capabilities, and an AI assistant for legal research and analysis.

🏗️ Project Overview

The AGC Document Search System is designed for Malaysian government legal staff, prosecutors, and legal researchers to efficiently search, browse, and interact with legal documents including:

  • LKK (Laporan Keputusan Kes) - Case Decision Reports
  • Legal Cases - Various legal precedents and case law
  • Criminal Cases - Criminal law documents and decisions
  • Government Legal Documents - Various AGC documents and legal resources

🚀 Key Features

🔍 Advanced Search Capabilities

  • AI-Enhanced Search: OpenAI-powered semantic search with query enhancement
  • Natural Language Processing: Ask questions in plain language
  • Relevancy Scoring: Documents ranked by similarity and relevance
  • Query Enhancement: AI automatically improves search queries for better results
  • Search History: Track and revisit previous searches

📚 Document Management

  • Smart Document Browser: Grid and list view options with advanced filtering
  • Document Types Filtering: Filter by Legal, Criminal, LKK, and other types
  • Metadata Display: Comprehensive document information and classification
  • Document Viewer: Rich document display with structured content presentation
  • Bookmark System: Save and organize important documents

🤖 AI Assistant

  • Legal Chat Interface: Interactive AI assistant for legal queries
  • Document Analysis: AI-powered document summaries and key point extraction
  • Legal Concept Explanations: Get definitions and explanations of legal terms
  • Cross-Reference Analysis: Find related documents and cases
  • Precedent Search: Locate relevant legal precedents

🎨 Modern User Interface

  • Responsive Design: Optimized for desktop, tablet, and mobile devices
  • Government-Appropriate Styling: Professional navy and gold color scheme
  • Accessibility Compliant: WCAG 2.1 AA standards
  • Smooth Animations: Modern micro-interactions and transitions
  • Inter Font Family: Modern, readable typography

🏛️ System Architecture

AGC Document Search System
├── Frontend (Web Interface)
│   ├── HTML5 with modern CSS3
│   ├── Vanilla JavaScript (ES6+)
│   └── Responsive design with animations
├── Backend API (FastAPI)
│   ├── RESTful API endpoints
│   ├── Document retrieval and search
│   └── AI integration layer
├── Database (MySQL)
│   ├── Document storage and metadata
│   ├── Search history and logs
│   └── Vector embeddings for AI search
├── AI/ML Components
│   ├── OpenAI integration for enhanced search
│   ├── Text embeddings and similarity matching
│   └── Fallback simple search service
└── Additional Interfaces
    ├── Streamlit dashboard (app.py)
    └── Direct API testing tools
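
The "fallback simple search service" in the tree above suggests a try-then-fall-back pattern. The following is a minimal sketch of that idea, not the project's actual code: `keyword_search`, `search_with_fallback`, and the `{"id", "text"}` document shape are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Document = Dict[str, str]  # hypothetical shape: {"id": ..., "text": ...}
SearchFn = Callable[[str, List[Document]], List[Document]]


def keyword_search(query: str, documents: List[Document]) -> List[Document]:
    """Rank documents by how many distinct query terms appear in their text."""
    terms = set(query.lower().split())
    scored = []
    for doc in documents:
        text = doc["text"].lower()
        hits = sum(1 for term in terms if term in text)
        if hits:
            scored.append((hits, doc))
    # sorted() is stable, so documents with equal hit counts keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


def search_with_fallback(
    query: str,
    documents: List[Document],
    enhanced_search: Optional[SearchFn] = None,
) -> List[Document]:
    """Try the AI-enhanced search first; use keyword search if it is missing or fails."""
    if enhanced_search is not None:
        try:
            return enhanced_search(query, documents)
        except Exception:
            # Assumption: any failure (missing API key, network error) triggers the fallback.
            pass
    return keyword_search(query, documents)
```

With this shape, the system stays usable when no OpenAI API key is configured, which matches the "optional" status of the key in the prerequisites.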

📁 Project Structure

agc-chatbot/
├── 📄 Frontend
│   ├── index.html              # Main web interface
│   ├── js/
│   │   ├── main.js            # Core application logic
│   │   └── api.js             # API service layer
│   ├── css/                   # Styling (embedded in HTML)
│   └── assets/                # Images, fonts, static files
├── 🔧 Backend API
│   ├── api.py                 # FastAPI REST API
│   ├── serve.py               # Alternative server implementation
│   └── config.py              # Configuration settings
├── 🗄️ Database
│   ├── db/
│   │   ├── db_utils.py        # Database utilities and connections
│   │   ├── schema.sql         # Database schema definition
│   │   └── import_lkk_data.py # Data import utilities
├── 🤖 AI/ML Components
│   ├── embedding/
│   │   ├── enhanced_rag_service.py    # OpenAI-powered search
│   │   └── simple_search_service.py   # Fallback keyword search
│   └── utils/
│       └── text_processing.py        # Text utilities
├── 📊 Streamlit Interface
│   └── app.py                 # Alternative Streamlit dashboard
├── 📁 Data
│   └── [Various legal document folders organized by type]
├── 📋 Documentation
│   ├── plan.md                # Comprehensive project planning
│   └── templates/             # Template files and examples
└── ⚙️ Configuration
    ├── requirements.txt       # Python dependencies
    ├── .gitignore            # Git ignore rules
    └── test_api.py           # API testing utilities

🛠️ Installation & Setup

Prerequisites

  • Python 3.8+
  • MySQL Database
  • OpenAI API Key (optional, for enhanced AI features)
  • Modern Web Browser

1. Clone the Repository

git clone https://github.com/your-repo/agc-chatbot.git
cd agc-chatbot

2. Install Python Dependencies

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

3. Database Setup

# Create MySQL database
mysql -u root -p < db/schema.sql

# Import sample data (optional)
python db/import_lkk_data.py

4. Environment Configuration

Create a .env file in the root directory:

# Database Configuration
MYSQL_HOST=localhost
MYSQL_USER=root
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=agc
MYSQL_PORT=3306

# OpenAI Configuration (optional)
OPENAI_API_KEY=your_openai_api_key
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
OPENAI_CHAT_MODEL=gpt-3.5-turbo

# Application Settings
MAX_SEARCH_RESULTS=10
SIMILARITY_THRESHOLD=0.7
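
As an illustration of how these variables might be read on the Python side, here is a minimal sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical and are not taken from `config.py`. The sketch assumes the variables are already present in the process environment (for example exported in the shell, or loaded from `.env` by a tool such as python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the .env example."""
    mysql_host: str
    mysql_user: str
    mysql_password: str
    mysql_database: str
    mysql_port: int
    openai_api_key: Optional[str]
    max_search_results: int
    similarity_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, falling back to the defaults shown above."""
    return Settings(
        mysql_host=env.get("MYSQL_HOST", "localhost"),
        mysql_user=env.get("MYSQL_USER", "root"),
        mysql_password=env.get("MYSQL_PASSWORD", ""),
        mysql_database=env.get("MYSQL_DATABASE", "agc"),
        mysql_port=int(env.get("MYSQL_PORT", "3306")),
        # An empty or missing key means the OpenAI features are disabled.
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        max_search_results=int(env.get("MAX_SEARCH_RESULTS", "10")),
        similarity_threshold=float(env.get("SIMILARITY_THRESHOLD", "0.7")),
    )
```

Converting `MYSQL_PORT`, `MAX_SEARCH_RESULTS`, and `SIMILARITY_THRESHOLD` at load time surfaces malformed values at startup rather than in the middle of a search.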

5. Start the Backend API

# Method 1: Using FastAPI directly
uvicorn api:app --reload --host 0.0.0.0 --port 8000

# Method 2: Using the custom server
python serve.py

# Optional: run the alternative Streamlit dashboard (a separate interface, not the REST API)
streamlit run app.py

6. Access the Frontend

  • Web Interface: Open frontend/index.html in a web browser
  • API Documentation: Visit http://localhost:8000/docs
  • Streamlit Dashboard: Visit http://localhost:8501 (if using Streamlit)

🔌 API Endpoints

Document Management

  • GET /documents - List all documents with optional filtering
  • GET /documents/{id} - Get specific document by ID
  • GET /document-types - Get available document types
  • POST /search - Perform AI-enhanced or simple search
  • GET /ping - Health check endpoint

Example API Usage

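// Search for documents
const searchResponse = await fetch("http://localhost:8000/search", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    query: "cross-border financial fraud investigations",
    profile_search: false,
  }),
});
// fetch() resolves to a Response; parse the JSON body to get the results
const searchResults = await searchResponse.json();

// Get document by ID (named `doc` to avoid shadowing the browser's global `document`)
const docResponse = await fetch("http://localhost:8000/documents/1");
const doc = await docResponse.json();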
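
Because the backend is written in Python, the same calls may be useful from a script. The sketch below is a rough Python equivalent of the JavaScript example using only the standard library. The helper names `build_search_request` and `fetch_json` are hypothetical, and the response is assumed to be JSON, which is FastAPI's default.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command in the setup steps


def build_search_request(
    query: str, profile_search: bool = False, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /search request with a JSON body."""
    payload = json.dumps({"query": query, "profile_search": profile_search})
    return urllib.request.Request(
        f"{base_url}/search",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request, timeout: float = 10.0):
    """Send a request (or plain URL) and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the API running locally:
#   results = fetch_json(build_search_request("cross-border financial fraud investigations"))
#   document = fetch_json(f"{BASE_URL}/documents/1")
```

Separating request construction from sending keeps the request easy to inspect or log without a running server.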

🎯 Usage Guide

Web Interface

  1. Browse Documents: Use the "Browse Documents" tab to explore all available documents with filtering options
  2. Search: Use the "Search" tab for AI-powered document search with natural language queries
  3. AI Assistant: Use the "AI Assistant" tab to chat with the AI about legal concepts and documents

Search Features

  • Natural Language: "Find cases about money laundering in Malaysia"
  • Specific Terms: "AMLA 2001 cross-border investigations"
  • Question Format: "What is the procedure for international evidence collection?"

AI Assistant Tools

  • Analyze Document: Get detailed analysis of legal documents
  • Summarize: Create concise summaries of lengthy documents
  • Legal Concepts: Explain complex legal terminology
  • Cross Reference: Find related documents and precedents
  • Case Analysis: Comprehensive case law analysis

🔧 Technical Details

Frontend Technologies

  • HTML5 with semantic markup
  • CSS3 with modern features (Grid, Flexbox, Custom Properties)
  • Vanilla JavaScript (ES6+) with modular architecture
  • Font Awesome icons
  • Inter font family for modern typography

Backend Technologies

  • FastAPI - Modern Python web framework
  • Pydantic - Data validation and serialization
  • MySQL - Relational database for document storage
  • OpenAI API - For enhanced search and AI chat features
  • LangChain - For AI/ML pipeline management

AI/ML Features

  • Text Embeddings - Vector representations for semantic search
  • Similarity Matching - Cosine similarity for relevance scoring
  • Query Enhancement - AI-powered query improvement
  • RAG (Retrieval-Augmented Generation) - Context-aware AI responses
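
To make the similarity-matching item concrete, here is a minimal sketch of cosine-similarity ranking over embedding vectors. It is an illustration rather than the code in `enhanced_rag_service.py`: `rank_documents` is a hypothetical name, and the `threshold` and `max_results` defaults simply mirror `SIMILARITY_THRESHOLD` and `MAX_SEARCH_RESULTS` from the configuration example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return cos(a, b) = (a . b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.7,
    max_results: int = 10,
) -> List[Tuple[str, float]]:
    """Score every document, drop those below the threshold, return the best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_results]
```

In a RAG setup, the text of the top-ranked documents would then be supplied to the chat model as context for its answer.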

🧪 Testing

# Test API endpoints
python test_api.py

# Manual testing using the web interface
# Open frontend/index.html in a browser

# API documentation and testing
# Visit http://localhost:8000/docs

🚀 Deployment

Production Deployment

  1. Environment Setup:

    • Configure production database
    • Set up environment variables
    • Configure CORS settings for production domains
  2. Backend Deployment:

    # Using gunicorn for production
    gunicorn -w 4 -k uvicorn.workers.UvicornWorker api:app
    
  3. Frontend Deployment:

    • Serve static files through a web server (nginx, Apache)
    • Update API base URL in frontend/js/api.js
  4. Database:

    • Set up production MySQL instance
    • Run migrations and import data
    • Configure backup strategies

🔐 Security Considerations

  • API Security: Implement authentication and authorization
  • Database Security: Use secure connections and proper credentials
  • Environment Variables: Keep secrets such as database passwords and API keys in environment variables or a .env file that is excluded from version control
  • CORS Configuration: Restrict to allowed domains in production
  • Input Validation: All user inputs are validated and sanitized
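
For the CORS item, one possible approach is to read the allowed origins from an environment variable so that production domains are never hard-coded. The sketch below is an assumption-laden illustration: the `ALLOWED_ORIGINS` variable and the `parse_allowed_origins` function are hypothetical and do not appear in the project.

```python
import os
from typing import List, Mapping


def parse_allowed_origins(env: Mapping[str, str] = os.environ) -> List[str]:
    """Parse a comma-separated ALLOWED_ORIGINS value into a clean list of origins."""
    raw = env.get("ALLOWED_ORIGINS", "")
    # Strip whitespace and trailing slashes; browsers send origins without a trailing slash.
    origins = [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]
    if "*" in origins:
        # Assumption: a wildcard is never acceptable for a production deployment.
        raise ValueError("wildcard origin is not allowed; list explicit domains")
    return origins
```

The resulting list could be passed as the `allow_origins` argument of FastAPI's `CORSMiddleware` when the application is created.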

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-feature)
  3. Commit your changes (git commit -am 'Add new feature')
  4. Push to the branch (git push origin feature/new-feature)
  5. Create a Pull Request

📄 License

This project is developed for the Attorney General's Chambers of Malaysia. All rights reserved.

📞 Support

For technical support or questions about the AGC Document Search System, please contact the development team or refer to the project documentation.


Built with ❤️ for the Attorney General's Chambers of Malaysia
