AGC Document Chatbot
A Streamlit-based web application that provides intelligent search and chat capabilities for Attorney General's Chambers (AGC) documents. The system uses Retrieval-Augmented Generation (RAG) to enhance search accuracy and provide context-aware responses.
Features
- Document Browsing: Browse through all available AGC documents with filtering by document type and title/content search
- Enhanced RAG Search: Search documents using AI-enhanced query understanding
- Document Detail View: View full document details with contextual information
- Chat Interface: Chat with AI about document content with context-awareness
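The browsing filter described above (by document type, plus a title/content search) can be pictured with a small in-memory sketch. The `Document` record, its field names, and the `filter_documents` helper are hypothetical illustrations; the application itself queries MySQL, and its actual schema is not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Document:
    # Hypothetical record shape, used only for this illustration.
    title: str
    doc_type: str
    content: str


def filter_documents(
    docs: Iterable[Document],
    doc_type: Optional[str] = None,
    query: Optional[str] = None,
) -> List[Document]:
    """Keep documents matching the type (if given) and the search text (if given)."""
    needle = query.strip().lower() if query else ""
    results = []
    for doc in docs:
        if doc_type and doc.doc_type != doc_type:
            continue  # wrong document type
        if needle and needle not in doc.title.lower() and needle not in doc.content.lower():
            continue  # search text found in neither title nor content
        results.append(doc)
    return results


# Made-up sample data; "Circular" is an invented document type.
SAMPLE_DOCS = [
    Document("Case report 1", "LKK", "Appeal dismissed"),
    Document("Circular 2", "Circular", "Notice on filing an appeal"),
]
lkk_appeals = filter_documents(SAMPLE_DOCS, doc_type="LKK", query="appeal")
```

With the sample data, `lkk_appeals` holds only the first record, since both records mention "appeal" but only one has type `LKK`.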
Architecture
The application consists of several key components:
- Web Interface: Built with Streamlit
- Document Database: MySQL database for storing document metadata and content
- Embedding Services: Vector embeddings for semantic search capabilities
- RAG Enhancement: Improves search results using the OpenAI API
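To make the retrieval half of RAG concrete, here is a minimal sketch of ranking documents by cosine similarity between a query embedding and stored document embeddings, then assembling a prompt from the best matches. Everything in it is an assumption for illustration: the function names, the prompt wording, and the toy 3-dimensional vectors (real embeddings come from an embedding model and have far more dimensions). It is not the code in `embedding/`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Place the retrieved passages ahead of the question (hypothetical wording)."""
    context = "\n\n".join(passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Toy vectors standing in for real embeddings.
DOC_VECS = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
ranked = top_k([1.0, 0.1, 0.0], DOC_VECS, k=2)
prompt = build_prompt("What was decided?", [doc_id for doc_id, _ in ranked])
```

In a real pipeline the passages passed to `build_prompt` would be document text rather than ids, and the resulting prompt would be sent to the language model.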
Project Structure
.
├── app.py # Main Streamlit application
├── config.py # Configuration settings
├── db/ # Database utilities
│ └── import_lkk_data.py # Script for importing LKK data
├── embedding/ # Embedding and RAG services
│ ├── embedding_service.py
│ ├── enhanced_rag_service.py
│ └── rag_service.py
├── utils/ # Utility functions
├── Data/ # Document data
└── requirements.txt # Python dependencies
Installation
- Clone the repository
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables: create a .env file containing the database settings from Database Setup and your OpenAI API key
Database Setup
- Install and run XAMPP:
  - Download XAMPP from https://www.apachefriends.org/
  - Install and launch XAMPP Control Panel
  - Start the Apache and MySQL services
  - Access phpMyAdmin at http://localhost/phpmyadmin
  - Create a new database named agc
- Configure the database connection in your .env file:
  MYSQL_HOST=localhost
  MYSQL_USER=root
  MYSQL_PASSWORD=
  MYSQL_DATABASE=agc
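The MYSQL_* settings above could be read at startup roughly as follows. This is a sketch, not the contents of config.py: the `load_db_settings` helper and its dictionary keys are hypothetical, and the defaults simply mirror the example values. Note that a .env file is not loaded into the process environment automatically; a loader such as python-dotenv is commonly used for that, and whether this project does so is an assumption.

```python
import os
from typing import Dict, Mapping, Optional


def load_db_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the MySQL connection settings, falling back to the example defaults.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to test.
    """
    source = os.environ if env is None else env
    return {
        "host": source.get("MYSQL_HOST", "localhost"),
        "user": source.get("MYSQL_USER", "root"),
        "password": source.get("MYSQL_PASSWORD", ""),  # empty by default under XAMPP
        "database": source.get("MYSQL_DATABASE", "agc"),
    }


settings = load_db_settings({"MYSQL_HOST": "db.internal", "MYSQL_DATABASE": "agc_test"})
```

The resulting dictionary can then be handed to whichever MySQL client library the project uses.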
Data Import (Optional)
To import LKK (Laporan Keputusan Kes) data into the system:
python -m db.import_lkk_data
This script will:
- Set up required database tables
- Import available data from SQL or PDF files in the Data directory
- Generate document embeddings for search functionality
Usage
Run the Streamlit application:
streamlit run app.py
The application will be available at http://localhost:8501 by default.
Requirements
- Python 3.7+
- MySQL database (via XAMPP)
- OpenAI API key (for embedding and RAG features)
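A quick preflight check for two of these requirements might look like the sketch below. The `check_requirements` helper is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption: it is the name the official OpenAI Python library reads by default, but this project may expect a different one.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def check_requirements(
    env: Optional[Mapping[str, str]] = None,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    source = os.environ if env is None else env
    current = tuple(sys.version_info[:2]) if version is None else version
    problems = []
    if current < (3, 7):
        problems.append(f"Python 3.7+ is required, found {current[0]}.{current[1]}")
    # Assumed variable name; adjust if the project reads the key from elsewhere.
    if not source.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Running the file prints one line per unmet requirement and nothing when both checks pass. It does not verify that MySQL is reachable.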