# AGC Document Chatbot

A Streamlit-based web application that provides intelligent search and chat capabilities for Attorney General's Chambers (AGC) documents. The system uses Retrieval-Augmented Generation (RAG) to enhance search accuracy and provide context-aware responses.

## Features

- **Document Browsing**: Browse all available AGC documents, filtering by document type and searching by title or content
- **Enhanced RAG Search**: Search documents using AI-enhanced query understanding
- **Document Detail View**: View full document details with contextual information
- **Chat Interface**: Chat with the AI about document content, with awareness of the document context

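The browsing filter described above (document type plus title/content search) can be sketched in a few lines. This is only an illustration: the `Document` record, its field names, and `filter_documents` are hypothetical and not taken from the project's code, and the sample records are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    # Hypothetical record shape; the real schema lives in the MySQL database.
    title: str
    doc_type: str
    content: str


def filter_documents(
    docs: List[Document],
    doc_type: Optional[str] = None,
    query: str = "",
) -> List[Document]:
    """Keep documents that match the type (if given) and contain the query
    in their title or content (case-insensitive)."""
    needle = query.strip().lower()
    results = []
    for doc in docs:
        if doc_type is not None and doc.doc_type != doc_type:
            continue
        if needle and needle not in doc.title.lower() and needle not in doc.content.lower():
            continue
        results.append(doc)
    return results


# Made-up sample records, for illustration only.
docs = [
    Document("Contract dispute summary", "LKK", "Summary of a contract case."),
    Document("Internal circular", "Circular", "Guidance on filing procedures."),
]
matches = filter_documents(docs, doc_type="LKK", query="contract")
print([d.title for d in matches])
```

Passing `doc_type=None` and an empty `query` returns every document, which mirrors an unfiltered browse view.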
## Architecture

The application consists of several key components:

- **Web Interface**: Built with Streamlit
- **Document Database**: MySQL database for storing document metadata and content
- **Embedding Services**: Vector embeddings for semantic search capabilities
- **RAG Enhancement**: Improved search using OpenAI's capabilities

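To make the embedding and RAG components more concrete, here is a minimal sketch of the retrieval step: rank documents by cosine similarity to a query vector, then assemble the best passages into a prompt. The function names, the toy two-dimensional vectors, and the prompt wording are all assumptions for illustration; the project's `embedding/` services may work differently, and real embeddings would come from an embedding model rather than being hard-coded.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Toy vectors and passages, for illustration only.
doc_vecs = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [0.7, 0.7]}
passages = {"doc-a": "Passage A", "doc-b": "Passage B", "doc-c": "Passage C"}

ranked = top_k([1.0, 0.1], doc_vecs, k=2)
prompt = build_prompt("What does passage A say?", [passages[doc_id] for doc_id, _ in ranked])
print([doc_id for doc_id, _ in ranked])
```

In a full pipeline the prompt would then be sent to a language model; that call is left out here because the README does not say which model or client the project uses.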
## Project Structure

```
.
├── app.py                      # Main Streamlit application
├── config.py                   # Configuration settings
├── db/                         # Database utilities
│   └── import_lkk_data.py      # Script for importing LKK data
├── embedding/                  # Embedding and RAG services
│   ├── embedding_service.py
│   ├── enhanced_rag_service.py
│   └── rag_service.py
├── utils/                      # Utility functions
├── Data/                       # Document data
└── requirements.txt            # Python dependencies
```

## Installation

1. Clone the repository
2. Create a virtual environment:
   ```
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
4. Configure environment variables: create a `.env` file containing the database settings described under Database Setup and your OpenAI API key

## Database Setup

1. Install and run XAMPP:

   - Download XAMPP from [https://www.apachefriends.org/](https://www.apachefriends.org/)
   - Install and launch XAMPP Control Panel
   - Start the Apache and MySQL services
   - Access phpMyAdmin at [http://localhost/phpmyadmin](http://localhost/phpmyadmin)
   - Create a new database named `agc`

2. Configure the database connection in your `.env` file:
   ```
   MYSQL_HOST=localhost
   MYSQL_USER=root
   MYSQL_PASSWORD=
   MYSQL_DATABASE=agc
   ```

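As a rough illustration of how the `MYSQL_*` settings above might be read in Python, the sketch below parses `KEY=VALUE` text and collects the four values into a small settings object. `MySQLSettings`, `load_mysql_settings`, and `parse_env_text` are hypothetical names, not the contents of the project's `config.py`. Many projects load `.env` files with the `python-dotenv` package instead; whether this one does is not stated here.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class MySQLSettings:
    # Hypothetical container for the four variables shown in the .env example.
    host: str
    user: str
    password: str
    database: str


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_mysql_settings(env: Mapping[str, str] = os.environ) -> MySQLSettings:
    """Read the MYSQL_* variables, falling back to the README's example values."""
    return MySQLSettings(
        host=env.get("MYSQL_HOST", "localhost"),
        user=env.get("MYSQL_USER", "root"),
        password=env.get("MYSQL_PASSWORD", ""),
        database=env.get("MYSQL_DATABASE", "agc"),
    )


sample = "MYSQL_HOST=localhost\nMYSQL_USER=root\nMYSQL_PASSWORD=\nMYSQL_DATABASE=agc\n"
settings = load_mysql_settings(parse_env_text(sample))
print(settings.host, settings.database)
```

Calling `load_mysql_settings()` with no argument reads from the process environment instead of parsed text.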
## Data Import (Optional)

To import LKK (Laporan Keputusan Kes) data into the system:

```
python -m db.import_lkk_data
```

This script will:

- Set up required database tables
- Import available data from SQL or PDF files in the Data directory
- Generate document embeddings for search functionality

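The three steps above (create tables, import records, generate embeddings) can be sketched as one small function. This is not the actual `import_lkk_data.py`: the table layout and function names are assumptions, an in-memory SQLite database stands in for MySQL so the sketch runs without a server, and `fake_embed` is a placeholder for a real embedding model.

```python
import json
import sqlite3
from typing import Callable, Iterable, List, Tuple


def run_import(
    conn: sqlite3.Connection,
    records: Iterable[Tuple[str, str]],
    embed: Callable[[str], List[float]],
) -> int:
    """Create the table, insert (title, content) records, and store one
    embedding per document. Returns the number of imported documents."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, title TEXT, content TEXT, embedding TEXT)"
    )
    count = 0
    for title, content in records:
        vector = embed(content)
        conn.execute(
            "INSERT INTO documents (title, content, embedding) VALUES (?, ?, ?)",
            (title, content, json.dumps(vector)),
        )
        count += 1
    conn.commit()
    return count


def fake_embed(text: str) -> List[float]:
    # Placeholder only: a real embedding would come from an embedding model.
    return [float(len(text)), float(text.count(" "))]


# SQLite in memory is a stand-in for the MySQL database used by the project.
conn = sqlite3.connect(":memory:")
imported = run_import(conn, [("Sample title", "Sample content text")], fake_embed)
print(imported)
```

Passing the embedding function as a parameter keeps the database logic testable without network access or an API key.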
## Usage

Run the Streamlit application:

```
streamlit run app.py
```

The application will be available at http://localhost:8501 by default.

## Requirements

- Python 3.7+
- MySQL database (via XAMPP)
- OpenAI API key (for embedding and RAG features)