# AGC Document Chatbot
A Streamlit-based web application that provides intelligent search and chat capabilities for Attorney General's Chambers (AGC) documents. The system uses Retrieval-Augmented Generation (RAG) to enhance search accuracy and provide context-aware responses.
## Features
- **Document Browsing**: Browse through all available AGC documents with filtering by document type and title/content search
- **Enhanced RAG Search**: Search documents using AI-enhanced query understanding
- **Document Detail View**: View full document details with contextual information
- **Chat Interface**: Context-aware chat with an AI assistant about document content
## Architecture
The application consists of several key components:
- **Web Interface**: Built with Streamlit
- **Document Database**: MySQL database for storing document metadata and content
- **Embedding Services**: Vector embeddings for semantic search capabilities
- **RAG Enhancement**: AI-enhanced query understanding for search, backed by the OpenAI API
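To make the semantic search component more concrete, here is a minimal sketch of embedding-based ranking, assuming queries and documents have already been embedded as plain lists of floats. The function names (`cosine_similarity`, `rank_documents`) and the toy vectors are illustrative assumptions, not the project's actual code in `embedding/`; real vectors would come from the embedding service.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first.

    doc_vecs maps a document id to its embedding (hypothetical layout).
    """
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for illustration only
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for doc_id, score in rank_documents([1.0, 0.0], docs, top_k=2):
        print(doc_id, round(score, 3))
```

In a RAG setup, the top-ranked documents would then be passed to the language model as context for the answer.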
## Project Structure
```
.
├── app.py # Main Streamlit application
├── config.py # Configuration settings
├── db/ # Database utilities
│ └── import_lkk_data.py # Script for importing LKK data
├── embedding/ # Embedding and RAG services
│ ├── embedding_service.py
│ ├── enhanced_rag_service.py
│ └── rag_service.py
├── utils/ # Utility functions
├── Data/ # Document data
└── requirements.txt # Python dependencies
```
## Installation
1. Clone the repository
2. Create a virtual environment:
```
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install dependencies:
```
pip install -r requirements.txt
```
4. Configure environment variables: create a `.env` file containing the database settings (see Database Setup) and your OpenAI API key
## Database Setup
1. Install and run XAMPP:
   - Download XAMPP from [https://www.apachefriends.org/](https://www.apachefriends.org/)
   - Install and launch XAMPP Control Panel
   - Start the Apache and MySQL services
   - Access phpMyAdmin at [http://localhost/phpmyadmin](http://localhost/phpmyadmin)
   - Create a new database named `agc`
2. Configure the database connection in your `.env` file:
```
MYSQL_HOST=localhost
MYSQL_USER=root
MYSQL_PASSWORD=
MYSQL_DATABASE=agc
```
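As a rough illustration of how these settings might be consumed, the sketch below parses `KEY=VALUE` lines and builds a connection dictionary that falls back to the values shown above. `parse_env` and `load_db_config` are hypothetical helpers; the project's `config.py` may load the file differently (for example through a dotenv library).

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_db_config(env=None):
    """Build MySQL connection settings from a mapping (default: os.environ)."""
    source = os.environ if env is None else env
    return {
        "host": source.get("MYSQL_HOST", "localhost"),
        "user": source.get("MYSQL_USER", "root"),
        "password": source.get("MYSQL_PASSWORD", ""),  # XAMPP root has no password by default
        "database": source.get("MYSQL_DATABASE", "agc"),
    }


if __name__ == "__main__":
    sample = "MYSQL_HOST=localhost\nMYSQL_USER=root\nMYSQL_PASSWORD=\nMYSQL_DATABASE=agc\n"
    print(load_db_config(parse_env(sample)))
```

An empty root password is convenient for local development with XAMPP but should not be carried over to a shared or production database.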
## Data Import (Optional)
To import LKK (Laporan Keputusan Kes) data into the system:
```
python -m db.import_lkk_data
```
This script will:
- Set up required database tables
- Import available data from SQL or PDF files in the `Data/` directory
- Generate document embeddings for search functionality
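The file-discovery part of these steps could look roughly like the sketch below, assuming the importer selects input files by extension. `find_import_files` is a hypothetical helper written for illustration and is not taken from `db/import_lkk_data.py`.

```python
from pathlib import Path


def find_import_files(data_dir="Data"):
    """Group the .sql and .pdf files under data_dir by extension."""
    root = Path(data_dir)
    found = {"sql": [], "pdf": []}
    if not root.is_dir():
        return found  # Nothing to import if the directory is missing
    for path in sorted(root.rglob("*")):
        suffix = path.suffix.lower().lstrip(".")
        if path.is_file() and suffix in found:
            found[suffix].append(path)
    return found


if __name__ == "__main__":
    files = find_import_files()
    print(len(files["sql"]), "SQL file(s),", len(files["pdf"]), "PDF file(s)")
```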
## Usage
Run the Streamlit application:
```
streamlit run app.py
```
The application will be available at http://localhost:8501 by default.
## Requirements
- Python 3.7+
- MySQL database (via XAMPP)
- OpenAI API key (for embedding and RAG features)