AGC Document Chatbot

A Streamlit-based web application that provides intelligent search and chat capabilities for Attorney General's Chambers (AGC) documents. The system uses Retrieval-Augmented Generation (RAG) to enhance search accuracy and provide context-aware responses.

Features

  • Document Browsing: Browse through all available AGC documents with filtering by document type and title/content search
  • Enhanced RAG Search: Search documents using AI-enhanced query understanding
  • Document Detail View: View full document details with contextual information
  • Chat Interface: Context-aware AI chat grounded in document content

Architecture

The application consists of several key components:

  • Web Interface: Built with Streamlit
  • Document Database: MySQL database for storing document metadata and content
  • Embedding Services: Vector embeddings for semantic search capabilities
  • RAG Enhancement: Query understanding and answer generation improved with OpenAI models
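
At a high level, a search request is embedded, compared against the stored document embeddings, and the best-matching documents are passed to a chat model as context. The sketch below illustrates that flow only; the model names and helper structure are assumptions, not the actual implementation in the embedding/ package.

    # Minimal RAG flow sketch -- illustrative only, not the code in embedding/.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed(text):
        # Embedding model name is an assumption; the project may use a different one.
        resp = client.embeddings.create(model="text-embedding-3-small", input=text)
        return np.array(resp.data[0].embedding)

    def answer(query, documents):
        # documents: list of {"title": ..., "content": ..., "embedding": np.ndarray}
        q = embed(query)
        scored = sorted(
            documents,
            key=lambda d: float(np.dot(q, d["embedding"])
                                / (np.linalg.norm(q) * np.linalg.norm(d["embedding"]))),
            reverse=True,
        )
        context = "\n\n".join(d["content"][:2000] for d in scored[:3])  # top-3 matches
        chat = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed chat model
            messages=[
                {"role": "system", "content": "Answer using only the provided AGC documents."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
            ],
        )
        return chat.choices[0].message.content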

Project Structure

.
├── app.py                  # Main Streamlit application
├── config.py               # Configuration settings
├── db/                     # Database utilities
│   └── import_lkk_data.py  # Script for importing LKK data
├── embedding/              # Embedding and RAG services
│   ├── embedding_service.py
│   ├── enhanced_rag_service.py
│   └── rag_service.py
├── utils/                  # Utility functions
├── Data/                   # Document data
└── requirements.txt        # Python dependencies

Installation

  1. Clone the repository
  2. Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:
    pip install -r requirements.txt
    
  4. Configure environment variables by creating a .env file in the project root (see the sample sketch below and the Database Setup section)
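
A minimal .env sketch, assuming the application reads OPENAI_API_KEY for the embedding and RAG features alongside the MySQL variables shown under Database Setup; confirm the exact variable names expected by config.py:

    OPENAI_API_KEY=your-openai-api-key
    MYSQL_HOST=localhost
    MYSQL_USER=root
    MYSQL_PASSWORD=
    MYSQL_DATABASE=agc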

Database Setup

  1. Install XAMPP and start its MySQL service (the application expects a local MySQL server)

  2. Configure the database connection in your .env file:

    MYSQL_HOST=localhost
    MYSQL_USER=root
    MYSQL_PASSWORD=
    MYSQL_DATABASE=agc
    
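Make sure a database matching MYSQL_DATABASE exists before importing data. A minimal connection check, assuming a driver such as mysql-connector-python and python-dotenv are installed (the project's actual dependencies are listed in requirements.txt):

    # Quick connection check -- illustrative sketch, not part of the project.
    import os
    import mysql.connector
    from dotenv import load_dotenv  # assumes python-dotenv is available

    load_dotenv()
    conn = mysql.connector.connect(
        host=os.getenv("MYSQL_HOST", "localhost"),
        user=os.getenv("MYSQL_USER", "root"),
        password=os.getenv("MYSQL_PASSWORD", ""),
    )
    cur = conn.cursor()
    cur.execute("CREATE DATABASE IF NOT EXISTS agc")  # create the database if missing
    conn.close()
    print("MySQL connection OK")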

Data Import (Optional)

To import LKK (Laporan Keputusan Kes, i.e. case decision report) data into the system:

python -m db.import_lkk_data

This script will:

  • Set up required database tables
  • Import available data from SQL or PDF files in the Data directory
  • Generate document embeddings for search functionality (this step is sketched below)
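
The embedding step, as a sketch: each document's text is sent to an embedding model and the resulting vector is stored next to the document for later similarity search. The code below is illustrative only; the table name, column names, and model are assumptions, not what db/import_lkk_data.py actually does.

    # Illustrative embedding-generation step -- table, columns, and model are assumed.
    import json
    from openai import OpenAI

    client = OpenAI()

    def embed_documents(conn):
        cur = conn.cursor(dictionary=True)
        cur.execute("SELECT id, content FROM documents WHERE embedding IS NULL")  # assumed schema
        for row in cur.fetchall():
            resp = client.embeddings.create(
                model="text-embedding-3-small",          # assumed model
                input=row["content"][:8000],             # truncate very long documents
            )
            vector = json.dumps(resp.data[0].embedding)  # store the vector as JSON text
            upd = conn.cursor()
            upd.execute("UPDATE documents SET embedding = %s WHERE id = %s", (vector, row["id"]))
        conn.commit()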

Usage

Run the Streamlit application:

streamlit run app.py

The application will be available at http://localhost:8501 by default.
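
To serve on a different port, pass Streamlit's --server.port flag:

    streamlit run app.py --server.port 8502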

Requirements

  • Python 3.7+
  • MySQL database (via XAMPP)
  • OpenAI API key (for embedding and RAG features)