## Installation

### Running the system
- install docker >= v24.0 (and nvidia-docker if an Nvidia GPU is available; test the GPU setup with a simple container, see the example after this list)
- check out the git repository

  ```bash
  git clone https://github.com/QuiddityAI/InsightHub.git
  cd InsightHub
  ```

- create a `.env` file according to the variables listed in `required_environment_variables.txt`

  ```bash
  cp required_environment_variables.txt .env
  ```

  Edit the `.env` file to set the variables. The file specifies which environment variables are required for the system to run. The LLM functionality (handled by LLMonkey) uses Mistral models by default, so it is sufficient to set only the `LLMONKEY_MISTRAL_API_KEY` env var. See their manual for details (a minimal sketch is shown after this list).
- add `docker-compose.override.pdferret.yaml` to your `COMPOSE_FILE` env variable (colon separated, see the example after this list) if you want to be able to upload and parse PDF files (and other documents)
- run `docker compose up -d`. If your setup requires using docker with `sudo`, run `sudo -E docker compose up -d` instead. This is important because docker-compose mounts the HOME directory in the container to get access to gcloud credentials. If `sudo` is used without `-E`, the HOME directory will be different (`/root`) and the credentials will not be found (see the illustration after this list).
- go to `localhost:55140` and log in with e-mail `admin@example.com` and password `admin` (if not changed using env variables)
- visit the Django admin interface (using the top right user menu and the "database" icon) for more settings
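To test the GPU setup from the first step, a common smoke test is to run `nvidia-smi` inside a minimal CUDA container (the image tag below is only an example; any recent `nvidia/cuda` base image works):

```bash
# Prints the GPU status table if the NVIDIA container runtime is set up correctly
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```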
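For the `.env` step, a minimal sketch of the file might look like the following; only `LLMONKEY_MISTRAL_API_KEY` is named in this README, and the full list of variables is in `required_environment_variables.txt`:

```bash
# Minimal .env sketch -- see required_environment_variables.txt for all
# required variables. With the default Mistral-based LLM setup (via LLMonkey),
# this key is enough for the LLM features:
LLMONKEY_MISTRAL_API_KEY=your-mistral-api-key
```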
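For the optional PDF parsing step, `COMPOSE_FILE` takes a colon-separated list of compose files. A sketch, assuming the base file is named `docker-compose.yaml` (check the repository for the actual file name):

```bash
# In .env (or exported in your shell); entries are separated by colons
COMPOSE_FILE=docker-compose.yaml:docker-compose.override.pdferret.yaml
```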
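To illustrate why `-E` matters when running docker via `sudo`: by default, `sudo` resets the environment, including `HOME`, which breaks the credentials lookup described above:

```bash
# Without -E, sudo resets HOME to the root user's home directory:
sudo sh -c 'echo $HOME'      # -> /root
# With -E, the invoking user's environment (including HOME) is preserved:
sudo -E sh -c 'echo $HOME'   # -> /home/<your-user>
```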
### Things to try after initial installation
- Chat (if you added LLM API keys): create a new empty collection, click on "Summaries" at the top right, type in a question and press enter. You can use it like a classic ChatGPT chat.
- Upload PDF documents (if you added the PDFerret docker container + LLM API keys): click on "upload documents" on the top left -> "My Dataset" -> "+ Choose" to select files -> "Upload". You should then be able to search for those documents in new collections.
### Upload your own data via the user interface
You can upload individual files as well as .zip archives containing multiple files in the user interface. You can also upload CSV files with multiple entries for some dataset schemas (e.g. scientific documents).
### Upload your own data via API
You can upload individual files or arbitrary JSON documents with your own data using the API. See the folder `scripts_and_examples/import_scripts` for examples (some might be outdated; check `import_local_german_files.py` first).
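As a rough illustration of an API upload, the sketch below uses a hypothetical endpoint, auth header, and form field; these are assumptions, not the documented API. The actual routes, authentication, and payload format are defined in the scripts under `scripts_and_examples/import_scripts`:

```bash
# HYPOTHETICAL sketch: the endpoint path, auth header, and form field below
# are placeholders, not the real API. See import_local_german_files.py for
# the actual upload flow.
curl -X POST "http://localhost:55140/api/upload" \
  -H "Authorization: Token YOUR_API_TOKEN" \
  -F "file=@/path/to/document.pdf"
```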