Huggingface Chat UI — your own ChatGPT part 1

Large language model serving and chatbot

Sep 16, 2023

Huggingface Chat UI overview

Huggingface Chat UI lets you deploy your own ChatGPT-like conversational UI that can talk to models hosted on Huggingface, to a Huggingface Text Generation Inference (TGI) server, or to a custom API powered by an LLM.

For practitioners in the LLM space, what are the options for standing up a conversational UI wired to an LLM? Streamlit and Gradio are typical choices for quickly building UIs for machine learning models. They make it easy to place a text input box and a text output, but when you need a more feature-rich chatbot, with conversation history, memory, authentication, or themes, you have to build those features yourself. Enter Huggingface Chat UI, a conversational UI built with SvelteKit and TypeScript.

Get started

The default configuration uses an OpenAssistant model. At minimum, the configuration should contain a data store for conversations, a Huggingface access token (if the model requires authentication), and a model configuration. These settings go in a .env.local file at the root of the chat-ui repository.

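Chat history is stored in MongoDB. For a local MongoDB container like the one started in the run section below, the URL is mongodb://localhost:27017.

MONGODB_URL=<the URL to your MongoDB instance>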

You need to register with Huggingface to get an access token, which you can create in your account settings under Access Tokens.
HF_ACCESS_TOKEN=<your access token>
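Before starting the app, it can help to sanity-check these two values. The sketch below is a hypothetical helper (checkBasicConfig is not part of Chat UI); it takes the environment as a plain object and assumes the token is only required when your model needs authentication.

```typescript
// Hypothetical helper (not part of Chat UI): checks the basic settings
// described above before you start the app.
type Env = Record<string, string | undefined>;

interface ConfigCheck {
  ok: boolean;
  problems: string[];
}

function checkBasicConfig(env: Env, tokenRequired: boolean): ConfigCheck {
  const problems: string[] = [];

  // MongoDB connection strings use the mongodb:// or mongodb+srv:// scheme.
  const mongoUrl = env["MONGODB_URL"];
  if (!mongoUrl) {
    problems.push("MONGODB_URL is not set");
  } else if (!/^mongodb(\+srv)?:\/\//.test(mongoUrl)) {
    problems.push("MONGODB_URL should start with mongodb:// or mongodb+srv://");
  }

  // The access token is only needed when the model requires authentication.
  if (tokenRequired && !env["HF_ACCESS_TOKEN"]) {
    problems.push("HF_ACCESS_TOKEN is not set but the model requires authentication");
  }

  return { ok: problems.length === 0, problems };
}

const result = checkBasicConfig(
  { MONGODB_URL: "mongodb://localhost:27017", HF_ACCESS_TOKEN: "hf_example" },
  true,
);
console.log(result); // { ok: true, problems: [] }
```

Passing the environment in as an argument keeps the check easy to test; in a Node.js script you would pass process.env.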

Model configuration


MODELS=`[
  {
    "name": "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5",
    "datasetName": "OpenAssistant/oasst1",
    "description": "A good alternative to ChatGPT",
    "websiteUrl": "https://open-assistant.io",
    "userMessageToken": "<|prompter|>",
    "assistantMessageToken": "<|assistant|>",
    "userMessageEndToken": "<|endoftext|>",
    "assistantMessageEndToken": "<|endoftext|>",
    "preprompt": "Below are a series of dialogues between various people and an AI assistant. The AI tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble-but-knowledgeable. The assistant is happy to help with almost anything, and will do its best to understand exactly what is needed. It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer. That said, the assistant is practical and really does its best, and doesn't let caution get too much in the way of being useful.\n-----\n",
    "promptExamples": [
      {
        "title": "Write an email from bullet list",
        "prompt": "As a restaurant owner, write a professional email to the supplier to get these products every week: \n\n- Wine (x10)\n- Eggs (x24)\n- Bread (x12)"
      }, {
        "title": "Code a snake game",
        "prompt": "Code a basic snake game in python, give explanations for each step."
      }, {
        "title": "Assist in a task",
        "prompt": "How do I make a delicious lemon cheesecake?"
      }
    ],
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 1024,
      "stop": ["<|endoftext|>"]
    }
  }
]`

Keep the MODELS value free of # comments, which are not valid JSON. A few notes on the fields: userMessageToken and assistantMessageToken do not need to be actual tokens and can be any string; userMessageEndToken applies only to user messages and assistantMessageEndToken only to assistant messages (both can be any string); stop does not need to contain tokens and can be any list of strings.
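Because MODELS is a JSON array embedded in an environment variable, a typo only shows up when the app starts. The sketch below shows one way to type and parse such a value. ModelConfig and parseModels are hypothetical names that mirror the example entry above; this is not Chat UI's own schema, and the empty-string defaults are an assumption.

```typescript
// Hypothetical types mirroring the example MODELS entry above.
// This is not Chat UI's own schema; the defaults below are assumptions.
interface GenerationParameters {
  temperature?: number;
  top_p?: number;
  top_k?: number;
  repetition_penalty?: number;
  truncate?: number;
  max_new_tokens?: number;
  stop?: string[];
}

interface ModelConfig {
  name: string;
  datasetName?: string;
  description?: string;
  websiteUrl?: string;
  userMessageToken: string;
  assistantMessageToken: string;
  userMessageEndToken: string;
  assistantMessageEndToken: string;
  preprompt: string;
  promptExamples?: { title: string; prompt: string }[];
  parameters?: GenerationParameters;
}

// Parses the MODELS value. JSON.parse throws on "#" comments or trailing
// commas, so a malformed value is caught before the app starts.
function parseModels(raw: string): ModelConfig[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error("MODELS must be a JSON array");
  }
  return parsed.map((entry, i) => {
    if (typeof entry?.name !== "string") {
      throw new Error(`MODELS[${i}].name must be a string`);
    }
    // Assumed defaults: missing prompt settings become empty strings.
    return {
      userMessageToken: "",
      assistantMessageToken: "",
      userMessageEndToken: "",
      assistantMessageEndToken: "",
      preprompt: "",
      ...entry,
    } as ModelConfig;
  });
}

const models = parseModels(`[
  {
    "name": "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5",
    "userMessageToken": "<|prompter|>",
    "assistantMessageToken": "<|assistant|>",
    "parameters": { "temperature": 0.9, "max_new_tokens": 1024, "stop": ["<|endoftext|>"] }
  }
]`);
console.log(models[0].name, models[0].parameters?.max_new_tokens);
```

Parsing with plain JSON.parse is the strictest choice: anything it accepts is also valid for more lenient parsers.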

Prompt format

Most chat models require a model-specific prompt format. For example, the prompt format of OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 is described here, and it matches the userMessageToken, assistantMessageToken, userMessageEndToken and assistantMessageEndToken values defined above. The popular Llama 2 chat prompt format is described here. There is also “preprompt”, the system message that primes the model with context, instructions, or other information relevant to your use case. You can use it to describe the assistant’s personality, define what the model should and shouldn’t answer, and specify the format of its responses.
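To make the role of these settings concrete, here is a minimal sketch of how a preprompt and the four token strings can be stitched into one prompt for a conversation. buildPrompt and the types are hypothetical illustrations, not Chat UI's actual implementation; ending with the assistant token is the usual way to cue the model to answer as the assistant.

```typescript
// Hypothetical sketch of combining the preprompt and the four token settings
// into a single prompt string. Not Chat UI's actual implementation.
interface PromptTokens {
  preprompt: string;
  userMessageToken: string;
  userMessageEndToken: string;
  assistantMessageToken: string;
  assistantMessageEndToken: string;
}

interface ChatMessage {
  from: "user" | "assistant";
  content: string;
}

function buildPrompt(tokens: PromptTokens, messages: ChatMessage[]): string {
  // Wrap each message in the start/end strings for its role.
  const turns = messages.map((m) =>
    m.from === "user"
      ? tokens.userMessageToken + m.content + tokens.userMessageEndToken
      : tokens.assistantMessageToken + m.content + tokens.assistantMessageEndToken
  );
  // End with the assistant token so the model continues as the assistant.
  return tokens.preprompt + turns.join("") + tokens.assistantMessageToken;
}

const oasst: PromptTokens = {
  preprompt: "",
  userMessageToken: "<|prompter|>",
  userMessageEndToken: "<|endoftext|>",
  assistantMessageToken: "<|assistant|>",
  assistantMessageEndToken: "<|endoftext|>",
};

console.log(buildPrompt(oasst, [{ from: "user", content: "Hi!" }]));
// prints <|prompter|>Hi!<|endoftext|><|assistant|>
```

Swapping in a different model mostly means changing these five strings, which is why they live in the model configuration rather than in code.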

Parameters

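Text generation exposes many parameters, for example: temperature (how random the output is; higher values give more varied text), max_new_tokens (the maximum number of tokens to generate), top_k and top_p (which restrict sampling of the next token to the k most likely tokens, or to the smallest set of tokens whose cumulative probability reaches p), repetition_penalty (values above 1 penalize tokens that have already appeared), truncate (the maximum number of input tokens to keep), and stop (sequences that end generation).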
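The interplay of temperature, top_k and top_p is easier to see on a toy example. The sketch below applies them in that order to a small array of logits (raw model scores) and then samples one token index. The function names are hypothetical, and a real inference server does this on GPU tensors over the full vocabulary; following a common convention, top_k of 0 is treated here as "disabled".

```typescript
// Illustrative sketch of temperature, top_k and top_p sampling over a toy
// vocabulary. It only shows what each parameter does.
interface SamplingParams {
  temperature: number; // > 0; lower values make output more deterministic
  top_k: number; // keep only the k most likely tokens (0 = disabled)
  top_p: number; // keep the smallest set whose probability sum >= top_p
}

// Normalizes [tokenIndex, weight] pairs so the weights sum to 1.
function normalize(pairs: [number, number][]): [number, number][] {
  const total = pairs.reduce((sum, [, w]) => sum + w, 0);
  return pairs.map(([i, w]): [number, number] => [i, w / total]);
}

// Returns the [tokenIndex, probability] pairs that survive filtering.
function filteredDistribution(logits: number[], p: SamplingParams): [number, number][] {
  // Temperature scaling followed by a numerically stable softmax.
  const scaled = logits.map((l) => l / p.temperature);
  const max = Math.max(...scaled);
  let ranked = normalize(scaled.map((s, i): [number, number] => [i, Math.exp(s - max)]))
    .sort((a, b) => b[1] - a[1]);

  // top_k: keep the k highest-probability tokens, then renormalize.
  if (p.top_k > 0) ranked = normalize(ranked.slice(0, p.top_k));

  // top_p (nucleus): keep tokens until cumulative probability reaches top_p.
  const kept: [number, number][] = [];
  let cumulative = 0;
  for (const entry of ranked) {
    kept.push(entry);
    cumulative += entry[1];
    if (cumulative >= p.top_p) break;
  }
  return normalize(kept);
}

// Picks a token index; `rand` is a number in [0, 1), e.g. Math.random().
function sampleNextToken(logits: number[], p: SamplingParams, rand: number): number {
  const dist = filteredDistribution(logits, p);
  let acc = 0;
  for (const [i, prob] of dist) {
    acc += prob;
    if (rand < acc) return i;
  }
  return dist[dist.length - 1][0]; // guard against floating-point rounding
}

const toyLogits = [2, 1, 0.5, -1];
const params: SamplingParams = { temperature: 0.9, top_k: 50, top_p: 0.95 };
console.log(filteredDistribution(toyLogits, params));
console.log(sampleNextToken(toyLogits, params, Math.random()));
```

Lowering temperature sharpens the distribution before any filtering, while top_k and top_p cut off the unlikely tail; together they trade diversity for coherence.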

Run with the default configuration (OpenAssistant) on Ubuntu 22.04

# start MongoDB
docker run -d --rm -p 0.0.0.0:27017:27017 --name mongo-chatui mongo:latest

# install Node.js (Chat UI requires version 18+) and git
sudo apt-get update -y
sudo apt-get install -y ca-certificates curl gnupg git
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg

NODE_MAJOR=20
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list

sudo apt-get update -y
sudo apt-get install nodejs -y

# get the Chat UI source, then create .env.local there with MONGODB_URL, HF_ACCESS_TOKEN and MODELS
git clone https://github.com/huggingface/chat-ui.git
cd chat-ui

# install dependencies and start Chat UI on port 8080, listening on all interfaces
npm install
npm run dev -- --port 8080 --host 0.0.0.0
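Since Chat UI requires Node.js 18 or newer, a quick version check can explain an otherwise confusing failure during npm install. A minimal sketch; meetsNodeRequirement is a hypothetical helper, not part of Chat UI, and it only compares the major version.

```typescript
// Hypothetical helper: checks whether a Node.js version string such as
// "20.5.1" or "v18.17.0" meets a minimum major version (18 by default).
function meetsNodeRequirement(version: string, minMajor = 18): boolean {
  // Strip an optional leading "v" and read the major component.
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isFinite(major) && major >= minMajor;
}

console.log(meetsNodeRequirement("v20.5.1")); // true
console.log(meetsNodeRequirement("16.20.2")); // false
```

Inside Node.js you could pass process.versions.node, which holds the version without the leading "v"; the output of node --version includes the "v", which the helper strips.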