LLaMA-2 Fine-tuning
If you are in the Generative AI field like me, you must have heard of LLaMA, Alpaca, and Vicuna. They all derive from Facebook's LLaMA model, enabling you to create your own "ChatGPT". However, they all share one issue: the original LLaMA model was not released for commercial use, and neither were its descendants. The community has released lots of similar open-source LLM models over the last several months, but now LLaMA 2 finally arrives with a license that allows commercial use.
Naturally, I like to try new things when they come out (and these days there are too many things to try them all). There are lots of blogs and videos about LLaMA 2, but I still hit roadblocks during my journey, so I would like to document them so you can avoid them.
The most popular Transformer models are hosted on Hugging Face, so I decided to start from the article below.
This article has two sections you can try out quickly: using the Hugging Face transformers library for inference, and fine-tuning the base model.
Base model inference
Currently, llama-2 is not publicly downloadable from Hugging Face. You need to submit an access request for Meta's approval: after you log in to the Hugging Face portal, find the model and submit the access request. Then you can do as the article describes, log in with your access token using huggingface-cli, and run inference. It is straightforward.
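For reference, here is a minimal inference sketch, assuming you have been granted access to meta-llama/Llama-2-7b-hf and have already run huggingface-cli login with your access token:

```python
# Minimal sketch: run `huggingface-cli login` first with your access token.
# Assumes access to meta-llama/Llama-2-7b-hf has been approved.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 so the 7B model fits on a single GPU
    device_map="auto",
)

prompt = "Tell me who won the most medals in the Olympics."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```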
Fine-tuning
I encountered issues here. The article describes using the TRL (Transformer Reinforcement Learning) library's SFT (supervised fine-tuning) trainer, which is the first step of training an LLM. However, on my side I needed to fix a few things:
1. I needed to add lora_dropout to PEFT's LoraConfig in trl/examples/scripts/sft_trainer.py (see the sketch after this list).
2. Here is the environment that worked for me without problems:
- OS: Ubuntu 22.04 LTS
- CUDA driver: 12.1
- Python: 3.10
- PyTorch: there is no CUDA 12.1 package yet, so use the CUDA 11.8 build: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- Python packages: transformers==4.31.0, bitsandbytes==0.39.0 (version 0.41.0 has an issue with CUDA 12.1), and peft installed from source at https://github.com/huggingface/peft (the PyPI release caused a matrix multiplication issue)
3. The script assumes that the dataset has a column called "text" containing the full instruction text. If you write your own training script, you can do more preprocessing, tokenization, etc.
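Putting points 1 and 3 together, here is a minimal sketch of what the fixed training setup looks like. It is not the full example script; the dataset is the one the TRL example uses, and the LoRA hyperparameters (r, lora_alpha, the dropout value) are just illustrative:

```python
# Minimal sketch, assuming transformers==4.31.0, trl, peft installed from source,
# and a dataset that already has a "text" column.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"
# Example dataset with a ready-made "text" column
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,   # bitsandbytes 4-bit quantization to fit on one GPU
    device_map="auto",
)

peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,   # the field I had to add to the example script
    bias="none",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",   # the column the SFT trainer reads
    peft_config=peft_config,
    tokenizer=tokenizer,
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="./llama2-sft",
        per_device_train_batch_size=4,
        num_train_epochs=1,
    ),
)
trainer.train()
```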
Dataset preparation
For simple instruction-tuning training text, there are generally three pieces:
- Instruction (e.g. tell me who won the most medals in the Olympics; summarize the following text)
- Context (e.g. the text you want the model to summarize)
- Response (the ideal response you want the model to generate)
It is easier for the model to learn, and easier for us to extract the response afterwards, if we separate the pieces with clear separators. Generally I use
### Instruction: <instruction> ### Context: <context> ### Response: <response> ### End
However, the following should also work
### Instruction: <instruction>\n\n### Context: <context>\n\n### Response: <response>\n\n### End
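To turn a dataset into the single "text" column the SFT trainer expects, you can map each row into this format. A minimal sketch, assuming hypothetical column names instruction, context, and response:

```python
# Sketch only: instruction/context/response are hypothetical column names;
# adjust to whatever your dataset actually uses.
def format_example(example):
    return {
        "text": (
            f"### Instruction: {example['instruction']} "
            f"### Context: {example['context']} "
            f"### Response: {example['response']} ### End"
        )
    }

# dataset = dataset.map(format_example)  # produces the "text" column used above
```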
As the article below notes, you can use whatever prompt format you like with the base model. For the chat model, however, it is better to follow the prompt format below:
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_message }} [/INST]
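Filling the template in code is just string formatting. A minimal sketch, with placeholder system and user messages:

```python
# Sketch of building a Llama-2 chat prompt from the template above.
system_prompt = "You are a helpful assistant."
user_message = "Summarize the following text: ..."

prompt = (
    "<s>[INST] <<SYS>>\n"
    f"{system_prompt}\n"
    "<</SYS>>\n"
    f"{user_message} [/INST]"
)
```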
Appendix
LLaMA 2 resources: playground, benchmark, prompting llama-2 chat, fine-tuning, deployment
Llama 2 is available in Azure Machine Learning and AWS SageMaker
Other YouTubers' videos on tuning LLaMA-2 using your own GPU