CodeLlama Fine-tuning

An open-source, Llama-based code generation model

Xin Cheng
6 min read · Oct 20, 2023

CodeLlama is a family of models based on Llama 2 and further trained on code, available in base, Python-specialized, and instruction-tuned variants.

Inference

7b foundation model

Code completion

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "codellama/CodeLlama-7b-hf"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)

prompt = 'def remove_non_ascii(s: str) -> str:\n """ '
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))

Result

<s> def remove_non_ascii(s: str) -> str:
    """
    Remove non-ASCII characters from a string.
    """
    return "".join(i for i in s if ord(i) < 128)


def remove_non_ascii_and_punctuation(s: str) -> str:
    """
    Remove non-ASCII characters and punctuation from a string.
    """
    return "".join(i for i in s if ord(i) < 128 and not i in string.punctuation)


def remove_non_ascii_and_punctuation_and_whitespace(s: str) -> str:
    """
    Remove non-ASCII characters, punctuation, and whitespace from a string.
    """
    return "".join(i for i in s if ord(i) < 128 and not

Conversation (doesn't work with the base model, which continues the text instead of answering, as the results below show; an instruction-tuned model is needed)

Bash task

prompt = 'In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month?'
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))

Result

<s> In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month?



Posted by Bill Karwin (bkarwin) on 2007-08-22T19:05:00.000+0000

Assuming you have GNU find, you can use the -mtime option to find files modified in the last month.





Posted by Bill Karwin (bkarwin) on 2007-08-22T19:06:15.000+0000

I'm not sure what you mean by "list all text files". Do you mean list the names of the files?





Posted by Bill Karwin (bkarwin) on 2007-08-22T19:07:00.000+0000

Python task

system = "Provide answers in Python"
user = "Write a function that computes the set of sums of all contiguous sublists of a given list."

prompt = f"<s>[INST] <<SYS>>\\n{system}\\n<</SYS>>\\n\\n{user}[/INST]"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))

Result

<s> [INST] <<SYS>>\nProvide answers in Python\n<</SYS>>\n\nWrite a function that computes the set of sums of all contiguous sublists of a given list.[/INST]
[INST]<<SYS>>\nProvide answers in Python\n<</SYS>>\n\nWrite a function that computes the set of sums of all contiguous sublists of a given list.[/INST]
[INST]<<SYS>>\nProvide answers in Python\n<</SYS>>\n\nWrite a function that computes the set of sums of all contiguous sublists of a given list.[/INST]
[INST]<<SYS>>\nProvide answers in Python\n<</SYS>>\n\nWrite a function that computes the set of sums of all contiguous sublists of a given list.[/INST]
[INST]<<SYS>>\nProvide answers in Python\n<</SYS>>\n\nWrite a function that computes the set of sums of all cont

7b instruction-tuned model

Code completion

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "codellama/CodeLlama-7b-Instruct-hf"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)

prompt = 'def remove_non_ascii(s: str) -> str:\n """ '
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))

Result

<s> def remove_non_ascii(s: str) -> str:
    """
    Remove non-ASCII characters from a string.

    Args:
        s (str): The string to remove non-ASCII characters from.

    Returns:
        str: The string with non-ASCII characters removed.
    """
    return "".join(c for c in s if ord(c) < 128)


def remove_non_ascii_from_list(l: list) -> list:
    """
    Remove non-ASCII characters from a list of strings.

    Args:
        l (list): The list of strings to remove non-ASCII characters from.

    Returns:
        list: The list of strings with non-ASCII characters removed.
    """
    return [remove_non_ascii(s) for s in l]


def remove_non_ascii_from_

Python task

system = "Provide answers in Python"
user = "Write a function that computes the set of sums of all contiguous sublists of a given list."

prompt = f"<s>[INST] <<SYS>>\\n{system}\\n<</SYS>>\\n\\n{user}[/INST]"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))

Result

<s> [INST] <<SYS>>\nProvide answers in Python\n<</SYS>>\n\nWrite a function that computes the set of sums of all contiguous sublists of a given list.[/INST]  ```
def compute_sums(my_list):
    return [sum(my_list[i:j]) for i in range(len(my_list)) for j in range(i+1, len(my_list)+1)]
```
This function uses list comprehension to iterate over the indices of the input list, and for each index `i`, it computes the sum of all sublists of length `j` starting from index `i`. The resulting list of sums is returned by the function.

For example, if the input list is `[1, 2, 3, 4, 5]`, the function will return `[1, 3, 6, 10, 15]`.

Note that this function assumes that the input list is a flat list, i.e. it does not contain any nested lists. If the input list contains nested lists, the function may not
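As an aside (not in the original article): recent versions of transformers (4.34 and later) can build this [INST]/<<SYS>> prompt from the tokenizer's chat template, which inserts the special tokens and real newlines for you instead of hand-escaping them. A minimal sketch, assuming the CodeLlama-Instruct tokenizer ships the standard Llama 2 chat template:

# Sketch: requires transformers >= 4.34, where apply_chat_template is available.
messages = [
    {"role": "system", "content": "Provide answers in Python"},
    {"role": "user", "content": "Write a function that computes the set of sums "
                                "of all contiguous sublists of a given list."},
]
# apply_chat_template renders the messages into Llama 2 [INST]/<<SYS>> format
# and returns token ids directly.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
output = model.generate(input_ids, max_new_tokens=200, do_sample=True, top_p=0.9, temperature=0.1)
print(tokenizer.decode(output[0]))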

Python Fibonacci task

system = "Provide answers in Python"
user = "Write a function that computes fibonacci series"

prompt = f"<s>[INST] <<SYS>>\\n{system}\\n<</SYS>>\\n\\n{user}[/INST]"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))

Result

<s> [INST] <<SYS>>\nProvide answers in Python\n<</SYS>>\n\nWrite a function that computes fibonacci series[/INST]  Here is a function that computes the Fibonacci series:
```
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
```
This function uses a recursive approach to compute the Fibonacci series. It takes an integer `n` as input and returns the `n`-th Fibonacci number. The function first checks if `n` is less than or equal to 1. If it is, the function returns `n`. Otherwise, it calls itself twice with `n-1` and `n-2` as arguments, and then adds the two results together to get the final answer.

For example, if we call the function with `n=5`, it will compute the Fibonacci series as follows:
```
fibonacci(
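A side note (my addition, not part of the model's answer): the recursive version above recomputes the same subproblems and runs in exponential time. An iterative variant computes the same series in linear time:

def fibonacci(n: int) -> int:
    # Iterative variant: O(n) time, O(1) space.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a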

Javascript task

system = "Provide answers in Javascript"
user = "Write a function that computes the set of sums of all contiguous sublists of a given list."

prompt = f"<s>[INST] <<SYS>>\\n{system}\\n<</SYS>>\\n\\n{user}[/INST]"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

output = model.generate(
    inputs["input_ids"],
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))

Result

<s> [INST] <<SYS>>\nProvide answers in Javascript\n<</SYS>>\n\nWrite a function that computes the set of sums of all contiguous sublists of a given list.[/INST]  ```
function computeSums(list) {
  let sums = [];
  for (let i = 0; i < list.length; i++) {
    let sum = 0;
    for (let j = i; j < list.length; j++) {
      sum += list[j];
    }
    sums.push(sum);
  }
  return sums;
}
```
This function takes a list as input and returns a list of all the sums of contiguous sublists of the input list.

For example, if the input list is `[1, 2, 3, 4, 5]`, the output list would be `[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3

Fine-tuning

This notebook fine-tunes codellama-7b on the “b-mc2/sql-create-context” dataset, which generates SQL code from a question and a context (a CREATE TABLE statement). Example below:

question: How many heads of the departments are older than 56 ?

context: CREATE TABLE head (age INTEGER)

answer: SELECT COUNT(*) FROM head WHERE age > 56
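For context, a sketch of loading this dataset with the Hugging Face datasets library (my addition, not the notebook's exact code); the field names match the example above:

from datasets import load_dataset

# Assumption: the dataset exposes "question", "context", and "answer" fields,
# as in the example above.
dataset = load_dataset("b-mc2/sql-create-context", split="train")
dataset = dataset.train_test_split(test_size=0.1)  # hold out a small eval split
print(dataset["train"][0])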

We need to install the latest version of peft, so change the following lines:

!pip install -U git+https://github.com/huggingface/peft.git
# import locale # colab workaround
# locale.getpreferredencoding = lambda: "UTF-8" # colab workaround

The notebook prepares the data in the following format:

### Input:
{data_point["question"]}

### Context:
{data_point["context"]}

### Response:
{data_point["answer"]}

Use 4-bit quantization to load the model (instead of the 8-bit quantization used in the original article):

base_model = "codellama/CodeLlama-7b-hf"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    # load_in_8bit=True,  # replaced by the 4-bit config above
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
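The rest of the training setup lives in the notebook; as a rough sketch of standard peft usage (not a verbatim copy of the notebook, and the hyperparameters are illustrative), LoRA adapters are attached on top of the quantized model before training:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Standard QLoRA-style setup; the notebook's exact hyperparameters may differ.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                      # LoRA rank
    lora_alpha=16,             # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable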
