Machine Learning stories roundup 2023.9
General
Upgini can search public and premium external data sources to enrich features for LLM or neural network models. It can also use OpenAI GPT to generate features.
LLM
LLM Ecosystem
Costs of customizing LLMs:
- Closed-source APIs + document embedding database: This first solution is probably the easiest to get started with, and given the high quality of the ChatGPT API, it might even give you good enough (if not the best) performance. And it's cheap!
- Fine-tune LLMs: Recent progress from fine-tuning LLaMA-like models has shown this costs ~$500 to reach a baseline performance similar to ChatGPT in certain domains. It could be worthwhile if you have ~50–100k instructions or conversations to fine-tune a base model on.
- Train from scratch: As LLaMA and the more recent MPT-7B have shown, this costs ~$100–200k and takes a week or two.
Building with Instruction-Tuned LLMs, datasets used: instruction tuning (aligning with human preferences): https://huggingface.co/datasets/databricks/databricks-dolly-15k; fine-tuning: https://huggingface.co/datasets/FourthBrainGenAI/MarketMail-AI
Efficient Fine-Tuning for Llama-v2-7b on a Single GPU: PEFT, LoRA, QLoRA, quantization, paged optimizer (Adam optimizer state offloaded to CPU), gradient accumulation; fine-tuning for coding, dataset: https://huggingface.co/datasets/HuggingFaceH4/CodeAlpaca_20K, using the declarative/low-code ML framework Ludwig.
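For reference, a minimal sketch of the PEFT + QLoRA setup the article describes, using transformers, peft, and bitsandbytes rather than the article's Ludwig config (the model name and hyperparameters here are illustrative assumptions):

```python
# Minimal QLoRA-style sketch (not the article's Ludwig config): load the base
# model with 4-bit quantized weights and attach LoRA adapters so only a tiny
# fraction of the parameters is trained.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # gated checkpoint; assumes you have been granted access

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRA: NF4 4-bit quantization
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],     # adapters on the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the LoRA weights are trainable

# coding dataset mentioned in the article
dataset = load_dataset("HuggingFaceH4/CodeAlpaca_20K", split="train")
```

From here, a standard transformers Trainer run with gradient_accumulation_steps set and optim="paged_adamw_8bit" covers the gradient-accumulation and paged-optimizer points above (assuming a transformers version recent enough to support them).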
Deep Dive into LLM Evaluation with Weights & Biases: besides traditional accuracy, F1, exact match, and ROUGE, it also mentions SQuAD metrics, semantic answer similarity, and G-Eval.
There are many LLMs, but fine-tuning techniques such as reinforcement learning with human feedback (RLHF) require a particularly complicated workflow. Lamini is one of the open-source initiatives to streamline the LLM fine-tuning process. Main capabilities:
· The Lamini library includes optimized prompt-tuning and typed outputs, which you can try out in their playground right now.
· With only a few lines of code, you can access the advanced Lamini library for fine-tuning and RLHF by signing up for early access.
· The hosted data generator provides the building blocks for creating the data needed to train instruction-following LLMs.
· An instruction-following LLM that can be used with a few lines of code.
The Hugging Face trl package has pipeline support for SFT, reward modelling, and reinforcement learning training with human feedback, under its research projects.
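As a rough illustration, the SFT piece of trl looks like the sketch below (argument names follow the 2023-era trl quickstart with its imdb/opt-350m toy example; check your installed version):

```python
# Supervised fine-tuning with trl's SFTTrainer on a plain text dataset.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    model="facebook/opt-350m",        # model name or a preloaded model object
    train_dataset=dataset,
    dataset_text_field="text",        # column holding the raw training text
    max_seq_length=512,
)
trainer.train()
```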
PPO: Instruction Tuning using RLHF involves training a reward model and then using RL to find a policy that maximizes the learned reward.
DPO: Direct Preference Optimization (DPO) is an efficient alternative to RLHF. It eliminates the need to train a reward model (finding a perfect reward function is hard) and then run reinforcement learning.
Human preference dataset from Stack Exchange, processed into human-accepted and human-rejected answers.
DPO evaluates the consistency of a reward function with empirical preference data using a theoretical preference model. While conventional approaches use the preference model to define a preference loss for training a reward model, DPO uses a change of variables to train a policy that directly maximizes the learned reward. As a result, given a dataset of human preferences over model responses, DPO can optimize a policy with a simple binary cross-entropy objective, without explicitly learning a reward function or sampling from the policy during training.
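To make that binary cross-entropy objective concrete, here is a minimal sketch of the DPO loss computed from summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model (variable names are mine; beta is the usual scaling coefficient):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO objective: binary cross-entropy on the implicit reward margin.
    Each argument is the summed log-probability of a response under either
    the trained policy or the frozen reference model."""
    # Implicit rewards are beta-scaled log-ratios between policy and reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected via a logistic loss
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# toy usage with made-up log-probabilities for a batch of 2 preference pairs
loss = dpo_loss(torch.tensor([-12.0, -15.0]), torch.tensor([-14.0, -16.0]),
                torch.tensor([-13.0, -15.5]), torch.tensor([-13.5, -15.5]))
```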
How Google Vertex AI can prevent LLM hallucination: grounding with embeddings and vector search (the general pattern is sketched after the list below). Enablers:
Vertex AI Embeddings for Text/Image enabling Semantic Search, Recommendation, Clustering, Anomaly Detection, Sentiment Analysis
Vertex AI Matching Engine: vector search
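A minimal sketch of the grounding pattern these two pieces enable; this is not the Vertex AI SDK, just the embed-retrieve-prompt idea using sentence-transformers and cosine similarity:

```python
# Grounding pattern behind "embeddings + vector search": embed documents,
# retrieve the nearest ones for a question, and have the LLM answer only
# from that retrieved context.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Vertex AI Matching Engine provides low-latency vector search.",
    "Embeddings map text to dense vectors so similar meanings end up nearby.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(question, k=1):
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                      # cosine similarity (vectors are normalized)
    return [docs[i] for i in np.argsort(-scores)[:k]]

question = "What does Matching Engine do?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` is then sent to the LLM; grounding it in retrieved text reduces hallucination
```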
Interesting project: build your own self-hosted GPT with LangChain, GPT4All, LlamaCpp, Chroma and SentenceTransformers, in case data leakage is a serious risk.
Another article on privateGPT; notable things besides the private GPT model and the private embeddings provider (the example uses the SentenceTransformers all-MiniLM-L6-v2 embedding model): there is a token limit setting (a minimal sketch of the stack follows these notes)
- MODEL_N_CTX — Maximum token limit for both embeddings and LLM models
ingest.py allows you to ingest documents and list supported file types
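A minimal sketch of that self-hosted retrieval-QA stack with 2023-era LangChain imports (the model path and persist directory are placeholders, and import paths may differ in newer LangChain versions):

```python
# Self-hosted retrieval QA: local embeddings + Chroma + a local LlamaCpp model,
# so neither documents nor queries ever leave the machine.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA

model_n_ctx = 1000  # MODEL_N_CTX: max token limit for the local model

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma(persist_directory="db", embedding_function=embeddings)  # built by ingest.py

llm = LlamaCpp(model_path="models/ggml-model.bin", n_ctx=model_n_ctx)  # placeholder path
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())

print(qa.run("What is in my documents?"))
```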
Automatic Metrics
Text-generation/Summarization tasks
- BLEU: Measures precision of n-grams between generated and reference texts. Useful for evaluating language fluency.
- ROUGE: Measures recall of n-grams between generated and reference texts. Also useful for language fluency.
- BLEU vs ROUGE: BLEU is a precision-oriented score, ROUGE is a recall-oriented score.
- BERTScore: Calculates similarity between BERT embeddings of generated and reference texts. Evaluates semantic similarity.
Classification tasks
- Accuracy: Fraction of examples predicted correctly. Good for classification tasks.
- F1 Score: Harmonic mean of precision and recall. Useful when classes are imbalanced.
Extraction tasks
- Exact Match: Binary score indicating whether the generated text exactly matches the reference. Useful for extraction tasks (a short sketch computing these automatic metrics follows this list).
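All of the automatic metrics above are available from the Hugging Face evaluate library; a minimal sketch (BERTScore downloads a model on first use):

```python
# Computing the automatic metrics above with the Hugging Face `evaluate` library.
import evaluate

predictions = ["The cat sat on the mat."]
references = ["A cat was sitting on the mat."]

bleu = evaluate.load("bleu").compute(predictions=predictions, references=[references])
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
bertscore = evaluate.load("bertscore").compute(
    predictions=predictions, references=references, lang="en"
)
exact = evaluate.load("exact_match").compute(predictions=predictions, references=references)

print(bleu["bleu"], rouge["rougeL"], bertscore["f1"], exact["exact_match"])
```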
Human Evaluation
- Language Quality: Have humans rate grammar, fluency, consistency on a Likert scale.
- Engagingness: Score how interesting, diverse, and engaging conversations are.
- Correctness: Evaluate if responses are factually accurate and logically valid.
- Helpfulness: Assess if conversations resolve user queries appropriately.
- User Satisfaction: Overall subjective rating of conversation experience.
- Soundness: Assess the logical validity of recommendations provided by the chatbot. Have human experts review conversations to check that the reasoning is analytically sound.
- Ethicality: Evaluate whether recommendations adhere to ethical principles like transparency, fairness, avoiding bias etc. Human evaluations needed.
- Actionability: Score how precise and actionable the decision support provided is. Rate on a scale whether humans can act on the advice easily.
Vicuna is an open-source chatbot that has been fine-tuned (supervised instruction fine-tuning) from a LLaMA base model using approximately 70,000 user-shared conversations collected from ShareGPT.com with public APIs.
The team expanded the max context length from 512 in Alpaca to 2048 to enable a better understanding of long conversations.
Vicuna beats LLaMA and Alpaca in most tasks.
Streaming ChatGPT responses: the new API is simpler; set stream=True, then for each event in the response, get the event's delta and retrieve its content.
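A minimal sketch with the openai Python package as of late 2023 (the pre-1.0 interface; newer versions renamed these calls):

```python
# Streaming a ChatCompletion response: set stream=True, then read each
# event's delta as it arrives instead of waiting for the full answer.
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

for event in response:
    delta = event["choices"][0]["delta"]      # incremental part of the message
    print(delta.get("content", ""), end="", flush=True)
```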
Easy-to-digest explanation of the seminal SELF-INSTRUCT paper that led to another influential work, Stanford Alpaca.
Motivation for instruction following: Large Language Models are trained to predict the next token which, in general, can lead to untruthful, toxic, and unhelpful token generations. A better goal is to follow the user's instructions. In doing so, LLM generations can be truthful, helpful, and safe. This process is known as alignment.
SELF-INSTRUCT’s Motivation: reduce the dependence on human annotators, from cost, diversity and creativity perspectives.
6 steps:
- A bootstrapped pipeline generates tasks (and instances of those tasks) with a pre-trained model. This can be broken down into steps zero through four.
- Step 0 — Manual task creation seeding
- Step 1 — Instruction generation (with a prompt like "come up with a series of new tasks")
- Step 2 — Classification task identification
- Step 3 — Instance generation (input-first approach for non-classification, output-first approach for classification)
- Step 4 — Filter out similar tasks (ROUGE-L should be less than 0.7 to ensure diversity; see the sketch after this list)
- Step 5 — The final step fine-tunes the pre-trained language model on the generated tasks so that it follows instructions better.
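Step 4's diversity filter is easy to sketch with the rouge_score package: a candidate instruction is kept only if its ROUGE-L similarity with every instruction already in the pool stays below 0.7 (the paper's threshold); the example instructions below are made up:

```python
# SELF-INSTRUCT step-4-style filter: drop candidates whose ROUGE-L similarity
# to anything already in the pool is 0.7 or higher.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

def is_novel(candidate, pool, threshold=0.7):
    return all(
        scorer.score(existing, candidate)["rougeL"].fmeasure < threshold
        for existing in pool
    )

pool = ["Write a short poem about autumn."]
for cand in ["Write a short poem about winter.", "Summarize the following article."]:
    if is_novel(cand, pool):
        pool.append(cand)
print(pool)  # the near-duplicate poem prompt is filtered out, the summarization task is kept
```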
Other
Contrastive learning lets a machine learning model learn which pairs of data points are “similar” and “different”, and thereby pick up higher-level features about the data, before it even has a task such as classification or segmentation. The process is:
- Create different versions of the same image with two augmentation combinations (e.g. crop + resize + recolor, resize + recolor, crop + recolor, etc.)
- Train the model to output similar representations for similar images.
- Maximize the similarity of the two vector representations by minimizing a contrastive loss function (sketched below).
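A minimal sketch of such a contrastive loss (SimCLR-style NT-Xent) for a batch of paired augmented views; the random tensors stand in for encoder outputs:

```python
# SimCLR-style contrastive loss: pull the two augmented views of the same image
# together and push all other images in the batch apart.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: [batch, dim] embeddings of two augmented views of the same images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                 # [2B, dim]
    sim = z @ z.T / temperature                    # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))              # ignore self-similarity
    batch = z1.size(0)
    # the positive for sample i is its other augmented view, at index (i + B) mod 2B
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

# toy usage with random "embeddings" standing in for encoder outputs
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))
```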