LLAMA 2 on AWS Bedrock

Generative AI, foundation model

Xin Cheng
2 min read · Nov 30, 2023

With AWS re:Invent, the popular Llama 2 models have finally been added to the Amazon Bedrock catalog. They support on-demand invocation, which requires no deployment at all, although you can also set up provisioned throughput. Here is how you interact with the on-demand Llama 2 models. Suppose you have set up an AWS profile named ‘dev’.

export AWS_PROFILE=dev


import json
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def invoke_llama2(bedrock_runtime_client, prompt):
    """
    Invokes the Meta Llama 2 large-language model to run an inference
    using the input provided in the request body.

    :param bedrock_runtime_client: A boto3 'bedrock-runtime' client.
    :param prompt: The prompt that you want Llama 2 to complete.
    :return: Inference response from the model.
    """
    try:
        # The different model providers have individual request and response formats.
        # For the format, ranges, and default values for Meta Llama 2 Chat, refer to:
        # https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html
        body = {
            "prompt": prompt,
            "temperature": 0.5,
            "top_p": 0.9,
            "max_gen_len": 512,
        }

        # model_id = 'meta.llama2-13b-chat-v1'
        model_id = 'meta.llama2-70b-chat-v1'
        response = bedrock_runtime_client.invoke_model(
            modelId=model_id, body=json.dumps(body)
        )

        # The body is a stream; read it fully, then decode the JSON payload.
        response_body = json.loads(response["body"].read())
        completion = response_body["generation"]

        return completion

    except ClientError:
        logger.error("Couldn't invoke Llama 2")
        raise


# Management-plane operations (such as listing models) use the 'bedrock' client:
# brt = boto3.client(service_name='bedrock')
# brt.list_foundation_models()
# Inference uses the 'bedrock-runtime' client.
brt = boto3.client(service_name='bedrock-runtime')
result = invoke_llama2(brt, 'what is llama 2?')
print(result)
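The response body carries more than the generated text. As a minimal sketch, the helper below pulls out the token counts and stop reason as well, assuming the field names listed on the AWS model-parameters page for Meta Llama (`generation`, `prompt_token_count`, `generation_token_count`, `stop_reason`). The names `Llama2Result` and `parse_llama2_response` are hypothetical, and `fake_response` is a hand-made stand-in for a real `invoke_model` response so the example runs without AWS access.

```python
import io
import json
from dataclasses import dataclass


@dataclass
class Llama2Result:
    """Hypothetical container for the fields of a Llama 2 response body."""
    generation: str
    prompt_tokens: int
    generation_tokens: int
    stop_reason: str


def parse_llama2_response(response):
    """Read the streaming body of an invoke_model response and decode it."""
    payload = json.loads(response["body"].read())
    return Llama2Result(
        generation=payload["generation"],
        # The metadata fields are read defensively in case they are absent.
        prompt_tokens=payload.get("prompt_token_count", 0),
        generation_tokens=payload.get("generation_token_count", 0),
        stop_reason=payload.get("stop_reason", ""),
    )


# Stand-in for a real response (illustrative values, not real model output).
fake_response = {
    "body": io.BytesIO(json.dumps({
        "generation": "Llama 2 is a family of large language models from Meta.",
        "prompt_token_count": 7,
        "generation_token_count": 14,
        "stop_reason": "stop",
    }).encode("utf-8"))
}

result = parse_llama2_response(fake_response)
print(result.generation)
print(result.stop_reason, result.generation_tokens)
```

With a real client, you would pass the return value of `invoke_model` straight to `parse_llama2_response`. A `stop_reason` of `length` rather than `stop` suggests the output was cut off at `max_gen_len`.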

Bedrock has also added support for fine-tuning a foundation model (similar to Azure ML).

If you encounter “UnknownServiceError: Unknown service: ‘bedrock-runtime’”, upgrade to the latest boto3:

pip install -U boto3
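To confirm that an outdated boto3 is the cause, you can ask the installed SDK which services it knows about. This is a rough sketch; the helper name `bedrock_runtime_supported` is hypothetical, and it assumes your AWS configuration (for example `AWS_PROFILE`) is valid, since creating a session reads it.

```python
import importlib


def bedrock_runtime_supported():
    """Return True if the installed boto3 knows the 'bedrock-runtime' service."""
    try:
        boto3 = importlib.import_module("boto3")
    except ImportError:
        return False  # boto3 is not installed at all
    # get_available_services() lists the service names bundled with the SDK.
    return "bedrock-runtime" in boto3.Session().get_available_services()


if not bedrock_runtime_supported():
    print("boto3 is missing or too old for Bedrock; run: pip install -U boto3")
```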



