Machine Learning Model Deployment on Azure and AWS

Azure Machine Learning, Amazon SageMaker

Xin Cheng
6 min read · Jan 1, 2024

A model that is not used is not useful, so after a model is trained, you will want to deploy it. Let's first review the common concepts for deploying a model to cloud services.

Concept

Model

After a model is trained, the framework generally produces a binary file that can later be loaded to perform inference. The model can be in different formats supported by the training framework (e.g. scikit-learn pickle file, joblib, PyTorch, TensorFlow, ONNX, Spark MLlib, etc.).
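As a minimal sketch of this idea (assuming a scikit-learn model persisted with joblib; the file name and toy data are arbitrary), the training side saves the artifact and the serving side loads it back:

import joblib
from sklearn.linear_model import LinearRegression

# training side: fit a toy model and persist the artifact
model = LinearRegression().fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
joblib.dump(model, "sklearn_regression_model.pkl")

# serving side: load the artifact later and run inference
loaded = joblib.load("sklearn_regression_model.pkl")
print(loaded.predict([[4.0]]))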

Environment/Dependency

As with the training environment, when you deploy a model you need the correct dependencies (OS, Python, Python packages) to run it. The environment is usually represented as a Docker image and needs to be versioned.

Inference script

Code that executes the model on a given input request. The scoring script receives data submitted to a deployed web service and passes it to the model. The script then executes the model and returns its response to the client. The scoring script is specific to your model and must understand the data that the model expects as input and returns as output.

Endpoint

According to Azure ML, an endpoint, in this context, is an HTTPS path that provides an interface for clients to send requests (input data) to a trained model and receive the inferencing (scoring) results back from the model. An endpoint provides the following (a minimal client-side call sketch follows the list):

  • Authentication using key- or token-based auth
  • TLS (SSL) termination
  • A stable scoring URI
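A minimal client-side sketch of what calling such an endpoint looks like (the scoring URI, key, and payload below are placeholders; the exact request format depends on your scoring script):

import json
import requests

# placeholder values: use your endpoint's scoring URI and key
scoring_uri = "https://my-endpoint.westus2.inference.ml.azure.com/score"
api_key = "<endpoint-key>"

headers = {
    "Content-Type": "application/json",
    # for Azure ML managed online endpoints, the key is passed as a Bearer token
    "Authorization": f"Bearer {api_key}",
}
payload = {"data": [[1.0, 2.0, 3.0]]}  # shape depends on what your model expects

response = requests.post(scoring_uri, headers=headers, data=json.dumps(payload))
print(response.status_code, response.json())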

Deployment

A deployment is the set of resources required to host the model that does the actual inferencing. Separating the endpoint from the deployment lets you deploy multiple models or model versions behind the same endpoint, enabling A/B testing or traffic splitting.

Update

The same model can have multiple versions (just like code). Therefore, when you train a newer version of a model and update an existing deployment with it, you will probably want to do the same things as when deploying a newer version of code (e.g. minimize downtime of your endpoint, run A/B tests to evaluate the impact of the newer version).

Model version

It is best to know exactly which model version is deployed, so that later, if you want to compare model performance across versions or roll back to a specific version, it is easy to fetch the correct model version to deploy.

Instance type

The compute can be CPU or GPU, with different SKUs for different computing requirements.

Reference

Azure

Create an Azure ML client to interact with the Azure ML API

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# get a handle to the workspace
# subscription_id, resource_group and workspace are your workspace details
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

Create endpoint

from azure.ai.ml.entities import ManagedOnlineEndpoint

# Define an endpoint name
endpoint_name = "my-endpoint"

# Example way to define a random name
import datetime

endpoint_name = "endpt-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="this is a sample endpoint",
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint)

Specify model

from azure.ai.ml.entities import Model

model = Model(path="../model-1/model/sklearn_regression_model.pkl")
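The model above is defined inline from a local path. If you want the workspace to track model versions (see the model version discussion earlier), a hedged sketch of registering and fetching a named model might look like this (the model name "sklearn-regression" is an example):

from azure.ai.ml.entities import Model

# register the model under a name; Azure ML assigns an incrementing version
registered = ml_client.models.create_or_update(
    Model(path="../model-1/model/sklearn_regression_model.pkl", name="sklearn-regression")
)
print(registered.name, registered.version)

# later, fetch a specific version to deploy or roll back to
model = ml_client.models.get(name="sklearn-regression", version=registered.version)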

Specify environment

from azure.ai.ml.entities import Environment

env = Environment(
    conda_file="../model-1/environment/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)

Scoring script

You need to define two functions:

init: called when the container is initialized/started, typically after create/update of the deployment. You can write the logic here to perform init operations like caching the model in memory.

run: called for every invocation of the endpoint to perform the actual scoring/prediction. In the example we extract the data from the JSON input, call the scikit-learn model's predict() method, and return the result back to the client.

import json
import logging
import os

import joblib
import numpy


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "model/sklearn_regression_model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("model 1: request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()

Create deployment

Specify the deployment name, model, environment, inference script, and instance type (compute SKU)

from azure.ai.ml.entities import ManagedOnlineDeployment, CodeConfiguration

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="../model-1/onlinescoring", scoring_script="score.py"
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(blue_deployment)
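Once more than one deployment exists behind the endpoint, traffic can be split between them for A/B testing or a blue/green rollout. A hedged sketch, assuming a second deployment named "green" has been created the same way as "blue":

# route 90% of traffic to blue and 10% to the (hypothetical) green deployment
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint)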

Invoke endpoint

# test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name="blue",
    request_file="../model-1/sample-request.json",
)
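Alternatively, any HTTP client can call the endpoint directly, as sketched in the endpoint concept section above. A minimal sketch of retrieving the stable scoring URI and authentication key with the SDK (the request payload is whatever your scoring script expects, e.g. {"data": [...]} for the script above):

# fetch the stable scoring URI and the key used for authentication
endpoint_info = ml_client.online_endpoints.get(name=endpoint_name)
keys = ml_client.online_endpoints.get_keys(name=endpoint_name)

print(endpoint_info.scoring_uri)  # POST request JSON here with any HTTP client
print(keys.primary_key)           # send as "Authorization: Bearer <key>"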

AWS

As with SageMaker training, there is no dedicated environment concept; you need to manage the Docker image yourself (e.g. build a Docker image and store it in ECR). SageMaker also does not have a separate deployment concept (but you can do blue/green deployment with a multi-model endpoint).

Define model

A SageMaker model has an inference script and an environment (you can specify which Docker image to use either with a Docker image URI or with a framework/Python version).

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    entry_point="inference.py",
    source_dir="code",
    role=role,
    model_data=pt_mnist_model_data,
    framework_version="1.5.0",
    py_version="py3",
)

Inference script

Besides model initialization and model execution, SageMaker also has functions that let you define input/output handling. The functions are:

model_fn: tells the inference image how to load the model checkpoint.

input_fn: the SageMaker PyTorch model server invokes the input_fn function in your inference entry point. This function handles data decoding.

predict_fn: after the inference request has been deserialized by input_fn, the SageMaker PyTorch model server invokes predict_fn on the return value of input_fn.

output_fn: after invoking predict_fn, the model server invokes output_fn for data post-processing.

Below is a sample inference script

import json
import os

import torch

# Net is the model class defined elsewhere in the inference script
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def model_fn(model_dir):
    model = Net()
    with open(os.path.join(model_dir, "model.pth"), "rb") as f:
        model.load_state_dict(torch.load(f))
    model.to(device).eval()
    return model


def input_fn(request_body, request_content_type):
    assert request_content_type == 'application/json'
    data = json.loads(request_body)['inputs']
    data = torch.tensor(data, dtype=torch.float32, device=device)
    return data


def predict_fn(input_object, model):
    with torch.no_grad():
        prediction = model(input_object)
    return prediction


def output_fn(predictions, content_type):
    assert content_type == 'application/json'
    res = predictions.cpu().numpy().tolist()
    return json.dumps(res)

Create endpoint

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# set local_mode to False if you want to deploy on a remote
# SageMaker instance

instance_type = "ml.c4.xlarge"

predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)
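A hedged sketch of invoking the deployed endpoint (the input shape below assumes the MNIST-style model from the inference script above; adjust it to whatever your model expects), followed by cleanup:

import numpy as np

# JSONSerializer turns this dict into the JSON body that input_fn parses
dummy_batch = {"inputs": np.random.rand(1, 1, 28, 28).tolist()}
result = predictor.predict(dummy_batch)
print(result)

# delete the endpoint when you no longer need it, to stop incurring cost
predictor.delete_endpoint()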

Build custom container image with additional Python packages

https://predictifsolutions.com/tech-blog/how-to-custom-models-sagemaker

