Machine Learning Model Deployment on Azure and AWS

Azure Machine Learning, Amazon SageMaker

Xin Cheng
6 min read · Jan 1, 2024

A model that is not used is not useful, so after a model is trained, you will want to deploy it. Let's first review the common concepts for deploying a model to cloud services.

Concept

Model

After a model is trained, the framework generally produces a binary file that can later be loaded to perform inference. The model can be in different formats supported by the training framework (e.g. scikit-learn pickle file, joblib, PyTorch, TensorFlow, ONNX, Spark MLlib, etc.).
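As a minimal sketch of this idea (assuming a scikit-learn model persisted with joblib; the file name and toy data are arbitrary), the training side saves the artifact and the serving side loads it back:

import joblib
from sklearn.linear_model import LinearRegression

# training side: fit a toy model and persist the artifact
model = LinearRegression().fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
joblib.dump(model, "sklearn_regression_model.pkl")

# serving side: load the artifact later and run inference
loaded = joblib.load("sklearn_regression_model.pkl")
print(loaded.predict([[4.0]]))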

Environment/Dependency

As with the training environment, when you deploy a model you need the correct dependencies (OS, Python, Python packages) to run it. The environment is usually represented as a Docker image and needs to be versioned.

Inference script

Code that executes the model on a given input request. The scoring script receives data submitted to a deployed web service and passes it to the model. The script then executes the model and returns its response to the client. The scoring script is specific to your model and must understand the data that the model expects as input and returns as output.

Endpoint

According to Azure ML, an endpoint, in this context, is an HTTPS path that provides an interface for clients to send requests (input data) to a trained model and receive the inferencing (scoring) results back from the model. An endpoint provides the following (a minimal client-side call sketch follows the list):

  • Authentication using key- or token-based auth
  • TLS (SSL) termination
  • A stable scoring URI
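A minimal client-side sketch of what calling such an endpoint looks like (the scoring URI, key, and payload below are placeholders; the exact request format depends on your scoring script):

import json
import requests

# placeholder values: use your endpoint's scoring URI and key
scoring_uri = "https://my-endpoint.westus2.inference.ml.azure.com/score"
api_key = "<endpoint-key>"

headers = {
    "Content-Type": "application/json",
    # for Azure ML managed online endpoints, the key is passed as a Bearer token
    "Authorization": f"Bearer {api_key}",
}
payload = {"data": [[1.0, 2.0, 3.0]]}  # shape depends on what your model expects

response = requests.post(scoring_uri, headers=headers, data=json.dumps(payload))
print(response.status_code, response.json())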

Deployment

A deployment is the set of resources required to host the model that does the actual inferencing. Separating the endpoint from the deployment lets you deploy multiple models or model versions behind the same endpoint, enabling A/B testing or traffic splitting.

Update

The same model can have multiple versions (just like code). Therefore, when you train a newer version of a model and update an existing deployment with it, you will probably want to do the same things as when deploying a newer version of code (e.g. minimize downtime of your endpoint, run A/B tests to evaluate the impact of the newer version).

Model version

It is best to know exactly which model version is deployed, so that later, if you want to compare model performance across versions or roll back to a specific version, it is easy to fetch the correct model version to deploy.

Instance type

The compute can be CPU or GPU, with different SKUs for different computing requirements.

Reference

Azure

Create an Azure ML client to interact with the Azure ML API

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# get a handle to the workspace
# subscription_id, resource_group and workspace are your workspace details
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

Create endpoint

from azure.ai.ml.entities import ManagedOnlineEndpoint

# Define an endpoint name
endpoint_name = "my-endpoint"

# Example way to define a random name
import datetime

endpoint_name = "endpt-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="this is a sample endpoint",
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint)

Specify model

from azure.ai.ml.entities import Model

model = Model(path="../model-1/model/sklearn_regression_model.pkl")
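The model above is defined inline from a local path. If you want the workspace to track model versions (see the model version discussion earlier), a hedged sketch of registering and fetching a named model might look like this (the model name "sklearn-regression" is an example):

from azure.ai.ml.entities import Model

# register the model under a name; Azure ML assigns an incrementing version
registered = ml_client.models.create_or_update(
    Model(path="../model-1/model/sklearn_regression_model.pkl", name="sklearn-regression")
)
print(registered.name, registered.version)

# later, fetch a specific version to deploy or roll back to
model = ml_client.models.get(name="sklearn-regression", version=registered.version)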

Specify environment

from azure.ai.ml.entities import Environment

env = Environment(
    conda_file="../model-1/environment/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)

Scoring script

You need to define two functions:

init: called when the container is initialized/started, typically after create/update of the deployment. You can write the logic here to perform init operations like caching the model in memory.

run: called for every invocation of the endpoint to perform the actual scoring/prediction. In the example we extract the data from the JSON input, call the scikit-learn model's predict() method, and return the result back to the client.

import json
import logging
import os

import joblib
import numpy


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "model/sklearn_regression_model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("model 1: request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()

Create deployment

Specify the deployment name, model, environment, inference script, and instance type (compute SKU)

from azure.ai.ml.entities import ManagedOnlineDeployment, CodeConfiguration

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="../model-1/onlinescoring", scoring_script="score.py"
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(blue_deployment)
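Once more than one deployment exists behind the endpoint, traffic can be split between them for A/B testing or a blue/green rollout. A hedged sketch, assuming a second deployment named "green" has been created the same way as "blue":

# route 90% of traffic to blue and 10% to the (hypothetical) green deployment
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint)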

Invoke endpoint

# test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name="blue",
    request_file="../model-1/sample-request.json",
)
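Alternatively, any HTTP client can call the endpoint directly, as sketched in the endpoint concept section above. A minimal sketch of retrieving the stable scoring URI and authentication key with the SDK (the request payload is whatever your scoring script expects, e.g. {"data": [...]} for the script above):

# fetch the stable scoring URI and the key used for authentication
endpoint_info = ml_client.online_endpoints.get(name=endpoint_name)
keys = ml_client.online_endpoints.get_keys(name=endpoint_name)

print(endpoint_info.scoring_uri)  # POST request JSON here with any HTTP client
print(keys.primary_key)           # send as "Authorization: Bearer <key>"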

AWS

As with SageMaker training, there is no dedicated environment concept; you need to manage the Docker image yourself (e.g. build a Docker image and store it in ECR). SageMaker also does not have a separate deployment concept (but you can do blue/green deployment with a multi-model endpoint).

Define model

A SageMaker model has an inference script and an environment (you can specify which Docker image to use either with a Docker image URI or with a framework/Python version).

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    entry_point="inference.py",
    source_dir="code",
    role=role,
    model_data=pt_mnist_model_data,
    framework_version="1.5.0",
    py_version="py3",
)

Inference script

Besides model initialization and model execution, SageMaker also has functions that let you define input/output handling. The functions are:

model_fn: tells the inference image how to load the model checkpoint.

input_fn: the SageMaker PyTorch model server invokes the input_fn function in your inference entry point. This function handles data decoding.

predict_fn: after the inference request has been deserialized by input_fn, the SageMaker PyTorch model server invokes predict_fn on the return value of input_fn.

output_fn: after invoking predict_fn, the model server invokes output_fn for data post-processing.

Below is a sample inference script

import json
import os

import torch

# Net is the model class defined elsewhere in the inference script
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def model_fn(model_dir):
    model = Net()
    with open(os.path.join(model_dir, "model.pth"), "rb") as f:
        model.load_state_dict(torch.load(f))
    model.to(device).eval()
    return model


def input_fn(request_body, request_content_type):
    assert request_content_type == 'application/json'
    data = json.loads(request_body)['inputs']
    data = torch.tensor(data, dtype=torch.float32, device=device)
    return data


def predict_fn(input_object, model):
    with torch.no_grad():
        prediction = model(input_object)
    return prediction


def output_fn(predictions, content_type):
    assert content_type == 'application/json'
    res = predictions.cpu().numpy().tolist()
    return json.dumps(res)

Create endpoint

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# set local_mode to False if you want to deploy on a remote
# SageMaker instance

instance_type = "ml.c4.xlarge"

predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)
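A hedged sketch of invoking the deployed endpoint (the input shape below assumes the MNIST-style model from the inference script above; adjust it to whatever your model expects), followed by cleanup:

import numpy as np

# JSONSerializer turns this dict into the JSON body that input_fn parses
dummy_batch = {"inputs": np.random.rand(1, 1, 28, 28).tolist()}
result = predictor.predict(dummy_batch)
print(result)

# delete the endpoint when you no longer need it, to stop incurring cost
predictor.delete_endpoint()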

Build custom container image with additional Python packages

https://predictifsolutions.com/tech-blog/how-to-custom-models-sagemaker

