In the Python world, the multiprocessing library has been widely used to parallelize computation on a single machine. However, distributed multi-node, multi-process applications are still hard to write. This article shows how you can easily turn a multiprocessing Python application into a distributed, containerized application with Ray and Kubernetes. It was inspired by the following articles.
If you are familiar with Python's native multiprocessing.Pool for parallel processing, you will find migrating to ray.util.multiprocessing.Pool quite easy. If you are not, here is the basic pattern.
from multiprocessing import Pool
import time

# tasks to be parallelized
work = (["A", 5], ["B", 2], ["C", 1], ["D", 3])

# worker function
def work_log(work_data):
    print(" Process %s waiting %s seconds" % (work_data[0], work_data[1]))
    time.sleep(int(work_data[1]))
    print(" Process %s Finished." % work_data[0])

# driver
def pool_handler():
    p = Pool(2)
    # magic here: pass the list of tasks; multiprocessing takes care of
    # parallelization and distribution
    p.map(work_log, work)

if __name__ == '__main__':
    pool_handler()
It turns out that to convert single-machine Python multiprocessing into a distributed Ray Pool, you only need the changes marked in the comments below:
# from multiprocessing import Pool
from ray.util.multiprocessing import Pool
import ray
import time

# tasks to be parallelized
work = (["A", 5], ["B", 2], ["C", 1], ["D", 3])

# worker function
def work_log(work_data):
    print(" Process %s waiting %s seconds" % (work_data[0], work_data[1]))
    time.sleep(int(work_data[1]))
    print(" Process %s Finished." % work_data[0])

# driver
def pool_handler():
    # If not connecting to a remote Ray cluster, the following line is not
    # needed. By the way, pool = Pool(ray_address="<ip_address>:<port>")
    # raised a redis.exceptions "Protocol error" for me.
    ray.util.connect("<ray_cluster_address>")
    # p = Pool(2)
    pool = Pool(ray_address="auto")
    # same magic here: pass the list of tasks; Ray takes care of
    # parallelization and distribution across the cluster
    pool.map(work_log, work)

if __name__ == '__main__':
    pool_handler()
I used AWS EKS to run EasyOCR powered by Ray. Here are the high-level steps:
- Create an AWS Elastic Container Registry (ECR) repository for hosting our images
- Build a Docker image containing our code and model, and push it to ECR
- Create an EKS cluster
- Deploy a Ray cluster on EKS and verify it
- Run the distributed OCR app on Ray
Create ECR
e.g., if you want to push a hello-world image, create a repository named hello-world.
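The repository itself can be created with the AWS CLI; a sketch, assuming region us-east-1 and the repository name hello-world:

```shell
# create a private ECR repository named hello-world
aws ecr create-repository --repository-name hello-world --region us-east-1
```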
Get credentials so that we can push Docker images to ECR:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
For a public registry:
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
Build the Docker image and push it to ECR
I baked the EasyOCR models into the image so that it does not need to download them at runtime. I found the download locations:
mkdir -p models
cd models && wget https://github.com/JaidedAI/EasyOCR/releases/download/pre-v1.1.6/craft_mlt_25k.zip
wget https://github.com/JaidedAI/EasyOCR/releases/download/v1.3/english_g2.zip
unzip craft_mlt_25k.zip
unzip english_g2.zip
mkdir -p unzipped
mv craft_mlt_25k.pth unzipped/
mv english_g2.pth unzipped/
# get test images
mkdir -p images && wget -O images/english.png https://github.com/JaidedAI/EasyOCR/raw/master/examples/english.png
cd images && for i in {1..50}; do cp english.png "english$i.png"; done
Here is my Dockerfile:
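The original post embedded the Dockerfile as a gist that is not reproduced here. A minimal sketch of what such a Dockerfile could look like, assuming a rayproject/ray base image and the models/ and images/ directories prepared above (the image tag and file paths are assumptions):

```dockerfile
# assumption: same Ray image family as the cluster's operatorImage
FROM rayproject/ray:nightly

# EasyOCR pulls in torch/torchvision as dependencies
RUN pip install easyocr

# bake the pre-downloaded models and test images into the image
COPY models/unzipped /app/models
COPY images /app/images
COPY app.py /app/app.py

WORKDIR /app
```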
And here is app.py:
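app.py was likewise embedded as a gist. A sketch of the driver, based on the Pool pattern earlier, assuming the model and image paths were baked into the image at /app/models and /app/images (those paths are assumptions; model_storage_directory, download_enabled, and readtext's detail parameter are real EasyOCR options):

```python
import glob
from ray.util.multiprocessing import Pool
import easyocr

def ocr_image(path):
    # each worker loads the baked-in models (no runtime download) and OCRs one image
    reader = easyocr.Reader(['en'],
                            model_storage_directory='/app/models',
                            download_enabled=False)
    return path, reader.readtext(path, detail=0)

def pool_handler():
    # connect to the Ray cluster the driver pod is running in
    pool = Pool(ray_address="auto")
    for path, text in pool.map(ocr_image, sorted(glob.glob('/app/images/*.png'))):
        print(path, text)

if __name__ == '__main__':
    pool_handler()
```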
Build and push
docker build -t aws_account_id.dkr.ecr.region.amazonaws.com/rayocr:0.1 -f Dockerfile .
docker push aws_account_id.dkr.ecr.region.amazonaws.com/rayocr:0.1
Install the related tools (Terraform, kubectl, AWS CLI) and create the EKS cluster
I chose t2.2xlarge instances to provide enough resources.
Deploy the Ray cluster on Kubernetes
Notice that ray/values.yaml sets operatorImage: rayproject/ray:nightly. With rayproject/ray:latest or rayproject/ray:1.3.0, I encountered "Python ValueError: invalid literal for int() with base 10".
I took the shortcut of including the EasyOCR dependency in both the head and worker nodes, so I modified the image in the following file to point to my image.
I also changed the default head and worker resources to be sufficient (head: 1 CPU, 8Gi memory; worker: 4 CPUs, 16Gi memory). Otherwise I encountered "A worker died or was killed while executing task" due to lack of resources.
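In the Helm chart these limits live under the pod type definitions in ray/values.yaml; a sketch of the relevant fields with the values above (field names follow the ray-operator chart of that era, so verify against your chart version):

```yaml
podTypes:
  rayHeadType:
    CPU: 1
    memory: 8Gi
  rayWorkerType:
    minWorkers: 2
    maxWorkers: 2
    CPU: 4
    memory: 16Gi
```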
Deploy with helm
cd ray/deploy/charts
helm -n ray install example-cluster --create-namespace ./ray
Verify
kubectl -n ray get rayclusters
kubectl -n ray get pods
kubectl -n ray get service
kubectl get deployment ray-operator
kubectl get pod -l cluster.ray.io/component=operator
kubectl get crd rayclusters.cluster.ray.io
Execute sample job
kubectl -n ray create -f https://raw.githubusercontent.com/ray-project/ray/master/doc/kubernetes/job-example.yaml
Grant EKS access to ECR (if using a private registry)
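With a private registry, the EKS worker nodes' IAM role needs pull access to ECR. One way, assuming <node-instance-role> is your cluster's node IAM role name (AmazonEC2ContainerRegistryReadOnly is AWS's managed read-only ECR policy; clusters created with eksctl typically attach it already):

```shell
# allow the worker nodes to pull images from private ECR repositories
aws iam attach-role-policy \
  --role-name <node-instance-role> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
```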
Test
Finally, you can run Ray EasyOCR:
kubectl -n ray create -f job.yaml
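job.yaml itself was embedded as a gist. A sketch of an equivalent Kubernetes Job, modeled on Ray's job-example.yaml, assuming the image pushed earlier and the head service name created by the Helm chart (the service name, job name, and /app/app.py path are assumptions; check kubectl -n ray get service for the actual head service):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ray-ocr-job
  namespace: ray
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: ray-ocr
        image: aws_account_id.dkr.ecr.region.amazonaws.com/rayocr:0.1
        command: ["python", "/app/app.py"]
        env:
        # address of the Ray head service inside the cluster
        - name: RAY_ADDRESS
          value: example-cluster-ray-head:10001
```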
I saw that the head node got a few tasks, and each of the two worker nodes got 4 processes (matching their number of CPUs).
AKS steps
Azure Cloud Shell already has the az CLI, docker, kubectl, and terraform installed, so use it for a quick start.
Create an Azure Cloud Shell
Create service principal
You can think of a service principal as a service account: Azure AD uses it to identify an application.
az ad sp create-for-rbac --skip-assignment
Then copy the corresponding appId and password, as described in the article below.
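The command prints the credentials as JSON along these lines (values replaced with placeholders):

```json
{
  "appId": "<appId>",
  "displayName": "azure-cli-...",
  "password": "<password>",
  "tenant": "<tenant-id>"
}
```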
Change aks-cluster.tf to the following code to provision AKS and ACR.
You can use the following commands to verify that AKS was created:
kubectl get nodes
kubectl get ns
Build a Docker image for ACR
echo FROM mcr.microsoft.com/hello-world > Dockerfile
az acr build --image sample/hello-world:v1 \
  --registry testacr1011 \
  --file Dockerfile .

kubectl create deployment --image=testacr1011.azurecr.io/sample/hello-world:v1 hello-app

# ray easyocr
az acr build --image rayeasyocr/rayeasyocr:0.1 --registry testacr1011 --file Dockerfile .
If AKS cannot pull images from ACR, attach AKS to your ACR:
az aks update -n <AKS cluster name> -g myResourceGroup --attach-acr <acr-name>