MLOps: Deploy custom model with AWS Sagemaker batch transform — Part Ⅱ

Yizhen Zhao
7 min read · Jul 13, 2021

In part Ⅰ, MLOps: Data versioning with DVC, I introduced how to use DVC to version data in ML projects, which is helpful when you run ML experiments. Once you have the trained model and any other necessary ML artefacts, you want to deploy your model to the cloud and run predictions in production. If you have large datasets and want to predict efficiently in production, Sagemaker batch transform is a good choice.

In this blog, you'll learn how to deploy a custom model with AWS Sagemaker batch transform. I'll break the process into four sections:

  • An introduction to AWS Sagemaker batch transform
  • Build your own docker image with custom ML prediction algorithm
  • Create model on Sagemaker and apply your prediction logic to it
  • Configure Sagemaker Batch Transform job and start prediction

Introduction

Sagemaker batch transform allows you to get inferences from large datasets. The idea is that, through a simple API, you can run predictions on large or small batches of data easily, without breaking the dataset into chunks yourself or running real-time predictions, which could be expensive. A parameter lets you customise the payload size per mini-batch, meaning the job will load as many records from the dataset as fit into each mini-batch and perform the prediction on them.

Notice: the Sagemaker batch transform job only supports CSV and JSON files as input, as the data file will be split by lines and passed into the batch transform job as several batches.

The diagram below shows how a batch transform job works; a detailed explanation can be found in the official AWS Sagemaker documentation: Get Inferences for an Entire Dataset with Batch Transform.

Fig 1. Workflow of batch transform job

Build your own docker image with custom ML prediction algorithm

Sagemaker allows you to package your own algorithm and trained model and deploy them in the Sagemaker environment. Here is an example git repository showing how to deploy your custom model as a RESTful API using docker. Sagemaker expects this git repository structure. The official documentation can be found here.

Project architecture

We followed the structure of the example repository mentioned above:

|-- sagemaker-docker-repo
    |-- docker_test
        |-- sample_data.json   --> Your local testing file
        |-- train_local.sh
        |-- serve_local.sh
        |-- predict.sh
    |-- model
        |-- nginx.conf
        |-- predictor.py       --> Do your magic here
        |-- serve
        |-- train
        |-- wsgi.py
    |-- Dockerfile
    |-- requirements.txt
  • Dockerfile: The Dockerfile describes how the image is built and what it contains.
FROM python:3.7

COPY requirements.txt /opt/program/requirements.txt

RUN apt-get -y update && apt-get install -y --no-install-recommends \
        wget \
        nginx \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN pip3 install --no-cache-dir -r /opt/program/requirements.txt

ENV AWS_DEFAULT_REGION=eu-central-1
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
ENV MODEL_PATH="/opt/ml/model"

# Set up the program in the image
COPY ./model /opt/program
COPY ./properties /opt/program/properties

WORKDIR /opt/program
  • model: The directory that contains the application to run in the container. Put all your ML-project-related files/data (e.g. model.pkl) under model/. See the next section for details about each file.
  • docker_test: A directory containing scripts and a setup for running a simple prediction job locally on mock data.
  • sample_data.json: In our case, we used the JSON format. When writing data to JSON with pandas, set orient='records' and lines=True (a minimal pandas sketch follows the format example below), so the file looks like:
{column -> value}
{column -> value}
...
{column -> value}
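
For illustration, a minimal sketch of producing such a file with pandas (the columns here are made up):

import pandas as pd

# Illustrative only: two fake records written as JSON lines, one object per line.
df = pd.DataFrame([
    {"feature_a": 1.0, "feature_b": "x"},
    {"feature_a": 2.5, "feature_b": "y"},
])
df.to_json("docker_test/sample_data.json", orient="records", lines=True)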

When Sagemaker starts a container, it invokes the container with an argument of either train or serve. We have set this container up so that the argument is treated as the command that the container executes: when training, it runs the included train program, and when serving, it runs the serve program.

  • train: It simply copies model.pkl from your project's model/ folder to /opt/ml/model.pkl.
  • serve: It starts the web server that Sagemaker uses to push the chunks of data. In most cases, you can use this file as-is.
  • wsgi.py: The start-up shell for the individual server workers. This only needs to be changed if you change where predictor.py is located or what it is named.
  • nginx.conf: The configuration for the nginx master server that manages the multiple workers.
  • predictor.py: The algorithm-specific inference server. You can customise your own prediction-related functions here (a minimal illustrative sketch follows this list).
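
For orientation, here is a minimal sketch of what a predictor.py for JSON-lines input could look like. It assumes the model is pickled at /opt/ml/model.pkl (as the train script above suggests) and exposes a scikit-learn-style predict; the field names and response format are purely illustrative and should be adapted to your own project.

# A minimal, illustrative predictor.py -- adapt paths, fields and output format.
import json
import pickle

import flask
import pandas as pd

# Where the train script copies the pickled model (adjust to your layout).
MODEL_FILE = "/opt/ml/model.pkl"

app = flask.Flask(__name__)
_model = None


def get_model():
    # Load the model once per worker process and cache it.
    global _model
    if _model is None:
        with open(MODEL_FILE, "rb") as f:
            _model = pickle.load(f)
    return _model


@app.route("/ping", methods=["GET"])
def ping():
    # Health check: Sagemaker calls this to verify the container is up.
    try:
        healthy = get_model() is not None
    except Exception:
        healthy = False
    return flask.Response(response="\n", status=200 if healthy else 404,
                          mimetype="application/json")


@app.route("/invocations", methods=["POST"])
def invocations():
    # Each request body is one mini-batch of JSON-lines records.
    lines = flask.request.data.decode("utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    frame = pd.DataFrame.from_records(records)
    predictions = get_model().predict(frame)
    # Return one JSON object per input record so results can be assembled line by line.
    body = "\n".join(json.dumps({"prediction": p.item() if hasattr(p, "item") else p})
                     for p in predictions)
    return flask.Response(response=body, status=200, mimetype="application/json")

wsgi.py then simply exposes app to the gunicorn workers, and serve starts nginx plus those workers.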

Load model and other ML artefacts during runtime

This project will be packaged into a docker image and pushed to AWS ECR, which will be used later when creating the Sagemaker model. In our case, different models and artefacts are used depending on the situation, so we don't want to keep all model files and other ML artefacts in this git repository and bake them into the docker image. Instead, we load them at runtime.

To solve this, we go back to the 1st step, MLOps: Data versioning with DVC. In that project (let's call it the DVC repo), your ML models and artefacts are already tracked by DVC, so you can use the DVC API to load them from that repo at runtime.

# Example of using dvc.api to load the model from that DVC repo
import pickle

import dvc.api

model = pickle.loads(
    dvc.api.read(
        path="model/model.pkl",
        repo="https://github.com/example-dvc-repo.git",
        mode="rb",
        # rev="BRANCH"  # you can also specify a branch or tag
    )
)

You can also check this useful video here!

Local testing

The subdirectory docker_test contains scripts and sample data for testing the built container image on the local machine. If you followed the example git repository mentioned above, you need to change the sample data reference in docker_test/predict.sh from x.csv to sample_data.json.

Testing steps:

  • Build image (inside the project folder)
$ docker build -t IMAGE_NAME:TAG .
  • Instantiate the container configured for training and serving (from inside the docker_test folder)
$ ./train_local.sh IMAGE_NAME:TAG
$ ./serve_local.sh IMAGE_NAME:TAG
  • Run predictions against the locally instantiated server (start a new terminal -> terminal 2); a Python alternative to predict.sh is sketched after this list
$ ./predict.sh
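
If you prefer Python over the curl call in predict.sh, a roughly equivalent sketch (assuming the example server listens on port 8080 and exposes /invocations, as in the referenced repository) is:

import requests

# Post the JSON-lines sample file to the locally running container.
with open("sample_data.json") as f:
    payload = f.read()

response = requests.post(
    "http://localhost:8080/invocations",
    data=payload,
    headers={"Content-Type": "application/json"},
)
print(response.status_code)
print(response.text)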

If the prediction algorithm in your predictor.py succeeds, you will see the result in this terminal (terminal 2). If not, check the error logs in the first terminal where you started the server. If local testing is all good, the docker image is ready to be pushed to ECR for deploying the model on Sagemaker.

Create model on Sagemaker and apply your prediction logic to it

When the docker image is ready, the 2nd step is to create a model on Sagemaker and specify the model name and primary container. For the primary container, you specify the docker image (from the last step) that contains the inference code, as well as the environment that the inference code uses when you deploy the model for predictions.

You can create a Sagemaker model via the AWS CLI create-model command or via the AWS console. We use the console in this case, so first log in with your username and password and search for the Amazon Sagemaker service. You can then find the models as shown in Fig 2:

Fig 2. Sagemaker menu

Click Models, choose Create model, and fill in all the necessary parts, as shown in Fig 3. Then your model should be ready.
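
If you prefer to script this step instead of clicking through the console, a hedged boto3 sketch of the equivalent call could look like the following; all names and ARNs below are placeholders.

import boto3

sagemaker = boto3.client("sagemaker", region_name="eu-central-1")

# All names and ARNs are placeholders -- replace them with your own.
sagemaker.create_model(
    ModelName="my-custom-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/MySagemakerExecutionRole",
    PrimaryContainer={
        # The ECR image built and pushed in the previous step.
        "Image": "123456789012.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-docker-repo:latest",
        # Environment variables the inference code reads at runtime.
        "Environment": {"MODEL_PATH": "/opt/ml/model"},
    },
)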

Configure Sagemaker Batch Transform job and start prediction

Now that your model is available on Sagemaker, the next step is to configure a batch transform job that uses it for prediction.

From the menu in Fig 2, click Batch transform jobs and choose Create batch transform job. The details are shown below (a boto3 equivalent is sketched after the list):

Fig 4. Configure batch transform job
  • Job name: The name of your batch transform job
  • Model name: Model created in 2nd step, see Fig 3
  • Instance type: Choose the instance based on your needs, check the price for different instances here.
  • Max payload size: The maximum size allowed for a mini-batch. We use 5 MB here as an example; it means the job will load as many records as fit into 5 MB and perform the prediction on them.
  • Batch strategy: Because we set a max payload size, we use MultiRecord here.
  • Input/Output data configuration: We use JSON, and our data is formatted line by line, so set the split type to Line.
  • S3 location for input data: It can be a specific data path or a directory. If it is a directory, batch transform is executed on all data files in that directory.
  • S3 output path: This is a directory; after the batch transform, the output data is written there with the same file name plus an .out extension.
  • Assemble with: Defines how to assemble the results of the transform job into a single S3 object.
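
The same configuration can also be expressed in code. A boto3 sketch mirroring the console settings above (job name, bucket paths and instance type are placeholders) could look like:

import boto3

sagemaker = boto3.client("sagemaker", region_name="eu-central-1")

# Placeholders throughout -- adjust names, S3 paths and instance type to your setup.
sagemaker.create_transform_job(
    TransformJobName="my-batch-transform-job",
    ModelName="my-custom-model",           # the model created in the previous step
    MaxPayloadInMB=5,                      # max size of a mini-batch
    BatchStrategy="MultiRecord",           # pack as many records as fit per payload
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/batch-input/",   # a single file or a prefix
            }
        },
        "ContentType": "application/json",
        "SplitType": "Line",               # our JSON data is one record per line
    },
    TransformOutput={
        "S3OutputPath": "s3://my-bucket/batch-output/",
        "AssembleWith": "Line",            # reassemble per-record results line by line
    },
    TransformResources={
        "InstanceType": "ml.m5.large",
        "InstanceCount": 1,
    },
)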

Now your batch transform job is ready to go! If it succeeds, the status of your job will be Completed (see Fig 5), and the output data will be in the corresponding directory on S3. If the job fails, you can check the logs in CloudWatch (see Fig 6).
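
If you want to check the status without the console, a small boto3 sketch (the job name is a placeholder):

import boto3

sagemaker = boto3.client("sagemaker", region_name="eu-central-1")

# Poll the job status; the same information is shown in the console.
job = sagemaker.describe_transform_job(TransformJobName="my-batch-transform-job")
print(job["TransformJobStatus"])           # e.g. "InProgress", "Completed" or "Failed"
if job["TransformJobStatus"] == "Failed":
    print(job.get("FailureReason"))        # a short hint before digging into CloudWatch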

Fig 5. A completed job
Fig 6. A failed job

References

MLOps and data versioning in machine learning project

Amazon Sagemaker Examples

Deploy custom model on Sagemaker

Machine Learning in Production: A Literature Review
