TensorFlow Serving

In chapter 8, we used Keras to predict the classes of images. In chapter 9, we converted the model to TF Lite and used it to make predictions from AWS Lambda. In this chapter, we’ll serve the same model using TensorFlow Serving.

TensorFlow Serving, or simply TF Serving, is a system for serving TensorFlow models, designed primarily for server deployments. Compared with AWS Lambda, deploying models with TF Serving and Kubernetes is a better option when we need to deal with a large amount of data.

The serving architecture

Putting the model into TF Serving is not enough, because TF Serving is dedicated solely to running the model. The preprocessing of the data therefore has to be implemented in a separate service.

A system for serving a deep learning model therefore needs two components, a service that preprocesses the data and TF Serving, which runs the model:

image.png

We could take the code from the Lambda function of the previous chapter, put it in Flask, and use it to serve the model. But that wouldn’t be the most efficient approach.

When working with a large amount of data, it’s important to properly utilize resources. This is why we need to split one service into two distinct services.
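
As a rough illustration of this split, here is a minimal sketch of what the preprocessing service (the gateway) might do before calling TF Serving. It assumes Xception-style scaling of pixel values to the range [-1, 1] and TF Serving's REST predict endpoint on its default port 8501. The model name `xception_model`, the host, and both function names are hypothetical, and the actual gateway may use a different protocol such as gRPC.

```python
import json


def preprocess(pixels):
    """Scale 0-255 pixel values to [-1, 1] (assumed Xception-style scaling).

    `pixels` is a nested list shaped (height, width, channels).
    """
    return [
        [[channel / 127.5 - 1.0 for channel in pixel] for pixel in row]
        for row in pixels
    ]


def build_predict_request(image, model_name="xception_model", host="localhost:8501"):
    """Build the URL and JSON body for a TF Serving REST predict call.

    Nothing is sent here; the gateway would POST `body` to `url`.
    The model name and host are placeholders, not values from this chapter.
    """
    url = f"http://{host}/v1/models/{model_name}:predict"
    # TF Serving's REST API accepts a batch of inputs under the "instances" key.
    body = json.dumps({"instances": [preprocess(image)]})
    return url, body


# Example with a tiny 1x1 RGB "image".
url, body = build_predict_request([[[0, 255, 255]]])
```

The gateway only does light work (scaling and serialization), while TF Serving handles the heavy model computation, so each service can be given the resources it needs and scaled on its own.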

The saved_model format

In chapter 8, we trained a Keras model and saved it in the Keras format. You can download the model with the following command:

!wget https://github.com/aletbm/MySolutions_MLZoomcamp2024_DataTalks.Club/raw/refs/heads/main/08_Neural_Networks_and_Deep_Learning/xception_model.keras

<aside> ⚠️

We used TensorFlow version 2.14 to save the model. I recommend that you use the same version to load it.

!pip install tensorflow==2.14

</aside>

TF Serving needs the model in a special format called saved_model.
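
For orientation, a saved_model export is a directory rather than a single file. The sketch below checks for the two entries such an export normally contains, the serialized graph `saved_model.pb` and a `variables` subdirectory. The function name and the example path are hypothetical, and this is only a quick sanity check, not a full validation of the export.

```python
from pathlib import Path


def looks_like_saved_model(export_dir):
    """Return True if export_dir contains the usual saved_model entries."""
    export_dir = Path(export_dir)
    has_graph = (export_dir / "saved_model.pb").is_file()   # serialized graph
    has_variables = (export_dir / "variables").is_dir()     # trained weights
    return has_graph and has_variables


# Hypothetical usage after exporting the model to a folder named "xception_model":
# looks_like_saved_model("xception_model")
```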

Resuming the saved model: