Build
Deploying your model to ComputeX (CX) allows you to leverage our infrastructure, optimized for delivering the fastest serverless inference times. The deployment involves creating two Docker images: `model-download-image` and `model-run-image`.
The `model-download-image` solely contains the model, which will be tensorized and stored on an attached NVMe SSD for exceptional model load times.
The `model-run-image` is a slim image carrying the necessary logic for executing inference.
The download script instructs the system on which HuggingFace model to download. The model is then tensorized and stored on the attached NVMe SSD, enabling rapid model load times.
Follow along with the example to see how to create the model download image.
Modify the script to include the HuggingFace model you aim to download.
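As a minimal sketch, assuming the download script is named `download_model.py` (a hypothetical name) and fetches weights with the `huggingface_hub` library, it might look like the following; the model ID and target directory are placeholders to replace with your own:

```python
# download_model.py -- illustrative sketch; adjust names and paths to your setup.
from huggingface_hub import snapshot_download

# Example HuggingFace model ID; replace with the model you aim to download.
MODEL_ID = "gpt2"

# Download every file in the model repo to a local directory.
# The destination path is a placeholder, not a CX-mandated location.
snapshot_download(repo_id=MODEL_ID, local_dir="/models/gpt2")
```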
Build and upload the image to your private CX Docker registry hosted at registry.computex.ai.
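For example, the build and push might look like this, where the registry namespace and tag are illustrative placeholders:

```bash
# Build the download image from its Dockerfile (path and tag are illustrative).
docker build -t registry.computex.ai/<your-org>/model-download-image:latest .

# Authenticate against the private CX registry, then push the image.
docker login registry.computex.ai
docker push registry.computex.ai/<your-org>/model-download-image:latest
```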
Deploy your model to CX using the CLI command:
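A hypothetical invocation, assuming the CX CLI is invoked as `cx` with a deploy subcommand (the command and flag names here are assumptions, not confirmed CLI options):

```bash
# Illustrative only: verify the exact command and flags against the CX CLI reference.
cx deploy --name model-download --image registry.computex.ai/<your-org>/model-download-image:latest
```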
Inside the container, the directory structure will be as follows:
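An illustrative layout, assuming the scripts sit at the image root and the tensorized model lives on the mounted NVMe volume (the exact paths depend on your configuration):

```
/app
├── load_model.py    # loads the tensorized model from the volume
├── predict.py       # inference entry point
└── /models          # NVMe-backed persistent volume holding the model
```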
Each app gets its own Persistent Volume, ensuring exceptionally fast bootup times.
The run script is tailored to load the model and execute inference. Depending on the model, you might need to adjust it to use specific HuggingFace libraries.
Update the code in `load_model.py` and `predict.py` based on your model's needs. Refer to your model's card on HuggingFace to determine which HuggingFace libraries and classes you need to call.
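As a rough illustration, assuming a causal language model served via the `transformers` library (the file layout, paths, and function names below are assumptions, not the exact CX interface), the two scripts might look like:

```python
# load_model.py -- illustrative sketch; adapt the classes to your model's card.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path where the download image stored the model; a placeholder, not a CX-mandated location.
MODEL_DIR = "/models/gpt2"

def load_model():
    # Load the tokenizer and model weights from local disk (no network call).
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
    return model, tokenizer
```

```python
# predict.py -- illustrative sketch of the inference entry point.
from load_model import load_model

model, tokenizer = load_model()

def predict(prompt: str) -> str:
    # Tokenize the prompt, generate a completion, and decode it back to text.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```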
Once done, package the image using Docker:
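For instance (namespace and tag are placeholders):

```bash
# Build and push the slim run image (names are illustrative).
docker build -t registry.computex.ai/<your-org>/model-run-image:latest .
docker push registry.computex.ai/<your-org>/model-run-image:latest
```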
Deploy the model to CX using the CLI command and specify the --image flag. You should also define the GPU, number of CPU cores, and memory capacity:
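A hypothetical invocation, assuming the CLI exposes flags along these lines for GPU type, CPU cores, and memory (flag names other than `--image` are assumptions based on the text, not confirmed CLI options):

```bash
# Illustrative only: verify the actual flag names against the CX CLI reference.
cx deploy \
  --name model-run \
  --image registry.computex.ai/<your-org>/model-run-image:latest \
  --gpu A100 \
  --num-cpu-cores 4 \
  --memory 16Gi
```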
Follow along with the example to see how to deploy a model to CX.