Build
Deploying your model to ComputeX (CX) allows you to leverage our infrastructure, optimized for delivering the fastest serverless inference times. The deployment involves creating two Docker images: `model-download-image` and `model-run-image`.
The `model-download-image` solely contains the model, which will be tensorized and stored on an attached NVMe SSD for exceptional model load times.
The `model-run-image` is a slim image carrying the necessary logic for executing inference.
The download script instructs the system on which HuggingFace model to download. The model is then tensorized and stored on the attached NVMe SSD, enabling rapid model load times.
Follow along with the example to see how to create the model download image.
Modify the script to include the HuggingFace model you aim to download.
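As a minimal sketch, assuming the download script is named `download_model.py` (a hypothetical name) and fetches weights with the `huggingface_hub` library, it might look like the following; the model ID and target directory are placeholders to replace with your own:

```python
# download_model.py -- illustrative sketch; adjust names and paths to your setup.
from huggingface_hub import snapshot_download

# Example HuggingFace model ID; replace with the model you aim to download.
MODEL_ID = "gpt2"

# Download every file in the model repo to a local directory.
# The destination path is a placeholder, not a CX-mandated location.
snapshot_download(repo_id=MODEL_ID, local_dir="/models/gpt2")
```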
Build and upload the image to your private CX Docker registry hosted at registry.computex.ai.
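For example, the build and push might look like this, where the registry namespace and tag are illustrative placeholders:

```bash
# Build the download image from its Dockerfile (path and tag are illustrative).
docker build -t registry.computex.ai/<your-org>/model-download-image:latest .

# Authenticate against the private CX registry, then push the image.
docker login registry.computex.ai
docker push registry.computex.ai/<your-org>/model-download-image:latest
```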
Deploy your model to CX using the CLI command:
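A hypothetical invocation, assuming the CX CLI is invoked as `cx` with a deploy subcommand (the command and flag names here are assumptions, not confirmed CLI options):

```bash
# Illustrative only: verify the exact command and flags against the CX CLI reference.
cx deploy --name model-download --image registry.computex.ai/<your-org>/model-download-image:latest
```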
Inside the container, the directory structure will be as follows:
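An illustrative layout, assuming the scripts sit at the image root and the tensorized model lives on the mounted NVMe volume (the exact paths depend on your configuration):

```
/app
├── load_model.py    # loads the tensorized model from the volume
├── predict.py       # inference entry point
└── /models          # NVMe-backed persistent volume holding the model
```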
Each app gets its own Persistent Volume, ensuring exceptionally fast bootup times.
The run script is tailored to load the model and execute inference. Depending on the model, you might need to adjust it to use specific HuggingFace libraries.
Update the code in `load_model.py` and `predict.py` based on your model's needs. Refer to your model's card on HuggingFace to determine which HuggingFace libraries and classes you need to call.
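As a rough illustration, assuming a causal language model served via the `transformers` library (the file layout, paths, and function names below are assumptions, not the exact CX interface), the two scripts might look like:

```python
# load_model.py -- illustrative sketch; adapt the classes to your model's card.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path where the download image stored the model; a placeholder, not a CX-mandated location.
MODEL_DIR = "/models/gpt2"

def load_model():
    # Load the tokenizer and model weights from local disk (no network call).
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
    return model, tokenizer
```

```python
# predict.py -- illustrative sketch of the inference entry point.
from load_model import load_model

model, tokenizer = load_model()

def predict(prompt: str) -> str:
    # Tokenize the prompt, generate a completion, and decode it back to text.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```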
Once done, package the image using Docker:
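For instance (namespace and tag are placeholders):

```bash
# Build and push the slim run image (names are illustrative).
docker build -t registry.computex.ai/<your-org>/model-run-image:latest .
docker push registry.computex.ai/<your-org>/model-run-image:latest
```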
Deploy the model to CX using the CLI command and specify the --image flag. You should also define the GPU, number of CPU cores, and memory capacity:
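A hypothetical invocation, assuming the CLI exposes flags along these lines for GPU type, CPU cores, and memory (flag names other than `--image` are assumptions based on the text, not confirmed CLI options):

```bash
# Illustrative only: verify the actual flag names against the CX CLI reference.
cx deploy \
  --name model-run \
  --image registry.computex.ai/<your-org>/model-run-image:latest \
  --gpu A100 \
  --num-cpu-cores 4 \
  --memory 16Gi
```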
Follow along with the example to see how to deploy a model to CX.