Models Build SDK

Build and deploy ML models effortlessly from your notebook or any Python environment

Overview

Data scientists often train models in Workspaces, Jupyter notebooks or locally, and require a seamless process to save, register and manage model versions for production use.

JFrog ML provides the Build SDK to address this need. It simplifies the transition from model training to deployment, whether you work on your local machine or in a Jupyter notebook.


Key Features

1. Build models from Workspaces

The JFrog ML Build SDK simplifies registering models trained locally or in Jupyter notebooks. It provides precise versioning and tracking, and eases the transition from research to production deployment.

2. Python-driven model builds

Automate model builds using Python and seamlessly integrate with continuous integration/continuous deployment (CI/CD) pipelines and various automation workflows.

3. Streamlined versioning for pre-built models

Build and register pre-trained models from any Python environment by supplying existing trained model instances, skipping remote build phases.


Using Build SDK

This document will explore the various options available when using the Build SDK.

In general, there are two choices when working with the Build SDK:

  1. Providing a pre-trained model artifact to the build_model method.
  2. Omitting a pre-built model, in which case the SDK uploads the local model files and builds the model on JFrog ML.

from qwak import QwakClient
from qwak.model.tools import run_local

# Creating an instance of the Qwak client
client = QwakClient()

# Triggering a build with model files from the local `main` directory
# This option does not provide a pre-built model, and the model is built on Qwak
client.build_model(
  model_id='my_example_model',
)

# Triggering a build with model files from the local `main` directory
# This option provides a pre-built model, and the build method will not be called remotely.
# MyQwakModel is a placeholder for your own QwakModel subclass
model = MyQwakModel()
model.build()

client.build_model(
  model_id='my_example_model',
  prebuilt_qwak_model=model
)
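
The two modes differ only in whether a trained instance is passed to `build_model`. The sketch below wraps that decision in a small helper. Note that `trigger_build` and `FakeClient` are hypothetical names introduced here for illustration and are not part of the SDK; the stand-in client only records its arguments so the snippet runs without a Qwak account. With a real `QwakClient`, the same helper would forward the arguments to `client.build_model`.

```python
from typing import Any, Optional


def trigger_build(client: Any, model_id: str, trained_model: Optional[Any] = None) -> dict:
    """Call client.build_model, passing prebuilt_qwak_model only when a model is given."""
    kwargs = {"model_id": model_id}
    if trained_model is not None:
        # Option 1: a trained instance is supplied, so the remote build step is skipped
        kwargs["prebuilt_qwak_model"] = trained_model
    # Option 2 (no trained instance): local model files are uploaded and built remotely
    client.build_model(**kwargs)
    return kwargs


class FakeClient:
    """Hypothetical stand-in for QwakClient that records calls instead of building."""

    def __init__(self) -> None:
        self.calls = []

    def build_model(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


client = FakeClient()
trigger_build(client, "my_example_model")                          # remote build
trigger_build(client, "my_example_model", trained_model=object())  # pre-built model
print([sorted(call) for call in client.calls])
```

Keeping the branch in one place makes it harder to pass a stale or untrained instance by accident when the same script is used for both modes.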

Folder Structure

❗️

Note: File and folder structure is important when using the Build SDK, as the files are uploaded to Qwak in that structure.

The Build SDK uploads local model files together with the trained model object. By default, the Build SDK uploads the main folder under the current file location.

Make sure to place your model files in the main directory.

your-model-directory
-- build_sdk_runner.py
-- main
---- model.py
You can change the uploaded directory by passing an explicit path through the main_module_path parameter, described under Build SDK configuration below.
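
Because the upload depends on this layout, it can help to check the folder before triggering a build. The following is a minimal sketch using only the standard library; `missing_model_files` is a hypothetical helper (not an SDK function), and the assumption that `model.py` must exist inside the main directory is taken from the layout shown above rather than from any SDK requirement.

```python
from pathlib import Path
from typing import List


def missing_model_files(project_dir: str, main_dir: str = "main") -> List[str]:
    """Return the expected paths (relative to project_dir) that do not exist."""
    root = Path(project_dir)
    # Assumed layout: <project_dir>/<main_dir>/model.py, as in the tree above
    expected = [Path(main_dir), Path(main_dir) / "model.py"]
    return [p.as_posix() for p in expected if not (root / p).exists()]


problems = missing_model_files(".")
if problems:
    print("Missing before build:", ", ".join(problems))
else:
    print("Folder layout looks ready for upload")
```

Run this from the directory that contains your build script, so that the relative `main` path resolves the same way it would during the upload.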


Building pre-trained models

Use the Build SDK to register a build from an existing trained model instance, uploading the pre-trained model artifact instead of training remotely. This lets data scientists train or fine-tune models in notebooks, add the trained versions to the model registry, and deploy them to production environments.

Creating a model instance

In this example, we'll use the Titanic model, which can be found in the Qwak Examples repository.

Our folder structure will look as follows:

titanic
-- run_build.py
-- main
---- __init__.py
---- model.py
---- requirements.txt

🚧

Note: Make sure to include the import from qwak.model.tools import run_local when using the Build SDK. The build command cannot complete without it.

main/requirements.txt:

pandas
scikit-learn
catboost

main/__init__.py:

from .model import TitanicSurvivalPrediction


def load_model():
    return TitanicSurvivalPrediction()

main/model.py:

import numpy as np
import pandas as pd
import qwak
# Important: import run_local when using the Build SDK
from qwak.model.tools import run_local
from catboost import CatBoostClassifier, Pool, cv
from catboost.datasets import titanic
from qwak.model.base import QwakModel
from sklearn.model_selection import train_test_split


class TitanicSurvivalPrediction(QwakModel):
    def __init__(self):
        self.model = CatBoostClassifier(
            iterations=1000,
            custom_loss=["Accuracy"],
            loss_function="Logloss",
            learning_rate=None,
        )

    def build(self):
        titanic_train, _ = titanic()
        titanic_train.fillna(-999, inplace=True)

        x = titanic_train.drop(["Survived", "PassengerId"], axis=1)
        y = titanic_train.Survived

        x_train, x_test, y_train, y_test = train_test_split(
            x, y, train_size=0.85, random_state=42
        )

        # mark categorical features
        cate_features_index = np.where(x_train.dtypes != float)[0]

        self.model.fit(
            x_train,
            y_train,
            cat_features=cate_features_index,
            eval_set=(x_test, y_test),
        )

        # Cross validating the model (5-fold)
        cv_data = cv(
            Pool(x, y, cat_features=cate_features_index),
            self.model.get_params(),
            fold_count=5,
        )

    @qwak.api()
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.drop(["PassengerId"], axis=1)
        return pd.DataFrame(
            self.model.predict_proba(df[self.model.feature_names_])[:, 1],
            columns=['Survived_Probability']
        )

Training the model

Let's create a new model instance and run the build method to train it.

from titanic.main import TitanicSurvivalPrediction

# Create a new model instance
qwak_model_instance = TitanicSurvivalPrediction()

# Run the build function which trains the model
qwak_model_instance.build()

Output:

Learning rate set to 0.029583
0:	learn: 0.6756870	test: 0.6751626	best: 0.6751626 (0)	total: 66.5ms	remaining: 1m 6s
1:	learn: 0.6578988	test: 0.6579213	best: 0.6579213 (1)	total: 69.8ms	remaining: 34.8s
2:	learn: 0.6427410	test: 0.6427901	best: 0.6427901 (2)	total: 72.5ms	remaining: 24.1s

Registering the trained model

Now that we have trained a model locally, we want to register this model version in the Qwak model registry as a new build, so we can later deploy it to production.

The code below registers a new build under the titanic_survival_prediction model, using the trained Titanic model we just created and the tag prebuilt.

from qwak import QwakClient
from qwak.model.tools import run_local

# Creating an instance of the Qwak client
client = QwakClient()

# Triggering a build with model files from the local `main` directory
client.build_model(
  model_id='titanic_survival_prediction',
  prebuilt_qwak_model=qwak_model_instance,  # Providing a trained instance to skip the remote build
  tags=['prebuilt']
)

Fetching model code - Using given build ID - 248f3261-da2d-4fdc-8b7a-53f93f5c9909
Fetching model code - Found dependency type: PIP by file: main/requirements.txt
Fetching model code - Successfully fetched model code
Registering qwak build -  10%
Registering qwak build -  20%
Registering qwak build -  30%
Registering qwak build -  40%
Registering qwak build -  50%
Registering qwak build -  60%
Registering qwak build -  70%
Registering qwak build -  80%
Registering qwak build -  90%
Registering qwak build - 100%
Registering qwak build - Start remote build - 248f3261-da2d-4fdc-8b7a-53f93f5c9909
Registering qwak build - Remote build started successfully

Build ID 248f3261-da2d-4fdc-8b7a-53f93f5c9909 was triggered remotely

To follow build logs using Qwak platform:
https://app.qwak.ai/projects/176fc6e5-0725-42af-a574-415066c01a12/titanic_survival_prediction/build/248f3261-da2d-4fdc-8b7a-53f93f5c9909
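
In automation scripts it is often useful to capture the build ID for later steps. The sketch below extracts it from captured output with a regular expression. It assumes the log line keeps the exact wording shown in the sample output above, and it makes no assumption about what `build_model` itself returns; `extract_build_id` is a hypothetical helper, not part of the SDK.

```python
import re
from typing import Optional

# UUID-shaped build IDs, as they appear in the sample output above
BUILD_ID_PATTERN = re.compile(
    r"Build ID ([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) was triggered"
)


def extract_build_id(log_text: str) -> Optional[str]:
    """Return the first build ID found in the log text, or None if there is none."""
    match = BUILD_ID_PATTERN.search(log_text)
    return match.group(1) if match else None


sample = "Build ID 248f3261-da2d-4fdc-8b7a-53f93f5c9909 was triggered remotely"
print(extract_build_id(sample))
```

If the log format changes, the helper returns None instead of raising, so a calling script can decide how to handle the missing ID.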

Build SDK configuration

The Build SDK supports the following parameters of the build_model method:

| Parameter | Required | Default Value | Description |
| --- | --- | --- | --- |
| model_id | Yes | | Model ID on the Qwak platform |
| main_module_path | No | "main" | Path to the local folder where the model files exist |
| dependencies_file | No | | Path to a Python dependencies file, in pip, poetry or conda format |
| dependencies_list | No | | List of strict Python dependencies |
| tags | No | | List of tags saved on the remote model build |
| instance | No | "small" | Instance type used during the model build |
| gpu_compatible | No | | Build the model using a GPU-compatible image |
| run_tests | No | True | Run tests during the model build |
| validate_build_artifact | No | True | Validate model deployment during the build phase |
| validate_build_artifact_timeout | No | | Model validation timeout |
| prebuilt_qwak_model | No | | Providing a prebuilt QwakModel instance skips the build phase and uses a pre-existing trained model version |

The example below uses several of these options: it builds the model on a medium instance, adds build tags, and builds a GPU-compatible image.

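from qwak import QwakClient
from qwak.model.tools import run_local
# Same model import as in the training step above
from titanic.main import TitanicSurvivalPrediction

qwak_model = TitanicSurvivalPrediction()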
qwak_model.build()

# Creating an instance of the Qwak client
client = QwakClient()

client.build_model(
  model_id='titanic_survival_prediction',
  main_module_path='main',
  dependencies_file="requirements.txt",
  prebuilt_qwak_model=qwak_model,
  tags=['prebuilt', 'local'],
  instance="medium",
  gpu_compatible=True 
)
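
When the same options are reused across several builds, one possible approach is to keep them in a small container and expand it into keyword arguments. The sketch below is an illustration only: `BuildOptions` is a hypothetical class, not part of the SDK, and it mirrors just a few parameters and defaults from the configuration table above. Unset optional values are dropped so that the SDK's own defaults apply.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class BuildOptions:
    """Hypothetical container mirroring a few parameters from the table above."""

    model_id: str
    main_module_path: str = "main"
    instance: str = "small"
    run_tests: bool = True
    validate_build_artifact: bool = True
    tags: Optional[List[str]] = None
    dependencies_file: Optional[str] = None

    def to_kwargs(self) -> Dict[str, Any]:
        # Drop unset optional values so the SDK falls back to its own defaults
        return {k: v for k, v in asdict(self).items() if v is not None}


options = BuildOptions(
    model_id="titanic_survival_prediction",
    tags=["prebuilt"],
    instance="medium",
)
kwargs = options.to_kwargs()
print(kwargs["instance"], kwargs["main_module_path"], "dependencies_file" in kwargs)
# With a real client: client.build_model(prebuilt_qwak_model=qwak_model, **kwargs)
```

The trained model instance is deliberately kept out of the container, since it is a runtime object rather than a configuration value.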

Unsupported parameters in Build SDK

The Build SDK supports most of the parameters available in the Qwak CLI under qwak models build.

The following parameters are not supported:

| Parameter | Description |
| --- | --- |
| environment | Qwak environment |
| purchase-option | Receiving only the build id and any exception as return values (Depends on --programmatic in order to avoid UI output) |
| deployment-instance | The instance size to automatically deploy the build after completion |
| deploy | Automatically deploy build after completion |
| json-logs | Return the live build logs as JSON |
| param-list | Provide a list of parameters to the build |
| main-dir | Change the name of the main model directory |
| env-vars | Provide a list of environment variables |
| base-image | Change the base image of the model build |
| --cache / --no-cache | Use or disable docker cache |
| git-credentials | Provide git credentials token |
| git-credentials-secret | The git credentials secret |
| git-branch | Use a different git branch |
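
When porting an existing CLI invocation to the Build SDK, a quick check against the list above can flag options that have no SDK equivalent. This is a minimal sketch: `unsupported_options` is a hypothetical helper, the option names are copied from the list above, and the flags in the usage line (`--model-id`, `--tags`) are illustrative inputs rather than a statement about the CLI's exact syntax.

```python
from typing import Iterable, List

# Option names copied from the list of unsupported parameters above
UNSUPPORTED_IN_SDK = frozenset({
    "environment", "purchase-option", "deployment-instance", "deploy",
    "json-logs", "param-list", "main-dir", "env-vars", "base-image",
    "cache", "no-cache", "git-credentials", "git-credentials-secret",
    "git-branch",
})


def unsupported_options(cli_options: Iterable[str]) -> List[str]:
    """Return the options (without leading dashes) that the Build SDK does not accept."""
    normalized = (opt.lstrip("-") for opt in cli_options)
    return sorted(opt for opt in normalized if opt in UNSUPPORTED_IN_SDK)


print(unsupported_options(["--model-id", "--deploy", "--git-branch", "--tags"]))
```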