Models Build SDK

Overview

Data scientists often train models in Workspaces, Jupyter notebooks or locally, and require a seamless process to save, register and manage model versions for production use.

JFrog ML provides the Build SDK to address this need. It simplifies the transition from model training to deployment, whether you work on your local machine or in a Jupyter notebook.

Key Features

1. Build models from Workspaces

The JFrog ML Build SDK simplifies registering models trained locally or in Jupyter notebooks. It provides precise versioning and tracking, and eases the transition from research to production deployment.

2. Python-driven model builds

Automate model builds using Python and seamlessly integrate with continuous integration/continuous deployment (CI/CD) pipelines and various automation workflows.

3. Streamlined versioning for pre-built models

Build and register pre-trained models from any Python environment by supplying existing trained model instances, skipping remote build phases.

Using Build SDK

This document will explore the various options available when using the Build SDK.

In general, there are two choices when working with the Build SDK:

Providing a pre-trained model instance to the build_model method.
Omitting the pre-built model, in which case the SDK uploads the local model files and builds the model on JFrog ML.

from qwak import QwakClient
from qwak.model.tools import run_local

# Creating an instance of the Qwak client
client = QwakClient()

# Triggering a build with model files from the local `main` directory
# This option does not provide a pre-built model, so the model is built on Qwak
client.build_model(
  model_id='my_example_model',
)

# Triggering a build with model files from the local `main` directory
# This option provides a pre-built model, and the build method will not be called remotely.
# MyQwakModel is a placeholder for your own QwakModel subclass
model = MyQwakModel()
model.build()

client.build_model(
  model_id='my_example_model',
  prebuilt_qwak_model=model
)

Folder Structure

❗️
File Structure Requirements
Note:
When using the Build SDK, your file and folder structure is preserved when uploading to JFrog ML.
Please follow these guidelines:

Avoid Python files with top-level executable statements in the build directory.

Guard all code with side effects with if __name__ == "__main__" blocks.

Place shared functionality in properly encapsulated classes and functions.

Failure to follow these guidelines may cause unintended code execution during the import process.
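As a minimal sketch of these guidelines, the hypothetical module below keeps its logic inside functions and runs its side effects only when the file is executed directly. The function names are illustrative and are not part of the SDK.

```python
"""Hypothetical helper module placed next to the model code (illustrative only)."""


def train_and_report(values):
    """Encapsulated logic: nothing runs until this function is called."""
    average = sum(values) / len(values)
    return {"count": len(values), "average": average}


def main():
    # Side effects (printing, training, network calls) live here,
    # not at the top level of the module
    print(train_and_report([1.0, 2.0, 3.0]))


if __name__ == "__main__":
    # Runs only when the file is executed directly, never on import
    main()
```

Importing this module defines the two functions and nothing else, so it is safe to include in the uploaded directory.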

The Build SDK uploads local model files together with the trained model object. By default, the Build SDK uploads the main folder under the current file location.

Make sure to place your model files in the main directory.

-> your-model-directory
---- build_sdk_runner.py
---> main
------ model.py

You can change the uploaded directory by providing an explicit path through the main_module_path parameter, described under Build SDK configuration.
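Before triggering a build, it can help to confirm that the folder to be uploaded looks as expected. The sketch below is a hypothetical pre-flight check, not part of the qwak package. It assumes the layout shown above, with a model.py file inside the uploaded directory.

```python
from pathlib import Path


def check_build_layout(project_dir, main_module_path="main"):
    """Return a list of problems found in the folder that would be uploaded.

    Hypothetical helper: the model.py check mirrors the example layout
    and is an assumption, not a documented SDK requirement.
    """
    problems = []
    main_dir = Path(project_dir) / main_module_path
    if not main_dir.is_dir():
        problems.append(f"missing directory: {main_dir}")
        return problems
    if not (main_dir / "model.py").is_file():
        problems.append(f"missing file: {main_dir / 'model.py'}")
    return problems


if __name__ == "__main__":
    # Check the current working directory, e.g. your-model-directory
    for problem in check_build_layout(Path.cwd()):
        print(problem)
```

An empty list means the directory matches the expected layout. Pass a different main_module_path value if your model files live elsewhere.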

Building pre-trained models

Use the Build SDK to register a build from an existing, already trained model instance, which is uploaded as the pre-trained model artifact. This lets data scientists train or fine-tune models within notebooks, add the trained versions to the model registry, and deploy them to production environments.

Creating a model instance

In this example, we'll use the Titanic model, which can be found on the Qwak Examples repository.

Our folder structure will look as follows:

titanic
-- run_build.py
-- main
---- __init__.py
---- model.py
---- requirements.txt

🚧
Note: Make sure to include the import from qwak.model.tools import run_local when using the Build SDK. The build command cannot complete without it.

# main/requirements.txt
pandas
scikit-learn
catboost

# main/__init__.py
from .model import TitanicSurvivalPrediction


def load_model():
    return TitanicSurvivalPrediction()

# main/model.py
import numpy as np
import pandas as pd
import qwak
# Important: run_local must be imported when using the Build SDK
from qwak.model.tools import run_local
from catboost import CatBoostClassifier, Pool, cv
from catboost.datasets import titanic
from qwak.model.base import QwakModel
from sklearn.model_selection import train_test_split


class TitanicSurvivalPrediction(QwakModel):
    def __init__(self):
        self.model = CatBoostClassifier(
            iterations=1000,
            custom_loss=["Accuracy"],
            loss_function="Logloss",
            learning_rate=None,
        )

    def build(self):
        titanic_train, _ = titanic()
        titanic_train.fillna(-999, inplace=True)

        x = titanic_train.drop(["Survived", "PassengerId"], axis=1)
        y = titanic_train.Survived

        x_train, x_test, y_train, y_test = train_test_split(
            x, y, train_size=0.85, random_state=42
        )

        # mark categorical features
        cate_features_index = np.where(x_train.dtypes != float)[0]

        self.model.fit(
            x_train,
            y_train,
            cat_features=cate_features_index,
            eval_set=(x_test, y_test),
        )

        # Cross validating the model (5-fold)
        cv_data = cv(
            Pool(x, y, cat_features=cate_features_index),
            self.model.get_params(),
            fold_count=5,
        )

    @qwak.api()
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.drop(["PassengerId"], axis=1)
        return pd.DataFrame(
            self.model.predict_proba(df[self.model.feature_names_])[:, 1],
            columns=['Survived_Probability']
        )

Training the model

Let's create a new model instance and run the build method to train it.

from titanic.main import TitanicSurvivalPrediction

# Create a new model instance
qwak_model_instance = TitanicSurvivalPrediction()

# Run the build function which trains the model
qwak_model_instance.build()

Learning rate set to 0.029583
0:	learn: 0.6756870	test: 0.6751626	best: 0.6751626 (0)	total: 66.5ms	remaining: 1m 6s
1:	learn: 0.6578988	test: 0.6579213	best: 0.6579213 (1)	total: 69.8ms	remaining: 34.8s
2:	learn: 0.6427410	test: 0.6427901	best: 0.6427901 (2)	total: 72.5ms	remaining: 24.1s

Registering the trained model

Now that we have trained a model locally, we want to register this model version in the Qwak model registry as a new build, so we can later deploy it to production.

The code below registers a new build under the titanic_survival_prediction model, using the trained Titanic model we just created and the tag prebuilt.

from qwak import QwakClient
from qwak.model.tools import run_local

# Creating an instance of the Qwak client
client = QwakClient()

# Triggering a build with model files from the local `main` directory
client.build_model(
  model_id='titanic_survival_prediction',
  prebuilt_qwak_model=qwak_model_instance,  ## Providing a trained instance to skip remote build
  tags=['prebuilt']
)


Fetching model code - Using given build ID - 248f3261-da2d-4fdc-8b7a-53f93f5c9909
Fetching model code - Found dependency type: PIP by file: main/requirements.txt
Fetching model code - Successfully fetched model code
Registering qwak build -  10%
Registering qwak build -  20%
Registering qwak build -  30%
Registering qwak build -  40%
Registering qwak build -  50%
Registering qwak build -  60%
Registering qwak build -  70%
Registering qwak build -  80%
Registering qwak build -  90%
Registering qwak build - 100%
Registering qwak build - Start remote build - 248f3261-da2d-4fdc-8b7a-53f93f5c9909
Registering qwak build - Remote build started successfully

Build ID 248f3261-da2d-4fdc-8b7a-53f93f5c9909 was triggered remotely

To follow build logs using Qwak platform:
https://app.qwak.ai/projects/176fc6e5-0725-42af-a574-415066c01a12/titanic_survival_prediction/build/248f3261-da2d-4fdc-8b7a-53f93f5c9909
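In automation scripts it can be useful to capture the build ID from this output. The sketch below is a hypothetical helper that relies only on the confirmation line shown above. The log format is an assumption and may differ between SDK versions.

```python
import re

# Matches the confirmation line printed in the output above
BUILD_ID_PATTERN = re.compile(r"Build ID ([0-9a-f-]{36}) was triggered remotely")


def extract_build_id(output):
    """Return the build ID found in captured output, or None if absent."""
    match = BUILD_ID_PATTERN.search(output)
    return match.group(1) if match else None
```

The helper returns None when the line is missing, so a calling script can fail early instead of continuing without a build ID.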

Build SDK configuration

The Build SDK supports the following configurable parameters:

Parameter	Required	Default Value	Description
`model_id`	Yes		Model ID on the Qwak platform
`main_module_path`	No	"main"	Path to the local folder where the model files exist
`dependencies_file`	No		Path to a Python dependencies file, in pip, poetry or conda format
`dependencies_list`	No		List of strict Python dependencies
`tags`	No		List of tags saved on the remote model build
`instance`	No	"small"	Instance type used during the model build
`gpu_compatible`	No		Build the model using a GPU-compatible image
`run_tests`	No	True	Run tests during the model build
`validate_build_artifact`	No	True	Validate model deployment during the build phase
`validate_build_artifact_timeout`	No		Model validation timeout
`prebuilt_qwak_model`	No		Providing a prebuilt QwakModel instance skips the build phase and uses the pre-existing trained model version

The following example uses several of the advanced features of the Build SDK.

The code snippet uses a medium instance to build the model, provides build tags and builds a GPU-compatible image.

from qwak import QwakClient
from qwak.model.tools import run_local
from titanic.main import TitanicSurvivalPrediction

qwak_model = TitanicSurvivalPrediction()
qwak_model.build()

# Creating an instance of the Qwak client
client = QwakClient()

client.build_model(
  model_id='titanic_survival_prediction',
  main_module_path='main',
  dependencies_file="requirements.txt",
  prebuilt_qwak_model=qwak_model,
  tags=['prebuilt', 'local'],
  instance="medium",
  gpu_compatible=True 
)
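When builds are triggered from a CI/CD pipeline, the parameters usually come from the pipeline configuration rather than from hard-coded values. The sketch below shows one possible way to assemble the keyword arguments from environment variables. The variable names MODEL_ID, BUILD_TAGS and BUILD_INSTANCE and both helper functions are hypothetical; only the build_model parameter names come from the table above.

```python
import os


def build_kwargs_from_env(env=None):
    """Translate CI environment variables into keyword arguments for build_model.

    The variable names (MODEL_ID, BUILD_TAGS, BUILD_INSTANCE) are hypothetical.
    """
    env = os.environ if env is None else env
    kwargs = {"model_id": env["MODEL_ID"]}  # model_id is the only required parameter
    tags = [tag.strip() for tag in env.get("BUILD_TAGS", "").split(",") if tag.strip()]
    if tags:
        kwargs["tags"] = tags
    if env.get("BUILD_INSTANCE"):
        kwargs["instance"] = env["BUILD_INSTANCE"]
    return kwargs


def trigger_build(env=None):
    # Imported lazily so the helper above can be used without the SDK installed
    from qwak import QwakClient
    from qwak.model.tools import run_local  # noqa: F401  (import required by the Build SDK)

    QwakClient().build_model(**build_kwargs_from_env(env))
```

Parameters that are not set in the environment are left out, so the SDK defaults listed in the table still apply.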

Unsupported parameters in Build SDK

The Build SDK supports most of the parameters that the Qwak CLI supports under qwak models build.

The following parameters are not supported:


Parameter	Description
`environment`	Qwak environment
`purchase-option`	Receiving only the build id and any exception as return values (Depends on --programmatic in order to avoid UI output)
`deployment-instance`	The instance size to automatically deploy the build after completion
`deploy`	Automatically deploy build after completion
`json-logs`	Return the live build logs as JSON
`param-list`	Provide a list of parameters to the build
`main-dir`	Change the name of the `main` model directory
`env-vars`	Provide a list of environment variables
`base-image`	Change the base image of the model build
`--cache / --no-cache`	Use or disable docker cache
`git-credentials`	Provide git credentials token
`git-credentials-secret`	The git credentials secret
`git-branch`	Use a different git branch
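When migrating a CLI-based build script to the Build SDK, a quick check against this list can catch options that will not carry over. The sketch below is a hypothetical helper and not part of the qwak package; the option names are copied from the list above.

```python
# CLI options that the Build SDK does not accept, copied from the list above
UNSUPPORTED_CLI_OPTIONS = {
    "environment", "purchase-option", "deployment-instance", "deploy", "json-logs",
    "param-list", "main-dir", "env-vars", "base-image", "cache", "no-cache",
    "git-credentials", "git-credentials-secret", "git-branch",
}


def find_unsupported(options):
    """Return the option names that appear in the unsupported list, sorted."""
    # Accept both CLI style (--env-vars) and Python style (env_vars) names
    normalized = {name.lstrip("-").replace("_", "-") for name in options}
    return sorted(normalized & UNSUPPORTED_CLI_OPTIONS)
```

For example, calling find_unsupported(["--deploy", "tags", "env_vars"]) reports deploy and env-vars, while tags passes through because the Build SDK accepts it.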