Qwak Model Anatomy

Preface

As an example, we will use the well-known Iris Classifier SVM model, which typically looks as follows:

from sklearn import svm
from sklearn import datasets

# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

Model directory structure

Below, we show the default and recommended structure of a model project on JFrog ML.

qwak_based_model/
├── main/
│   ├── __init__.py    # Required for exporting model from main
│   ├── model.py       # Qwak model definition.
│   ├── util.py        # Represents any helper/util python modules.
│   └── <dependencies> # Dependencies file - Poetry/PyPI/Conda
└── tests/
    ├── ut/            # Unit tests
    │   └── sample_test.py
    └── it/            # Integration tests
        └── sample_test.py
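As a quick sanity check of the layout above, the following is a minimal sketch that reports which expected files are missing from a model directory. The helper name missing_layout_files is hypothetical, and treating main/__init__.py and main/model.py as the required files is an assumption drawn from the layout shown here, not a rule enforced by the SDK.

```python
from pathlib import Path

# Files that the layout above marks as part of every model project (assumption).
REQUIRED_FILES = ("main/__init__.py", "main/model.py")


def missing_layout_files(model_root):
    """Return the expected files that are absent under model_root."""
    root = Path(model_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

Calling missing_layout_files("iris_model") on a complete project returns an empty list.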

Generating the directory structure

First, we will generate the directory structure for our JFrog ML-based model.

To do so, you can use the following command:

qwak models init \
   --model-directory <model-dir-path> \
   --model-class-name <model-class> \
   <dest>

Where:

  • <model-dir-path>: Model directory name.
  • <model-class>: JFrog ML-based model class name, in camel case.
  • <dest>: Destination path on the local host.

As an example, we will use the following command:

qwak models init \
   --model-directory iris_model \
   --model-class-name IrisClassifier \
   ~/

This creates a new directory named iris_model in the user's home directory.

main directory

main is the most important directory of a JFrog ML project.

Everything that is supposed to be part of the model artifact should be located in it.

QwakModel class

The first step is creating a model class, which defines the two mandatory functions:

  1. build - defines the model training / loading logic, invoked at build time.
  2. predict - defines the serving logic, invoked on every inference request.

And two optional ones:

  1. schema - defines the model interface - input and the output of your model.
  2. initialize_model - invoked when the model is loaded during the serving container initialization.

Read more about Qwak's model class methods and how they can be used in the dedicated section.
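To make the timing of these methods concrete, here is a minimal sketch of the call order described above. SketchModel and simulate_lifecycle are hypothetical stand-ins written for illustration only; they do not use the Qwak SDK, and the exact order in which the platform invokes the hooks is an assumption based on the descriptions in this section.

```python
class SketchModel:
    """Hypothetical stand-in for a model class; NOT the Qwak SDK base class."""

    def __init__(self):
        self.calls = []       # records the order in which the hooks run
        self._offset = None   # plays the role of the trained model

    def build(self):
        # Build time: train or load the model (here, a trivial "model").
        self.calls.append("build")
        self._offset = 10

    def initialize_model(self):
        # Optional hook: runs once when the serving container loads the model.
        self.calls.append("initialize_model")

    def predict(self, values):
        # Runs on every inference request.
        self.calls.append("predict")
        return [v + self._offset for v in values]


def simulate_lifecycle(model, requests):
    """Assumed order: build once, initialize once, then predict per request."""
    model.build()
    model.initialize_model()
    return [model.predict(r) for r in requests]
```

After simulate_lifecycle(SketchModel(), [[1, 2], [3]]), the recorded calls list contains one build, one initialize_model, and one predict entry per request.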

For example, we can implement the Iris classifier in the following way:

import pandas as pd
import qwak
from sklearn import svm, datasets
from qwak.model.base import QwakModel
from qwak.model.schema import ModelSchema, InferenceOutput, ExplicitFeature


class IrisClassifier(QwakModel):
    def __init__(self):
        self._gamma = 'scale'
        self._model = None

    def build(self):
        # Load the training data and fit the classifier at build time
        iris = datasets.load_iris()
        X, y = iris.data, iris.target

        clf = svm.SVC(gamma=self._gamma)
        self._model = clf.fit(X, y)

    @qwak.api()
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        # Return one predicted class per input row
        return pd.DataFrame(data=self._model.predict(df), columns=['species'])

    def schema(self):
        return ModelSchema(
            inputs=[
                ExplicitFeature(name="sepal_length", type=float),
                ExplicitFeature(name="sepal_width", type=float),
                ExplicitFeature(name="petal_length", type=float),
                ExplicitFeature(name="petal_width", type=float)
            ],
            outputs=[
                InferenceOutput(name="species", type=str)
            ])

The main directory should be a valid Python package, meaning it should include an __init__.py file.

The __init__.py file lets the Python interpreter know that the directory contains a Python package. The file must exist, although Python itself does not require it to have any content; any code it does contain runs when the package is first imported. In a JFrog ML project, this file should set up the import of the Qwak model class so that the model build process picks it up, in one of two ways:

from .model import IrisClassifier

Or

from .model import IrisClassifier

def load_model():
    return IrisClassifier()

The load_model function gives more control over how the model class is initialized during the build process.
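As an illustration of that extra control, the sketch below has load_model read a setting before constructing the model. It is self-contained for demonstration purposes: the IrisClassifier here is a stand-in (in a real project it would be imported from .model and extend QwakModel), and both the gamma constructor argument and the IRIS_GAMMA environment variable are hypothetical names that do not appear in the SDK.

```python
import os


class IrisClassifier:
    """Stand-in for the model class defined in main/model.py.

    The gamma constructor argument is hypothetical.
    """

    def __init__(self, gamma="scale"):
        self._gamma = gamma
        self._model = None


def load_model():
    # IRIS_GAMMA is a hypothetical environment variable, used only for illustration.
    gamma = os.environ.get("IRIS_GAMMA", "scale")
    return IrisClassifier(gamma=gamma)
```

With a plain import in __init__.py, the class can only be constructed with its defaults; a load_model function like this one is the place to pass constructor arguments.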

Dependency files

Most projects depend on external packages to build and run correctly. JFrog ML downloads and links the dependencies at build time, using a Python virtual environment that serves both the build and the serving contexts.

JFrog ML supports the following types of dependency descriptors. Pick one! Do not include multiple dependency configuration files at once.

Qwak SDK automatic dependency

Note that qwak-sdk does not need to be added as an explicit dependency.

During build time, the version of the Qwak SDK used to build the model is automatically injected.

Conda

The conda.yml or conda.yaml should be stored in the main directory. For example:

name: iris-classifier
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.8
  - pip=20.0.3
  - scikit-learn=1.0.1

pip

Store the requirements.txt in the main directory:

scikit-learn==1.0.1

Poetry

In this case, the main directory should contain the pyproject.toml file:

[tool.poetry]
name = "iris-classifier"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"
scikit-learn = "1.0.1"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
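Because only one dependency descriptor should be present, a small pre-build check can catch a stray second file. The following is a minimal sketch under that assumption; the function names are hypothetical, and the file names are the ones listed in this section.

```python
from pathlib import Path

# Descriptor file names mentioned in this section, grouped by tool.
DESCRIPTORS = {
    "conda": ("conda.yml", "conda.yaml"),
    "pip": ("requirements.txt",),
    "poetry": ("pyproject.toml",),
}


def find_dependency_descriptors(main_dir):
    """Return the descriptor files present in main_dir, sorted by name."""
    main = Path(main_dir)
    return sorted(
        name
        for names in DESCRIPTORS.values()
        for name in names
        if (main / name).is_file()
    )


def check_single_descriptor(main_dir):
    """Raise ValueError unless exactly one descriptor file is present."""
    found = find_dependency_descriptors(main_dir)
    if len(found) != 1:
        raise ValueError(f"expected exactly one dependency file, found {found}")
    return found[0]
```

Running check_single_descriptor("iris_model/main") before a build returns the name of the single descriptor, or raises if there are none or several.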

Packages with Qwak-compatible models

It's also possible to store a Qwak-compatible model in a Python package and add it as a dependency to the Qwak model. We have to implement the model class and build it as a Python package. Note that our package needs qwak-sdk as its dependency.

In our example, we assume that we have created a package called importqwakmodelfrompackage that contains a submodule model with the TestModel class. The TestModel class extends QwakModel.

Later, when we configure the Qwak model:

  1. Create a new Qwak model using the qwak models init command.

  2. Add the package to the dependencies.

For example, if we use pip to manage dependencies and the whl file is stored in a GitHub repository, we must add the following line to our requirements.txt file:

https://raw.githubusercontent.com/[github_user]/[github_repository_name]/[branch_name]/[path_to_directory]/[whl_file_name]#egg=[package_name]

If we use a different package manager, we should follow its instructions for adding whl files or private package repositories as dependencies.

  3. Remove the content of the __init__.py file.

  4. In the model.py file, import the Qwak model class (the one added as a dependency) and implement the load_model function to return the class itself, not an instance of the class.

    For example:

from importqwakmodelfrompackage.model import TestModel


def load_model():
    return TestModel

tests directory

The tests directory is where the tests for each component of the model reside.

Unit testing

The Python files in tests/ut define unit tests that run during the build process. For example, if we define a helper function in <model-dir>/main/util.py as follows:

def add(x, y):
    return x + y

Then we can define the following test:

from main.util import add

# Named test_* so that pytest's default discovery collects it
def test_add():
    assert add(3, 2) == 5

Integration testing

The Python files in tests/it define integration tests that run during the build process, after the serving container is built and initialized.

During the integration tests, a real model deployment is running, and you can perform inference through a pytest fixture that is auto-configured with a client for the model currently being built.

For example:

import pandas as pd
from qwak.testing.fixtures import real_time_client

def test_iris_classifier(real_time_client):
    feature_vector = [[5.1, 3.5, 1.4, 0.2]]
    iris_type = real_time_client.predict(feature_vector)
    assert iris_type == 1

Notes

  • File operations - The current working directory of the build execution is the root directory of the model. For example, to read a file located at ./main/sample.txt, open it with the path ./main/sample.txt, where . represents the model root directory.
  • Model fields - Model fields should be objects that can be pickled. An S3 client, for example, cannot be pickled because its session must remain active.
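To find fields that would violate the pickling note above, we can try to pickle each attribute in turn. This is a minimal sketch; the helper name unpicklable_fields is hypothetical and is not part of the SDK.

```python
import pickle


def unpicklable_fields(obj):
    """Return the names of instance attributes that pickle cannot serialize."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except (pickle.PicklingError, TypeError, AttributeError):
            bad.append(name)
    return bad
```

For a model object that holds a list of weights and a live handle such as a threading.Lock, the helper returns only the name of the field holding the lock. One possible way to avoid such fields, stated here as an assumption rather than a documented recommendation, is to create live clients in initialize_model instead of storing them during build.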

Additional directories

By default, the Qwak SDK copies only main and tests from the build directory. No other directories are included.

However, it is possible to modify this behavior by adding the --dependency_required_folders parameter to the model build command.

Examples:

qwak models build --model-id your_model_id --dependency_required_folders additional_dir .
qwak models build --model-id your_model_id --dependency_required_folders additional_dir --dependency_required_folders some_other_dir .

Within the build container, additional dependency folders are placed alongside main and tests in /qwak/model_dir/.

Here's an illustrative directory structure:

/qwak/model_dir/
.
├── main                   # Main directory containing core code
│   ├── __init__.py
│   ├── model.py
│   └── requirements.txt
│
├── additional_dir         # Additional dependency directory added with --dependency-required-folders
│   └── files..
│
└── tests
    └── ...

For smooth integration, ensure that your additional folders are placed in the same directory as main. When specifying these directories in your build command, use their names directly, omitting any ./ prefix.

Should your additional dependency folder not be located within the current working directory, it's necessary to specify its absolute path. However, this will lead to the recreation of the specified path hierarchy within the build container, so plan accordingly to maintain the desired structure.
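Since additional folders sit alongside main and the build runs from the model root, files in them can be read with paths relative to the current working directory. The following is a minimal sketch of that idea; the helper name read_from_additional_dir is hypothetical, and it assumes the folder was passed by name, not by absolute path.

```python
from pathlib import Path


def read_from_additional_dir(folder_name, file_name):
    """Read a text file from a folder that sits next to main/ and tests/.

    The path is resolved against the current working directory, which is
    assumed to be the model root during the build.
    """
    return (Path.cwd() / folder_name / file_name).read_text(encoding="utf-8")
```

For example, read_from_additional_dir("additional_dir", "labels.txt") would return the contents of additional_dir/labels.txt, where labels.txt is an illustrative file name.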