Qwak Model Anatomy
Preface
As an example, we will use the well-known Iris Classifier SVM model, which typically looks as follows:
from sklearn import svm
from sklearn import datasets
# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)
Model directory structure
Below, we show the default and recommended structure of a model project on JFrog ML.
qwak_based_model/
├── main/
│   ├── __init__.py      # Required for exporting model from main
│   ├── model.py         # Qwak model definition.
│   ├── util.py          # Represents any helper/util python modules.
│   └── <dependencies>   # Dependencies file - Poetry/PyPI/Conda
└── tests/
    ├── ut/              # Unit tests
    │   └── sample_test.py
    └── it/              # Integration tests
        └── sample_test.py
Generating the directory structure
First, we will generate the directory structure for our JFrog ML-based model.
To do so, you can use the following command:
qwak models init \
--model-directory <model-dir-path> \
--model-class-name <model-class> \
<dest>
Where:
- <model-dir-path>: Model directory name.
- <model-class>: JFrog ML-based model class name (camel case).
- <dest>: Destination path on the local host.
As an example, we will use the following command:
qwak models init \
--model-directory iris_model \
--model-class-name IrisClassifier \
~/
This creates a new directory named iris_model in the user's home directory.
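As a quick sanity check after generating the project, the recommended layout can be verified with a few lines of Python. This is a minimal sketch: the helper name `missing_layout_entries` and the choice of which entries to treat as mandatory are assumptions drawn from the directory tree in this guide, not part of the Qwak SDK.

```python
from pathlib import Path
import tempfile

# Entries taken from the recommended structure in this guide (assumption:
# only these are treated as mandatory for the purposes of this check).
EXPECTED_ENTRIES = ("main/__init__.py", "main/model.py", "tests")


def missing_layout_entries(model_dir):
    """Return the expected entries that do not exist under model_dir."""
    root = Path(model_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Demo on a throwaway directory that mimics a partially populated project.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "iris_model"
    (project / "main").mkdir(parents=True)
    (project / "main" / "__init__.py").touch()
    (project / "tests").mkdir()
    demo_missing = missing_layout_entries(project)

# model.py was deliberately left out of the demo project.
print(demo_missing)
```

Against a real project, the same helper could be pointed at the generated directory, for example `missing_layout_entries(Path.home() / "iris_model")`.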
main directory
main is the most important directory of a JFrog ML project.
Everything that is supposed to be part of the model artifact should be located in it.
QwakModel class
The first step is creating a model class, which defines two mandatory functions:
- build - defines the model training / loading logic, invoked at build time.
- predict - defines the serving logic, invoked on every inference request.
And two optional ones:
- schema - defines the model interface: the inputs and outputs of your model.
- initialize_model - invoked when the model is loaded during the serving container initialization.
Read more about Qwak's model class methods and how they can be used in the dedicated section.
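The order in which these methods are invoked can be pictured with a toy driver. This is a minimal sketch under stated assumptions: `LifecycleDemo` is a plain stand-in class rather than a real `QwakModel`, and `simulate` is a hypothetical function; the platform's actual build and serving machinery is not shown.

```python
class LifecycleDemo:
    """Plain stand-in exposing build, initialize_model and predict."""

    def __init__(self):
        self.calls = []

    def build(self):
        # Training / loading logic would run here, at build time.
        self.calls.append("build")

    def initialize_model(self):
        # Runs when the serving container loads the model.
        self.calls.append("initialize_model")

    def predict(self, payload):
        # Runs for every inference request.
        self.calls.append("predict")
        return payload


def simulate(model, requests):
    """Hypothetical driver: build, then initialize, then predict per request."""
    model.build()
    model.initialize_model()
    return [model.predict(request) for request in requests]


demo = LifecycleDemo()
results = simulate(demo, [{"x": 1}, {"x": 2}])
print(demo.calls)
```

The recorded call list makes the distinction visible: `build` and `initialize_model` each appear once, while `predict` appears once per request.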
For example, we can implement the Iris classifier in the following way:
import pandas as pd
import qwak
from sklearn import svm, datasets
from qwak.model.base import QwakModel
from qwak.model.schema import ModelSchema, InferenceOutput, ExplicitFeature
class IrisClassifier(QwakModel):
    def __init__(self):
        self._gamma = 'scale'
        self._model = None
    def build(self):
        # Train the SVM on the Iris dataset at build time
        iris = datasets.load_iris()
        X, y = iris.data, iris.target
        clf = svm.SVC(gamma=self._gamma)
        self._model = clf.fit(X, y)
    @qwak.api()
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        # Serve predictions for each row of the input DataFrame
        return pd.DataFrame(data=self._model.predict(df), columns=['species'])
    def schema(self):
        # Declare the model inputs and outputs
        return ModelSchema(
            inputs=[
                ExplicitFeature(name="sepal_length", type=float),
                ExplicitFeature(name="sepal_width", type=float),
                ExplicitFeature(name="petal_length", type=float),
                ExplicitFeature(name="petal_width", type=float)
            ],
            outputs=[
                InferenceOutput(name="species", type=str)
            ])
The main directory should be a valid Python package, meaning it should include an __init__.py file.
The __init__.py file tells the Python interpreter that the directory contains code for a Python package. Python only requires the file to exist; its content is optional and is typically used for initialization. Here, the file should set up the imports for the Qwak model class so that the model build process picks it up, in one of two ways:
from .model import IrisClassifier
Or
from .model import IrisClassifier
def load_model():
    return IrisClassifier()
The load_model function gives more control over how the model class is initialized during the build process.
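To illustrate the kind of control this offers, the sketch below passes a constructor argument that is read from an environment variable. It rests on assumptions: `IrisClassifier` here is a plain stand-in that accepts a `gamma` argument (the class shown earlier takes none), and the `IRIS_GAMMA` variable name is hypothetical.

```python
import os


class IrisClassifier:
    """Stand-in for the model class; the real one would extend QwakModel."""

    def __init__(self, gamma="scale"):
        self._gamma = gamma
        self._model = None


def load_model():
    # Hypothetical: pick the hyperparameter from the environment,
    # falling back to the default used elsewhere in this guide.
    gamma = os.environ.get("IRIS_GAMMA", "scale")
    return IrisClassifier(gamma=gamma)


os.environ["IRIS_GAMMA"] = "auto"
model = load_model()
print(model._gamma)
```

With a plain `from .model import IrisClassifier` import there is no place to inject such arguments, which is the main reason to prefer `load_model` when the constructor needs configuration.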
Dependency files
Most projects depend on external packages to build and run correctly. JFrog ML downloads and links the dependencies at build time, in a Python virtual environment that is used for both the build and serving contexts.
JFrog ML supports the following types of dependency descriptors. Pick one! Do not include multiple dependency configuration files at once.
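A small pre-build check can catch the case where more than one descriptor has slipped into the project. This is a minimal sketch, not a feature of the platform: the function names are hypothetical, and the list of file names is simply the set of descriptor names used in this guide.

```python
from pathlib import Path
import tempfile

# Descriptor file names used in this guide for Conda, pip and Poetry.
DESCRIPTOR_NAMES = ("conda.yml", "conda.yaml", "requirements.txt", "pyproject.toml")


def find_dependency_descriptors(main_dir):
    """Return the names of dependency descriptors present in main_dir."""
    root = Path(main_dir)
    return [name for name in DESCRIPTOR_NAMES if (root / name).is_file()]


def check_single_descriptor(main_dir):
    """Raise ValueError unless exactly one descriptor is present."""
    found = find_dependency_descriptors(main_dir)
    if len(found) != 1:
        raise ValueError(f"expected exactly one dependency file, found: {found}")
    return found[0]


with tempfile.TemporaryDirectory() as tmp:
    main_dir = Path(tmp) / "main"
    main_dir.mkdir()
    (main_dir / "requirements.txt").write_text("scikit-learn==1.0.1\n")
    chosen = check_single_descriptor(main_dir)

    # Adding a second descriptor makes the check fail.
    (main_dir / "conda.yml").write_text("name: iris-classifier\n")
    try:
        check_single_descriptor(main_dir)
        conflict_detected = False
    except ValueError:
        conflict_detected = True

print(chosen, conflict_detected)
```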
Qwak SDK automatic dependency
Note that qwak-sdk does not need to be added as an explicit dependency. During build time, the version of the Qwak SDK used to build the model is injected automatically.
Conda
The conda.yml or conda.yaml file should be stored in the main directory. For example:
name: iris-classifier
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.8
  - pip=20.0.3
  - scikit-learn=1.0.1
pip
Store the requirements.txt file in the main directory:
scikit-learn==1.0.1
Poetry
In this case, the main directory should contain the pyproject.toml file:
[tool.poetry]
name = "iris-classifier"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]
[tool.poetry.dependencies]
python = "^3.8"
scikit-learn = "1.0.1"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
Packages with Qwak-compatible models
It's also possible to store a Qwak-compatible model in a Python package and add it as a dependency of the Qwak model. To do so, we implement the model class and build it as a Python package. Note that the package needs qwak-sdk as a dependency.
In our example, we assume that we have created a package called importqwakmodelfrompackage that contains a submodule model with the TestModel class. The TestModel class implements QwakModel.
Later, when we configure the Qwak model:
- Create a new Qwak model using the qwak models init command.
- Add the package to the dependencies. For example, if we use pip to manage dependencies and the whl file is stored in a GitHub repository, we must add the following line to our requirements.txt file:
https://raw.githubusercontent.com/[github_user]/[github_repository_name]/[branch_name]/[path_to_directory]/[whl_file_name]#egg=[package_name]
If we use a different package manager, we should follow their instructions regarding adding whl files or private package repositories as dependencies.
- Remove the content of the __init__.py file.
- In the model.py file, import the Qwak model class (the one added as a dependency in pip) and implement the load_model function to return the class (not an instance of the class). For example:
from importqwakmodelfrompackage.model import TestModel
def load_model():
    return TestModel
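The difference between returning the class and returning an instance is easy to check in plain Python. The sketch below is only an illustration: `TestModel` is an empty stand-in defined inline, so that the snippet runs without the package or the SDK installed.

```python
import inspect


class TestModel:
    """Empty stand-in for the packaged model class."""


def load_model():
    # Return the class object itself; note the missing parentheses.
    return TestModel


loaded = load_model()
print(inspect.isclass(loaded))        # the returned object is a class
print(isinstance(loaded, TestModel))  # it is not an instance of that class
```

Writing `return TestModel()` instead would hand back an instance, which is the mistake the step above warns against.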
tests directory
The tests directory is where the tests for each component of the model reside.
Unit testing
The Python files under tests/ut define unit tests that run during the build process. For example, if we define a helper function in <model-dir>/main/util.py as follows:
def add(x, y):
    return x + y
Then we can define the following test:
from main.util import add
# pytest discovers test functions by their test_ prefix
def test_add():
    assert add(3, 2) == 5
Integration testing
The Python files under tests/it define integration tests that run during the build process, after the serving container is built and initialized.
During the integration tests, a real model deployment is running, and you can perform inference using a pytest fixture that is auto-configured with a client pointing at the model currently being built.
For example:
import pandas as pd
from qwak.testing.fixtures import real_time_client
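def test_iris_classifier(real_time_client):
    # First row of the Iris dataset, which is labeled 0 (setosa)
    feature_vector = [[5.1, 3.5, 1.4, 0.2]]
    iris_type = real_time_client.predict(feature_vector)
    # Adjust the comparison to the response format your client returns
    assert iris_type == 0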
Notes
- File operations - The current working directory of the build execution is the root directory of the model. For example, if a file is located at ./main/sample.txt and you want to read it, simply open it at the path ./main/sample.txt, where . represents the model root directory.
- Model fields - Model fields should be objects that can be pickled. An S3 client, for example, can't be pickled because its session must remain active.
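The note on model fields can be checked locally before a build. The sketch below is an illustration under stated assumptions: `threading.Lock` stands in for a client that holds a live session, the class and function names are hypothetical, and deferring the field to `initialize_model` is one possible way to keep the constructor's fields picklable.

```python
import pickle
import threading


class EagerModel:
    """Creates an unpicklable resource as a field in the constructor."""

    def __init__(self):
        # threading.Lock stands in for a client with an open session.
        self._client = threading.Lock()


class LazyModel:
    """Keeps the field empty until initialize_model is called."""

    def __init__(self):
        self._client = None

    def initialize_model(self):
        # Created at load time, after any pickling has already happened.
        self._client = threading.Lock()


def unpicklable_fields(model):
    """Return the names of instance fields that pickle cannot serialize."""
    bad = []
    for name, value in vars(model).items():
        try:
            pickle.dumps(value)
        except (TypeError, AttributeError, pickle.PicklingError):
            bad.append(name)
    return bad


eager_bad = unpicklable_fields(EagerModel())
lazy_bad = unpicklable_fields(LazyModel())
print(eager_bad, lazy_bad)
```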
Additional directories
By default, the Qwak SDK copies only the main and tests directories from the build directory and no others.
However, it is possible to modify this behavior by adding the --dependency_required_folders parameter to the model build command.
Examples:
qwak models build --model-id your_model_id --dependency_required_folders additional_dir .
qwak models build --model-id your_model_id --dependency_required_folders additional_dir --dependency_required_folders some_other_dir .
Within the build container, additional dependency folders are placed alongside main and tests in /qwak/model_dir/.
Here's an illustrative directory structure:
/qwak/model_dir/
.
├── main                 # Main directory containing core code
│   ├── __init__.py
│   ├── model.py
│   └── requirements.txt
│
├── additional_dir       # Additional dependency directory added with --dependency-required-folders
│   └── files..
│
└── tests
    └── ...
For smooth integration, ensure that your additional folders are placed in the same directory as main. When specifying these directories in your build command, use their names directly, omitting any ./ prefix.
Should your additional dependency folder not be located within the current working directory, it's necessary to specify its absolute path. However, this will lead to the recreation of the specified path hierarchy within the build container, so plan accordingly to maintain the desired structure.
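When the build command is assembled from a script, the folder names can be normalized before they are passed on. This is a minimal sketch: `build_command` is a hypothetical helper, the flag is spelled as in the example commands above, and the command is only assembled as a list, not executed.

```python
import shlex


def build_command(model_id, extra_folders, build_dir="."):
    """Assemble the build command as an argument list (it is not executed)."""
    args = ["qwak", "models", "build", "--model-id", model_id]
    for folder in extra_folders:
        # Use the bare folder name, without a leading "./" or trailing "/".
        name = folder[2:] if folder.startswith("./") else folder
        args += ["--dependency_required_folders", name.rstrip("/")]
    args.append(build_dir)
    return args


command = build_command("your_model_id", ["./additional_dir", "some_other_dir/"])
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run` if the script is meant to launch the build itself.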