Qwak Model Anatomy
Preface
As an example, we will use the well-known Iris Classifier SVM model, which typically looks as follows:
# train.py
from sklearn import svm
from sklearn import datasets
# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)
Model Directory Structure
Below, we show the default (and the recommended) structure of a model project.
The Qwak platform offers multiple ways to customize the directory structure. We show them in other build tutorials.
qwak_based_model/
├── main/
│ ├── __init__.py # Required for exporting model from main
│ ├── model.py # Qwak model definition.
│ ├── util.py # Represents any helper/util python modules.
│ ├── <dependencies> # Dependencies file - Poetry/PyPI/Conda
├── tests/
│ ├── ut/ # Unit tests
│ │ ├── sample_test.py
│ ├── it/ # Integration tests
│ │ ├── sample_test.py
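As a quick sanity check of the layout above, a small script along these lines can report what is missing from a model directory. This is only a sketch: the helper name `missing_layout_entries` and the choice of which paths to check are assumptions made for illustration, not part of the Qwak SDK, and the dependency file is left out because its name depends on the package manager.

```python
from pathlib import Path
from typing import List

# Paths taken from the default structure shown above (assumed to be the
# minimum worth checking; adjust the tuple for customized layouts).
EXPECTED_PATHS = (
    "main/__init__.py",
    "main/model.py",
    "tests",
)


def missing_layout_entries(model_root: str) -> List[str]:
    """Return the expected relative paths that do not exist under model_root."""
    root = Path(model_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_layout_entries("qwak_based_model")` returns an empty list when all three entries are present.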
Generating a Directory Structure
First, we will generate the directory structure for our Qwak-based model as described above.
To do so, you can use the following command:
qwak models init \
--model-directory <model-dir-path> \
--model-class-name <model-class> \
<dest>
Where:
<model-dir-path>: Model directory name.
<model-class>: Qwak-based model class name (camel case).
<dest>: Destination path on the local host.
As an example, we will use the following command:
qwak models init \
--model-directory iris_model \
--model-class-name IrisClassifier \
~/
That will create a new directory named iris_model in the user's home directory.
'main' directory
main is the most important directory of a Qwak project.
Everything that is supposed to be part of the model artifact should be located in it.
Qwak Model Class
The first step is creating a model class, which defines two mandatory functions:
- build - defines the model training / loading logic, invoked at build time.
- predict - defines the serving logic, invoked on every inference request.
And two optional ones:
- schema - defines the model interface: the input and the output of your model.
- initialize_model - invoked when the model is loaded during the serving container initialization.
Read more about Qwak's model class methods and how they can be used in the dedicated section.
For example, we can implement the Iris classifier in the following way:
import pandas as pd
from sklearn import svm, datasets
from qwak import api, QwakModelInterface
from qwak.model.schema import ModelSchema, Prediction, ExplicitFeature


class IrisClassifier(QwakModelInterface):
    def __init__(self):
        self._gamma = 'scale'
        self._model = None

    def build(self):
        # Invoked at build time: load the data and train the classifier.
        iris = datasets.load_iris()
        X, y = iris.data, iris.target
        clf = svm.SVC(gamma=self._gamma)
        self._model = clf.fit(X, y)

    @api()
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        # Invoked on every inference request.
        return pd.DataFrame(data=self._model.predict(df), columns=['species'])

    def schema(self):
        # Declares the model's input features and its output.
        return ModelSchema(
            features=[
                ExplicitFeature(name="sepal_length", type=float),
                ExplicitFeature(name="sepal_width", type=float),
                ExplicitFeature(name="petal_length", type=float),
                ExplicitFeature(name="petal_width", type=float)
            ],
            predictions=[
                Prediction(name="species", type=str)
            ])
The main directory should be a valid Python package, meaning it should include an __init__.py file.
The __init__.py file lets the Python interpreter know that the directory contains code for a Python package. The file must exist, although Python itself does not require it to have any content. In a Qwak project, this file should set up the imports for the Qwak model class so that the model build process picks it up, in one of two ways:
from .model import IrisClassifier
Or
from .model import IrisClassifier
def load_model():
    return IrisClassifier()
The load_model function gives more control over how the model class is initialized during the build process.
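As an illustration of that extra control, the sketch below reads a hyperparameter from an environment variable before constructing the model. Everything specific here is an assumption: the class is a plain stand-in for the QwakModelInterface subclass in model.py, its `gamma` constructor argument is added for the example, and `IRIS_SVM_GAMMA` is a hypothetical variable name (whether such a variable is available in your build environment depends on your setup).

```python
import os


class IrisClassifier:
    """Stand-in for the Qwak model class defined in main/model.py."""

    def __init__(self, gamma="scale"):
        # Hypothetical constructor argument, added for this example only.
        self._gamma = gamma
        self._model = None


def load_model():
    # IRIS_SVM_GAMMA is a hypothetical variable name; fall back to the
    # default used elsewhere in this guide when it is not set.
    gamma = os.environ.get("IRIS_SVM_GAMMA", "scale")
    return IrisClassifier(gamma=gamma)
```

With the plain import form, the class is always constructed with its defaults; a load_model function is the place to put any logic of this kind.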
Dependency File
Most projects depend on external packages to build and run correctly. Qwak downloads and links the dependencies at build time, based on a Python virtual environment that is used for both the build and serving contexts.
Qwak supports the following types of dependency descriptors. Pick one! Do not include multiple dependency configuration files at once.
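The "pick one" rule can be checked locally before a build with a few lines of standard-library Python. This is a sketch under assumptions: the helper names are made up for illustration, and the list of file names is taken from the Conda, pip and Poetry subsections of this page rather than from the Qwak SDK itself.

```python
from pathlib import Path
from typing import List

# Dependency descriptor file names described on this page.
DEPENDENCY_FILES = ("conda.yml", "conda.yaml", "requirements.txt", "pyproject.toml")


def find_dependency_files(main_dir: str) -> List[str]:
    """Return the dependency descriptor files present in main_dir."""
    main = Path(main_dir)
    return [name for name in DEPENDENCY_FILES if (main / name).is_file()]


def check_single_dependency_file(main_dir: str) -> str:
    """Return the single dependency file name, or raise if there is not exactly one."""
    found = find_dependency_files(main_dir)
    if len(found) != 1:
        raise ValueError(
            f"expected exactly one dependency file in {main_dir}, found {found}"
        )
    return found[0]
```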
Qwak Dependency
Note that qwak-sdk does not need to be added as an explicit dependency. During build time, the version of the Qwak SDK used to build the model is automatically injected.
Conda
The conda.yml or conda.yaml file should be stored in the main directory. For example:
name: iris-classifier
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.8
  - pip=20.0.3
  - scikit-learn=1.0.1
pip
Store the requirements.txt file in the main directory:
scikit-learn==1.0.1
Poetry
In this case, the main directory should contain the pyproject.toml file:
[tool.poetry]
name = "iris-classifier"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]
[tool.poetry.dependencies]
python = "^3.8"
scikit-learn = "1.0.1"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
Building and deploying Python packages with Qwak-compatible models
It's also possible to store a Qwak-compatible model in a Python package and add it as a dependency to the Qwak model. We have to implement the model class and build it as a Python package. Note that our package needs qwak-sdk as its dependency.
In our example, we assume that we have created a package called importqwakmodelfrompackage that contains a submodule model with the TestModel class. The TestModel class implements the QwakModelInterface.
Later, when we configure the Qwak model:
- Create a new Qwak model using the qwak models init command.
- Add the package to the dependencies. For example, if we use pip to manage dependencies and the whl file is stored in a GitHub repository, we must add the following line to our requirements.txt file:
  https://raw.githubusercontent.com/[github_user]/[github_repository_name]/[branch_name]/[path_to_directory]/[whl_file_name]#egg=[package_name]
  If we use a different package manager, we should follow its instructions for adding whl files or private package repositories as dependencies.
- Remove the content of the __init__.py file.
- In the model.py file, import the Qwak model class (the one added as a dependency in pip) and implement the load_model function to return the class (not an instance of the class!). For example:
from importqwakmodelfrompackage.model import TestModel
def load_model():
    return TestModel
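Because returning an instance instead of the class is an easy mistake to make here, a local check such as the following can help. It is only a sketch: `TestModel` below is an empty stand-in for the packaged class, and `returns_class` is a hypothetical helper, not part of the Qwak SDK.

```python
import inspect


class TestModel:
    """Empty stand-in for the model class imported from the package."""


def load_model():
    # Return the class itself, not TestModel().
    return TestModel


def returns_class(loader) -> bool:
    """Return True if calling loader() yields a class rather than an instance."""
    return inspect.isclass(loader())
```

`returns_class(load_model)` is true for the function above and false for a loader that returns `TestModel()`.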
'tests' Directory
The tests directory is where the tests for each component of the model reside.
Unit Testing
This Python file defines the unit tests that run during the build process. For example, if we define a helper function in <model-dir>/main/util.py as follows:
def add(x, y):
    return x + y
Then we can define the following test:
from main.util import add
# Named test_add so that pytest's default discovery (test_* functions) collects it.
def test_add():
    assert add(3, 2) == 5
Integration Testing
This Python file defines the integration tests that run during the build process, after the serving container is built and initialized.
During the integration tests, a real model deployment is running, and you can perform inference using a pytest fixture that is auto-configured with a client for the model currently being built.
For example:
import pandas as pd
from qwak_mock import real_time_client
def test_iris_classifier(real_time_client):
    feature_vector = [[5.1, 3.5, 1.4, 0.2]]
    iris_type = real_time_client.predict(feature_vector)
    assert iris_type == 1
Notes
- File operations - The current working directory of the build execution is the root directory of the model. For example, if a file is located in ./main/sample.txt and you want to read it, simply open it at the path ./main/sample.txt, where . represents the model root directory.
- Model fields - Model fields should be objects that can be pickled. An S3 client, for example, cannot be pickled because its session must remain active.
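The model fields note can be illustrated with the standard library alone. In this sketch, `threading.Lock` stands in for a resource that cannot be pickled (such as a live client session), and the class and helper names are hypothetical. The second class shows one common workaround, which is to create the resource lazily and drop it from the pickled state; it is an assumption that this pattern fits your model, not a statement about how Qwak serializes models internally.

```python
import pickle
import threading


class ModelWithClient:
    """Holds a non-picklable resource directly as a field."""

    def __init__(self):
        self.client = threading.Lock()  # stand-in for e.g. an S3 client


class ModelWithLazyClient:
    """Creates the resource on first use and excludes it from pickled state."""

    def __init__(self):
        self._client = None

    @property
    def client(self):
        if self._client is None:
            self._client = threading.Lock()  # stand-in for a real client
        return self._client

    def __getstate__(self):
        # Copy the instance state and drop the live resource before pickling.
        state = self.__dict__.copy()
        state["_client"] = None
        return state


def is_picklable(obj) -> bool:
    """Return True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
```

`is_picklable(ModelWithClient())` is false because the lock cannot be pickled, while the lazy variant never includes the live resource in its pickled state.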
Additional directories
By default, the Qwak SDK copies only the main and tests directories from the build directory.
However, it is possible to modify this behavior by adding the --dependency_required_folders parameter to the model build command.
Examples:
qwak models build --model-id your_model_id --dependency_required_folders additional_dir .
qwak models build --model-id your_model_id --dependency_required_folders additional_dir --dependency_required_folders some_other_dir .
Additional directories in tests
To access files added with the --dependency-required-folders parameter in tests, use the qwak_tests_additional_dependencies fixture. For example:
import os
def test_print_content_from_variable(qwak_tests_additional_dependencies):
    # The fixture value is the path of the directory holding the additional folders.
    print(qwak_tests_additional_dependencies)
    directories = os.listdir(qwak_tests_additional_dependencies)
    print(directories)
All folders passed with --dependency-required-folders are located inside the directory whose path is passed as the qwak_tests_additional_dependencies parameter.
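Since the fixture value is a directory path, a test can inspect it with ordinary file-system calls. The helper below is a sketch with a hypothetical name; it assumes the fixture yields a string or path-like object, as the description above suggests.

```python
from pathlib import Path
from typing import List


def list_additional_files(additional_root) -> List[str]:
    """Return the sorted relative paths of all files under additional_root."""
    root = Path(additional_root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Inside a test, calling `list_additional_files(qwak_tests_additional_dependencies)` gives a flat list of the copied files, which is easier to assert against than the top-level directory listing.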