Logging Data and Metadata

You can use the Qwak platform to log model metadata and track experiments. To use this feature, make the following changes in the build configuration.

Logging Build Metrics

When executing a build, you can choose to store model metrics. Any decimal number can be logged as a model metric using the log_metric function:

import qwak

qwak.log_metric({"<key>": "<value>"})

or

from qwak.model.experiment_tracking import log_metric

log_metric({"<key>": "<value>"})

For example, to store the F1 score:

from qwak import QwakModelInterface
from sklearn import svm
from sklearn import datasets
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from qwak.model.experiment_tracking import log_metric
 
 
class IrisClassifier(QwakModelInterface):

    def __init__(self):
        self._gamma = 'scale'
 
    def build(self):
        # Load training data
        iris = datasets.load_iris()
        X, y = iris.data, iris.target
        X_train, X_test, y_train, y_test = train_test_split(X, y)
 
        # Model Training
        clf = svm.SVC(gamma=self._gamma)
        self.model = clf.fit(X_train, y_train)
     
        # Store model metrics
        y_predicted = self.model.predict(X_test)
        # Iris has three classes, so f1_score needs an explicit averaging mode
        f1 = f1_score(y_test, y_predicted, average='macro')
        log_metric({"f1": f1})
 
    def predict(self, df):
        return self.model.predict(df)
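
Since metrics are described above as decimal numbers, it can help to normalize a metrics dictionary before logging it. The following is a minimal sketch, not part of the Qwak SDK: clean_metrics is a hypothetical helper, and the rule that values must be finite numbers is an assumption made for illustration.

```python
import math
from numbers import Real


def clean_metrics(metrics):
    """Return a copy of metrics with every value as a finite float.

    Hypothetical helper; it assumes metrics should be plain finite numbers.
    """
    cleaned = {}
    for key, value in metrics.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"metric {key!r} is not a number: {value!r}")
        value = float(value)
        if not math.isfinite(value):
            raise ValueError(f"metric {key!r} is not finite: {value!r}")
        cleaned[key] = value
    return cleaned


# Illustrative values only; in a build these would come from evaluation.
metrics = clean_metrics({"f1": 0.93, "support": 38})
print(metrics)
```

The resulting dictionary has the same shape as the one passed to log_metric in the example above.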

Logging Build Parameters

When executing a build, you can log model parameters. The parameters can be logged in two ways.

Using the log_param Function During a Build

Call the log_param API from within a Qwak-based model:

import qwak

qwak.log_param({"<key>": "<value>"})

or

from qwak.model.experiment_tracking import log_param

log_param({"<key>": "<value>"})

Supported data types

"object", "int64", "float64", "datetime64", "datetime64[ns]", "datetime64[ns, UTC]", "bool"

For example:

import qwak
from qwak import QwakModelInterface
from sklearn import svm
from sklearn import datasets
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from qwak.model.experiment_tracking import log_param
 
 
class MyModel(QwakModelInterface):

    def __init__(self):
        self._gamma = 'scale'
        log_param({"gamma": self._gamma})
 
    def build(self):
        # Load training data
        iris = datasets.load_iris()
        X, y = iris.data, iris.target
        X_train, X_test, y_train, y_test = train_test_split(X, y)
 
        # Model Training
        clf = svm.SVC(gamma=self._gamma)
        self.model = clf.fit(X_train, y_train)
 
    def predict(self, df):
        return self.model.predict(df)

Using the CLI

You can add parameters to the build CLI command when you start a new build:

qwak models build \
    --model-id <model-id> \
    -P <key>=<value> -P <key>=<value> \
    <uri>

<model-id> - The model ID associated with the build.
<key> - The model parameter key.
<value> - The model parameter value.
<uri> - The Qwak-based model URI.
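
When the parameters come from a script rather than being typed by hand, the command above can be assembled programmatically. This is a minimal sketch using only the flags shown in this section; build_command is a hypothetical helper, and the model ID, URI, and parameter values are placeholders.

```python
import shlex


def build_command(model_id, uri, params):
    """Assemble the `qwak models build` argument list shown above.

    `build_command` is a hypothetical helper, not part of the Qwak SDK.
    """
    cmd = ["qwak", "models", "build", "--model-id", model_id]
    for key, value in params.items():
        # Each parameter becomes its own -P <key>=<value> pair.
        cmd += ["-P", f"{key}={value}"]
    cmd.append(uri)
    return cmd


# Placeholder model ID, URI, and parameters for illustration.
cmd = build_command("iris_classifier", ".", {"gamma": "scale", "test_size": 0.25})
print(shlex.join(cmd))
```

Keeping the command as a list makes it safe to pass to subprocess.run without shell quoting issues.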

Logging Files

When executing a build, you can explicitly log files and attach them to a tag for reference (they can also be downloaded later). You can use this method to share files between models and builds.

For example, we can persist the catboost classifier using pickle and add it to the logged files:

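import pickle

import qwak

def build(self):
    ...

    # Serialize the trained classifier to a local file
    with open('model.pkl', 'wb') as handle:
        pickle.dump(self.catboost, handle, protocol=pickle.HIGHEST_PROTOCOL)

    # Attach the file to the build under the given tag
    qwak.log_file(from_path='model.pkl', tag='catboost_model')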

or

from qwak.model.loggers.artifact_logger import log_file

log_file(from_path='model.pkl', tag='catboost_model')

To load the file, provide the tag that identifies it, along with the model and build where it was persisted:

from qwak.model.loggers.artifact_logger import load_file

load_file(to_path='model.pkl', tag='catboost_model', model_id='some_model_id', build_id='some_build_id')

The to_path parameter defines the location where we want to write the file inside the currently used Docker container.
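
Once the file has been written to to_path, it is an ordinary pickle file and can be read back with the standard library. The sketch below shows the local write and read steps only, without any Qwak calls; the dictionary is a stand-in for a trained model, and a temporary directory replaces the container path.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable object behaves the same way.
model = {"name": "catboost_model", "weights": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Same write pattern as in build(): this is the file passed as from_path.
    with open(path, "wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)

    # A file restored to to_path can be read back the same way.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == model)
```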

Files can also be logged without a build context:

from qwak.model.loggers.artifact_logger import log_file

log_file(from_path='model.pkl', tag='catboost_model', model_id='some_model_id')

📘

Note

A model_id must be provided.

Versioning Your Build Data

When you execute a build, you can store the build data:

import qwak

qwak.log_data(df, tag)  # df is a pandas DataFrame

or

from qwak.model.loggers.data_logger import log_data

log_data(df, tag)  # df is a pandas DataFrame

The data is exposed in the Qwak UI under the Data tab of a specific build, where you can query it and view the feature distribution.

For example:

import qwak
from qwak import QwakModelInterface
from sklearn import svm
from sklearn import datasets
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
import pandas as pd
from qwak.model.loggers.data_logger import log_data
 
 
class IrisClassifier(QwakModelInterface):

    def __init__(self):
        self._gamma = 'scale'
 
    def build(self):
        # Load training data as pandas objects
        iris = datasets.load_iris(as_frame=True)
        # Save training data (iris.frame is a DataFrame of features and target)
        log_data(iris.frame, "training_data")
        X, y = iris.data, iris.target
 
        # Model Training
        clf = svm.SVC(gamma=self._gamma)
        self.model = clf.fit(X, y)
 
    def predict(self, df):
        return self.model.predict(df)

The data is saved under the build, and attached to the given tag.

After a DataFrame is persisted, you are able to load the data for future use by using the load_data() function:

from qwak.model.loggers.data_logger import load_data

df : pd.DataFrame = load_data(tag=<tag>, model_id=<model_id>)

Versioning Non-Build Related Data

Similar to files, data can be logged without a specific build context. To do so, specify the model_id that the DataFrame should be attached to:

from qwak.model.loggers.data_logger import log_data

log_data(df, tag=<tag>, model_id=<model_id>)  # df is a pandas DataFrame

Automated model logging

During every build, the Qwak platform automatically logs the model as an artifact.

You can retrieve the model using the load_model function. The function accepts two arguments: a model ID and a build ID. It returns an instance of the QwakModelInterface class.

from qwak import QwakModelInterface
from qwak.model.loggers.model_logger import load_model

loaded_model: QwakModelInterface = load_model('the_model_id', 'the_build_id')

Remember that the current Python environment must contain all dependencies required to create a valid Python object from the pickled object. For example, if you logged a TensorFlow model, the tensorflow library must be available (in the same version) when you call load_model.

🚧

Model logging won't work if Qwak cannot pickle the object!

The automatic model logging works only when the class that extends QwakModelInterface can be pickled using the pickle.dump function.

If the model cannot be pickled, the build log will contain the warning message "Failed to log model to Qwak." This warning won't stop the build, and the trained model can still be deployed in the Qwak platform.
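
A quick local check can show whether an object is picklable before a build is started. This is a minimal sketch using only the standard library; can_pickle and both example classes are hypothetical, and the check uses pickle.dumps in memory rather than reproducing how Qwak itself calls pickle.dump.

```python
import pickle
import threading


def can_pickle(obj):
    """Return True if pickle can serialize obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # pickle raises PicklingError, TypeError, or AttributeError
        # depending on which member cannot be serialized.
        return False


class PlainModel:
    def __init__(self):
        self.gamma = "scale"


class ModelWithLock:
    def __init__(self):
        self.lock = threading.Lock()  # locks cannot be pickled


print(can_pickle(PlainModel()))
print(can_pickle(ModelWithLock()))
```

Typical causes of failure are attributes such as locks, open file handles, or lambda functions stored on the model instance.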

Automated dependency logging

During every build, the Qwak platform runs pip freeze to log all of the dependencies used during the build. The output is stored in /qwak/model_dir/qwak_artifacts/requirements.lock and logged as a build artifact.

🚧

Don't place your own requirements.lock file in the qwak_artifacts directory.

It will be overwritten!
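
Because the lock file comes from pip freeze, it can be compared against the local environment before loading a model elsewhere. The following is a minimal sketch that assumes the file content is already available as text and contains simple name==version lines; parse_lock, installed_version, and find_mismatches are hypothetical helpers, and the sample pins are illustrative.

```python
from importlib import metadata


def parse_lock(text):
    """Parse `name==version` lines, as emitted by pip freeze, into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not simple pins
        # (for example editable installs or direct URLs).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every package that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


# Illustrative content; a real file would come from the build artifacts.
sample = "scikit-learn==1.3.0\n# a comment\npandas==2.0.3\n"
pins = parse_lock(sample)
print(find_mismatches(pins))
```

An empty result means every pinned package is installed locally in the pinned version.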