Model Registry and Metadata
Keep track of parameters, metrics, data, and artifacts while building, training, and validating models.
The Qwak platform enables you to log model metadata, artifacts, and DataFrames, as well as to track experiments effectively. To use this capability, integrate the utility functions described below into your QwakModel.
Log build metrics
When executing a build, you can choose to store the model metrics. You can log any decimal number as a model metric using the log_metric function:
import qwak
qwak.log_metric({"<key>": "<value>"})
Alternatively, import the log_metric function directly:
from qwak.model.experiment_tracking import log_metric
log_metric({"<key>": "<value>"})
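Because metrics are expected to be decimal numbers, it can help to normalize values before handing them to log_metric. The helper below is a minimal sketch and not part of the Qwak SDK: as_metrics is a hypothetical name, and rejecting booleans is an assumption made here for illustration.

```python
from numbers import Real


def as_metrics(values):
    """Return a copy of `values` with every metric converted to a plain float.

    Hypothetical helper (not a Qwak API). Raises TypeError for values that are
    not real numbers, such as strings.
    """
    metrics = {}
    for key, value in values.items():
        # bool is a subclass of int, so exclude it explicitly (assumption:
        # True/False are not meaningful metric values)
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"metric {key!r} is not a number: {value!r}")
        metrics[key] = float(value)
    return metrics


metrics = as_metrics({"f1": 0.93, "epochs": 10})
print(metrics)
```

The resulting dictionary could then be passed to log_metric inside the build.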
Logging training metrics
The example below logs the model's F1 score:
from qwak import QwakModel
from sklearn import svm, datasets
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from qwak.model.experiment_tracking import log_metric


class IrisClassifier(QwakModel):

    def __init__(self):
        self._gamma = 'scale'

    def build(self):
        # Load training data
        iris = datasets.load_iris()
        X, y = iris.data, iris.target
        X_train, X_test, y_train, y_test = train_test_split(X, y)

        # Train model
        clf = svm.SVC(gamma=self._gamma)
        self.model = clf.fit(X_train, y_train)

        # Compute model metrics; Iris has three classes, so f1_score needs an
        # explicit averaging mode (the default 'binary' raises an error)
        y_predicted = self.model.predict(X_test)
        f1 = f1_score(y_test, y_predicted, average='macro')

        # Log metrics to Qwak
        log_metric({"f1": f1})

    def predict(self, df):
        return self.model.predict(df)
Logging Build Parameters
When executing a build, you can log model parameters. The parameters can be logged in two ways.
Using log_param
log_param is an API that can be used from Qwak-based models:
import qwak
qwak.log_param({"<key>": "<value>"})
Alternatively, import the log_param function directly:
from qwak.model.experiment_tracking import log_param
log_param({"<key>": "<value>"})
The supported data types for logging are:
- bool
- object
- int64
- float64
- datetime64
- datetime64[ns]
- datetime64[ns, UTC]
For example:
import qwak
from qwak import QwakModel
from sklearn import svm, datasets
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from qwak.model.experiment_tracking import log_param


class MyModel(QwakModel):

    def __init__(self):
        self._gamma = 'scale'
        log_param({"gamma": self._gamma})

    def build(self):
        # Load training data
        iris = datasets.load_iris()
        X, y = iris.data, iris.target
        X_train, X_test, y_train, y_test = train_test_split(X, y)

        # Model Training
        clf = svm.SVC(gamma=self._gamma)
        self.model = clf.fit(X_train, y_train)

    def predict(self, df):
        return self.model.predict(df)
Using the CLI
You can add parameters to the build CLI command when you start a new build:
qwak models build \
    --model-id <model-id> \
    -P <key>=<value> -P <key>=<value> \
    <uri>
<model-id> - The model ID associated with the build.
<key> - The model parameter key.
<value> - The model parameter value.
<uri> - The Qwak-based model URI.
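When parameters already live in a Python dictionary, the command line above can be assembled programmatically. This is a minimal sketch: build_command is a hypothetical helper, and the model ID "iris_classifier" and the "." URI are placeholder values, not names taken from the Qwak documentation.

```python
import shlex


def build_command(model_id, uri, params):
    """Assemble the argument list for `qwak models build` with -P parameters."""
    args = ["qwak", "models", "build", "--model-id", model_id]
    for key, value in params.items():
        # Each parameter becomes its own "-P key=value" pair
        args += ["-P", f"{key}={value}"]
    args.append(uri)
    return args


# Placeholder model ID, URI, and parameters for illustration only
cmd = build_command("iris_classifier", ".", {"gamma": "scale", "test_size": 0.25})
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run as-is, which avoids shell quoting problems when a parameter value contains spaces.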
Logging Build Files
When executing a build, you can explicitly log files and attach them to a tag for reference (they can also be downloaded later). You can use this method to share files between models and builds.
Note: model_id must be provided when logging parameters to files.
For example, we can persist the catboost classifier using pickle and add it to the logged files:
import pickle

import qwak

def build(self):
    ...
    # Serialize the classifier to disk, then attach the file to a tag
    with open('model.pkl', 'wb') as handle:
        pickle.dump(self.catboost, handle, protocol=pickle.HIGHEST_PROTOCOL)
    qwak.log_file(from_path='model.pkl', tag='catboost_model')
Alternatively, import the log_file function directly:
from qwak.model_loggers.artifact_logger import log_file
log_file(from_path='model.pkl', tag='catboost_model')
To load that file, provide its tag together with the model ID and build ID under which the file was persisted:
from qwak.model_loggers.artifact_logger import load_file
load_file(to_path='model.pkl', tag='catboost_model', model_id='some_model_id', build_id='some_build_id')
The to_path parameter defines the location where the file is written inside the currently used Docker container.
Files can also be logged without a build context:
from qwak.model_loggers.artifact_logger import log_file
log_file(from_path='model.pkl', tag='catboost_model', model_id='some_model_id')
Size Limitation
Currently, log_file supports logging files of up to 5GB to Qwak Cloud. The tag should contain underscores (_), not dashes (-).
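Both constraints can be checked locally before calling log_file. The sketch below is illustrative only: log_file_problems is a hypothetical helper, and reading "5GB" as 5 * 1024**3 bytes is an assumption, since the exact unit is not stated.

```python
MAX_LOG_FILE_BYTES = 5 * 1024**3  # assumption: "5GB" interpreted as 5 GiB


def log_file_problems(size_bytes, tag):
    """List the reasons a file size and tag would violate the stated limits."""
    problems = []
    if size_bytes > MAX_LOG_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the 5GB limit")
    if "-" in tag:
        problems.append(f"tag {tag!r} contains a dash; use underscores instead")
    return problems


ok = log_file_problems(1024, "catboost_model")
bad = log_file_problems(6 * 1024**3, "catboost-model")
print(ok)
print(bad)
```

For a real file, the size can be obtained with os.path.getsize('model.pkl').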
Versioning Build Data
When you execute a build, you can store the build data:
import qwak
qwak.log_data(pd.dataframe, tag)
Alternatively, import the log_data function directly:
from qwak.model_loggers.data_logger import log_data
log_data(pd.dataframe, tag)
The data is exposed in the Qwak UI. Under the Data tab of a specific build, you can query the data and view the feature distribution.
For example:
from qwak import QwakModel
from sklearn import svm
from sklearn import datasets
from qwak.model_loggers.data_logger import log_data


class IrisClassifier(QwakModel):

    def __init__(self):
        self._gamma = 'scale'

    def build(self):
        # Load training data as pandas objects
        iris = datasets.load_iris(as_frame=True)

        # Save training data; log_data expects a DataFrame, so pass
        # iris.frame (features plus target) rather than the Bunch itself
        log_data(iris.frame, "training_data")

        X, y = iris.data, iris.target

        # Model Training
        clf = svm.SVC(gamma=self._gamma)
        self.model = clf.fit(X, y)

    def predict(self, df):
        return self.model.predict(df)
The data is saved under the build, and attached to the given tag.
Supported data types
The supported DataFrame column types are:
['object', 'uint8', 'int64', 'float64', 'datetime64', 'datetime64[ns]', 'datetime64[ns, UTC]', 'bool']
To modify a column's data type you can use the following syntax:
validation_df['column1'] = validation_df['column1'].astype('float64')
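To find out which columns need converting before calling log_data, the column dtypes can be compared against the supported list. This is a minimal sketch under stated assumptions: unsupported_columns is a hypothetical helper, it works on dtype names as strings, and the column names are made up for illustration.

```python
SUPPORTED_DTYPES = {
    "object", "uint8", "int64", "float64",
    "datetime64", "datetime64[ns]", "datetime64[ns, UTC]", "bool",
}


def unsupported_columns(dtype_names):
    """Map each column whose dtype name is not supported to that dtype name."""
    return {
        column: dtype
        for column, dtype in dtype_names.items()
        if dtype not in SUPPORTED_DTYPES
    }


# Illustrative column names and dtypes only
dtypes = {"column1": "int32", "column2": "float64", "label": "category"}
to_fix = unsupported_columns(dtypes)
print(to_fix)
```

With pandas, the input mapping can be built as {col: str(dtype) for col, dtype in validation_df.dtypes.items()}, and each reported column can then be converted with astype.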
Versioning data outside build
Similar to files, data can be logged without a specific build context. To do so, specify the model_id you'd like the DataFrame to be attached to:
from qwak.model_loggers.data_logger import log_data
from pandas import DataFrame
df = DataFrame()
log_data(df, tag="some-tag", model_id="your-model-id", build_id="your-build-id")
Loading Data from Builds
To access data logged during the build process, use the following Qwak function to download it by model ID, build ID, and data tag. The code can run locally on your machine or in a remote workspace; it does not need to run within a Qwak model build.
from qwak.model_loggers.data_logger import load_data
df = load_data(tag="some-tag", model_id="your-model-id", build_id="your-build-id")
Automatic model logging
During every build, the Qwak platform automatically logs the model as an artifact.
Objects must be pickled
Model logging does not work when your objects cannot be pickled.
The automatic model logging works only when the class that extends QwakModel can be pickled using the pickle.dump function. If the model cannot be pickled, the build log will contain the warning message "Failed to log model to Qwak." This error won't stop the build, and the trained model can still be deployed on the Qwak platform.
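Whether an object can be pickled is easy to test locally before starting a build. The following is a minimal sketch using only the standard library; pickling_error is a hypothetical helper, and the same check could be applied to an instance of your own model class.

```python
import pickle
import threading


def pickling_error(obj):
    """Return None if `obj` can be pickled, otherwise the exception raised."""
    try:
        pickle.dumps(obj)
    except Exception as exc:  # pickle raises several different exception types
        return exc
    return None


# A plain dict of parameters pickles without trouble
print(pickling_error({"gamma": "scale"}))

# A lock is a common example of an attribute that cannot be pickled
print(type(pickling_error({"lock": threading.Lock()})).__name__)
```

Typical culprits are open file handles, locks, database connections, and lambdas stored as attributes.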
You can retrieve the model using the load_model function. The function accepts two arguments: a model ID and a build ID. It returns an instance of the QwakModel class.
from qwak.model_loggers.model_logger import load_model
from qwak.model.base import QwakModel
loaded_model: QwakModel = load_model('<your_model_id>', '<your_build_id>')
Remember that the current Python environment must contain all dependencies required to create a valid Python object from the pickled object. For example, if you logged a TensorFlow model, the tensorflow library must be available (in the same version) when you call load_model.
Automatic dependency logging
During every build, the Qwak platform runs pip freeze to log all of the dependencies used during the build. The output is stored in the /qwak/model_dir/qwak_artifacts/requirements.lock file and logged as a build artifact.
Note: Do not include your own requirements.lock file in the qwak_artifacts directory. It will be overwritten!
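A logged requirements.lock can also be used to check that a local environment matches the build before loading a model. The sketch below assumes the file contains pip freeze output with name==version lines; parse_pins and mismatches are hypothetical helpers, and the pinned versions are illustrative only.

```python
from importlib import metadata


def parse_pins(lock_text):
    """Extract {package: version} from `name==version` lines of pip freeze output."""
    pins = {}
    for line in lock_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not simple pins
        # (for example editable installs or direct URL references)
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def mismatches(pins):
    """Return {package: (pinned, installed)} where the two versions differ."""
    result = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        if installed != pinned:
            result[name] = (pinned, installed)
    return result


# Illustrative lock content; real files come from the build artifacts
sample = "# build dependencies\nscikit-learn==1.3.0\npandas==2.0.3\n"
pins = parse_pins(sample)
print(pins)
print(mismatches(pins))
```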