Local Testing

Local testing before triggering remote builds is essential for optimizing the model development process. This approach enhances efficiency by identifying and resolving errors early in the development cycle, minimizing the time and resources spent on remote builds.

Debugging is more interactive and streamlined locally, allowing quick iteration and error resolution. Additionally, local testing helps validate the entire workflow, ensuring correct configurations and dependencies before incurring potential costs associated with remote builds.

Ultimately, incorporating local testing into the development workflow promotes a more efficient, cost-effective, and error-resistant model building process.

Local debugging and inference

You can run and debug your Qwak models locally.

The example below contains a simple FLAN-T5 model loaded from Hugging Face. To test inference locally, you can simply run the code below.

📘
Please make sure to install the qwak-sdk in your local environment.

Running models locally

Import run_local from qwak.model.tools and call the local model via run_local(m, input_vector), which invokes all the relevant model methods.

❗️
Production models
Please do not leave the from qwak.model.tools import run_local statement in production models, as it may affect model behavior. One option is to keep all local testing code, including the run_local import, in a separate file outside of your main directory.

This example defines a FLANT5Model class that loads the model and runs inference on a DataFrame input of your choice:

import pandas as pd
import qwak
from pandas import DataFrame
from qwak.model.base import QwakModel
from qwak.model.schema import ModelSchema, ExplicitFeature
from transformers import T5Tokenizer, T5ForConditionalGeneration


class FLANT5Model(QwakModel):

    def __init__(self):
        self.model_id = "google/flan-t5-small"
        self.model = None
        self.tokenizer = None

    def build(self):
        pass

    def schema(self):
        model_schema = ModelSchema(
            inputs=[
                ExplicitFeature(name="prompt", type=str),
            ])
        return model_schema

    def initialize_model(self):
        self.tokenizer = T5Tokenizer.from_pretrained(self.model_id)
        self.model = T5ForConditionalGeneration.from_pretrained(self.model_id)

    @qwak.api()
    def predict(self, df):
        input_text = list(df['prompt'].values)
        # padding=True lets prompts of different lengths be batched into one tensor
        input_ids = self.tokenizer(input_text, return_tensors="pt", padding=True)
        outputs = self.model.generate(**input_ids, max_new_tokens=100)
        decoded_outputs = self.tokenizer.batch_decode(outputs, skip_special_tokens=True)
        return pd.DataFrame([{"generated_text": decoded_outputs}])

🚧
Note: Call run_local instead of calling model_object.predict() directly, as predict() will not work locally.

To run local inference, add the following code to your model code file:


from qwak.model.tools import run_local

if __name__ == '__main__':
    # Create a new instance of the model
    m = FLANT5Model()
    
    # Create an input vector and convert it to JSON
    input_vector = DataFrame(
        [{
            "prompt": "Why does it matter if a Central Bank has a negative rather than 0% interest rate?"
        }]
    ).to_json()
    
    # Run local inference using the model
    prediction = run_local(m, input_vector)
    print(prediction)
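
To make the shape of the JSON input vector easier to inspect, the sketch below rebuilds the column-oriented layout that pandas' default DataFrame.to_json() produces (column name, then row index, then value) using only the standard library. The helper to_column_oriented_json and the two prompts are hypothetical, it assumes every row has the same keys, and it is a stand-in for illustration rather than a replacement for DataFrame(...).to_json().

```python
import json


def to_column_oriented_json(rows):
    """Hypothetical helper: column -> row index -> value, as in pandas' default layout."""
    columns = {}
    for index, row in enumerate(rows):
        for name, value in row.items():
            # Row indices become string keys once serialized to JSON
            columns.setdefault(name, {})[str(index)] = value
    return json.dumps(columns, separators=(",", ":"))


# Two illustrative prompts, one per row
payload = to_column_oriented_json([
    {"prompt": "Translate to German: good morning"},
    {"prompt": "Summarize: local testing catches errors early"},
])
print(payload)

# Reading the payload back recovers one prompt per row
parsed = json.loads(payload)
print(list(parsed["prompt"].values()))
```

Printing the payload before passing it to run_local is a quick way to confirm that each row carries the prompt column the model schema expects.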

🚧
For local testing, remember to import run_local at the start of your test file, before importing your QwakModel-based class.

Debugging the model life cycle

A single call to run_local invokes the following methods:

build()
initialize_model()
predict()

The build and initialize_model methods are called only on the first run_local call.
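
The life cycle described above can be illustrated with a small stand-in. This is a minimal sketch of the documented behavior, not the Qwak SDK's implementation: LifecycleProbe, run_local_sketch, and the _sketch_ready flag are hypothetical names introduced only for this example.

```python
# Illustrative stand-ins only: nothing below is part of the Qwak SDK.

class LifecycleProbe:
    """Hypothetical model that records which life cycle methods were called."""

    def __init__(self):
        self.calls = []

    def build(self):
        self.calls.append("build")

    def initialize_model(self):
        self.calls.append("initialize_model")

    def predict(self, payload):
        self.calls.append("predict")
        return payload.upper()


def run_local_sketch(model, payload):
    """Mimic the documented behavior: set up once, then predict on every call."""
    if not getattr(model, "_sketch_ready", False):
        model.build()
        model.initialize_model()
        model._sketch_ready = True
    return model.predict(payload)


probe = LifecycleProbe()
first = run_local_sketch(probe, "hello")
second = run_local_sketch(probe, "again")
print(probe.calls)  # ['build', 'initialize_model', 'predict', 'predict']
```

Recording calls this way in your own model (for example with log statements) is one way to check that expensive setup in build and initialize_model is not repeated between local predictions.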

🚧
Debugging input and output adapters
Calling the predict method locally doesn't trigger the input and output adapters. Please use run_local instead.
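
The reason a direct predict call skips the adapters can be sketched with plain Python. The example assumes a simplified design in which a decorator only records the adapters and a separate entry point applies them. The names api_sketch, EchoModel, and execute_sketch are hypothetical, and the JSON functions stand in for real adapters. This is not how the Qwak SDK is implemented.

```python
import json

# Illustrative stand-ins only: this is not the Qwak SDK's adapter machinery.


def api_sketch(input_adapter, output_adapter):
    """Hypothetical decorator that records adapters without wrapping the function."""
    def decorate(func):
        func.input_adapter = input_adapter
        func.output_adapter = output_adapter
        return func
    return decorate


class EchoModel:
    @api_sketch(input_adapter=json.loads, output_adapter=json.dumps)
    def predict(self, data):
        # Expects an already-decoded dict, not a raw JSON string
        return {"length": len(data["prompt"])}

    def execute_sketch(self, raw):
        """Decode the raw payload, call predict, then encode the result."""
        handler = type(self).predict
        decoded = handler.input_adapter(raw)
        return handler.output_adapter(self.predict(decoded))


model = EchoModel()
raw_request = '{"prompt": "hi"}'

# Adapters are applied around predict
print(model.execute_sketch(raw_request))

# Adapters are skipped: predict receives the raw string and fails
try:
    model.predict(raw_request)
except TypeError as err:
    print("direct predict failed:", err)
```

In this simplified design, predict only ever sees decoded data when it is reached through the entry point that applies the adapters, which mirrors the advice to use run_local rather than calling predict yourself.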

Using Proto adapter

Assume that you have the following model class, have created an instance of it, and have executed the build method.

# This example model uses ProtoBuf input and output adapters
class MyQwakModel(QwakModel):

    def build(self):
        ...

    @qwak.api(
        input_adapter=ProtoInputAdapter(ModelInput),
        output_adapter=ProtoOutputAdapter()
    )
    def predict(self, input_: ModelInput) -> ModelOutput:
        return ...

In this example, we serialize a ModelInput proto message, run inference with run_local, and parse the result into a ModelOutput message.

from qwak.model.tools import run_local

# Create a local instance of your model
model = MyQwakModel()

# ModelInput is the model proto
input_ = ModelInput(f1=0, f2=0).SerializeToString()

# The run_local() method calls build(), initialize_model() and predict()
result = run_local(model, input_)

# ModelOutput is the model proto
output_ = ModelOutput()
output_.ParseFromString(result)

Running local inference

You can test the entire inference code, including input and output adapters, by calling the execute function:

# ModelInput is the model proto
input_ = ModelInput(f1=0, f2=0).SerializeToString()

# execute() calls the predict method with the input and output adapters
result = model.execute(input_)

# ModelOutput is the model proto
output_ = ModelOutput()
output_.ParseFromString(result)