Building a GPU-Based Qwak Model

When training a deep learning model, you'll often want to use GPUs during the training process.
The Qwak platform supports GPU-based builds.

To verify that GPUs were provided to the build, print the number of available GPU devices:

# catboost
from catboost.utils import get_gpu_device_count
print(f'{get_gpu_device_count()} GPU devices')

# tensorflow
import tensorflow as tf
print(f'{len(tf.config.list_physical_devices("GPU"))} GPU devices')

# pytorch
import torch
print(f'{torch.cuda.device_count()} GPU devices')

The GPU type and amount are specified at build time:

qwak models build ... --gpu-type NVIDIA_K80 --gpu-amount 1 .

Builds run on EC2 Spot Instances, so a GPU-based build might take a little longer to start while it waits for a GPU Spot Instance to become available.

Available GPU Types and Amounts

GPU type             Available amounts
NVIDIA K80           1 / 8 / 16
NVIDIA Tesla V100    1 / 4 / 8
NVIDIA T4            1 / 2 / 4
NVIDIA A10           1 / 2 / 4 / 8
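
Before starting a build, it can be handy to check a requested combination against the table above. The sketch below is an illustration only: `AVAILABLE_GPU_AMOUNTS` and `is_supported_gpu_request` are hypothetical names, the keys are the labels from the table (not necessarily the identifiers the CLI expects), and the last row is read as A10 with amounts 1 / 2 / 4 / 8.

```python
from typing import Dict, Tuple

# Allowed GPU amounts per GPU type, copied from the table above.
AVAILABLE_GPU_AMOUNTS: Dict[str, Tuple[int, ...]] = {
    "NVIDIA K80": (1, 8, 16),
    "NVIDIA Tesla V100": (1, 4, 8),
    "NVIDIA T4": (1, 2, 4),
    "NVIDIA A10": (1, 2, 4, 8),  # assumption: last table row read as A10
}


def is_supported_gpu_request(gpu_type: str, gpu_amount: int) -> bool:
    """Return True if the table lists this amount for this GPU type."""
    return gpu_amount in AVAILABLE_GPU_AMOUNTS.get(gpu_type, ())


if __name__ == "__main__":
    print(is_supported_gpu_request("NVIDIA K80", 8))  # listed in the table
    print(is_supported_gpu_request("NVIDIA K80", 2))  # not listed
```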

Training an Example Model using GPUs

Let's train a text classifier using a pre-trained Huggingface model.

Let's assume that you've already generated an empty Qwak model project. No worries if you haven't - have a look at our Getting Started guide.

First, we have to specify the model dependencies. In this example, we use conda to provide dependencies:

name: GpuExample
channels:
  - defaults
  - conda-forge
  - huggingface
  - pytorch
dependencies:
  - python=3.7
  - pip=20.0.3
  - pandas=1.1.5
  - transformers
  - datasets
  - pytorch
  - huggingface_hub==0.2.1
  - scikit-learn

After specifying the dependencies, we can open the file and start implementing the Qwak model. Let's begin with all the required imports:

import pandas as pd
import numpy as np
import qwak
from qwak.model.base import QwakModelInterface
from datasets import load_dataset, load_metric
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer

Now, we can create the tokenizer and model instance in the __init__ function:

class GpuExample(QwakModelInterface):

    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
        self.model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

We define the model training code in the build function. We will train a classifier to determine whether a Yelp review is positive or negative using one of the datasets provided by Huggingface.

Note that we limit the input dataset by selecting only 500 entries (.select(range(500))), and we run only one epoch. We do it to speed up the training in this example. Don't do it in production.

    def build(self):
        def tokenize(examples):
            return self.tokenizer(examples['text'], padding='max_length', truncation=True)

        dataset = load_dataset('yelp_polarity')
        tokenized_dataset = dataset.map(tokenize, batched=True)

        print('Selecting the data...')
        train_dataset = tokenized_dataset['train'].shuffle(seed=42).select(range(500))
        eval_dataset = tokenized_dataset['test'].shuffle(seed=42).select(range(500))
        tokenized_dataset = None # We don't need them anymore. We can let the garbage collector free the memory.
        dataset = None

        metric = load_metric('accuracy')

        def compute_metrics(eval_pred):
            logits, labels = eval_pred
            predictions = np.argmax(logits, axis=1)
            return metric.compute(predictions=predictions, references=labels)

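        # Assumed settings: one epoch, as described above.
        # 'training_output' is a placeholder output directory.
        training_args = TrainingArguments(
            output_dir='training_output',
            num_train_epochs=1,
        )

        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            compute_metrics=compute_metrics,
        )

        trainer.train()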
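
To make the evaluation metric concrete, here is a minimal pure-Python sketch of what compute_metrics calculates: it takes the index of the largest logit in each row as the predicted class and reports the fraction of predictions that match the labels. The helper names `argmax` and `accuracy` and the sample values are made up for this illustration; the build function itself uses NumPy and the Huggingface accuracy metric.

```python
from typing import Sequence


def argmax(row: Sequence[float]) -> int:
    """Return the index of the largest value in a row of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Illustrative values only: four reviews, two classes each.
    logits = [[2.0, -1.0], [0.1, 0.9], [1.5, 1.0], [-0.3, 0.4]]
    labels = [0, 1, 1, 1]
    print(accuracy(logits, labels))  # 3 of 4 predictions match -> 0.75
```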


The last thing we need is the predict function, which will be deployed and will run the ML inference. We will extract the text from the input DataFrame, run tokenization, and pass the tensors to the model. Finally, we extract the predictions from the output and send them back as the model response.

    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        input_data = list(df['text'].values)
        tokenized = self.tokenizer(input_data, padding='max_length', truncation=True, return_tensors='pt')
        response = self.model(**tokenized)
        return pd.DataFrame(response.logits.softmax(dim=1).tolist())
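
The softmax call in predict turns the two raw logits of each row into probabilities that sum to 1, so each output column can be read as the probability of the class with that index. As a rough illustration of the arithmetic, here is a pure-Python sketch; the `softmax` helper and the sample logits are made up for this example, and the model code itself uses the PyTorch tensor method.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Illustrative logits for one review: [class 0 score, class 1 score].
    probabilities = softmax([-1.2, 2.3])
    print(probabilities)
    print(probabilities[1] > 0.5)  # class 1 has the larger logit
```

Because the larger logit always receives the larger probability, checking whether a column exceeds 0.5 in the two-class case is equivalent to checking which class the model prefers.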

Finally, we will implement a test to check whether everything works fine. In production, you should check all business cases supported by the model. Here, we will only show the test structure by implementing a single test.

Let's copy the following code to the tests/ file:

import pandas as pd
from qwak_mock import real_time_client

def test_realtime_api(real_time_client):
    feature_vector = [
        {
            'text': 'The best place ever!'
        }
    ]

    classification: pd.DataFrame = real_time_client.predict(feature_vector)
    assert classification.values[0][1] > 0.5

Our model is quite large, so in addition to requesting a GPU, we'll also increase the number of GPUs, which gives the build more GPU memory:

qwak models build --model-id example-gpu --gpu-type NVIDIA_K80 --gpu-amount 8 .