Training Models with GPUs
Easily train your models on various GPUs and scalable resources
Overview
Qwak's GPU instances provide high-performance computing resources to accelerate the training process.
Easily customize your training resources to achieve faster training times and better results.
Building your first model?
Please refer to our Getting Started guide if you're creating your first model. The guide provides step-by-step instructions on how to install all relevant dependencies to get you up and running.
Training HuggingFace models
Let's train a text classifier using a pre-trained HuggingFace model.
In this tutorial, we take a distilbert text classifier from HuggingFace and train it using GPUs.
Choosing the right GPU
Visit the GPU Instance Sizes page to view the full specifications of Qwak's GPU instance selection.
Project dependencies
This is the content of our conda.yml file, which contains the dependencies needed for our GPU build.
channels:
- defaults
- conda-forge
- huggingface
- pytorch
dependencies:
- python=3.9
- pip
- pandas=1.1.5
- transformers
- scikit-learn
- datasets
- pytorch
- huggingface_hub
- evaluate
Adding imports
We need to import the relevant methods from Qwak and from the other packages we're using:
import pandas as pd
import numpy as np
import qwak
import evaluate
from datasets import load_dataset
from transformers import TrainingArguments, Trainer
Initializing Qwak model
The QwakModel class is our base class. It implements the helper methods needed to build and deploy a model on Qwak.
In this example, we load the distilbert-base-uncased model from HuggingFace.
from qwak.model.base import QwakModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification
class HuggingFaceTokenizerModel(QwakModel):

    def __init__(self):
        model_id = "distilbert-base-uncased"
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
Defining model build
The build method is called only once, during the model build phase, while the Docker image for your model build is being generated.
def build(self):
    """
    The build() method is called once during the remote build process on Qwak.
    We use it to train the model on the Yelp dataset
    """
    def tokenize(examples):
        return self.tokenizer(examples['text'],
                              padding='max_length',
                              truncation=True)

    dataset = load_dataset('yelp_polarity')

    print('Tokenizing dataset...')
    tokenized_dataset = dataset.map(tokenize, batched=True)

    print('Splitting data to training and evaluation sets')
    train_dataset = tokenized_dataset['train'].shuffle(seed=42).select(range(50))
    eval_dataset = tokenized_dataset['test'].shuffle(seed=42).select(range(50))

    # We don't need the tokenized dataset
    del tokenized_dataset
    del dataset

    # Defining parameters for the training process
    metric = evaluate.load('accuracy')

    # A helper method to evaluate the model during training
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=1)
        return metric.compute(predictions=predictions, references=labels)

    training_args = TrainingArguments(
        output_dir='training_output',
        evaluation_strategy='epoch',
        num_train_epochs=1
    )

    # Defining all the training parameters for our tokenizer model
    trainer = Trainer(
        model=self.model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics
    )

    print('Training the model...')
    trainer.train()

    # Evaluate on the validation dataset
    eval_output = trainer.evaluate()

    # Extract the validation accuracy from the evaluation metrics
    eval_acc = eval_output['eval_accuracy']

    # Log metrics into Qwak
    qwak.log_metric({"val_accuracy" : eval_acc})
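To make the compute_metrics helper above more concrete, here is a minimal pure-Python sketch of what "argmax, then accuracy" computes. It deliberately avoids numpy and evaluate so it can run anywhere; the function names argmax_rows and accuracy and the sample logits are illustrative assumptions, not part of the tutorial's code.

```python
from typing import List, Sequence


def argmax_rows(logits: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the largest logit in each row (the predicted class)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("cannot compute accuracy of an empty batch")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(predictions)


# Made-up logits for four reviews and two classes (0 and 1).
sample_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.9]]
sample_labels = [0, 1, 0, 0]

sample_predictions = argmax_rows(sample_logits)
sample_accuracy = accuracy(sample_predictions, sample_labels)
print(sample_predictions, sample_accuracy)
```

With only 50 evaluation rows, as in the build above, each misclassified review moves the reported accuracy by two percentage points, so the logged val_accuracy should be read as a rough signal.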
Configuring inference
The inference method is called when the model is invoked through the real-time endpoint, or through batch or streaming inference.
This method is only triggered when the model is deployed or during local testing.
The inference method receives and returns a pandas DataFrame by default. Provide it with Input & Output Adapters to receive and return different data types.
@qwak.api()
def predict(self, df: pd.DataFrame) -> pd.DataFrame:
    """
    The predict() method takes a pandas DataFrame (df) as input
    and returns a pandas DataFrame with the prediction output.
    """
    input_data = list(df['text'].values)

    # Tokenize the input data using a pre-trained tokenizer
    tokenized = self.tokenizer(input_data,
                               padding='max_length',
                               truncation=True,
                               return_tensors='pt')

    response = self.model(**tokenized)

    return pd.DataFrame(
        response.logits.softmax(dim=1).tolist()
    )
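The predict() method above turns each row of logits into class probabilities with a softmax, so the returned DataFrame has one row per input text and one column per class (columns 0 and 1 here). The following is a minimal pure-Python sketch of that step for a single row; the softmax helper and the sample logits are illustrative assumptions and do not use PyTorch.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert one row of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Equal logits give equal probabilities for both classes.
undecided_row = softmax([0.0, 0.0])

# A large gap between logits gives a confident prediction for class 1.
confident_row = softmax([-2.0, 3.0])
print(undecided_row, confident_row)
```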
Complete model code
The following code should be placed in model.py. It builds the HuggingFace-based text classifier described in this tutorial.
import pandas as pd
import numpy as np
import qwak
import evaluate
from qwak.model.base import QwakModel
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
class HuggingFaceTokenizerModel(QwakModel):

    def __init__(self):
        model_id = "distilbert-base-uncased"
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

    def build(self):
        """
        The build() method is called once during the remote build process on Qwak.
        We use it to train the model on the Yelp dataset
        """
        def tokenize(examples):
            return self.tokenizer(examples['text'],
                                  padding='max_length',
                                  truncation=True)

        dataset = load_dataset('yelp_polarity')

        print('Tokenizing dataset...')
        tokenized_dataset = dataset.map(tokenize, batched=True)

        print('Splitting data to training and evaluation sets')
        train_dataset = tokenized_dataset['train'].shuffle(seed=42).select(range(50))
        eval_dataset = tokenized_dataset['test'].shuffle(seed=42).select(range(50))

        # We don't need the tokenized dataset
        del tokenized_dataset
        del dataset

        # Defining parameters for the training process
        metric = evaluate.load('accuracy')

        # A helper method to evaluate the model during training
        def compute_metrics(eval_pred):
            logits, labels = eval_pred
            predictions = np.argmax(logits, axis=1)
            return metric.compute(predictions=predictions, references=labels)

        training_args = TrainingArguments(
            output_dir='training_output',
            evaluation_strategy='epoch',
            num_train_epochs=1
        )

        # Defining all the training parameters for our tokenizer model
        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            compute_metrics=compute_metrics
        )

        print('Training the model...')
        trainer.train()

        # Evaluate on the validation dataset
        eval_output = trainer.evaluate()

        # Extract the validation accuracy from the evaluation metrics
        eval_acc = eval_output['eval_accuracy']

        # Log metrics into Qwak
        qwak.log_metric({"val_accuracy" : eval_acc})

    @qwak.api()
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        """
        The predict() method takes a pandas DataFrame (df) as input
        and returns a pandas DataFrame with the prediction output.
        """
        input_data = list(df['text'].values)

        # Tokenize the input data using a pre-trained tokenizer
        tokenized = self.tokenizer(input_data,
                                   padding='max_length',
                                   truncation=True,
                                   return_tensors='pt')

        response = self.model(**tokenized)

        return pd.DataFrame(
            response.logits.softmax(dim=1).tolist()
        )
Adding integration tests
We can define remote integration tests on Qwak that run before the built model artifact is saved in the model repository.
Copy the code below into a new Python file under the tests folder in your local project: tests/test_qwak_model.py
import pandas as pd
from qwak.testing.fixtures import real_time_client

def test_realtime_api(real_time_client):
    feature_vector = [
        {
            'text': 'The best place ever!'
        }]

    classification: pd.DataFrame = real_time_client.predict(feature_vector)
    assert classification.values[0][1] > 0.4
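The assertion in the test above reads the first returned row and checks the value in column 1, which is assumed here to be the probability of the positive class. As a rough sketch, the same kind of check can be tried locally on a plain list before wiring it into a test; is_valid_prediction and the example row are hypothetical and not part of the Qwak testing API.

```python
from typing import Sequence


def is_valid_prediction(row: Sequence[float], tolerance: float = 1e-6) -> bool:
    """True when every value lies in [0, 1] and the row sums to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in row)
    return in_range and abs(sum(row) - 1.0) <= tolerance


# Made-up softmax output for one review: [P(class 0), P(class 1)].
example_row = [0.12, 0.88]

print(is_valid_prediction(example_row), example_row[1] > 0.4)
```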
Initiating remote GPU build
It's now time to build the model!
Run the below commands in the terminal to remotely build the model we created.
Creating a model on Qwak
qwak models create "Hugging Face Tokenizer Model" --project "Examples"
Building your models on GPUs
Our model is quite large, so we need to request a large GPU-based machine with enough memory.
qwak models build --model-id hugging_face_tokenizer_model --instance "gpu.t4.xl" .
Visit the Qwak GPU Instance Sizes page to choose the resources which fit your use-case best.
Each GPU type has its own configuration of pre-defined memory and number of CPUs.
Using GPU Spot Instances
Qwak uses EC2 Spot instances for GPU-based builds to keep costs low for users.
As a result, it may take slightly longer for a GPU Spot Instance to become available.
Building for GPU deployments
When deploying a model on a GPU instance, we must verify that the model was built using a GPU-compatible image. Building a model with a GPU-compatible image installs additional dependencies and drivers.
To create a GPU-compatible image, add the --gpu-compatible flag:
qwak models build --model-id <model-id> --gpu-compatible .
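If builds are triggered from a script rather than typed by hand, the build commands shown on this page can be assembled programmatically. The sketch below only constructs the argument list and runs nothing; qwak_build_command is a hypothetical helper, it uses only the flags that appear on this page (--model-id, --instance, --gpu-compatible), and whether --instance and --gpu-compatible can be combined in one call is an assumption this page does not confirm.

```python
import shlex
from typing import List, Optional


def qwak_build_command(model_id: str,
                       instance: Optional[str] = None,
                       gpu_compatible: bool = False,
                       path: str = ".") -> List[str]:
    """Assemble the argv list for `qwak models build` (nothing is executed)."""
    command = ["qwak", "models", "build", "--model-id", model_id]
    if instance is not None:
        command += ["--instance", instance]
    if gpu_compatible:
        command.append("--gpu-compatible")
    command.append(path)  # build context, "." in the examples on this page
    return command


gpu_build = qwak_build_command("hugging_face_tokenizer_model", instance="gpu.t4.xl")
compatible_build = qwak_build_command("hugging_face_tokenizer_model", gpu_compatible=True)

# Print the commands in a copy-pasteable shell form.
print(shlex.join(gpu_build))
print(shlex.join(compatible_build))
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where the Qwak CLI is installed and configured.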
Discovering GPU cores
To see which GPUs were provided on your build machine, print the number of available GPUs:
# catboost
from catboost.utils import get_gpu_device_count
print(f'{get_gpu_device_count()} GPU devices')
# tensorflow
import tensorflow as tf
print(f'{len(tf.config.list_physical_devices("GPU"))} GPU devices')
# pytorch
import torch
print(f'{torch.cuda.device_count()} GPU devices')
Building with the --gpu-compatible flag (the qwak models build command shown under Building for GPU deployments) runs the build on a regular CPU instance, but lets you deploy the resulting model on a GPU instance later.
Deploying GPU-trained Models on CPU
To deploy models trained in a GPU environment onto CPU-based infrastructure, adapt the model loading process within the initialize_model() method. Specifically, when using PyTorch for model training, load the model so that it explicitly targets the CPU:
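import torch
from qwak.model.base import QwakModel


class MyModel(QwakModel):

    def __init__(self):
        ...

    def build(self):
        ...

    def initialize_model(self):
        # Map all stored tensors to the CPU, whatever device they were saved from
        self.model = torch.load("model.pkl", map_location=torch.device('cpu'))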
Occasionally, you might encounter a Torch-related issue during model deserialization that disregards the specified CPU target, prompting a RuntimeError due to an attempt to deserialize on a CUDA device while CUDA is unavailable:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
By employing a custom unpickler you can ensure the model is properly directed to the CPU during loading. The following example demonstrates how to implement such a solution:
import io
import pickle

import torch
from qwak.model.base import QwakModel


class CPU_Unpickler(pickle.Unpickler):

    def find_class(self, module, name):
        if module == 'torch.storage' and name == '_load_from_bytes':
            # Redirects storage loading to CPU
            return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
        else:
            # Default class resolution
            return super().find_class(module, name)


class MyModel(QwakModel):

    def __init__(self):
        ...

    def build(self):
        ...

    def initialize_model(self):
        # contents = pickle.load(f) becomes...
        with open("model.pkl", "rb") as handle:
            self.model = CPU_Unpickler(handle).load()
        print(self.model.params)
This adjustment ensures the model, trained within a GPU-accelerated environment, is seamlessly transitioned for execution on CPU-based deployment targets.
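The mechanism the custom unpickler relies on, overriding find_class so that one pickled global is swapped for another at load time, can be tried in isolation without PyTorch. The sketch below is a standard-library-only illustration under that assumption: RedirectingUnpickler and the sample payload are made up for this example, and the redirect replaces collections.OrderedDict with a plain dict instead of redirecting torch storage loading.

```python
import io
import pickle
from collections import OrderedDict


class RedirectingUnpickler(pickle.Unpickler):
    """Swap one pickled global for another while loading."""

    def find_class(self, module, name):
        if module == "collections" and name == "OrderedDict":
            # Redirect: build a plain dict instead of an OrderedDict
            return dict
        # Default class resolution for everything else
        return super().find_class(module, name)


# Serialize an OrderedDict, then load it back through the redirecting unpickler.
payload = pickle.dumps(OrderedDict(device="cuda", layers=2))
restored = RedirectingUnpickler(io.BytesIO(payload)).load()

print(type(restored).__name__, restored)
```

The pickled data itself is unchanged; only the callable used to rebuild the object is replaced, which is why the same approach can point storage loading at the CPU.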