Hyperparameter Optimization (HPO)

Introduction

This advanced build pattern enables you to specify parameters or parameter ranges for hyperparameter tuning jobs, helping you to enhance model performance by identifying the optimal settings.

Currently, JFrog ML supports training only on a single instance, whether CPU or GPU. As a result, all options described will apply to single-instance training.

For efficient model development, especially when training sessions are lengthy, consider using this guide in conjunction with reusing trained artifacts. This approach allows for frequent iteration on model code without the need to retrain from scratch each time.


Key Concepts

  • Hyperparameters: These are configuration variables that control the learning process of a model. In the JFrog ML context, these are the parameters you'll be adjusting and testing in your Build jobs.
  • Hyperparameter Tuning: The process of finding the optimal set of hyperparameters for a model. In JFrog ML, this is done through Build jobs, where different combinations of hyperparameters are tested.
  • Build Jobs: JFrog ML's mechanism for training and building models. These jobs provide the environment where your hyperparameter tuning takes place.

Passing hyperparameters to Build jobs

In JFrog ML, there are two primary methods for passing hyperparameters to your Build jobs:


1. Via Configuration File

This method involves defining your hyperparameters in a JSON file within your project structure.

Project Structure

.
├── README.md
├── main
│   ├── __init__.py
│   ├── conda.yml
│   ├── hyperparameters.json
│   └── model.py
└── tests
    └── it
        └── test_something.py

The hyperparameters.json file is placed in the main directory, which ensures it will be automatically uploaded to the JFrog ML Build environment.


Example JSON Configuration

{
    "n_estimators": [50, 100, 200],
    "max_depth": [10, 20, 30],
    "learning_rate": [0.01, 0.1, 0.2]
}

This JSON structure defines a list of candidate values for each hyperparameter, which your build code can then use to test different combinations.


Reading the Configuration in Python

In your model.py file, you can access these hyperparameters as follows:

import json

from qwak.model.base import QwakModel


class SampleModel(QwakModel):
    def __init__(self):
        # hyperparameters.json is uploaded from the main directory alongside model.py
        with open('hyperparameters.json') as f:
            self.params = json.load(f)

        # self.params now contains your hyperparameter value lists

This method allows you to keep your hyperparameters separate from your code, making it easier to version and modify them.


2. Via Environment Variables or Build Parameters

This method involves passing hyperparameters directly through the command line interface (CLI) when initiating a build job.

Using Environment Variables

qwak models build --model-id sample_model \
-E N_ESTIMATORS="50,100,200" \
-E MAX_DEPTH="10,20,30" \
-E LEARNING_RATE="0.01,0.1,0.2" \
.

Here, the -E flag sets environment variables that will be available in your Build job, and the trailing . points the build at the current directory containing your model code.


Using Build Parameters

qwak models build --model-id sample_model \
-P N_ESTIMATORS="50,100,200" \
-P MAX_DEPTH="10,20,30" \
-P LEARNING_RATE="0.01,0.1,0.2" \
.

The -P flag sets JFrog ML Build parameters. These are logged to the JFrog ML Platform and can be compared between Builds, offering better traceability.


Reading Parameters in Python

In your model.py, you can access these parameters:

import os

from qwak.model.base import QwakModel


class SampleModel(QwakModel):

    def parse_param_list(self, list_as_str):
        # Turn a comma-separated string such as "50,100,200" into a list of numbers
        return list(map(float, list_as_str.split(',')))

    def __init__(self):
        self.params = {
            'n_estimators': self.parse_param_list(os.getenv('N_ESTIMATORS')),
            'max_depth': self.parse_param_list(os.getenv('MAX_DEPTH')),
            'learning_rate': self.parse_param_list(os.getenv('LEARNING_RATE'))
        }

This method allows for more dynamic parameter setting and is useful for automated pipelines or when you need to change parameters frequently without modifying files.
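
If one of these environment variables might be missing (for example, when a build is triggered without the -E flags), you can fall back to defaults. The following is a minimal sketch; the default values and helper names are illustrative rather than part of the JFrog ML SDK:

import os

# Illustrative defaults to use when a variable is not set on the Build job
DEFAULTS = {
    'N_ESTIMATORS': '50,100,200',
    'MAX_DEPTH': '10,20,30',
    'LEARNING_RATE': '0.01,0.1,0.2',
}

def parse_param_list(list_as_str):
    # Turn a comma-separated string such as "50,100,200" into a list of floats
    return [float(value) for value in list_as_str.split(',')]

def read_param(name):
    # Prefer the value passed to the Build job, fall back to the default otherwise
    return parse_param_list(os.getenv(name, DEFAULTS[name]))

params = {
    'n_estimators': read_param('N_ESTIMATORS'),
    'max_depth': read_param('MAX_DEPTH'),
    'learning_rate': read_param('LEARNING_RATE'),
}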


Implementing Hyperparameter Optimization

Once you have your hyperparameters set up, you can implement various optimization techniques within your JFrog ML Build job. Here are examples of three common methods:

1. Grid Search Example

Grid Search exhaustively searches through a specified subset of the hyperparameter space.

import qwak
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from qwak.model.base import QwakModel

class XGBoostModel(QwakModel):

    def __init__(self):
        self.params = self.read_hyperparameters()
        self.model = xgb.XGBClassifier()

    def build(self):
        X_train, y_train = self.fetch_and_preprocess_data()

        grid_search = GridSearchCV(estimator=self.model,
                                   param_grid=self.params,
                                   cv=5,
                                   scoring='accuracy')

        grid_search.fit(X_train, y_train)

        # Log the winning combination and its cross-validated accuracy to JFrog ML
        qwak.log_param(grid_search.best_params_)
        qwak.log_metric({'best_accuracy': grid_search.best_score_})

In this example, GridSearchCV tries all possible combinations of the specified hyperparameters.
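
The read_hyperparameters() and fetch_and_preprocess_data() helpers referenced in these examples are not part of the JFrog ML SDK; you implement them yourself. A minimal sketch, assuming the hyperparameters arrive through the environment variables shown earlier and using scikit-learn's Iris dataset as a stand-in for your own data:

import os

from sklearn.datasets import load_iris
from qwak.model.base import QwakModel

class XGBoostModel(QwakModel):
    # Only the helpers are shown here; __init__ and build are as in the example above

    def read_hyperparameters(self):
        # Parse the comma-separated lists passed with -E / -P;
        # n_estimators and max_depth are cast to int, since XGBoost expects integers
        def as_list(name, cast):
            return [cast(value) for value in os.getenv(name).split(',')]

        return {
            'n_estimators': as_list('N_ESTIMATORS', int),
            'max_depth': as_list('MAX_DEPTH', int),
            'learning_rate': as_list('LEARNING_RATE', float),
        }

    def fetch_and_preprocess_data(self):
        # Stand-in dataset; replace with your own loading and preprocessing
        data = load_iris()
        return data.data, data.target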


2. Random Search Example

Random Search samples random combinations of hyperparameters, which can be more efficient than Grid Search for high-dimensional spaces.

import qwak
import xgboost as xgb
import os
from sklearn.model_selection import RandomizedSearchCV
from qwak.model.base import QwakModel

class XGBoostRandomModel(QwakModel):
    def __init__(self):
        self.model = xgb.XGBClassifier()
        self.param_distributions = self.read_hyperparameters()

    def build(self):
        X_train, y_train = self.fetch_and_preprocess_data()

        random_search = RandomizedSearchCV(
            estimator=self.model,
            param_distributions={
                'n_estimators': self.param_distributions['n_estimators'],
                'max_depth': [int(x) for x in self.param_distributions['max_depth']],
                'learning_rate': self.param_distributions['learning_rate']
            },
            n_iter=100,  # number of parameter settings sampled
            cv=5,
            scoring='accuracy',
            random_state=42
        )

        random_search.fit(X_train, y_train)

        # Log the best combination and its cross-validated accuracy to JFrog ML
        qwak.log_param(random_search.best_params_)
        qwak.log_metric({'best_accuracy': random_search.best_score_})

In this example:

  • n_estimators and learning_rate are used as-is, allowing RandomizedSearchCV to sample from the provided lists (a variant using continuous distributions is sketched after this list).
  • max_depth values are converted to integers, as this parameter requires integer values.
  • We run the random search for 100 iterations (n_iter=100), but this can be adjusted based on your specific needs and time constraints.
  • The best parameters and score are logged using JFrog ML logging functions.
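
RandomizedSearchCV can also sample from continuous distributions instead of fixed lists, which makes better use of random search when the space is large. A minimal sketch using SciPy distributions (the ranges below are illustrative):

from scipy.stats import randint, uniform

# Each parameter gets a distribution to sample from rather than a fixed list of values
param_distributions = {
    'n_estimators': randint(50, 201),       # integers from 50 to 200
    'max_depth': randint(10, 31),           # integers from 10 to 30
    'learning_rate': uniform(0.01, 0.19),   # floats from 0.01 to 0.20
}

# Pass this dictionary to RandomizedSearchCV exactly as in the example above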

3. Bayesian Optimization Example with Optuna

Optuna uses Bayesian optimization to efficiently search the hyperparameter space.

import optuna
import qwak
import xgboost as xgb
from sklearn.model_selection import cross_val_score
from qwak.model.base import QwakModel
import os

class XGBoostOptunaModel(QwakModel):
    def __init__(self):
        self.best_params = None
        self.best_score = None
        self.param_ranges = self.read_hyperparameters()

    def objective(self, trial):
        param = {
            'n_estimators': trial.suggest_int('n_estimators',
                min(self.param_ranges['n_estimators']),
                max(self.param_ranges['n_estimators'])),
            'max_depth': trial.suggest_int('max_depth',
                min(self.param_ranges['max_depth']),
                max(self.param_ranges['max_depth'])),
            'learning_rate': trial.suggest_float('learning_rate',
                min(self.param_ranges['learning_rate']),
                max(self.param_ranges['learning_rate']),
                log=True)
        }

        model = xgb.XGBClassifier(**param)
        score = cross_val_score(model, self.X, self.y, cv=5, scoring='accuracy')
        return score.mean()

    def build(self):
        self.X, self.y = self.fetch_and_preprocess_data()

        study = optuna.create_study(direction='maximize')
        study.optimize(self.objective, n_trials=100)

        self.best_params = study.best_params
        self.best_score = study.best_value

        qwak.log_param(self.best_params)
        qwak.log_metric({'best_accuracy': self.best_score})

In this example:

  • We define an objective function that Optuna will optimize. This function creates an XGBoost model with hyperparameters suggested by Optuna, then evaluates it using cross-validation.
  • The hyperparameter ranges are read from environment variables.
  • In the build method, we create an Optuna study and run the optimization for 100 trials.
  • The best parameters and score are logged using JFrog ML logging functions for easier comparisons later on.
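
Note that the objective above samples from the full range between the minimum and maximum of each list. If you want Optuna to evaluate only the exact values defined in your configuration, one option is suggest_categorical; a sketch of an alternative objective method for the same class:

    def objective(self, trial):
        # Restrict each hyperparameter to the exact values supplied in the configuration
        param = {
            'n_estimators': trial.suggest_categorical('n_estimators', self.param_ranges['n_estimators']),
            'max_depth': trial.suggest_categorical('max_depth', self.param_ranges['max_depth']),
            'learning_rate': trial.suggest_categorical('learning_rate', self.param_ranges['learning_rate'])
        }

        model = xgb.XGBClassifier(**param)
        score = cross_val_score(model, self.X, self.y, cv=5, scoring='accuracy')
        return score.mean()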

Considerations

  • Single Instance Training: JFrog ML currently supports training only on a single instance and does not offer distributed training. When performing hyperparameter optimization (HPO), be aware of how long the HPO task may run and how many resources it consumes, especially when hyperparameter combinations are explored sequentially.
  • Resource Management: Monitor memory requirements to avoid running out of memory (OOM) during later stages of hyperparameter optimization. Implement checkpointing where appropriate (a sketch of one approach follows this list). Use the Resources tab in JFrog ML to track instance resource consumption.
  • Logging: Use JFrog ML logging capabilities (qwak.log_param() and qwak.log_metric()) to track your optimization process and compare results between different builds.
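
One way to implement checkpointing for an HPO run is to persist the Optuna study to a SQLite file, so that a restarted optimization resumes from completed trials instead of starting over. A minimal sketch; the study and file names are illustrative, and the file must live on storage that outlives the Build job (for example, an artifact you upload and restore) for it to help across builds:

import optuna

# Persist trial results to a local SQLite file; resume the study if it already exists
study = optuna.create_study(
    study_name='xgboost_hpo',               # illustrative name
    storage='sqlite:///optuna_study.db',    # checkpoint file
    direction='maximize',
    load_if_exists=True,
)

# Each completed trial is written to storage as soon as it finishes, so an interrupted
# optimization continues where it left off; `objective` is the function from the Optuna example
study.optimize(objective, n_trials=100)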