Backfill API

Overview

A feature set backfill lets users replace the data in a feature set, either entirely or within a specific time interval. Both online and offline data are updated, and data outside the backfill boundaries is not affected.

Functionalities

The backfill process supports the following functionalities:

  1. Replace all data in the feature set: Users can trigger a backfill process to replace all existing data within a feature set.
  2. Replace data within a specific interval: Users can specify a time interval within which data should be replaced in the feature set.
  3. Support for different data sources and transforms: Each backfill process can use its own data sources and transformation methods.
  4. Support for online and offline data: The backfill process ensures that both online and offline data are replaced as per the specified requirements.
  5. Data outside backfill boundaries is unaffected: Data outside the specified backfill boundaries is not modified and remains available at all times.
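
As a conceptual illustration only, the interval behavior described above can be modeled on an in-memory list of rows. This is not the platform's implementation: the function name `backfill_interval`, the row layout, and the choice of a half-open interval (start inclusive, stop exclusive) are all assumptions made for the sketch, since the text does not say whether the boundaries are inclusive.

```python
from datetime import datetime


def backfill_interval(existing, recomputed, start, stop):
    """Replace rows with start <= ts < stop; keep every other row as-is.

    Hypothetical helper. The half-open interval is an assumption.
    """
    # Rows outside the backfill boundaries are carried over untouched.
    kept = [row for row in existing if not (start <= row["ts"] < stop)]
    # Only recomputed rows that fall inside the boundaries are written.
    fresh = [row for row in recomputed if start <= row["ts"] < stop]
    return sorted(kept + fresh, key=lambda row: row["ts"])


existing = [
    {"ts": datetime(2024, 1, 1), "value": 1},
    {"ts": datetime(2024, 1, 2), "value": 2},
    {"ts": datetime(2024, 1, 3), "value": 3},
]
recomputed = [{"ts": datetime(2024, 1, 2), "value": 20}]

result = backfill_interval(
    existing, recomputed, datetime(2024, 1, 2), datetime(2024, 1, 3)
)
print([row["value"] for row in result])  # [1, 20, 3]
```

Only the row for 2024-01-02 is replaced; the rows on either side of the interval come through unchanged.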

CLI Command

The CLI command to trigger the backfill process is as follows:

qwak features backfill --reset-backfill [--start-time <start_time>] [--stop-time <stop_time>] [--cluster-template <cluster_template>] [--comment <comment>] --environment <environment> --feature-set <feature_set_name>
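
When the command is launched from a script, it can help to assemble the argument list programmatically. The sketch below uses only the flags shown in the synopsis above. The helper name `build_backfill_command` and the sample values (`user_features`, `production`, the comment text) are hypothetical, and treating `--reset-backfill` as an on/off switch chosen by the caller is an assumption, since the synopsis lists it without brackets.

```python
import shlex


def build_backfill_command(feature_set, environment, *, reset,
                           start_time=None, stop_time=None,
                           cluster_template=None, comment=None):
    """Return the backfill command as an argv list (hypothetical helper)."""
    argv = ["qwak", "features", "backfill"]
    if reset:
        # Assumption: --reset-backfill is a switch with no value.
        argv.append("--reset-backfill")
    # Optional flags are added only when the caller supplies a value.
    optional = [
        ("--start-time", start_time),
        ("--stop-time", stop_time),
        ("--cluster-template", cluster_template),
        ("--comment", comment),
    ]
    for flag, value in optional:
        if value is not None:
            argv += [flag, value]
    argv += ["--environment", environment, "--feature-set", feature_set]
    return argv


argv = build_backfill_command(
    "user_features",          # hypothetical feature set name
    "production",             # hypothetical environment name
    reset=False,
    start_time="2024-01-01",
    stop_time="2024-02-01",
    comment="fix null ages",
)
# shlex.join quotes values containing spaces so the line is safe to paste.
print(shlex.join(argv))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run(argv)` without shell quoting concerns.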

Command Options

  • --reset-backfill, --reset: Perform a complete reset of the feature set's data. This option deletes the existing data.
  • --start-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]: The start time (UTC) from which the feature set's data should be backfilled. Defaults to the feature set's configured backfill start time.
  • --stop-time [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]: The stop time (UTC) up to which the feature set's data should be backfilled. Defaults to the current timestamp. If the provided time is in the future, it is rounded down to the current time.
  • --cluster-template TEXT: Backfill resource configuration, expects a ClusterType size. Defaults to the feature set's resource configuration.
  • --comment TEXT: Optional comment tag line for the backfill job.
  • --environment ENVIRONMENT: JFrog ML environment.
  • --feature-set, --name TEXT: The name of the feature set for which the backfill process is to be performed. This option is required.
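
To check time values before submitting a backfill, the three accepted formats and the stop-time rule described above can be reproduced locally. This is a minimal sketch of the documented behavior, not the CLI's own code: the function names `parse_backfill_time` and `effective_stop_time` are hypothetical, and the `now` parameter exists only to make the example deterministic.

```python
from datetime import datetime, timezone

# The three formats listed for --start-time and --stop-time.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S")


def parse_backfill_time(value):
    """Parse a time string in one of the accepted formats as UTC."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unsupported time format: {value!r}")


def effective_stop_time(stop_time=None, now=None):
    """Apply the documented defaults: no value means now, future means now."""
    now = now or datetime.now(timezone.utc)
    if stop_time is None:
        return now
    return min(parse_backfill_time(stop_time), now)


fixed_now = datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
print(parse_backfill_time("2024-01-15T08:30:00"))
print(effective_stop_time("2999-01-01", now=fixed_now))  # clamped to fixed_now
print(effective_stop_time("2024-01-15", now=fixed_now))  # kept as given
```

A date-only value such as `2024-01-15` parses to midnight UTC on that day.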

A backfill updates a feature set's online and offline data together, either in full or within a chosen time interval, while data outside the backfill boundaries stays available. Use the CLI command above to trigger a backfill for a given environment and feature set.