Feature Store Overview

JFrog ML's Feature Store is a data platform for machine learning models that facilitates the discoverability, reuse and accuracy of features.

It provides a centralized method for developing features using batch or streaming data, and for serving those features instantly or retrieving them as training data. It also allows the discovery and reuse of available features, instead of recreating identical or similar ones.

The Feature Store serves the following main purposes:

Source of truth: Creating a single and discoverable source of truth for features that are to be used by machine learning models.
Feature collaboration: Allowing data scientists and machine learning engineers to share features between projects.
Training/serving skew: Systematically ensuring that features generated online match the features generated offline.

📘
Using JFrog ML Cloud
The Feature Store on JFrog ML Cloud (SaaS) supports Batch Feature Sets only. To use real-time and streaming features, use a JFrog ML hybrid deployment.

Feature definition

The JFrog ML Feature Store is built around three main concepts:

Keys: Features created via the Feature Store are calculated for a specific key. For example: user_id, transaction_id, merchant_id, etc.
Data Sources: The external systems from which the raw data for features is ingested.
Feature Sets: The operational unit of the Feature Store. A feature set is a computation definition that takes raw data as its input and outputs features. Types of feature sets include:
- Batch: Defining features based on batch data sources (e.g. Snowflake, BigQuery, etc.).
- Streaming: Defining features based on streaming data sources (e.g. Kafka).
- Real-time: Features based on data received upon an inference request.
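
The relationship between the three concepts can be illustrated with a small, self-contained Python sketch. This is not the JFrog ML SDK: the `DataSource`, `FeatureSet`, and `FeatureSetType` classes, the `total_spend` transformation, and the sample rows are all hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class FeatureSetType(Enum):
    """Hypothetical enum mirroring the three feature set types listed above."""
    BATCH = "batch"
    STREAMING = "streaming"
    REAL_TIME = "real-time"


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of an external ingestion source."""
    name: str
    kind: str  # e.g. "snowflake", "bigquery", "kafka"


@dataclass(frozen=True)
class FeatureSet:
    """Hypothetical feature set: raw rows in, features per key out."""
    name: str
    key: str  # the entity the features are calculated for
    source: DataSource
    set_type: FeatureSetType
    transform: Callable[[List[Row]], Dict[Any, Row]]


def total_spend(rows: List[Row]) -> Dict[Any, Row]:
    """Illustrative transformation: sum the amount spent per user_id."""
    totals: Dict[Any, float] = {}
    for row in rows:
        user = row["user_id"]
        totals[user] = totals.get(user, 0.0) + row["amount"]
    return {user: {"total_spend": total} for user, total in totals.items()}


transactions = DataSource(name="transactions", kind="snowflake")
user_spend = FeatureSet(
    name="user_spend",
    key="user_id",
    source=transactions,
    set_type=FeatureSetType.BATCH,
    transform=total_spend,
)

# Made-up raw rows standing in for data read from the source.
raw_rows = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": "u2", "amount": 5.0},
    {"user_id": "u1", "amount": 2.5},
]
features = user_spend.transform(raw_rows)
```

In this sketch, `features` maps each key value (`u1`, `u2`) to its computed feature values, which is the shape a feature set produces regardless of whether its source is batch, streaming, or request data.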

Feature consumption

With the JFrog ML Feature Store you can define features once, calculate them once, and reuse them at any time.

JFrog ML's Feature Store systematically ensures that there are no discrepancies between the data generated for training and the data generated for serving, because both the offline and the online stores are populated from the same single feature extraction process.

Inference: Serve a Key's up-to-date feature values from one centralized location.
Training: Keep a log of all feature values, and retrieve them for training as they were at any point in time.
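
The two consumption paths can be sketched with a toy in-memory store. This is a minimal illustration under stated assumptions, not how JFrog ML implements its stores: `ToyFeatureStore` and its `ingest`, `get_online`, and `get_as_of` methods are hypothetical names, and integer timestamps stand in for real event times. The point it shows is that a single write path feeds both the online view (latest values per key) and the offline log (full history per key).

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """Hypothetical in-memory store with an online view and an offline log."""

    def __init__(self):
        self._online = {}                   # key -> latest feature values
        self._offline = defaultdict(list)   # key -> [(timestamp, values)], sorted

    def ingest(self, key, timestamp, values):
        """Single write path: both stores receive the same computed values."""
        history = self._offline[key]
        history.append((timestamp, values))
        history.sort(key=lambda entry: entry[0])  # keep history ordered by time
        self._online[key] = history[-1][1]        # online view holds the newest

    def get_online(self, key):
        """Inference path: the key's most recent feature values."""
        return self._online.get(key)

    def get_as_of(self, key, timestamp):
        """Training path: the values that were current at `timestamp`."""
        history = self._offline.get(key, [])
        timestamps = [ts for ts, _ in history]
        idx = bisect_right(timestamps, timestamp)
        return history[idx - 1][1] if idx else None


store = ToyFeatureStore()
store.ingest("u1", 1, {"total_spend": 10.0})
store.ingest("u1", 5, {"total_spend": 12.5})

serving_values = store.get_online("u1")     # latest values, as used at inference
training_values = store.get_as_of("u1", 3)  # values as they were at time 3
before_history = store.get_as_of("u1", 0)   # no values existed yet
```

Looking up training values as of a timestamp, rather than reading the latest values, keeps each training row limited to feature values that had already been recorded at that moment.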