Storage-Based Execution
The low-level batch execution API allows you to start an execution from a specific remote path and write the results to another path.
Each file in the input path is translated into a single unit of processing (a Qwak task). Each input file is read and passed as a single chunk to the model's predict function.
Task Processing
Because each file is transformed into a single prediction request, you must ensure that the model is deployed on an instance with sufficient resources to handle the batch request.
When batch predictions are resource-heavy, consider splitting the input dataset into multiple smaller files.
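Splitting an input dataset into smaller files can be done before uploading it to the source path. A minimal sketch using only the standard library (the chunk size and column names are illustrative, not a Qwak API):

```python
import csv
import io

def split_csv(lines, rows_per_file):
    """Split a CSV (with a header row) into chunks of at most rows_per_file
    data rows. Each chunk repeats the header, so every output file remains a
    self-contained prediction task."""
    rows = list(csv.reader(lines))
    header, data = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(data), rows_per_file):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(data[start:start + rows_per_file])
        chunks.append(buf.getvalue())
    return chunks

# Example: 5 data rows split into files of at most 2 rows each -> 3 files
source = ["id,value", "1,a", "2,b", "3,c", "4,d", "5,e"]
parts = split_csv(source, rows_per_file=2)
print(len(parts))  # 3
```

Each element of `parts` would then be written to its own file in the source folder, producing one task per file.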
Execution Configuration
Parameter | Description | Default Value |
---|---|---|
Model ID [Required] | The Model ID, as displayed on the model header. | |
Build ID | The Qwak-assigned build ID. You can optionally add this to use a different build in order to perform the execution. | |
Bucket | The source and destination bucket. If you read and write to the same bucket, you can specify this, but it is not mandatory. Note: This parameter is required if 'Source Bucket' and 'Destination Bucket' are not set. | |
Source bucket | The bucket from which the input files are read, to start an execution. Note: This parameter is required if 'Bucket' is not set. | |
Destination bucket | The bucket into which the execution output files are written. Note: This parameter is required if 'Bucket' is not set. | |
Source folder [Required] | The path to the source bucket where all the inference files are located. | |
Destination Folder [Required] | The path to the destination bucket where the result files are stored. | |
Input File Type | The file types supported by Qwak. The supported formats are: CSV, Parquet and Feather. | CSV |
Output File Type | The types of the files stored by Qwak. The supported formats are: CSV, Parquet and Feather. | CSV |
Access Token Name | The name of the secret (created using our Secret Service) that contains the Access Token with permission to the source and destination buckets. | |
Access Secret Name | The name of the secret (created using our Secret Service) that contains the Access Secret with permission to the source and destination buckets. | |
Job Timeout | The job timeout, in seconds. By setting the job timeout, you will limit the execution time, and it will fail if not completed in time. | 0 - No Timeout |
File Timeout | A single file timeout, in seconds. Setting this will limit the processing time for a single file, and fail the entire execution if one of the files does not finish in time. | 0 - No Timeout |
IAM role ARN | The user-provided AWS custom IAM role. | None |
Pods | The number of k8s pods which will be used at batch inference time. Number of pods sets the maximum parallelism for an inference job. Each pod handles one or more files/tasks. This configuration takes precedence over the deployed model configuration. | The number of executors/pods defined in the deployment. |
CPU fraction | The CPU fraction allocated to the pod. The CPU resource is measured in CPU units. One CPU in Qwak is equivalent to 1 AWS vCPU, 1 GCP Core, 1 Azure vCore, or 1 Hyperthread on a bare-metal Intel processor with Hyperthreading. This configuration takes precedence over the deployed model configuration. | The number of CPUs defined in the deployment |
Memory | The amount of RAM memory (MB) to allocate to each pod. This configuration takes precedence over the deployed model configuration. | The amount of memory defined in the deployment. |
GPU Type | The GPU type to use in the batch execution. Supported options are NVIDIA K80, NVIDIA Tesla V100, and NVIDIA T4. | None |
GPU Amount | The number of GPUs available for the batch execution. Varies based on the selected GPU type. | Based on GPU type |
Parameters | A list of parameters, expressed as key-value pairs, passed with the execution request. The parameters are passed to the inference container as environment variables. | |
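As noted in the table above, execution parameters reach the inference container as environment variables. A minimal sketch of reading one inside the model's predict logic, assuming the parameter keys map directly to environment variable names (the parameter name `threshold` is a hypothetical example):

```python
import os

# Simulate the parameter Qwak would inject into the inference container.
# "threshold" is a hypothetical parameter name, not a reserved Qwak key.
os.environ.setdefault("threshold", "0.75")

def read_threshold(default="0.5"):
    # Read the execution parameter inside the predict logic, falling back
    # to a default when the execution did not set it.
    return float(os.environ.get("threshold", default))

print(read_threshold())  # 0.75, because the variable was set above
```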
When providing a custom IAM role ARN, the role's trust policy must allow the Qwak service account to assume it through the EKS cluster's OIDC provider, as in the following example:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<AWS_ACCOUNT_ID>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/<OIDC_EKS_CLUSTER_ID>"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.us-east-1.amazonaws.com/id/<OIDC_PROVIDER_EKS_CLUSTER_ID>:aud": "sts.amazonaws.com"
},
"ForAnyValue:StringEquals": {
"oidc.eks.us-east-1.amazonaws.com/id/<OIDC_PROVIDER_EKS_CLUSTER_ID>:sub": [
"system:serviceaccount:qwak:kube-deployment-captain-access"
]
}
}
}
]
}
Batch Execution
To start an execution from the SDK, use the following command:
from qwak.models.executions.logic.client import BatchJobManagerClient
from qwak.models.executions.logic.results import StartExecutionResult
from qwak.models.executions.logic.executions_config import ExecutionConfig
# The execution configuration
execution_spec = ExecutionConfig.Execution(
model_id=<model-id>,
source_bucket=<source-bucket-name>,
destination_bucket=<destination-bucket-name>,
source_folder=<source-folder-path>,
destination_folder=<destination-folder-path>,
input_file_type=<input-file-type>,
output_file_type=<output-file-type>,
access_token_name=<access_token_name>,
access_secret_name=<access-secret-name>,
job_timeout=<job-timeout>,
file_timeout=<file-timeout>,
parameters=<dictionary-of-user-provided-parameters>
)
resources_config = ExecutionConfig.Resources(
pods=<number-of-pods>,
cpus=<number-of-cpus>,
memory=<memory-amount>,
gpu_type=<gpu_type>,
gpus=<number-of-gpus>,
)
execution_config = ExecutionConfig(execution=execution_spec, resources=resources_config)
batch_job_manager_client = BatchJobManagerClient()
execution_result: StartExecutionResult = batch_job_manager_client.start_execution(execution_config)
execution_id = execution_result.execution_id
To start the same execution from the CLI:
qwak models execution start \
--model-id <model-id> \
--source-bucket <source-bucket-name> \
--source-folder <source-folder-path> \
--destination-bucket <destination-bucket-name> \
--destination-folder <destination-folder-path> \
--input-file-type <input-file-type> \
--output-file-type <output-file-type> \
--access-token-name <buckets-access-token-secret-name> \
--access-secret-name <buckets-access-secret-secret-name> \
--job-timeout <entire-job-timeout-in-seconds> \
--file-timeout <single-file-timeout-in-seconds> \
--pods <pods-count> \
--cpus <cpus-fraction> \
--memory <memory-size> \
--build-id <alternate-build-id>
Here is a simplified version with all the default values:
from qwak.models.executions.logic.client import BatchJobManagerClient
from qwak.models.executions.logic.results import StartExecutionResult
from qwak.models.executions.logic.executions_config import ExecutionConfig
# The execution configuration
execution_spec = ExecutionConfig.Execution(
model_id=<model-id>,
bucket=<bucket-name>,
source_folder=<source-folder-path>,
destination_folder=<destination-folder-path>,
access_token_name=<access_token_name>,
access_secret_name=<access-secret-name>
)
execution_config = ExecutionConfig(execution=execution_spec)
batch_job_manager_client = BatchJobManagerClient()
execution_result: StartExecutionResult = batch_job_manager_client.start_execution(execution_config)
execution_id = execution_result.execution_id
And the equivalent CLI command:
qwak models execution start \
--model-id <model-id> \
--bucket <bucket-name> \
--source-folder <source-folder-path> \
--destination-folder <destination-folder-path> \
--access-token-name <buckets-access-token-secret-name> \
--access-secret-name <buckets-access-secret-secret-name>
Batch Job Parallelism
Note that every file in the given input path is considered a task. A task is the main unit of parallelism for a batch execution job.
For example, if five pods are requested for the batch execution (whether specified by the deployment or in the batch execution job itself) and 10 files need to be processed, five files (tasks) are executed in parallel, out of the 10 tasks that comprise the batch job.
For this reason, there is no point in requesting more pods than the number of files that need to be processed.
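The relationship between pods, tasks, and total runtime can be sketched as a simple calculation (an illustration of the rule above, not a Qwak API):

```python
import math

def execution_waves(num_files, pods):
    """Number of sequential 'waves' a batch execution needs: each pod
    processes one file (task) at a time, so effective parallelism is
    capped at min(pods, num_files)."""
    effective_pods = min(pods, num_files)  # extra pods simply sit idle
    return math.ceil(num_files / effective_pods)

# 10 files on 5 pods: 5 tasks run in parallel, finishing in 2 waves.
print(execution_waves(10, 5))   # 2
# Requesting 20 pods for 10 files does not help: still one wave of 10.
print(execution_waves(10, 20))  # 1
```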
Concurrent Executions
You can run multiple executions concurrently. The only limitation is that two executions may not run with identical values for all of the following parameters:
- Model Id
- Build Id
- Source Bucket
- Source Folder
- Destination Bucket
- Destination Folder
The assumption is that running two executions with exactly the same parameters is redundant.
Next, learn about the different options for managing and monitoring an execution.