Batch Execution Management
This section describes the commands that help you manage and track the execution status of your batch models.
Getting batch execution status
To check the current status of an execution, use the CLI command below, or the equivalent Python SDK calls that follow it:
qwak models execution status --execution-id <execution-id>
from qwak.models.executions.logic.client import BatchJobManagerClient
from qwak.models.executions.logic.results import ExecutionStatusResult
batch_job_manager_client = BatchJobManagerClient()
status_response: ExecutionStatusResult = batch_job_manager_client.get_execution_status("<execution-id>")
status = status_response.status
The execution_id is returned when an execution is created, and is also visible in the UI.
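A status check returns a single snapshot, so a workflow that has to wait for an execution to finish needs to poll. The following is a minimal sketch of such a loop, not part of the SDK. The helper name `wait_for_terminal_status`, the status names in `ASSUMED_TERMINAL_STATUSES`, and the default interval and timeout are all assumptions made for illustration. Check the status values your SDK version actually returns.

```python
import time
from typing import Callable, Iterable

# Assumed terminal status names (hypothetical); verify against real status values.
ASSUMED_TERMINAL_STATUSES = frozenset({"SUCCESSFUL", "FAILED", "CANCELLED"})


def wait_for_terminal_status(
    get_status: Callable[[], str],
    terminal_statuses: Iterable[str] = ASSUMED_TERMINAL_STATUSES,
    poll_interval_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status repeatedly until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen before the deadline.
    """
    terminal = {name.upper() for name in terminal_statuses}
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = str(get_status())
        if status.upper() in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Execution still in status {status!r} after timeout")
        sleep(poll_interval_seconds)
```

To use it with the client shown above, pass a callable that wraps the status call, for example `lambda: str(batch_job_manager_client.get_execution_status("<execution-id>").status)`. This assumes the string form of `status` matches the names you pass in `terminal_statuses`.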
Cancelling a batch execution
To cancel an execution, use the CLI command below, or the equivalent Python SDK calls that follow it:
qwak models execution cancel --execution-id <execution-id>
from qwak.models.executions.logic.client import BatchJobManagerClient
batch_job_manager_client = BatchJobManagerClient()
batch_job_manager_client.cancel_execution("<execution-id>")
Using warmup
In some cases, the execution is a single step in a larger workflow orchestration. If the speed of execution is critical, use the warmup option.
The warmup option lets you allocate the resources for an execution before it starts. The resources are provisioned and kept running until the execution itself begins. This is especially relevant when a large amount of resources is needed, or when reducing the running time by even 5 minutes is critical.
# Start a warmup through BatchJobManagerClient
from qwak.clients.batch_job_management import BatchJobManagerClient
from qwak.clients.batch_job_management.executions_config import ExecutionConfig

# Execution configuration (replace the <...> placeholders with your own values)
execution_spec = ExecutionConfig.Execution(
    model_id="<model-id>",
    bucket="<bucket-name>",
    destination_bucket="<destination-bucket-name>",
    source_folder="<source-folder-path>",
    destination_folder="<destination-folder-path>",
    access_token_name="<access_token_name>",
    access_secret_name="<access-secret-name>",
    build_id="<alternate-build-id>",
)

# Warmup configuration
warmup_spec = ExecutionConfig.Warmup(
    timeout=0  # warmup timeout in seconds
)

execution_config = ExecutionConfig(execution=execution_spec, warmup=warmup_spec)

batch_job_manager_client = BatchJobManagerClient()
batch_job_manager_client.start_warmup_job(execution_config)
# Alternatively, start a warmup through BatchInferenceClient
from qwak.inference.clients import BatchInferenceClient

# You can also set the QWAK_MODEL_ID environment variable instead of passing model_id
batch_inference_client = BatchInferenceClient(model_id="<model-id>")

# Replace the <...> placeholders with your own values
batch_inference_client.warmup(
    executors=<number-of-pods>,
    cpus=<number-of-cpus>,
    memory=<memory-amount>,
    timeout=<timeout-for-warmup>,
    build_id="<alternate-build-id>",
)
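When the execution is one step in a larger workflow, the benefit of warmup comes from ordering: request the resources first, run the upstream steps while they come up, and start the execution last. The sketch below shows only that ordering. `run_with_warmup` and its three callable parameters are hypothetical names introduced for illustration and are not part of the SDK. You supply the actual warmup and execution calls.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_with_warmup(
    start_warmup: Callable[[], None],
    upstream_steps: Iterable[Callable[[], None]],
    start_execution: Callable[[], T],
) -> T:
    """Request warmup first, run upstream workflow steps, then start the execution."""
    # Ask for resources up front so they can come up while other steps run.
    start_warmup()
    # Upstream steps of the workflow (for example data preparation) run in order.
    for step in upstream_steps:
        step()
    # The execution starts last, ideally on resources that are already running.
    return start_execution()
```

For example, `start_warmup` could wrap `batch_job_manager_client.start_warmup_job(execution_config)`. A reasonable assumption is that the warmup timeout should be at least as long as the upstream steps are expected to take, so that the resources are still available when the execution starts.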
Troubleshooting
For each execution, there are two types of logs:
- Execution Report: Contains the initial request, status updates, as well as the cancel and failed requests.
- Model Logs: These are available once the execution advances to the stage during which the files are processed.
To view both log types, use the CLI command below, or the equivalent Python SDK calls that follow it:
qwak models execution report --execution-id <execution-id>
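from qwak.models.executions.logic.client import BatchJobManagerClient
from qwak.models.executions.logic.results import GetExecutionReportResult

batch_job_manager_client = BatchJobManagerClient()
execution_report: GetExecutionReportResult = batch_job_manager_client.get_execution_report("<execution-id>")

# Execution report entries (initial request, status updates, cancel and failed requests)
report_records = execution_report.records
# Logs emitted by the model once file processing has started
model_logs = execution_report.model_logs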
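Reports for long executions can be large, so it may help to filter them for lines that hint at a failure. The sketch below is one way to do that. It assumes that `records` and `model_logs` are iterables whose items convert to readable text with `str()`, which this page does not confirm. The helper `find_suspect_lines` and the marker list are hypothetical and should be adapted to the messages your model actually emits.

```python
from typing import Iterable, List, Sequence

# Assumed markers (hypothetical); adjust to the wording of your own logs.
ASSUMED_ERROR_MARKERS = ("error", "exception", "traceback", "failed")


def find_suspect_lines(
    lines: Iterable[object],
    markers: Sequence[str] = ASSUMED_ERROR_MARKERS,
) -> List[str]:
    """Return the lines whose text contains any marker, ignoring case."""
    lowered_markers = tuple(marker.lower() for marker in markers)
    suspects = []
    for line in lines:
        text = str(line)
        if any(marker in text.lower() for marker in lowered_markers):
            suspects.append(text)
    return suspects
```

Under the same assumption, `find_suspect_lines(execution_report.model_logs)` would return only the model log lines worth reading first.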
In some cases, you might want to output logs from the model itself to better understand how it processes the data. To make these logs available, use the JFrog ML Logger in your model's code:
from qwak.tools.logger import get_qwak_logger
logger = get_qwak_logger()
logger.info("your message here")
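If the same model code also runs outside the platform, for example in local tests where the `qwak` package is not installed, a fallback to the standard library logger keeps it importable. This is a minimal sketch under that assumption. The logger name `"batch-model"` and the function `summarize_batch` are hypothetical and exist only to show where a log call might go.

```python
import logging

try:
    # Preferred: the logger shown above, so messages appear in the model logs.
    from qwak.tools.logger import get_qwak_logger

    logger = get_qwak_logger()
except ImportError:
    # Fallback for environments without the qwak package (assumption: local runs).
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("batch-model")


def summarize_batch(rows: list) -> dict:
    """Count total and empty rows in a batch and log the summary."""
    summary = {"rows": len(rows), "empty": sum(1 for row in rows if not row)}
    logger.info(f"Processing batch: {summary['rows']} rows, {summary['empty']} empty")
    return summary
```

Logging a short summary per batch rather than one line per row keeps the model logs readable in the execution report.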