This topic describes the runtime options available on the gene analysis platform and how they affect analysis results.
Call-Caching
Call-Caching allows the gene analysis platform to reuse the results of a previous job execution if the new job is identical to the previous one. This feature avoids re-computation, which saves time and money.
In simple terms, after a job completes successfully, if a new job has the same inputs and execution command, the platform skips the execution and returns the existing result.
Hit rule: The platform calculates a hash value from a job's input parameters, runtime properties (such as CPU, memory, Docker, and software), and command line. A Call-Caching hit occurs if this hash value matches that of a historical job record on the platform and the record is still within its retention period.
Retention period: The retention period for records of successfully completed jobs is 30 days. If you manually delete intermediate results, the corresponding job record becomes invalid and cannot be used for Call-Caching.
Hit result: If a job generated by a task triggers a Call-Caching hit, the job skips execution and reuses the output results from the historical record. The platform does not create a backend job, consume resources, or incur charges.
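The hit rule, retention period, and hit result above can be sketched as follows. This is only an illustration: the platform's actual hash algorithm and record format are not documented here, so the use of SHA-256 over canonical JSON, the function names cache_key and find_hit, and the record fields are all assumptions made for this example.

```python
import hashlib
import json
from datetime import datetime, timedelta

# Retention period for successful job records, as stated above.
RETENTION = timedelta(days=30)


def cache_key(inputs, runtime, command):
    """Hash inputs, runtime properties, and command line into one digest.

    Assumption: SHA-256 over key-sorted JSON; the platform's real
    algorithm may differ.
    """
    payload = json.dumps(
        {"inputs": inputs, "runtime": runtime, "command": command},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_hit(key, history, now):
    """Return the reusable outputs for key, or None on a miss.

    history is a hypothetical dict: key -> record with the fields
    "finished_at" (datetime), "outputs", and "intermediates_deleted" (bool).
    """
    record = history.get(key)
    if record is None:
        return None  # no identical job has succeeded before
    if record["intermediates_deleted"]:
        return None  # record was invalidated by deleting intermediate results
    if now - record["finished_at"] > RETENTION:
        return None  # record is past its retention period
    return record["outputs"]
```

In this sketch, changing any single ingredient (for example, the CPU count in runtime) produces a different key, so the job would run normally instead of reusing an earlier result.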
Delete intermediate results
The gene analysis platform saves the intermediate results of your tasks to your OSS bucket. These results are used as input files for subsequent analysis steps.
After a task completes, intermediate results from the workflow are retained in addition to the final outputs defined by the application. These files can be used to resume failed tasks or as job inputs for Call-Caching hits. However, retaining these files consumes storage space and increases storage costs.
When you submit a task, you can select the "Delete intermediate results" option. After the task completes, the gene analysis platform deletes all intermediate result files and keeps only the output files defined in the application, along with basic script and log files. This saves storage space.
If you choose to delete intermediate results, the task cannot be resumed if it is interrupted. The jobs within the task also cannot trigger Call-Caching hits.
Output file location
All files from your tasks are uploaded to a specified OSS storage location. By default, the results of tasks in a workspace are saved to the following path in the OSS bucket attached to that workspace:
oss://<OSS Bucket attached to the workspace>/analysis/
The directory structure for all generated files is as follows:
oss://<OSS Bucket attached to the workspace>/analysis/
|- <RunID>/<WorkflowName>/<WorkflowID>/
    |- call-xxxx
    |- call-xxxx
    |- call-hc
        |- bcs-stderr
        |- bcs-stdout
        |- rc
        |- script
        |- stderr
        |- stdout
        |- test.g.vcf.gz
        |- test.vcf.gz
        |- worker
For a task, you can specify any OSS storage location with read and write permissions as the output file location, such as oss://bucket/dir/.
All files in the output location follow the same directory structure.
RunID: The unique ID of a task on the gene analysis platform.
WorkflowName: The name of the workflow in the Workflow Definition Language (WDL) application used by the task.
WorkflowID: The UUID for the workflow execution, automatically generated by the execution engine.
call-xxxx: The name of a call within the workflow.
stderr: The standard error of the task.
stdout: The standard output of the task.
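As a rough illustration of the layout described above, the helper below assembles the OSS path of a file produced by one call. The function name call_output_path and the sample values are hypothetical, and the sketch assumes that per-call files such as stdout sit directly inside the call directory.

```python
def call_output_path(base, run_id, workflow_name, workflow_id, call_name, filename):
    """Join the path components in the order shown in the directory structure.

    base is the output file location, for example "oss://bucket/dir/".
    """
    parts = [base.rstrip("/"), run_id, workflow_name, workflow_id, call_name, filename]
    return "/".join(parts)


# Hypothetical values, for illustration only.
path = call_output_path(
    "oss://bucket/dir/", "run-001", "germline", "wf-uuid", "call-hc", "stdout"
)
print(path)
```

Because all output locations follow the same structure, the same helper applies whether base is the default analysis/ directory of the workspace bucket or a custom location.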