This topic describes applications on the genomics analysis platform and how to create, edit, and run them in a workspace.
Genomic data analysis involves using various bioinformatics tools to perform a series of complex calculations. For example, analyzing data for genetic diseases requires multiple steps, such as data quality control, sequence alignment, variant detection, and variant annotation.
Figure 1. Genomic data analysis workflow (Source: Chinese Journal of Medical Genetics)
The scripts that chain these steps together are known as workflows or pipelines. Simply put, a workflow can be a shell or Python script that executes multiple commands to complete a specific genomic data analysis task. Most analysis workflows are designed to run locally, which often ties their software dependencies and workflow logic to a specific execution environment, such as a High-Performance Computing (HPC) cluster. As a result, the workflows are difficult to migrate and their results are hard to reproduce.
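As a rough illustration of such a script, the following minimal Python sketch runs the four example steps in order. The stage names come from the example above. The function names (`build_command`, `run_pipeline`), the sample ID, and the commands themselves are hypothetical placeholders: each stage only starts a child Python process that reports its name, where a real workflow would call a bioinformatics tool.

```python
import subprocess
import sys

# Stage names taken from the genetic disease example; order matters.
STAGES = [
    "quality_control",
    "sequence_alignment",
    "variant_detection",
    "variant_annotation",
]

# Placeholder program: prints "<stage> finished for <sample>".
# A real workflow would invoke an actual analysis tool here instead.
PLACEHOLDER = "import sys; print(sys.argv[1], 'finished for', sys.argv[2])"


def build_command(stage, sample):
    """Return the (hypothetical) command line for one stage."""
    return [sys.executable, "-c", PLACEHOLDER, stage, sample]


def run_pipeline(sample):
    """Run every stage in order and stop at the first failure."""
    messages = []
    for stage in STAGES:
        result = subprocess.run(
            build_command(stage, sample),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if a stage fails
        )
        messages.append(result.stdout.strip())
    return messages


if __name__ == "__main__":
    for message in run_pipeline("sample01"):
        print(message)
```

Even in this small sketch, the interpreter path, the tool calls, and the execution order are hard-coded into one script, which is the kind of coupling to a local environment that makes such workflows hard to migrate.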
Workflow description languages, such as Snakemake, CWL, WDL, and Nextflow, were created to solve these problems. Each is a Domain-Specific Language (DSL) with its own syntax and rules. They help users write standardized, portable, and reproducible bioinformatics workflows easily and efficiently.
The Alibaba Cloud genomics analysis platform uses Workflow Description Language (WDL) as its standard for defining applications. WDL is supported by the Global Alliance for Genomics and Health (GA4GH) community. The platform provides production-grade support for developing, testing, and running these applications.
A unified workflow language standard enables you to import public WDL workflows from the research community or use public applications from platform developers. This approach lowers the barrier to bioinformatics analysis and helps you complete genomic data analysis tasks easily and efficiently.