What is Genomics Computing Platform?

Alibaba Cloud Genomics Computing Platform is a user-friendly, one-stop platform for gene analysis. It provides end-to-end core capabilities, including data transmission, storage management, and bioinformatics analysis. The platform supports the open standards of the Global Alliance for Genomics and Health (GA4GH) community and integrates a rich set of workflow tools and public datasets. This helps you securely and efficiently process genomic data of any scale with agility and elasticity.

Genomics Computing Platform provides a complete, serverless gene computing service that is simple, cost-effective, flexible, reliable, and highly scalable. The platform combines Alibaba Cloud's massive storage and computing resources, a user network that spans the upstream and downstream of sequencing, and an ecosystem of data and application partners. It is widely used across the entire genomics data analysis process, from sample to report, and can serve as the computing foundation for application systems in genomics research and clinical practice. For more information, see Benefits.

Service architecture

The service architecture of Genomics Computing Platform is as follows:

[Figure: Genomics Computing Platform product architecture diagram]

  • Your genomic data is encrypted and stored in your own Object Storage Service (OSS) bucket. You grant Genomics Computing Platform access to it only for the duration of a computation.

  • A compute-side cache accelerates file access, so compute jobs can read and write OSS files directly and parallel tasks avoid I/O and throughput bottlenecks.

  • The large-scale parallel scheduler runs jobs in container or virtual machine environments and supports several types of heterogeneous computing to accelerate gene analysis.

  • The workflow execution engine supports workflow languages used in the GA4GH community, such as WDL and CWL, so existing workflows run without migration or modification.

  • The platform provides user workspaces with fine-grained resource and permission control.

  • The platform provides many out-of-the-box public applications, including Sentieon and GATK, for users worldwide.

Features

Genomics Computing Platform provides the following features:

  • Genomic data management

    Genomic data is stored in your own Alibaba Cloud OSS bucket and is encrypted both in transit and at rest, with 99.999999999% data durability. For disaster recovery, you can also use versioning, deployment across three availability zones (AZs), and cross-region replication.

    The platform supports a range of OSS transfer tools, including command-line tools, graphical clients, and web-based interfaces. You can use any of them to upload genomic data to your workspace for analysis.

    The platform provides entity tables to help you organize and manage genomic data. An entity table stores, in a structured format, the OSS files associated with each biological sample together with related information such as sample and experiment details. This structure makes the data easy to retrieve, display, and use as input for subsequent batch analysis tasks.

  • Bioinformatics workflow development

    Genomics Computing Platform primarily supports the Workflow Definition Language (WDL), an open workflow standard widely used in the GA4GH community. You can develop and test workflows locally, and then run them on the platform for large-scale production analysis. This keeps your applications standardized, portable, reproducible, and able to run in multiple execution environments.

    The platform provides an editing environment for your bioinformatics workflows that supports version management and modular reuse. It also provides resources for WDL workflow development, including public tool images, third-party commercial software, and public applications, which you can use as building blocks for your own analysis applications.

  • Genomic computing tasks

    After you submit a WDL analysis application with its OSS input files and runtime parameters, the platform runs the analysis task to completion automatically. It provides workflow management features such as intelligent scheduling, error retry, and interruption recovery, as well as basic developer features: status queries, performance monitoring, and log collection for running jobs.
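
The relationship between entity tables and batch analysis tasks can be illustrated with a short Python sketch. It is not the platform's API: it only shows how structured sample rows that point at OSS files could be turned into one WDL inputs object per sample. The keys follow the common WDL inputs-JSON convention of "<workflow name>.<input name>". The workflow name `GermlineCalling`, the input names, the bucket `example-bucket`, and the sample IDs are all hypothetical, and how the platform actually ingests such inputs is not described on this page.

```python
import json

# Hypothetical entity table: one row per sample, with OSS object paths as columns.
# The bucket name, column names, and sample IDs are illustrative only.
SAMPLES = [
    {
        "sample_id": "S001",
        "fastq_r1": "oss://example-bucket/raw/S001_R1.fastq.gz",
        "fastq_r2": "oss://example-bucket/raw/S001_R2.fastq.gz",
    },
    {
        "sample_id": "S002",
        "fastq_r1": "oss://example-bucket/raw/S002_R1.fastq.gz",
        "fastq_r2": "oss://example-bucket/raw/S002_R2.fastq.gz",
    },
]

REQUIRED_COLUMNS = ("sample_id", "fastq_r1", "fastq_r2")


def build_inputs(row, workflow="GermlineCalling"):
    """Map one entity-table row to a WDL inputs object.

    Keys use the "<workflow name>.<input name>" convention; the workflow
    and input names are hypothetical placeholders.
    """
    return {f"{workflow}.{column}": row[column] for column in REQUIRED_COLUMNS}


def build_batch(rows, workflow="GermlineCalling"):
    """Return one inputs object per row, skipping rows with missing values."""
    return [
        build_inputs(row, workflow)
        for row in rows
        if all(row.get(column) for column in REQUIRED_COLUMNS)
    ]


if __name__ == "__main__":
    # Print the batch as JSON, one inputs object per sample.
    print(json.dumps(build_batch(SAMPLES), indent=2))
```

Keeping the sample-to-file mapping in a table and generating inputs from it means that adding a sample to a batch is a one-row change rather than a hand-edited inputs file.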

Pricing

When you use Genomics Computing Platform, the following resources are billable:

  • Compute resources: Resources such as CPU, memory, and disk that are consumed while your compute jobs run.

  • Software and algorithms: Third-party commercial software, if your compute jobs use it.

For more information, see Billing overview.

Related concepts

To learn about the fundamentals of Object Storage Service, including its definition, features, and applications, see What is Object Storage Service.