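This topic answers frequently asked questions about the Gene Analysis Platform (EasyGene), including available applications, supported workflow languages, analysis acceleration, cross-account OSS access, troubleshooting, and running many tasks concurrently.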
Q: Are there ready-to-use analysis applications available?
A: Yes. EasyGene provides out-of-the-box public applications in the application repository. You can install these applications into a workspace to use them. These applications are provided by our partners and the open source community. They are designed to cover mainstream analysis scenarios in the industry, such as whole genome analysis, whole exome analysis, and tumor analysis. If these applications do not meet your business needs, contact us to discuss developing new ones.
Q: Does the platform support other workflow language standards besides WDL?
A: Currently, EasyGene primarily supports Workflow Description Language (WDL). Support for other languages, such as CWL and Nextflow, is on our product roadmap. The bioinformatics field has no single standard workflow language, and many users work with other standards, such as Nextflow, CWL, or Snakemake. Alibaba Cloud EasyGene follows industry standards to reduce the cost of learning and of migrating applications, and it aims to build an open application ecosystem together with the open source community and bioinformatics developers. WDL is a workflow language standard supported by the Global Alliance for Genomics and Health (GA4GH). It offers complete solutions for both local and cloud execution that cover development, testing, and analysis. For these reasons, WDL is our current choice and part of our long-term support plan. We welcome your feedback on support for other languages.
Q: How does the platform help users accelerate the analysis of genetic data?
A: The Gene Analysis Platform offers several ways to accelerate computational analysis: 1) Compute caching: For large data volumes, the platform optimizes I/O by enabling streaming access to input files and by caching public reference files on the compute side. 2) Large-scale parallel computing: The platform removes the limits of local compute resources and supports the Scatter-Gather pattern, in which a workflow step is split into shards that run in parallel and whose results are then merged. 3) Accelerated hardware and algorithms: The platform provides acceleration options such as Sentieon software, FPGAs, and GPUs for time-consuming computation steps. These methods can be combined with user scripts.
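The Scatter-Gather pattern can be illustrated with a small local sketch. In WDL itself the pattern is written with a `scatter` block, and the outputs of calls inside the block are collected as arrays; the Python below only mimics that idea on one machine. The chromosome lengths, the 100-base chunk size, and `process_region` are hypothetical placeholders, not platform values.

```python
from concurrent.futures import ThreadPoolExecutor


def split_regions(chrom_lengths, chunk_size):
    """Scatter step: split each chromosome into fixed-size intervals."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((chrom, start, min(start + chunk_size, length)))
    return regions


def process_region(region):
    """Hypothetical per-shard job (for example, variant calling on one
    interval). Here it only returns the interval length."""
    chrom, start, end = region
    return end - start


def scatter_gather(chrom_lengths, chunk_size, max_workers=4):
    regions = split_regions(chrom_lengths, chunk_size)
    # Scatter: run the shards in parallel; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(process_region, regions))
    # Gather: merge the per-shard outputs into a single result.
    return len(regions), sum(partial_results)


print(scatter_gather({"chr1": 250, "chr2": 120}, chunk_size=100))  # (5, 370)
```

On the platform, each shard would run as its own task with its own compute resources, which is what lets the pattern go beyond the limits of a single local machine.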
Q: Does the Gene Analysis Platform support cross-account access to OSS resources?
A: Yes. EasyGene lets you access OSS resources across accounts. To do this, the resource owner must grant the required permissions to the account that needs access.
When you grant access to another account, specify the following principal in the bucket authorization policy: `arn:sts::123456789:assumed-role/aliyuneasygenedefaultrole/*`. Replace `123456789` with the ID of the account that is granted access to the bucket.
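As a sketch of where that principal goes, the function below builds a bucket policy document in the general OSS bucket policy shape (`Version`, `Statement`, `Effect`, `Action`, `Principal`, `Resource`). The owner account ID `111122223333`, the bucket name `examplebucket`, and the read-only action list are assumptions for illustration; grant only the actions your workflows actually need and check the result against the OSS bucket policy documentation before applying it.

```python
import json


def build_cross_account_policy(bucket_owner_uid, bucket, grantee_uid,
                               actions=("oss:GetObject", "oss:ListObjects")):
    """Build a bucket policy that lets EasyGene tasks of `grantee_uid`
    access `bucket`. The default action list is an assumption."""
    # Principal format taken from the EasyGene FAQ; only the account ID varies.
    principal = f"arn:sts::{grantee_uid}:assumed-role/aliyuneasygenedefaultrole/*"
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Principal": [principal],
                # Bucket-level resource (for listing) plus all objects in it.
                "Resource": [
                    f"acs:oss:*:{bucket_owner_uid}:{bucket}",
                    f"acs:oss:*:{bucket_owner_uid}:{bucket}/*",
                ],
            }
        ],
    }


policy = build_cross_account_policy("111122223333", "examplebucket", "123456789")
print(json.dumps(policy, indent=2))
```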
Q: How do I troubleshoot common task errors on the Gene Analysis Platform?
A: To troubleshoot a failed task on the Gene Analysis Platform, follow these steps:
Check the error message on the task page to locate the error and identify its cause.
If the error message on the page does not help you identify the cause, check the task's stdout, stderr, and redirected output files for more detailed error messages.
If the cause of the error is still unclear, check memory and disk usage on the performance monitoring page. If memory or disk usage approaches 100% or climbs rapidly before the task ends, increase the compute resources and retry the task. Note: Because monitoring data is sampled at intervals, it might not show 100% usage even when a task fails because resources were exhausted.
If these steps do not help you find the cause, contact the product team or submit a ticket.
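The log check and the resource check above can be partly scripted. This is a minimal sketch, assuming you have saved the task's stdout or stderr as text and copied memory or disk usage samples (in percent, oldest first) from the performance monitoring page into a list. The keyword list and both thresholds are assumptions, not values defined by the platform.

```python
# Keywords are assumptions and must be lowercase; extend them with
# messages that your own tools print.
ERROR_KEYWORDS = ("error", "exception", "killed", "out of memory", "no space left")


def find_error_lines(log_text, keywords=ERROR_KEYWORDS):
    """Return (line_number, line) pairs from stdout/stderr text whose
    lowercase form contains any keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line))
    return hits


def looks_resource_exhausted(usage_percent, near_full=90.0, sharp_rise=30.0):
    """Heuristic on usage samples (percent, oldest first).

    Flags the task if usage came close to 100% or rose sharply between the
    last two samples. Both thresholds are illustrative assumptions.
    """
    if not usage_percent:
        return False
    if max(usage_percent) >= near_full:
        return True
    # Samples are taken at intervals, so a task killed by exhaustion may never
    # show 100%; a steep final rise is treated as a warning sign as well.
    return len(usage_percent) >= 2 and usage_percent[-1] - usage_percent[-2] >= sharp_rise
```

For example, `looks_resource_exhausted([10, 15, 50])` returns `True` because of the jump in the last sample, which would suggest retrying with more memory or disk.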
To make future troubleshooting easier, add sufficient logging to your task execution process so that you can quickly locate the cause of errors.
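One way to add that logging is to wrap each command in a task script so that the step name, command line, duration, and exit code all end up in stderr. This is a minimal sketch using only the Python standard library; `run_step` is a hypothetical helper, not part of any EasyGene SDK.

```python
import logging
import subprocess
import sys
import time

# Log to stderr so the messages appear in the task's stderr output.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("task")


def run_step(name, cmd):
    """Run one command, logging what ran, how long it took, and its exit code.
    Raises RuntimeError on a non-zero exit so the task fails visibly."""
    log.info("step=%s start cmd=%s", name, " ".join(cmd))
    started = time.monotonic()
    result = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.monotonic() - started
    if result.returncode != 0:
        log.error("step=%s failed rc=%d after %.1fs stderr=%s",
                  name, result.returncode, elapsed, result.stderr.strip())
        raise RuntimeError(f"step {name} failed with exit code {result.returncode}")
    log.info("step=%s done rc=0 after %.1fs", name, elapsed)
    return result.stdout
```

Because the failing command's own stderr is copied into the log line, the first error is usually visible without opening separate output files.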
Q: How can I improve the efficiency of running many tasks concurrently?
A: For each task, EasyGene prepares machine resources and pulls a Docker image. When many tasks run concurrently, preparing resources and pulling images at the same time can create a scheduling bottleneck, which reduces efficiency and increases analysis costs. To improve the performance of concurrent tasks, consider the following suggestions:
Merge short-running tasks so that each task runs for more than 20 minutes. Preparing resources and pulling Docker images takes time, so submitting many short tasks, such as tasks that run for less than 10 minutes, forces the platform to repeatedly prepare and release resources, which significantly degrades scheduling performance. Merging these short tasks improves efficiency and reduces costs. When you merge tasks, you must also rebuild the relevant Docker images so that each image contains the tools that the merged steps require.
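Deciding which short tasks to merge can be sketched as a greedy grouping by estimated runtime. This assumes you have a rough runtime estimate in minutes for each task; the task names and estimates are hypothetical, and the 20-minute target comes from the guidance above.

```python
def merge_short_tasks(tasks, min_minutes=20):
    """Greedily group (name, estimated_minutes) tasks, in submission order,
    so that each group runs for at least `min_minutes`. Trailing tasks that
    cannot reach the target on their own are folded into the last group."""
    groups, current, total = [], [], 0
    for name, minutes in tasks:
        current.append(name)
        total += minutes
        if total >= min_minutes:
            groups.append(current)
            current, total = [], 0
    if current:
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


print(merge_short_tasks([("a", 5), ("b", 8), ("c", 9), ("d", 25), ("e", 4)]))
# [['a', 'b', 'c'], ['d', 'e']]
```

Each resulting group would then run as one task in one container, which is why the Docker image for that task needs every tool its merged steps use.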
Submit tasks at intervals instead of submitting many tasks at once. Gene sequencing data is processed in batches, so analysis jobs are often submitted in batches as well. Submitting many tasks at once causes a sudden spike in resource demand that degrades scheduling performance. Instead of waiting for all data to finish uploading, submit analysis tasks in batches as the data is uploaded. This approach also shortens the time required to obtain results.
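Staggered submission can be sketched as splitting samples into batches and pausing between them. `submit_fn` is a hypothetical placeholder for however you submit one analysis task (EasyGene SDK or console automation); the 60-second default pause is an assumption, and the default batch size of 100 mirrors the per-batch limit given for the batch upload method in this topic.

```python
import time


def submit_in_batches(samples, submit_fn, batch_size=100, interval_seconds=60.0,
                      sleep=time.sleep):
    """Submit samples in batches of at most `batch_size`, pausing between
    batches so that resource demand ramps up gradually. Returns the number
    of batches submitted. `sleep` is injectable to make testing easy."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    for index, batch in enumerate(batches):
        for sample in batch:
            submit_fn(sample)
        # No pause is needed after the final batch.
        if index < len(batches) - 1:
            sleep(interval_seconds)
    return len(batches)
```

With 250 samples and the defaults, this submits three batches (100, 100, and 50 samples) with two pauses in between.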
You can use either of the following methods to submit analysis tasks while uploading data: 1) After the sequencing data is processed and split, use the OSS software development kit (SDK) locally to upload it sample by sample, and use the EasyGene SDK to submit each sample's analysis task immediately after its upload completes. This method automates data upload and analysis, provides the best scheduling performance, and delivers results in the shortest time. 2) After the sequencing data is processed and split, upload it in batches with a tool such as ossutil. After each batch upload completes, submit the analysis tasks with the EasyGene SDK or in the console. Each batch should contain no more than 100 samples.
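The first method (upload one sample, then submit it right away) can be sketched as a loop over two injected functions. Both are hypothetical placeholders: `upload_fn` might wrap an OSS Python SDK call such as `oss2.Bucket.put_object_from_file`, and `submit_fn` stands in for whichever EasyGene SDK call submits a run, which is not shown here because its exact API is not given in this topic.

```python
from typing import Callable, Iterable, List


def upload_then_submit(samples: Iterable[str],
                       upload_fn: Callable[[str], str],
                       submit_fn: Callable[[str, str], str]) -> List[str]:
    """For each sample, upload its data and immediately submit its analysis
    task, instead of waiting for every sample to finish uploading.

    upload_fn(sample) -> location of the uploaded data (hypothetical)
    submit_fn(sample, location) -> job ID (hypothetical)
    """
    job_ids = []
    for sample in samples:
        data_location = upload_fn(sample)
        job_ids.append(submit_fn(sample, data_location))
    return job_ids
```

Because submission follows each upload, the first analyses start while later samples are still uploading, and submissions are naturally spread out over the upload period.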
If performance is still unsatisfactory after you apply these optimizations, or if you have a short-term need to run many analysis tasks, contact the product team or submit a ticket before you submit the tasks.