LucaOne User Guide


This topic describes how to use LucaOne, which is the result of a research collaboration between Alibaba Cloud and scientific institutions.

Use the LucaOne service through the GUI

LucaOne endpoint

Open the page:

Procedure

Step 1. Upload a file

  • Click the file upload button and select a local file.

  • The supported file formats are CSV and FASTA.

    • If you use the CSV format, you must specify the index for the ID (id_idx) and the sequence (seq_idx).

    • If you use the FASTA format, you do not need to specify these indexes. Proceed to the next step.

  • A single task supports a maximum of 500 sequences. If you submit more than 500 sequences, the system reports an error. To process more sequences, you must submit them in batches.
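The 500-sequence limit means that larger FASTA files have to be split before upload. The following is a minimal sketch of one way to do that with the Python standard library. The helper names (read_fasta, batches, to_fasta) are hypothetical and not part of LucaOne, and the parser assumes plain FASTA text in which each record starts with a ">" header line.

```python
from typing import Iterator, List, Tuple

MAX_SEQS_PER_TASK = 500  # limit stated in this guide

Record = Tuple[str, str]  # (header text, sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records: List[Record], size: int = MAX_SEQS_PER_TASK) -> Iterator[List[Record]]:
    """Yield consecutive slices that hold at most `size` records each."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Each batch can then be written to its own file with to_fasta and submitted as a separate task.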

Step 2. Configure parameters

The following table describes the parameters.

Parameter Name

Description

Notes

seq_type

Specify the sequence type:

  • gene: Gene (DNA or RNA)

  • prot: Protein

embedding_type

The sequence embedding method. The output can be a matrix (seq_len × 2560 or (seq_len + 2) × 2560) or a 2560-dimensional vector.

  • matrix: If you select matrix, you must specify whether to enable matrix_add_special_token.

  • vector: If you select vector, you must also select a vector_type.

Recommendations:

1) For analyses such as clustering on a sequence set, embed the sequences as vectors. The mean and max options are preferred.

2) For downstream AI model building, such as standard classification or regression tasks, use the matrix option. Use the matrix as the input for the downstream model and use a parameterized pooling method in the model.

vector_type

Vector type:

  • mean: The average of each column in the matrix.

  • max: The maximum value of each column in the matrix.

  • cls: The vector of the special start token [CLS] in the matrix.

matrix_add_special_token

Specifies whether the embedding matrix includes vectors for special tokens.

If this option is enabled, the embedding matrix includes special token vectors at the beginning and end, so the number of rows in the matrix is sequence length + 2. For example, the sequence ATCGATCG is embedded as [CLS]ATCGATCG[SEP].

If this option is disabled, the embedding matrix does not include special token vectors.

embedding_complete

If this option is enabled, the embedding is computed over the entire sequence.

If this option is disabled, you must select a truncation type and length:

trunc_type: The truncation direction.

  • right: From right to left.

  • left: From left to right.

truncation_seq_length: The sequence truncation length. This must be a positive integer.
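To make the relationship between the matrix output and the three vector_type options concrete, the following sketch pools a toy matrix in pure Python. It is an illustration only, not the service's implementation. The toy matrix has 3 columns instead of 2560, and the choice to skip the special-token rows when computing mean and max is an assumption that this guide does not confirm.

```python
from typing import List

Matrix = List[List[float]]


def strip_special_tokens(matrix: Matrix) -> Matrix:
    """Drop the first and last rows, which hold the special-token vectors."""
    return matrix[1:-1]


def pool(matrix: Matrix, vector_type: str) -> List[float]:
    """Pool a (seq_len + 2) x dim matrix into a single dim-sized vector."""
    if vector_type == "cls":
        # The vector of the special start token is the first row.
        return list(matrix[0])
    # Assumption: mean and max are taken over the sequence rows only.
    columns = list(zip(*strip_special_tokens(matrix)))
    if vector_type == "mean":
        return [sum(col) / len(col) for col in columns]
    if vector_type == "max":
        return [max(col) for col in columns]
    raise ValueError("vector_type must be 'mean', 'max', or 'cls'")


# Toy matrix for a 2-residue sequence: start token row, 2 sequence rows, end token row.
toy = [
    [9.0, 9.0, 9.0],
    [1.0, 2.0, 3.0],
    [3.0, 6.0, 1.0],
    [0.0, 0.0, 0.0],
]
print(pool(toy, "mean"))  # [2.0, 4.0, 2.0]
print(pool(toy, "max"))   # [3.0, 6.0, 3.0]
print(pool(toy, "cls"))   # [9.0, 9.0, 9.0]
```

A downstream model that takes the matrix as input can replace these fixed rules with a learned pooling layer, which is the parameterized pooling that the recommendations refer to.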

Step 3. Submit the task

Click Submit to run the task. You can submit only one task at a time. To run tasks in batches, you must use API calls.

Step 4. View and download the results

After you submit the task, a record of the task appears in the generation history.

If the task is running, its progress is displayed.

You can close the page or switch to another page while the task is running. However, you cannot submit another task until the current one is complete.

When the task is complete, the status changes to Completed.

  • After the task is complete, you can download the model output from the Task Run Result section at the bottom of the task card.

  • If the task fails, you can view the cause of the error.

    • If the task fails due to invalid input parameters, reconfigure the parameters and submit the task again.

    • If the task fails for other reasons, such as network issues, you can use the one-click resubmit feature.

Call LucaOne using an API

Prerequisites

  1. You have used the LucaOne service through the GUI (endpoint) and successfully generated a result.

  2. You have integrated with the Model Studio software development kit (SDK). For more information, see Legacy SDK Reference.

  3. You have obtained an AccessKey pair for authentication. For more information, see API authorization.

Submit a task

For more information about the API for submitting tasks, see Submit a workflow task - comfy_prompt.

Request body format

Name

Type

Required

Description

workflow_id

string

Fixed value:

01jfyvw0vmvhewzdxk9jqgznb5

The workflow ID. This is a fixed value that you can copy from the workflow management page.

alias_id

string

Fixed value:

main

The workflow alias. An alias acts as a pointer to a specific version. You can use aliases to easily perform operations such as publishing and rollbacks.

inputs

object

Required

The format of the inputs object depends on the specific workflow. It corresponds to the field mapping that you configure when you publish the workflow. The fields listed below are the inputs of the LucaOne workflow.

input_file

string

Required

The file that contains the protein or gene sequences. The CSV and FASTA formats are supported.

seq_type

string

Required

The sequence type. Valid values: gene (DNA or RNA) and prot (protein).

id_idx

integer

Required if input_file is in CSV format.

The index of the column that contains the sequence ID. The index starts from 0.

seq_idx

integer

Required if input_file is in CSV format.

The index of the column that contains the sequence. The index starts from 0.

embedding_type

string

The output format of the sequence embedding. Valid values: matrix and vector.

  • matrix: If you select matrix, you must specify whether to enable matrix_add_special_token.

  • vector: If you select vector, you must also select a vector_type.

Recommendations:

1) For analyses such as clustering on a sequence set, embed the sequences as vectors. The mean and max options are preferred.

2) For downstream AI model building, such as standard classification or regression tasks, use the matrix option. Use the matrix as the input for the downstream model and use a parameterized pooling method in the model.

vector_type

string

Required when embedding_type is set to vector.

The method for pooling the sequence embedding matrix into a vector. Valid values: mean, max, and cls.

trunc_type

string

If the sequence exceeds the maximum length, it is truncated. Valid values: right and left.

truncation_seq_length

integer

The maximum sequence length after truncation, excluding [CLS] and [SEP]. The parameter itself has no upper limit. The practical limit is the GPU memory available for inference.

matrix_add_special_token

integer

Specifies whether the embedding matrix includes the [CLS] and [SEP] vectors.

embedding_complete

Specifies whether to embed the entire sequence. When embedding_complete is set, truncation_seq_length is ignored. If there is not enough GPU memory to run inference on the entire sequence at once, the sequence is embedded in segments. If you do not set this parameter, the sequence is instead repeatedly truncated to 95% of its length until it fits into the available GPU memory.

Sample request body

{
    "workflow_id": "01jfyvw0vmvhewzdxk9jqgznb5",
    "alias_id": "main",
    "inputs": {
        "input_file": "https://example.com/example.csv",
        "seq_type": "prot",
        "id_idx": 0,
        "seq_idx": 1,
        "embedding_type": "vector",
        "vector_type": "cls",
        "embedding_complete": true
    },
    "randomise_seeds": true
}
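The conditional requirements in the table (id_idx and seq_idx for CSV input, vector_type for vector output) are easy to miss when a request body is assembled by hand. The following sketch builds the body and checks those rules first. The function name build_request_body is hypothetical, detecting CSV input from the file extension is an assumption, and the sketch only builds the JSON document. Sending it is left to the SDK call described in Submit a workflow task - comfy_prompt.

```python
import json
from typing import Any, Dict, Optional

WORKFLOW_ID = "01jfyvw0vmvhewzdxk9jqgznb5"  # fixed value from the table above
ALIAS_ID = "main"  # fixed value from the table above


def build_request_body(
    input_file: str,
    seq_type: str,
    embedding_type: str,
    *,
    id_idx: Optional[int] = None,
    seq_idx: Optional[int] = None,
    vector_type: Optional[str] = None,
    embedding_complete: Optional[bool] = None,
) -> Dict[str, Any]:
    """Validate the documented constraints and return the request body."""
    if seq_type not in ("gene", "prot"):
        raise ValueError("seq_type must be 'gene' or 'prot'")
    if embedding_type not in ("matrix", "vector"):
        raise ValueError("embedding_type must be 'matrix' or 'vector'")

    inputs: Dict[str, Any] = {"input_file": input_file, "seq_type": seq_type}

    # Assumption: a ".csv" extension identifies CSV input.
    if input_file.lower().endswith(".csv"):
        if id_idx is None or seq_idx is None:
            raise ValueError("CSV input requires id_idx and seq_idx (0-based)")
        inputs["id_idx"] = id_idx
        inputs["seq_idx"] = seq_idx

    inputs["embedding_type"] = embedding_type
    if embedding_type == "vector":
        if vector_type not in ("mean", "max", "cls"):
            raise ValueError("vector_type must be 'mean', 'max', or 'cls'")
        inputs["vector_type"] = vector_type

    if embedding_complete is not None:
        inputs["embedding_complete"] = embedding_complete

    return {"workflow_id": WORKFLOW_ID, "alias_id": ALIAS_ID, "inputs": inputs}


body = build_request_body(
    "https://example.com/example.csv",
    "prot",
    "vector",
    id_idx=0,
    seq_idx=1,
    vector_type="cls",
    embedding_complete=True,
)
print(json.dumps(body, indent=4))
```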

Polling progress

For more information about the API for polling for progress, see Query node progress - comfy_get_progress.

Note

If you do not need to track the execution progress percentage, you can skip this step and directly query the result.

Sample progress query response

{
  "status": 10,
  "apiInvokeId": "i_677e377e9b80590025ff8822",
  "data": {
    "etaRelative": 0.09264909,
    "currentImage": "",
    "progress": 0.6666667,
    "state": {
      "maxPasses": 0,
      "pass": 0,
      "nodeTitle": "Sp LucaOne Infer Embedding",
      "step": 0,
      "maxSteps": -1,
      "nodeLabel": "Sp LucaOne Infer Embedding"
    },
    "message": "Executing node Sp LucaOne Infer Embedding",
    "taskId": "01jh2gha21panx38qb5gfxd085",
    "status": "running"
  }
}
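A progress response like the sample above can be reduced to a one-line status for logs. The following sketch does that for a response that has already been parsed into a Python dict. The function name describe_progress is hypothetical, and treating progress as a fraction between 0 and 1 is an assumption based on the sample value.

```python
from typing import Any, Dict


def describe_progress(response: Dict[str, Any]) -> str:
    """Format the task status, progress percentage, and current node."""
    data = response.get("data") or {}
    state = data.get("state") or {}
    # Assumption: progress is a fraction between 0 and 1.
    percent = round(float(data.get("progress", 0.0)) * 100)
    status = data.get("status", "unknown")
    node = state.get("nodeTitle", "")
    return f"{status}: {percent}% ({node})"


sample = {
    "status": 10,
    "data": {
        "progress": 0.6666667,
        "state": {"nodeTitle": "Sp LucaOne Infer Embedding"},
        "status": "running",
    },
}
print(describe_progress(sample))  # running: 67% (Sp LucaOne Infer Embedding)
```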

Query results

For more information about the API for querying results, see Query workflow execution result - comfy_get_result.

Output field format

Name

Type

Description

status

integer

The gateway status code. 10 indicates success. 20 indicates failure.

apiInvokeId

string

The system-generated ID that uniquely identifies the call.

errCode

string

The gateway error code. Ignore this parameter if the call is successful.

errMessage

string

The gateway error details. Ignore this parameter if the call is successful.

subErrCode

string

The service error code. Ignore this parameter if the call is successful.

subErrMessage

string

The service error details. Ignore this parameter if the call is successful.

data

object

The result returned by the service.

status

string

The task status.

  • succeeded: The task succeeded.

  • failed: The task failed.

  • running: The task is running.

  • Queuing: The task is waiting to be processed.

taskId

string

The task ID.

taskDuration

long

The execution duration in milliseconds.

taskBeginTime

long

The time when the task started. This is a 13-digit timestamp.

taskEndTime

long

The time when the task ended. This is a 13-digit timestamp.

result

object

A custom output field. This field is valid only if a custom output is configured when the API is published.

embedding

object

The sequence embedding file in ZIP format.

url

string

A publicly accessible file URL.

Important

The file has an expiration date. For long-term use, save the file to a private storage repository.
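Because the URL expires, it is usually worth copying the archive as soon as the task succeeds. The following sketch downloads the ZIP file and extracts it to a local directory by using only the Python standard library. The function name save_embedding and the local file layout are hypothetical, and a local directory merely stands in for whatever private storage you use. The sketch makes no assumption about the files inside the archive and simply returns their names.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def save_embedding(url: str, dest_dir: str) -> List[str]:
    """Download the embedding ZIP from `url` and extract it under `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "embedding.zip"

    # Stream the response to disk instead of loading it into memory.
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)

    # Extract next to the archive and report what it contained.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest / "embedding")
        return zf.namelist()
```

For example, save_embedding(url, "lucaone_results") keeps both the original archive and the extracted files, where url is the value of data.result.embedding.url in the query result.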

Sample query results:

{
    "status": 10,
    "apiInvokeId": "i_677e388319c46d002503b84c",
    "data": {
        "result": {
            "embedding": {
                "filename": "embedding5f6305a1a7bb4f90bf941b104699add3.zip",
                "object_key": "comfy/output/lucana/embedding5f6305a1a7bb4f90bf941b104699add3.zip",
                "subfolder": "lucana",
                "type": "output",
                "url": "https://example.com/example.zip"
            }
        },
        "images": [
            "http://example.com/example.zip"
        ],
        "taskDuration": 34642,
        "taskBeginTime": 1736325216019,
        "taskId": "01jh2grdqs7t24x60dpay3qajv",
        "status": "succeeded",
        "taskEndTime": 1736325250661
    },
    "subErrCode": null,
    "subErrMessage": null,
    "errCode": null,
    "errMessage": null,
    "startTime": null,
    "endTime": null,
    "requestId": null
}
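The fields in the sample above are enough to drive a simple wait loop. The following sketch polls until the task finishes and then reads the download URL and timestamps. The names wait_for_result, embedding_url, and to_utc are hypothetical. The fetch_result argument stands in for whatever SDK call you use for comfy_get_result and must return the parsed response as a dict. The polling interval and the retry limit are arbitrary choices.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict

GATEWAY_OK = 10  # gateway status code for success, per the table above


def wait_for_result(
    fetch_result: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> Dict[str, Any]:
    """Poll until the task succeeds and return the `data` object."""
    for _ in range(max_polls):
        response = fetch_result()
        if response.get("status") != GATEWAY_OK:
            raise RuntimeError(
                f"gateway error {response.get('errCode')}: {response.get('errMessage')}"
            )
        data = response.get("data") or {}
        if data.get("status") == "succeeded":
            return data
        if data.get("status") == "failed":
            raise RuntimeError(
                f"task failed {response.get('subErrCode')}: {response.get('subErrMessage')}"
            )
        time.sleep(interval_s)  # still running or waiting to be processed
    raise TimeoutError("task did not finish within the polling limit")


def embedding_url(data: Dict[str, Any]) -> str:
    """Return the download URL of the embedding ZIP file."""
    return data["result"]["embedding"]["url"]


def to_utc(timestamp_ms: int) -> datetime:
    """Convert a 13-digit millisecond timestamp to a UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
```

In the sample, taskEndTime minus taskBeginTime is 1736325250661 - 1736325216019 = 34642, which matches taskDuration, so either source can be used to report the run time.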