LucaOne User Guide - Intelligent Education Platform (IEP)

This topic describes how to use LucaOne, which is the result of a research collaboration between Alibaba Cloud and scientific institutions.

Use the LucaOne service through the GUI

LucaOne endpoint

Open the page:

Procedure

Step 1. Upload a file

Click the file upload button and select a local file.
The supported file formats are CSV and FASTA.
- If you use the CSV format, you must specify the column index of the sequence ID (id_idx) and the column index of the sequence (seq_idx).
- If you use the FASTA format, you do not need to specify these indexes. Proceed to the next step.
A single task supports a maximum of 500 sequences. If you submit more than 500 sequences, the system reports an error. To process more sequences, you must submit them in batches.
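The batching requirement above can be handled before upload. The following is a minimal sketch, assuming a plain FASTA input; the helper names (parse_fasta, split_into_batches, to_fasta) are hypothetical and are not part of the LucaOne service.

```python
from typing import Iterator, List, Tuple

MAX_SEQS_PER_TASK = 500  # per-task limit stated in this guide

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_batches(
    records: List[Record], batch_size: int = MAX_SEQS_PER_TASK
) -> Iterator[List[Record]]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Toy input: 1200 short sequences produce three uploadable files' worth of records.
demo = parse_fasta("".join(f">seq{i}\nMKT\n" for i in range(1200)))
print([len(batch) for batch in split_into_batches(demo)])  # [500, 500, 200]
```

Each batch can be written to its own file with to_fasta and submitted as a separate task.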

Step 2. Configure parameters

The following table describes the parameters.

Parameter Name	Description	Notes
seq_type	The sequence type. Valid values: gene: a gene sequence (DNA or RNA). prot: a protein sequence.	
embedding_type	The sequence embedding method. Valid values: matrix: The output is a matrix of size seq_len × 2560, or (seq_len + 2) × 2560 if special tokens are included. If you select matrix, you must specify whether to enable matrix_add_special_token. vector: The output is a 2560-dimensional vector. If you select vector, you must also select a vector_type.	Recommendations: 1) For analyses of a sequence set, such as clustering, embed the sequences as vectors. The mean and max options are preferred. 2) For building downstream AI models, such as standard classification or regression tasks, use the matrix option. Use the matrix as the input of the downstream model and apply a parameterized pooling method inside the model.
vector_type	The vector type. Valid values: mean: the average of each column in the matrix. max: the maximum value of each column in the matrix. cls: the vector of the special start token ([CLS]) in the matrix.	
matrix_add_special_token	Specifies whether the embedding matrix includes the vectors of the special tokens. If this option is enabled, the embedding matrix includes a special token vector at the beginning and at the end, and the number of rows in the matrix is the sequence length + 2. The corresponding sequence is [CLS]ATCGATCG[EOS]. If this option is disabled, the embedding matrix does not include special token vectors.	
embedding_complete	Specifies whether to embed the entire sequence. If this option is enabled, the embedding is calculated for the entire sequence. If this option is disabled, you must select a truncation type and length. trunc_type: the truncation direction. Valid values: right (from right to left) and left (from left to right). truncation_seq_length: the sequence truncation length, which must be a positive integer.	
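The relationship between the matrix output and the three vector_type options can be illustrated with a toy example. This is only a sketch in plain Python with a 3-column matrix instead of 2560 columns. The guide does not state whether the service excludes the special token rows before mean or max pooling, so the sketch pools over whatever rows it is given; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def strip_special_tokens(matrix: Matrix) -> Matrix:
    """Drop the first ([CLS]) and last (end token) rows of the matrix."""
    return matrix[1:-1]


def pool(matrix: Matrix, vector_type: str) -> List[float]:
    """Reduce a matrix to a single vector, column by column."""
    if vector_type == "mean":
        return [sum(col) / len(col) for col in zip(*matrix)]
    if vector_type == "max":
        return [max(col) for col in zip(*matrix)]
    if vector_type == "cls":
        return list(matrix[0])  # assumes row 0 holds the [CLS] vector
    raise ValueError(f"unknown vector_type: {vector_type!r}")


# Toy matrix for a 2-residue sequence with special tokens: (2 + 2) rows x 3 columns.
full = [
    [9.0, 9.0, 9.0],  # [CLS]
    [1.0, 4.0, 0.0],
    [3.0, 2.0, 6.0],
    [7.0, 7.0, 7.0],  # end token
]
residues = strip_special_tokens(full)

print(len(full), len(residues))  # 4 2
print(pool(residues, "mean"))    # [2.0, 3.0, 3.0]
print(pool(residues, "max"))     # [3.0, 4.0, 6.0]
print(pool(full, "cls"))         # [9.0, 9.0, 9.0]
```

With real output, the same column-wise reductions would typically be done with an array library rather than Python lists.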

Step 3. Submit the task

Click Submit to run the task. You can submit only one task at a time. To run tasks in batches, you must use API calls.

Step 4. View and download the results

After you submit the task, a record of the task appears in the generation history.

While the task is running, its progress is displayed.

You can close the page or switch to another page while the task is running. However, you cannot submit another task until the current one is complete.

When the task is complete, the status changes to Completed.

After the task is complete, you can download the model output from the Task Run Result section at the bottom of the task card.
If the task fails, you can view the cause of the error.
- If the task fails due to invalid input parameters, reconfigure the parameters and submit the task again.
- If the task fails for other reasons, such as network issues, you can use the one-click resubmit feature.

Call LucaOne using an API

Prerequisites

You have used the LucaOne service through the GUI (endpoint) and successfully generated a result.
You have integrated with the Model Studio software development kit (SDK). For more information, see Legacy SDK Reference.
You have obtained an AccessKey pair for authentication. For more information, see API authorization.

Submit a task

For more information about the API for submitting tasks, see Submit a workflow task - comfy_prompt.

Request body format

Name		Type	Required	Description
workflow_id		string	Fixed value: 01jfyvw0vmvhewzdxk9jqgznb5	The workflow ID. This is a fixed value that you can copy from the workflow management page.
alias_id		string	Fixed value: main	The workflow alias. An alias acts as a pointer to a specific version. You can use aliases to easily perform operations such as publishing and rollbacks.
inputs		object	Required	The format of the inputs object depends on the specific workflow. It corresponds to the field mapping that you configure when you publish the workflow. The following content is an example.
	input_file	string	Required	The protein or gene sequence. The CSV and FASTA formats are supported.
	seq_type	string	Required	The sequence type.
	id_idx	integer	Required if input_file is in CSV format.	The index of the column that contains the sequence ID. The index starts from 0.
	seq_idx	integer	Required if input_file is in CSV format.	The index of the column that contains the sequence. The index starts from 0.
	embedding_type	string		The sequence embedding method. Valid values: matrix: If you select matrix, you must specify whether to enable matrix_add_special_token. vector: If you select vector, you must also select a vector_type. Recommendations: 1) For analyses of a sequence set, such as clustering, embed the sequences as vectors. The mean and max options are preferred. 2) For building downstream AI models, such as standard classification or regression tasks, use the matrix option. Use the matrix as the input of the downstream model and apply a parameterized pooling method inside the model.
	vector_type	string	Required when embedding_type is set to vector.	The method used to pool the embedding matrix into a vector. Valid values: mean, max, and cls.
	trunc_type	string		The truncation direction that is applied if the sequence exceeds the maximum length. Valid values: right and left.
	truncation_seq_length	integer		The maximum sequence length, excluding [CLS] and [SEP]. The parameter itself imposes no upper limit. The practical limit is the GPU memory available for inference.
	matrix_add_special_token	integer		Specifies whether the embedding matrix includes the [CLS] and [SEP] vectors.
	embedding_complete	boolean		Specifies whether to embed the complete sequence. If there is not enough GPU memory to run inference on the entire sequence at once, this parameter specifies whether to process the sequence in segments so that the complete sequence is still embedded. If `embedding_complete` is set, `truncation_seq_length` is ignored. If you do not use this parameter, the sequence is repeatedly truncated to 95% of its length until it fits into the available GPU memory.

Sample request body

{
    "workflow_id": "01jfyvw0vmvhewzdxk9jqgznb5",
    "alias_id": "main",
    "inputs": {
        "input_file": "https://example.com/example.csv",
        "seq_type": "prot",
        "id_idx": 0,
        "seq_idx": 1,
        "embedding_type": "vector",
        "vector_type": "cls",
        "embedding_complete": true
    },
    "randomise_seeds": true
}
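The conditional requirements in the request body table (id_idx and seq_idx for CSV input, vector_type for vector embeddings) can be checked on the client before the task is submitted. The following is a minimal sketch: build_request_body and its is_csv flag are hypothetical helpers, the seq_type values are taken from the GUI parameter table, and randomise_seeds is copied from the sample request body because the parameter table does not describe it. The sketch only builds the JSON body and does not call the SDK.

```python
import json
from typing import Any, Dict, Optional

WORKFLOW_ID = "01jfyvw0vmvhewzdxk9jqgznb5"  # fixed value from the table
ALIAS_ID = "main"                           # fixed value from the table


def build_request_body(
    input_file: str,
    seq_type: str,
    embedding_type: str,
    *,
    is_csv: bool,
    id_idx: Optional[int] = None,
    seq_idx: Optional[int] = None,
    vector_type: Optional[str] = None,
    embedding_complete: bool = True,
) -> Dict[str, Any]:
    """Assemble and validate the request body (hypothetical helper)."""
    if seq_type not in ("gene", "prot"):
        raise ValueError("seq_type must be 'gene' or 'prot'")
    if embedding_type not in ("matrix", "vector"):
        raise ValueError("embedding_type must be 'matrix' or 'vector'")

    inputs: Dict[str, Any] = {
        "input_file": input_file,
        "seq_type": seq_type,
        "embedding_type": embedding_type,
        "embedding_complete": embedding_complete,
    }
    if is_csv:
        # CSV input needs both 0-based column indexes.
        if id_idx is None or seq_idx is None:
            raise ValueError("id_idx and seq_idx are required for CSV input")
        inputs["id_idx"] = id_idx
        inputs["seq_idx"] = seq_idx
    if embedding_type == "vector":
        if vector_type not in ("mean", "max", "cls"):
            raise ValueError("vector_type must be 'mean', 'max', or 'cls'")
        inputs["vector_type"] = vector_type

    return {
        "workflow_id": WORKFLOW_ID,
        "alias_id": ALIAS_ID,
        "inputs": inputs,
        "randomise_seeds": True,  # copied from the sample; not in the table
    }


body = build_request_body(
    "https://example.com/example.csv", "prot", "vector",
    is_csv=True, id_idx=0, seq_idx=1, vector_type="cls",
)
print(json.dumps(body, indent=4))
```

Called with these arguments, the helper reproduces the fields of the sample request body.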

Polling progress

For more information about the API for polling for progress, see Query node progress - comfy_get_progress.

Note

If you do not need to track the execution progress percentage, you can skip this step and directly query the result.

Sample progress query response

{
  "status": 10,
  "apiInvokeId": "i_677e377e9b80590025ff8822",
  "data": {
    "etaRelative": 0.09264909,
    "currentImage": "",
    "progress": 0.6666667,
    "state": {
      "maxPasses": 0,
      "pass": 0,
      "nodeTitle": "Sp LucaOne Infer Embedding",
      "step": 0,
      "maxSteps": -1,
      "nodeLabel": "Sp LucaOne Infer Embedding"
    },
    "message": "Executing node Sp LucaOne Infer Embedding",
    "taskId": "01jh2gha21panx38qb5gfxd085",
    "status": "running"
  }
}
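A polling loop around the progress query might look like the following sketch. The fetch argument is a hypothetical callable that wraps the actual SDK call and returns the parsed response. The sketch assumes that a gateway status of 10 means the call succeeded and that data.status eventually becomes succeeded or failed, which are the values listed for the result query; whether the progress query reports the same terminal values is an assumption.

```python
import time
from typing import Any, Callable, Dict

GATEWAY_OK = 10                          # gateway status code for success
TERMINAL_STATES = {"succeeded", "failed"}  # assumed terminal task states


def wait_for_task(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the task reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch()
        if response.get("status") != GATEWAY_OK:
            raise RuntimeError(
                f"gateway call failed: {response.get('errCode')}"
            )
        data = response.get("data") or {}
        if data.get("status") in TERMINAL_STATES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)  # wait before the next query
```

The sleep parameter is injectable only so that the loop can be exercised without real delays.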

Query results

For more information about the API for querying results, see Query workflow execution result - comfy_get_result.

Output field format

Name				Type	Description
status				integer	The gateway status code. 10 indicates success. 20 indicates failure.
apiInvokeId				string	The system-generated ID that uniquely identifies the call.
errCode				string	The gateway error code. Ignore this parameter if the call is successful.
errMessage				string	The gateway error details. Ignore this parameter if the call is successful.
subErrCode				string	The service error code. Ignore this parameter if the call is successful.
subErrMessage				string	The service error details. Ignore this parameter if the call is successful.
data				object	The result returned by the service.
	status			string	The task status. Valid values: succeeded: The task succeeded. failed: The task failed. running: The task is running. queuing: The task is waiting to be processed.
	taskId			string	The task ID.
	taskDuration			long	The execution duration in milliseconds.
	taskBeginTime			long	The time when the task started. This is a 13-digit timestamp.
	taskEndTime			long	The time when the task ended. This is a 13-digit timestamp.
	result			object	A custom output field. This field is valid only if a custom output is configured when the API is published.
		embedding		object	The embedding file in ZIP format.
			url	string	A publicly accessible URL of the file. Important: The file expires after a period of time. For long-term use, save the file to your own private storage.

Sample query results:

{
    "status": 10,
    "apiInvokeId": "i_677e388319c46d002503b84c",
    "data": {
        "result": {
            "embedding": {
                "filename": "embedding5f6305a1a7bb4f90bf941b104699add3.zip",
                "object_key": "comfy/output/lucana/embedding5f6305a1a7bb4f90bf941b104699add3.zip",
                "subfolder": "lucana",
                "type": "output",
                "url": "https://example.com/example.zip"
            }
        },
        "images": [
            "http://example.com/example.zip"
        ],
        "taskDuration": 34642,
        "taskBeginTime": 1736325216019,
        "taskId": "01jh2grdqs7t24x60dpay3qajv",
        "status": "succeeded",
        "taskEndTime": 1736325250661
    },
    "subErrCode": null,
    "subErrMessage": null,
    "errCode": null,
    "errMessage": null,
    "startTime": null,
    "endTime": null,
    "requestId": null
}
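Because the output file expires, a client would typically check both status levels and then copy the ZIP file to its own storage. The following is a minimal sketch that uses only the Python standard library; extract_embedding_url and save_embedding are hypothetical helper names, and the field paths follow the output field table and the sample above.

```python
import shutil
import urllib.request
from typing import Any, Dict

GATEWAY_OK = 10  # gateway status code for success


def extract_embedding_url(response: Dict[str, Any]) -> str:
    """Return data.result.embedding.url, or raise if the call or task failed."""
    if response.get("status") != GATEWAY_OK:
        raise RuntimeError(
            f"gateway error {response.get('errCode')}: "
            f"{response.get('errMessage')}"
        )
    data = response.get("data") or {}
    if data.get("status") != "succeeded":
        raise RuntimeError(
            f"task status is {data.get('status')!r} "
            f"({response.get('subErrCode')}: {response.get('subErrMessage')})"
        )
    return data["result"]["embedding"]["url"]


def save_embedding(url: str, dest: str) -> None:
    """Stream the ZIP file at url to the local path dest."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

In the sample above, taskEndTime minus taskBeginTime is 1736325250661 - 1736325216019 = 34642 ms, which matches taskDuration.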