LucaOne User Guide
This topic describes how to use LucaOne, which is the result of a research collaboration between Alibaba Cloud and scientific institutions.
Use the LucaOne service through the GUI
Open the page:
Procedure
Step 1. Upload a file
Click the file upload button and select a local file.
The supported file formats are CSV and FASTA.
If you use the CSV format, you must specify the column indexes of the sequence ID (id_idx) and the sequence (seq_idx). Both indexes start from 0.
If you use the FASTA format, you do not need to specify these indexes. Proceed to the next step.
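Before you upload a CSV file, you can check locally which columns id_idx and seq_idx point to. The following minimal Python sketch reads CSV text with the standard csv module and treats both values as 0-based column positions. It assumes that the file has a header row. The column names seq_id and seq and the helper name read_csv_sequences are illustrative only.

```python
import csv
import io


def read_csv_sequences(text, id_idx, seq_idx, has_header=True):
    """Return (sequence_id, sequence) pairs from CSV text.

    id_idx and seq_idx are 0-based column indexes, like the
    id_idx and seq_idx parameters of the LucaOne service.
    """
    rows = csv.reader(io.StringIO(text))
    if has_header:
        next(rows, None)  # assumption: the first row is a header
    pairs = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        pairs.append((row[id_idx], row[seq_idx]))
    return pairs


sample = "seq_id,seq\nP1,MKTAYIAK\nP2,GAVLIPF\n"
print(read_csv_sequences(sample, id_idx=0, seq_idx=1))
# [('P1', 'MKTAYIAK'), ('P2', 'GAVLIPF')]
```

With this column layout, you would set id_idx to 0 and seq_idx to 1, which matches the values in the sample request body later in this guide.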
A single task supports a maximum of 500 sequences. If you submit more than 500 sequences, the system reports an error. To process more sequences, you must submit them in batches.
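If a FASTA file contains more than 500 sequences, you can split it locally and upload each part as a separate task. The following Python sketch assumes standard FASTA: header lines start with >, and a sequence may wrap across several lines. The helper names are illustrative and are not part of LucaOne.

```python
MAX_SEQUENCES_PER_TASK = 500  # per-task limit stated in this guide


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) records."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_batches(records, batch_size=MAX_SEQUENCES_PER_TASK):
    """Split records into consecutive batches of at most batch_size records."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def to_fasta(records):
    """Serialize (header, sequence) records back to FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


# Example: 1,203 sequences are split into batches of 500, 500, and 203.
demo = "".join(f">seq{i}\nMKTAYIAK\n" for i in range(1203))
batches = split_into_batches(parse_fasta(demo))
print([len(b) for b in batches])  # [500, 500, 203]
```

Write each batch to its own file with to_fasta and submit one task per file. Because the GUI accepts only one task at a time, submitting many batches is easier through the API.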
Step 2. Configure parameters
The following table describes the parameters.
Parameter name | Description | Notes
seq_type | The sequence type: protein or gene. For protein sequences, specify prot. |
embedding_type | The sequence embedding method. matrix: the output is a seq_len × 2560 matrix, or a (seq_len + 2) × 2560 matrix when matrix_add_special_token is enabled. vector: the output is a 2560-dimensional vector. | Recommendations: 1) For analyses such as clustering on a sequence set, embed the sequences as vectors; the mean and max vector types are preferred. 2) For building downstream AI models, such as standard classification or regression models, use the matrix option: use the matrix as the input of the downstream model and apply a parameterized pooling method inside the model.
vector_type | The method used to pool the embedding matrix into a vector. Valid values: mean, max, and cls. | Required when embedding_type is set to vector.
matrix_add_special_token | Specifies whether the embedding matrix includes the vectors of special tokens. If this option is enabled, the matrix includes one special-token vector at the beginning and one at the end, so it has sequence length + 2 rows; for the sequence ATCGATCG, the rows correspond to [CLS]ATCGATCG[SEP]. If this option is disabled, the matrix contains no special-token vectors and has sequence length rows. |
embedding_complete | If this option is enabled, the embedding is calculated over the entire sequence. If it is disabled, you must set the truncation parameters: trunc_type, the truncation direction (right or left), and truncation_seq_length, the truncation length, which must be a positive integer. |
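To make the vector_type options concrete, the following toy Python sketch shows how mean, max, and cls pooling can reduce an embedding matrix to a single vector. It is an illustration, not the service's implementation. It assumes that, with special tokens, the first row is the [CLS] vector and the last row is the closing token, and it excludes both rows from mean and max pooling, which LucaOne may or may not do. Real LucaOne rows have 2560 dimensions; the example uses 3.

```python
def pool_embedding(matrix, vector_type, has_special_tokens=True):
    """Pool a per-token embedding matrix (a list of equal-length rows) into one vector.

    Assumption: when has_special_tokens is True, row 0 is [CLS] and the last
    row is the closing special token.
    """
    if vector_type == "cls":
        if not has_special_tokens:
            raise ValueError("cls pooling needs the [CLS] row")
        return list(matrix[0])
    # Assumption: special-token rows are excluded from mean and max pooling.
    rows = matrix[1:-1] if has_special_tokens else matrix
    columns = list(zip(*rows))  # one tuple per embedding dimension
    if vector_type == "mean":
        return [sum(col) / len(col) for col in columns]
    if vector_type == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown vector_type: {vector_type!r}")


# Toy (seq_len + 2) x 3 matrix: [CLS], two residues, closing token.
toy = [[1, 0, 0], [1, 2, 3], [3, 4, 5], [0, 0, 9]]
print(pool_embedding(toy, "cls"))   # [1, 0, 0]
print(pool_embedding(toy, "mean"))  # [2.0, 3.0, 4.0]
print(pool_embedding(toy, "max"))   # [3, 4, 5]
```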
Step 3. Submit the task
Click Submit to run the task. You can submit only one task at a time. To run tasks in batches, you must use API calls.
Step 4. View and download the results
After you submit the task, a record of the task appears in the generation history.
While the task is running, its progress is displayed.
You can close the page or switch to another page while the task is running. However, you cannot submit another task until the current one is complete.
When the task is complete, the status changes to Completed.
After the task is complete, you can download the model output from the Task Run Result section at the bottom of the task card.
If the task fails, you can view the cause of the error.
If the task fails due to invalid input parameters, reconfigure the parameters and submit the task again.
If the task fails for other reasons, such as network issues, you can use the one-click resubmit feature.
Call LucaOne using an API
Prerequisites
You have used the LucaOne service through the GUI (endpoint) and successfully generated a result.
You have integrated with the Model Studio software development kit (SDK). For more information, see Legacy SDK Reference.
You have obtained an AccessKey pair for authentication. For more information, see API authorization.
Submit a task
For more information about the API for submitting tasks, see Submit a workflow task - comfy_prompt.
Request body format
Name | Type | Required | Description
workflow_id | string | Yes | The workflow ID. Fixed value: 01jfyvw0vmvhewzdxk9jqgznb5. You can also copy it from the workflow management page.
alias_id | string | Yes | The workflow alias. Fixed value: main. An alias points to a specific workflow version, which simplifies operations such as publishing and rollbacks.
inputs | object | Yes | The workflow inputs. The format depends on the workflow and corresponds to the field mapping that is configured when the workflow is published. For LucaOne, inputs contains the following fields.
input_file | string | Yes | The URL of the file that contains the protein or gene sequences. The CSV and FASTA formats are supported.
seq_type | string | Yes | The sequence type, for example prot for protein sequences.
id_idx | integer | Required if input_file is in CSV format | The index of the column that contains the sequence ID. The index starts from 0.
seq_idx | integer | Required if input_file is in CSV format | The index of the column that contains the sequence. The index starts from 0.
embedding_type | string | | The embedding output type: matrix or vector. Recommendations: 1) For analyses such as clustering on a sequence set, embed the sequences as vectors; the mean and max vector types are preferred. 2) For building downstream AI models, such as standard classification or regression models, use the matrix option: use the matrix as the input of the downstream model and apply a parameterized pooling method inside the model.
vector_type | string | Required when embedding_type is set to vector | The method used to pool the embedding matrix into a vector. Valid values: mean, max, and cls.
trunc_type | string | Required when embedding_complete is false | The truncation direction that is applied when a sequence exceeds truncation_seq_length. Valid values: right and left.
truncation_seq_length | integer | Required when embedding_complete is false | The maximum sequence length, excluding the [CLS] and [SEP] tokens. The value must be a positive integer. The parameter itself sets no upper bound; the practical limit is the GPU memory available for inference.
matrix_add_special_token | integer | | Specifies whether the embedding matrix includes the [CLS] and [SEP] vectors.
embedding_complete | boolean | | Specifies whether the embedding is calculated over the entire sequence. If this parameter is false, trunc_type and truncation_seq_length are required.
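The recommendation to apply "a parameterized pooling method" in a downstream model does not name a specific method. One common choice is attention pooling, sketched below in plain Python as a forward pass only. This is an illustration under that assumption, not part of the LucaOne service; in a real model, the scoring vector w would be a trainable parameter in your machine learning framework.

```python
import math


def attention_pool(matrix, w):
    """Attention pooling: weight each row by softmax(row . w) and sum the rows.

    matrix: seq_len rows of dimension d (d = 2560 for LucaOne embeddings).
    w: a scoring vector of dimension d; in a real model it is trained.
    """
    scores = [sum(x * y for x, y in zip(row, w)) for row in matrix]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(wt * row[j] for wt, row in zip(weights, matrix))
            for j in range(len(matrix[0]))]


# With w = 0, every row gets the same weight, so this reduces to mean pooling.
print(attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```

Unlike fixed mean or max pooling, the weights here depend on the learned vector w, so the downstream model can learn which positions in the sequence matter for the task.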
Sample request body
{
"workflow_id": "01jfyvw0vmvhewzdxk9jqgznb5",
"alias_id": "main",
"inputs": {
"input_file": "https://example.com/example.csv",
"seq_type": "prot",
"id_idx": 0,
"seq_idx": 1,
"embedding_type": "vector",
"vector_type": "cls",
"embedding_complete": true
},
"randomise_seeds": true
}
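Because several fields are conditionally required, it can help to assemble the request body with a small helper that enforces the rules from the request table before you call the SDK. The helper below, build_request_body, is hypothetical: it only builds the JSON body, and sending it is left to the Model Studio SDK, whose call signature is not shown here. It does not cover matrix_add_special_token, and randomise_seeds is copied from the sample.

```python
import json

WORKFLOW_ID = "01jfyvw0vmvhewzdxk9jqgznb5"  # fixed value from the request table
ALIAS_ID = "main"


def build_request_body(input_file, seq_type, *, is_csv, id_idx=None, seq_idx=None,
                       embedding_type="vector", vector_type=None,
                       embedding_complete=True, trunc_type=None,
                       truncation_seq_length=None):
    """Build a request body, checking the conditional rules from the table."""
    inputs = {"input_file": input_file, "seq_type": seq_type,
              "embedding_type": embedding_type,
              "embedding_complete": embedding_complete}
    if is_csv:  # id_idx and seq_idx are required for CSV input only
        if id_idx is None or seq_idx is None:
            raise ValueError("CSV input requires id_idx and seq_idx")
        inputs["id_idx"], inputs["seq_idx"] = id_idx, seq_idx
    if embedding_type == "vector":  # vector output requires a pooling method
        if vector_type not in ("mean", "max", "cls"):
            raise ValueError("vector_type must be mean, max, or cls")
        inputs["vector_type"] = vector_type
    if not embedding_complete:  # truncation settings are then required
        if trunc_type not in ("right", "left"):
            raise ValueError("trunc_type must be right or left")
        n = truncation_seq_length
        if isinstance(n, bool) or not isinstance(n, int) or n <= 0:
            raise ValueError("truncation_seq_length must be a positive integer")
        inputs["trunc_type"], inputs["truncation_seq_length"] = trunc_type, n
    return {"workflow_id": WORKFLOW_ID, "alias_id": ALIAS_ID,
            "inputs": inputs, "randomise_seeds": True}


body = build_request_body("https://example.com/example.csv", "prot", is_csv=True,
                          id_idx=0, seq_idx=1, vector_type="cls")
print(json.dumps(body, indent=2))
```

With these arguments, the helper reproduces the sample request body above.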
Polling progress
For more information about the API for polling for progress, see Query node progress - comfy_get_progress.
If you do not need to track the execution progress percentage, you can skip this step and directly query the result.
Sample progress query response
{
  "status": 10,
  "apiInvokeId": "i_677e377e9b80590025ff8822",
  "data": {
    "etaRelative": 0.09264909,
    "currentImage": "",
    "progress": 0.6666667,
    "state": {
      "maxPasses": 0,
      "pass": 0,
      "nodeTitle": "Sp LucaOne Infer Embedding",
      "step": 0,
      "maxSteps": -1,
      "nodeLabel": "Sp LucaOne Infer Embedding"
    },
    "message": "Executing node Sp LucaOne Infer Embedding",
    "taskId": "01jh2gha21panx38qb5gfxd085",
    "status": "running"
  }
}
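Whether you poll progress or query the result directly, a client typically repeats the call until the task finishes. The following Python sketch shows one way to structure that loop. fetch_status is a hypothetical callable that you would write around the SDK's progress or result query; it must return the parsed JSON response. The loop treats the data.status values succeeded and failed as terminal, and the interval and timeout defaults are arbitrary.

```python
import time

TERMINAL_STATES = {"succeeded", "failed"}  # from the data.status values


def wait_for_task(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll fetch_status() until data.status is terminal or timeout_s elapses.

    fetch_status is a hypothetical zero-argument callable that you write
    around the SDK's result or progress query and that returns the parsed
    JSON response as a dict.
    """
    waited = 0.0
    while True:
        resp = fetch_status()
        if resp.get("status") != 10:  # 20 means the gateway call failed
            raise RuntimeError(
                f"gateway error {resp.get('errCode')}: {resp.get('errMessage')}")
        state = resp["data"]["status"]
        if state in TERMINAL_STATES:
            return resp
        if waited >= timeout_s:
            raise TimeoutError(f"task still {state!r} after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s


# Usage with canned responses instead of real API calls:
canned = iter([
    {"status": 10, "data": {"status": "running", "progress": 0.6666667}},
    {"status": 10, "data": {"status": "succeeded", "taskId": "t1"}},
])
final = wait_for_task(lambda: next(canned), sleep=lambda s: None)
print(final["data"]["status"])  # succeeded
```

Injecting sleep makes the loop testable without waiting; in production, keep the default time.sleep.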
Query results
For more information about the API for querying results, see Query workflow execution result - comfy_get_result.
Output field format
Name | Type | Description
status | integer | The gateway status code. 10 indicates success. 20 indicates failure.
apiInvokeId | string | The system-generated ID that uniquely identifies the call.
errCode | string | The gateway error code. Ignore this parameter if the call is successful.
errMessage | string | The gateway error details. Ignore this parameter if the call is successful.
subErrCode | string | The service error code. Ignore this parameter if the call is successful.
subErrMessage | string | The service error details. Ignore this parameter if the call is successful.
data | object | The result returned by the service.
data.status | string | The task status. Valid values: succeeded (the task succeeded), failed (the task failed), running (the task is running), and queuing (the task is waiting to be processed).
data.taskId | string | The task ID.
data.taskDuration | long | The execution duration, in milliseconds.
data.taskBeginTime | long | The time when the task started, as a 13-digit Unix timestamp in milliseconds.
data.taskEndTime | long | The time when the task ended, as a 13-digit Unix timestamp in milliseconds.
data.result | object | A custom output field. This field is valid only if a custom output is configured when the API is published.
data.result.embedding | object | The embedding result file, in ZIP format.
data.result.embedding.url | string | The publicly accessible URL of the file. Important: The URL expires. For long-term use, save the file to your own private storage.
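The 13-digit timestamps are Unix times in milliseconds. The following Python sketch converts them to timezone-aware UTC datetimes with the standard datetime module, using the values from the sample query result. In that sample, taskDuration equals taskEndTime minus taskBeginTime. The helper name ms_to_utc is illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert a 13-digit millisecond Unix timestamp to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)  # exact, no float rounding


data = {"taskBeginTime": 1736325216019, "taskEndTime": 1736325250661, "taskDuration": 34642}
begin, end = ms_to_utc(data["taskBeginTime"]), ms_to_utc(data["taskEndTime"])
print(begin.isoformat())  # 2025-01-08T08:33:36.019000+00:00
print(data["taskEndTime"] - data["taskBeginTime"] == data["taskDuration"])  # True
```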
Sample query results:
{
"status": 10,
"apiInvokeId": "i_677e388319c46d002503b84c",
"data": {
"result": {
"embedding": {
"filename": "embedding5f6305a1a7bb4f90bf941b104699add3.zip",
"object_key": "comfy/output/lucana/embedding5f6305a1a7bb4f90bf941b104699add3.zip",
"subfolder": "lucana",
"type": "output",
"url": "https://example.com/example.zip"
}
},
"images": [
"http://example.com/example.zip"
],
"taskDuration": 34642,
"taskBeginTime": 1736325216019,
"taskId": "01jh2grdqs7t24x60dpay3qajv",
"status": "succeeded",
"taskEndTime": 1736325250661
},
"subErrCode": null,
"subErrMessage": null,
"errCode": null,
"errMessage": null,
"startTime": null,
"endTime": null,
"requestId": null
}
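Because the result URL expires, a client usually extracts it right away and copies the file to its own storage. The following Python sketch uses only the standard library. embedding_url checks the gateway status and the task status before it returns data.result.embedding.url, and download saves the ZIP file locally. Both helper names are illustrative, and the layout of the files inside the ZIP archive is not covered here.

```python
import shutil
import urllib.request


def embedding_url(response):
    """Return data.result.embedding.url from a query-result response, or raise."""
    if response.get("status") != 10:
        raise RuntimeError(
            f"gateway error {response.get('errCode')}: {response.get('errMessage')}")
    data = response["data"]
    if data.get("status") != "succeeded":
        raise RuntimeError(f"task {data.get('taskId')} is {data.get('status')!r}")
    return data["result"]["embedding"]["url"]


def download(url, path, timeout=60):
    """Copy the ZIP file to a local path before the URL expires."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path


# Usage with a trimmed copy of the sample response above:
sample = {
    "status": 10,
    "data": {
        "status": "succeeded",
        "taskId": "01jh2grdqs7t24x60dpay3qajv",
        "result": {"embedding": {"url": "https://example.com/example.zip"}},
    },
}
print(embedding_url(sample))  # https://example.com/example.zip
# download(embedding_url(sample), "embedding.zip")  # requires network access
```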