UploadSession


UploadSession uploads data to a MaxCompute table by splitting the data into blocks, writing each block independently, and committing all blocks atomically when the upload is complete. Because each block can be retried individually if a network error occurs, this approach is suitable for large datasets and unstable network environments.

How it works

An upload follows four sequential steps:

  1. Create a session — call the constructor or use TableTunnel to create an UploadSession. The server creates a session and returns a unique upload ID. Retrieve the ID with getId().

  2. Write data to blocks — call openRecordWriter(blockId) to get a RecordWriter for each block. Write records to the writer, then close it. Use blockId to identify each block's position in the table.

  3. Verify uploaded blocks — call getBlockList() to list the blocks the server has received. Compare this list with the block IDs you sent, and re-upload any missing blocks.

  4. Commit — call commit(Long[] blocks) with the full list of uploaded blocks. The server verifies that the block list matches its records. If the lists match, the data is moved to the result table.

All requests in these steps are synchronous: each call blocks until the server responds.
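The four steps can be traced end to end with the stdlib-only sketch below. It does not use the MaxCompute SDK. InMemorySession and UploadWorkflow are hypothetical stand-ins whose writeBlock, getBlockList, and commit methods only mimic the shape of the documented calls. The session deliberately "loses" one block on its first send so that step 3 (compare, then re-upload the gaps) has something to recover.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory stand-in for an upload session. NOT the MaxCompute SDK.
class InMemorySession {
    final String id = UUID.randomUUID().toString();              // step 1: unique upload ID
    private final TreeMap<Long, List<String>> received = new TreeMap<>();
    private final Set<Long> dropOnce;                            // blocks "lost" on their first send
    List<String> table;                                          // destination data after commit

    InMemorySession(Set<Long> dropOnce) {
        this.dropOnce = new HashSet<>(dropOnce);
    }

    // Step 2: send one block; a dropped block never reaches the "server".
    void writeBlock(long blockId, List<String> records) {
        if (dropOnce.remove(blockId)) return;                    // simulated network failure
        received.put(blockId, new ArrayList<>(records));
    }

    // Step 3: the block IDs the "server" has actually received.
    Long[] getBlockList() {
        return received.keySet().toArray(new Long[0]);
    }

    // Step 4: commit succeeds only if the client's list matches the server's records.
    void commit(Long[] blocks) {
        if (!new HashSet<>(Arrays.asList(blocks)).equals(received.keySet())) {
            throw new IllegalStateException("block list mismatch");
        }
        table = new ArrayList<>();
        for (Long b : blocks) table.addAll(received.get(b));
    }
}

class UploadWorkflow {
    static List<String> run(InMemorySession session, TreeMap<Long, List<String>> blocks) {
        // Step 2: send every block once.
        for (Map.Entry<Long, List<String>> e : blocks.entrySet()) {
            session.writeBlock(e.getKey(), e.getValue());
        }
        // Step 3: compare sent IDs with the server's list and re-send gaps (bounded rounds).
        for (int round = 0; round < 3; round++) {
            Set<Long> onServer = new HashSet<>(Arrays.asList(session.getBlockList()));
            List<Long> missing = new ArrayList<>();
            for (Long id : blocks.keySet()) if (!onServer.contains(id)) missing.add(id);
            if (missing.isEmpty()) break;
            for (Long id : missing) session.writeBlock(id, blocks.get(id));
        }
        // Step 4: commit the full list; commit() rejects it if blocks are still missing.
        session.commit(blocks.keySet().toArray(new Long[0]));
        return session.table;
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> blocks = new TreeMap<>();
        blocks.put(0L, Arrays.asList("a", "b"));
        blocks.put(1L, Arrays.asList("c"));
        blocks.put(2L, Arrays.asList("d", "e"));
        // Block 1 is lost on its first send and recovered in step 3.
        InMemorySession session = new InMemorySession(new HashSet<>(Arrays.asList(1L)));
        System.out.println(run(session, blocks)); // prints [a, b, c, d, e]
    }
}
```

With the real SDK, writeBlock corresponds to opening a RecordWriter with openRecordWriter(blockId), writing records, and closing it.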

Class definition

public class UploadSession {
    UploadSession(Configuration conf, String projectName, String tableName,
                  String partitionSpec) throws TunnelException;
    UploadSession(Configuration conf, String projectName, String tableName,
                  String partitionSpec, String uploadId) throws TunnelException;
    public void commit(Long[] blocks);
    public Long[] getBlockList();
    public String getId();
    public TableSchema getSchema();
    public UploadSession.Status getStatus();
    public Record newRecord();
    public RecordWriter openRecordWriter(long blockId);
    public RecordWriter openRecordWriter(long blockId, boolean compress);
    public RecordWriter openBufferedWriter();
    public RecordWriter openBufferedWriter(boolean compress);
}

The second constructor takes the uploadId of a session that already exists. Pass it when you need to reattach to, or share, a previously created upload session instead of creating a new one.
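One way to use a shared upload ID is to create the session once and let several workers attach to it, each writing a disjoint set of block IDs so that no ID is reused. The stdlib-only sketch below models only that coordination. SessionRegistry, reopen, and blockIdsFor are hypothetical names, not SDK APIs, and the "server" here tracks block IDs only, not records. The round-robin ID assignment (worker w of n takes w, w + n, w + 2n, ...) is one assumed scheme among many.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for server-side session state keyed by upload ID. NOT the MaxCompute SDK.
class SessionRegistry {
    private final Map<String, ConcurrentSkipListSet<Long>> sessions = new ConcurrentHashMap<>();

    // Analogous to the first constructor: create a session and return its upload ID.
    String create() {
        String uploadId = UUID.randomUUID().toString();
        sessions.put(uploadId, new ConcurrentSkipListSet<>());
        return uploadId;
    }

    // Analogous to the second constructor: attach to an existing session by its upload ID.
    ConcurrentSkipListSet<Long> reopen(String uploadId) {
        ConcurrentSkipListSet<Long> blocks = sessions.get(uploadId);
        if (blocks == null) throw new IllegalArgumentException("unknown upload ID: " + uploadId);
        return blocks;
    }
}

class SharedUpload {
    // Worker w of n owns block IDs w, w + n, w + 2n, ... so no two workers reuse an ID.
    static List<Long> blockIdsFor(int worker, int workers, int totalBlocks) {
        List<Long> ids = new ArrayList<>();
        for (long id = worker; id < totalBlocks; id += workers) ids.add(id);
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        SessionRegistry registry = new SessionRegistry();
        String uploadId = registry.create();              // created once, by a coordinator
        int workers = 3;
        int totalBlocks = 10;
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int worker = w;
            Thread t = new Thread(() -> {
                // Each worker reattaches with the shared upload ID and writes only its own blocks.
                ConcurrentSkipListSet<Long> session = registry.reopen(uploadId);
                for (long id : blockIdsFor(worker, workers, totalBlocks)) {
                    if (!session.add(id)) throw new IllegalStateException("duplicate block " + id);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        System.out.println(registry.reopen(uploadId)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

After all workers finish, a single process would commit the combined list of block IDs.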

Choose a writer

|  | openRecordWriter | openBufferedWriter |
|---|---|---|
| Block management | Manual: you specify a blockId for each block | Automatic: block IDs are managed internally |
| Retry on failure | Manual: use getBlockList() to find missing blocks and re-upload them | Automatic: the writer retries failed blocks |
| Use when | You need fine-grained control over block ordering | You want a straightforward batch upload without managing block IDs |

Use openBufferedWriter for most uploads. Use openRecordWriter when you need explicit control over block IDs.

For a complete example using BufferedWriter, see Data upload by using BufferedWriter.
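To make the "Automatic" column of the comparison concrete, the sketch below shows the kind of bookkeeping a buffered writer takes off your hands: it buffers records, assigns the next block ID itself when the buffer reaches a threshold, retries a failed block a bounded number of times, and returns the block IDs a commit would need. This is a conceptual illustration, not the SDK's implementation. BufferedBlockWriter, the byte threshold, the retry budget, and the uploadBlock callback are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Conceptual sketch of what a buffered writer automates. NOT the SDK's implementation.
class BufferedBlockWriter {
    private final long flushThresholdBytes;                    // hypothetical buffer size
    private final int maxAttempts;                             // hypothetical retry budget per block
    private final BiConsumer<Long, List<String>> uploadBlock;  // hypothetical transport callback
    private final List<String> buffer = new ArrayList<>();
    private final List<Long> uploadedIds = new ArrayList<>();
    private long bufferedBytes = 0;
    private long nextBlockId = 0;                              // block IDs assigned internally

    BufferedBlockWriter(long flushThresholdBytes, int maxAttempts,
                        BiConsumer<Long, List<String>> uploadBlock) {
        this.flushThresholdBytes = flushThresholdBytes;
        this.maxAttempts = maxAttempts;
        this.uploadBlock = uploadBlock;
    }

    void write(String record) {
        buffer.add(record);
        bufferedBytes += record.getBytes(StandardCharsets.UTF_8).length;
        if (bufferedBytes >= flushThresholdBytes) flush();
    }

    // Uploads the buffer as the next block, retrying up to maxAttempts times.
    private void flush() {
        if (buffer.isEmpty()) return;
        long blockId = nextBlockId++;
        List<String> block = new ArrayList<>(buffer);
        for (int attempt = 1; ; attempt++) {
            try {
                uploadBlock.accept(blockId, block);
                break;
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) throw e;           // retry budget exhausted
            }
        }
        uploadedIds.add(blockId);
        buffer.clear();
        bufferedBytes = 0;
    }

    // Flushes any remaining records and returns the block IDs a commit would need.
    List<Long> close() {
        flush();
        return new ArrayList<>(uploadedIds);
    }
}

class BufferedDemo {
    public static void main(String[] args) {
        BufferedBlockWriter writer = new BufferedBlockWriter(4, 3,
                (id, records) -> System.out.println("block " + id + ": " + records));
        for (String r : new String[] {"ab", "cd", "ef", "g"}) writer.write(r);
        System.out.println("commit " + writer.close());
    }
}
```

With a 4-byte threshold, the demo produces two blocks, [ab, cd] and [ef, g], without the caller ever choosing a block ID.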

Session states

| State | Triggered when | Meaning |
|---|---|---|
| UNKNOWN | The server creates the session | Initial state |
| NORMAL | The session is created successfully | Ready to accept block writes |
| CLOSING | commit() is called | The server is verifying the block list |
| CLOSED | The commit completes successfully | Data has been moved to the result table |
| EXPIRED | The session exceeds its 24-hour time-to-live (TTL) | The session is no longer valid; create a new session |
| CRITICAL | A service error occurs | The session is in an error state because of a server-side failure |
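A client that polls getStatus() typically maps each state in the table above to a reaction. The sketch below encodes one such mapping. SessionState is a hypothetical mirror of the table (the SDK's own UploadSession.Status type may use different names), and the NextAction policy, for example waiting while CLOSING or reporting on CRITICAL, is an assumption rather than documented behavior.

```java
// Hypothetical mirror of the session states table; the SDK's own status enum may differ.
enum SessionState { UNKNOWN, NORMAL, CLOSING, CLOSED, EXPIRED, CRITICAL }

// Hypothetical client-side reactions; this policy is an assumption, not documented behavior.
enum NextAction { WAIT, WRITE_BLOCKS, DONE, CREATE_NEW_SESSION, REPORT_ERROR }

final class SessionPolicy {
    private SessionPolicy() {}

    static NextAction nextAction(SessionState state) {
        switch (state) {
            case UNKNOWN:  return NextAction.WAIT;                // initial state, not ready yet
            case NORMAL:   return NextAction.WRITE_BLOCKS;        // ready to accept block writes
            case CLOSING:  return NextAction.WAIT;                // server is verifying the block list
            case CLOSED:   return NextAction.DONE;                // data is in the result table
            case EXPIRED:  return NextAction.CREATE_NEW_SESSION;  // past the 24-hour TTL
            case CRITICAL: return NextAction.REPORT_ERROR;        // server-side error
            default:       throw new IllegalArgumentException("unhandled state: " + state);
        }
    }

    public static void main(String[] args) {
        for (SessionState s : SessionState.values()) {
            System.out.println(s + " -> " + nextAction(s));
        }
    }
}
```

Converting the value returned by getStatus() into this enum is left out, because the members of UploadSession.Status are not listed on this page.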

Limits

| Item | Value | Notes |
|---|---|---|
| blockId range | 0–20,000 | Block IDs must be unique within a session |
| Maximum block size | 100 GB | |
| Recommended minimum block size | 64 MB | Blocks smaller than 64 MB reduce upload efficiency |
| Session TTL | 24 hours | Measured from session creation |
| Network idle timeout | 120 seconds | If a RecordWriter writes no data for 120 seconds, the server closes the connection; open a new RecordWriter to continue |
| Network action frequency | Every 8 KB written | The RecordWriter triggers a network action each time it writes 8 KB |
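The limits above combine into simple arithmetic when planning an upload: the number of blocks is the total size divided by the block size, rounded up, and it must fit in the available block IDs. The hypothetical UploadLimits helper below encodes the table. Two assumptions are baked in and flagged in the comments: MB and GB are treated as 1024-based, and the blockId upper bound of 20,000 is treated as inclusive (20,001 IDs).

```java
// Hypothetical helper encoding the limits table. Assumptions: MB/GB are binary (1024-based),
// and the blockId upper bound 20,000 is inclusive.
final class UploadLimits {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;
    static final long MAX_BLOCK_ID = 20_000;
    static final long MAX_BLOCK_BYTES = 100 * GB;
    static final long RECOMMENDED_MIN_BLOCK_BYTES = 64 * MB;
    static final long SESSION_TTL_MILLIS = 24L * 60 * 60 * 1000;
    static final long IDLE_TIMEOUT_MILLIS = 120_000;

    private UploadLimits() {}

    static boolean isValidBlockId(long blockId) {
        return blockId >= 0 && blockId <= MAX_BLOCK_ID;
    }

    // Blocks needed to upload totalBytes in chunks of blockBytes (ceiling division),
    // rejecting plans that exceed the per-block size or the number of available block IDs.
    static long blocksNeeded(long totalBytes, long blockBytes) {
        if (totalBytes < 0) throw new IllegalArgumentException("negative size");
        if (blockBytes <= 0 || blockBytes > MAX_BLOCK_BYTES) {
            throw new IllegalArgumentException("block size out of range: " + blockBytes);
        }
        long blocks = (totalBytes + blockBytes - 1) / blockBytes;
        if (blocks > MAX_BLOCK_ID + 1) throw new IllegalArgumentException("too many blocks: " + blocks);
        return blocks;
    }

    // Blocks below 64 MB are allowed but reduce upload efficiency.
    static boolean belowRecommendedSize(long blockBytes) {
        return blockBytes < RECOMMENDED_MIN_BLOCK_BYTES;
    }

    static boolean sessionExpired(long createdAtMillis, long nowMillis) {
        return nowMillis - createdAtMillis >= SESSION_TTL_MILLIS;
    }

    // After 120 s without writes the server closes the connection; open a new RecordWriter.
    static boolean writerIdleTooLong(long lastWriteMillis, long nowMillis) {
        return nowMillis - lastWriteMillis >= IDLE_TIMEOUT_MILLIS;
    }

    public static void main(String[] args) {
        // 1 TB split into 256 MB blocks.
        System.out.println(blocksNeeded(1024 * GB, 256 * MB)); // prints 4096
    }
}
```

For example, 2 TB in 64 MB blocks would need 32,768 blocks, more than the ID range allows, so the block size has to grow.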

Usage notes

  • Block IDs must be unique within the same upload session. After you open a RecordWriter with a given blockId, write data, and close the writer, that blockId cannot be reused to open another RecordWriter in the same session.

  • If commit() fails, retry the call.

  • Call getStatus() at any point to check the current session state.
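Retrying a failed commit() can be wrapped in a small helper. The sketch below retries any Callable a bounded number of times with linear backoff and rethrows the last error if every attempt fails. Retry.withRetries is a hypothetical name, and the attempt count and backoff are assumptions; the page only says to retry the call.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Minimal retry helper for calls such as commit(); attempt count and linear backoff are assumptions.
final class Retry {
    private Retry() {}

    static <T> T withRetries(Callable<T> action, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) Thread.sleep(backoffMillis * attempt); // linear backoff
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for session.commit(blocks): fails twice, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new IOException("transient failure");
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Because commit() returns nothing, a caller would wrap it as `Retry.withRetries(() -> { session.commit(blocks); return null; }, 3, 1000);`, where session and blocks come from the surrounding upload code.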