UploadSession uploads data to a MaxCompute table by splitting the data into blocks, writing each block independently, and committing all blocks atomically when the upload is complete. Because each block can be retried individually if a network error occurs, this approach is suitable for large datasets and unstable network environments.
How it works
An upload follows four sequential steps:
1. Create a session: call the constructor or use TableTunnel to create an UploadSession. The server creates a session and returns a unique upload ID. Retrieve the ID with getId().
2. Write data to blocks: call openRecordWriter(blockId) to get a RecordWriter for each block. Write records to the writer, then close it. Use blockId to identify each block's position in the table.
3. Verify uploaded blocks: call getBlockList() to list the blocks the server has received. Compare this list with the block IDs you sent, and re-upload any missing blocks.
4. Commit: call commit(Long[] blocks) with the full list of uploaded blocks. The server verifies that the block list matches its records. If the lists match, the data is moved to the result table.
All requests in these steps run in synchronous mode.
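
The verification step comes down to a set difference between the block IDs you sent and the IDs the server reports. The following is a minimal sketch of that comparison. BlockVerifier and missingBlocks are hypothetical names used for illustration and are not part of the SDK; the sketch assumes the server's list is the Long[] returned by getBlockList().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the MaxCompute SDK. */
final class BlockVerifier {
    private BlockVerifier() {}

    /**
     * Returns the IDs in sentBlocks that do not appear in serverBlocks,
     * in the order in which they were sent.
     *
     * @param sentBlocks   block IDs the client wrote
     * @param serverBlocks block IDs the server reports (assumed to come from getBlockList())
     */
    static List<Long> missingBlocks(Long[] sentBlocks, Long[] serverBlocks) {
        Set<Long> received = new HashSet<>(Arrays.asList(serverBlocks));
        List<Long> missing = new ArrayList<>();
        for (Long id : sentBlocks) {
            if (!received.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```

If the returned list is empty, every block arrived and the session is ready to commit. Otherwise, re-upload the listed blocks and compare again before committing.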
Class definition
public class UploadSession {
    UploadSession(Configuration conf, String projectName, String tableName,
                  String partitionSpec) throws TunnelException;
    UploadSession(Configuration conf, String projectName, String tableName,
                  String partitionSpec, String uploadId) throws TunnelException;
    public void commit(Long[] blocks);
    public Long[] getBlockList();
    public String getId();
    public TableSchema getSchema();
    public UploadSession.Status getStatus();
    public Record newRecord();
    public RecordWriter openRecordWriter(long blockId);
    public RecordWriter openRecordWriter(long blockId, boolean compress);
    public RecordWriter openBufferedWriter();
    public RecordWriter openBufferedWriter(boolean compress);
}

The second constructor accepts an existing uploadId. Pass an uploadId when you need to reference or share an existing upload session that was previously created.
Choose a writer
| | openRecordWriter | openBufferedWriter |
|---|---|---|
| Block management | Manual — you specify blockId for each block | Automatic — block IDs are managed internally |
| Retry on failure | Manual — re-upload failed blocks using getBlockList() | Automatic — the writer retries failed blocks |
| Use when | You need fine-grained control over block ordering | You want straightforward batch upload without managing block IDs |
Use openBufferedWriter for most uploads. Use openRecordWriter when you need explicit control over block IDs.
For a complete example using BufferedWriter, see Data upload by using BufferedWriter.
Session states
| State | Triggered when | Meaning |
|---|---|---|
| UNKNOWN | Server creates the session | Initial state |
| NORMAL | Session is created successfully | Ready to accept block writes |
| CLOSING | commit() is called | Server is verifying the block list |
| CLOSED | Commit completes successfully | Data has been moved to the result table |
| EXPIRED | Session exceeds the 24-hour time-to-live (TTL) | Session is no longer valid; create a new session |
| CRITICAL | A service error has occurred | A server-side error has occurred |
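
Client code usually branches on the session state before writing or committing. The sketch below encodes the table as a standalone enum so that those decisions are explicit. SessionState and its methods are hypothetical and written only for illustration; the SDK reports state through its own UploadSession.Status type, and this sketch does not claim to mirror that type's API.

```java
import java.util.Locale;

/** Hypothetical mirror of the documented session states; not an SDK type. */
enum SessionState {
    UNKNOWN, NORMAL, CLOSING, CLOSED, EXPIRED, CRITICAL;

    /** Only NORMAL is documented as ready to accept block writes. */
    boolean acceptsWrites() {
        return this == NORMAL;
    }

    /** CLOSED means the commit completed and data is in the result table. */
    boolean isCommitted() {
        return this == CLOSED;
    }

    /** EXPIRED sessions are no longer valid; the table says to create a new session. */
    boolean requiresNewSession() {
        return this == EXPIRED;
    }

    /** Parses a state name as written in the table, ignoring case and surrounding spaces. */
    static SessionState fromName(String name) {
        return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
}
```

For example, SessionState.fromName("normal").acceptsWrites() is true, while the same check on CLOSING or EXPIRED is false.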
Limits
| Item | Value | Notes |
|---|---|---|
| blockId range | 0–20,000 | Block IDs must be unique within a session |
| Maximum block size | 100 GB | |
| Recommended minimum block size | 64 MB | Blocks smaller than 64 MB reduce upload efficiency |
| Session TTL | 24 hours | Measured from session creation |
| Network idle timeout | 120 seconds | If the RecordWriter writes no data for 120 seconds, the server closes the connection; open a new RecordWriter to continue |
| Network action frequency | Every 8 KB written | A network action is triggered each time the RecordWriter writes 8 KB |
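
These limits interact: a small block size inflates the block count, and a session cannot address more blocks than the blockId range allows. The following sketch computes the number of blocks for a given data size and rejects plans that break the limits. BlockPlan and its members are hypothetical names. The sketch assumes binary units (1 MB = 1024 × 1024 bytes) and conservatively treats the blockId range as allowing at most 20,000 blocks per session; neither detail is stated in the table.

```java
/** Hypothetical planning helper for illustration; not part of the MaxCompute SDK. */
final class BlockPlan {
    static final long MIB = 1024L * 1024L;                       // assumption: binary megabytes
    static final long GIB = 1024L * MIB;
    static final long RECOMMENDED_MIN_BLOCK_BYTES = 64L * MIB;   // from the limits table
    static final long MAX_BLOCK_BYTES = 100L * GIB;              // from the limits table
    static final long MAX_BLOCKS = 20_000L;                      // assumption: conservative reading of the range

    private BlockPlan() {}

    /** Returns how many blocks of blockBytes are needed to hold totalBytes. */
    static long blockCount(long totalBytes, long blockBytes) {
        if (totalBytes < 0) {
            throw new IllegalArgumentException("totalBytes must not be negative");
        }
        if (blockBytes <= 0 || blockBytes > MAX_BLOCK_BYTES) {
            throw new IllegalArgumentException("blockBytes must be in (0, 100 GB]");
        }
        // Ceiling division without risking overflow on very large inputs.
        long count = totalBytes / blockBytes + (totalBytes % blockBytes == 0 ? 0 : 1);
        if (count > MAX_BLOCKS) {
            throw new IllegalArgumentException("plan needs " + count + " blocks; use a larger block size");
        }
        return count;
    }

    /** True when the block size is under the recommended 64 MB minimum. */
    static boolean belowRecommendedSize(long blockBytes) {
        return blockBytes < RECOMMENDED_MIN_BLOCK_BYTES;
    }
}
```

Under these assumptions, 1 TB of data in 256 MB blocks needs 1,048,576 MB / 256 MB = 4,096 blocks, well inside the range. The same data in 64 MB blocks needs 16,384 blocks, and 2 TB in 64 MB blocks would need 32,768, which the sketch rejects.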
Usage notes
- Block IDs must be unique within the same upload session. After you open a RecordWriter with a given blockId, write data, and close the writer, that blockId cannot be reused to open another RecordWriter in the same session.
- If commit() fails, retry the call.
- Call getStatus() at any point to check the current session state.
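
Retrying a failed commit() is easier to get right with a small bounded-retry wrapper. The sketch below is one generic way to do that. Retry and withRetries are hypothetical names, and the attempt count and fixed delay are illustrative choices rather than values recommended by the service.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry helper for illustration; not part of the MaxCompute SDK. */
final class Retry {
    private Retry() {}

    /**
     * Runs the action until it succeeds or maxAttempts is reached,
     * sleeping delayMillis between attempts. Rethrows the last failure.
     */
    static <T> T withRetries(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // fixed pause before the next attempt
                }
            }
        }
        throw last;
    }
}
```

With an existing session and block list, a call might look like Retry.withRetries(() -> { session.commit(blocks); return null; }, 3, 1000L). If every attempt fails, calling getStatus() shows whether the session is still usable before you decide how to proceed.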