Tunnel SDK examples (Python)


MaxCompute Tunnel is a service for uploading data to and downloading data from MaxCompute. The Tunnel SDK for Python is included in PyODPS, the MaxCompute SDK for Python.

Usage notes

  • The following sections show how to upload data to and download data from MaxCompute by using the SDK for Python. For examples in other scenarios, see .

  • If Cython is installed in the environment, PyODPS compiles C extensions during installation to accelerate Tunnel-based data upload and download.
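Because the accelerated path is only built when Cython is present at install time, it can help to check for Cython before installing PyODPS. The following is a minimal sketch that uses only the standard library; the function name `cython_available` is illustrative and is not part of PyODPS.

```python
import importlib.util


def cython_available() -> bool:
    """Return True if the Cython package can be imported in this environment."""
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("Cython") is not None


if __name__ == "__main__":
    if cython_available():
        print("Cython found: installing PyODPS now can build its C extensions.")
    else:
        print("Cython not found: consider installing it before installing PyODPS.")
```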

Upload data

import os
from odps import ODPS
from odps.tunnel import TableTunnel

# Initialize the ODPS client using AccessKey credentials from environment variables.
# Store credentials in environment variables to avoid hardcoding them in your code.
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point',
)

table = o.get_table('my_table')

tunnel = TableTunnel(o)
upload_session = tunnel.create_upload_session(table.name, partition_spec='pt=test')

with upload_session.open_record_writer(0) as writer:
    # Create an empty record and set values by column index
    record = table.new_record()
    record[0] = 'test1'
    record[1] = 'id1'
    writer.write(record)

    # Create a record from a list of values in column order
    record = table.new_record(['test2', 'id2'])
    writer.write(record)

# Call commit() outside the with block. Committing inside it would happen before the writer has finished writing the data, which raises an error.
upload_session.commit([0])
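In the upload example, `open_record_writer(0)` writes to block 0 and `commit([0])` commits that block. A larger upload can spread rows across several block IDs, write each block with its own writer, and pass every block ID to `commit()`. The sketch below only plans that split in plain Python: the helper `plan_blocks` and the `rows_per_block` value are hypothetical, and the PyODPS calls are shown as comments rather than executed.

```python
from typing import Dict, List, Sequence


def plan_blocks(rows: Sequence[List], rows_per_block: int) -> Dict[int, List[List]]:
    """Group rows (each a list of values in table-column order) into numbered blocks.

    Hypothetical helper: block IDs start at 0 and increase by 1 per block.
    """
    if rows_per_block <= 0:
        raise ValueError("rows_per_block must be positive")
    blocks: Dict[int, List[List]] = {}
    for i, row in enumerate(rows):
        # Rows 0..rows_per_block-1 go to block 0, the next batch to block 1, and so on.
        blocks.setdefault(i // rows_per_block, []).append(list(row))
    return blocks


rows = [['test1', 'id1'], ['test2', 'id2'], ['test3', 'id3']]
blocks = plan_blocks(rows, rows_per_block=2)
print(blocks)

# With a real upload session (not executed here), each block would be written
# with its own writer, and all block IDs committed together afterwards:
#
# for block_id, block_rows in blocks.items():
#     with upload_session.open_record_writer(block_id) as writer:
#         for values in block_rows:
#             writer.write(table.new_record(values))
# upload_session.commit(list(blocks))
```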

Download data

from odps.tunnel import TableTunnel

# `o` is the ODPS client created in the upload example.
tunnel = TableTunnel(o)
download_session = tunnel.create_download_session('my_table', partition_spec='pt=test')

# Record reader: iterates row by row as Record objects.
with download_session.open_record_reader(0, download_session.count) as reader:
    for record in reader:
        # Process each record; this example just prints it.
        print(record)

# Arrow reader: iterates in batches as Apache Arrow RecordBatch objects (requires pyarrow).
with download_session.open_arrow_reader(0, download_session.count) as reader:
    for batch in reader:
        # Process each Arrow RecordBatch; this example prints its row count.
        print(batch.num_rows)
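Both readers take a start row and a row count, and the download example reads the whole session at once with `(0, download_session.count)`. A large download can instead be read in consecutive ranges. The helper `row_ranges` below is a hypothetical sketch that computes those `(start, count)` pairs; the chunk size is an arbitrary assumption, and the PyODPS calls appear only as comments.

```python
from typing import Iterator, Tuple


def row_ranges(total: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs that cover rows 0..total-1 without overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        # The last range is shorter when total is not a multiple of chunk_size.
        yield start, min(chunk_size, total - start)


print(list(row_ranges(10, 4)))  # [(0, 4), (4, 4), (8, 2)]

# With a real download session (not executed here):
#
# for start, count in row_ranges(download_session.count, 100000):
#     with download_session.open_record_reader(start, count) as reader:
#         for record in reader:
#             print(record)
```

Reading in ranges keeps each reader's work bounded, and the same pairs could be handed to separate workers if parallel reads are wanted.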