MaxCompute Tunnel is a tunnel service for uploading data to and downloading data from MaxCompute. Tunnel SDK for Python is included in PyODPS (MaxCompute SDK for Python).
Usage notes
- The following sections provide examples of how to upload data to and download data from MaxCompute by using the SDK for Python. For more information about examples in other scenarios, see .
- If Cython is installed in your environment, PyODPS compiles C code during installation to accelerate Tunnel-based data upload and download.
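Because the acceleration depends on Cython being present at installation time, it can help to check for it before installing PyODPS. The following is a minimal sketch that uses only the Python standard library; the helper name `cython_available` is hypothetical, and the check only reports whether Cython can be imported, not whether PyODPS actually built its C extensions.

```python
import importlib.util


def cython_available() -> bool:
    """Return True if the Cython package can be found in this environment."""
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("Cython") is not None


if __name__ == "__main__":
    if cython_available():
        print("Cython found: a PyODPS source install may compile its C code.")
    else:
        print("Cython not found: install it first if you want the compiled code.")
```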
Upload data
import os
from odps import ODPS
from odps.tunnel import TableTunnel

# Initialize the ODPS client using AccessKey credentials from environment variables.
# Store credentials in environment variables to avoid hardcoding them in your code.
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point',
)

table = o.get_table('my_table')
tunnel = TableTunnel(o)
upload_session = tunnel.create_upload_session(table.name, partition_spec='pt=test')

with upload_session.open_record_writer(0) as writer:
    # Create a record by index
    record = table.new_record()
    record[0] = 'test1'
    record[1] = 'id1'
    writer.write(record)

    # Create a record from a list
    record = table.new_record(['test2', 'id2'])
    writer.write(record)

# Call commit() outside the with block. If you call it before data is written, an error is reported.
upload_session.commit([0])
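The example above writes a single block (block ID 0) and commits it. When there are more rows, the same pattern can be repeated with one writer per block and a single commit at the end. The following is a minimal sketch under that assumption: `chunk_rows`, `upload_in_blocks`, and the `block_size` default are hypothetical names and values, and the only session and table methods used are the ones shown above (`open_record_writer`, `new_record`, `commit`).

```python
from typing import Iterator, List, Sequence, Tuple


def chunk_rows(rows: Sequence, block_size: int) -> Iterator[Tuple[int, Sequence]]:
    """Yield (block_id, rows_for_that_block) pairs, numbering blocks from 0."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for block_id, start in enumerate(range(0, len(rows), block_size)):
        yield block_id, rows[start:start + block_size]


def upload_in_blocks(upload_session, table, rows: Sequence, block_size: int = 1000) -> List[int]:
    """Write rows one block at a time, then commit all written block IDs once."""
    block_ids = []
    for block_id, chunk in chunk_rows(rows, block_size):
        # Each block gets its own writer; leaving the with block closes it.
        with upload_session.open_record_writer(block_id) as writer:
            for values in chunk:
                writer.write(table.new_record(list(values)))
        block_ids.append(block_id)
    # Commit only after every writer has been closed.
    upload_session.commit(block_ids)
    return block_ids
```

With the objects from the upload example, a call might look like `upload_in_blocks(upload_session, table, [['test1', 'id1'], ['test2', 'id2']])`.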
Download data
from odps.tunnel import TableTunnel

# `o` is the ODPS client created in the upload example.
tunnel = TableTunnel(o)
download_session = tunnel.create_download_session('my_table', partition_spec='pt=test')

# Record reader: iterates row by row as record objects.
with download_session.open_record_reader(0, download_session.count) as reader:
    for record in reader:
        # Process each record. Printing is a placeholder.
        print(record)

# Arrow reader: iterates in batches as Apache Arrow RecordBatch objects.
with download_session.open_arrow_reader(0, download_session.count) as reader:
    for batch in reader:
        # Process each Arrow RecordBatch. Printing the row count is a placeholder.
        print(batch.num_rows)
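The readers above take a start offset and a row count, so a large table can also be read in smaller ranges instead of one pass over `download_session.count` rows. The following is a minimal sketch of that idea: `split_ranges`, `read_in_chunks`, and the `chunk_size` default are hypothetical, and the only session members used are `count` and `open_record_reader(start, count)` as shown above.

```python
from typing import Iterator, List, Tuple


def split_ranges(total: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split `total` rows into (start, count) pairs of at most `chunk_size` rows."""
    if total < 0:
        raise ValueError("total must not be negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(chunk_size, total - start))
        for start in range(0, total, chunk_size)
    ]


def read_in_chunks(download_session, chunk_size: int = 10000) -> Iterator:
    """Yield every record of the session, opening one reader per range."""
    for start, count in split_ranges(download_session.count, chunk_size):
        with download_session.open_record_reader(start, count) as reader:
            for record in reader:
                yield record
```

For instance, `split_ranges(10, 4)` gives `[(0, 4), (4, 4), (8, 2)]`, and each pair can be passed to a separate reader.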