Create an OSS data shipping job (new version)

After Simple Log Service (SLS) collects data, you can ship it to an Object Storage Service (OSS) bucket for storage and analysis. This topic describes how to create an OSS data shipping job (new version).

Prerequisites

Supported regions

Simple Log Service ships data to an OSS bucket in the same region as the SLS project.

Important

This feature is available only in the following regions: China (Hangzhou), China (Shanghai), China (Nanjing), China (Hangzhou) Finance, China (Shanghai) Finance, China (Qingdao), China (Beijing), China (Zhangjiakou), China (Hohhot), China (Ulanqab), China (Chengdu), China (Shenzhen), China (Heyuan), China (Guangzhou), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Philippines (Manila), Thailand (Bangkok), Japan (Tokyo), US (Silicon Valley), and US (Virginia).

China (Hangzhou) Finance supports only buckets that are accessible over the public endpoint in the OSS China (Hangzhou) Finance region. China (Shanghai) Finance supports only buckets in the OSS China (Shanghai) Finance region.

Create a data shipping job

  1. Log on to the Simple Log Service console.

  2. In the Projects section, click the project that you want to manage.

  3. On the Log Storage > Logstores tab, click the > icon to the left of the target logstore and choose Data Processing > Export > Object Storage Service.

  4. Hover over Object Storage Service and click the + icon.

  5. In the Data Shipping to OSS panel, configure the following parameters and click OK.

    Select New Version for the Shipping Version parameter. The following table describes the key parameters.

    Important
    • After you create a data shipping job, a shipping operation for each shard is triggered when the specified batch size is reached or the batch interval has elapsed.

    • After you create the job, verify that it works as expected by checking its status and the data in OSS.

    Parameter

    Description

    Job name

    The unique name of the data shipping job.

    Display Name

    The display name of the data shipping job.

    Job description

    The description of the OSS data shipping job.

    OSS bucket

    The name of the destination OSS bucket.

    Important
    • You can ship data to a bucket with the Standard, Infrequent Access (IA), Archive, Cold Archive, or Deep Cold Archive storage class. The storage class of the generated OSS objects defaults to that of the bucket. For more information, see Storage classes.

    • Storage classes other than Standard have minimum storage durations and billable sizes. Choose a storage class for the destination bucket that meets your requirements. For more information, see Storage class comparison.

    File Delivery Directory

    The directory in the OSS bucket. The directory name cannot start with a forward slash (/) or a backslash (\).

    After you create the data shipping job, Simple Log Service ships data from the logstore to this directory in the destination OSS bucket.

    Object Suffix

    If you do not specify an object suffix, Simple Log Service automatically generates one based on the storage format and compression type, such as .suffix.

    Partition Format

    A format that dynamically generates a directory path in the OSS bucket based on the shipping time. The path cannot start with a forward slash (/). The default value is %Y/%m/%d/%H/%M. For examples, see Partition format. For parameter details, see the strptime API.

    OSS Write RAM Role

    The RAM role that grants the data shipping job permissions to write data to the OSS bucket.

    Logstore read RAM role

    The RAM role that grants the data shipping job permissions to read data from the logstore.

    Storage Format

    The file format for data stored in OSS. For more information, see CSV format, JSON format, Parquet format, and ORC format.

    Compress

    The compression method for data stored in OSS.

    • none: Data is not compressed.

    • snappy: Compresses data by using the snappy algorithm. For more information, see snappy.

    • zstd: Compresses data by using the zstd algorithm.

    • gzip: Compresses data by using the gzip algorithm.

    Ship Tags

    Specifies whether to include __tag__ fields, which are reserved fields in Simple Log Service, in the shipped data. For more information, see reserved fields.

    Batch size

    The maximum size of uncompressed data to ship from a shard in a single batch. A shipping operation is triggered when this size is reached. Valid values: 5 to 256. Unit: MB.

    Note

    Batch size refers to the size of data to be batched after it is read, not the size of data already written to SLS. Data is read and shipped only after the batch interval has elapsed.

    Batch interval

    The maximum time to wait, in seconds, before shipping a batch of data from a shard. The interval starts when the first log entry of a batch is received. A shipping operation is triggered when the interval elapses. The value must be between 300 and 900 seconds. The default value is 300 seconds.

    Shipping latency

    The delay before data is shipped, in seconds. For example, if you set this parameter to 3600, data is shipped with a 1-hour delay: data from 10:00:00 on 2023-06-05 is written to the OSS bucket no earlier than 11:00:00 on 2023-06-05. For information about the limitations, see Configuration limits.

    Start Time Range

    The time range of the data to ship, based on when the logs are received by SLS. The following options are available:

    • All: Ships data starting from the first log entry received in the logstore. The job runs until you manually stop it.

    • From Specific Time: Ships data starting from a specified point in time. The job runs until you manually stop it.

    • Specific Time Range: Ships data within a specified start and end time. The job stops automatically at the end time.

    Note

    The time range is based on the __tag__:__receive_time__ field. For more information, see reserved fields.

    Time Zone

    The time zone used to format the time in the directory path.

    If you specify a Time Zone and a Partition Format, the system generates the directory path in the OSS bucket based on your settings.
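The limits in the parameter table can be checked before you fill in the console form. The following Python sketch is illustrative only: `ShippingJobSettings`, `validate`, and `should_ship` are hypothetical names, not part of any SLS SDK, and `should_ship` is a simplified model of the "batch size reached or batch interval elapsed" rule, not the service's actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class ShippingJobSettings:
    """Hypothetical container for a few console parameters (not an SLS SDK type)."""
    delivery_directory: str
    batch_size_mb: int
    partition_format: str = "%Y/%m/%d/%H/%M"  # documented default
    batch_interval_s: int = 300               # documented default


def validate(settings: ShippingJobSettings) -> list[str]:
    """Return the documented limits that are violated; an empty list means none."""
    problems = []
    if settings.delivery_directory.startswith(("/", "\\")):
        problems.append("File Delivery Directory must not start with / or \\")
    if settings.partition_format.startswith("/"):
        problems.append("Partition Format must not start with /")
    if not 5 <= settings.batch_size_mb <= 256:
        problems.append("Batch size must be between 5 and 256 MB")
    if not 300 <= settings.batch_interval_s <= 900:
        problems.append("Batch interval must be between 300 and 900 seconds")
    return problems


def should_ship(settings: ShippingJobSettings,
                buffered_mb: float,
                seconds_since_first_log: float) -> bool:
    """Simplified per-shard trigger: size reached OR interval elapsed."""
    return (buffered_mb >= settings.batch_size_mb
            or seconds_since_first_log >= settings.batch_interval_s)
```

For example, `validate(ShippingJobSettings("logs", batch_size_mb=256))` returns an empty list, while a directory of `/logs` or a batch interval of 60 seconds each add one message to the result.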

View data in OSS

After data is successfully shipped to OSS, you can access it using the OSS console, an API, an SDK, or other tools. For more information, see File management.

The OSS object path is in the following format:

oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID

In this format, OSS-BUCKET is the OSS bucket name, OSS-PREFIX is the directory prefix, PARTITION-FORMAT is the partition format calculated from the shipping time using the strptime API, and RANDOM-ID is the unique ID of a shipping operation.
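To see how the parts of the path fit together, the following Python sketch assembles an object path from a shipping time. It is an approximation for illustration: `build_object_path` is a hypothetical helper, the random ID is treated as an opaque string, the object suffix is appended at the end as in the examples in the Partition format section, and the UTC+8 time zone is an assumption chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def build_object_path(bucket: str, prefix: str, partition_format: str,
                      shipping_time: datetime, tz: timezone,
                      random_id: str, suffix: str = "") -> str:
    """Format the shipping time in the job's time zone and join the path parts."""
    local_time = shipping_time.astimezone(tz)
    partition = local_time.strftime(partition_format)
    return f"oss://{bucket}/{prefix}/{partition}_{random_id}{suffix}"


# 19:50:43 on 2022-01-20 in UTC+8 (assumed time zone for this example).
shipping_time = datetime(2022, 1, 20, 11, 50, 43, tzinfo=timezone.utc)
path = build_object_path(
    "test-bucket", "test-table", "%Y/%m/%d/%H/%M",
    shipping_time, timezone(timedelta(hours=8)),
    "1484913043351525351_2850008", ".suffix",
)
print(path)
# oss://test-bucket/test-table/2022/01/20/19/50_1484913043351525351_2850008.suffix
```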

Note

Simple Log Service ships data to OSS in batches. Each shipping operation creates one object that contains a batch of data. The object path is determined by the earliest receive_time (the time when data arrives at SLS) in the batch. Note the following scenarios:

  • Shipping real-time data: Assume data is shipped every 5 minutes. A shipping operation at 00:00:00 on 2022-01-22 might process data received after 23:55:00 on 2022-01-21. Since the object path is based on the earliest timestamp in the batch, the resulting object could be placed in the 2022/01/21/ directory. Therefore, to analyze all data for 2022-01-22, you must check all objects in the 2022/01/22/ directory and the last few objects in the 2022/01/21/ directory.

  • Shipping historical data: If the logstore contains a small volume of data, a single data shipping operation may include data that spans multiple days. As a result, an object in the 2022/01/22/ directory might contain all the data for 2022-01-23, leaving the 2022/01/23/ directory empty.
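Because a batch is filed under its earliest receive time, reading one day of data usually means scanning more than one directory. The following Python sketch lists the directory prefixes to check. It assumes a partition format that starts with %Y/%m/%d; `day_prefixes_to_scan` and `lookback_days` are hypothetical names, and a lookback of one day suits the real-time scenario only. Historical shipping of sparse data may need a larger value.

```python
from datetime import date, timedelta


def day_prefixes_to_scan(prefix: str, day: date, lookback_days: int = 1) -> list[str]:
    """Return directory prefixes, oldest first, that may hold data for `day`."""
    days = [day - timedelta(days=n) for n in range(lookback_days, -1, -1)]
    return [f"{prefix}/{d:%Y/%m/%d}/" for d in days]


print(day_prefixes_to_scan("log_ship_oss_example", date(2022, 1, 22)))
# ['log_ship_oss_example/2022/01/21/', 'log_ship_oss_example/2022/01/22/']
```

After listing the objects under these prefixes, filter the log entries by their own timestamps, because an object in the earlier directory can contain data for both days.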

Partition format

Each shipping operation corresponds to an OSS object path in the format oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID. The following table provides examples of partition formats for a shipping job created at 19:50:43 on 2022/01/20.

| OSS bucket | OSS prefix | Partition format | Object suffix | OSS object path |
| --- | --- | --- | --- | --- |
| test-bucket | test-table | %Y/%m/%d/%H/%M | .suffix | oss://test-bucket/test-table/2022/01/20/19/50_1484913043351525351_2850008.suffix |
| test-bucket | log_ship_oss_example | year=%Y/mon=%m/day=%d/log_%H%M | .suffix | oss://test-bucket/log_ship_oss_example/year=2022/mon=01/day=20/log_1950_1484913043351525351_2850008.suffix |
| test-bucket | log_ship_oss_example | ds=%Y%m%d/%H | .suffix | oss://test-bucket/log_ship_oss_example/ds=20220120/19_1484913043351525351_2850008.suffix |
| test-bucket | log_ship_oss_example | %Y%m%d/ | .suffix | oss://test-bucket/log_ship_oss_example/20220120/_1484913043351525351_2850008.suffix |
| test-bucket | log_ship_oss_example | %Y%m%d%H | .suffix | oss://test-bucket/log_ship_oss_example/2022012019_1484913043351525351_2850008.suffix |

Note

The %Y%m%d/ format (fourth row) may cause platforms like Hive to fail when parsing the OSS content. We recommend that you do not use this format.

When you analyze OSS data with big data platforms like Hive, MaxCompute, or Alibaba Cloud Data Lake Analytics (DLA), set the partition format to a key=value format to use partition information. For example, in the path oss://test-bucket/log_ship_oss_example/year=2022/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet, three partition columns are defined: year, mon, and day.
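The key=value convention can be illustrated with a short Python sketch that extracts partition columns from an object path. `partition_columns` is a hypothetical helper written for this example; it only mimics how directory names map to partition columns and is not how Hive, MaxCompute, or DLA are implemented.

```python
def partition_columns(object_path: str) -> dict[str, str]:
    """Collect key=value directory segments from an OSS object path."""
    without_scheme = object_path.split("://", 1)[-1]
    directories = without_scheme.split("/")[:-1]  # drop the object name itself
    columns = {}
    for segment in directories:
        key, separator, value = segment.partition("=")
        if separator and key:
            columns[key] = value
    return columns


example = ("oss://test-bucket/log_ship_oss_example/year=2022/mon=01/day=20/"
           "log_195043_1484913043351525351_2850008.parquet")
print(partition_columns(example))
# {'year': '2022', 'mon': '01', 'day': '20'}
```

A path without key=value directories, such as one produced by %Y/%m/%d/%H/%M, yields an empty dictionary, which is why such layouts carry no partition information for these platforms.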

SDK example

Create an OSS data shipping job