Import OSS data


You can import log files from Object Storage Service (OSS) buckets into Simple Log Service to query, analyze, and transform them. Simple Log Service imports OSS objects up to 5 GB. For compressed objects, this limit applies to their compressed size.

Billing

Simple Log Service does not charge for the data import feature. However, the feature calls OSS API operations, which incur OSS traffic fees and request fees. For more information about the pricing of related billable items, see OSS Pricing. The daily OSS fee for importing data from OSS is calculated using the following formula:

[Formula image not available: daily OSS fee expressed in terms of N, T, p_read, p_put, p_get, and M]

Billing parameters

| Parameter | Description |
| --- | --- |
| N | The number of files imported per day. |
| T | The total amount of data imported per day, in GB. |
| p_read | The traffic fee per GB of data. Same-region imports generate internal outbound traffic, which is free. Cross-region imports generate outbound internet traffic. |
| p_put | The fee per 10,000 PUT requests. Simple Log Service calls the ListObjects API operation to list files in the source bucket, and OSS charges for ListObjects calls as PUT requests. Each call can return up to 1,000 objects. For example, 1,000,000 new files require 1,000 requests (1,000,000 files / 1,000 files per request). |
| p_get | The fee per 10,000 GET requests. |
| M | The new file check interval, in minutes. You set it with the New File Check Cycle parameter when you create a data import configuration. |

Same-region import

A company develops App A based on Alibaba Cloud products such as Object Storage Service (OSS) and Simple Log Service (SLS). The app generates 100,000 files per day. Each file has an average size of 100 MB. These files are stored in a Standard OSS bucket in the China (Hangzhou) region. The company wants to import this data into an SLS Logstore in the China (Hangzhou) region and sets the New File Check Cycle to 5 minutes. The daily OSS cost is CNY 0.114592. The following table describes the cost breakdown.

| Billable item | Price | Usage | Daily cost |
| --- | --- | --- | --- |
| PUT requests | CNY 0.01 per 10,000 requests | 14,592 requests | 14,592 / 10,000 × CNY 0.01 = CNY 0.014592 |
| GET requests | CNY 0.01 per 10,000 requests | 100,000 requests | 100,000 / 10,000 × CNY 0.01 = CNY 0.1 |
| Total cost | | | CNY 0.114592 |
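
The PUT request count in this example can be reproduced with a short calculation. The sketch below is an inference, not a documented formula. It assumes that files arrive at a uniform rate during the day and that every check lists all files accumulated so far that day, at up to 1,000 objects per ListObjects call. The function names are hypothetical.

```python
from decimal import Decimal

OBJECTS_PER_LIST_CALL = 1000  # each ListObjects call returns up to 1,000 objects


def daily_list_requests(files_per_day: int, check_interval_minutes: int) -> int:
    """Estimate ListObjects calls per day.

    Assumption (not stated in the source): files arrive uniformly and each
    check re-lists every file accumulated so far that day.
    """
    checks_per_day = (24 * 60) // check_interval_minutes
    total = 0
    for k in range(1, checks_per_day + 1):
        # Files present at check k are k * files_per_day / checks_per_day.
        # Integer ceiling division gives the number of list calls needed.
        numerator = k * files_per_day
        denominator = checks_per_day * OBJECTS_PER_LIST_CALL
        total += -(-numerator // denominator)
    return total


def request_fee(requests: int, price_per_10k: Decimal) -> Decimal:
    """Fee for a number of requests billed per 10,000 requests."""
    return Decimal(requests) / Decimal(10_000) * price_per_10k


put_requests = daily_list_requests(100_000, 5)
get_requests = 100_000           # GET request count taken from the table
price = Decimal("0.01")          # CNY per 10,000 requests, from the table

total_cny = request_fee(put_requests, price) + request_fee(get_requests, price)
print(put_requests)  # 14592
print(total_cny)     # 0.114592
```

Under the same assumptions, a check cycle of 1 day (1,440 minutes) yields 100 list requests, which matches the cross-region daily example.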

Cross-region import (daily)

A company develops App B based on Alibaba Cloud products such as Object Storage Service (OSS) and Simple Log Service (SLS). The app generates 100,000 files per day. Each file has an average size of 100 MB. These files are stored in a Standard OSS bucket in the China (Shanghai) region. The company wants to import this data into an SLS Logstore in the China (Hangzhou) region and sets the New File Check Cycle to 1 day. The daily OSS cost is CNY 2500.1001. The following table describes the cost breakdown.

| Billable item | Price | Usage | Daily cost |
| --- | --- | --- | --- |
| PUT requests | CNY 0.01 per 10,000 requests | 100 requests | 100 / 10,000 × CNY 0.01 = CNY 0.0001 |
| GET requests | CNY 0.01 per 10,000 requests | 100,000 requests | 100,000 / 10,000 × CNY 0.01 = CNY 0.1 |
| Outbound internet traffic | CNY 0.25/GB | 10,000 GB | 10,000 GB × CNY 0.25/GB = CNY 2,500 |
| Total cost | | | CNY 2500.1001 |

Cross-region import (5 min)

A company develops App C based on Alibaba Cloud products such as Object Storage Service (OSS) and Simple Log Service (SLS). The app generates 100,000 files per day. Each file has an average size of 100 MB. These files are stored in a Standard OSS bucket in the China (Shanghai) region. The company wants to import this data into an SLS Logstore in the China (Hangzhou) region and sets the New File Check Cycle to 5 minutes. The daily OSS cost is CNY 4375.114592. The following table describes the cost breakdown.

| Billable item | Price | Usage | Daily cost |
| --- | --- | --- | --- |
| PUT requests | CNY 0.01 per 10,000 requests | 14,592 requests | 14,592 / 10,000 × CNY 0.01 = CNY 0.014592 |
| GET requests | CNY 0.01 per 10,000 requests | 100,000 requests | 100,000 / 10,000 × CNY 0.01 = CNY 0.1 |
| Outbound internet traffic | 00:00 to 08:00: CNY 0.25/GB; 08:00 to 24:00: CNY 0.50/GB | 10,000 GB | CNY 4,375 |
| Total cost | | | CNY 4375.114592 |

Prerequisites

  • Log files are uploaded to an Object Storage Service (OSS) bucket. For more information, see Upload objects.

  • A Project and a Logstore are created. For more information, see Manage projects and Create a basic Logstore.

  • You have completed Cloud Resource Access Authorization, which authorizes Simple Log Service to access your OSS resources with the AliyunLogImportOSSRole role.

  • Your account has the oss:ListBuckets permission to list OSS buckets. For more information, see Attach a custom policy to a RAM user.

    If you use a RAM user, you must also grant the RAM user the PassRole permission. For more information, see Create a custom policy and Manage RAM user permissions.

    {
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["ram:PassRole", "ram:GetRole"],
          "Resource": "acs:ram:*:*:role/aliyunlogimportossrole"
        },
        {
          "Effect": "Allow",
          "Action": "oss:GetBucketWebsite",
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": "oss:ListBuckets",
          "Resource": "*"
        }
      ],
      "Version": "1"
    }    

    To import data across accounts, the root account of Account B must first use a bucket policy in the OSS console to authorize Account A and a role within Account A. Example role: acs:ram::123456789:role/aliyunlogimportossrole.

Create a data import configuration

Important

If you append data to an OSS file that has already been imported, the data import job re-imports the entire file.

  1. Log on to the Simple Log Service console.

  2. In the Import Data section, click the Data Import tab, and then click OSS - Data Import.

  3. Select the destination project and Logstore, and then click Next.

  4. Set the import configuration.

    1. In the Import Configuration step, configure the following parameters.

      Parameters

      Parameter

      Description

      Task Name

      A unique name for the import job.

      Display Name

      The display name for the job.

      Job Description

      A description of the import job.

      OSS Region

      The region where the source OSS bucket is located.

      If the OSS bucket and the Simple Log Service project are in the same region, you avoid public network traffic costs and benefit from faster data transfer.

      Bucket

      The source bucket that contains the data files to import.

      File Path Prefix Filter

      Filters OSS files by a file path prefix. For example, if all files to be imported are in the csv/ directory, you can specify the prefix as csv/.

      If you do not set this parameter, the job traverses the entire OSS bucket.

      Note

      We recommend that you configure this parameter. If a bucket contains a large number of files, traversing the entire bucket significantly reduces data import efficiency.

      File Path Regex Filter

      Filters OSS files by matching their paths against a regular expression. Only files with paths that match the regular expression are imported. By default, this parameter is empty, which indicates no filtering.

      For example, if the file path is testdata/csv/bill.csv, you can set the regular expression to (testdata/csv/)(.*).

      For more information about how to test a regular expression, see How to test a regular expression.
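
As an illustration of how the two path filters combine, the following minimal sketch applies a prefix filter and a regular expression filter to a list of object keys. The object keys and the function name are hypothetical. The sketch assumes that the regular expression is matched from the start of the path; the exact matching rule used by the service is not stated here.

```python
import re


def select_objects(keys, prefix="", path_regex=""):
    """Return the object keys that pass both filters.

    An empty prefix or regex means "no filtering", mirroring the defaults
    described above. Assumption: the regex is matched from the start of the
    path (re.match).
    """
    pattern = re.compile(path_regex) if path_regex else None
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        if pattern is not None and pattern.match(key) is None:
            continue
        selected.append(key)
    return selected


# Hypothetical object keys for illustration.
keys = [
    "testdata/csv/bill.csv",
    "testdata/json/bill.json",
    "csv/2024/usage.csv",
]

print(select_objects(keys, prefix="csv/"))
# ['csv/2024/usage.csv']
print(select_objects(keys, path_regex=r"(testdata/csv/)(.*)"))
# ['testdata/csv/bill.csv']
```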

      File Modification Time Filter

      Filters OSS files by their modification time.

      • All: Select this option to import all matching files.

      • From Specific Time: Select this option to import files modified after a specific point in time.

      • Specific Time Range: Select this option to import files modified within a specific time range.

      Data Format

      The file parsing format. Valid values:

      • CSV: A delimited text file. You can use the first line as field names or manually specify field names. Each subsequent line is parsed as a log entry.

      • Single-line JSON: Reads the OSS file line by line, parsing each line as a separate JSON object. The keys in the JSON object become the field names in the log.

      • JSON array: Reads the entire OSS file at once. The file content must be a JSON array that contains one or more JSON objects.

      • CloudTrail: Reads the entire OSS file at once. The content must be in the standard CloudTrail data structure format.

      • Single-line Text Log: Parses each line in the OSS file as a single log entry.

      • Multi-line Text Logs: A multi-line mode that uses a regular expression to identify the start or end of a log entry that spans multiple lines.

      • ORC: Optimized Row Columnar (ORC) format. The file is automatically parsed into logs without additional configuration.

      • Parquet: Parquet format. The file is automatically parsed into logs without additional configuration.

      • Alibaba Cloud OSS Access Log: The format for Alibaba Cloud OSS access logs. For more information, see Log shipping.

      • Alibaba Cloud CDN Download Log: The format for Alibaba Cloud CDN download logs. For more information, see Quick start.
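
The difference between the Single-line JSON and JSON array formats can be sketched as follows. This is only an illustration of the two file layouts with hypothetical sample content, not the parser that Simple Log Service uses.

```python
import json


def parse_single_line_json(text: str) -> list:
    """Parse each non-empty line as its own JSON object (one log per line)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_json_array(text: str) -> list:
    """Parse the whole file as one JSON array (one log per element)."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("file content must be a JSON array")
    return data


# Hypothetical file contents.
single_line_file = '{"status": 200, "path": "/a"}\n{"status": 404, "path": "/b"}\n'
json_array_file = '[{"status": 200, "path": "/a"}, {"status": 404, "path": "/b"}]'

logs_a = parse_single_line_json(single_line_file)
logs_b = parse_json_array(json_array_file)
print(logs_a == logs_b)  # True
```

Both layouts produce the same two log entries. The practical difference is that the first can be read line by line, while the second must be read in full before parsing.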

      Compression Format

      The compression format of the source files. Simple Log Service decompresses the files based on the specified format before reading the data.

      Encoding Format

      The encoding format of the source files. Currently, only UTF-8 and GBK are supported.

      New File Check Cycle

      If new files are continuously added to the source OSS path, you can set a New File Check Cycle. The import job will run in the background and periodically discover and read new files. The system ensures that data from the same OSS file is not imported more than once. For example, if you create a job at 12:00 with a 30-minute check cycle, the first import runs at 12:00. If new files are generated, the next import runs at 12:30.

      If no new files are expected in the source OSS path, set the cycle to Never Check. The import job stops after it reads all existing files that match the criteria.

      Import Archive Files

      If the source files are stored as Archive or Cold Archive storage classes in OSS, they must be restored before they can be read. Enable this option to automatically restore the files. The Deep Cold Archive storage class is not supported.

      Note
      • Restoring Archive files takes about one minute, which may cause the initial preview to time out. If a timeout occurs, wait a moment and try again.

      • Restoring Cold Archive files takes about one hour. If the preview times out, you can skip the preview or wait for one hour and then try again.

        When restoring Cold Archive files, the restored copies remain available for seven days to ensure sufficient time for the import.

      Log Time Configuration

      Time Field

      When you select CSV, Single-line JSON, JSON array, CloudTrail, ORC, Parquet, Alibaba Cloud OSS Access Log, or Alibaba Cloud CDN Download Log as the Data Format, you must specify a time field. Simple Log Service uses this field to set the time for each imported log entry.

      Time Field Extraction Regular Expression

      When you select Single-line Text Log or Multi-line Text Logs as the Data Format, you must use a regular expression to extract the time from the log content.

      For example, for the log entry 127.0.0.1 - - [10/Sep/2018:12:36:49 +0800] "GET /index.html HTTP/1.1", you can set Time Field Extraction Regular Expression to [0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9\: +]+.

      Note

      For other data formats, you can also use a regular expression if you need to extract only a part of the time field.
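
To check what the example regular expression extracts, you can run it locally. The sketch below uses Python's re module on the sample log entry and then parses the extracted text with strptime. The strptime pattern is a Python stand-in chosen for this illustration, since the service itself takes a SimpleDateFormat-style pattern (presumably dd/MMM/yyyy:HH:mm:ss Z for this sample).

```python
import re
from datetime import datetime

# Regular expression and log entry taken from the example above.
TIME_REGEX = r"[0-9]{0,2}\/[0-9a-zA-Z]+\/[0-9\: +]+"
line = '127.0.0.1 - - [10/Sep/2018:12:36:49 +0800] "GET /index.html HTTP/1.1"'

match = re.search(TIME_REGEX, line)
time_text = match.group(0)
print(time_text)  # 10/Sep/2018:12:36:49 +0800

# Python equivalent of the time format step. %b expects English month
# abbreviations, which assumes the default C/English locale.
log_time = datetime.strptime(time_text, "%d/%b/%Y:%H:%M:%S %z")
print(log_time.isoformat())  # 2018-09-10T12:36:49+08:00
```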

      Time Field Format

      Specifies the time format used to parse the value in the time field.

      • Supports Java's SimpleDateFormat syntax, such as yyyy-MM-dd HH:mm:ss. For more information about the syntax, see Class SimpleDateFormat. For common time formats, see Time formats.

      • Supports epoch formats, including epoch, epochMillis, epochMicro, and epochNano.

      Time Zone

      Select the time zone for the time field. This parameter is not required if the time format is an epoch type.

      To account for daylight saving time when parsing the log time, select a UTC-based format. Otherwise, select a GMT-based format.

      Advanced Settings

      OSS Metadata Indexing

      If the number of files in your OSS bucket exceeds one million, we strongly recommend that you enable this option. Otherwise, file discovery is inefficient. With OSS metadata indexing, new files in the bucket can be discovered within seconds, enabling near-real-time data import.

      To use OSS metadata indexing, you must first enable the metadata management feature for the bucket in the OSS console. For more information, see Scalar search.

      If you select CSV or Multi-line Text Logs as the Data Format, you must configure additional parameters as described in the following tables.

      CSV

      Parameter

      Description

      Delimiter

      The delimiter used to separate fields in the log. The default is a comma (,).

      Quote

      The quote character used for strings in the CSV file.

      Escape Character

      The escape character for logs. The default is a backslash (\).

      Maximum Lines

      The maximum number of lines that a single log entry can span. The default is 1.

      First Line as Field Name

      After you turn on the First Line as Field Name switch, the first line of the CSV file is used as the field names. For example, the first line in the following sample is extracted as the names of the log fields. Sample file preview:

      remote_addr,remote_user,time_local,request_time,request_length
      xxx,5,-,11/Dec/2020:15:31:06,0,000,133,3650,404,GET
      xxx,5,-,11/Dec/2020:15:32:06,0,000,133,3650,404,GET
      xxx,5,-,11/Dec/2020:15:34:10,0,000,133,3650,404,GET

      Custom Fields

      If you disable the First Line as Field Name option, you must specify a comma-separated list of custom field names.

      Lines to Skip

      The number of header lines to skip. For example, a value of 1 means that data collection starts from the second line of the CSV file.
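
The effect of the Delimiter, Quote, First Line as Field Name, and Custom Fields parameters can be sketched with Python's csv module. The sample content and the parse_csv helper are hypothetical. The sketch illustrates the parameter semantics described above and is not the service's parser.

```python
import csv
import io

# Hypothetical CSV content; the first line holds the field names.
raw = (
    "remote_addr,request,status\n"
    '10.0.0.1,"GET /a,b HTTP/1.1",200\n'
    '10.0.0.2,"GET /c HTTP/1.1",404\n'
)


def parse_csv(text, first_line_as_field_name=True, custom_fields=None,
              delimiter=",", quote='"'):
    """Turn CSV text into a list of dicts, one per log entry."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    rows = list(reader)
    if first_line_as_field_name:
        # The first line supplies the field names and is not a log entry.
        fields, rows = rows[0], rows[1:]
    elif custom_fields:
        fields = custom_fields
    else:
        raise ValueError("custom_fields is required without a header line")
    return [dict(zip(fields, row)) for row in rows]


logs = parse_csv(raw)
print(logs[0]["request"])  # GET /a,b HTTP/1.1
print(len(logs))           # 2
```

The quoted request field keeps its embedded comma because the quote character takes precedence over the delimiter.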

      Multi-line text log

      Parameter

      Description

      Position to Match Regular Expression

      Specifies where to match the regular expression:

      • Regular Expression to Match First Line: Uses a regular expression to match the first line of a log entry. Subsequent lines that do not match are appended to the current log entry, up to the maximum configured number of lines.

      • Regular Expression to Match Last Line: Uses a regular expression to identify the end of a log entry. A log entry includes all lines up to and including the line that matches this expression.

      Regular Expression

      Enter a regular expression based on your log content.

      For more information, see How to test a regular expression.

      Maximum Lines

      The maximum number of lines that a single log entry can span.
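
The first-line matching mode can be sketched as follows. The log lines, the regular expression, and the function name are hypothetical. The sketch also assumes that a full entry is closed when it reaches the maximum number of lines, because the exact overflow behavior is not described here.

```python
import re


def group_by_first_line(lines, first_line_regex, max_lines):
    """Group lines into log entries using a first-line regular expression.

    A line that matches starts a new entry; other lines are appended to the
    current entry. Assumption: when an entry already holds max_lines lines,
    the next line starts a new entry.
    """
    pattern = re.compile(first_line_regex)
    entries, current = [], []
    for line in lines:
        starts_new = pattern.match(line) is not None
        if current and (starts_new or len(current) >= max_lines):
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


# Hypothetical Java-style log with a stack trace.
lines = [
    "2024-01-01 10:00:00 ERROR request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.App.main(App.java:10)",
    "2024-01-01 10:00:01 INFO recovered",
]
entries = group_by_first_line(lines, r"\d{4}-\d{2}-\d{2} ", max_lines=10)
print(len(entries))  # 2
```

The three lines of the stack trace end up in one log entry, and the next timestamped line starts a second entry.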

    2. Click Preview to preview the import results.

    3. Once you confirm the settings, click Next.

  5. Create indexes and preview data, and then click Next. By default, full-text indexing is enabled in Simple Log Service. You can also create field indexes manually for the collected logs, or click Automatic Index Generation to have Simple Log Service generate field indexes. For more information, see Create indexes.

    Important

    If you want to query all fields in logs, we recommend that you use full-text indexes. If you want to query only specific fields, we recommend that you use field indexes. This helps reduce index traffic. If you want to analyze fields, you must create field indexes. You must include a SELECT statement in your query statement for analysis.

  6. Click Query Log to open the query and analysis page and check if the OSS data was imported successfully.

    Wait for about one minute. If the target OSS data appears, the import was successful.

Related operations

After creating a data import configuration, you can view the configuration and its statistical reports in the console.

  1. In the Project list, click the desired Project.

  2. In the left-side navigation pane, choose Log Storage > Logstore. On the page that appears, find the desired Logstore, choose Data Collection > Data Import, and click the configuration name.

  3. Overview

    On the Import Configuration Overview page, you can view the configuration's basic information and statistical reports.

    The Import Configuration Overview page contains the following information:

      • Basic Information: displays parameters such as Configuration Name, Status, Bucket, Compression Format, Encoding Format, File Path Regex Filter, Import Archive Files, OSS Region, New File Check Interval, Data Format, and Use System Time.

      • Statistical Reports (Data Processing Insight): shows six metrics (Successful Reads, Failed Reads, Public Read Traffic, Successful Writes, Failed Writes, and Public Write Traffic), as well as a Processing Rate Line Chart and a Progress Lag Chart.

      • Running Status: a table at the bottom that lists information for each task, such as Time, Type, Instance, Successful Items, Failed Items, and Average Response Time.

      • Action buttons: Edit Configurations, Stop, and Delete Configuration, in the top-right corner.

    Modify

    Click Edit Configurations to modify the data import configuration. For more information, see Create a data import configuration.

    Delete

    Click Delete Configuration to remove the data import configuration.

    Warning

    This operation is irreversible. Proceed with caution.

    Stop

    To stop the data import task, click Stop.

    Start

    Click Start to begin the data import task.

FAQ

Problem

Possible cause

Solution

When I preview files from an HDFS directory in my bucket, no data is displayed.

Importing files directly from an HDFS directory is not supported.

If the HDFS service is enabled for the bucket, a .dlsdata directory is created by default in the OSS path. You can import files from the .dlsdata directory.

No data is displayed during preview.

The OSS bucket contains no files, the files are empty, or no files match the filter conditions.

  • Confirm the bucket contains non-empty files. For CSV files, ensure they contain more than a header row. If not, wait for data to become available before starting the import.

  • Adjust configuration items such as the File Path Prefix Filter, File Path Regex Filter, and File Modification Time Filter.

The imported data contains garbled characters.

The data format, compression format, or encoding format is incorrect.

Confirm the actual format of the OSS file, and then adjust settings such as Data Format, Compression Format, or Encoding Format.

To fix existing garbled data, create a new Logstore and a new data import configuration.

The timestamps in Simple Log Service do not match the timestamps in the source data.

The data import configuration is missing the log time field, or the time format or time zone is incorrect.

Specify the log time field and configure the correct time format and time zone. For more information, see Create a data import configuration.

After data is imported, I cannot query or analyze it.

  • The data is outside the query time range.

  • An index is not configured.

  • The index has not taken effect.

  • Check whether the data you want to query is within the specified time range. If not, adjust the time range and query again.

  • Check whether an index has been set for the Logstore.

    If an index is not set, set one first. For more information, see Create index or Rebuild index.

  • If you have set an index and the amount of successfully imported data shown in the Data Processing Insight dashboard is as expected, the index may not have taken effect. Try to reindex data. For more information, see Reindex data.

The number of imported entries is less than expected.

Some files contain single lines of data that exceed the 3 MB limit and are dropped during the import process. For more information, see Collection limits.

When you write data to OSS files, ensure that no single line of data exceeds 3 MB.

When I create a data import configuration, I cannot select an OSS bucket.

Simple Log Service has not been authorized to assume the AliyunLogImportOSSRole role.

Complete the authorization as described in the prerequisites.

Some files are not imported.

The filter conditions are configured incorrectly, or some files exceed the 5 GB size limit. For more information, see Collection limits.

  • Check if the files to be imported match the filter conditions. If not, modify the filter conditions.

  • Ensure that each file to be imported does not exceed 5 GB. If a file exceeds this limit, reduce its size.

Archive files are not imported.

The Import Archive Files switch is turned off. For more information, see Collection limits.

  • Method 1: Modify the data import configuration and turn on the Import Archive Files switch.

  • Method 2: Create a new data import configuration and turn on the Import Archive Files switch.

Multi-line text logs are parsed incorrectly.

The first-line regular expression or last-line regular expression is configured incorrectly.

Correct the first-line or last-line regular expression so that it matches your log content.

Import latency for new files is high.

There are too many existing files that match the file path prefix filter, and the OSS Metadata Indexing switch is turned off in the data import configuration.

If a large number of files (more than one million) match the file path prefix filter, you must turn on the OSS Metadata Indexing switch in the data import configuration. Otherwise, new file discovery becomes very slow.

An STS-related permission error occurs during creation.

The RAM user has insufficient permissions.

  1. Verify that the RAM user's AccessKey is valid and enabled. For more information, see AccessKey pair.

  2. Check if the RAM user's temporary access credential (STS token) has expired. If so, extend its validity period. For more information, see Access OSS by using STS temporary credentials.

  3. Verify that you have completed the Cloud Resource Access Authorization, which grants Simple Log Service permission to access your OSS resources by assuming the AliyunLogImportOSSRole role.
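
For the 3 MB single-line limit mentioned in the FAQ above, a local check before uploading files to OSS can help locate lines that would be dropped. The following is a minimal sketch. The function name is hypothetical, and it assumes that 3 MB means 3 × 1024 × 1024 bytes and that the line terminator does not count toward the limit.

```python
import io

MAX_LINE_BYTES = 3 * 1024 * 1024  # assumes "3 MB" means 3 MiB


def oversized_lines(stream, limit=MAX_LINE_BYTES):
    """Yield (line_number, size_in_bytes) for lines longer than the limit.

    `stream` is a binary file-like object; the line terminator is not counted.
    """
    for number, line in enumerate(stream, start=1):
        size = len(line.rstrip(b"\r\n"))
        if size > limit:
            yield number, size


# Small in-memory demonstration with a 10-byte limit.
sample = io.BytesIO(b"short\n" + b"x" * 25 + b"\nok\n")
print(list(oversized_lines(sample, limit=10)))  # [(2, 25)]
```

For a real file, pass a handle opened in binary mode, for example open(path, "rb"), and keep the default limit.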

Error handling

Error

Description

Object read failure

If a data import job cannot completely read an object, for example, due to a network exception or object damage, it automatically retries. If the read operation still fails after three attempts, the job skips the object.

The retry interval is the same as the New File Check Cycle. If New File Check Cycle is set to Never Check, the job retries after 5 minutes.

Compression format parsing error

If a data import job encounters an object with an invalid compression format, it skips the object.

Data format parsing error

  • If the data import job fails to parse an object in a binary format, such as ORC or Parquet, it skips the object.

  • If parsing fails for data in other formats, the data import job stores the raw text content in the content field of the log.

OSS bucket not found

The data import job retries periodically. The job automatically resumes importing after the OSS bucket is recreated.

Permission error

If a permission error occurs while reading from an OSS bucket or writing to a Logstore, the data import job retries periodically. The job resumes automatically once the permission issue is resolved.

The data import job does not skip any objects during a permission error. Therefore, after the permissions are corrected, the job processes all unprocessed objects in the OSS bucket, importing their data into the Logstore.
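
The fallback described above for data format parsing errors in text formats can be sketched as follows, using single-line JSON as the example. The function name is hypothetical. Treating valid JSON that is not an object as a parsing failure is an assumption of this sketch.

```python
import json


def parse_line_with_fallback(line: str) -> dict:
    """Parse a line as a JSON object; on failure keep the raw text.

    Mirrors the behavior described above for text formats: unparseable data
    is stored in the `content` field instead of being dropped.
    """
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return {"content": line}
    if not isinstance(parsed, dict):
        # Assumption: non-object JSON is also kept as raw content.
        return {"content": line}
    return parsed


print(parse_line_with_fallback('{"status": 200}'))  # {'status': 200}
print(parse_line_with_fallback("not json at all"))  # {'content': 'not json at all'}
```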

OSS ingestion API

| Action | API |
| --- | --- |
| Create an OSS data import job | CreateOSSIngestion |
| Update an OSS data import job | UpdateOSSIngestion |
| Get an OSS data import job | GetOSSIngestion |
| Delete an OSS data import job | DeleteOSSIngestion |
| Start an OSS data import job | StartOSSIngestion |
| Stop an OSS data import job | StopOSSIngestion |