You can transform historical data in Simple Log Service to reprocess logs that were collected before a data transformation job was created.
Prerequisites
- Data has been collected in Simple Log Service. For more information, see Data collection.
- If you use a RAM user, you must first grant the RAM user permissions to perform data transformation. For more information, see Authorize a RAM user to perform data transformation.
Create a data transformation job
1. Log on to the Simple Log Service console.
2. Go to the Data Transformation page.
   a. In the Projects section, click the project you want.
   b. On the tab, click the Logstore you want.
   c. On the query and analysis page, click Data Transformation.
3. In the upper-right corner of the page, select a time range.
   After you select a time range, verify that logs appear on the Raw Logs tab.
4. In the editor, enter a data transformation statement.
   For more information about the syntax, see Data transformation syntax.
5. Preview the data.
   a. Click Quick.
      Quick and Advanced preview modes are available. For more information, see Preview and debug data.
   b. Click Preview Data.
   c. View the preview results.
      - If the data transformation fails due to an invalid statement or permission errors, resolve the issue based on the on-screen instructions.
      - If the transformation results are as expected, proceed to step 6.
6. Click Save Data Transformation (Legacy) to create a data transformation job.
   For more information, see Create a data transformation job. You must configure the Time Range for Data Transformation based on the time range of the data you want to transform.
| Option | Description |
| --- | --- |
| All | The job transforms data starting from the first log entry that the source Logstore received and runs until you stop it manually. |
| From Specific Time | The job transforms data starting from a specified time and runs until you stop it manually. |
| Specific Time Range | The job transforms data within a specified time range. The job stops automatically at the end time. |
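
The three options differ only in which ends of the processing window are bounded. The following Python sketch models that difference. The names `processing_window` and `stops_automatically`, and the use of `None` for an open bound, are illustrative assumptions and not part of the Simple Log Service console or API.

```python
from datetime import datetime
from typing import Optional, Tuple

# (start, end); None means that side of the window is unbounded.
Window = Tuple[Optional[datetime], Optional[datetime]]


def processing_window(option: str,
                      start: Optional[datetime] = None,
                      end: Optional[datetime] = None) -> Window:
    """Map a Time Range option to a (start, end) processing window."""
    if option == "All":
        # From the first log entry until the job is stopped manually.
        return (None, None)
    if option == "From Specific Time":
        if start is None:
            raise ValueError("From Specific Time requires a start time")
        return (start, None)
    if option == "Specific Time Range":
        if start is None or end is None:
            raise ValueError("Specific Time Range requires start and end times")
        if end <= start:
            raise ValueError("end time must be later than start time")
        return (start, end)
    raise ValueError(f"unknown option: {option}")


def stops_automatically(window: Window) -> bool:
    """Only a window with an end time lets the job stop on its own."""
    return window[1] is not None
```

In this model, only Specific Time Range produces a window for which `stops_automatically` returns `True`, which matches the descriptions above.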
View transformation results
View the transformation results in the target Logstore. If no data appears, try the following solutions.
Expand query time range
If you do not set the __time__ field in your data transformation statement, each log entry retains its original timestamp in the target Logstore. Because the default query time range is the last 15 minutes, historical transformation results may not appear. Expand the query time range to find them.
For example, a historical log entry has a timestamp of 2023-04-11 10:00:00, and the transformation runs at 2023-04-12 09:00:00. After the data is written to the target Logstore, its timestamp remains 2023-04-11 10:00:00. If you open the target Logstore at 2023-04-12 09:01:00, the log entry will not appear because the default query time range is 15 Minutes (Relative). On the query and analysis page, click the time range selector in the upper-right corner and set the time range to 1 Day (Relative) or a larger range to find the transformation results for historical data.
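
The arithmetic in this example can be checked with a short Python sketch. It treats a relative time range as the interval from the query time minus the range up to the query time, which is a simplified model of the console behavior. The helper name `visible_in_window` is hypothetical.

```python
from datetime import datetime, timedelta


def visible_in_window(log_time: datetime,
                      query_time: datetime,
                      window: timedelta) -> bool:
    """Return True if log_time falls within [query_time - window, query_time]."""
    return query_time - window <= log_time <= query_time


# Values taken from the example above.
log_time = datetime(2023, 4, 11, 10, 0, 0)    # original log timestamp
query_time = datetime(2023, 4, 12, 9, 1, 0)   # when the target Logstore is opened

in_15_min = visible_in_window(log_time, query_time, timedelta(minutes=15))
in_1_day = visible_in_window(log_time, query_time, timedelta(days=1))

print(in_15_min)  # False: the entry is older than 2023-04-12 08:46:00
print(in_1_day)   # True: the entry is newer than 2023-04-11 09:01:00
```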
Create index
If you cannot view the transformed data in the target Logstore, you may need to create an index. An index maps keywords to data locations so that logs can be queried and analyzed. Without an index, queries return an IndexConfigNotExist error. After you enable indexing, wait for about one minute before you query the latest data. For more information, see Create an index.
Reindex data
An index applies only to data written after the index is created. If the data transformation job wrote data to the target Logstore before the index existed, you must reindex the data to make it queryable. For more information, see Reindex data.
Improve transformation efficiency
A single data transformation job may be too slow for large volumes of historical data due to throughput limitations. To improve efficiency, split the historical data across multiple parallel jobs and create a separate job for real-time data.
For example, on January 16, 2023, you need to create a data transformation job to transform all data written to the source Logstore since January 1, 2023, 00:00:00. You can split the historical data from January 1, 2023, 00:00:00 to January 15, 2023, 23:59:59 into three chunks and create a separate job for each. Then, create a fourth job to transform real-time data written after January 16, 2023, 00:00:00, as shown in the following figure.
Job 1
Create Job 1 to transform historical data from January 1, 2023, 00:00:00 to January 5, 2023, 23:59:59.
In the Processing Range section, set Time Range to Specific Time Range, and then set the Processing Start Time and Processing End Time.
Job 2
Create Job 2 to transform historical data from January 6, 2023, 00:00:00 to January 10, 2023, 23:59:59.
In the Processing Range section, set Time Range to Specific Time Range, and then set the Processing Start Time and Processing End Time.
Job 3
Create Job 3 to transform historical data from January 11, 2023, 00:00:00 to January 15, 2023, 23:59:59.
In the Processing Range section, set Time Range to Specific Time Range, and then set the Processing Start Time and Processing End Time.
Job 4
Create Job 4 to transform data written in real time after January 16, 2023, 00:00:00.
In the Processing Range section, set Time Range to From Specific Time, and then set the Processing Start Time.
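
The time ranges for the four jobs can also be computed rather than worked out by hand. The following Python sketch splits the historical period into equal chunks and appends one open-ended job for real-time data. The function name `plan_jobs` is hypothetical, and the sketch assumes end times are inclusive to the second, as in the example above. It only computes the ranges; you still enter them in the Processing Range section of each job.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (start, end); end is None for a job that runs until it is stopped manually.
Range = Tuple[datetime, Optional[datetime]]


def plan_jobs(history_start: datetime,
              realtime_start: datetime,
              chunks: int) -> List[Range]:
    """Split [history_start, realtime_start) into equal chunks plus one real-time job."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    if realtime_start <= history_start:
        raise ValueError("realtime_start must be later than history_start")

    step = (realtime_start - history_start) / chunks
    one_second = timedelta(seconds=1)

    jobs: List[Range] = []
    for i in range(chunks):
        start = history_start + step * i
        # End one second before the next chunk starts so ranges do not overlap.
        end = history_start + step * (i + 1) - one_second
        jobs.append((start, end))

    # Final job: From Specific Time, with no end time.
    jobs.append((realtime_start, None))
    return jobs


jobs = plan_jobs(datetime(2023, 1, 1), datetime(2023, 1, 16), chunks=3)
for number, (start, end) in enumerate(jobs, start=1):
    print(f"Job {number}: {start} -> {end if end else 'until stopped'}")
```

With these inputs, the first three ranges match Job 1 through Job 3 above, and the last entry corresponds to Job 4.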