The incremental pulling mode lets Simple Log Service efficiently pull only new or updated data. This topic describes how to use the res_rds_mysql function to incrementally pull data from an ApsaraDB RDS for MySQL database.
Video tutorial
Prerequisites
-
Simple Log Service
-
Collect data into the source Logstore. For more information, see Data collection.
-
Create a destination Logstore. For more information, see Create a Logstore.
-
If you use a RAM user, you must grant the RAM user permissions to perform data transformation. For more information, see Grant a RAM user permissions to perform data transformation.
-
Configure indexes for the source Logstore and the destination Logstore. For more information, see Create indexes.
A data transformation job does not depend on an index. However, you cannot query and analyze data if no index is configured.
-
-
ApsaraDB RDS
-
Create an ApsaraDB RDS for MySQL database and an account. For more information, see (Deprecated. Redirects to "Step 1") Create a database and account.
-
Upload data to the ApsaraDB RDS for MySQL table.
-
Configure a whitelist for the ApsaraDB RDS for MySQL instance. For more information, see (Deprecated. Redirects to "Step 2") Connect to an ApsaraDB RDS for MySQL instance by using a client or the CLI.
ImportantTo extract data from an ApsaraDB RDS for MySQL database by using the
res_rds_mysqlfunction, add 0.0.0.0 to the whitelist of the instance.
-
Background
A technology company stores customer information in an ApsaraDB RDS for MySQL database and customer service records in a Simple Log Service Logstore. Both data sources are continuously updated. The company wants to join the two data sources and save the results to a new Logstore.
Simple Log Service provides the res_rds_mysql function to pull data from an ApsaraDB RDS for MySQL database and save it to a destination Logstore. In incremental pulling mode, a data transformation job pulls data based on a timestamp field in the database. This method ensures high efficiency by processing only new or updated data. This mode is ideal for scenarios that involve large data volumes, frequent updates, few deletions, and strict real-time data processing requirements.
For more information about other pulling modes and incremental pulling, see res_rds_mysql.
Resources and data samples
-
Simple Log Service resources
-
Project: client-project
-
Source Logstore: client-log
The sample data includes the following fields:
__tag__:__client_ip__(client IP address),c_id(customer ID),staff_id(staff ID),status(follow-up status, such as "Ongoing follow-up" and "Follow-up"), andtag(customer tag, such as "Second follow-up visit" and "High-intent customer"). For example, one record has the valuesc_id:1,staff_id:002,status:Ongoing follow-up, andtag:Second follow-up visit. Another record has the valuesc_id:1,staff_id:001,status:Follow-up, andtag:High-intent customer. -
Destination Logstore: client-information
-
-
ApsaraDB RDS resources
-
Database: client-db
-
Database table: client
The
clienttable contains five fields:c_id,name,telephone,update_time, anddelete_flag. Three sample records are:c_id=1,name=maki,telephone=010-123;c_id=2,name=evan,telephone=010-156; andc_id=3,name=vicky,telephone=010-166. Theupdate_timefor all records is 1606358931, anddelete_flagis false. -
Database username and password: test/test1234@@
-
Public endpoint of the database: rm-bp1k****tp8o.mysql.rds.aliyuncs.com
-
Procedure
Log on to the Simple Log Service console.
-
Go to the data transformation page.
-
In the Projects section, click
client-project. -
On the tab, click the source Logstore
client-log. -
On the query and analysis page, click Data Transformation.
-
-
In the upper-right corner of the page, select a time range.
Verify that log data exists on the Raw Logs tab.
-
In the editor, enter the data transformation statement.
For more information about the parameters, see res_rds_mysql.
e_table_map( res_rds_mysql( "rm-bp1k****tp8o.mysql.rds.aliyuncs.com", "test", "test1234@@", "client-db", table="client", fields=["c_id", "name", "telephone", "update_time"], refresh_interval=1, primary_keys="c_id", update_time_key="update_time", deleted_flag_key=None, ), "c_id", ["name", "telephone"], ) -
Preview the data in quick mode.
Use the quick preview feature to verify that the data transformation statement is correct. For more information, see Quick preview.
-
Click Quick.
-
On the tab, enter the following content.
{ "__source__": "192.0.2.0", "__time__": 1624956516, "__topic__": "log", "__tag__:__client_ip__":"192.0.2.2", "c_id": "1", "staff_id": "002", "status": "Ongoing follow-up", "tag": "Second follow-up visit", } -
On the tab, enter the following content.
c_id,name,telephone,update_time,delete_flag 1,maki,010-123,1606358931,false -
Click Preview Data.
View the preview results. The left side of the page displays the transformed data, with the output destination as target0. The transformed fields include customer information fields such as
__source__,__tag__,__topic__,c_id,name,staff_id,status,tag, andtelephone. On the right, the Run Result Summary shows Total: 1, Success: 1, and Failed: 0. This indicates that the data transformation rule ran correctly.
-
-
Preview the data in advanced mode.
Use the advanced preview feature to ensure that Simple Log Service can connect to the ApsaraDB RDS for MySQL database. For more information, see Advanced preview.
-
Click Advanced.
-
Click Preview Data.
-
In the Add Preview Settings panel, configure the authorization method and click OK.
The authorization methods include default role, custom role, and access key. If you select default role, the role ARN defaults to
acs:ram::{AccountID}:role/aliyunlogetlrole. -
View the preview results.
The Run Result Summary panel on the right displays four statistics: Total Raw Logs, Dropped Count, Dispatched Count, and Failed. Confirm that the dispatched count matches the total raw logs and the failed count is 0. This indicates that the transformation rule ran correctly. The log output list on the left can be expanded to view details of the transformed sample log, which includes key-value pairs such as
__source__,__topic__,name, andstatus.If a runtime error occurs, see Troubleshoot ApsaraDB RDS for MySQL data retrieval errors.
-
-
Create a data transformation job.
-
Click Save as Transformation Job.
-
In the Create Data Transformation Job panel, configure the parameters and click OK.
For more information about the parameters, see Create a data transformation job. The configuration items include Rule Name (for example,
rds-sls), Authorization Method (options are default role, custom role, or access key), and Role ARN. In the Storage Destination section, configure Destination Name (for example,rds), Destination Region (for example, China (Hangzhou)), Destination Project (for example,client-project), Destination Logstore (for example,client-information), and the corresponding Authorization Method and Role ARN. To deliver data to multiple destinations, click Add.After you create a data transformation job, you can view the transformed logs in the destination Logstore. After the job runs successfully, the destination Logstore contains transformed log records, including structured fields and their values, such as
c_id,name,staff_id,status,tag, andtelephone.
-
FAQ
How do I use the delete feature in incremental pulling mode?
In incremental pulling mode, Simple Log Service pulls only new or updated data based on the primary key and timestamp fields of the database table. If you mark a record for deletion in the database table (for example, by setting delete_flag=true), Simple Log Service still pulls that record for processing. To handle this scenario, the res_rds_mysql function includes the deleted_flag_key parameter. After you configure the deleted_flag_key parameter, the data transformation job fetches updates from the database and then removes rows from its in-memory dimension table where the delete_flag is true. This operation does not affect the database table. The job no longer includes these deleted rows in subsequent JOIN operations with log data.
-
If delete_flag is set to a value such as true or 1, the row is marked for deletion. For more information, see res_rds_mysql.
-
If you configure the deleted_flag_key parameter, you must also configure the update_time_key parameter.
For example, two new rows (name=mia and name=tom) are added to the ApsaraDB RDS for MySQL table. The row where name=mia is marked as deleted by setting delete_flag to true. When the in-memory dimension table of Simple Log Service is updated, the row for name=mia is deleted from it and is not processed.
The following code provides an example of the data transformation statement:
e_table_map(
res_rds_mysql(
"rm-bp1****l3tp.mysql.rds.aliyuncs.com",
"test",
"test1234@@",
"client-db",
table="client",
fields=["c_id", "name", "telephone", "update_time"],
refresh_interval=1,
primary_keys="c_id",
update_time_key="update_time",
deleted_flag_key="delete_flag",
),
"c_id",
["name", "telephone"],
)