The MongoDB Writer plugin in DataWorks Data Integration synchronizes data from other data sources to MongoDB. This topic explains how to perform a batch synchronization from MaxCompute to MongoDB.
Prerequisites
Before you begin, ensure that you meet the following requirements:
- DataWorks is activated and you have added a MaxCompute data source.
- This tutorial uses an exclusive resource group for Data Integration to run the batch task. You must purchase and configure this resource group before proceeding. For more information, see Use an exclusive resource group for Data Integration.
  Note: You can also use a serverless resource group. For more information, see Use a serverless resource group.
Prepare sample data
Prepare a MongoDB collection and a MaxCompute table for batch synchronization.
- Create a MaxCompute table and add data to it.
  - Create a partitioned table named test_write_mongo. Set the partition field to pt.
      CREATE TABLE IF NOT EXISTS test_write_mongo(
          id STRING,
          col_string STRING,
          col_int int,
          col_bigint bigint,
          col_decimal decimal,
          col_date DATETIME,
          col_boolean boolean,
          col_array string
      )
      PARTITIONED BY (pt STRING)
      LIFECYCLE 10;
  - Add a partition with the value 20230215.
      insert into test_write_mongo partition (pt='20230215')
      values ('11','name11',1,111,1.22,cast('2023-02-15 15:01:01' as datetime),true,'1,2,3');
  - Verify that the partitioned table is created correctly.
      SELECT * FROM test_write_mongo WHERE pt='20230215';
- Prepare the destination MongoDB collection.
  This tutorial uses ApsaraDB for MongoDB as an example. Create a collection named test_write_mongo.
      db.createCollection('test_write_mongo')
Batch task configuration
Step 1: Add a MongoDB data source
Add a MongoDB data source and establish a network connection between it and the exclusive resource group for Data Integration. For more information, see Configure a MongoDB data source.
Step 2: Create and configure a batch sync node
In DataWorks DataStudio, create a batch sync node and configure its source and destination. This section describes only the key parameters. Leave other parameters at their default values. For detailed instructions, see Configure a task in the codeless UI.
- Configure the network connection for synchronization.
  Select the MongoDB data source, the MaxCompute data source, and the exclusive resource group for Data Integration. Then, test the connectivity.
- Configure the source and destination.
  Select the MaxCompute partitioned table and the MongoDB collection. The following table describes the key parameters.
  Parameter: write mode
  Description: Determines whether to overwrite existing data. This functionality involves two related settings, write mode and business primary key:
  - write mode:
    - If you select No, the plugin inserts each record as a new document. This is the default option.
    - If you select Yes, you must specify a business primary key. The plugin then overwrites records that have the same business primary key.
  - business primary key: Specifies the business primary key for each record, which is used for the overwrite operation. You can specify only one field. In MongoDB, this typically corresponds to the _id field.
  Note: If write mode is set to Yes and you specify a field other than _id as the business primary key, an error similar to the following may occur: After applying the update, the (immutable) field '_id' was found to have been altered to _id: "2". This error occurs because the incoming data contains a record whose _id does not match the _id of the existing document that is identified by the business primary key. For more information, see Error: After applying the update, the (immutable) field '_id' was found to have been altered to _id: "2".

  Parameter: Pre-Import Statement
  Description: Specifies a precondition (PreSQL) to run before the import task. The configuration must be in JSON format and include the type and json properties.
  - type: Required. Valid values are remove and drop (in lowercase).
  - json:
    - If type is set to remove, this parameter is required. Its value must be a standard MongoDB query. For more information, see Query Documents.
    - If type is set to drop, this parameter is not required.
- Configure field mapping.
  When the data source is MongoDB, Peer mapping is used by default. You can also click the icon to manually edit the source table fields. The following is an example of manual editing.
      {"name":"id","type":"string"}
      {"name":"col_string","type":"string"}
      {"name":"col_int","type":"long"}
      {"name":"col_bigint","type":"long"}
      {"name":"col_decimal","type":"double"}
      {"name":"col_date","type":"date"}
      {"name":"col_boolean","type":"bool"}
      {"name":"col_array","type":"array","splitter":","}
  After you manually edit the fields, the UI displays the mapping between source and destination fields.

  | Source field | Destination field |
  | --- | --- |
  | id | {"name":"id","type":"string"} |
  | col_string | {"name":"col_string","type":"string"} |
  | col_int | {"name":"col_int","type":"long"} |
  | col_bigint | {"name":"col_bigint","type":"long"} |
  | col_decimal | {"name":"col_decimal","type":"double"} |
  | col_date | {"name":"col_date","type":"date"} |
  | col_boolean | {"name":"col_boolean","type":"bool"} |
  | col_array | {"name":"col_array","type":"array","splitter":","} |
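The write mode and Pre-Import Statement rules from the parameter table can be sketched in Python. This is a minimal in-memory model, not the plugin's implementation. The names validate_pre_sql and write_record are hypothetical, the collection is a plain list of dictionaries, the json value is assumed to be passed as a string, and storing a record as a new document when no existing document matches the business primary key is also an assumption.

```python
from typing import Dict, List, Optional


def validate_pre_sql(pre_sql: Dict[str, str]) -> None:
    """Check the Pre-Import Statement rules: type is required, json only for remove."""
    kind = pre_sql.get("type")
    if kind not in ("remove", "drop"):
        raise ValueError("type must be 'remove' or 'drop' (lowercase)")
    if kind == "remove" and not pre_sql.get("json"):
        raise ValueError("json is required when type is 'remove'")


def write_record(collection: List[dict], record: dict,
                 replace: bool, business_key: Optional[str] = None) -> None:
    """Hypothetical model of write mode on an in-memory list of documents."""
    if not replace:
        # Write mode "No": every record becomes a new document.
        collection.append(dict(record))
        return
    if not business_key:
        raise ValueError("write mode 'Yes' requires a business primary key")
    for index, existing in enumerate(collection):
        if existing.get(business_key) == record.get(business_key):
            existing_id = existing.get("_id")
            # MongoDB rejects a replacement that would change the immutable _id.
            if "_id" in record and record["_id"] != existing_id:
                raise ValueError("the (immutable) field '_id' would be altered")
            # Overwrite the matched document but keep its original _id.
            replaced = dict(record)
            if existing_id is not None:
                replaced["_id"] = existing_id
            collection[index] = replaced
            return
    # Assumption: a record without a matching document is stored as a new one.
    collection.append(dict(record))


# Usage: overwrite the document whose business primary key "id" equals "11".
docs = [{"_id": "1", "id": "11", "col_string": "old"}]
validate_pre_sql({"type": "remove", "json": '{"id": "11"}'})
write_record(docs, {"id": "11", "col_string": "name11"},
             replace=True, business_key="id")
print(docs)
```

In this model, a record that carries _id "2" while the matched document has _id "1" raises an error, which mirrors the situation described in the note on the business primary key.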
Step 3: Commit and deploy the batch sync node
To run this task periodically in a standard mode workspace, you must commit and deploy the batch sync node to the production environment. For more information, see node deployment.
Step 4: Run the batch sync node and view results
After you configure the node, run it. When the task is complete, view the synchronized data in the MongoDB collection. The task creates a document record in the destination database with the following fields and values: _id is 63ecc513143e565c9b037623, col_array is [1, 2, 3], col_bigint is 111, col_boolean is true, col_date is 2023-02-15T07:01:01.000Z, col_decimal is 1.22, col_int is 1, col_string is name11, and id is 11.
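The col_date value in the result differs from the inserted value (2023-02-15 15:01:01) by eight hours because MongoDB stores dates in UTC. The following sketch reproduces the shift. It assumes the source DATETIME is interpreted as UTC+8, which this tutorial does not state explicitly, and to_mongo_utc is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the source value is interpreted as UTC+8.
SOURCE_TZ = timezone(timedelta(hours=8))


def to_mongo_utc(text: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' source value to an ISO 8601 UTC string."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=SOURCE_TZ)
    utc = local.astimezone(timezone.utc)
    # Format with millisecond precision, as in the displayed result.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


print(to_mongo_utc("2023-02-15 15:01:01"))  # 2023-02-15T07:01:01.000Z
```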
Appendix: Data type conversion
Supported values for the type parameter
The supported values for the type parameter are: INT, LONG, DOUBLE, STRING, BOOL, DATE, and ARRAY.
Array type
To write data as a MongoDB array, set the type parameter to ARRAY and configure the splitter property. For example:
- The source data is the string: a,b,c.
- In the task configuration, type is set to ARRAY and splitter is set to a comma (,).
- After the task runs, the resulting data in MongoDB is: ["a","b","c"].
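The array conversion above behaves like a plain string split. The following Python sketch illustrates it with a column definition in the same shape as the field mapping example. The helper name to_array is hypothetical, and elements are kept as strings, as in the example above.

```python
from typing import Dict, List


def to_array(value: str, column: Dict[str, str]) -> List[str]:
    """Split a source string into array elements using the column's splitter."""
    if column["type"].lower() != "array":
        raise ValueError("column type must be array")
    return value.split(column["splitter"])


column = {"name": "col_array", "type": "array", "splitter": ","}
print(to_array("a,b,c", column))  # ['a', 'b', 'c']
```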