This topic describes how to use the Data Integration feature of DataWorks to migrate JSON fields from MongoDB to MaxCompute.
Prerequisites
Add a MaxCompute data source. For more information, see Bind a MaxCompute compute engine.
Create a workflow in DataWorks. This example uses a workflow in basic mode. For more information, see Create a workflow.
Prepare test data in MongoDB
Prepare a database account.
Create a user in the database. DataWorks uses this account to add the data source. In this example, run the following command.
db.createUser({user:"bookuser",pwd:"123456",roles:["user1"]})
This command creates a user named bookuser with the password 123456 and grants the user a role with data access permissions.
Prepare the data.
Upload your data to the MongoDB database. This example uses an ApsaraDB for MongoDB instance that runs in a virtual private cloud (VPC). You must apply for a public endpoint for the instance to communicate with the shared resource group for DataWorks. This example uses the following sample data.
{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      {
        "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}
In the Data Management Service (DMS) console for MongoDB, run the following command to view the uploaded data. This example uses the admin database and the userlog collection.
db.userlog.find().limit(10)
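The upload and preview steps above can also be sketched from a client script. The following is a minimal sketch, assuming the third-party pymongo package is installed; the helper name upload_and_preview, the main function, and the connection string placeholders are hypothetical and not part of the original procedure.

```python
import json

# Sample document from this topic, kept as raw JSON text.
SAMPLE_JSON = """
{
  "store": {
    "book": [
      {"category": "reference", "author": "Nigel Rees",
       "title": "Sayings of the Century", "price": 8.95},
      {"category": "fiction", "author": "Evelyn Waugh",
       "title": "Sword of Honour", "price": 12.99},
      {"category": "fiction", "author": "J. R. R. Tolkien",
       "title": "The Lord of the Rings", "isbn": "0-395-19395-8",
       "price": 22.99}
    ],
    "bicycle": {"color": "red", "price": 19.95}
  },
  "expensive": 10
}
"""


def upload_and_preview(collection, raw_json: str, limit: int = 10) -> list:
    """Insert one JSON document, then return up to `limit` documents.

    `collection` is expected to behave like a pymongo Collection
    (it must provide insert_one() and find().limit()).
    """
    document = json.loads(raw_json)
    collection.insert_one(document)
    # Mirrors db.userlog.find().limit(10) from the shell.
    return list(collection.find().limit(limit))


def main() -> None:
    # Imported here so the helper above can be used without pymongo installed.
    from pymongo import MongoClient

    # Hypothetical connection string: replace <host> and <port> with the
    # public endpoint of your instance before running.
    client = MongoClient("mongodb://bookuser:123456@<host>:<port>/admin")
    for doc in upload_and_preview(client["admin"]["userlog"], SAMPLE_JSON):
        print(doc)
```

Call main() only after you replace the placeholders. The helper takes the collection as a parameter so that it can be tried against a stand-in object without a live database.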
Migrate JSON data with DataWorks
Log on to the DataWorks console.
Create a destination table in DataWorks to store the migrated data.
Right-click the workflow that you created and choose Create Table > Table.
In the Create Table dialog box, select the engine type and enter a table name.
On the table editing page, click DDL Statement.
In the DDL Statement dialog box, enter the table creation statement and click Generate Table Schema.
Important: The table name in the CREATE TABLE statement must match the Table Name you entered in the Create Table dialog box.
create table mqdata (mqdata string);
Click Commit to Production Environment.
Add a MongoDB data source. For more information, see Configure a MongoDB data source.
Create a batch synchronization node.
Go to the data analytics page. Right-click the specified workflow and choose .
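In the Create Node dialog box, enter a name and click Confirm.
In the top navigation bar, click the icon to switch to script mode.
In script mode, click the icon to import a template.
In the Import Template dialog box, select the source type, the source data source, the destination type, and the destination data source, and then click Confirm.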
Enter the following script.
{
  "type": "job",
  "steps": [
    {
      "stepType": "mongodb",
      "parameter": {
        "datasource": "mongodb_userlog", // The name of the source MongoDB data source.
        "column": [
          {
            "name": "store.bicycle.color", // The path of the JSON field. In this example, the value of the color field is extracted.
            "type": "document.String" // For nested fields, specify the data type of the final extracted value. If you select a top-level field, such as expensive in this example, you can simply specify string.
          }
        ],
        "collectionName": "userlog" // The name of the collection.
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "odps",
      "parameter": {
        "partition": "",
        "isCompress": false,
        "truncate": true,
        "datasource": "odps_source", // The name of the destination MaxCompute data source.
        "column": [
          "mqdata" // The column name in the MaxCompute table.
        ],
        "emptyAsNull": false,
        "table": "mqdata"
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "version": "2.0",
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": ""
    },
    "speed": {
      "concurrent": 2,
      "throttle": false
    }
  }
}
Click the icon to run the code.
You can view the results in the operation log.
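The reader column in the script addresses a nested field by the dotted path store.bicycle.color. The following minimal Python sketch illustrates how such a path maps onto the sample document. The resolve_path function is a hypothetical helper written for illustration and is not how Data Integration is implemented. Returning None for a missing segment is likewise an assumption made for this sketch.

```python
from typing import Any

# Trimmed copy of the sample document from this topic.
SAMPLE_DOC = {
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees",
             "title": "Sayings of the Century", "price": 8.95},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    },
    "expensive": 10,
}


def resolve_path(document: dict, path: str) -> Any:
    """Walk a dotted path such as 'store.bicycle.color' through nested dicts.

    Returns None when a segment is missing (an illustrative choice only).
    """
    current: Any = document
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


# Nested field, as configured in the reader column of the script.
print(resolve_path(SAMPLE_DOC, "store.bicycle.color"))  # prints: red
# Top-level field, addressed by its name alone.
print(resolve_path(SAMPLE_DOC, "expensive"))  # prints: 10
```

Under the sample data, the nested path yields the single string red, which is the value that the script writes to the mqdata column.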
Verify the results
Right-click the workflow and choose .
In the Create Node dialog box, enter a node name and click Submit.
On the ODPS SQL node editing page, enter the following statement.
SELECT * from mqdata;
Click the icon to run the code.
You can view the results in the runtime log.
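The same check can be scripted outside the console. The following is a minimal sketch, assuming the third-party PyODPS package (imported as odps) is installed. The read_column helper, the main function, and all credential and endpoint placeholders are hypothetical and must be replaced with your own values.

```python
def read_column(client, table: str, column: str) -> list:
    """Query one column of a table and return its values as a list.

    `client` is expected to behave like a PyODPS ODPS object:
    execute_sql() returns an instance whose open_reader() yields records.
    """
    sql = f"SELECT {column} FROM {table}"
    with client.execute_sql(sql).open_reader() as reader:
        return [record[column] for record in reader]


def main() -> None:
    # Imported here so the helper above can be used without PyODPS installed.
    from odps import ODPS

    # Hypothetical placeholders: supply your own credentials, project,
    # and endpoint before running.
    client = ODPS(
        "<access-id>",
        "<secret-access-key>",
        "<project>",
        endpoint="<endpoint>",
    )
    print(read_column(client, "mqdata", "mqdata"))
```

If the synchronization node ran against the sample data in this topic, the returned list should contain the extracted color value.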