This topic describes how to migrate JSON data from OSS to MaxCompute by using DataWorks Data Integration, and then parse the data using the MaxCompute string function GET_JSON_OBJECT.
Prerequisites
-
Create a workflow in a DataWorks workspace. This example uses a workspace in basic mode. For more information, see Create a workflow.
-
Rename a JSON file with the
.txtextension and upload it to an OSS bucket. In this example, the OSS bucket is in the China (Shanghai) region. The sample file is as follows.{ "store": { "book": [ { "category": "reference", "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95 }, { "category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99 }, { "category": "fiction", "author": "J. R. R. Tolkien", "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99 } ], "bicycle": { "color": "red", "price": 19.95 } }, "expensive": 10 }
Migrate JSON data from OSS to MaxCompute
-
Add an OSS data source. For more information, see Add an OSS data source.
-
Create a table in DataWorks to store the migrated JSON data.
-
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
-
Click Data Source to go to the Data Source page. Click New data source to add a MaxCompute project.
-
Click Add data source and bind it to the data development environment.
-
On the Data Development page, hover over the
icon and choose Create Table > Table. -
In the Create Table dialog box, select a Path, enter a Name, and then click Create .
NoteIf multiple engine instances are bound, select a MaxCompute engine instance.
On the table editing page, click DDL Statement.
-
In the DDL Statement dialog box, enter the following statement and click Generate Table Schema.
create table mqdata (mq_data string); -
In the Confirm dialog box, click Confirm.
-
After the table schema is generated, in the General section, enter a Display Name for the table, and click Commit to Development Environment and Commit to Production Environment.
NoteIf you are using a workspace in basic mode, click only Commit to Production Environment.
-
-
Create an offline synchronization node.
Go to the data analytics page. Right-click the specified workflow and choose .
In Create Node dialog box, enter Name, and click Confirm.
In the top navigation bar, choose
icon.In script mode, click
icon.In import Template dialog box SOURCE type, data source, target type and data source, and click confirm.
-
Modify the JSON code and click the
icon.Example code:
{ "type": "job", "steps": [ { "stepType": "oss", "parameter": { "fieldDelimiterOrigin": "^", "nullFormat": "", "compress": "", "datasource": "OSS_userlog", "column": [ { "name": 0, "type": "string", "index": 0 } ], "skipHeader": "false", "encoding": "UTF-8", "fieldDelimiter": "^", "fileFormat": "binary", "object": [ "applog.txt" ] }, "name": "Reader", "category": "reader" }, { "stepType": "odps", "parameter": { "partition": "", "isCompress": false, "truncate": true, "datasource": "odps_first", "column": [ "mq_data" ], "emptyAsNull": false, "table": "mqdata" }, "name": "Writer", "category": "writer" } ], "version": "2.0", "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] }, "setting": { "errorLimit": { "record": "" }, "speed": { "concurrent": 2, "throttle": false } } }
Verify the result
Create an ODPS SQL node.
Right-click the workflow and choose .
In create a function dialog box, enter function name, click submit.
-
In the ODPS SQL node editor, enter the following statements.
-- Query data in the mq_data table. SELECT * from mqdata; -- Retrieve the value of the 'expensive' key from the JSON data. SELECT GET_JSON_OBJECT(mqdata.mq_data,'$.expensive') FROM mqdata; Click
icon to run the code.You can operation Log view the results.