DataWorks provides Graph Database (GDB) Reader and GDB Writer for you to read data from and write data to GDB data sources. This topic describes the capabilities of synchronizing data from or to GDB data sources.
Limits
Batch data read | Batch data write |
|
|
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
For the entry points and detailed procedures for configuring a synchronization task, see the following configuration guides.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For descriptions of all parameters and sample scripts for configuring a batch synchronization task in the code editor, see Appendix: Code and parameters.
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
To configure a batch synchronization task by using the code editor, you must configure the related parameters in the script in the unified script format. For more information, see Use the Code Editor. The following sections describe the data source parameters that you must configure when you configure a batch synchronization task by using the code editor.
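The unified script format used by the demos in this topic has the same top-level keys in every case: `type`, `version`, `steps`, `setting`, and `order`. The following Python sketch assembles that skeleton. `build_job` is a hypothetical helper, not a DataWorks API. `json.dumps` emits strict JSON, so the explanatory `//` comments in the demos are not reproduced.

```python
import json
from typing import Any, Dict


def build_job(reader_step: Dict[str, Any], writer_step: Dict[str, Any],
              concurrent: int = 3, mbps: str = "12",
              error_limit: str = "100") -> Dict[str, Any]:
    """Assemble a batch synchronization script skeleton (hypothetical helper)."""
    return {
        "type": "job",
        "version": "2.0",
        "steps": [reader_step, writer_step],
        "setting": {
            # Maximum number of dirty data records to tolerate.
            "errorLimit": {"record": error_limit},
            "speed": {
                "concurrent": concurrent,
                # Throttling is enabled, so the mbps limit applies (1 mbps = 1 MB/s).
                "throttle": True,
                "mbps": mbps,
            },
        },
        # Data flows from the step named by the reader to the step named by the writer.
        "order": {"hops": [{"from": reader_step["name"], "to": writer_step["name"]}]},
    }


if __name__ == "__main__":
    reader = {"category": "reader", "name": "Reader", "stepType": "gdb", "parameter": {}}
    writer = {"category": "writer", "name": "Writer", "stepType": "stream",
              "parameter": {"print": True}}
    print(json.dumps(build_job(reader, writer), indent=2))
```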
Reader script demo
When you configure a data synchronization task to read data from Graph Database (GDB), you must configure vertices and edges separately:
Configure a synchronization task to read data about vertices from a GDB instance
{
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "100" // Maximum number of dirty data records to tolerate.
    },
    "jvmOption": "",
    "speed": {
      "concurrent": 3, // The job concurrency.
      "throttle": true, // If true, throttling is enabled. If false, throttling is disabled and the mbps parameter is ignored.
      "mbps": "12" // The throttling limit. 1 mbps is equal to 1 MB/s.
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "host": "gdb-xxxxxx.aliyuncs.com", // The endpoint of the GDB instance.
        "port": 8182, // The port of the GDB instance.
        "username": "gdb", // The username for accessing the GDB instance.
        "password": "gdb", // The password for the username.
        "labelType": "VERTEX", // The type of the label. VERTEX specifies a vertex.
        "labels": ["label1", "label2"], // A list of labels. If left empty, all vertices are exported.
        "column": [
          {
            "name": "id", // The field name.
            "type": "string", // The field type.
            "columnType": "primaryKey" // Field category. Specifies the vertex primary key, which must be of the STRING type in GDB.
          },
          {
            "name": "label", // The field name.
            "type": "string", // The field type.
            "columnType": "primaryLabel" // Field category. Specifies the vertex label, which must be of the STRING type in GDB.
          },
          {
            "name": "age", // The property name.
            "type": "int", // The property type.
            "columnType": "vertexProperty" // Field category. Specifies a basic property of the vertex in GDB.
          }
        ]
      },
      "stepType": "gdb"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "print": true
      },
      "stepType": "stream"
    }
  ],
  "type": "job",
  "version": "2.0"
}
Configure a synchronization task to read data about edges from a GDB instance
{
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "100" // Maximum number of dirty data records to tolerate.
    },
    "jvmOption": "",
    "speed": {
      "concurrent": 3, // The job concurrency.
      "throttle": true, // If true, throttling is enabled. If false, throttling is disabled and the mbps parameter is ignored.
      "mbps": "12" // The throttling limit. 1 mbps is equal to 1 MB/s.
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "host": "gdb-xxxxxx.aliyuncs.com", // The endpoint of the GDB instance.
        "port": 8182, // The port of the GDB instance.
        "username": "gdb", // The username for accessing the GDB instance.
        "password": "gdb", // The password for the username.
        "labelType": "EDGE", // The type of the label. EDGE specifies an edge.
        "labels": ["label1", "label2"], // A list of labels. If left empty, all edges are exported.
        "column": [
          {
            "name": "id", // The field name.
            "type": "string", // The field type.
            "columnType": "primaryKey" // Field category. Specifies the edge primary key, which must be of the STRING type in GDB.
          },
          {
            "name": "label", // The field name.
            "type": "string", // The field type.
            "columnType": "primaryLabel" // Field category. Specifies the edge label, which must be of the STRING type in GDB.
          },
          {
            "name": "srcId", // The field name.
            "type": "string", // The field type.
            "columnType": "srcPrimaryKey" // Field category. Specifies the start vertex primary key, which must be of the STRING type in GDB.
          },
          {
            "name": "srcLabel", // The field name.
            "type": "string", // The field type.
            "columnType": "srcPrimaryLabel" // Field category. Specifies the start vertex label, which must be of the STRING type in GDB.
          },
          {
            "name": "dstId", // The field name.
            "type": "string", // The field type.
            "columnType": "dstPrimaryKey" // Field category. Specifies the end vertex primary key, which must be of the STRING type in GDB.
          },
          {
            "name": "dstLabel", // The field name.
            "type": "string", // The field type.
            "columnType": "dstPrimaryLabel" // Field category. Specifies the end vertex label, which must be of the STRING type in GDB.
          },
          {
            "name": "weight", // The property name.
            "type": "double", // The property type.
            "columnType": "edgeProperty" // Field category. Specifies an edge property.
          }
        ]
      },
      "stepType": "gdb"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "print": true
      },
      "stepType": "stream"
    }
  ],
  "type": "job",
  "version": "2.0"
}
Reader script parameters
Parameter | Description | Required | Default value |
host | The connection endpoint of the GDB instance. In the GDB console, click Management for the target instance to view the Internal Endpoint (which is the host). | Yes | No default value |
port | The port number that is used to connect to the GDB instance. | Yes | 8182 |
username | The username that is used to connect to the GDB instance. | Yes | No default value |
password | The password that is used to connect to the GDB instance. | Yes | No default value |
labels | The label, which is the name of the vertex or edge. GDB Reader can read data from multiple vertices or edges at a time. In this case, the value of this parameter is an array, such as ["label1", "label2"]. | Yes | No default value |
labelType | The type of the label. Valid values: VERTEX (vertex) and EDGE (edge). | Yes | No default value |
column | The fields of the vertices or edges to be synchronized, such as primary keys, labels, and properties. | Yes | No default value |
column -> name | The name of the vertex or edge property to be synchronized. This parameter is required if vertex or edge properties are to be synchronized. | Yes | No default value |
column -> type | The data type for storing the vertex or edge property to be synchronized, such as string, int, or double as used in the demo scripts. | Yes | No default value |
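column -> columnType | The category of the vertex or edge field to be synchronized. The demo scripts use the following values: primaryKey (primary key), primaryLabel (label), srcPrimaryKey and srcPrimaryLabel (primary key and label of the start vertex), dstPrimaryKey and dstPrimaryLabel (primary key and label of the end vertex), vertexProperty (vertex property), and edgeProperty (edge property). | Yes | No default value |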
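The rules in the reader parameter table can be checked before a script is submitted. The following Python sketch runs those checks. `normalize_reader_params` is a hypothetical helper, and filling in `port` when it is omitted is an assumption based on the table's default value of 8182.

```python
from typing import Any, Dict

REQUIRED_READER_KEYS = ("host", "username", "password", "labels", "labelType", "column")


def normalize_reader_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Check GDB Reader parameters against the table above (hypothetical helper)."""
    missing = [key for key in REQUIRED_READER_KEYS if key not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    if params["labelType"] not in ("VERTEX", "EDGE"):
        raise ValueError("labelType must be VERTEX or EDGE")
    if not isinstance(params["labels"], list):
        raise ValueError('labels must be an array, such as ["label1", "label2"]')
    for column in params["column"]:
        # Each column entry needs name, type, and columnType.
        for field in ("name", "type", "columnType"):
            if field not in column:
                raise ValueError(f"column entry {column} is missing {field}")
    # Assumption: an omitted port falls back to the documented default, 8182.
    return {"port": 8182, **params}
```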
Writer script demo
Configure a synchronization task to write data about vertices to a GDB database
{
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "100" // Maximum number of dirty data records to tolerate.
    },
    "speed": {
      "throttle": true, // If true, throttling is enabled. If false, throttling is disabled and the mbps parameter is ignored.
      "concurrent": 3, // The job concurrency.
      "mbps": "12" // The throttling limit. 1 mbps is equal to 1 MB/s.
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "column": ["*"],
        "datasource": "_ODPS",
        "emptyAsNull": true,
        "guid": "",
        "isCompress": false,
        "partition": [],
        "table": ""
      },
      "stepType": "odps"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "testGDB", // The name of the data source.
        "label": "person", // The label, which is the name of the vertex.
        "srcLabel": "", // Not applicable to vertices.
        "dstLabel": "", // Not applicable to vertices.
        "labelType": "VERTEX", // The type of the label. "VERTEX" specifies a vertex.
        "writeMode": "INSERT", // The policy for handling duplicate primary keys.
        "idTransRule": "labelPrefix", // The conversion rule for the vertex primary key.
        "srcIdTransRule": "none", // Not applicable to vertices.
        "dstIdTransRule": "none", // Not applicable to vertices.
        "column": [
          {
            "name": "id", // The field name.
            "value": "#{0}", // Takes the value from the source column at index 0. Concatenation is supported.
            "type": "string", // The field type.
            "columnType": "primaryKey" // The field category. `primaryKey` specifies the primary key.
          }, // The primary key of the vertex. The field name must be `id`, the type must be STRING, and this entry is required.
          {
            "name": "person_age",
            "value": "#{1}", // Takes the value from the source column at index 1. Concatenation is supported.
            "type": "int",
            "columnType": "vertexProperty" // The field category. `vertexProperty` specifies a vertex property.
          }, // A property of the vertex. Supported types: INT, LONG, FLOAT, DOUBLE, BOOLEAN, and STRING.
          {
            "name": "person_credit",
            "value": "#{2}", // Takes the value from the source column at index 2. Concatenation is supported.
            "type": "string",
            "columnType": "vertexProperty"
          } // A property of the vertex.
        ]
      },
      "stepType": "gdb"
    }
  ],
  "type": "job",
  "version": "2.0"
}
Configure a synchronization task to write data about edges to a GDB database
{
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "100" // Maximum number of dirty data records to tolerate.
    },
    "jvmOption": "",
    "speed": {
      "throttle": true, // If true, throttling is enabled. If false, throttling is disabled and the mbps parameter is ignored.
      "concurrent": 3, // The job concurrency.
      "mbps": "12" // The throttling limit. 1 mbps is equal to 1 MB/s.
    }
  },
  "steps": [
    {
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "column": ["*"],
        "datasource": "_ODPS",
        "emptyAsNull": true,
        "guid": "",
        "isCompress": false,
        "partition": [],
        "table": ""
      },
      "stepType": "odps"
    },
    {
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "testGDB", // The name of the data source.
        "label": "use", // The label, which is the name of the edge.
        "labelType": "EDGE", // The type of the label. `EDGE` specifies an edge.
        "srcLabel": "person", // The label of the start vertex.
        "dstLabel": "software", // The label of the end vertex.
        "writeMode": "INSERT", // The policy for handling duplicate primary keys.
        "idTransRule": "labelPrefix", // The conversion rule for the edge primary key.
        "srcIdTransRule": "labelPrefix", // The conversion rule for the primary key of the start vertex.
        "dstIdTransRule": "labelPrefix", // The conversion rule for the primary key of the end vertex.
        "column": [
          {
            "name": "id", // The field name.
            "value": "#{0}", // Takes the value from the source column at index 0. Concatenation is supported.
            "type": "string", // The field type.
            "columnType": "primaryKey" // The field category. `primaryKey` specifies the primary key.
          }, // The primary key of the edge. The field name must be `id` and the type must be STRING. This entry is optional.
          {
            "name": "id",
            "value": "#{1}", // Concatenation is supported. Ensure the mapping rule is consistent with the one used when importing vertices.
            "type": "string",
            "columnType": "srcPrimaryKey" // The field category. `srcPrimaryKey` specifies the primary key of the start vertex.
          }, // The primary key of the start vertex. The field name must be `id`, the type must be STRING, and this entry is required.
          {
            "name": "id",
            "value": "#{2}", // Concatenation is supported. Ensure the mapping rule is consistent with the one used when importing vertices.
            "type": "string",
            "columnType": "dstPrimaryKey" // The field category. `dstPrimaryKey` specifies the primary key of the end vertex.
          }, // The primary key of the end vertex. The field name must be `id`, the type must be STRING, and this entry is required.
          {
            "name": "person_use_software_time",
            "value": "#{3}", // Concatenation is supported.
            "type": "long",
            "columnType": "edgeProperty" // The field category. `edgeProperty` specifies an edge property.
          }, // A property of the edge. Supported types: INT, LONG, FLOAT, DOUBLE, BOOLEAN, and STRING.
          {
            "name": "person_regist_software_name",
            "value": "#{4}", // Concatenation is supported.
            "type": "string",
            "columnType": "edgeProperty"
          }, // A property of the edge.
          {
            "name": "id",
            "value": "#{5}", // Concatenation is supported.
            "type": "long",
            "columnType": "edgeProperty"
          } // A property of the edge with the field name `id`. This is a regular property, not a primary key, and is optional.
        ]
      },
      "stepType": "gdb"
    }
  ],
  "type": "job",
  "version": "2.0"
}
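In the writer demos, each `value` takes source columns through `#{n}` placeholders. Here `n` is the zero-based index of the source column, and concatenation is supported. The following Python sketch shows how such a template could expand for a single source record. `resolve_value` is hypothetical and is not DataWorks' implementation. It assumes each placeholder is replaced with the string form of the column value, that literal text around placeholders is kept, and that conversion to the declared `type` happens afterwards.

```python
import re
from typing import Any, Sequence

# Matches placeholders such as #{0} or #{12}; the digits are the source column index.
_PLACEHOLDER = re.compile(r"#\{(\d+)\}")


def resolve_value(template: str, record: Sequence[Any]) -> str:
    """Expand #{n} placeholders against one source record (illustrative only)."""
    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        if index >= len(record):
            raise IndexError(f"{match.group(0)} refers to a missing source column")
        return str(record[index])

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    row = ["1001", 32, "A"]
    print(resolve_value("#{0}", row))       # 1001
    print(resolve_value("#{0}_#{2}", row))  # 1001_A (concatenation)
```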
Writer script parameters
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
label | The label, which is the name of the vertex or edge. GDB Writer can obtain labels from columns in the source table. For example, if you set this parameter to #{0}, GDB Writer uses the value of the first column as the label. The column index starts from 0. | Yes | No default value |
labelType | The type of the label. Valid values: VERTEX (vertex) and EDGE (edge). | Yes | No default value |
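srcLabel | The label of the start vertex when the labelType parameter is set to EDGE. This parameter is not applicable to vertices and can be left empty when labelType is set to VERTEX. | No | No default value |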
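dstLabel | The label of the end vertex when the labelType parameter is set to EDGE. This parameter is not applicable to vertices and can be left empty when labelType is set to VERTEX. | No | No default value |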
writeMode | The mode in which GDB Writer processes data records with duplicate primary keys. The demo scripts use INSERT, which is also the default value. | Yes | INSERT |
idTransRule | The rule for converting the primary key. Values used in this topic: none (the default value) and labelPrefix. | Yes | none |
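srcIdTransRule | The rule for converting the primary key of the start vertex when the labelType parameter is set to EDGE. Values used in this topic: none (the default value) and labelPrefix. The rule must be consistent with the idTransRule value that was used when the start vertices were imported. | Required when the labelType parameter is set to EDGE | none |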
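dstIdTransRule | The rule for converting the primary key of the end vertex when the labelType parameter is set to EDGE. Values used in this topic: none (the default value) and labelPrefix. The rule must be consistent with the idTransRule value that was used when the end vertices were imported. | Required when the labelType parameter is set to EDGE | none |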
column | The fields of the vertices or edges that you want to synchronize. Each entry specifies name, value, type, and columnType, as shown in the writer demo scripts. | Yes | No default value |
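The writer table's defaults, and the requirement that edge endpoint ID rules match the rules used when the vertices were imported, can be checked together before a job runs. The following Python sketch does both. `normalize_writer_params` and `check_endpoint_rules` are hypothetical helpers, not DataWorks APIs. `vertex_rules` is an assumed mapping from each vertex label to the `idTransRule` that its vertex import task used.

```python
from typing import Any, Dict

# Defaults from the Writer script parameters table.
WRITER_DEFAULTS = {"writeMode": "INSERT", "idTransRule": "none"}
EDGE_DEFAULTS = {"srcIdTransRule": "none", "dstIdTransRule": "none"}


def normalize_writer_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Apply table defaults and check labelType (hypothetical helper)."""
    for key in ("datasource", "label", "labelType", "column"):
        if key not in params:
            raise ValueError(f"missing required parameter: {key}")
    if params["labelType"] not in ("VERTEX", "EDGE"):
        raise ValueError("labelType must be VERTEX or EDGE")
    result = {**WRITER_DEFAULTS, **params}
    if params["labelType"] == "EDGE":
        # srcIdTransRule and dstIdTransRule only apply to edges.
        result = {**EDGE_DEFAULTS, **result}
    return result


def check_endpoint_rules(edge: Dict[str, Any], vertex_rules: Dict[str, str]) -> None:
    """Raise if an edge endpoint rule differs from the rule its vertices used."""
    for side in ("src", "dst"):
        label = edge.get(f"{side}Label")
        rule = edge.get(f"{side}IdTransRule", "none")
        expected = vertex_rules.get(label)
        if expected is not None and expected != rule:
            raise ValueError(
                f"{side}IdTransRule={rule!r}, but vertices labeled {label!r} "
                f"were imported with idTransRule={expected!r}")
```

For the edge demo above, `check_endpoint_rules(edge, {"person": "labelPrefix", "software": "labelPrefix"})` passes because both endpoint rules are `labelPrefix`.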