Graph Database

DataWorks provides Graph Database (GDB) Reader and GDB Writer for you to read data from and write data to GDB data sources. This topic describes the capabilities of synchronizing data from or to GDB data sources.

Limits

Batch data read

  • You must configure two synchronization tasks to synchronize vertex data and edge data separately.

  • The vertices or edges whose data is to be synchronized must have names (labels) so that DataWorks can traverse them and obtain the related data.

  • The primary key values of vertices and edges in GDB are of the STRING type, so the primary key column must be configured as the STRING type. If the configured type is a numeric type, such as LONG, GDB Reader forcibly converts the primary key values to the STRING type. If the conversion fails, the primary key values are lost.

  • The data type configured for a vertex or edge property must be the same as the original data type of the property in the GDB instance. If the types differ, GDB Reader forcibly converts the property values to the configured data type. If the conversion fails, the property values are lost.

  • If you run a task to synchronize vertex data multiple times, the values obtained for a SET property may differ between runs.

  • If you configure all properties in the JSON format, a SET property that contains only one value is treated as a common (single-value) property.

  • Unless otherwise specified, field names and enumerated values in this topic are case-sensitive.

  • GDB Reader supports only the UTF-8 encoding format. The synchronized data must be encoded in UTF-8.

  • Only GDB 1.0.20 or later supports the SET property. Confirm the GDB version before you use the SET property.

Batch data write

  • You must run a synchronization task to synchronize vertex data before you run a synchronization task to synchronize edge data.

  • Limits on vertices:

    • A vertex must have a name, which is specified by the label parameter.

    • A vertex must have a unique primary key of the STRING type. If the primary key is not a string, GDB Writer forcibly converts the primary key into a string.

    • Exercise caution when you configure the idTransRule parameter. If you want to set this parameter to none, you must make sure that the primary key of each vertex is unique among all vertices.

  • Limits on edges:

    • An edge must have a name, which is specified by the label parameter.

    • A primary key is optional for an edge.

      • If you specify a primary key, it must be of the STRING type and be globally unique across all edges. If a non-string value is provided, GDB Writer automatically converts it to the STRING type.

      • If you do not specify a primary key for an edge, GDB Writer automatically generates a universally unique identifier (UUID) of the STRING type for the edge. If the UUID is not a string, GDB Writer forcibly converts the UUID into a string.

    • Exercise caution when you configure the idTransRule parameter. If you want to set this parameter to none, you must make sure that the primary key of each edge is unique among all edges.

    • The srcIdTransRule and dstIdTransRule parameters are required for an edge. The values of the two parameters must be the same as the value of the idTransRule parameter of the related vertex.

  • Unless otherwise specified, field names and enumerated values in this topic are case-sensitive.

  • GDB Writer supports only the UTF-8 encoding format. Source data must be encoded in UTF-8.

  • Due to network restrictions, synchronization tasks can be run only on a Serverless resource group (recommended) or an exclusive resource group for Data Integration. You must first purchase the required resource group and associate it with the Virtual Private Cloud (VPC) that contains your GDB instance.
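The caution about idTransRule in the limits above can be illustrated with a short sketch. It assumes, following the parameter description later in this topic, that labelPrefix produces a key in the {label}-{source column} format. The function name transform_id is hypothetical and is not part of DataWorks or GDB.

```python
def transform_id(label: str, source_id, rule: str = "none") -> str:
    """Illustrative only: mimic the documented idTransRule behaviour."""
    key = str(source_id)  # primary keys are always handled as strings
    if rule == "labelPrefix":
        return f"{label}-{key}"
    if rule == "none":
        return key
    raise ValueError(f"unsupported rule: {rule}")


# Two source tables that both number their rows from 1.
rows = [("person", 1), ("software", 1)]

with_prefix = {transform_id(label, pk, "labelPrefix") for label, pk in rows}
without_prefix = {transform_id(label, pk, "none") for label, pk in rows}

print(sorted(with_prefix))     # two distinct keys
print(sorted(without_prefix))  # one key only: the two vertices collide
```

With none, the source keys themselves must already be unique across all vertices. The same reasoning explains why the srcIdTransRule and dstIdTransRule values of an edge must match the idTransRule value used when the related vertices were written: otherwise the edge refers to keys that do not exist.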

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.

Develop a data synchronization task

For the entry point and the procedure for configuring a synchronization task, see the following configuration guide.

Configure a batch synchronization task to synchronize data of a single table

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Use the Code Editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script demo

When you configure a synchronization task to read data from Graph Database (GDB), you must configure separate tasks for vertices and edges:

  • Configure a synchronization task to read data about vertices from a GDB instance

    {
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        },
        "setting":{
            "errorLimit":{
                "record":"100"  // Maximum number of dirty data records to tolerate.
            },
            "jvmOption":"",
            "speed":{
                "concurrent":3,
                "throttle":true, // If true, throttling is enabled. If false, throttling is disabled and the mbps parameter is ignored.
                "mbps":"12" // The throttling limit. 1 mbps is equal to 1 MB/s.
            }
        },
        "steps":[
            {
                "category":"reader",
                "name":"Reader",
                "parameter":{
                    "host": "gdb-xxxxxx.aliyuncs.com", // The endpoint of the GDB instance.
                    "port": 8182, // The port of the GDB instance.
                    "username": "gdb", // The username for accessing the GDB instance.
                    "password": "gdb", // The password for the username.
                    "labelType": "VERTEX", // The type of the label. VERTEX specifies a vertex.
                    "labels": ["label1", "label2"],  // A list of labels. If left empty, all vertices are exported.
                    "column": [
                        {
                            "name": "id",               // The field name.
                            "type": "string",           // The field type.
                            "columnType": "primaryKey"  // Field category. Specifies the vertex primary key, which must be of the STRING type in GDB.
                        },
                        {
                            "name": "label",              // The field name.
                            "type": "string",             // The field type.
                            "columnType": "primaryLabel"  // Field category. Specifies the vertex label, which must be of the STRING type in GDB.
                        },
                        {
                            "name": "age",                   // The property name.
                            "type": "int",                   // The property type.
                            "columnType": "vertexProperty"   // Field category. Specifies a basic property of the vertex in GDB.
                        }
                    ]
                },
                "stepType":"gdb"
            },
            {
                "category":"writer",
                "name":"Writer",
                "parameter":{
                    "print": true
                },
                "stepType":"stream"
            }
        ]
    }

  • Configure a synchronization task to read data about edges from a GDB instance

    {
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        },
        "setting":{
            "errorLimit":{
                "record":"100"  // Maximum number of dirty data records to tolerate.
            },
            "jvmOption":"",
            "speed":{
                "concurrent":3,
                "throttle":true, // If true, throttling is enabled. If false, throttling is disabled and the mbps parameter is ignored.
                "mbps":"12" // The throttling limit. 1 mbps is equal to 1 MB/s.
            }
        },
        "steps":[
            {
                "category":"reader",
                "name":"Reader",
                "parameter":{
                    "host": "gdb-xxxxxx.aliyuncs.com", // The endpoint of the GDB instance.
                    "port": 8182, // The port of the GDB instance.
                    "username": "gdb", // The username for accessing the GDB instance.
                    "password": "gdb", // The password for the username.
                    "labelType": "EDGE", // The type of the label. EDGE specifies an edge.
                    "labels": ["label1", "label2"],  // A list of labels. If left empty, all edges are exported.
                    "column": [
                        {
                            "name": "id",               // The field name.
                            "type": "string",           // The field type.
                            "columnType": "primaryKey"  // Field category. Specifies the edge primary key, which must be of the STRING type in GDB.
                        },
                        {
                            "name": "label",              // The field name.
                            "type": "string",             // The field type.
                            "columnType": "primaryLabel"  // Field category. Specifies the edge label, which must be of the STRING type in GDB.
                        },
                        {
                            "name": "srcId",               // The field name.
                            "type": "string",              // The field type.
                            "columnType": "srcPrimaryKey"  // Field category. Specifies the start vertex primary key, which must be of the STRING type in GDB.
                        },
                        {
                            "name": "srcLabel",               // The field name.
                            "type": "string",                 // The field type.
                            "columnType": "srcPrimaryLabel"   // Field category. Specifies the start vertex label, which must be of the STRING type in GDB.
                        },
                        {
                            "name": "dstId",                    // The field name.
                            "type": "string",                   // The field type.
                            "columnType": "dstPrimaryKey"       // Field category. Specifies the end vertex primary key, which must be of the STRING type in GDB.
                        },
                        {
                            "name": "dstLabel",                 // The field name.
                            "type": "string",                   // The field type.
                            "columnType": "dstPrimaryLabel"     // Field category. Specifies the end vertex label, which must be of the STRING type in GDB.
                        },
                        {
                            "name": "weight",               // The property name.
                            "type": "double",               // The property type.
                            "columnType": "edgeProperty"    // Field category. Specifies an edge property.
                        }
                    ]
                },
                "stepType":"gdb"
            },
            {
                "category":"writer",
                "name":"Writer",
                "parameter":{
                    "print": true
                },
                "stepType":"stream"
            }
        ]
    }
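The two reader demos differ only in labelType and in the fixed columns that an edge adds. The following sketch generates the "parameter" object for either case, which can help keep vertex and edge tasks consistent. The helper gdb_reader_parameter is hypothetical, the endpoint and credentials are the placeholders from the demos, and the output is strict JSON without the explanatory // comments.

```python
import json


def gdb_reader_parameter(label_type, labels, properties):
    """Build the "parameter" object of a GDB Reader step (illustrative helper)."""
    if label_type not in ("VERTEX", "EDGE"):
        raise ValueError("labelType must be VERTEX or EDGE")

    def col(name, column_type, type_="string"):
        return {"name": name, "type": type_, "columnType": column_type}

    # The primary key and label are exported for both vertices and edges.
    columns = [col("id", "primaryKey"), col("label", "primaryLabel")]
    if label_type == "EDGE":
        columns += [
            col("srcId", "srcPrimaryKey"),
            col("srcLabel", "srcPrimaryLabel"),
            col("dstId", "dstPrimaryKey"),
            col("dstLabel", "dstPrimaryLabel"),
        ]
    prop_type = "vertexProperty" if label_type == "VERTEX" else "edgeProperty"
    columns += [col(name, type_=type_, column_type=prop_type)
                for name, type_ in properties]

    return {
        "host": "gdb-xxxxxx.aliyuncs.com",  # placeholder endpoint from the demo
        "port": 8182,
        "username": "gdb",
        "password": "gdb",
        "labelType": label_type,
        "labels": list(labels),
        "column": columns,
    }


vertex_param = gdb_reader_parameter("VERTEX", ["label1", "label2"], [("age", "int")])
edge_param = gdb_reader_parameter("EDGE", ["label1", "label2"], [("weight", "double")])
print(json.dumps(edge_param, indent=4))
```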

Reader script parameters

Parameter

Description

Required

Default value

host

The connection endpoint of the GDB instance. In the GDB console, click Management for the target instance to view the Internal Endpoint (which is the host).

Yes

No default value

port

The port number that is used to connect to the GDB instance.

Yes

8182

username

The username that is used to connect to the GDB instance.

Yes

No default value

password

The password that is used to connect to the GDB instance.

Yes

No default value

labels

The label, which is the name of the vertex or edge. GDB Reader can read data from multiple vertices or edges at a time. In this case, the value of this parameter is an array, such as ["label1", "label2"].

Yes

No default value

labelType

The type of the label. Valid values:

  • VERTEX

  • EDGE

Yes

No default value

column

The fields of the vertices or edges to be synchronized. Each field is described by the name, type, and columnType parameters.

Yes

No default value

column -> name

The name of the vertex or edge property to be synchronized. This parameter is required if vertex or edge properties are to be synchronized.

Yes

No default value

column -> type

The data type for storing the vertex or edge property to be synchronized.

  • The primary key and label can only be of the STRING type. If you do not set the data type to STRING, data conversion fails.

  • Other properties can be of the INT, LONG, FLOAT, DOUBLE, BOOLEAN, or STRING type.

  • GDB Reader forcibly converts the obtained data to the specified type. If the conversion fails, the data record is lost.

Yes

No default value

column -> columnType

The category of the vertex or edge property to be synchronized.

  • For both vertices and edges:

    • primaryKey: the primary key.

    • primaryLabel: the label.

  • For vertices:

    • vertexProperty: a common property of the vertex.

    • vertexJsonProperty: all properties of the vertex, exported into a single column in the JSON format. If you set the columnType parameter to vertexJsonProperty, no other column can be configured as a vertex property.

      Example of vertexJsonProperty:

      {
          "properties":[
              {"k":"name","t":"string","v":"tom","c":"set"},
              {"k":"name","t":"string","v":"jack","c":"set"},
              {"k":"sex","t":"string","v":"male","c":"single"}
          ]
      }

      The preceding example contains a multi-value property named name and a single-value property named sex. The name property has two records. If a multi-value property in GDB contains only one value, it is exported as a single-value property.

  • For edges:

    • srcPrimaryKey: the primary key of the start vertex.

    • dstPrimaryKey: the primary key of the end vertex.

    • srcPrimaryLabel: the label of the start vertex.

    • dstPrimaryLabel: the label of the end vertex.

    • edgeProperty: a property of the edge.

    • edgeJsonProperty: all properties of the edge, exported into a single column in the JSON format. If you set the columnType parameter to edgeJsonProperty, no other column can be configured as an edge property.

      Example of edgeJsonProperty:

      {
          "properties":[
              {"k":"name","t":"string","v":"tom"},
              {"k":"sex","t":"string","v":"male"}
          ]
      }

      An edge does not support multi-value properties or the c field.

Yes

No default value
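A downstream consumer usually has to unpack a vertexJsonProperty or edgeJsonProperty column again. The following sketch shows one way to do that for the k, t, v, and c fields used in the examples above. The function name and the cast table are assumptions made for illustration. Only the lowercase type names string and int appear in this topic, so the other entries in the table are guesses modeled on the supported types.

```python
import json

# Casts for the documented property types; booleans are parsed from text.
CASTS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda v: str(v).lower() == "true",
    "string": str,
}


def parse_json_properties(text):
    """Group a vertexJsonProperty/edgeJsonProperty value by property name."""
    result = {}
    for item in json.loads(text)["properties"]:
        value = CASTS[item["t"]](item["v"])
        if item.get("c") == "set":
            # Multi-value (SET) property: collect every record in a list.
            result.setdefault(item["k"], []).append(value)
        else:
            # Single-value property, or an edge property without the c field.
            result[item["k"]] = value
    return result


sample = """
{
    "properties":[
        {"k":"name","t":"string","v":"tom","c":"set"},
        {"k":"name","t":"string","v":"jack","c":"set"},
        {"k":"sex","t":"string","v":"male","c":"single"}
    ]
}
"""
props = parse_json_properties(sample)
print(props)  # {'name': ['tom', 'jack'], 'sex': 'male'}
```

Because a SET property with a single value is exported as a single-value property, code that consumes the result should accept either a list or a scalar for the same property name.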

Writer script demo

  • Configure a synchronization task to write data about vertices to a GDB database

    {
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        },
        "setting":{
            "errorLimit":{
                "record":"100"  // Maximum number of dirty data records to tolerate.
            },
            "speed":{
                "throttle":true, // If true, throttling is enabled. If false, throttling is disabled and the mbps parameter is ignored.
                "concurrent":3, // The job concurrency.
                "mbps":"12" // The throttling limit. 1 mbps is equal to 1 MB/s.
            }
        },
        "steps":[
            {
                "category":"reader",
                "name":"Reader",
                "parameter":{
                    "column":[
                        "*"
                    ],
                    "datasource":"_ODPS",
                    "emptyAsNull":true,
                    "guid":"",
                    "isCompress":false,
                    "partition":[],
                    "table":""
                },
                "stepType":"odps"
            },
            {
                "category":"writer",
                "name":"Writer",
                "parameter": {
                    "datasource": "testGDB", // The name of the data source.
                    "label": "person", // The label, which is the name of the vertex.
                    "srcLabel": "", // Not applicable to vertices.
                    "dstLabel": "", // Not applicable to vertices.
                    "labelType": "VERTEX", // The type of the label. "VERTEX" specifies a vertex.
                    "writeMode": "INSERT", // The policy for handling duplicate primary keys.
                    "idTransRule": "labelPrefix", // The conversion rule for the vertex primary key.
                    "srcIdTransRule": "none", // Not applicable to vertices.
                    "dstIdTransRule": "none", // Not applicable to vertices.
                    "column": [
                        {
                            "name": "id", // The field name.
                            "value": "#{0}", // Takes the value from the source column at index 0. Concatenation is supported.
                            "type": "string", // The field type.
                            "columnType": "primaryKey" // The field category. `primaryKey` specifies the primary key.
                        }, // The primary key of the vertex. The field name must be `id`, the type must be STRING, and this record is required.
                        {
                            "name": "person_age",
                            "value": "#{1}", // Takes the value from the source column at index 1. Concatenation is supported.
                            "type": "int",
                            "columnType": "vertexProperty" // The field category. `vertexProperty` specifies a vertex property.
                        }, // A property of the vertex. Supported types: INT, LONG, FLOAT, DOUBLE, BOOLEAN, and STRING.
                        {
                            "name": "person_credit",
                            "value": "#{2}", // Takes the value from the source column at index 2. Concatenation is supported.
                            "type": "string",
                            "columnType": "vertexProperty"
                        } // A property of the vertex.
                    ]
                },
                "stepType":"gdb"
            }
        ],
        "type":"job",
        "version":"2.0"
    }

  • Configure a synchronization task to write data about edges to a GDB database

    {
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        },
        "setting":{
            "errorLimit":{
                "record":"100" // Maximum number of dirty data records to tolerate.
            },
            "jvmOption":"",
            "speed":{
                "throttle":true, // If true, throttling is enabled. If false, throttling is disabled and the mbps parameter is ignored.
                "concurrent":3, // The job concurrency.
                "mbps":"12" // The throttling limit. 1 mbps is equal to 1 MB/s.
            }
        },
        "steps":[
            {
                "category":"reader",
                "name":"Reader",
                "parameter":{
                    "column":[
                        "*"
                    ],
                    "datasource":"_ODPS",
                    "emptyAsNull":true,
                    "guid":"",
                    "isCompress":false,
                    "partition":[],
                    "table":""
                },
                "stepType":"odps"
            },
            {
                "category":"writer",
                "name":"Writer",
                "parameter": {
                    "datasource": "testGDB", // The name of the data source.
                    "label": "use", // The label, which is the name of the edge.
                    "labelType": "EDGE", // The type of the label. `EDGE` specifies an edge.
                    "srcLabel": "person", // The label of the start vertex.
                    "dstLabel": "software", // The label of the end vertex.
                    "writeMode": "INSERT", // The policy for handling duplicate primary keys.
                    "idTransRule": "labelPrefix", // The conversion rule for the edge primary key.
                    "srcIdTransRule": "labelPrefix", // The conversion rule for the primary key of the start vertex.
                    "dstIdTransRule": "labelPrefix", // The conversion rule for the primary key of the end vertex.
                    "column": [
                        {
                            "name": "id", // The field name.
                            "value": "#{0}", // Takes the value from the source column at index 0. Concatenation is supported.
                            "type": "string", // The field type.
                            "columnType": "primaryKey" // The field category. `primaryKey` specifies the primary key.
                        }, // The primary key of the edge. The field name must be `id` and the type must be STRING. This record is optional.
                        {
                            "name": "id",
                            "value": "#{1}", // Concatenation is supported. Ensure the mapping rule is consistent with the one used when importing vertices.
                            "type": "string",
                            "columnType": "srcPrimaryKey" // The field category. `srcPrimaryKey` specifies the primary key of the start vertex.
                        }, // The primary key of the start vertex. The field name must be `id`, the type must be STRING, and this record is required.
                        {
                            "name": "id",
                            "value": "#{2}", // Concatenation is supported. Ensure the mapping rule is consistent with the one used when importing vertices.
                            "type": "string",
                            "columnType": "dstPrimaryKey" // The field category. `dstPrimaryKey` specifies the primary key of the end vertex.
                        }, // The primary key of the end vertex. The field name must be `id`, the type must be STRING, and this record is required.
                        {
                            "name": "person_use_software_time",
                            "value": "#{3}", // Concatenation is supported.
                            "type": "long",
                            "columnType": "edgeProperty" // The field category. `edgeProperty` specifies an edge property.
                        }, // A property of the edge. Supported types: INT, LONG, FLOAT, DOUBLE, BOOLEAN, and STRING.
                        {
                            "name": "person_regist_software_name",
                            "value": "#{4}", // Concatenation is supported.
                            "type": "string",
                            "columnType": "edgeProperty"
                        }, // A property of the edge.
                        {
                            "name": "id",
                            "value": "#{5}", // Concatenation is supported.
                            "type": "long",
                            "columnType": "edgeProperty"
                        } // A property of the edge with the field name `id`. This is a regular property, not a primary key, and is optional.
                    ]
                },
                "stepType":"gdb"
            }
        ],
        "type":"job",
        "version":"2.0"
    }
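Several of the writer rules stated in the limits and in the demo comments can be checked before a task is submitted. The following sketch is a minimal pre-flight check written under those stated rules. The function check_writer_parameter is hypothetical and is not a DataWorks API, and it covers only a subset of the constraints.

```python
KEY_TYPES = {"primaryKey", "srcPrimaryKey", "dstPrimaryKey"}


def check_writer_parameter(param):
    """Return a list of problems found in a GDB Writer "parameter" object."""
    label_type = param.get("labelType")
    if label_type not in ("VERTEX", "EDGE"):
        return ["labelType must be VERTEX or EDGE"]

    problems = []
    columns = param.get("column", [])
    present = {c.get("columnType") for c in columns}

    # Primary key columns are expected to be declared as strings.
    for c in columns:
        if c.get("columnType") in KEY_TYPES and c.get("type") != "string":
            problems.append(f"{c.get('columnType')} column must use type string")

    if label_type == "VERTEX":
        if "primaryKey" not in present:
            problems.append("a vertex needs a primaryKey column")
    else:
        for side in ("src", "dst"):
            if f"{side}PrimaryKey" not in present:
                problems.append(f"an edge needs a {side}PrimaryKey column")
            rule = param.get(f"{side}IdTransRule")
            if rule is None:
                problems.append(f"{side}IdTransRule is required for an edge")
            elif rule != "none" and not param.get(f"{side}Label"):
                problems.append(
                    f"{side}Label is required unless {side}IdTransRule is none")
    return problems


edge = {
    "labelType": "EDGE",
    "srcLabel": "person",
    "dstLabel": "",  # left empty on purpose to trigger a finding
    "srcIdTransRule": "labelPrefix",
    "dstIdTransRule": "labelPrefix",
    "column": [
        {"name": "id", "value": "#{1}", "type": "string", "columnType": "srcPrimaryKey"},
        {"name": "id", "value": "#{2}", "type": "string", "columnType": "dstPrimaryKey"},
    ],
}
print(check_writer_parameter(edge))
```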

Writer script parameters

Parameter

Description

Required

Default value

datasource

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

Yes

No default value

label

The label, which is the name of the vertex or edge.

GDB Writer can obtain labels from columns in the source table. For example, if you set this parameter to #{0}, GDB Writer uses the value of the first column as the label. The column index starts from 0.

Yes

No default value

labelType

The type of the label. Valid values:

  • VERTEX

  • EDGE

Yes

No default value

srcLabel

  • The name of the start vertex in an edge when the labelType parameter is set to EDGE.

    This parameter can be left empty if srcIdTransRule is set to none. If srcIdTransRule is set to another value, this parameter is required.

  • Leave this parameter empty if the labelType parameter is set to VERTEX.

No

No default value

dstLabel

  • The name of the end vertex in an edge when the labelType parameter is set to EDGE.

    This parameter can be left empty if dstIdTransRule is set to none. If dstIdTransRule is set to another value, this parameter is required.

  • Leave this parameter empty if the labelType parameter is set to VERTEX.

No

No default value

writeMode

The mode in which GDB Writer processes data records with duplicate primary keys. Valid values:

  • INSERT: reports an error for the duplicate record and increases the number of error data records by 1.

  • MERGE: overwrites the existing data record with the new one.

Yes

INSERT

idTransRule

The rule for converting the primary key. Valid values:

  • labelPrefix: converts the mapped value to the {label}-{source column} format.

  • none: does not convert the primary key.

Yes

none

srcIdTransRule

The rule for converting the primary key of the start vertex when the labelType parameter is set to EDGE. Valid values:

  • labelPrefix: converts the mapped value to the {label}-{source column} format.

  • none: does not convert the primary key. In this case, the srcLabel parameter can be left empty.

Required when the labelType parameter is set to EDGE

none

dstIdTransRule

The rule for converting the primary key of the end vertex when the labelType parameter is set to EDGE. Valid values:

  • labelPrefix: converts the mapped value to the {label}-{source column} format.

  • none: does not convert the primary key. In this case, the dstLabel parameter can be left empty.

Required when the labelType parameter is set to EDGE

none

column

The fields of the vertices or edges that you want to synchronize. Each field is configured by using the following parameters:

  • name: the name of the vertex or edge property.

  • value: the value of the vertex or edge property. You can customize a value only in the code editor.

    • #{N}: uses the value of the source column at index N as the value of the vertex or edge property. The column index starts from 0.

    • #{0}: uses the value of the first column in the source as the value of the vertex or edge property.

    • test-#{0}: adds the fixed string test- before the value of #{0}. A fixed string can be added before or after a placeholder.

    • #{0}-#{1}: concatenates multiple fields. You can also add fixed strings at any position, for example, test-#{0}-test1-#{1}-test2.

  • type: the data type of the vertex or edge property.

    The primary key must be of the STRING type. If the value obtained from the source is not a string, GDB Writer forcibly converts the value into a string. Make sure that the value can be converted into a string.

    Other properties can be of the INT, LONG, FLOAT, DOUBLE, BOOLEAN, or STRING type.

  • columnType: the category of the vertex or edge property that you want to synchronize.

    • For both vertices and edges

      primaryKey: the primary key.

    • For vertices

      • vertexProperty: a common property of a vertex.

      • vertexJsonProperty: a JSON property of the vertex. For more information about the value structure, see the sample of properties.

    • For edges

      • srcPrimaryKey: the primary key of the start vertex.

      • dstPrimaryKey: the primary key of the end vertex.

      • edgeProperty: a common property of an edge.

      • edgeJsonProperty: a JSON property of an edge. For more information about the value structure, see the sample of properties.

Sample of properties

{"properties":[
    {"k":"name","t":"string","v":"tom"},
    {"k":"age","t":"int","v":"20"},
    {"k":"sex","t":"string","v":"male"}
]}

Yes

No default value
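The #{N} placeholders in the value parameter can be understood as a simple template substitution over one source row. The following sketch mimics that behaviour and also builds a string in the shape of the sample of properties. Both function names are hypothetical and the row is made-up test data. The sketch shows how the documented syntax reads and does not reproduce the GDB Writer implementation.

```python
import json
import re

PLACEHOLDER = re.compile(r"#\{(\d+)\}")


def render_value(template, row):
    """Replace every #{N} in template with the source column at index N."""
    return PLACEHOLDER.sub(lambda m: str(row[int(m.group(1))]), template)


def to_json_property(fields):
    """Serialize (name, type, value) triples in the "properties" layout."""
    return json.dumps(
        {"properties": [{"k": k, "t": t, "v": str(v)} for k, t, v in fields]})


row = ["1001", "tom", 20]  # one source record; column indexes start from 0

print(render_value("#{0}", row))                        # 1001
print(render_value("test-#{0}-test1-#{1}-test2", row))  # test-1001-test1-tom-test2
print(render_value("#{0}-#{2}", row))                   # 1001-20
print(to_json_property([("name", "string", row[1]), ("age", "int", row[2])]))
```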