Command line interface

The DataHub console client is a command-line tool for accessing DataHub projects. It lets you manage projects, topics, connectors, shards, and subscriptions, and upload and download data.

Prerequisites

Java 8 or later must be installed on the machine where you run the console client.

Install and configure the console client

Download the datahub_console.tar.gz package and extract it.
The extracted package contains the bin, conf, and lib folders.
Open the conf folder and enter your AccessKey ID, AccessKey secret, and endpoint in the datahub.properties file:

datahub.accessid=
datahub.accesskey=
datahub.endpoint=

The following table describes the parameters:

Parameter	Required	Description	Example
datahub.accessid	Yes	The AccessKey ID of your Alibaba Cloud account or RAM user.	N/A
datahub.accesskey	Yes	The AccessKey secret that corresponds to the AccessKey ID.	N/A
datahub.endpoint	Yes	The endpoint of the DataHub service. Choose the endpoint based on the region and network type of your DataHub project. For the available endpoints, see DataHub Domain Names.	https://dh-cn-hangzhou.aliyuncs.com
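Before you start the client, it can help to check that all three required settings are filled in. The following Python sketch does that. It is not part of the console package, the helper names are hypothetical, and it handles only plain key=value lines, not the full Java .properties syntax (escapes, line continuations, or ":" separators).

```python
# Minimal sketch: report which required settings in datahub.properties are missing or empty.
# ASSUMPTION: only plain "key=value" lines are handled; the full Java .properties
# syntax (escapes, line continuations, ":" separators) is not supported.

REQUIRED_KEYS = ("datahub.accessid", "datahub.accesskey", "datahub.endpoint")


def load_properties(text: str) -> dict:
    """Return the key=value pairs in text, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props: dict) -> list:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


if __name__ == "__main__":
    sample = (
        "datahub.accessid=yourAccessKeyId\n"
        "datahub.accesskey=\n"
        "datahub.endpoint=https://dh-cn-hangzhou.aliyuncs.com\n"
    )
    print(missing_keys(load_properties(sample)))  # ['datahub.accesskey']
```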

Run the console client

Start the console client using one of the following methods:

Method 1: In the bin folder, double-click datahubcmd.bat. This method applies to Windows only.

Method 2: Open a terminal, navigate to the bin folder, and run datahubcmd (Windows) or sh datahubcmd.sh (Linux/macOS).

When the client starts successfully, it displays the DataHub=> prompt.

Command help

You can get command help in two ways:

Method 1: View help in the console client

View help for all commands:

help

View help for a specific command by its keyword. For example, to view help for the lt (list topics) command:

DataHub=>help lt
NAME
        lt - List topic

SYNOPSYS
        lt [-p] string

OPTIONS
        -p  string
                projectName
                [Mandatory]

Method 2: Run the following command from the bin folder in your terminal to view all commands:

...\bin>datahubcmd help

Usage guide

Project operations

Create a project

-p: The project name.
-c: The project description.

cp -p test_project -c test_comment

Delete a project

-p: The project name.

dp -p test_project

Note: Delete all resources in the project (topics, subscriptions, and sync tasks) before you delete the project.

List projects

lp

Topic operations

Create a topic

-p: The project name.
-t: The topic name.
-m: The topic type. Valid values: BLOB and TUPLE.
-f: The schema of a TUPLE topic, in the format [(fieldName,fieldType,isNull)]. Separate multiple fields with commas (,).
-s: The number of shards.
-l: The data TTL in days. Valid values: 1 to 7.
-c: The topic description.

ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
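If you create topics from a script, the -f argument can be generated from a list of fields. The Python sketch below assumes that the console accepts the field list exactly as in the example above (tuples joined by commas, no spaces). It checks the 1 to 7 day TTL range before it builds the command. create_topic_command and tuple_field_spec are hypothetical helper names.

```python
# Minimal sketch: build a "ct" command for a TUPLE topic from Python values.
# ASSUMPTION: the console accepts the field list in the form shown in the example,
# i.e. (name,type,isNull) tuples joined by commas with no spaces.


def tuple_field_spec(fields):
    """Render [(name, type, nullable), ...] as the -f argument, e.g. [(name,string,true)]."""
    parts = [f"({name},{ftype},{str(nullable).lower()})" for name, ftype, nullable in fields]
    return "[" + ",".join(parts) + "]"


def create_topic_command(project, topic, fields, shards, ttl_days, comment):
    """Return a ct command string after checking the shard count and the TTL range."""
    if shards < 1:
        raise ValueError("a topic needs at least one shard")
    if not 1 <= ttl_days <= 7:
        raise ValueError("-l (TTL) must be between 1 and 7 days")
    return (
        f"ct -p {project} -t {topic} -m TUPLE -f {tuple_field_spec(fields)} "
        f"-s {shards} -l {ttl_days} -c {comment}"
    )


if __name__ == "__main__":
    print(create_topic_command("test_project", "test_topic",
                               [("name", "string", True)], 3, 3, "test_comment"))
    # ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
```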

Delete a topic

-p: The project name.
-t: The topic name.

dt -p test_project -t test_topic

Get topic details

-p: The project name.
-t: The topic name.

gt -p test_project -t test_topic

Export a topic schema to a JSON file

-f: The path where the file is saved.
-p: The project name.
-t: The topic name.

gts -f filepath -p test_project -t test_topic

List topics

-p: The project name.

lt -p test_project

Create a topic from a JSON file

-s: The number of shards.
-l: The data TTL in days. Valid values: 1 to 7.
-c: The topic description.
-f: The path of the JSON schema file.
-p: The project name.
-t: The topic name.

rtt -s 3 -l 3 -c test_comment -f filepath -p test_project -t test_topic

Modify the lifecycle of a topic

-p: The project name.
-t: The topic name.
-l: The new data TTL (lifecycle) in days.
-c: The topic description.

utl -p test_project -t test_topic -l 3 -c test_comment

Connector operations

Create an ODPS connector

-p: The project name.
-t: The topic name.
-m: The partition mode. Valid values: SYSTEM_TIME, USER_DEFINE, EVENT_TIME, and META_TIME.
-e: The ODPS endpoint. Use the classic network endpoint.
-op: The ODPS project name.
-ot: The ODPS table name.
-oa: The AccessKey ID used to access ODPS.
-ok: The AccessKey secret used to access ODPS.
-tr: The partition interval in minutes. Default: 60.
-tf: The partition format. `ds` indicates partitioning by day, `ds hh` indicates partitioning by hour, and `ds hh mm` indicates partitioning by minute.
-c: The fields to sync, for example (field1,field2).

coc -p test_project -t test_topic -m SYSTEM_TIME -e odpsEndpoint -op odpsProject -ot odpsTable -oa odpsAccessId -ok odpsAccessKey -tr 60 -c (field1,field2) -tf ds hh mm
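The following Python sketch illustrates how the -tf format and the -tr interval could combine to select an ODPS partition for a record. It rests on assumptions that this page does not confirm: ds is rendered as yyyyMMdd, hh and mm are two-digit values, and a record's time is rounded down to the start of its -tr window. Treat it as an illustration, not as the connector's actual logic. partition_values is a hypothetical helper.

```python
# Illustrative sketch of how -tf and -tr might select an ODPS partition.
# ASSUMPTIONS (not confirmed by the documentation): ds is yyyyMMdd, hh and mm are
# two digits, and the record time is floored to the start of its -tr window.
from datetime import datetime


def partition_values(ts: datetime, time_format: str = "ds hh mm", interval_minutes: int = 60) -> dict:
    """Return the partition column values for ts under the given -tf and -tr."""
    minute_of_day = ts.hour * 60 + ts.minute
    window_start = (minute_of_day // interval_minutes) * interval_minutes  # floor to the window
    values = {
        "ds": ts.strftime("%Y%m%d"),
        "hh": f"{window_start // 60:02d}",
        "mm": f"{window_start % 60:02d}",
    }
    # Keep only the columns named in the -tf value, in the order given.
    return {column: values[column] for column in time_format.split()}


if __name__ == "__main__":
    t = datetime(2024, 5, 1, 13, 47)
    print(partition_values(t))                       # {'ds': '20240501', 'hh': '13', 'mm': '00'}
    print(partition_values(t, interval_minutes=15))  # {'ds': '20240501', 'hh': '13', 'mm': '45'}
```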

Add a field to an ODPS connector

-p: The project name.
-t: The topic name.
-c: The connector ID. Find it on the Data Synchronization tab.
-f: The name of the new field.

acf -p test_project -t test_topic -c connectorId -f fieldName

Create a connector to sync data to MySQL or RDS

-p: The project name.
-t: The topic name.
-h: The host. Use the classic network address.
-po: The port.
-ty: The sync type:
SINK_MYSQL: Sync data to MySQL.
SINK_ADS: Sync data to ADS.
-d: The database name.
-ta: The table name.
-u: The username.
-pa: The password.
-ht: The insert mode:
IGNORE
OVERWRITE
-n: The fields to sync, for example (field1,field2).

cdc -p test_project -t test_topic -h host -po 3306 -ty mysql -d mysql_database -ta mysql_table -u username -pa password -ht IGNORE -n (field1,field2)

Create a DataHub connector

-p: The project name.
-t: The topic name.
-sp: The destination project to which data is synced.
-st: The destination topic to which data is synced.
-m: The authentication type:
AK: AccessKey authentication. Requires the AccessKey ID and AccessKey secret.
STS: STS authentication.
-i: The AccessKey ID. Required for AK authentication.
-k: The AccessKey secret. Required for AK authentication.

cdhc -p test_project -t test_topic -sp sinkProject -st sinkTopic -m AK -i accessid -k accessKey

Create an FC connector

-p: The project name.
-t: The topic name.
-e: The FC endpoint. Use the classic network endpoint.
-s: The FC service name.
-f: The FC function name.
-au: The authentication method.
AK: AccessKey authentication. Requires the AccessKey ID and AccessKey secret.
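STS: STS authentication.
-i: The AccessKey ID. Required for AK authentication.
-k: The AccessKey secret. Required for AK authentication.
-n: The fields to sync, for example (field1,field2).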

cfc -p test_project -t test_topic -e endpoint -s service -f function -au AK -i accessId -k accessKey -n (field1,field2)

Create a Hologres connector

-p: The project name.
-t: The topic name.
-e: The Hologres endpoint.
-cl: The fields to sync to Hologres.
-au: The authentication method. Only AccessKey authentication is supported for Hologres sync.
-i: The AccessKey ID.
-k: The AccessKey secret.
-m: The parsing type:
Delimiter: Requires lineDelimiter, parseData, and columnDelimiter.
InformaticaJson: Requires parseData.

chc -p test_project -t test_topic -e endpoint -cl (field,field2) -au AK -hp holoProject -ht holoTopic -i accessId -k accessKey -m Delimiter -l 1 -b false -n (field1,field2)
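You can check the dependency between the -m parse type and its required options before you run chc. This Python sketch encodes only the requirements listed above. The option names come from the description, and how they map to command-line flags is not documented on this page. missing_parse_options is a hypothetical helper.

```python
# Minimal sketch: list the options still missing for a Hologres parse type.
# The requirements mirror the -m description; the flag for each option is not documented here.

REQUIRED_BY_PARSE_TYPE = {
    "Delimiter": {"lineDelimiter", "parseData", "columnDelimiter"},
    "InformaticaJson": {"parseData"},
}


def missing_parse_options(parse_type: str, provided) -> list:
    """Return the required options for parse_type that are not in provided, sorted by name."""
    if parse_type not in REQUIRED_BY_PARSE_TYPE:
        raise ValueError(f"unsupported parse type: {parse_type}")
    return sorted(REQUIRED_BY_PARSE_TYPE[parse_type] - set(provided))


if __name__ == "__main__":
    print(missing_parse_options("Delimiter", {"parseData"}))        # ['columnDelimiter', 'lineDelimiter']
    print(missing_parse_options("InformaticaJson", {"parseData"}))  # []
```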

Create an OTS connector

-p: The project name.
-t: The topic name.
-it: The OTS instance name.
-m: The authentication type. Default: STS.
AK: AccessKey authentication. Requires the AccessKey ID and AccessKey secret.
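STS: STS authentication.
-i: The AccessKey ID. Required for AK authentication.
-k: The AccessKey secret. Required for AK authentication.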
-t: The OTS table name.
-wm: The write mode:
PUT
UPDATE
-c: The fields to sync, for example (field1,field2).

cotsc -p test_project -t test_topic -i accessId -k accessKey -it instanceId -m AK -t table -wm PUT -c (field1,field2)

Create an OSS connector

-p: The project name.
-t: The topic name.
-b: The OSS bucket name.
-e: The OSS endpoint.
-pr: The directory prefix for data synced to OSS.
-tf: The time format used to partition data in OSS. For example, %Y%m%d%H%M partitions data by minute.
-tr: The partition interval.
-c: The fields to sync.

csc -p test_project -t test_topic -b bucket -e endpoint -pr ossPrefix -tf ossTimeFormat -tr timeRange -c (f1,f2)
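The -tf value uses strftime-style directives, so Python's datetime.strftime can preview how it groups records, assuming that the connector formats the data time the same way. In this sketch, time_directory is a hypothetical helper, and the two sample timestamps are arbitrary.

```python
# Sketch: preview how the -tf time format controls directory granularity.
# ASSUMPTION: the connector formats each record's time with -tf like strftime does.
from datetime import datetime

FORMATS = {"day": "%Y%m%d", "hour": "%Y%m%d%H", "minute": "%Y%m%d%H%M"}


def time_directory(ts: datetime, time_format: str) -> str:
    """Return the time-based directory name for ts under time_format."""
    return ts.strftime(time_format)


if __name__ == "__main__":
    a = datetime(2024, 5, 1, 13, 5)
    b = datetime(2024, 5, 1, 13, 50)
    for name, fmt in FORMATS.items():
        # Two records share a directory only when their formatted strings match.
        print(name, time_directory(a, fmt), time_directory(b, fmt))
```

With the hourly format, both records map to 2024050113. With %Y%m%d%H%M, they map to 202405011305 and 202405011350.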

Delete a connector

-p: The project name.
-t: The topic name.
-c: The connector ID. Find it on the Data Synchronization tab.

dc -p test_project -t test_topic -c connectorId

Get connector details

-p: The project name.
-t: The topic name.
-c: The connector ID. Find it on the Data Synchronization tab.

gc -p test_project -t test_topic -c connectorId

List connectors in a topic

-p: The project name.
-t: The topic name.

lc -p test_project -t test_topic

Restart a connector

-p: The project name.
-t: The topic name.
-c: The connector ID. Find it on the Data Synchronization tab.

rc -p test_project -t test_topic -c connectorId

Update the AccessKey of a connector

-p: The project name.
-t: The topic name.
-ty: The sync type, for example SINK_ODPS.
-a: The new AccessKey ID.
-k: The new AccessKey secret.

uca -p test_project -t test_topic -ty SINK_ODPS -a accessId -k accessKey

Shard operations

Merge shards

-p: The project name.
-t: The topic name.
-s: The ID of the first shard to merge.
-a: The ID of the adjacent shard to merge with it.

ms -p test_project -t test_topic -s shardId -a adjacentShardId

Split a shard

-p: The project name.
-t: The topic name.
-s: The ID of the shard to split.

ss -p test_project -t test_topic -s shardId

List shards in a topic

-p: The project name.
-t: The topic name.

ls -p test_project -t test_topic

Get shard sync status

-p: The project name.
-t: The topic name.
-s: The shard ID.
-c: The connector ID. Find it on the Data Synchronization tab.

gcs -p test_project -t test_topic -s shardId -c connectorId

Get the consumer offset of a shard in a subscription

-p: The project name.
-t: The topic name.
-s: The subscription ID.
-i: The shard ID.

gso -p test_project -t test_topic -s subid -i shardId

Subscription operations

Create a subscription

-p: The project name.
-t: The topic name.
-c: The subscription description.

css -p test_project -t test_topic -c comment

Delete a subscription

-p: The project name.
-t: The topic name.
-s: The subscription ID.

dsc -p test_project -t test_topic -s subId

List subscriptions

-p: The project name.
-t: The topic name.

lss -p test_project -t test_topic

Upload and download data

Upload data

-f: The path of the file to upload. On Windows, escape each backslash, for example D:\\test\\test.txt.
-p: The project name.
-t: The topic name.
-m: The field delimiter. Valid values: comma (,) and space.
-n: The batch size per upload. Default: 1000.

uf -f filepath -p test_project -t test_topic -m "," -n 1000
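When uf is run from a script, the Windows backslash escaping and the delimiter restriction described above are easy to get wrong. This Python sketch builds the command string. upload_command is a hypothetical helper, and it does not quote paths that contain spaces.

```python
# Sketch: build a uf command, doubling backslashes in Windows paths and
# restricting -m to the two documented separators.


def upload_command(path: str, project: str, topic: str,
                   delimiter: str = ",", batch_size: int = 1000) -> str:
    """Return a uf command string for the given file, project, and topic."""
    if delimiter not in (",", " "):
        raise ValueError("-m supports only a comma or a space")
    escaped = path.replace("\\", "\\\\")  # each "\" becomes "\\"
    return f'uf -f {escaped} -p {project} -t {topic} -m "{delimiter}" -n {batch_size}'


if __name__ == "__main__":
    print(upload_command(r"D:\test\test.txt", "test_project", "test_topic"))
    # uf -f D:\\test\\test.txt -p test_project -t test_topic -m "," -n 1000
```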

Example: Upload a CSV file

This example uploads a CSV file to DataHub. The file contains the following records:

0,qe614c760fuk8judu01tn5x055rpt1,true,100.1,14321111111
1,znv1py74o8ynn87k66o32ao4x875wi,true,100.1,14321111111
2,7nm0mtpgo1q0ubuljjjx9b000ybltl,true,100.1,14321111111
3,10t0n6pvonnan16279w848ukko5f6l,true,100.1,14321111111
4,0ub584kw88s6dczd0mta7itmta10jo,true,100.1,14321111111
5,1ltfpf0jt7fhvf0oy4lo8m3z62c940,true,100.1,14321111111
6,zpqsfxqy9379lmcehd7q8kftntrozb,true,100.1,14321111111
7,ce1ga9aln346xcj761c3iytshyzuxg,true,100.1,14321111111
8,k5j2id9a0ko90cykl40s6ojq6gruyi,true,100.1,14321111111
9,ns2zcx9bdip5y0aqd1tdicf7bkdmsm,true,100.1,14321111111
10,54rs9cm1xau2fk66pzyz62tf9tsse4,true,100.1,14321111111

Each line is a record with comma-separated fields. The file is saved at /temp/test.csv. The DataHub topic schema is:

Field name	Field type
id	BIGINT
name	STRING
gender	BOOLEAN
salary	DOUBLE
my_time	TIMESTAMP

Upload command:

uf -f /temp/test.csv -p test_project -t test_topic -m "," -n 1000
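A value that does not match its field type is likely to be rejected, so it may be worth checking the file against the topic schema before you upload it. The Python sketch below uses the schema above and assumes that TIMESTAMP values are integers, as in the sample file. This page does not state the unit that the console expects. validate_csv is a hypothetical helper, not a console command.

```python
# Sketch: check CSV records against the topic schema before running uf.
# ASSUMPTION: TIMESTAMP values are integers, as in the sample file; the unit is not checked.
import csv
import io

SCHEMA = [
    ("id", "BIGINT"),
    ("name", "STRING"),
    ("gender", "BOOLEAN"),
    ("salary", "DOUBLE"),
    ("my_time", "TIMESTAMP"),
]


def check_value(field_type: str, text: str) -> None:
    """Raise ValueError if text cannot be read as field_type."""
    if field_type in ("BIGINT", "TIMESTAMP"):
        int(text)
    elif field_type == "DOUBLE":
        float(text)
    elif field_type == "BOOLEAN" and text.lower() not in ("true", "false"):
        raise ValueError(text)
    # STRING accepts any text.


def validate_csv(text: str, schema=SCHEMA) -> list:
    """Return (record number, message) pairs for every problem found."""
    errors = []
    for record_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != len(schema):
            errors.append((record_no, f"expected {len(schema)} fields, got {len(row)}"))
            continue
        for (name, field_type), value in zip(schema, row):
            try:
                check_value(field_type, value)
            except ValueError:
                errors.append((record_no, f"{name}: {value!r} is not a valid {field_type}"))
    return errors


if __name__ == "__main__":
    sample = (
        "0,qe614c760fuk8judu01tn5x055rpt1,true,100.1,14321111111\n"
        "1,znv1py74o8ynn87k66o32ao4x875wi,true,100.1,14321111111\n"
    )
    print(validate_csv(sample) or "all rows match the schema")  # all rows match the schema
```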

Download data

-f: The path of the local file to which data is written. On Windows, escape each backslash, for example D:\\test\\test.txt.
-p: The project name.
-t: The topic name.
-s: The shard ID.
-d: The subscription ID.
-ti: The start time for reading data, in the format yyyy-MM-dd HH:mm:ss.
-l: The number of records to read per batch.
-g: The read mode:
0: Read once and stop.
1: Read continuously.

down -p test_project -t test_topic -s shardId -d subId -f filePath -ti "1970-01-01 00:00:00" -l 100 -g 0
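The -ti value must follow the yyyy-MM-dd HH:mm:ss pattern. This Python sketch, built around a hypothetical download_command helper, formats a datetime with the matching strftime pattern, doubles backslashes in the output path as described for -f, and maps a boolean to the -g flag.

```python
# Sketch: build a down command with a correctly formatted -ti start time.
from datetime import datetime


def download_command(project, topic, shard_id, sub_id, path, start: datetime,
                     batch_size=100, continuous=False):
    """Return a down command string; continuous=True maps to -g 1."""
    start_text = start.strftime("%Y-%m-%d %H:%M:%S")  # yyyy-MM-dd HH:mm:ss
    escaped = path.replace("\\", "\\\\")  # Windows paths need doubled backslashes
    return (
        f"down -p {project} -t {topic} -s {shard_id} -d {sub_id} -f {escaped} "
        f'-ti "{start_text}" -l {batch_size} -g {1 if continuous else 0}'
    )


if __name__ == "__main__":
    print(download_command("test_project", "test_topic", "shardId", "subId",
                           "filePath", datetime(1970, 1, 1)))
    # down -p test_project -t test_topic -s shardId -d subId -f filePath -ti "1970-01-01 00:00:00" -l 100 -g 0
```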

FAQ

Startup failure on Windows: If the script fails to start, check whether the path to the script contains parentheses.