The console CLI lets you access DataHub projects and run commands to manage projects, topics, connectors, shards, and subscriptions.
Prerequisites
Make sure that the following requirement is met:
-
Java 8 or later is installed on the target device.
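One way to check this prerequisite from a POSIX shell is sketched below. The helper name java_major is illustrative and not part of the DataHub package. The parsing assumes the usual `java -version` output, where Java 8 is reported as 1.8.x and later releases as 9, 11, 17, and so on.

```shell
# Hypothetical helper: reduce a Java version string to its major number,
# for example "1.8.0_292" becomes 8 and "17.0.2" becomes 17.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;    # legacy scheme: 1.8 means Java 8
  esac
  echo "${v%%[._-]*}"       # keep everything before the first '.', '_' or '-'
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it before parsing.
  raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  major=$(java_major "$raw")
  if [ "${major:-0}" -ge 8 ] 2>/dev/null; then
    echo "Java $raw found: requirement met"
  else
    echo "Java 8 or later is required (found: ${raw:-unknown})"
  fi
else
  echo "java was not found on PATH"
fi
```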
Install and configure the console client
-
Download the datahub_console.tar.gz package and extract it.
-
The extracted package contains the bin, conf, and lib folders.
-
Open the conf folder and enter your AccessKey ID, AccessKey secret, and endpoint in the datahub.properties file:
datahub.accessid=
datahub.accesskey=
datahub.endpoint=
The following table describes the parameters:
| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| datahub.accessid | Yes | The AccessKey ID of your Alibaba Cloud account or RAM user. | N/A |
| datahub.accesskey | Yes | The AccessKey secret that corresponds to the AccessKey ID. | N/A |
| datahub.endpoint | Yes | The endpoint of the DataHub service. Configure the endpoint based on the region and network type of your DataHub project. For more information, see DataHub Domain Names. | https://dh-cn-hangzhou.aliyuncs.com |
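As a minimal sketch, the file can also be written from a shell. The CONF_DIR variable and the angle-bracket placeholders are assumptions for illustration. Point CONF_DIR at the conf folder of your extracted package and substitute your own credentials and endpoint.

```shell
# Assumed location of the extracted conf folder; override CONF_DIR as needed.
CONF_DIR="${CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"

# Placeholder values: replace them with your own AccessKey pair and endpoint.
cat > "$CONF_DIR/datahub.properties" <<'EOF'
datahub.accessid=<yourAccessKeyId>
datahub.accesskey=<yourAccessKeySecret>
datahub.endpoint=https://dh-cn-hangzhou.aliyuncs.com
EOF

# The file holds a secret, so restrict it to the current user.
chmod 600 "$CONF_DIR/datahub.properties"
```

Restricting the file permissions is optional, but it keeps the AccessKey secret from being readable by other users on a shared machine.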
Run the console client
Start the console client using one of the following methods:
-
Method 1: In the bin folder, double-click datahubcmd.bat (Windows). The following output indicates a successful start.
-
Method 2: Open a terminal, navigate to the bin folder, and run datahubcmd (Windows) or sh datahubcmd.sh (Linux/macOS). The following output shows a successful connection to DataHub.
Command help
You can get command help in two ways:
Method 1: View help in the console client
-
View help for all commands:
help
-
View help for a specific command by keyword:
For example, to view help for the lt (list topics) command:
DataHub=>help lt
NAME
lt - List topic
SYNOPSYS
lt [-p] string
OPTIONS
-p string
projectName
[Mandatory]
Method 2: Run the following command from the bin folder in your terminal to view all commands:
...\bin>datahubcmd help
Usage guide
Project operations
Create a project
-
-p: The project name.
-
-c: The project description.
cp -p test_project -c test_comment
Delete a project
-
-p: The project name.
dp -p test_project
Note: Delete all resources in the project (topics, subscriptions, and sync tasks) before you delete the project.
List projects
lp
Topic operations
Create a topic
-
-p: The project name.
-
-t: The topic name.
-
-m: The topic type. BLOB for BLOB topics, TUPLE for TUPLE topics.
-
-f: The field format for TUPLE topics: [(fieldName,fieldType,isNull)]. Separate multiple fields with commas (,).
-
-s: The number of shards.
-
-l: The data TTL in days. Valid values: 1 to 7.
-
-c: The topic description.
ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
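For topics with several fields, the -f value is easy to mistype. The sketch below assembles it from name:type:isNull triples and prints a ct command that can be pasted into the console. The helper name build_fields is hypothetical, and the lowercase type names other than string are an assumption modeled on the example above.

```shell
# Hypothetical helper: turn "name:type:isNull" triples into the -f argument.
build_fields() {
  out=""
  for spec in "$@"; do
    name=${spec%%:*}          # text before the first ':'
    rest=${spec#*:}
    type=${rest%%:*}          # text between the two ':'
    nullable=${rest#*:}       # text after the second ':'
    out="${out:+$out,}($name,$type,$nullable)"
  done
  printf '[%s]\n' "$out"
}

fields=$(build_fields id:bigint:true name:string:true gender:boolean:true \
                      salary:double:true my_time:timestamp:true)

# Print the command instead of running it; paste it at the DataHub=> prompt.
printf 'ct -p test_project -t test_topic -m TUPLE -f %s -s 3 -l 3 -c test_comment\n' "$fields"
```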
Delete a topic
-
-p: The project name.
-
-t: The topic name.
dt -p test_project -t test_topic
Get topic details
-
-p: The project name.
-
-t: The topic name.
gt -p test_project -t test_topic
Export a topic schema to a JSON file
-
-f: The path where the file is saved.
-
-p: The project name.
-
-t: The topic name.
gts -f filepath -p test_project -t test_topic
List topics
-
-p: The project name.
lt -p test_project
Create a topic from a JSON file
-
-s: The number of shards.
-
-l: The data TTL in days. Valid values: 1 to 7.
-
-f: The file path.
-
-p: The project name.
-
-t: The topic name.
-
-c: The topic description.
rtt -s 3 -l 3 -c test_comment -f filepath -p test_project -t test_topic
Modify the lifecycle of a topic
-
-p: The project name.
-
-t: The topic name.
-
-l: The topic lifecycle in days.
-
-c: The topic description.
utl -p test_project -t test_topic -l 3 -c test_comment
Connector operations
Create an ODPS connector
-
-p: The project name.
-
-t: The topic name.
-
-m: The sync type. Supported types for ODPS: SYSTEM_TIME, USER_DEFINE, EVENT_TIME, and META_TIME.
-
-e: The ODPS endpoint. Use the classic network endpoint.
-
-op: The ODPS project name.
-
-oa: The AccessKey ID used to access ODPS.
-
-ok: The AccessKey secret used to access ODPS.
-
-tr: The partition interval in minutes. Default: 60.
-
-tf: The partition format. `ds` indicates partitioning by day, `ds hh` indicates partitioning by hour, and `ds hh mm` indicates partitioning by minute.
-
-ot: The ODPS table name.
-
-c: The fields to sync, for example (field1,field2).
coc -p test_project -t test_topic -m SYSTEM_TIME -e odpsEndpoint -op odpsProject -ot odpsTable -oa odpsAccessId -ok odpsAccessKey -tr 60 -c (field1,field2) -tf ds hh mm
Add a field for ODPS sync
-
-p: The project name.
-
-t: The topic name.
-
-c: The connector ID. Find it on the Data Synchronization tab.
-
-f: The name of the new field.
acf -p test_project -t test_topic -c connectorId -f fieldName
Create a connector to sync data to MySQL or RDS
-
-p: The project name.
-
-t: The topic name.
-
-h: The host. Use the classic network address.
-
-po: The port.
-
-ty: The sync type:
-
SINK_MYSQL: Sync data to MySQL.
-
SINK_ADS: Sync data to ADS.
-
-d: The database name.
-
-ta: The table name.
-
-u: The username.
-
-pa: The password.
-
-ht: The insert mode:
-
IGNORE
-
OVERWRITE
-
-n: The fields to sync, for example (field1,field2).
cdc -p test_project -t test_topic -h host -po 3306 -ty mysql -d mysql_database -ta mysql_table -u username -pa password -ht IGNORE -n (field1,field2)
Create a DataHub connector
-
-p: The project name.
-
-t: The topic name.
-
-sp: The sink project where data is imported.
-
-st: The sink topic where data is imported.
-
-m: The authentication type.
-
AK: AccessKey authentication. Requires the AccessKey ID and AccessKey secret.
-
STS: STS authentication.
cdhc -p test_project -t test_topic -sp sinkProject -st sinkTopic -m AK -i accessId -k accessKey
Create an FC connector
-
-p: The project name.
-
-t: The topic name.
-
-e: The FC endpoint. Use the classic network endpoint.
-
-s: The FC service name.
-
-f: The FC function name.
-
-au: The authentication method.
-
AK: AccessKey authentication. Requires the AccessKey ID and AccessKey secret.
-
STS: Authentication using STS.
-
-n: The fields to sync, for example (field1,field2).
cfc -p test_project -t test_topic -e endpoint -s service -f function -au AK -i accessId -k accessKey -n (field1,field2)
Create a Hologres connector
-
-p: The project name.
-
-t: The topic name.
-
-e: The endpoint.
-
-cl: The fields to sync to Hologres.
-
-au: The authentication method. Only AccessKey authentication is supported for Hologres sync.
-
-m: The parsing type. Delimiter requires lineDelimiter, parseData, and columnDelimiter. InformaticaJson requires parseData.
-
Delimiter
-
InformaticaJson
chc -p test_project -t test_topic -e endpoint -cl (field1,field2) -au AK -hp holoProject -ht holoTopic -i accessId -k accessKey -m Delimiter -l 1 -b false -n (field1,field2)
Create an OTS connector
-
-p: The project name.
-
-t: The topic name.
-
-it: The OTS instance name.
-
-m: The authentication type. Default: STS.
-
AK: AccessKey authentication. Requires the AccessKey ID and AccessKey secret.
-
STS: STS authentication.
-
-t: The OTS table name.
-
-wm: The write mode:
-
PUT
-
UPDATE
-
-c: The fields to sync, for example (field1,field2).
cotsc -p test_project -t test_topic -i accessId -k accessKey -it instanceId -m AK -t table -wm PUT -c (field1,field2)
Create an OSS connector
-
-p: The project name.
-
-t: The topic name.
-
-b: The OSS bucket name.
-
-e: The OSS endpoint.
-
-pr: The directory prefix for syncing data to OSS.
-
-tf: The synchronization time format. For example, %Y%m%d%H%M indicates partitioning by minute.
-
-tr: The partition interval.
-
-c: The fields to sync.
csc -p test_project -t test_topic -b bucket -e endpoint -pr ossPrefix -tf ossTimeFormat -tr timeRange -c (f1,f2)
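The specifiers in the -tf example look like strftime-style codes. Assuming DataHub interprets them the same way the date utility does, the following sketch previews what a given format produces for the current time. The hour-level and day-level variants are illustrative and not confirmed values for -tf.

```shell
# Render the current time with the minute-level format from the -tf description.
stamp=$(date '+%Y%m%d%H%M')
echo "$stamp"          # 12 digits: year, month, day, hour, minute

# Coarser variants drop the trailing fields (assumed, not confirmed, for -tf).
date '+%Y%m%d%H'       # hour-level
date '+%Y%m%d'         # day-level
```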
Delete a connector
-
-p: The project name.
-
-t: The topic name.
-
-c: The connector ID. Find it on the Data Synchronization tab.
dc -p test_project -t test_topic -c connectorId
Get connector details
-
-p: The project name.
-
-t: The topic name.
-
-c: The connector ID. Find it on the Data Synchronization tab.
gc -p test_project -t test_topic -c connectorId
List connectors in a topic
-
-p: The project name.
-
-t: The topic name.
lc -p test_project -t test_topic
Restart a connector
-
-p: The project name.
-
-t: The topic name.
-
-c: The connector ID. Find it on the Data Synchronization tab.
rc -p test_project -t test_topic -c connectorId
Update connector AccessKey
-
-p: The project name.
-
-t: The topic name.
-
-ty: The sync type, for example SINK_ODPS.
uca -p test_project -t test_topic -ty SINK_ODPS -a accessId -k accessKey
Shard operations
Merge shards
-
-p: The project name.
-
-t: The topic name.
-
-s: The ID of the shard to merge.
-
-a: The ID of the adjacent shard to merge with.
ms -p test_project -t test_topic -s shardId -a adjacentShardId
Split a shard
-
-p: The project name.
-
-t: The topic name.
-
-s: The ID of the shard to split.
ss -p test_project -t test_topic -s shardId
List shards in a topic
-
-p: The project name.
-
-t: The topic name.
ls -p test_project -t test_topic
Get shard sync status
-
-p: The project name.
-
-t: The topic name.
-
-s: The shard ID.
-
-c: The connector ID. Find it on the Data Synchronization tab.
gcs -p test_project -t test_topic -s shardId -c connectorId
Get consumer offset per shard for a subscription
-
-p: The project name.
-
-t: The topic name.
-
-s: The subscription ID.
-
-i: The shard ID.
gso -p test_project -t test_topic -s subid -i shardId
Subscription operations
Create a subscription
-
-p: The project name.
-
-t: The topic name.
-
-c: The subscription description.
css -p test_project -t test_topic -c comment
Delete a subscription
-
-p: The project name.
-
-t: The topic name.
-
-s: The subscription ID.
dsc -p test_project -t test_topic -s subId
List subscriptions
-
-p: The project name.
-
-t: The topic name.
lss -p test_project -t test_topic
Upload and download data
Upload data
-
-f: The path of the file to upload. On Windows, escape each backslash, for example D:\\test\\test.txt.
-
-p: The project name.
-
-t: The topic name.
-
-m: The text separator. Commas (,) and spaces are supported.
-
-n: The batch size per upload. Default: 1000.
uf -f filepath -p test_project -t test_topic -m "," -n 1000
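If you prepare commands in a POSIX-style shell (for example Git Bash on Windows), the backslash escaping for -f can be done mechanically. This is a sketch only: the win_path value is an example, and the resulting command is printed for pasting into the console rather than executed.

```shell
win_path='D:\test\test.txt'

# Double every backslash so the console receives D:\\test\\test.txt.
escaped=$(printf '%s\n' "$win_path" | sed 's/\\/\\\\/g')

# printf is used instead of echo because some shells expand backslashes in echo.
printf 'uf -f %s -p test_project -t test_topic -m "," -n 1000\n' "$escaped"
```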
Example: Upload a CSV file
The following example shows how to upload a CSV file to DataHub. The CSV format is:
0,qe614c760fuk8judu01tn5x055rpt1,true,100.1,14321111111
1,znv1py74o8ynn87k66o32ao4x875wi,true,100.1,14321111111
2,7nm0mtpgo1q0ubuljjjx9b000ybltl,true,100.1,14321111111
3,10t0n6pvonnan16279w848ukko5f6l,true,100.1,14321111111
4,0ub584kw88s6dczd0mta7itmta10jo,true,100.1,14321111111
5,1ltfpf0jt7fhvf0oy4lo8m3z62c940,true,100.1,14321111111
6,zpqsfxqy9379lmcehd7q8kftntrozb,true,100.1,14321111111
7,ce1ga9aln346xcj761c3iytshyzuxg,true,100.1,14321111111
8,k5j2id9a0ko90cykl40s6ojq6gruyi,true,100.1,14321111111
9,ns2zcx9bdip5y0aqd1tdicf7bkdmsm,true,100.1,14321111111
10,54rs9cm1xau2fk66pzyz62tf9tsse4,true,100.1,14321111111
Each line is a record with comma-separated fields. The file is saved at /temp/test.csv. The DataHub topic schema is:
| Field name | Field type |
| --- | --- |
| id | BIGINT |
| name | STRING |
| gender | BOOLEAN |
| salary | DOUBLE |
| my_time | TIMESTAMP |
Upload command:
uf -f /temp/test.csv -p test_project -t test_topic -m "," -n 1000
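Before uploading, it can help to confirm that every record has as many fields as the topic schema (five in this example). The check below is a sketch that uses awk. The function name check_csv is hypothetical, and it assumes that no field contains a quoted comma.

```shell
# Hypothetical helper: check_csv FILE EXPECTED_FIELDS
# Prints each offending line and returns non-zero if any record is malformed.
check_csv() {
  awk -F',' -v n="$2" '
    NF != n { printf "line %d: expected %d fields, found %d\n", NR, n, NF; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Run the check only if the example file exists on this machine.
if [ -f /temp/test.csv ]; then
  check_csv /temp/test.csv 5 && echo "all records have 5 fields"
fi
```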
Download data
-
-f: The path of the file where the downloaded data is saved. On Windows, escape each backslash, for example D:\\test\\test.txt.
-
-p: The project name.
-
-t: The topic name.
-
-s: The shard ID.
-
-d: The subscription ID.
-
-ti: The start time for reading data. Format: yyyy-mm-dd hh:mm:ss.
-
-l: The number of records to read per batch.
-
-g: Continuous reading mode.
-
0: Read once and stop.
-
1: Read continuously.
down -p test_project -t test_topic -s shardId -d subId -f filePath -ti "1970-01-01 00:00:00" -l 100 -g 0
FAQ
-
Startup failure: If the script fails to start on Windows, check whether the path to the script contains parentheses.