Export data to a local CSV file


You can export tables from the LindormTable engine to local files in formats such as CSV, ORC, Parquet, and TXT. This topic describes how to use DataWorks to export data from LindormTable to an Object Storage Service (OSS) bucket and then download the data to your local computer as a CSV file. It covers billing, limitations, and the step-by-step procedure.

Prerequisites

Billing

The solution described in this topic uses DataWorks and OSS. In addition to Lindorm, the following items may incur charges:

Limitations

You can only perform a full export of a single table. Exporting an entire database or performing an incremental export is not supported.

Step 1: Export data to OSS

1. Add a Lindorm data source

  1. Log on to the DataWorks console and select the target region. In the left-side navigation pane, choose More > Management Center. Select a workspace from the drop-down list and click Go to Management Center.

  2. In the navigation pane on the left, click Data Source to go to the Data Source List page.

  3. In the upper-right corner of the page, click Add Data Source and select Lindorm as the data source type. Follow the on-screen instructions to configure the data source.

2. Add an OSS data source

  1. Log on to the DataWorks console and select the target region. In the left-side navigation pane, choose More > Management Center. Select a workspace from the drop-down list and click Go to Management Center.

  2. In the navigation pane on the left, click Data Source to go to the Data Source List page.

  3. In the upper-right corner of the page, click Add Data Source and select OSS as the data source type. Follow the on-screen instructions to configure the data source.

3. Configure a batch synchronization job

  1. Go to the DataStudio page.

    Log on to the DataWorks console and select the target region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select a workspace from the drop-down list and click Go to Data Development.

  2. Hover over the Create icon and click Create Workflow.


  3. In the Create Workflow dialog box, enter a Workflow Name and Description.

  4. Click Create.

  1. Configure the network and resource settings, test the connectivity, and then click Next step.

    1. Set Data Source to Lindorm and select the Lindorm data source that you added in "1. Add a Lindorm data source".

    2. Set Destination to OSS and select the OSS data source that you added in "2. Add an OSS data source".

    3. For My Resource Group, select an exclusive resource group. This resource group is used to run the synchronization job. If you do not have an exclusive resource group, see Create and use an exclusive resource group for Data Integration.

  2. Configure the data source.

    1. Select the table type and table.

      Set the table type to TableService for tables created with SQL. For tables created with the HBase API or tables that contain dynamic columns, set the table type to WideColumn.

      If you cannot find your table, try switching the table type.

    2. Select the read method.

      If the table contains dynamic columns, select a narrow table. Otherwise, select a wide table.

  3. Configure the data destination.

    1. Set the file type to csv. You can also select other file types. For more information, see OSS data source.

    2. Set the file name (including the path), column delimiter, and other parameters.

  4. (Optional) Configure column headers. Follow this step if you want the exported CSV file to include column headers.

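    1. In the toolbar, click the Conversion script icon to switch the job to script mode.

    2. In the parameter section of the OSS Writer step, add a header parameter that lists the column names to write as the first line of the CSV file.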

      The following code provides a complete sample script for exporting a CSV file with column headers.

      Export SQL table

      {
          "type": "job",
          "version": "2.0",
          "steps": [
              {
                  "stepType": "lindorm",
                  "parameter": {
                      "selects": [],
                      "mode": "FixedColumn",
                      "datasource": "lindorm",
                      "columns": [
                          "id",
                          "name",
                          "address"
                      ],
                      "envType": 1,
                      "tableMode": "tableService",
                      "encoding": "UTF-8",
                      "caching": 128,
                      "table": "tb"
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "oss",
                  "parameter": {
                      "fieldDelimiterOrigin": ",",
                      "nullFormat": "null",
                      "dateFormat": "yyyy-MM-dd HH:mm:ss",
                      "datasource": "oss_lindorm",
                      "header": [
                          "id",
                          "name",
                          "address"
                      ],
                      "envType": 1,
                      "writeSingleObject": true,
                      "writeMode": "truncate",
                      "encoding": "UTF-8",
                      "fieldDelimiter": ",",
                      "fileFormat": "csv",
                      "object": "lindorm_sql"
                  },
                  "name": "Writer",
                  "category": "writer"
              },
              {
                  "name": "Processor",
                  "stepType": null,
                  "category": "processor",
                  "copies": 1,
                  "parameter": {
                      "nodes": [],
                      "edges": [],
                      "groups": [],
                      "version": "2.0"
                  }
              }
          ],
          "setting": {
              "executeMode": null,
              "errorLimit": {
                  "record": "0"
              },
              "speed": {
                  "concurrent": 2,
                  "throttle": false
              }
          },
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          }
      }

      Export HBase table

      {
          "type": "job",
          "version": "2.0",
          "steps": [
              {
                  "stepType": "lindorm",
                  "parameter": {
                      "selects": [],
                      "mode": "FixedColumn",
                      "datasource": "lindorm",
                      "columns": [
                          "STRING|rowkey",
                          "STRING|cf:a",
                          "STRING|cf:b",
                          "STRING|cf:c"
                      ],
                      "envType": 1,
                      "tableMode": "wideColumn",
                      "encoding": "UTF-8",
                      "caching": 128,
                      "table": "test"
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "oss",
                  "parameter": {
                      "fieldDelimiterOrigin": ",",
                      "nullFormat": "null",
                      "dateFormat": "yyyy-MM-dd HH:mm:ss",
                      "datasource": "oss_lindorm",
                      "envType": 1,
                      "writeSingleObject": true,
                      "header": [
                          "rowkey",
                          "cf:a",
                          "cf:b",
                          "cf:c"
                      ],
                      "writeMode": "truncate",
                      "encoding": "UTF-8",
                      "fieldDelimiter": ",",
                      "fileFormat": "csv",
                      "object": "from_lindorm_hbase"
                  },
                  "name": "Writer",
                  "category": "writer"
              },
              {
                  "copies": 1,
                  "parameter": {
                      "nodes": [],
                      "edges": [],
                      "groups": [],
                      "version": "2.0"
                  },
                  "name": "Processor",
                  "category": "processor"
              }
          ],
          "setting": {
              "errorLimit": {
                  "record": "0"
              },
              "locale": "en-US",
              "speed": {
                  "throttle": false,
                  "concurrent": 2
              }
          },
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          }
      }

      For more information about the OSS data source script, see Appendix: Script demo and parameter description.

  5. (Optional) Configure channel and scheduling properties. To further control data synchronization properties, such as job concurrency and synchronization rate, see Configure a synchronization job in wizard mode.

  6. In the toolbar, click the Save icon, and then click the Run icon to start exporting the data to OSS.
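The two sample scripts differ mainly in the reader's table, columns, and tableMode and in the writer's header. The following Python sketch shows one way to fill those fields into a copy of a script shaped like the samples before pasting it into the script editor. The helpers build_export_job, header_from_columns, and choose_table_mode are hypothetical and are not part of DataWorks. The sketch assumes fixed columns ("mode": "FixedColumn" as in the samples) and does not cover the narrow-table read method used for dynamic columns.

```python
import copy


def header_from_columns(columns):
    """Derive CSV header names from Lindorm Reader column specs.

    SQL tables list plain names ("id"); the HBase sample uses
    "TYPE|family:qualifier" ("STRING|cf:a") and its header keeps the part after "|".
    """
    return [c.split("|", 1)[1] if "|" in c else c for c in columns]


def choose_table_mode(created_with_sql):
    """tableService for tables created with SQL, wideColumn for HBase API tables."""
    return "tableService" if created_with_sql else "wideColumn"


def build_export_job(template, table, columns, created_with_sql, concurrent=2):
    """Return a copy of `template` (a job script dict like the samples) with the
    reader table/columns/tableMode, writer header, and concurrency filled in."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    job = copy.deepcopy(template)  # leave the caller's template untouched
    # Index the copied steps by category ("reader", "writer", "processor").
    steps = {step["category"]: step for step in job["steps"]}
    reader = steps["reader"]["parameter"]
    reader["table"] = table
    reader["columns"] = list(columns)
    reader["tableMode"] = choose_table_mode(created_with_sql)
    # Keep the header in the same order as the reader columns.
    steps["writer"]["parameter"]["header"] = header_from_columns(columns)
    job.setdefault("setting", {}).setdefault("speed", {})["concurrent"] = concurrent
    return job
```

For example, loading the HBase sample with json.loads and calling build_export_job(job, "test", ["STRING|rowkey", "STRING|cf:a"], created_with_sql=False) produces the header ["rowkey", "cf:a"]; json.dumps(result, indent=4) turns it back into script text.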

Step 2: Download data from OSS

  1. Log on to the OSS console.

  2. Click Bucket List, and then click the name of the destination bucket.

  3. Find the file by the file name that you specified for the data destination when you configured the batch synchronization job, and click Download in the Actions column to save the file to your local computer.
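After downloading the file, you may want to check its contents before using it. The following Python sketch reads the first rows with the standard csv module. It assumes the export included a header row, used "," as the column delimiter, wrote null values as "null" (the nullFormat value in the sample scripts), and used UTF-8 encoding. The preview_csv function is hypothetical.

```python
import csv
from pathlib import Path


def preview_csv(path, delimiter=",", null_format="null", limit=5):
    """Return the first `limit` data rows of an exported CSV file as dicts.

    Assumes the first line is a header row; values equal to `null_format`
    are converted back to None.
    """
    rows = []
    with Path(path).open(newline="", encoding="utf-8") as f:
        for i, record in enumerate(csv.DictReader(f, delimiter=delimiter)):
            if i >= limit:
                break
            rows.append({k: (None if v == null_format else v) for k, v in record.items()})
    return rows
```

For example, preview_csv("lindorm_sql") reads a download saved under the object name used in the SQL sample; adjust the path to wherever you saved the file, and pass delimiter if you configured a different column delimiter.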