Manage HBase full-text indexes

Before you begin

Review the Quick start guide. Make sure you have downloaded and configured the latest version of HBase Shell as described in Use HBase Shell to connect to an enhanced cluster.

HBase to Search index mapping

You can define the mapping between an HBase table and a Search index by using a JSON configuration. The following example shows a typical mapping configuration:

{
  "sourceNamespace": "default",
  "sourceTable": "testTable",
  "targetIndexName": "democollection",
  "indexType": "SOLR",
  "rowkeyFormatterType": "STRING",
  "fields": [
    {
      "source": "f:name",
      "targetField": "name_s",
      "type": "STRING"
    },
    {
      "source": "f:age",
      "targetField": "age_i",
      "type": "INT"
    }
  ]
}

This example synchronizes data from the HBase table testTable to the Search index democollection. The f:name column is mapped to the name_s column in the index, and the f:age column is mapped to the age_i column. In a source column name, the column family and the column name are separated by a colon. The following section describes each configuration item and its valid values.

| Parameter | Description |
| --- | --- |
| sourceNamespace | The namespace of the HBase table. If the table does not have a namespace, you can omit this parameter or set it to 'default'. |
| sourceTable | The name of the HBase table, without the namespace. |
| targetIndexName | The name of the Search index. |
| indexType | This parameter is fixed to SOLR. |
| rowkeyFormatterType | The format of the rowkey in HBase. Valid values are STRING and HEX. The meaning of each value is described below. |
| fields | A JSON array that defines the column mappings. As shown in the example, separate multiple column configurations with commas (,). The following sections provide more details. |

rowkeyFormatterType

rowkeyFormatterType specifies how the rowkey of an HBase table is mapped to the id field of a Document in the index. The id field is a string. Two formats are currently supported:

  • STRING: You can use this configuration if the rowkey of your HBase table is a string, such as row1, order0001, or 12345 (note that 12345 is a string, not a number). This method uses the Bytes.toString(byte[]) function to convert the rowkey into the id of a Document in the Search index. After you find the corresponding Document in the Search index, you can use the Bytes.toBytes(String) function to convert the id into a byte[] to use as the rowkey to query the HBase table.

  • HEX: Use this method if the rowkey of your HBase table is not a string. For example, the rowkey can be a number such as 12345, or a value concatenated from multiple fields, some of which are not strings. This method uses the encodeAsString(byte[]) function of the org.apache.commons.codec.binary.Hex class to convert the rowkey into the id of a Document in the Search index. After you find the corresponding Document in the Search index, you can use the Hex.decodeHex(String.toCharArray()) function to convert the id string back to a byte[]. You can then use this byte[] as the rowkey to query the HBase table.

Note

If the rowkey of an HBase table is not created from a string by using the Bytes.toBytes(String) method, use the HEX format. Otherwise, when the ID from the index Document is converted back to bytes, it may not match the original rowkey, and the reverse lookup will fail.
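The difference between the two formats can be checked with a short round-trip test. The following is a minimal sketch that uses only the JDK. The helpers intToBytes, toStringId, fromStringId, toHexId, and fromHexId are hypothetical stand-ins for Bytes.toBytes(int), Bytes.toString(byte[]), Bytes.toBytes(String), and the Hex encode and decode functions, on the assumption that the HBase helpers use UTF-8 for strings and big-endian byte order for integers. It is not the code that the service runs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowkeyIdRoundTrip {
    // Stand-in for Bytes.toBytes(int): 4 bytes, big-endian (assumption).
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // STRING format: stand-ins for Bytes.toString(byte[]) and Bytes.toBytes(String), assuming UTF-8.
    static String toStringId(byte[] rowkey) {
        return new String(rowkey, StandardCharsets.UTF_8);
    }

    static byte[] fromStringId(String id) {
        return id.getBytes(StandardCharsets.UTF_8);
    }

    // HEX format: every byte becomes two lowercase hexadecimal characters.
    static String toHexId(byte[] rowkey) {
        StringBuilder sb = new StringBuilder(rowkey.length * 2);
        for (byte b : rowkey) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    static byte[] fromHexId(String id) {
        byte[] out = new byte[id.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(id.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] textKey = fromStringId("order0001"); // rowkey created from a string
        byte[] intKey = intToBytes(200);            // rowkey created from a number

        // A string rowkey survives the STRING round trip.
        System.out.println(Arrays.equals(textKey, fromStringId(toStringId(textKey)))); // true
        // A numeric rowkey does not: the byte 0xC8 is not valid UTF-8 on its own.
        System.out.println(Arrays.equals(intKey, fromStringId(toStringId(intKey))));   // false
        // The HEX round trip returns the original bytes.
        System.out.println(toHexId(intKey));                                           // 000000c8
        System.out.println(Arrays.equals(intKey, fromHexId(toHexId(intKey))));         // true
    }
}
```

The string rowkey is recovered unchanged, whereas the 4-byte rowkey for the integer 200 comes back as different bytes after the STRING round trip, which is the mismatch that the note above warns about.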

fields

Each field mapping consists of the following three parameters:

| Parameter | Description |
| --- | --- |
| source | The column name in the HBase table to be mapped, where the family and qualifier are separated by a colon, such as f:name. |
| targetField | The name of the target column in the Search index. The target columns in the example above, name_s and age_i, are dynamic columns. You can use such columns directly without defining them in advance, and the Search service automatically recognizes them. For more information about how to use dynamic columns, see Schema configuration. |
| type | The data type of the value when it was written to the source column in HBase. Valid values are INT, LONG, STRING, BOOLEAN, FLOAT, DOUBLE, SHORT, and BIGDECIMAL. The values are case-sensitive. |

The type parameter

HBase has no concept of data types. All data, including Chinese characters, is converted into bytes by calling the Bytes.toBytes(String/long/int/...) method and then stored in HBase columns. The type parameter tells the system which conversion method was used to write the data in a specific column. For example:

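import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
Put put = new Put(Bytes.toBytes("row1"));
// f:age is written from an int, so its mapping type is INT.
int age = 25;
byte[] ageValue = Bytes.toBytes(age);
put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("age"), ageValue);
// f:name is written from the string "25", so its mapping type is STRING.
String name = "25";
byte[] nameValue = Bytes.toBytes(name);
put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("name"), nameValue);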

In the code above, the type of the f:age column is INT, whereas the type of the f:name column is STRING, not INT, even though its value looks like a number. Specifying the correct type is essential for syncing data to the Search index correctly, because the system uses the type that you specify to deserialize the original value from a byte array before it syncs the value to the Search index. In the preceding example, if you incorrectly specify the type of the f:name column as INT, the system calls the Bytes.toInt() method to deserialize the original value, and that method cannot recover the correct value from bytes that were written as a string.
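The effect of a wrong type can be reproduced without an HBase cluster. The following is a minimal sketch that uses only the JDK. The helpers intToBytes, stringToBytes, bytesToInt, and bytesToString are hypothetical stand-ins for the corresponding Bytes methods, on the assumption that integers are stored as 4 big-endian bytes and strings as UTF-8. How the sync service itself reports such a mismatch is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TypeMismatchDemo {
    // Stand-in for Bytes.toBytes(int): 4 bytes, big-endian (assumption).
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Stand-in for Bytes.toBytes(String): UTF-8 bytes (assumption).
    static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for Bytes.toInt(byte[]): only a 4-byte value can be an int.
    static int bytesToInt(byte[] bytes) {
        if (bytes.length != 4) {
            throw new IllegalArgumentException("INT expects 4 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Stand-in for Bytes.toString(byte[]).
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ageValue = intToBytes(25);       // 4 bytes: 00 00 00 19
        byte[] nameValue = stringToBytes("25"); // 2 bytes: 32 35

        // Each column decoded with its correct type.
        System.out.println(bytesToInt(ageValue));     // 25
        System.out.println(bytesToString(nameValue)); // 25

        // A 4-character string read as INT decodes without error, but to the wrong number.
        System.out.println(bytesToInt(stringToBytes("2500"))); // 842346544

        // The 2-byte string "25" cannot be read as INT at all.
        try {
            bytesToInt(nameValue);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // INT expects 4 bytes, got 2
        }
    }
}
```

The two columns hold the same logical value, 25, but completely different bytes, so only the type recorded in the mapping lets a reader decode them correctly.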

The targetField parameter

targetField specifies the column in the Search index that the source column from HBase is mapped to. The Search service uses a strong schema, which means that each column must be predefined in the managed_schema file of a configuration set. For more information, see Schema configuration. However, we recommend that you use the dynamic column (dynamicField) feature of the Search service, which identifies the column type from the suffix of the column name. For example, the _s suffix in name_s indicates that the column type in the Search index is String.

The type of a source column in HBase does not need to strictly match the data type of the corresponding column in the index. For example, you can define the source column f:age as the STRING type and the target field age_i in the index as the INT type. During indexing, the Search service automatically converts the STRING value to an INT. However, if you write a STRING value that cannot be converted to a number to the f:age column, an error occurs during indexing.

Manage the schema

Modify a mapping schema

You can save the schema in the JSON format described in the previous section to a file, such as schema.json, and then run the alter_external_index command in HBase Shell to modify the mapping schema of the HBase table. Place the schema.json file in the directory where you start HBase Shell, or specify its absolute or relative path.

hbase(main):006:0> alter_external_index 'your_table_name', 'schema.json'

Using a JSON file allows you to quickly add, delete, or modify multiple columns. You can also remove all column mappings from the fields array to delete all mappings for the HBase table. For example:

{
  "sourceNamespace": "default",
  "sourceTable": "testTable",
  "targetIndexName": "democollection",
  "indexType": "SOLR",
  "rowkeyFormatterType": "STRING",
  "fields": []
}
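If the mapping file is generated by a program rather than edited by hand, the type values can be validated before the file is passed to alter_external_index. The following is a minimal sketch that uses only the JDK. The class MappingSchemaBuilder and its methods field and schema are hypothetical helpers, not part of HBase or the Search service, and the sketch assumes that table, index, and column names contain no characters that need JSON escaping.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MappingSchemaBuilder {
    // Valid values of the type parameter as listed in this document; matching is case-sensitive.
    static final Set<String> VALID_TYPES = new HashSet<>(Arrays.asList(
            "INT", "LONG", "STRING", "BOOLEAN", "FLOAT", "DOUBLE", "SHORT", "BIGDECIMAL"));

    // Renders one entry of the fields array after checking its source and type.
    static String field(String source, String targetField, String type) {
        if (!VALID_TYPES.contains(type)) {
            throw new IllegalArgumentException("Unsupported type: " + type);
        }
        if (source.indexOf(':') <= 0) {
            throw new IllegalArgumentException("Expected family:qualifier, got: " + source);
        }
        return "    {\"source\": \"" + source + "\", \"targetField\": \"" + targetField
                + "\", \"type\": \"" + type + "\"}";
    }

    // Renders the whole mapping for a table in the default namespace.
    static String schema(String table, String index, String rowkeyFormatterType, List<String> fields) {
        String body = fields.isEmpty() ? "[]" : "[\n" + String.join(",\n", fields) + "\n  ]";
        return "{\n"
                + "  \"sourceNamespace\": \"default\",\n"
                + "  \"sourceTable\": \"" + table + "\",\n"
                + "  \"targetIndexName\": \"" + index + "\",\n"
                + "  \"indexType\": \"SOLR\",\n"
                + "  \"rowkeyFormatterType\": \"" + rowkeyFormatterType + "\",\n"
                + "  \"fields\": " + body + "\n"
                + "}";
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(
                field("f:name", "name_s", "STRING"),
                field("f:age", "age_i", "INT"));
        // Save the printed JSON as schema.json and pass it to alter_external_index.
        System.out.println(schema("testTable", "democollection", "STRING", fields));
    }
}
```

Passing an empty list to schema produces the empty fields array shown above, and a lowercase type such as int is rejected before the file is written.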

To add one or more columns to an existing mapping schema without changing the rest of it, use the add_external_index_field command.

hbase shell> add_external_index_field 'testTable', {FAMILY => 'f', QUALIFIER => 'money', TARGETFIELD => 'money_f', TYPE => 'FLOAT' }

Note: You can use the add_external_index_field command to add columns only to a table whose mapping schema was created with the alter_external_index command. To modify many columns, we recommend using the alter_external_index command to perform all changes at once.

To remove one or more columns from an existing mapping schema without changing the rest of it, use the remove_external_index command.

hbase shell> remove_external_index 'testTable', 'f:name', 'f:age'

Note: To modify many columns, we recommend using the alter_external_index command to perform all changes at once.

View the current mapping schema

You can use the describe_external_index command in the HBase Shell to obtain the complete JSON description of the current table's mapping schema.

hbase(main):005:0> describe_external_index 'testTable'