Lindorm Search Solution - Lindorm (Cloud-Native Multi-Model Database) - Alibaba Cloud Help Center

Background

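In practice, a database (DB) is often deployed together with a search store to serve primary-key point lookups, multi-field combined retrieval, and full-text search. HBase plus Elasticsearch is a classic combination for this approach and is widely used for massive data processing. HBase provides high-throughput, low-cost distributed storage, while Elasticsearch focuses on complex queries and full-text search. Used together, they meet the twin goals of storing data cheaply and querying it fast.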

Classic scenarios for combining HBase and Elasticsearch:

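Site search: HBase stores content IDs and details; Elasticsearch matches content relevant to search keywords in real time.
Logs: HBase stores user click records; Elasticsearch handles statistical analysis, for example counting "users who searched for down jackets this week".
Transaction systems: HBase stores order details; Elasticsearch handles fast lookups, for example finding "overseas orders awaiting shipment".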

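To meet customer needs in these scenarios, Lindorm provides an all-in-one solution built on its wide table engine and search engine. It delivers efficient queries and low-cost storage without requiring you to synchronize data yourself or keep redundant copies of it.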

Lindorm search solution overview

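In the Lindorm all-in-one search solution, where the wide table engine and the search engine work together, one copy of raw data plus multiple sets of index data serve a variety of data needs.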

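Efficient, transparent data flow and low-cost storage:
When data is written, the raw data is first stored in the Lindorm wide table engine, which provides low-cost storage and high-concurrency primary-key queries. Meanwhile, the data synchronization service built into Lindorm watches for changes in the wide table, picks out the columns covered by a search index (SearchIndex), and synchronizes their data to the Lindorm search engine. On receiving the data, the search engine immediately builds inverted indexes to support complex retrieval. The whole process is transparent to you.
Intelligent parsing of query commands:
When a query request arrives, the Lindorm unified access layer automatically parses and compiles the command. Based on the query type and complexity, the optimizer generates an appropriate execution plan:
- Simple point lookups and simple queries that use only secondary indexes: the request is executed entirely within the wide table engine.
- Full-text search, multi-dimensional retrieval, and aggregation queries: the request is routed to the search engine for matching. After the search engine queries the index data, Lindorm decides, depending on the query, whether to look up the wide table to fill in the remaining data, and then returns the complete result to the client.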
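The routing rule above can be pictured as a small decision function. The following Java sketch is a hypothetical model, not Lindorm's optimizer: the `QueryRouter` class and its `Feature` and `Engine` enums are invented for illustration, and the back-fill test (look up the wide table only when a requested column is missing from the index) is an assumption, since the text only says that Lindorm decides depending on the query.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the routing rule described above; not Lindorm's optimizer.
final class QueryRouter {
    enum Feature { PRIMARY_KEY_LOOKUP, SECONDARY_INDEX, FULL_TEXT, MULTI_DIMENSIONAL, AGGREGATION }

    enum Engine { WIDE_TABLE, SEARCH }

    // Features that, per the text, send a query to the search engine.
    private static final Set<Feature> SEARCH_FEATURES =
            EnumSet.of(Feature.FULL_TEXT, Feature.MULTI_DIMENSIONAL, Feature.AGGREGATION);

    // A query stays in the wide table engine only if it uses no search-only feature.
    static Engine route(Set<Feature> features) {
        for (Feature f : features) {
            if (SEARCH_FEATURES.contains(f)) {
                return Engine.SEARCH;
            }
        }
        return Engine.WIDE_TABLE;
    }

    // Assumed back-fill criterion: after a search-engine hit, read the wide table
    // only when some requested column is not stored in the index.
    static boolean needsBackfill(Set<String> requestedColumns, Set<String> indexedColumns) {
        return !indexedColumns.containsAll(requestedColumns);
    }

    public static void main(String[] args) {
        System.out.println(route(EnumSet.of(Feature.PRIMARY_KEY_LOOKUP)));
        System.out.println(route(EnumSet.of(Feature.FULL_TEXT)));
        System.out.println(needsBackfill(Set.of("id", "name", "phone"), Set.of("id", "name")));
    }
}
```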

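For ease of use, the Lindorm wide table engine is compatible with HBase syntax and the Lindorm search engine is compatible with Elasticsearch syntax, so you can move from the original combined setup to the Lindorm search solution without changing your application. In addition, Lindorm SQL is compatible with standard SQL, so you can also choose to migrate to a lower-cost SQL-based approach.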

This topic demonstrates two approaches: reading and writing through the HBase API plus the Elasticsearch API, and reading and writing through SQL.

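Note

In practice, you can choose flexibly based on your business needs: SQL for both reads and writes, the Elasticsearch API for both reads and writes, the Elasticsearch API for writes plus SQL for reads, the HBase API for writes plus the Elasticsearch API for reads, and so on.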

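Prerequisites

The Lindorm wide table engine is activated.
The Lindorm search engine is activated. For details, see Activation guide.
The search index feature is enabled. For details, see Enable search indexes.
The IP address of your client is added to the Lindorm whitelist. For details, see Configure a whitelist.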

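Use SQL

Lindorm SQL syntax aligns with standard SQL, and all engines share a unified interface. You can connect to Lindorm with a MySQL client, just as you would connect to MySQL, to write, query, and analyze data.

The following example uses Lindorm-cli:

Note

For details, see Use Lindorm-cli to connect to and use the wide table engine.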

Step 1: Create a table

CREATE TABLE myTable (
    id        BIGINT,
    name      VARCHAR,
    age       INT,
    sex       VARCHAR,
    city      VARCHAR,
    address   VARCHAR,
    PRIMARY KEY(id)
);

Step 2: Write data

UPSERT INTO myTable(id, name, age, sex, city, address) VALUES(1001,'小王',30,'男','杭州','上城区万松岭路81号');
UPSERT INTO myTable(id, name, age, sex, city, address) VALUES(1002,'小张',32,'女','北京','海淀区双清路30号');
UPSERT INTO myTable(id, name, age, sex, city, address) VALUES(1003,'小李',33,'男','上海','浦东新区世纪大道1号');
UPSERT INTO myTable(id, name, age, sex, city, address) VALUES(1004,'小沈',28,'男','深圳','南山区蛇口街道深圳湾社区中心路1号');
UPSERT INTO myTable(id, name, age, sex, city, address) VALUES(1005,'小陆',41,'女','杭州','西湖区天目山路518号');
UPSERT INTO myTable(id, name, age, sex, city, address) VALUES(1006,'小孟',17,'男','杭州','滨江区滨文路548号');

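Step 3: Create a search index

The search engine delivers lower latency for complex multi-condition queries, which gives you a better query experience.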

CREATE INDEX myIndex USING SEARCH ON myTable (name, age, sex, city, address(type=text,analyzer=ik));
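Because `address` is declared with `type=text,analyzer=ik`, it is tokenized, whereas the other columns are assumed here to be indexed as whole values (mirroring the `keyword` mapping used in the HBase API + Elasticsearch API approach below). The following Java sketch shows why this matters for `MATCH ... AGAINST`. It uses a simple CJK bigram tokenizer as a stand-in for IK, which segments text with a dictionary and produces different terms; `AnalyzerDemo` and its methods are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustration only: a CJK bigram tokenizer stands in for the IK analyzer.
final class AnalyzerDemo {
    // Keyword field: the whole value is indexed as one term, so only exact values match.
    static Set<String> keywordTerms(String value) {
        return Set.of(value);
    }

    // Text field (stand-in for IK): overlapping two-character terms (bigrams).
    static Set<String> bigramTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        if (text.length() == 1) {
            terms.add(text);
        }
        for (int i = 0; i + 2 <= text.length(); i++) {
            terms.add(text.substring(i, i + 2));
        }
        return terms;
    }

    // Like Elasticsearch's default match operator (OR): any shared term is a hit.
    static boolean matches(Set<String> docTerms, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String address = "南山区蛇口街道深圳湾社区中心路1号";
        // Text field: the query "蛇口" is one bigram that also occurs inside the address.
        System.out.println("text field:    " + matches(bigramTerms(address), bigramTerms("蛇口")));
        // Keyword field: "蛇口" would have to equal the entire address.
        System.out.println("keyword field: " + matches(keywordTerms(address), keywordTerms("蛇口")));
    }
}
```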

After the index is created, run the following statement and check that INDEX_STATE is ACTIVE (available).

SHOW INDEX FROM myTable;

Important

If INDEX_STATE is BUILDING, the index is still being built. Wait 2 to 5 minutes until it becomes ACTIVE before you proceed.
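If you script the setup, you can poll the index state instead of waiting a fixed time. A minimal sketch, assuming you supply a `Supplier<String>` that runs `SHOW INDEX FROM myTable` (for example over your own SQL connection) and returns the INDEX_STATE value; `IndexWaiter` is a hypothetical helper, and the fake state source in `main` only simulates the change from BUILDING to ACTIVE.

```java
import java.util.function.Supplier;

// Hypothetical helper: polls an index state until it is ACTIVE.
final class IndexWaiter {
    // Returns true once statusSupplier reports ACTIVE; false if attempts run out
    // or the thread is interrupted while sleeping.
    static boolean waitUntilActive(Supplier<String> statusSupplier, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if ("ACTIVE".equals(statusSupplier.get())) {
                return true;
            }
            // BUILDING (or any other state): wait before the next poll.
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // preserve the interrupt for callers
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fake status source standing in for "SHOW INDEX FROM myTable".
        String[] states = {"BUILDING", "BUILDING", "ACTIVE"};
        int[] polls = {0};
        boolean active = waitUntilActive(
                () -> states[Math.min(polls[0]++, states.length - 1)], 10, 100);
        System.out.println("active=" + active + ", polls=" + polls[0]);
    }
}
```

With `maxAttempts = 30` and `sleepMillis = 10_000`, the helper waits at most about 290 seconds (29 sleeps of 10 seconds), which covers the 2 to 5 minutes mentioned above.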

Step 4: Query data

The following examples show each query type, its SQL statement, and the returned result.

Multi-dimensional retrieval

SELECT * FROM myTable WHERE sex='女' and city='北京';

Result:

+------+------+-----+-----+------+------------------+
| id   | name | age | sex | city | address          |
+------+------+-----+-----+------+------------------+
| 1002 | 小张 | 32  | 女  | 北京 | 海淀区双清路30号 |
+------+------+-----+-----+------+------------------+

Full-text (tokenized) query

SELECT * FROM myTable WHERE MATCH (address) AGAINST ('蛇口');

Result:

+------+------+-----+-----+------+-----------------------------------+
| id   | name | age | sex | city | address                           |
+------+------+-----+-----+------+-----------------------------------+
| 1004 | 小沈 | 28  | 男  | 深圳 | 南山区蛇口街道深圳湾社区中心路1号 |
+------+------+-----+-----+------+-----------------------------------+

Multi-condition sorting

SELECT * FROM myTable WHERE city='杭州' AND age>=18 ORDER BY age ASC;

Result:

+------+------+-----+-----+------+---------------------+
| id   | name | age | sex | city | address             |
+------+------+-----+-----+------+---------------------+
| 1001 | 小王 | 30  | 男  | 杭州 | 上城区万松岭路81号  |
| 1005 | 小陆 | 41  | 女  | 杭州 | 西湖区天目山路518号 |
+------+------+-----+-----+------+---------------------+

Fuzzy query

SELECT * FROM myTable WHERE name LIKE '小%';

Result:

+------+------+-----+-----+------+-----------------------------------+
| id   | name | age | sex | city | address                           |
+------+------+-----+-----+------+-----------------------------------+
| 1004 | 小沈 | 28  | 男  | 深圳 | 南山区蛇口街道深圳湾社区中心路1号 |
| 1005 | 小陆 | 41  | 女  | 杭州 | 西湖区天目山路518号               |
| 1001 | 小王 | 30  | 男  | 杭州 | 上城区万松岭路81号                |
| 1006 | 小孟 | 17  | 男  | 杭州 | 滨江区滨文路548号                 |
| 1002 | 小张 | 32  | 女  | 北京 | 海淀区双清路30号                  |
| 1003 | 小李 | 33  | 男  | 上海 | 浦东新区世纪大道1号               |
+------+------+-----+-----+------+-----------------------------------+

Paginated retrieval

SELECT * FROM myTable WHERE sex='男' ORDER BY age DESC LIMIT 3 OFFSET 2;

Result:

+------+------+-----+-----+------+-----------------------------------+
| id   | name | age | sex | city | address                           |
+------+------+-----+-----+------+-----------------------------------+
| 1004 | 小沈 | 28  | 男  | 深圳 | 南山区蛇口街道深圳湾社区中心路1号 |
| 1006 | 小孟 | 17  | 男  | 杭州 | 滨江区滨文路548号                 |
+------+------+-----+-----+------+-----------------------------------+
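To see why the paginated query returns only two rows, the same filter, sort, and OFFSET/LIMIT can be replayed in memory over the six sample rows. This Java 16+ sketch (the `PagingDemo` class is hypothetical) reproduces the SQL semantics only; it says nothing about how Lindorm executes the query.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Replays "WHERE sex = ? ORDER BY age DESC LIMIT ? OFFSET ?" over the sample rows.
final class PagingDemo {
    record Row(long id, String name, int age, String sex, String city) {}

    // The six rows written in Step 2 (address omitted for brevity).
    static final List<Row> ROWS = List.of(
            new Row(1001, "小王", 30, "男", "杭州"),
            new Row(1002, "小张", 32, "女", "北京"),
            new Row(1003, "小李", 33, "男", "上海"),
            new Row(1004, "小沈", 28, "男", "深圳"),
            new Row(1005, "小陆", 41, "女", "杭州"),
            new Row(1006, "小孟", 17, "男", "杭州"));

    static List<Long> page(String sex, long limit, long offset) {
        return ROWS.stream()
                .filter(r -> r.sex().equals(sex))                      // WHERE sex = ?
                .sorted(Comparator.comparingInt(Row::age).reversed())  // ORDER BY age DESC
                .skip(offset)                                          // OFFSET
                .limit(limit)                                          // LIMIT
                .map(Row::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(page("男", 3, 2));
    }
}
```

The four male rows sorted by age in descending order are 1003 (33), 1001 (30), 1004 (28), and 1006 (17). Skipping two leaves 1004 and 1006, so LIMIT 3 can return only two rows, which matches the result above.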

Use the HBase API + Elasticsearch API

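Operation	API
Create a table	HBase API
Create an index	Elasticsearch API
Write data	HBase API
Synchronize data	No manual API call is needed. Raw data is not stored twice; index data is synchronized to the search engine transparently by the synchronization tool built into Lindorm.
Query data	HBase API, Elasticsearch API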

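Step 1: Create an HBase table

The following example uses Lindorm Shell. Run the following statement to create the table:

Note

For details about using Lindorm Shell, see Use Lindorm Shell to access the wide table engine.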

create 'myTable', {NAME => 'cf'}

If the table is created successfully, the following output is expected:

Created table myTable

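Step 2: Create the index

The following example creates the index in the visual UI of the search engine:

Note

For details about using the search engine UI, see Visual user interface.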

PUT /myIndex
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "age": {
        "type": "integer"
      },
      "sex": {
        "type": "keyword"
      },
      "city": {
        "type": "keyword"
      },
      "address": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

If the index is created successfully, the following result is expected:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "myIndex"
}

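Step 3: Map the HBase table to the index

The following example uses Lindorm Shell. In the alihbase-2.0.18/bin directory, create a mapping file named schema.json that stores the mapping between the HBase table and the index.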

{
  "sourceNamespace": "default",
  "sourceTable": "myTable",
  "targetIndexName": "myIndex",
  "indexType": "ES",
  "rowkeyFormatterType": "STRING",
  "fields": [
    {
      "source": "cf:name",
      "targetField": "name",
      "type": "STRING"
    },
    {
      "source": "cf:age",
      "targetField": "age",
      "type": "INT"
    },
    {
      "source": "cf:sex",
      "targetField": "sex",
      "type": "STRING"
    },
    {
      "source": "cf:city",
      "targetField": "city",
      "type": "STRING"
    },
    {
      "source": "cf:address",
      "targetField": "address",
      "type": "STRING"
    }
  ]
}

Run the following command to apply the configuration:

alter_external_index 'myTable', 'schema.json'
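The mapping in schema.json pairs each HBase column with an index field and a type. The following Java 16+ sketch is a hypothetical illustration, not the LTS implementation, of what that pairing implies when a row becomes a search document: the STRING row key becomes `_id` (consistent with the `_id` values in the query results below), STRING cells are decoded as UTF-8, and INT cells as 4-byte big-endian integers, which is how `Bytes.toBytes(String)` and `Bytes.toBytes(int)` encode them in Step 4. `RowToDocument` and `FieldMapping` are invented names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the schema.json mapping; not the LTS implementation.
final class RowToDocument {
    record FieldMapping(String source, String targetField, String type) {}

    // Same entries as the "fields" array in schema.json.
    static final List<FieldMapping> FIELDS = List.of(
            new FieldMapping("cf:name", "name", "STRING"),
            new FieldMapping("cf:age", "age", "INT"),
            new FieldMapping("cf:sex", "sex", "STRING"),
            new FieldMapping("cf:city", "city", "STRING"),
            new FieldMapping("cf:address", "address", "STRING"));

    // Decode a cell the way the HBase client encoded it.
    static Object decode(byte[] value, String type) {
        if ("STRING".equals(type)) {
            return new String(value, StandardCharsets.UTF_8);
        }
        if ("INT".equals(type)) {
            return ByteBuffer.wrap(value).getInt();  // ByteBuffer is big-endian by default
        }
        throw new IllegalArgumentException("Unsupported type: " + type);
    }

    // STRING row key -> _id; each mapped column -> one document field.
    static Map<String, Object> toDocument(byte[] rowKey, Map<String, byte[]> cells) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", new String(rowKey, StandardCharsets.UTF_8));
        for (FieldMapping f : FIELDS) {
            byte[] value = cells.get(f.source());
            if (value != null) {  // columns absent from the row or the mapping are skipped
                doc.put(f.targetField(), decode(value, f.type()));
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        cells.put("cf:name", "小张".getBytes(StandardCharsets.UTF_8));
        cells.put("cf:age", ByteBuffer.allocate(4).putInt(32).array());  // same bytes as Bytes.toBytes(32)
        cells.put("cf:city", "北京".getBytes(StandardCharsets.UTF_8));
        System.out.println(toDocument("1002".getBytes(StandardCharsets.UTF_8), cells));
    }
}
```

This is also why the declared type must match how the client writes the cell: if `age` were written as the string "30", decoding it as INT would fail, because the value is only 2 bytes long.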

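Step 4: Write data

You can connect directly with the open-source HBase client and write data through the HBase Java API.

Note

For details about operations and configuration, see Install the HBase Java SDK and Develop applications based on the HBase Java API.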

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteData {
    private static final byte[] CF = Bytes.toBytes("cf");

    // The row key is written as a string ("rowkeyFormatterType": "STRING" in schema.json)
    // and age as a 4-byte int via Bytes.toBytes(int) ("type": "INT").
    private static Put buildPut(String id, String name, int age, String sex, String city, String address) {
        Put put = new Put(Bytes.toBytes(id));
        put.addColumn(CF, Bytes.toBytes("name"), Bytes.toBytes(name));
        put.addColumn(CF, Bytes.toBytes("age"), Bytes.toBytes(age));
        put.addColumn(CF, Bytes.toBytes("sex"), Bytes.toBytes(sex));
        put.addColumn(CF, Bytes.toBytes("city"), Bytes.toBytes(city));
        put.addColumn(CF, Bytes.toBytes("address"), Bytes.toBytes(address));
        return put;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder values. Confirm the exact property names and the endpoint
        // in the HBase Java SDK documentation referenced above.
        conf.set("hbase.zookeeper.quorum", "<HBase-compatible endpoint>");
        conf.set("hbase.client.username", "<username>");
        conf.set("hbase.client.password", "<password>");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("myTable"))) {
            List<Put> puts = new ArrayList<>();
            puts.add(buildPut("1001", "小王", 30, "男", "杭州", "上城区万松岭路81号"));
            puts.add(buildPut("1002", "小张", 32, "女", "北京", "海淀区双清路30号"));
            puts.add(buildPut("1003", "小李", 33, "男", "上海", "浦东新区世纪大道1号"));
            puts.add(buildPut("1004", "小沈", 28, "男", "深圳", "南山区蛇口街道深圳湾社区中心路1号"));
            puts.add(buildPut("1005", "小陆", 41, "女", "杭州", "西湖区天目山路518号"));
            puts.add(buildPut("1006", "小孟", 17, "男", "杭州", "滨江区滨文路548号"));
            table.put(puts);  // batch write: one Put per row
        }
    }
}

Step 5: Synchronize index data and query

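LTS, which is built into Lindorm, automatically synchronizes index data to the search engine and generates inverted index files. You can then query the data through the Elasticsearch API.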
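The single-copy model behind this synchronization can be illustrated with a toy in-memory version: writes go to a wide-table map, and only the columns declared as indexed are forwarded to an inverted index that stores terms and row keys rather than whole rows. This is a hypothetical Java sketch (`SyncDemo` is an invented name, not LTS). It performs no tokenization, treats each `column=value` pair as one term (like a keyword field), and updates the index synchronously inside `put()`, which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of "one copy of raw data + index data"; not LTS.
final class SyncDemo {
    // Single copy of raw data: rowKey -> (column -> value).
    private final Map<String, Map<String, String>> wideTable = new TreeMap<>();
    // Index data: "column=value" term -> row keys that contain it.
    private final Map<String, Set<String>> invertedIndex = new HashMap<>();
    private final Set<String> indexedColumns;

    SyncDemo(Set<String> indexedColumns) {
        this.indexedColumns = indexedColumns;
    }

    // Write to the wide table, then let the simulated sync step update the index.
    void put(String rowKey, Map<String, String> columns) {
        Map<String, String> row = wideTable.computeIfAbsent(rowKey, k -> new TreeMap<>());
        for (Map.Entry<String, String> e : columns.entrySet()) {
            String old = row.put(e.getKey(), e.getValue());
            if (indexedColumns.contains(e.getKey())) {
                syncToIndex(rowKey, e.getKey(), old, e.getValue());
            }
        }
    }

    // Only indexed columns reach the index; an overwritten value's old term is removed.
    private void syncToIndex(String rowKey, String column, String oldValue, String newValue) {
        if (oldValue != null) {
            String oldTerm = column + "=" + oldValue;
            Set<String> keys = invertedIndex.get(oldTerm);
            if (keys != null) {
                keys.remove(rowKey);
                if (keys.isEmpty()) {
                    invertedIndex.remove(oldTerm);
                }
            }
        }
        invertedIndex.computeIfAbsent(column + "=" + newValue, k -> new TreeSet<>()).add(rowKey);
    }

    Set<String> lookup(String column, String value) {
        return invertedIndex.getOrDefault(column + "=" + value, Set.of());
    }

    Map<String, String> get(String rowKey) {
        return wideTable.getOrDefault(rowKey, Map.of());
    }

    public static void main(String[] args) {
        SyncDemo db = new SyncDemo(Set.of("sex", "city"));
        db.put("1001", Map.of("name", "小王", "sex", "男", "city", "杭州"));
        db.put("1005", Map.of("name", "小陆", "sex", "女", "city", "杭州"));
        for (String rowKey : db.lookup("city", "杭州")) {
            // The index yields row keys; the full row comes from the wide table.
            System.out.println(rowKey + " -> " + db.get(rowKey));
        }
    }
}
```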

The following examples run the queries in the search engine UI:

Multi-dimensional retrieval

GET /myIndex/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "sex": "女" } },
        { "term": { "city": "北京" } }
      ]
    }
  }
}

Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 2.5700645,
    "hits": [
      {
        "_index": "myIndex",
        "_id": "1002",
        "_score": 2.5700645,
        "_source": { "address": "海淀区双清路30号", "update_version_l": 174107429****, "city": "北京", "sex": "女", "name": "小张", "age": 32 }
      }
    ]
  }
}

Full-text (tokenized) query

GET /myIndex/_search
{
  "query": {
    "match": { "address": "蛇口" }
  }
}

Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 1.2788599,
    "hits": [
      {
        "_index": "myIndex",
        "_id": "1004",
        "_score": 1.2788599,
        "_source": { "address": "南山区蛇口街道深圳湾社区中心路1号", "update_version_l": 174107429****, "city": "深圳", "sex": "男", "name": "小沈", "age": 28 }
      }
    ]
  }
}

Multi-condition sorting

GET /myIndex/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "city": "杭州" } },
        { "range": { "age": { "gte": 18 } } }
      ]
    }
  },
  "sort": [
    { "age": "asc" }
  ]
}

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": null,
    "hits": [
      {
        "_index": "myIndex",
        "_id": "1001",
        "_score": null,
        "_source": { "address": "上城区万松岭路81号", "update_version_l": 174107422**, "city": "杭州", "sex": "男", "name": "小王", "age": 30 },
        "sort": [ 30 ]
      },
      {
        "_index": "myIndex",
        "_id": "1005",
        "_score": null,
        "_source": { "address": "西湖区天目山路518号", "update_version_l": 174107429**, "city": "杭州", "sex": "女", "name": "小陆", "age": 41 },
        "sort": [ 41 ]
      }
    ]
  }
}

Fuzzy query

GET /myIndex/_search
{
  "query": {
    "wildcard": {
      "name": { "value": "小*" }
    }
  }
}

Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 6, "relation": "eq" },
    "max_score": 1,
    "hits": [
      {
        "_index": "myIndex",
        "_id": "1001",
        "_score": 1,
        "_source": { "address": "上城区万松岭路81号", "update_version_l": 174107429**, "city": "杭州", "sex": "男", "name": "小王", "age": 30 }
      },
      {
        "_index": "myIndex",
        "_id": "1002",
        "_score": 1,
        "_source": { "address": "海淀区双清路30号", "update_version_l": 174107429, "city": "北京", "sex": "女", "name": "小张", "age": 32 }
      },
      {
        "_index": "myIndex",
        "_id": "1003",
        "_score": 1,
        "_source": { "address": "浦东新区世纪大道1号", "update_version_l": 174107429, "city": "上海", "sex": "男", "name": "小李", "age": 33 }
      },
      {
        "_index": "myIndex",
        "_id": "1004",
        "_score": 1,
        "_source": { "address": "南山区蛇口街道深圳湾社区中心路1号", "update_version_l": 174107429, "city": "深圳", "sex": "男", "name": "小沈", "age": 28 }
      },
      {
        "_index": "myIndex",
        "_id": "1005",
        "_score": 1,
        "_source": { "address": "西湖区天目山路518号", "update_version_l": 174107429, "city": "杭州", "sex": "女", "name": "小陆", "age": 41 }
      },
      {
        "_index": "myIndex",
        "_id": "1006",
        "_score": 1,
        "_source": { "address": "滨江区滨文路548号", "update_version_l": 174107429**, "city": "杭州", "sex": "男", "name": "小孟", "age": 17 }
      }
    ]
  }
}
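Outside the UI, the same requests can be sent over HTTP. A minimal sketch using the JDK `java.net.http` client (Java 11+), assuming the search engine's Elasticsearch-compatible endpoint accepts HTTP Basic authentication; the endpoint, user name, and password are passed as program arguments and are not taken from this document, and `SearchClient` is a hypothetical class name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: send an Elasticsearch-style _search request with the JDK HTTP client.
final class SearchClient {
    static HttpRequest buildSearchRequest(String endpoint, String index,
                                          String user, String password, String queryJson) {
        // Assumption: HTTP Basic authentication with the search engine's user name and password.
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(endpoint + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + token)
                // Mirrors "GET /myIndex/_search" with a JSON body, as in the examples above.
                .method("GET", HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: SearchClient <endpoint, e.g. http://host:port> <user> <password>");
            return;
        }
        // The multi-dimensional retrieval query shown above.
        String query = "{\"query\":{\"bool\":{\"must\":["
                + "{\"term\":{\"sex\":\"女\"}},{\"term\":{\"city\":\"北京\"}}]}}}";
        HttpRequest request = buildSearchRequest(args[0], "myIndex", args[1], args[2], query);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```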

Advantages of the Lindorm search solution

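Item	Lindorm	Self-managed HBase + Elasticsearch
Storage cost	Multi-model data is stored in one place: the wide table and search share a single copy with no redundancy. Supports local disks, cloud disks, and object storage, and uses erasure coding (EC) to reduce replicas plus hot/cold tiering, lowering overall cost by 80%.	HBase and Elasticsearch do not share data, so two copies of the raw data must be stored. There is no hot/cold separation, so storage cost is high.
Access interfaces	Supports standard SQL as well as the Elasticsearch API and HBase API, which can be used alone or together.	Only the Elasticsearch API and HBase API.
Storage-compute separation	Native storage-compute separation: scaling out or in rebalances load in seconds, with no data migration.	Elasticsearch couples storage and compute, and nodes can access only local data, so rebalancing after scaling takes a long time and requires data migration.
Real-time retrieval	Data can be queried immediately after it is written.	Near-real-time retrieval, with a window for data synchronization and index building.
Throughput	Read performance is 3 times and write performance 2 times that of the open-source versions, and compute resource utilization is 50% higher.	/
Data compression	Uses a deeply optimized ZSTD compression algorithm that reduces storage space by 50% compared with open-source HBase and Elasticsearch.	/
Hot/cold separation	Automatic tiering that is transparent to the business reduces cold data storage cost by 80%; queries are routed and merged automatically.	Data migration and lifecycle management are manual, and hot and cold data must be queried separately and the results merged.
Compatibility	Provides a Kibana-like query management tool and a Logstash-like migration tool, and supports the ELK ecosystem.	Relies on open-source ecosystem tools.
Data migration	Provides LTS, a productized tool that lets you configure migration links in a UI and supports offline and online migration of HBase and Elasticsearch.	Requires open-source migration solutions, with higher stability risk.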