Common terms in TSDB for InfluxDB®

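Before diving into TSDB for InfluxDB®, it helps to be familiar with a few key database concepts. This document briefly introduces those concepts and the terms commonly used in TSDB for InfluxDB®. The terms covered are listed below, but we recommend reading the document from start to finish for a fuller understanding of TSDB for InfluxDB®.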

database	field key	field set
field value	measurement	point
retention policy	series	tag key
tag set	tag value	timestamp

For more detailed descriptions, see the Glossary document.

Sample data

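The next section refers to the data listed below. The data is fictional but representative of what is stored in TSDB for InfluxDB®. It shows the number of butterflies and honeybees counted by two scientists (langstroth and perpetua) at two locations (location 1 and location 2) between midnight and 06:12 on August 18, 2015. Assume that the data is stored in a database named my_database and is subject to the retention policy autogen. Here, census is the measurement; the time column holds the timestamps; butterflies and honeybees are field keys, and the values in those two columns are field values; location and scientist are tag keys, and the values in those two columns are tag values.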

name: census

time	butterflies	honeybees	location	scientist
2015-08-18T00:00:00Z	12	23	1	langstroth
2015-08-18T00:00:00Z	1	30	1	perpetua
2015-08-18T00:06:00Z	11	28	1	langstroth
2015-08-18T00:06:00Z	3	28	1	perpetua
2015-08-18T05:54:00Z	2	11	2	langstroth
2015-08-18T06:00:00Z	1	10	2	langstroth
2015-08-18T06:06:00Z	8	23	2	perpetua
2015-08-18T06:12:00Z	7	22	2	perpetua

Discussion

Now that you have seen some sample data in TSDB for InfluxDB®, this section explains what it all means.

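TSDB for InfluxDB® is a time series database, so it makes sense to start with time. The data above has a column named time, and all data in TSDB for InfluxDB® has this column. The time column stores timestamps, and each timestamp shows, in RFC3339 UTC format, the date and time associated with a particular piece of data.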
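The RFC3339 UTC strings in the time column correspond to Unix epoch integers. The following is a minimal sketch of that conversion, assuming second-precision timestamps with a trailing Z as in the sample data. The function name rfc3339_to_epoch_ns is hypothetical and is not part of any TSDB for InfluxDB® API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def rfc3339_to_epoch_ns(ts: str) -> int:
    """Convert a timestamp such as 2015-08-18T00:00:00Z to epoch nanoseconds."""
    # Assumption: second precision and a literal Z suffix, as in the sample data.
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Integer division keeps the result exact (no floating-point rounding).
    seconds = (parsed - EPOCH) // timedelta(seconds=1)
    return seconds * 1_000_000_000

print(rfc3339_to_epoch_ns("2015-08-18T00:00:00Z"))  # 1439856000000000000
```

Timestamps with fractional seconds or a numeric UTC offset would need a more complete RFC3339 parser than this sketch provides.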

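The next two columns, butterflies and honeybees, are fields. A field consists of a field key and a field value. Field keys (butterflies and honeybees) are strings: the field key butterflies tells us that the field values 12 through 7 are butterfly counts, and the field key honeybees tells us that the field values 23 through 22 are honeybee counts.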

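Field values are your data; they can be strings, floats, integers, or Booleans. Because TSDB for InfluxDB® is a time series database, a field value is always associated with a timestamp. In the sample data, the field values are the entries in the butterflies and honeybees columns.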

A collection of field key-value pairs makes up a field set. The sample data contains eight field sets:

* butterflies = 12   honeybees = 23
* butterflies = 1    honeybees = 30
* butterflies = 11   honeybees = 28
* butterflies = 3    honeybees = 28
* butterflies = 2    honeybees = 11
* butterflies = 1    honeybees = 10
* butterflies = 8    honeybees = 23
* butterflies = 7    honeybees = 22
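As an illustration, the eight field sets above can be derived from the rows of the sample data. This is a minimal sketch, assuming the rows are held in memory as plain tuples; the names ROWS, FIELD_KEYS, and field_sets are hypothetical.

```python
# Rows copied from the sample data: (time, butterflies, honeybees, location, scientist)
ROWS = [
    ("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
    ("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
    ("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
    ("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
    ("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
    ("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
    ("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
    ("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"),
]
FIELD_KEYS = ("butterflies", "honeybees")

def field_sets(rows):
    """Build one field set (field key -> field value) per row."""
    return [dict(zip(FIELD_KEYS, row[1:3])) for row in rows]

for field_set in field_sets(ROWS):
    print(field_set)
```

Each row yields exactly one field set, so the list has eight entries, one for every line of the sample table.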

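Fields are a required part of the TSDB for InfluxDB® data structure; you cannot have data without fields. Note also that fields are not indexed. A query that filters on a field value must scan every value that matches the other conditions in the query. As a result, such queries are much slower than queries that filter on tags (more on tags below). In general, fields should not contain commonly queried metadata.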

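The last two columns in the sample data, location and scientist, are tags. A tag consists of a tag key and a tag value. Both are stored as strings and record metadata. The tag keys in the sample data are location and scientist. The tag key location has two tag values, 1 and 2; the tag key scientist also has two tag values, langstroth and perpetua.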

A tag set is a distinct combination of all the tag key-value pairs. The sample data contains four tag sets:

* location = 1, scientist = langstroth
* location = 2, scientist = langstroth
* location = 1, scientist = perpetua
* location = 2, scientist = perpetua
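The four tag sets above are the distinct combinations that occur across the eight rows. A minimal sketch of that deduplication, assuming the tag columns are held as tuples of strings (the names TAG_ROWS, TAG_KEYS, and distinct_tag_sets are hypothetical):

```python
# Tag columns (location, scientist) copied from the sample data, in row order.
TAG_ROWS = [
    ("1", "langstroth"), ("1", "perpetua"),
    ("1", "langstroth"), ("1", "perpetua"),
    ("2", "langstroth"), ("2", "langstroth"),
    ("2", "perpetua"), ("2", "perpetua"),
]
TAG_KEYS = ("location", "scientist")

def distinct_tag_sets(rows):
    """Return the distinct tag sets as a sorted list of (key, value) tuples."""
    unique = {tuple(zip(TAG_KEYS, row)) for row in rows}
    return sorted(unique)

for tag_set in distinct_tag_sets(TAG_ROWS):
    print(", ".join(f"{key} = {value}" for key, value in tag_set))
```

Tag values are kept as strings here because tag values are always stored as strings, even when they look numeric, as location does.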

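Tags are optional in TSDB for InfluxDB®; you do not need to include them in your data structure. Using them is generally a good idea, though, because tags, unlike fields, are indexed. Queries that filter on tags are therefore faster, which makes tags well suited to storing commonly queried metadata.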

Why indexing matters: a schema case study

Suppose that most of your queries filter on the values of the field keys butterflies and honeybees:

SELECT * FROM "census" WHERE "butterflies" = 1
SELECT * FROM "census" WHERE "honeybees" = 23

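Because fields are not indexed, TSDB for InfluxDB® scans every value of butterflies in the first query and every value of honeybees in the second query before it returns a result. This can lengthen query response times considerably, especially as the amount of data grows. To optimize these queries, you can rearrange the schema so that the former fields (butterflies and honeybees) become tags and the former tags (location and scientist) become fields: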

name: census

time	location	scientist	butterflies	honeybees
2015-08-18T00:00:00Z	1	langstroth	12	23
2015-08-18T00:00:00Z	1	perpetua	1	30
2015-08-18T00:06:00Z	1	langstroth	11	28
2015-08-18T00:06:00Z	1	perpetua	3	28
2015-08-18T05:54:00Z	2	langstroth	2	11
2015-08-18T06:00:00Z	2	langstroth	1	10
2015-08-18T06:06:00Z	2	perpetua	8	23
2015-08-18T06:12:00Z	2	perpetua	7	22

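Now that butterflies and honeybees are tags, TSDB for InfluxDB® no longer has to scan every one of their values when it runs the queries above. Because tag values are strings, the comparison values in the WHERE clauses must then be written in single quotes, for example "butterflies" = '1'.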
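The difference between the two schemas can be pictured with a toy example. The sketch below is a conceptual illustration only and does not describe how the TSDB for InfluxDB® storage engine is implemented. It contrasts a full scan over unindexed values with a lookup in a simple inverted index (a dictionary from value to row positions). The names full_scan and build_index are hypothetical.

```python
from collections import defaultdict

# The butterflies column of the sample data, in row order.
BUTTERFLIES = [12, 1, 11, 3, 2, 1, 8, 7]

def full_scan(values, wanted):
    """Field-style lookup: compare every stored value with the wanted one."""
    return [pos for pos, value in enumerate(values) if value == wanted]

def build_index(values):
    """Tag-style lookup: map each value (kept as a string) to its row positions."""
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[str(value)].append(pos)
    return index

index = build_index(BUTTERFLIES)
print(full_scan(BUTTERFLIES, 1))  # [1, 5], found after touching all 8 values
print(index.get("1", []))         # [1, 5], found with a single dictionary lookup
```

The scan does work proportional to the number of stored values on every query, whereas the index pays that cost once when it is built. This is why filtering on tags scales better as the data grows.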

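A measurement acts as a container for tags, fields, and the time column, and the measurement name describes the data stored in the associated fields. Measurement names are strings, and for SQL users a measurement is conceptually similar to a table. The sample data has a single measurement, census. The name census tells us that the field values record the number of butterflies and honeybees, not their size, direction, or some kind of happiness index.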

A single measurement can belong to different retention policies. A retention policy describes how long TSDB for InfluxDB® keeps data (DURATION) and how many copies of the data are stored in the cluster (REPLICATION).

Note

Replication factors do not apply to single-node instances.

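In the sample data, everything in census belongs to the retention policy autogen. TSDB for InfluxDB® creates autogen automatically; it has an infinite duration and a replication factor of 1.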

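Now that you are familiar with measurements, tag sets, and retention policies, it is time to discuss series. In TSDB for InfluxDB®, a series is the collection of data that shares a retention policy, a measurement, and a tag set. The sample data contains four series: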

Arbitrary series number	Retention policy	Measurement	Tag set
series 1	autogen	census	location = 1, scientist = langstroth
series 2	autogen	census	location = 2, scientist = langstroth
series 3	autogen	census	location = 1, scientist = perpetua
series 4	autogen	census	location = 2, scientist = perpetua
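The table above can be reproduced by grouping the rows of the sample data by retention policy, measurement, and tag set. This is a minimal sketch, assuming the tag columns are held as tuples; the names series_key and points_per_series are hypothetical, and the string form of the key is an illustration rather than an internal format.

```python
from collections import Counter

RETENTION_POLICY = "autogen"
MEASUREMENT = "census"

# Tag columns (location, scientist) copied from the sample data, in row order.
TAG_ROWS = [
    ("1", "langstroth"), ("1", "perpetua"),
    ("1", "langstroth"), ("1", "perpetua"),
    ("2", "langstroth"), ("2", "langstroth"),
    ("2", "perpetua"), ("2", "perpetua"),
]

def series_key(location, scientist):
    """Identify a series by its retention policy, measurement, and tag set."""
    tag_set = f"location={location},scientist={scientist}"
    return (RETENTION_POLICY, MEASUREMENT, tag_set)

# Count how many rows of the sample data fall into each series.
points_per_series = Counter(series_key(loc, sci) for loc, sci in TAG_ROWS)

for key, count in sorted(points_per_series.items()):
    print(key, count)
```

The eight rows collapse into four series of two rows each. Adding a new tag value, such as a third scientist, would create new series, which is why series matter for schema design.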

Understanding series is essential when you design a schema and when you work with data in TSDB for InfluxDB®.

Finally, a point is the field set in a given series that has a given timestamp. For example, here is a single point:

name: census
-----------------
time                    butterflies  honeybees   location    scientist
2015-08-18T00:00:00Z    1            30          1           perpetua

The series in this example is defined by the retention policy autogen, the measurement census, and the tag set location = 1, scientist = perpetua. The timestamp of the point is 2015-08-18T00:00:00Z.
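To show how the parts of a point fit together, the sketch below renders the point above in InfluxDB line protocol, the text format that InfluxDB accepts for writes. It makes two assumptions: the counts are written as integer fields, and no key or tag value contains characters that need escaping. The helper names format_field and to_line_protocol are hypothetical.

```python
def format_field(value):
    """Render one field value in line protocol syntax."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # a trailing i marks an integer field
    if isinstance(value, float):
        return repr(value)
    # Remaining case: string field, double-quoted with \ and " escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement[,tag_set] field_set timestamp."""
    # Assumption: keys and tag values contain no spaces, commas, or equals signs.
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={format_field(value)}" for key, value in fields.items())
    prefix = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{prefix} {field_part} {timestamp_ns}"

line = to_line_protocol(
    "census",
    {"location": "1", "scientist": "perpetua"},
    {"butterflies": 1, "honeybees": 30},
    1439856000000000000,  # 2015-08-18T00:00:00Z in epoch nanoseconds
)
print(line)
# census,location=1,scientist=perpetua butterflies=1i,honeybees=30i 1439856000000000000
```

The database and the retention policy do not appear in the line itself; in InfluxDB they are supplied as parameters of the write request.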

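Everything described so far is stored in a database; the sample data lives in the database my_database. A TSDB for InfluxDB® database is similar to a traditional database and serves as a logical container for users, retention policies, continuous queries, and time series data.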

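A database can have multiple users, continuous queries, retention policies, and measurements. TSDB for InfluxDB® is a schemaless database, which means that new measurements, tags, and fields can be added at any time. It is designed specifically to work well with time series data.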

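Congratulations, you have read the whole document and now know the basic concepts and terms of TSDB for InfluxDB®. If you are just getting started, we recommend the documents Getting Started, Writing Data with the HTTP API, and Querying Data with the HTTP API. We hope our time series database serves you well.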

InfluxDB® is a trademark registered by InfluxData, which is not affiliated with, and does not endorse, TSDB for InfluxDB®.