Query and analyze JSON logs

更新时间:
复制 MD 格式

Query and analyze JSON-formatted website logs by creating indexes and running SQL statements.

Prerequisites

Text logs have been collected in JSON format. Collect text logs in JSON format.

Step 1: Create an index

  1. Log on to the Simple Log Service console.

  2. In the Projects section, click the one you want.

    image

  3. On the Log Storage > Logstores tab, click the logstore you want.

    image

  4. In the upper-right corner of the Query and Analysis page for the Logstore, choose Index Attributes > Attributes. If no index exists, click Enable Indexing. Create an index.

    Note

    To query all fields, create a full-text index. To query specific fields and reduce index traffic, use a field index. To analyze fields with SELECT statements, a field index is required.

  5. Configure a field index. The following sample JSON log shows the field configuration.

    {
      "@timestamp": "2023-01-01T00:00:00+08:00",
      "remote_addr": "192.168.0.1",
      "remote_user": "-",
      "request": {
        "request_length": 123,
        "request_method": "GET",
        "request_uri": "/index.html"
      },
      "status": 200,
      "http_referer": "http://example.com",
      "http_user_agent": "Mozilla/5.0",
      "server_protocal": "HTTP/1.1",
      "http_x_forward_for": "192.168.0.1",
      "upstream_addr": "10.0.0.1:8080",
      "time": {
        "request_time": 0.006,
        "upstream_response_time": 0.004
      },
      "body_bytes_sent": [123, 456]
    }

    Set the content field data type to json. The leaf nodes under request include request.request_length (long), request.request_method (text), and request.request_uri (text). Enable Enable Statistics for all fields.

    • The __topic__, __source__, and __tag__ fields are system-level reserved fields.

    • The @timestamp, remote_addr, remote_user, http_referer, http_user_agent, status, server_protocal, http_x_forward_for, and upstream_addr fields do not have leaf nodes. Create indexes for these fields directly under the content field.

    • The request and time fields have leaf nodes, and the leaf nodes are not JSON arrays.

      • You cannot create indexes for the parent fields request and time, or query or analyze them.

      • Create indexes for leaf nodes under request and time by specifying the full path from parent to leaf. The format is KEY1.KEY2.KEY3, such as time.request_time and time.upstream_response_time. After creating indexes, you can query time.request_time and time.upstream_response_time.

    • The body_bytes_sent field is a JSON array. You cannot create indexes for this field or its leaf nodes, or query or analyze body_bytes_sent or the leaf nodes of body_bytes_sent.

Step 2: Reindex

Indexes apply only to newly collected data. To query historical data, use the Reindex feature.

Step 3: Query and analyze logs

On the Query and Analysis page, enter a statement and select a time range. In analysis statements (SELECT), enclose field names in double quotation marks (") and strings in single quotation marks ('). Quick start for log query and analysis. FAQ about querying and analyzing JSON logs.

  • Find logs with request status 200.

    content.status:200
  • Find logs with request length greater than 70.

    content.request.request_length > 70
  • Find logs for GET requests.

    content.request.request_method:GET
  • Count logs by request status.

    * | SELECT "content.status", COUNT(*) AS PV GROUP BY "content.status"

    The result lists each status code, such as 200 and null, with its page views (PV).

  • Count requests by request duration, sorted in ascending order.

    * | SELECT "content.time.request_time", COUNT(*) AS count GROUP BY "content.time.request_time" ORDER BY "content.time.request_time"
  • Calculate average request duration by request method.

    * | SELECT avg("content.time.request_time") AS avg_time,"content.request.request_method"  GROUP BY "content.request.request_method"

    The result shows avg_time of 45 for GET and avg_time of 11 for PUT.

Related topics