Tair AI Assistant is an intelligent O&M tool powered by a large language model. It provides two core capabilities, intelligent Q&A and intelligent inspection, that help you quickly pinpoint instance issues and obtain optimization recommendations.
Product status: In free public beta.
Core capabilities
Intelligent Q&A
Features
Answers natural-language questions based on official Alibaba Cloud documentation and the Tair expert knowledge base. You can use @ to select a specific instance for more targeted answers.
Use cases
Product feature inquiries: Ask about product-related information for a Tair instance, such as its architecture type, version differences, and parameter configurations.
Best practice guidance: Get expert recommendations for O&M scenarios, such as large key governance, hot key optimization, memory management, and connection pool configuration.
Troubleshooting guidance: Get solutions for common issues such as connection timeouts, high memory usage, and latency jitter.
Operational guidance: Get instructions for operations like creating an instance, configuring a whitelist, setting up data synchronization, and enabling an audit log.
Intelligent inspection
Features
This feature lets you perform one-click intelligent inspections. You can select inspection items and a time range, and the system automatically collects instance runtime data to generate an inspection report, helping you proactively identify potential risks.
Inspection items
Inspection item | Description |
Instance status | Checks the running status and basic health metrics of the instance. |
Instance security | Reviews the security configurations of the instance. |
High availability and disaster recovery | Assesses the high availability architecture and disaster recovery capabilities of the instance. |
Data node performance | Analyzes latency insight data from DB nodes across command and event dimensions to pinpoint performance bottlenecks. |
Proxy node performance | Analyzes the latency path from Proxy to DB nodes to intelligently diagnose network and workload issues. |
Slow log analysis | Analyzes slow query logs from DB and Proxy nodes to identify high-latency commands and provide optimization recommendations. |
Large key/hot key | Detects large keys and hot keys across multiple dimensions and analyzes their node distribution and performance impact. |
Events and alerts | Aggregates O&M events and alerts to help you trace the root cause of issues. |
Inspection item details
Data node performance
This inspection pinpoints performance bottlenecks by analyzing latency insight data from DB nodes across two dimensions: commands and events.
Command dimension: Analyzes the maximum response time (maxRT) and the number of calls for high-latency commands on each DB node to determine whether these commands are concentrated on a specific node. Examples include high-complexity commands such as HGETALL, KEYS, and SMEMBERS.
Event dimension: Detects system-level events that affect performance, such as fork (creates a child process for RDB persistence or AOF rewrite), active-defrag-cycle (memory defragmentation), expire-cycle (expired key cleanup), aof-write (AOF write), and eviction-cycle (data eviction), and provides corresponding optimization suggestions.
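The command-dimension check can be pictured with a minimal sketch. It assumes latency-insight samples arrive as (node, command, maxRT in ms, calls) records. The LatencyRecord type, the 10 ms threshold, and the 80% concentration ratio are hypothetical choices for illustration, not the assistant's actual logic.

```python
from collections import defaultdict
from typing import NamedTuple


class LatencyRecord(NamedTuple):
    """Hypothetical latency-insight sample for one command on one DB node."""
    node: str
    command: str
    max_rt_ms: float
    calls: int


def find_concentrated_commands(
    records: list[LatencyRecord],
    rt_threshold_ms: float = 10.0,     # assumed "high latency" threshold
    concentration_ratio: float = 0.8,  # assumed share that counts as "concentrated"
) -> dict[str, str]:
    """Return {command: node} for slow commands whose calls cluster on one node."""
    # command -> node -> calls from samples at or above the threshold
    slow_calls: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for r in records:
        if r.max_rt_ms >= rt_threshold_ms:
            slow_calls[r.command][r.node] += r.calls

    concentrated = {}
    for command, per_node in slow_calls.items():
        total = sum(per_node.values())
        node, count = max(per_node.items(), key=lambda item: item[1])
        # Flag the command if a single node accounts for most of its slow calls.
        if total > 0 and count / total >= concentration_ratio:
            concentrated[command] = node
    return concentrated


records = [
    LatencyRecord("db-0", "HGETALL", 35.0, 900),
    LatencyRecord("db-1", "HGETALL", 12.0, 50),
    LatencyRecord("db-0", "KEYS", 120.0, 3),
    LatencyRecord("db-1", "KEYS", 110.0, 3),
    LatencyRecord("db-2", "GET", 0.3, 10_000),
]
print(find_concentrated_commands(records))  # {'HGETALL': 'db-0'}
```

Here HGETALL is flagged because about 95% of its slow calls hit db-0, while KEYS is slow on both nodes and is therefore not attributed to a single node.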
Proxy node performance
By comparing the response time (RT) between each Proxy node and each DB node, the system identifies the following four abnormal patterns:
Abnormal pattern | Possible causes |
High latency from multiple Proxy nodes to a single DB node | The DB node is overloaded or processing slow commands. |
High latency from a single Proxy node to a single DB node | The network link between the two nodes is abnormal. |
High latency from a single Proxy node to multiple DB nodes | The Proxy node is overloaded or has network issues. |
High latency from multiple Proxy nodes to multiple DB nodes | The cluster is experiencing a sudden traffic surge or issues with underlying resources. |
It also checks whether multiple abnormal nodes are located on the same physical host to pinpoint host-level performance issues.
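The four patterns and the same-host check can be expressed as a minimal sketch. It assumes RT values are available per (Proxy, DB) link in milliseconds and uses a single fixed threshold. The function names, the 5 ms threshold, and the host mapping are hypothetical; the assistant's actual comparison method is not documented here.

```python
from typing import Optional


def classify_proxy_pattern(rt_ms: dict[tuple[str, str], float],
                           threshold_ms: float = 5.0) -> Optional[str]:
    """Map the slow Proxy->DB links to one of the four documented patterns.

    Simplification: several independent slow links (for example p0->d0 and
    p1->d1) are reported as the multi-to-multi pattern.
    """
    slow = [link for link, rt in rt_ms.items() if rt >= threshold_ms]
    if not slow:
        return None
    proxies = {proxy for proxy, _ in slow}
    dbs = {db for _, db in slow}
    if len(proxies) > 1 and len(dbs) == 1:
        return "multi-proxy -> single DB: DB overloaded or running slow commands"
    if len(proxies) == 1 and len(dbs) == 1:
        return "single proxy -> single DB: abnormal network link"
    if len(proxies) == 1:
        return "single proxy -> multi-DB: Proxy overloaded or network issue"
    return "multi-proxy -> multi-DB: traffic surge or underlying resource issue"


def shared_hosts(abnormal_nodes: list[str],
                 host_of: dict[str, str]) -> dict[str, list[str]]:
    """Group abnormal nodes by physical host; keep hosts with 2+ abnormal nodes."""
    by_host: dict[str, list[str]] = {}
    for node in abnormal_nodes:
        by_host.setdefault(host_of[node], []).append(node)
    return {host: nodes for host, nodes in by_host.items() if len(nodes) > 1}


rt = {
    ("proxy-0", "db-0"): 20.0,
    ("proxy-1", "db-0"): 18.0,
    ("proxy-0", "db-1"): 0.8,
    ("proxy-1", "db-1"): 0.9,
}
print(classify_proxy_pattern(rt))  # both Proxies are slow only towards db-0
```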
Slow log analysis
This inspection analyzes the slow query logs from both DB nodes and Proxy nodes:
DB slow log: Aggregates and analyzes slow command details from each node, including the command type, execution count (cnt), maximum response time (maxRT), and source IP distribution. This analysis focuses on identifying high-risk command patterns, such as KEYS full scans (use SCAN instead), large-range ZRANGEBYSCORE queries (narrow the query range or optimize indexes), and high-frequency EVAL (Lua script) calls.
Proxy slow log: Analyzes the distribution of slow requests across DB nodes, client sources, and command types from the perspective of the Proxy. This helps you identify the DB nodes where slow requests are concentrated and determine the root cause based on the time complexity of the commands. For example, if O(1) and O(N) commands slow down at the same time, the slowdown is most likely caused by the O(N) commands blocking the node.
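A minimal sketch of both analyses follows. It assumes slow-log entries are available as (command, duration in ms, client IP) tuples and uses a small, hand-picked table of command complexity classes. The names RISKY_HINTS, CONSTANT_TIME, and SIZE_DEPENDENT are hypothetical, and a real tool would need a complete command table.

```python
from collections import defaultdict

# Hints paraphrased from the guidance above; the mapping itself is illustrative.
RISKY_HINTS = {
    "KEYS": "full keyspace scan; use SCAN instead",
    "ZRANGEBYSCORE": "large score range; narrow the query range",
    "EVAL": "high-frequency Lua script; the script blocks the node while it runs",
}

# Simplified, hypothetical subset of constant-time commands.
CONSTANT_TIME = {"GET", "SET", "INCR", "HGET", "EXISTS"}
# Commands whose cost grows with the number of elements or keys touched.
SIZE_DEPENDENT = {"KEYS", "HGETALL", "SMEMBERS", "LRANGE", "ZRANGEBYSCORE"}


def aggregate_db_slowlog(entries):
    """Aggregate (command, duration_ms, client_ip) entries per command type."""
    stats = defaultdict(lambda: {"cnt": 0, "maxRT": 0.0, "ips": set()})
    for command, duration_ms, client_ip in entries:
        s = stats[command.upper()]
        s["cnt"] += 1
        s["maxRT"] = max(s["maxRT"], duration_ms)
        s["ips"].add(client_ip)
    for command, s in stats.items():
        s["hint"] = RISKY_HINTS.get(command)  # None if no known risk
    return dict(stats)


def likely_blocking_commands(slow_commands):
    """If O(1) and O(N) commands slow down in the same window, the O(1) ones
    are probably waiting behind the O(N) ones: return the O(N) suspects."""
    seen = {c.upper() for c in slow_commands}
    suspects = seen & SIZE_DEPENDENT
    if suspects and seen & CONSTANT_TIME:
        return sorted(suspects)
    return []


entries = [
    ("KEYS", 250.0, "10.0.0.5"),
    ("keys", 180.0, "10.0.0.6"),
    ("GET", 12.0, "10.0.0.7"),
]
report = aggregate_db_slowlog(entries)
print(report["KEYS"]["cnt"], report["KEYS"]["maxRT"])  # 2 250.0
print(likely_blocking_commands(["GET", "KEYS", "HGET"]))  # ['KEYS']
```

Upper-casing the command name keeps "keys" and "KEYS" in the same bucket, because commands are case-insensitive.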
Large key/hot key
This analysis covers the following four dimensions:
Analysis dimension | Description |
Large key - number of elements | Detects keys with an excessive number of elements (such as large Sets, Lists, or Hashes), lists the top 5, and analyzes their node distribution. |
Large key - memory usage | Detects keys that consume excessive memory, lists the top 5, and analyzes their node distribution. |
Hot key - QPS | Detects keys with an excessively high number of queries per second (QPS), lists the top 5, and analyzes their inbound and outbound traffic. |
Hot key - network traffic | Detects keys with excessive network traffic, lists the top 5, and identifies high-risk keys that appear in both the QPS and traffic dimensions. |
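The four dimensions can be sketched as top-N rankings over per-key statistics, with high-risk hot keys taken as the intersection of the QPS and traffic rankings. This is a minimal sketch; the KeyStats fields and the sample numbers are hypothetical and do not reflect how the assistant actually collects key statistics.

```python
import heapq
from collections import Counter
from typing import NamedTuple


class KeyStats(NamedTuple):
    """Hypothetical per-key statistics gathered during the inspection window."""
    key: str
    node: str
    elements: int
    memory_bytes: int
    qps: float
    traffic_bytes_per_s: float


def top_n(stats, field, n=5):
    """Return the n entries with the largest value of `field`, largest first."""
    return heapq.nlargest(n, stats, key=lambda s: getattr(s, field))


def node_distribution(entries):
    """Count how many of the given keys live on each node."""
    return Counter(s.node for s in entries)


def high_risk_hot_keys(stats, n=5):
    """Keys that rank in the top n for both QPS and network traffic."""
    by_qps = {s.key for s in top_n(stats, "qps", n)}
    by_traffic = {s.key for s in top_n(stats, "traffic_bytes_per_s", n)}
    return sorted(by_qps & by_traffic)


stats = [
    KeyStats("user:1", "db-0", 10, 1_000, 50_000, 1_000_000),
    KeyStats("user:2", "db-0", 10, 1_000, 40_000, 2_000),
    KeyStats("feed:{hot}", "db-1", 200_000, 64_000_000, 30_000, 5_000_000),
    KeyStats("k3", "db-2", 5, 100, 100, 10),
]
print(top_n(stats, "memory_bytes", 1)[0].key)     # feed:{hot}
print(node_distribution(top_n(stats, "qps", 3)))  # Counter({'db-0': 2, 'db-1': 1})
print(high_risk_hot_keys(stats, n=2))             # ['user:1']
```

heapq.nlargest avoids sorting the full key list, which matters when per-key statistics cover many keys.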
The inspection report analyzes the distribution of large and hot keys across nodes, paying special attention to data skew caused by hash tags (such as {tag}), and recommends cross-referencing these keys with slow logs to investigate the related calls.
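Hash-tag skew follows from how keys map to slots. The sketch below implements the hash-slot rule from the open-source Redis Cluster specification (CRC16, XMODEM variant, modulo 16384 slots, where only the first non-empty {...} section is hashed). It assumes the instance uses the same mapping; the key names are made up for illustration.

```python
from collections import Counter


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): poly 0x1021, init 0x0000, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed, per the hash tag rule."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    return crc16_xmodem(hash_tag(key.encode())) % 16384


keys = [f"order:{{shop42}}:{i}" for i in range(1000)] + [f"order:{i}" for i in range(1000)]
slots = Counter(key_slot(k) for k in keys)
# All 1000 keys tagged {shop42} share one slot, so they live on one node.
print(slots.most_common(1))
```

The untagged keys spread across many slots, while every key with the same tag lands in a single slot, which is exactly the skew the report looks for.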
Events and alerts
This inspection aggregates O&M events and alerts within the specified time range, including failovers, kernel upgrades, resource warnings, and high-traffic requests. The high-traffic analysis identifies the dominant command patterns, client IP addresses, and target DB nodes to find the root cause of throttling.
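A minimal sketch of the aggregation step, assuming high-traffic requests are sampled as (command, client IP, DB node) records and events as (timestamp, description) pairs. RequestSample, summarize_high_traffic, and events_in_range are hypothetical names; the sketch only counts and filters, whereas the actual root-cause reasoning is done by the assistant.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class RequestSample(NamedTuple):
    """Hypothetical record of one request seen during a high-traffic period."""
    command: str
    client_ip: str
    db_node: str


def summarize_high_traffic(samples, top=3):
    """Return the most frequent commands, client IPs, and target DB nodes."""
    return {
        "commands": Counter(s.command for s in samples).most_common(top),
        "client_ips": Counter(s.client_ip for s in samples).most_common(top),
        "db_nodes": Counter(s.db_node for s in samples).most_common(top),
    }


def events_in_range(events, start, end):
    """Keep (timestamp, description) events inside [start, end], oldest first."""
    return sorted(e for e in events if start <= e[0] <= end)


samples = (
    [RequestSample("HGETALL", "10.0.0.8", "db-2")] * 6
    + [RequestSample("GET", "10.0.0.9", "db-0")] * 2
)
print(summarize_high_traffic(samples)["commands"])  # [('HGETALL', 6), ('GET', 2)]

events = [
    (datetime(2025, 1, 1, 10, 5), "failover on db-2"),
    (datetime(2025, 1, 1, 9, 50), "memory usage warning on db-2"),
    (datetime(2024, 12, 30, 8, 0), "kernel upgrade"),
]
for ts, desc in events_in_range(events, datetime(2025, 1, 1, 9), datetime(2025, 1, 1, 11)):
    print(ts, desc)
```

Ordering events by time makes it easier to see which warning preceded a failover when tracing a root cause.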
Time range
The inspection supports the following time range options:
Last 1 hour
Last 3 hours
Last 24 hours
Custom time range: Specify a start time and an end time. By default, the range can span at most 24 hours, and the start time can be at most 7 days in the past.
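The custom range limits can be checked with a minimal sketch like the one below, assuming timezone-aware datetimes. The console enforces these limits itself; range_problems is a hypothetical helper that mirrors only the documented defaults (24-hour span, 7-day look-back) plus the basic start-before-end check.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_SPAN = timedelta(hours=24)    # default maximum duration of a custom range
MAX_LOOKBACK = timedelta(days=7)  # oldest data that can be inspected


def range_problems(start: datetime, end: datetime,
                   now: Optional[datetime] = None) -> list[str]:
    """Return the limits that [start, end] violates (empty list if valid)."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if start >= end:
        problems.append("start must be earlier than end")
    if end - start > MAX_SPAN:
        problems.append("time range exceeds 24 hours")
    if start < now - MAX_LOOKBACK:
        problems.append("start is more than 7 days in the past")
    return problems


now = datetime(2025, 1, 8, 12, 0, tzinfo=timezone.utc)
print(range_problems(now - timedelta(hours=3), now, now))   # []
print(range_problems(now - timedelta(hours=25), now, now))  # ['time range exceeds 24 hours']
```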
Procedure
In the Tair AI Assistant panel, select the target instance to inspect.
Select the required inspection items. You can also select all items.
Select the time range for the inspection.
Click Start Inspection and wait for the inspection report to be generated.
View the inspection results and apply optimizations based on the recommendations.
Access
Log in to the Tair Management Console. You can access the Tair AI Assistant in one of the following ways:
Click the Tair AI Assistant icon in the right-side panel.
Click the AI instance inspection entry at the top of the instance details page to directly start an intelligent inspection for the current instance.
A RAM user must have the necessary permissions to access Tair instances. The Tair AI Assistant automatically inherits the user's RAM permissions and can only access authorized instances.
Billing
Tair AI Assistant is currently in a free public beta. The billing method after the public beta ends will be announced separately.
Usage recommendations
When you ask a question, include the specific instance ID, time range, and a description of the problem to receive a more accurate answer. You can use @ to select the target instance.
The answers and inspection reports are generated by model inference and are for reference only. Before making critical changes, validate the recommendations against your actual business workload.
You must manually confirm and execute all operations that modify an instance.
The inspection feature collects basic instance information, monitoring metrics, and logs for analysis only. This data is not used for any other purpose.
Disclaimer
The content provided by the Tair AI Assistant is generated by AI and is for reference only. Alibaba Cloud does not guarantee its complete accuracy and does not represent the official position or commitment of Alibaba Cloud.
You are solely responsible for your actions and the results of using the content generated by the Tair AI Assistant.
The diagnosis and inspection features require collecting basic instance information, monitoring metrics, and log data. This data is used only for the current analysis.
FAQ
Q: Does the Tair AI Assistant automatically make changes to my instance?
A: No. You must manually confirm and execute all operations that modify an instance.
Q: Which Tair instance types are supported?
A: The Tair AI Assistant supports all Tair (Redis-compatible) instance types, including standard architecture, cluster architecture, and read/write splitting architecture.
Q: What is the maximum time range supported by intelligent inspection?
A: You can inspect a time range of up to 24 hours. The inspection can include data from the past 7 days.
Q: How can I improve the accuracy of the assistant's answers?
A: We recommend that you specify the instance ID, a specific time range, and a description of the issue when you ask a question. Using @ to select the target instance helps the AI assistant obtain the instance context and provide more accurate analysis and suggestions.