You can configure bot management to create anti-crawler rules to protect your websites, H5 pages, or native iOS and Android apps.
Prerequisites
WAF is enabled. For more information, see Enable WAF.
The domain name that you want to protect has been added to WAF. For more information, see Add a protected domain name.
If you use a native iOS or Android app (excluding H5 pages in the app), you must integrate the SDK into the app before you configure anti-crawler rules. For more information, see Integrate the SDK into Android apps or Integrate the SDK into iOS apps.
Configure website anti-crawler rules
To protect web pages and H5 pages from crawlers, you can configure anti-crawler rules for websites.
- Log on to the DCDN console.
- In the left-side navigation pane, choose .
- On the Protection Policies page, click Create Policy.
On the Create Policy page, configure the settings.
Configuration module | Configuration item | Description |
Policy Information | Policy Type | Select Bot Management. |
Policy Information | Policy Name | Enter a custom name for the policy. The name can contain letters, digits, and underscores (_). The maximum length is 64 characters. |
Global Configurations | Service Type | Select Websites. This protects websites and H5 pages that are accessed through a browser, including content rendered as H5 pages within apps. |
Global Configurations | Web SDK Integration | Automatic integration (Recommended): This option uses a JavaScript-based web SDK. It improves protection for web browsers and helps avoid compatibility issues. After you enable automatic integration, WAF automatically adds the SDK to the HTML pages of the protected object. The SDK collects browser information, specific attack probes, and user behavior, but no sensitive personal information. WAF then uses this information to identify and block risks. Manual integration: Use manual integration if automatic integration is not suitable for your environment. For more information, see Integrate an SDK for a web application. |
Global Configurations | Traffic Characteristics | Add HTTP request fields and their rules for the target traffic. These fields are related to your service scenario and are generated in HTTP requests when the protected target is accessed. For more information about the fields, see Match conditions. |
Legitimate Bot Management | Spider Whitelist | Enable this feature to use dynamically updated IP address information for crawlers from major search engines, such as Google, Baidu, Sogou, Bing, 360, and Yandex. After the rule is enabled, legitimate crawler IP addresses from these search engines bypass detection by the bot management module. |
Bot Characteristic Detection | Script-based Bot Block (JavaScript) | When enabled, this feature performs a JavaScript check on clients that access the anti-crawler target. It filters out traffic from non-browser tools that cannot pass JavaScript checks, which blocks simple script-based attacks. |
Bot Characteristic Detection | Advanced Bot Defense (Dynamic Token-based Authentication) | When enabled, this feature verifies the signature of each request. Requests that fail signature verification are blocked. You can block requests based on the following conditions: signature verification failure (required, includes missing or invalid signatures), invalid signature timestamp, and WebDriver attacks. |
Bot Behavior Detection | AI Intelligent Protection | When enabled, the anti-crawler rules use the AI intelligent protection engine to analyze and automatically learn from access traffic. This generates targeted protection rules or blacklists. |
Custom Throttling | IP Address Throttling (Default) | If the number of requests from an IP address exceeds the specified threshold within a statistical interval, requests from that IP address are throttled. You can configure the action (slider CAPTCHA, block, or monitor) and the duration of the action. You can set up to three conditions. For more information, see Parameters for custom rules. |
Custom Throttling | Custom Session Throttling | Set the session type and define custom session throttling conditions. If the number of requests from a session exceeds the specified threshold within a statistical interval, requests from that session are throttled. You can configure the action (slider CAPTCHA, block, or monitor) and the duration of the action. You can set up to three conditions. For more information, see Parameters for custom rules. |
Bot Threat Intelligence | Bot Threat Intelligence Library | Contains IP addresses of sources that have performed multiple malicious crawling activities against multiple Alibaba Cloud users over a period of time. You can set the action for the bot threat intelligence library to monitor or slider CAPTCHA. |
Bot Threat Intelligence | Data Center Blacklist | When enabled, this feature blocks the selected IP address libraries. If legitimate calls reach your service from public cloud or data center IP addresses, such as payment callbacks from Alipay or WeChat or monitoring programs, add them to the whitelist. The data center blacklist supports the following IP address libraries: Alibaba Cloud, 21Vianet, Meituan Cloud, Tencent Cloud, and Others. You can set the action for the data center blacklist to monitor, slider CAPTCHA, or block. |
Bot Threat Intelligence | Fake Spider Blocking | When enabled, this feature blocks requests that use the User-Agents of the search engines listed under Legitimate Bot Management. Requests from legitimate client IP addresses that correspond to whitelisted search engines are allowed to pass. |
Protected Domain Names | Select Association Mode | Add and replace the original associated policy: Unbinds the currently associated policy and replaces it with this policy. Add and keep the original associated policy: This policy coexists with the currently bound policy. They do not affect each other. |
Protected Domain Names | Protected Domain Names | Select the domain names to add to this policy. Note: A protected domain name can be associated with only one policy of the same type. If a domain name is already associated with another policy of the same type, associating the domain name with the current policy replaces the previous one. You cannot configure bot protection for DCDN domain names that have WebSocket enabled. This is because WebSocket content is encrypted, and attack characteristics cannot be identified. |
The Monitor and Slider CAPTCHA actions work as follows:
Monitor: The anti-crawler rules allow matching traffic to pass and record the traffic in security reports.
Slider CAPTCHA: The client must complete a slider CAPTCHA to continue accessing the protected target.
- Click Create Policy.
The new protection policy is enabled by default.
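The IP Address Throttling item counts requests per IP address within a statistical interval and applies an action for a set duration once the threshold is exceeded. The following Python sketch illustrates that idea only. It is not how WAF implements throttling: the class name `IpThrottle`, its parameters, and the action strings are hypothetical, and a sliding window is just one possible way to count requests in an interval.

```python
from collections import defaultdict, deque


class IpThrottle:
    """Illustrative sliding-window counter; not WAF's actual implementation."""

    def __init__(self, threshold, interval_s, action, duration_s):
        self.threshold = threshold       # max requests allowed per interval
        self.interval_s = interval_s     # statistical interval, in seconds
        self.action = action             # e.g. "captcha", "block", or "monitor"
        self.duration_s = duration_s     # how long the action stays in effect
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._penalty_until = {}         # ip -> time at which the action ends

    def check(self, ip, now):
        """Return the action to apply to a request from `ip` at time `now`."""
        # An IP address that is already penalized keeps receiving the action.
        if self._penalty_until.get(ip, 0.0) > now:
            return self.action
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the statistical interval.
        while hits and hits[0] <= now - self.interval_s:
            hits.popleft()
        if len(hits) > self.threshold:
            # Threshold exceeded: start the action period and reset the count.
            self._penalty_until[ip] = now + self.duration_s
            hits.clear()
            return self.action
        return "allow"
```

With `IpThrottle(threshold=3, interval_s=10, action="block", duration_s=60)`, the fourth request from one IP address inside 10 seconds returns the configured action, and that address keeps receiving it until 60 seconds have passed. Other IP addresses are counted separately.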
Configure app anti-crawler rules
If you use a native iOS or Android app, you can configure app-specific anti-crawler rules for more targeted protection. This does not apply to H5 pages that are used in the app.
- Log on to the DCDN console.
- In the left-side navigation pane, choose .
- On the Protection Policies page, click Create Policy.
On the Create Policy page, configure the settings.
Section | Parameter | Description |
Policy Information | Policy Type | Select Bot Management. |
Policy Information | Policy Name | Enter a name for the policy. The name can contain Chinese characters, letters, digits, and underscores (_). It can be up to 64 characters in length. |
Global Configurations | Service Type | Select App to protect native iOS or Android apps. |
Global Configurations | Web SDK Integration | Uses the native app SDK (for Android or iOS) to improve protection. After integration, the SDK collects client risk characteristics and includes a security signature in requests. WAF uses this signature to identify and block threats. To get the SDK package, click Get and copy appkey, and then fill in the information to request the SDK package. For more information, see Integrate the SDK into Android apps or Integrate the SDK into iOS apps. |
Global Configurations | Traffic Characteristics | Specify the HTTP request fields that identify your target traffic. These fields describe the business scenario that you want to protect. You can add up to five conditions. For more information, see Match conditions. |
Bot Characteristic Detection | Invalid App Signature | Selected by default and cannot be disabled. This rule detects requests that have a missing or invalid signature after the SDK is integrated into the app. |
Bot Characteristic Detection | Abnormal Device Behavior | When enabled, this rule detects and manages requests from devices with abnormal characteristics. Abnormal characteristics include: Expired Signature (the request timestamp has expired; enabled by default), Using Simulator (the device is using a simulator), Using Proxy (the device is using a proxy service), Rooted Device (the device has root access enabled), Debugging Mode (the device has debugging mode enabled), Hooking (a hook program exists on the device), Multiboxing (multiple instances of the protected app are running on the device), Simulated Execution (user actions are being simulated on the device), and Script Tools (an automated script is running on the device). |
Bot Characteristic Detection | Custom Signature Field | Select a field name in the Header, Parameter, or Cookie to define a custom field for the signature. If the content to be signed is unusual (for example, oversized, empty, or specially encoded), you can hash the content and place the result in this custom field. WAF then uses this field for signature verification. |
Bot Characteristic Detection | Action | You can set the rule action to monitor or block. Monitor: Triggers an alert but does not block the request. Block: Blocks the attack request. |
Bot Characteristic Detection | Secondary Packaging Detection | When enabled, requests from apps whose package name and signature are not on the whitelist are treated as repackaged app requests. You can specify valid version information. Valid package name: Specify a valid app package name. Example: example.aliyundoc.com. Package signature: Contact Alibaba Cloud security engineers to obtain the signature. If you do not need to verify the app's package signature, leave this field empty. WAF then verifies only the package name. Note: The package signature is not the app certificate signature. You can add up to five valid versions for iOS and Android apps, and each package name must be unique. You can set the rule action to monitor or block. |
Throttling | IP Address Throttling (Default) | If the number of requests from a single IP address exceeds the specified threshold within the statistical interval, WAF applies the configured action (block or monitor) for a specified duration. You can add up to three conditions. For more information, see Custom rule parameters. |
Throttling | Device Throttling | If the number of requests from a single device exceeds the specified threshold within the statistical interval, WAF applies the configured action (block or monitor) for a specified duration. You can add up to three conditions. For more information, see Custom rule parameters. |
Throttling | Custom Session Throttling | You can define a session type. If the number of requests from the same session exceeds the specified threshold within the statistical interval, WAF applies the configured action (block or monitor) for a specified duration. You can add up to three conditions. For more information, see Custom rule parameters. |
Bot Threat Intelligence | Bot Threat Intelligence Library | Contains IP addresses that are known sources of malicious crawling activity across the Alibaba Cloud network. You can set the action to monitor or slider CAPTCHA. |
Bot Threat Intelligence | Data Center Blacklist | When enabled, this feature blocks requests from the selected IP address libraries of data centers. If legitimate requests reach your services from public cloud or data center IP addresses, such as payment callbacks and monitoring services, add those IP addresses to a whitelist to avoid blocking them. Supported libraries include Alibaba Cloud, 21Vianet, Meituan Cloud, Tencent Cloud, and Others. You can set the action to monitor, slider CAPTCHA, or block. |
Protected Domain Names | Select Association Mode | Add and replace the original associated policy: Unbinds the existing policy and applies the current one. Add and keep the original associated policy: Applies the current policy alongside the existing one. Both policies will be active. |
Protected Domain Names | Protected Domain Names | Select the domain names to which you want to apply this policy. Note: A domain name can be associated with only one policy of the same type. If you apply this policy to a domain that is already associated with another policy of the same type, the new policy replaces the old one. |
- Click Create Policy.
The new protection policy is enabled by default.
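The Secondary Packaging Detection parameter compares each request's package name and package signature against a whitelist of valid versions. The Python sketch below restates that rule as code to make the matching logic explicit. It is an illustration, not WAF's implementation: `ValidVersion`, `build_whitelist`, and `is_repackaged` are hypothetical names, the signature values are placeholders, and the sketch assumes the five-version limit applies to one combined list.

```python
from dataclasses import dataclass
from typing import Optional

MAX_VALID_VERSIONS = 5  # assumption: the limit of five applies to one combined list


@dataclass(frozen=True)
class ValidVersion:
    package_name: str
    package_signature: Optional[str] = None  # empty or None: verify the name only


def build_whitelist(versions):
    """Check the list against the limits stated on this page and index it by name."""
    if len(versions) > MAX_VALID_VERSIONS:
        raise ValueError("at most five valid versions are allowed")
    names = [v.package_name for v in versions]
    if len(set(names)) != len(names):
        raise ValueError("each package name must be unique")
    return {v.package_name: v for v in versions}


def is_repackaged(whitelist, package_name, package_signature):
    """Return True if the request would be treated as coming from a repackaged app."""
    entry = whitelist.get(package_name)
    if entry is None:
        return True   # package name is not on the whitelist
    if not entry.package_signature:
        return False  # no signature configured: only the package name is verified
    return entry.package_signature != package_signature
```

A whitelist entry with an empty signature accepts any request that carries the matching package name, which mirrors the behavior of leaving the Package signature field empty. The configured action (monitor or block) would then apply to requests for which `is_repackaged` returns `True`.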