When you need to refresh or prefetch a large number of URLs, submitting them manually through the console is slow and error-prone. Alibaba Cloud CDN provides a Python script that automates batch refresh and prefetch tasks: it reads a URL list from a file, splits the list into batches, submits each batch through the CDN API, and polls for completion before moving to the next batch.
How it works
The script processes URLs in three stages:
Split into batches: The script reads the URL file and groups URLs into batches of up to 20 (controlled by the gop variable in the script).
Submit and poll: For each batch, the script submits a refresh or prefetch request and checks task status every 10 seconds, waiting up to 50 seconds per batch.
Proceed sequentially: After a batch completes or the wait times out, the script moves to the next batch automatically.
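The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the script itself: split_batches and poll_until_complete are hypothetical helper names, and the is_complete callback stands in for the real CDN task status query.

```python
import time
from typing import Callable, Iterator, List


def split_batches(urls: List[str], size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def poll_until_complete(is_complete: Callable[[], bool],
                        interval: float = 10,
                        max_wait: float = 50,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Check status every `interval` seconds; give up after `max_wait` seconds."""
    waited = 0.0
    while True:
        if is_complete():
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    urls = [f"http://example.com/file{i}.jpg" for i in range(45)]
    for batch in split_batches(urls):
        # A real run would submit the batch here, then poll the returned task ID.
        done = poll_until_complete(lambda: True)
        print(len(batch), done)
```

Passing the sleep function as a parameter is a choice of this sketch; it lets the polling logic be exercised without actually waiting.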
When to use this script
Use this script when:
The number of URLs to refresh or prefetch is large, and submitting them manually is too time-consuming.
You want to automate batch submissions without writing custom integration code.
You need automatic status polling instead of manually checking whether tasks are complete.
Quotas and limits
Python version: The script requires Python 3.x. To check your version:
python --version
or
python3 --version
Daily quotas: The daily refresh and prefetch quotas apply to all submissions, including those from the script. Before running large batches, check your remaining quota in the Alibaba Cloud CDN console under Refresh and Prefetch. The script automatically checks available quota before submitting tasks and exits with an error if the quota is insufficient.
| Quota type | Checked by script | Error message |
|---|---|---|
| URL refresh quota (UrlRemain) | Yes | UrlRemain is not enough |
| Directory refresh quota (DirRemain) | Yes | DirRemain is not enough |
| Prefetch quota (PreloadRemain) | Yes | PreloadRemain is not enough |
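The quota check in the table above amounts to comparing the number of URLs in the file with one remaining-quota field. The following sketch shows that comparison in isolation. The function name quota_error is hypothetical, the quota mapping stands in for the quota response, and treating an omitted object type as File is an assumption of this sketch based on the documented default.

```python
from typing import Mapping, Optional

# Maps (task type, object type) to the quota field named in the table.
QUOTA_FIELD = {
    ("clear", "File"): "UrlRemain",
    ("clear", "Directory"): "DirRemain",
    ("push", None): "PreloadRemain",
}


def quota_error(task_type: str, object_type: Optional[str], url_count: int,
                quota: Mapping[str, str]) -> Optional[str]:
    """Return an error message if the remaining quota is too small, else None."""
    if task_type == "push":
        key = ("push", None)
    else:
        # Assumption: an omitted object type is treated as File.
        key = (task_type, object_type or "File")
    field = QUOTA_FIELD[key]
    remaining = int(quota[field])
    if url_count > remaining:
        return f"{field} is not enough {remaining}"
    return None


if __name__ == "__main__":
    sample = {"UrlRemain": "10", "DirRemain": "2", "PreloadRemain": "5"}
    print(quota_error("clear", "File", 11, sample))
```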
Prerequisites
Before you begin, make sure you have:
Python 3.x installed
An Alibaba Cloud account with a Resource Access Management (RAM) user configured (see Create an AccessKey pair)
The RAM user granted the AliyunCDNFullAccess system policy, or a custom policy with equivalent CDN permissions (see Create custom policies and Authorization details)
Use a RAM user AccessKey pair instead of your root account credentials. Leaked root account credentials give attackers full access to all your Alibaba Cloud resources.
Step 1: Install dependencies
Install the Alibaba Cloud CDN software development kit (SDK) for Python:
pip install alibabacloud_cdn20180510
Step 2: Prepare a URL file
Create a plain text file (for example, urllist.txt) with one URL per line. Each URL must start with http:// or https://.
http://example.com/file1.jpg
http://example.com/file2.jpg
http://example.com/file3.jpg
URLs that do not start with http:// or https:// cause the script to exit with a format error. URLs that contain special characters must be URL-encoded before you add them to the file.
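To catch format errors before running the script, you can pre-check the file yourself. The following is a minimal sketch under the rule stated above. The function name find_invalid_lines is hypothetical, and like the script it reports blank lines as invalid.

```python
import re
from typing import Iterable, List, Tuple

URL_PREFIX = re.compile(r"^https?://")


def find_invalid_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every line that lacks an http(s):// prefix."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if not URL_PREFIX.match(line)
    ]


if __name__ == "__main__":
    sample = [
        "http://example.com/file1.jpg\n",
        "example.com/file2.jpg\n",
        "https://example.com/file3.jpg\n",
    ]
    for number, text in find_invalid_lines(sample):
        print(f"line {number}: {text!r}")
```

To check a real file, pass an open file object, for example find_invalid_lines(open("urllist.txt")).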
Step 3: Create the script
Save the following code as Refresh.py (or any filename you prefer).
#!/usr/bin/env python3
# coding=utf-8
# __author__ = 'aliyun.cdn'
# __date__ = '2025-08-15'
# SDK installation command: pip install alibabacloud_cdn20180510
'''Check Package'''
# Import the required libraries.
import re, sys, getopt, time, logging, os
try:
    from alibabacloud_cdn20180510.client import Client as Cdn20180510Client
    from alibabacloud_credentials.models import Config as CreConfig
    from alibabacloud_credentials.client import Client as CredentialClient
    from alibabacloud_tea_openapi.models import Config
    from alibabacloud_cdn20180510 import models as cdn_20180510_models
    from alibabacloud_tea_util import models as util_models
# Catch import exceptions.
except ImportError as e:
    sys.exit(f"[error] Please pip install alibabacloud_cdn20180510. Details: {e}")
# Initialize logging.
logging.basicConfig(level=logging.DEBUG, filename='./RefreshAndPredload.log')
# Define a global variable class to store information such as AccessKey ID, AccessKey secret, and file directory.
class Envariable(object):
    LISTS = []
    # For Endpoints, see https://api.aliyun.com/product/Cdn
    ENDPOINT = 'cdn.aliyuncs.com'
    AK = None
    SK = None
    FD = None
    CLI = None
    TASK_TYPE = None
    TASK_AREA = None
    TASK_OTYPE = None
    # Set the AccessKey ID.
    @staticmethod
    def set_ak():
        Envariable.AK = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')
    # Get the AccessKey ID.
    @staticmethod
    def get_ak():
        return Envariable.AK
    # Set the AccessKey secret.
    @staticmethod
    def set_sk():
        Envariable.SK = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET')
    # Get the AccessKey secret.
    @staticmethod
    def get_sk():
        return Envariable.SK
    # Set the file directory.
    @staticmethod
    def set_fd(fd):
        Envariable.FD = fd
    # Get the file directory.
    @staticmethod
    def get_fd():
        return Envariable.FD
    # Set the task type.
    @staticmethod
    def set_task_type(task_type):
        Envariable.TASK_TYPE = task_type
    # Get the task type.
    @staticmethod
    def get_task_type():
        return Envariable.TASK_TYPE
    # Set the task area.
    @staticmethod
    def set_task_area(task_area):
        Envariable.TASK_AREA = task_area
    # Get the task area.
    @staticmethod
    def get_task_area():
        return Envariable.TASK_AREA
    # Set the task object type.
    @staticmethod
    def set_task_otype(task_otype):
        Envariable.TASK_OTYPE = task_otype
    # Get the task object type.
    @staticmethod
    def get_task_otype():
        return Envariable.TASK_OTYPE
    # Create a new client.
    @staticmethod
    def set_acs_client():
        try:
            # Use the AccessKey pair to initialize the Credentials client.
            credentialsConfig = CreConfig(
                # Credential type.
                type='access_key',
                # Set to the AccessKey ID.
                access_key_id=Envariable.get_ak(),
                # Set to the AccessKey secret.
                access_key_secret=Envariable.get_sk(),
            )
            credentialClient = CredentialClient(credentialsConfig)
            cdnConfig = Config(credential=credentialClient)
            # Configure the service endpoint.
            cdnConfig.endpoint = Envariable.ENDPOINT
            # Initialize the CDN client.
            Envariable.CLI = Cdn20180510Client(cdnConfig)
        except Exception as e:
            logging.error(f"Failed to create client: {e}")
            raise
    # Get the client.
    @staticmethod
    def get_acs_client():
        return Envariable.CLI
# Module-level initializer function.
def initialize_credentials_and_client():
    """Initializes the AccessKey pair and client when the module is loaded."""
    try:
        # Initialize the AccessKey pair from environment variables.
        Envariable.set_ak()
        Envariable.set_sk()
        # Check whether the AccessKey pair is obtained.
        if not Envariable.get_ak() or not Envariable.get_sk():
            logging.warning("AK or SK not found in environment variables")
            return False
        # Initialize the client.
        Envariable.set_acs_client()
        logging.info("Credentials and client initialized successfully")
        return True
    except Exception as e:
        logging.error(f"Failed to initialize credentials and client: {e}")
        return False
# Run initialization when the module is loaded.
_initialization_success = initialize_credentials_and_client()
class BaseCheck(object):
    def __init__(self):
        self.invalidurl = ''
        self.lines = 0
        self.urllist = Envariable.get_fd()
    # Check the quota.
    def printQuota(self):
        try:
            client = Envariable.get_acs_client()
            if not client:
                raise Exception("CDN client not initialized")
            # Use the SDK to make the call.
            request = cdn_20180510_models.DescribeRefreshQuotaRequest()
            runtime = util_models.RuntimeOptions()
            response = client.describe_refresh_quota_with_options(request, runtime)
            quotaResp = response.body.to_map()
        except Exception as e:
            logging.error(f"\n[error]: initial Cdn20180510Client failed: {e}\n")
            sys.exit(1)
        if Envariable.TASK_TYPE:
            if Envariable.TASK_TYPE == 'push':
                if self.lines > int(quotaResp['PreloadRemain']):
                    sys.exit("\n[error]:PreloadRemain is not enough {0}".format(quotaResp['PreloadRemain']))
                return True
            if Envariable.TASK_TYPE == 'clear':
                if Envariable.get_task_otype() == 'File' and self.lines > int(quotaResp['UrlRemain']):
                    sys.exit("\n[error]:UrlRemain is not enough {0}".format(quotaResp['UrlRemain']))
                elif Envariable.get_task_otype() == 'Directory' and self.lines > int(quotaResp['DirRemain']):
                    sys.exit("\n[error]:DirRemain is not enough {0}".format(quotaResp['DirRemain']))
                else:
                    return True
    # Verify the URL format.
    def urlFormat(self):
        try:
            with open(self.urllist, "r") as f:
                for line in f.readlines():
                    self.lines += 1
                    # Each URL must start with http:// or https://.
                    if not re.match(r'^https?://', line):
                        self.invalidurl = line + '\n' + self.invalidurl
            if self.invalidurl != '':
                sys.exit("\n[error]: URL format is illegal \n{0}".format(self.invalidurl))
            return True
        except FileNotFoundError:
            sys.exit(f"\n[error]: File not found: {self.urllist}\n")
        except Exception as e:
            sys.exit(f"\n[error]: Failed to read file {self.urllist}: {e}\n")
# Batch processing class that splits the URL list into multiple batches of a specified size.
class doTask(object):
    @staticmethod
    def urlencode_pl(inputs_str):
        len_str = len(inputs_str)
        if inputs_str == "" or len_str <= 0:
            return ""
        result_end = ""
        for chs in inputs_str:
            if chs.isalnum() or chs in {":", "/", ".", "-", "_", "*"}:
                result_end += chs
            elif chs == ' ':
                result_end += '+'
            else:
                result_end += f'%{ord(chs):02X}'
        return result_end
    # Process URLs in batches.
    @staticmethod
    def doProd():
        gop = 20  # Defines the maximum number of URLs per batch.
        mins = 1
        maxs = gop
        current_batch = []  # Use a local variable instead of a global variable.
        try:
            with open(Envariable.get_fd(), "r") as f:
                for line in f.readlines():
                    line = doTask.urlencode_pl(line.strip()) + "\n"
                    current_batch.append(line)
                    if mins >= maxs:
                        yield current_batch
                        current_batch = []
                        mins = 1
                    else:
                        mins += 1
            if current_batch:
                yield current_batch
        except FileNotFoundError:
            sys.exit(f"\n[error]: File not found: {Envariable.get_fd()}\n")
        except Exception as e:
            sys.exit(f"\n[error]: Failed to read file {Envariable.get_fd()}: {e}\n")
    # Run the refresh or prefetch task.
    @staticmethod
    def doRefresh(lists):
        try:
            client = Envariable.get_acs_client()
            if not client:
                raise Exception("CDN client not initialized")
            runtime = util_models.RuntimeOptions()
            taskID = None
            response_data = None
            if Envariable.get_task_type() == 'clear':
                taskID = 'RefreshTaskId'
                request = cdn_20180510_models.RefreshObjectCachesRequest()
                if Envariable.get_task_otype():
                    request.object_type = Envariable.get_task_otype()
                request.object_path = lists
                response = client.refresh_object_caches_with_options(request, runtime)
                response_data = response.body.to_map()
            elif Envariable.get_task_type() == 'push':
                taskID = 'PushTaskId'
                request = cdn_20180510_models.PushObjectCacheRequest()
                if Envariable.get_task_area():
                    request.area = Envariable.get_task_area()
                request.object_path = lists
                response = client.push_object_cache_with_options(request, runtime)
                response_data = response.body.to_map()
            if response_data and taskID:
                print(response_data)
                timeout = 0
                while True:
                    count = 0
                    # Use the SDK to query the task status.
                    taskreq = cdn_20180510_models.DescribeRefreshTasksRequest()
                    taskreq.task_id = response_data[taskID]
                    taskresp = client.describe_refresh_tasks_with_options(taskreq, runtime)
                    taskresp_data = taskresp.body.to_map()
                    print(f"[{response_data[taskID]}] is doing... ...")
                    # Count the tasks in this batch that are not yet complete.
                    for t in taskresp_data['Tasks']['CDNTask']:
                        if t['Status'] != 'Complete':
                            count += 1
                    if count == 0:
                        logging.info(f"[{response_data[taskID]}] is finish")
                        break
                    elif timeout >= 5:  # Stop after 5 waits of 10 seconds each (50 seconds in total).
                        logging.info(f"[{response_data[taskID]}] timeout after 50 seconds")
                        break
                    else:
                        timeout += 1
                        time.sleep(10)  # Check the status every 10 seconds.
                        continue
        except Exception as e:
            logging.error(f"\n[error]: {e}")
            sys.exit(1)
class Refresh(object):
    def main(self, argv):
        if len(argv) < 1:
            sys.exit(f"\n[usage]: {sys.argv[0]} -h ")
        try:
            opts, args = getopt.getopt(argv, "hr:t:a:o:")
        except getopt.GetoptError as e:
            sys.exit(f"\n[usage]: {sys.argv[0]} -h ")
        for opt, arg in opts:
            if opt == '-h':
                self.help()
                sys.exit()
            elif opt == '-r':
                Envariable.set_fd(arg)
            elif opt == '-t':
                Envariable.set_task_type(arg)
            elif opt == '-a':
                Envariable.set_task_area(arg)
            elif opt == '-o':
                Envariable.set_task_otype(arg)
            else:
                sys.exit(f"\n[usage]: {sys.argv[0]} -h ")
        # Check the initialization status only when it is not a help command.
        if not _initialization_success:
            sys.exit("\n[error]: Failed to initialize credentials and client. Please check environment variables.\n")
        try:
            if not (Envariable.get_ak() and Envariable.get_sk() and Envariable.get_fd() and Envariable.get_task_type()):
                sys.exit("\n[error]: Must set environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET, and parameters '-r', '-t'\n")
            if Envariable.get_task_type() not in {"push", "clear"}:
                sys.exit("\n[error]: taskType Error, '-t' option in 'push' or 'clear'\n")
            if Envariable.get_task_area() and Envariable.get_task_otype():
                sys.exit("\n[error]: -a and -o cannot exist at same time\n")
            if Envariable.get_task_area():
                if Envariable.get_task_area() not in {"domestic", "overseas"}:
                    sys.exit("\n[error]: Area value Error, '-a' option in 'domestic' or 'overseas'\n")
            if Envariable.get_task_otype():
                if Envariable.get_task_otype() not in {"File", "Directory"}:
                    sys.exit("\n[error]: ObjectType value Error, '-o' option in 'File' or 'Directory'\n")
                if Envariable.get_task_type() == 'push':
                    sys.exit("\n[error]: -o can only be used with '-t clear'; use -a with '-t push'\n")
        except Exception as e:
            logging.error(f"\n[error]: Parameter {e} error\n")
            sys.exit(1)
        handler = BaseCheck()
        if handler.urlFormat() and handler.printQuota():
            for g in doTask.doProd():
                doTask.doRefresh(''.join(g))
                time.sleep(1)
    def help(self):
        print("\nscript options explain: \
\n\t -r <filename> The file path and file name. After the script runs, it reads the URLs from the file. Each line must contain one URL. URLs with special characters must be URL-encoded. Each URL must start with http or https. \
\n\t -t <taskType> The task type. `clear`: refresh. `push`: prefetch. \
\n\t -a [String,<domestic|overseas>] Optional. The prefetch scope. If you do not set this parameter, resources are prefetched globally.\
\n\t domestic The Chinese mainland only. \
\n\t overseas Global (excluding the Chinese mainland). \
\n\t -o [String,<File|Directory>] Optional. The type of content to refresh. \
\n\t File File (default). \
\n\t Directory Directory.")
if __name__ == '__main__':
    fun = Refresh()
    fun.main(sys.argv[1:])
Script parameters
| Option | Description | Required |
|---|---|---|
| -r <filename> | Path to the URL file. Each line must contain one URL. URLs with special characters must be URL-encoded. | Yes |
| -t <taskType> | Task type. clear runs a refresh. push runs a prefetch. | Yes |
| -a <domestic\|overseas> | Prefetch scope. Only valid with -t push. domestic: the Chinese mainland only. overseas: global, excluding the Chinese mainland. If omitted, the script prefetches globally. | No |
| -o <File\|Directory> | Object type for refresh. Only valid with -t clear. File (default) or Directory. | No |
-a and -o cannot be used together. -a is only valid with -t push. -o is only valid with -t clear.
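These option rules can be expressed as a small validation function. The sketch below is illustrative only: check_options is a hypothetical name, the messages are not the script's own, and it enforces the rules exactly as stated above, which may be stricter than the checks the script performs.

```python
from typing import Optional


def check_options(task_type: str, area: Optional[str] = None,
                  object_type: Optional[str] = None) -> Optional[str]:
    """Return a description of the first rule violated, or None if valid."""
    if task_type not in {"clear", "push"}:
        return "-t must be clear or push"
    if area and object_type:
        return "-a and -o cannot be used together"
    if area:
        if area not in {"domestic", "overseas"}:
            return "-a must be domestic or overseas"
        if task_type != "push":
            return "-a is only valid with -t push"
    if object_type:
        if object_type not in {"File", "Directory"}:
            return "-o must be File or Directory"
        if task_type != "clear":
            return "-o is only valid with -t clear"
    return None


if __name__ == "__main__":
    print(check_options("push", object_type="Directory"))
```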
To adjust the batch size, change the value of the gop variable in the script. The default is 20 URLs per batch.
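When choosing a batch size, it can help to estimate how long a run might take. The sketch below is a rough upper bound only. It assumes the documented 50-second polling limit per batch and the 1-second pause the script takes between batches, and it ignores API latency. The function name worst_case_seconds is hypothetical.

```python
import math


def worst_case_seconds(url_count: int, batch_size: int = 20,
                       max_wait: int = 50, pause: int = 1) -> int:
    """Upper bound on run time if every batch waits for the full polling limit."""
    batches = math.ceil(url_count / batch_size)
    return batches * (max_wait + pause)


if __name__ == "__main__":
    # 1,000 URLs in batches of 20 is 50 batches.
    print(worst_case_seconds(1000))
```

Most batches are expected to finish well before the limit, so actual run times are typically shorter than this bound.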
View help
Run the following command to display all parameter descriptions:
python Refresh.py -h
Output:
script options explain:
-r <filename> //The file path and file name. After the script runs, it reads the URLs from the file. Each line must contain one URL. URLs with special characters must be URL-encoded. Each URL must start with http or https.
-t <taskType> //The task type. `clear`: refresh. `push`: prefetch.
-a [String,<domestic|overseas>] //Optional. The prefetch scope. If you do not set this parameter, resources are prefetched globally.
domestic //The Chinese mainland only.
overseas //Global (excluding the Chinese mainland).
-o [String,<File|Directory>] //Optional. The type of content to refresh.
File //File (default).
Directory //Directory.
Step 4: Set your AccessKey as environment variables
The script reads credentials from the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET. For setup instructions, see Configure environment variables in Linux, macOS, and Windows.
On Linux and macOS, environment variables set with export are valid only for the current terminal session. To make them persistent, add the export commands to your shell's startup file (for example, ~/.bashrc or ~/.zshrc).
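Before running the script, you can confirm that both variables are visible to Python in the current terminal session. This is a minimal sketch; missing_credentials is a hypothetical helper name and is not part of the script.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both AccessKey variables are set.")
```

The check prints only variable names, never their values, so it is safe to run in a shared terminal.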
Step 5: Run the script
Open a terminal (Command Prompt, PowerShell, or Terminal) and run:
python Refresh.py -r <PathToUrlFile> -t <TaskType>
Replace <PathToUrlFile> with the path to your URL file and <TaskType> with clear (refresh) or push (prefetch).
Refresh cached files
If urllist.txt is in the same directory as Refresh.py:
python Refresh.py -r urllist.txt -t clear
If the URL file is in a different directory:
python Refresh.py -r D:\example\filename\urllist.txt -t clear
Expected output:
{'RequestId': 'C1686DCA-F3B5-5575-ADD1-05F96617D770', 'RefreshTaskId': '18392588710'}
[18392588710] is doing... ...
Prefetch content
If urllist.txt is in the same directory as Refresh.py:
python Refresh.py -r urllist.txt -t push
If the URL file is in a different directory:
python Refresh.py -r D:\example\filename\urllist.txt -t push
Expected output:
{'RequestId': 'C1686DCA-F3B5-5575-ADD1-05F96617D771', 'PushTaskId': '18392588711'}
[18392588711] is doing... ...
If the script returns Failed to initialize credentials and client. Please check environment variables., set the AccessKey environment variables as described in Step 4 and run the command again in the same terminal window.