Use a script to refresh and prefetch content

When you need to refresh or prefetch a large number of URLs, submitting them manually through the console is slow and error-prone. Alibaba Cloud CDN provides a Python script that automates batch refresh and prefetch tasks: it reads a URL list from a file, splits the list into batches, submits each batch through the CDN API, and polls for completion before moving to the next batch.

How it works

The script processes URLs in three stages:

  1. Split into batches: The script reads the URL file and groups URLs into batches of up to 20 (controlled by the gop variable in the script).

  2. Submit and poll: For each batch, the script submits a refresh or prefetch request and checks task status every 10 seconds, waiting up to 50 seconds per batch.

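  3. Proceed sequentially: After a batch completes, or after the 50-second wait runs out, the script pauses for 1 second and submits the next batch. A batch that times out is not resubmitted. The timeout is written to the log file instead.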
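The three stages above can be sketched in plain Python. This is a simplified model, not the script itself. `batches` and `wait_for_batch` are hypothetical helper names. `get_statuses` stands in for a call such as `describe_refresh_tasks_with_options` that returns the status of each task in a batch. The sleep function is injectable, so you can test the timing logic without actually waiting.

```python
import time
from itertools import islice


def batches(urls, size=20):
    """Yield consecutive lists of at most `size` URLs (20 matches gop in the script)."""
    iterator = iter(urls)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def wait_for_batch(get_statuses, sleep=time.sleep, interval=10, max_wait=50):
    """Poll until every status is 'Complete' or `max_wait` seconds of sleeping have passed.

    Returns True if the batch completed and False if it timed out.
    """
    waited = 0
    while True:
        if all(status == "Complete" for status in get_statuses()):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    urls = [f"http://example.com/file{i}.jpg" for i in range(45)]
    print([len(b) for b in batches(urls)])  # [20, 20, 5]
```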

When to use this script

Use this script when:

  • The number of URLs to refresh or prefetch is large, and submitting them manually is too time-consuming.

  • You want to automate batch submissions without writing custom integration code.

  • You need automatic status polling instead of manually checking whether tasks are complete.

Quotas and limits

Python version: The script requires Python 3.x. To check your version:

python --version

or

python3 --version
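The script uses f-strings, which were introduced in Python 3.6. In practice, this makes 3.6 a minimum. Note that this floor is an assumption based on the script's syntax, not a documented SDK requirement. The following hypothetical helper makes the check explicit from inside Python:

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the interpreter's (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print(sys.version.split()[0], "OK" if meets_minimum() else "too old for Refresh.py")
```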

Daily quotas: The daily refresh and prefetch quotas apply to all submissions, including those from the script. Before running large batches, check your remaining quota in the Alibaba Cloud CDN console under Refresh and Prefetch. Before it submits any task, the script also compares the number of lines in the URL file with the remaining quota for the task type. If the quota is insufficient, the script exits with an error.

Quota type | Checked by script | Error message
URL refresh quota (UrlRemain) | Yes | UrlRemain is not enough
Directory refresh quota (DirRemain) | Yes | DirRemain is not enough
Prefetch quota (PreloadRemain) | Yes | PreloadRemain is not enough
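The quota check described above can be modeled as a small function that uses the field names from the table. This sketch makes two assumptions. First, quota values arrive as strings, which is why the script converts them with int(). Second, an omitted -o means File, which matches the script's documented default. `quota_error` is a hypothetical name.

```python
# Map (task type, object type) to the DescribeRefreshQuota field that is compared.
QUOTA_FIELDS = {
    ("clear", "File"): "UrlRemain",
    ("clear", "Directory"): "DirRemain",
    ("push", None): "PreloadRemain",
}


def quota_error(quota, task_type, object_type, url_count):
    """Return an error message if `url_count` exceeds the remaining quota, else None."""
    if task_type == "clear":
        key = ("clear", object_type or "File")  # -o defaults to File
    else:
        key = (task_type, None)
    field = QUOTA_FIELDS[key]
    remaining = int(quota[field])  # quota values may be strings, as in the script
    if url_count > remaining:
        return f"{field} is not enough {remaining}"
    return None
```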

Prerequisites

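Before you begin, make sure you have:

  • Python 3 installed. To check your version, see Quotas and limits.

  • An AccessKey pair for a RAM user that is authorized to call the CDN API operations the script uses: RefreshObjectCaches, PushObjectCache, DescribeRefreshTasks, and DescribeRefreshQuota.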

Important

Use a RAM user AccessKey pair instead of your root account credentials. Leaked root account credentials give attackers full access to all your Alibaba Cloud resources.

Step 1: Install dependencies

Install the Alibaba Cloud CDN software development kit (SDK) for Python:

pip install alibabacloud_cdn20180510
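To confirm that the installation worked, you can check that the packages the script imports in Step 3 can be found. This sketch uses only the standard library, and its package list is copied from the script's import block. The last three packages are expected to be installed as dependencies of alibabacloud_cdn20180510. If any of them is reported missing, install it with pip.

```python
import importlib.util

# Top-level packages imported by the script in Step 3.
SCRIPT_PACKAGES = (
    "alibabacloud_cdn20180510",
    "alibabacloud_credentials",
    "alibabacloud_tea_openapi",
    "alibabacloud_tea_util",
)


def missing_packages(names=SCRIPT_PACKAGES):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", missing if missing else "none")
```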

Step 2: Prepare a URL file

Create a plain text file (for example, urllist.txt) with one URL per line. Each URL must start with http:// or https://.

http://example.com/file1.jpg
http://example.com/file2.jpg
http://example.com/file3.jpg
Warning

URLs that do not start with http:// or https:// cause the script to exit with a format error. URLs that contain special characters must be URL-encoded before you add them to the file.
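Before you run the script, you can locate offending lines yourself. The following sketch applies the rule above: each line must start with http:// or https://. Unlike the script's error message, it also reports line numbers. Like the script, it treats blank lines as invalid. `invalid_lines` is a hypothetical helper.

```python
import re

# Each line must start with http:// or https://.
URL_PREFIX = re.compile(r"^https?://")


def invalid_lines(lines):
    """Return (line_number, text) pairs for lines that do not start with http:// or https://."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not URL_PREFIX.match(text):
            bad.append((number, text))
    return bad
```

For example, `with open("urllist.txt", encoding="utf-8") as f: print(invalid_lines(f))` prints an empty list when every line is valid.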

Step 3: Create the script

Save the following code as Refresh.py (or any filename you prefer).

#!/usr/bin/env python3
# coding=utf-8
# __author__ = 'aliyun.cdn'
# __date__ = '2025-08-15'

# SDK installation command: pip install alibabacloud_cdn20180510

'''Check Package'''
# Import the required libraries.
import re, sys, getopt, time, logging, os

try:
    from alibabacloud_cdn20180510.client import Client as Cdn20180510Client
    from alibabacloud_credentials.models import Config as CreConfig
    from alibabacloud_credentials.client import Client as CredentialClient
    from alibabacloud_tea_openapi.models import Config
    from alibabacloud_cdn20180510 import models as cdn_20180510_models
    from alibabacloud_tea_util import models as util_models

# Catch import exceptions.
except ImportError as e:
    sys.exit(f"[error] Please pip install alibabacloud_cdn20180510. Details: {e}")

# Initialize logging.
logging.basicConfig(level=logging.DEBUG, filename='./RefreshAndPredload.log')

# Define a global variable class to store information such as the AccessKey ID, AccessKey secret, and URL file path.
class Envariable(object):
    LISTS = []
    # For Endpoints, see https://api.aliyun.com/product/Cdn
    ENDPOINT = 'cdn.aliyuncs.com'
    AK = None
    SK = None
    FD = None
    CLI = None
    TASK_TYPE = None
    TASK_AREA = None
    TASK_OTYPE = None

    # Set the AccessKey ID.
    @staticmethod
    def set_ak():
        Envariable.AK = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')

    # Get the AccessKey ID.
    @staticmethod
    def get_ak():
        return Envariable.AK

    # Set the AccessKey secret.
    @staticmethod
    def set_sk():
        Envariable.SK = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET')

    # Get the AccessKey secret.
    @staticmethod
    def get_sk():
        return Envariable.SK

    # Set the URL file path.
    @staticmethod
    def set_fd(fd):
        Envariable.FD = fd

    # Get the URL file path.
    @staticmethod
    def get_fd():
        return Envariable.FD

    # Set the task type.
    @staticmethod
    def set_task_type(task_type):
        Envariable.TASK_TYPE = task_type

    # Get the task type.
    @staticmethod
    def get_task_type():
        return Envariable.TASK_TYPE

    # Set the task area.
    @staticmethod
    def set_task_area(task_area):
        Envariable.TASK_AREA = task_area

    # Get the task area.
    @staticmethod
    def get_task_area():
        return Envariable.TASK_AREA

    # Set the task object type.
    @staticmethod
    def set_task_otype(task_otype):
        Envariable.TASK_OTYPE = task_otype

    # Get the task object type.
    @staticmethod
    def get_task_otype():
        return Envariable.TASK_OTYPE

    # Create a new client.
    @staticmethod
    def set_acs_client():
        try:
            # Use the AccessKey pair to initialize the Credentials client.
            credentialsConfig = CreConfig(
                # Credential type.
                type='access_key',
                # Set to the AccessKey ID.
                access_key_id=Envariable.get_ak(),
                # Set to the AccessKey secret.
                access_key_secret=Envariable.get_sk(),
            )
            credentialClient = CredentialClient(credentialsConfig)

            cdnConfig = Config(credential=credentialClient)
            # Configure the service endpoint.
            cdnConfig.endpoint = Envariable.ENDPOINT
            # Initialize the CDN client.
            Envariable.CLI = Cdn20180510Client(cdnConfig)
        except Exception as e:
            logging.error(f"Failed to create client: {e}")
            raise

    # Get the client.
    @staticmethod
    def get_acs_client():
        return Envariable.CLI


# Module-level initializer function.
def initialize_credentials_and_client():
    """Initializes the AccessKey pair and client when the module is loaded."""
    try:
        # Initialize the AccessKey pair from environment variables.
        Envariable.set_ak()
        Envariable.set_sk()

        # Check whether the AccessKey pair is obtained.
        if not Envariable.get_ak() or not Envariable.get_sk():
            logging.warning("AK or SK not found in environment variables")
            return False

        # Initialize the client.
        Envariable.set_acs_client()
        logging.info("Credentials and client initialized successfully")
        return True
    except Exception as e:
        logging.error(f"Failed to initialize credentials and client: {e}")
        return False


# Run initialization when the module is loaded.
_initialization_success = initialize_credentials_and_client()




class BaseCheck(object):
    def __init__(self):
        self.invalidurl = ''
        self.lines = 0
        self.urllist = Envariable.get_fd()

    # Check the quota.
    def printQuota(self):
        try:
            client = Envariable.get_acs_client()
            if not client:
                raise Exception("CDN client not initialized")

            # Use the SDK to make the call.
            request = cdn_20180510_models.DescribeRefreshQuotaRequest()
            runtime = util_models.RuntimeOptions()
            response = client.describe_refresh_quota_with_options(request, runtime)
            quotaResp = response.body.to_map()
        except Exception as e:
            logging.error(f"Failed to query the refresh and prefetch quota: {e}")
            sys.exit(f"\n[error]: Failed to query the refresh and prefetch quota: {e}\n")

        if Envariable.TASK_TYPE == 'push':
            if self.lines > int(quotaResp['PreloadRemain']):
                sys.exit("\n[error]:PreloadRemain is not enough {0}".format(quotaResp['PreloadRemain']))
            return True
        if Envariable.TASK_TYPE == 'clear':
            # An omitted -o means File (the default), so check UrlRemain in that case too.
            otype = Envariable.get_task_otype() or 'File'
            if otype == 'File' and self.lines > int(quotaResp['UrlRemain']):
                sys.exit("\n[error]:UrlRemain is not enough {0}".format(quotaResp['UrlRemain']))
            if otype == 'Directory' and self.lines > int(quotaResp['DirRemain']):
                sys.exit("\n[error]:DirRemain is not enough {0}".format(quotaResp['DirRemain']))
            return True
        return False

    # Verify the URL format.
    def urlFormat(self):
        try:
            with open(self.urllist, "r") as f:
                for line in f.readlines():
                    self.lines += 1
                    if not re.match(r'^https?://', line):  # Must start with http:// or https://.
                        self.invalidurl = line + '\n' + self.invalidurl
                if self.invalidurl != '':
                    sys.exit("\n[error]: URL format is illegal \n{0}".format(self.invalidurl))
                return True
        except FileNotFoundError:
            sys.exit(f"\n[error]: File not found: {self.urllist}\n")
        except Exception as e:
            sys.exit(f"\n[error]: Failed to read file {self.urllist}: {e}\n")

# Task class: splits the URL list into batches of a specified size, then submits each batch and polls its status.
class doTask(object):
    @staticmethod
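    def urlencode_pl(inputs_str):
        # Keep letters, digits, and : / . - _ *; turn spaces into '+'; percent-encode the rest.
        if not inputs_str:
            return ""
        result_end = ""
        for chs in inputs_str:
            if chs.isalnum() or chs in {":", "/", ".", "-", "_", "*"}:
                result_end += chs
            elif chs == ' ':
                result_end += '+'
            else:
                # Encode each UTF-8 byte so that multi-byte characters yield valid %XX sequences.
                result_end += ''.join(f'%{b:02X}' for b in chs.encode('utf-8'))
        return result_end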

    # Process URLs in batches.
    @staticmethod
    def doProd():
        gop = 20  # Defines the maximum number of URLs per batch.
        mins = 1
        maxs = gop
        current_batch = []  # Use a local variable instead of a global variable.

        try:
            with open(Envariable.get_fd(), "r") as f:
                for line in f.readlines():
                    line = doTask.urlencode_pl(line.strip()) + "\n"
                    current_batch.append(line)
                    if mins >= maxs:
                        yield current_batch
                        current_batch = []
                        mins = 1
                    else:
                        mins += 1
            if current_batch:
                yield current_batch
        except FileNotFoundError:
            sys.exit(f"\n[error]: File not found: {Envariable.get_fd()}\n")
        except Exception as e:
            sys.exit(f"\n[error]: Failed to read file {Envariable.get_fd()}: {e}\n")

    # Run the refresh or prefetch task.
    @staticmethod
    def doRefresh(lists):
        try:
            client = Envariable.get_acs_client()
            if not client:
                raise Exception("CDN client not initialized")

            runtime = util_models.RuntimeOptions()
            taskID = None
            response_data = None

            if Envariable.get_task_type() == 'clear':
                taskID = 'RefreshTaskId'
                request = cdn_20180510_models.RefreshObjectCachesRequest()
                if Envariable.get_task_otype():
                    request.object_type = Envariable.get_task_otype()
                request.object_path = lists
                response = client.refresh_object_caches_with_options(request, runtime)
                response_data = response.body.to_map()
            elif Envariable.get_task_type() == 'push':
                taskID = 'PushTaskId'
                request = cdn_20180510_models.PushObjectCacheRequest()
                if Envariable.get_task_area():
                    request.area = Envariable.get_task_area()
                request.object_path = lists
                response = client.push_object_cache_with_options(request, runtime)
                response_data = response.body.to_map()

            if response_data and taskID:
                print(response_data)

                timeout = 0
                while True:
                    count = 0
                    # Use the SDK to query the task status.
                    taskreq = cdn_20180510_models.DescribeRefreshTasksRequest()
                    taskreq.task_id = response_data[taskID]
                    taskresp = client.describe_refresh_tasks_with_options(taskreq, runtime)
                    taskresp_data = taskresp.body.to_map()
                    print(f"[{response_data[taskID]}] is doing... ...")

                    for t in taskresp_data['Tasks']['CDNTask']:
                        if t['Status'] != 'Complete':
                            count += 1
                    if count == 0:
                        logging.info(f"[{response_data[taskID]}] is finish")
                        break
                    elif timeout >= 5:  # Wait for a maximum of 50 seconds (5 x 10 seconds).
                        logging.info(f"[{response_data[taskID]}] timeout after 50 seconds")
                        break
                    else:
                        timeout += 1
                        time.sleep(10)  # Check the status every 10 seconds.
                        continue
        except Exception as e:
            logging.error(f"\n[error]: {e}")
            sys.exit(1)


class Refresh(object):
    def main(self, argv):
        if len(argv) < 1:
            sys.exit(f"\n[usage]: {sys.argv[0]} -h ")
        try:
            opts, args = getopt.getopt(argv, "hr:t:a:o:")
        except getopt.GetoptError as e:
            sys.exit(f"\n[usage]: {sys.argv[0]} -h ")

        for opt, arg in opts:
            if opt == '-h':
                self.help()
                sys.exit()
            elif opt == '-r':
                Envariable.set_fd(arg)
            elif opt == '-t':
                Envariable.set_task_type(arg)
            elif opt == '-a':
                Envariable.set_task_area(arg)
            elif opt == '-o':
                Envariable.set_task_otype(arg)
            else:
                sys.exit(f"\n[usage]: {sys.argv[0]} -h ")

        # Check the initialization status only when it is not a help command.
        if not _initialization_success:
            sys.exit("\n[error]: Failed to initialize credentials and client. Please check environment variables.\n")

        try:
            if not (Envariable.get_ak() and Envariable.get_sk() and Envariable.get_fd() and Envariable.get_task_type()):
                sys.exit("\n[error]: Must set environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET, and parameters '-r', '-t'\n")
            if Envariable.get_task_type() not in {"push", "clear"}:
                sys.exit("\n[error]: taskType Error, '-t' option in 'push' or 'clear'\n")
            if Envariable.get_task_area() and Envariable.get_task_otype():
                sys.exit("\n[error]: -a and -o cannot exist at same time\n")
            if Envariable.get_task_area():
                if Envariable.get_task_area() not in {"domestic", "overseas"}:
                    sys.exit("\n[error]: Area value Error, '-a' option in 'domestic' or 'overseas'\n")
                if Envariable.get_task_type() == 'clear':
                    sys.exit("\n[error]: '-a' can only be used with '-t push'\n")
            if Envariable.get_task_otype():
                if Envariable.get_task_otype() not in {"File", "Directory"}:
                    sys.exit("\n[error]: ObjectType value Error, '-o' option in 'File' or 'Directory'\n")
                if Envariable.get_task_type() == 'push':
                    sys.exit("\n[error]: '-o' can only be used with '-t clear'\n")
        except Exception as e:
            logging.error(f"\n[error]: Parameter {e} error\n")
            sys.exit(1)

        handler = BaseCheck()
        if handler.urlFormat() and handler.printQuota():
            for g in doTask.doProd():
                doTask.doRefresh(''.join(g))
                time.sleep(1)

    def help(self):
        print("\nscript options explain: \
                    \n\t -r <filename>                   The file path and file name. After the script runs, it reads the URLs from the file. Each line must contain one URL. URLs with special characters must be URL-encoded. Each URL must start with http or https. \
                    \n\t -t <taskType>                   The task type. `clear`: refresh. `push`: prefetch. \
                    \n\t -a [String,<domestic|overseas>] Optional. The prefetch scope. If you do not set this parameter, resources are prefetched globally.\
                    \n\t    domestic                     The Chinese mainland only. \
                    \n\t    overseas                     Global (excluding the Chinese mainland). \
                    \n\t -o [String,<File|Directory>]    Optional. The type of content to refresh. \
                    \n\t    File                         File (default). \
                    \n\t    Directory                    Directory.")


if __name__ == '__main__':
    fun = Refresh()
    fun.main(sys.argv[1:])

Script parameters

  • -r <filename> (required): Path to the URL file. Each line must contain one URL. URLs with special characters must be URL-encoded.

  • -t <taskType> (required): Task type. clear runs a refresh. push runs a prefetch.

  • -a <domestic|overseas> (optional): Prefetch scope. Only valid with -t push. domestic: the Chinese mainland only. overseas: global, excluding the Chinese mainland. If omitted, the script prefetches globally.

  • -o <File|Directory> (optional): Object type for refresh. Only valid with -t clear. File (default) or Directory.
Warning

-a and -o cannot be used together. -a is only valid with -t push. -o is only valid with -t clear.
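You can express the option rules above as a small validation function. This is an illustrative sketch with a hypothetical name, `option_errors`. The script stops at the first violation it finds. This sketch instead collects every violated rule.

```python
def option_errors(task_type, area=None, object_type=None):
    """Return a list of rule violations for one combination of -t, -a, and -o."""
    errors = []
    if task_type not in {"clear", "push"}:
        errors.append("-t must be clear or push")
    if area and object_type:
        errors.append("-a and -o cannot be used together")
    if area:
        if area not in {"domestic", "overseas"}:
            errors.append("-a must be domestic or overseas")
        if task_type != "push":
            errors.append("-a is only valid with -t push")
    if object_type:
        if object_type not in {"File", "Directory"}:
            errors.append("-o must be File or Directory")
        if task_type != "clear":
            errors.append("-o is only valid with -t clear")
    return errors
```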

To adjust the batch size, change the value of the gop variable in the script. The default is 20 URLs per batch.

View help

Run the following command to display all parameter descriptions:

python Refresh.py -h

Output:

script options explain:
      -r <filename>                   The file path and file name. After the script runs, it reads the URLs from the file. Each line must contain one URL. URLs with special characters must be URL-encoded. Each URL must start with http or https.
      -t <taskType>                   The task type. `clear`: refresh. `push`: prefetch.
      -a [String,<domestic|overseas>] Optional. The prefetch scope. If you do not set this parameter, resources are prefetched globally.
         domestic                     The Chinese mainland only.
         overseas                     Global (excluding the Chinese mainland).
      -o [String,<File|Directory>]    Optional. The type of content to refresh.
         File                         File (default).
         Directory                    Directory.

Step 4: Set your AccessKey as environment variables

The script reads credentials from the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET. For setup instructions, see Configure environment variables in Linux, macOS, and Windows.

Important

On Linux and macOS, environment variables set with export are valid only for the current terminal session. To make them persistent, add the export commands to your shell's startup file (for example, ~/.bashrc or ~/.zshrc).
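To confirm that Python can see both variables in the current session before you run the script, you can use a check like the following sketch. `missing_credentials` is a hypothetical helper. It reads the same variable names the script reads and never prints the secret values.

```python
import os

# The same variable names the script reads in Envariable.set_ak() and set_sk().
REQUIRED_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    print("Missing: " + ", ".join(missing) if missing else "Both AccessKey variables are set.")
```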

Step 5: Run the script

Open a terminal (Command Prompt, PowerShell, or Terminal) and run:

python Refresh.py -r <PathToUrlFile> -t <TaskType>

Replace <PathToUrlFile> with the path to your URL file and <TaskType> with clear (refresh) or push (prefetch). If python starts Python 2 on your system, use python3 instead.
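To start the script from another Python program, such as a scheduled job, you can build the same command line and pass it to subprocess. This is a sketch. `build_command` and `run_script` are hypothetical helpers, and Refresh.py is assumed to be in the current working directory.

```python
import subprocess
import sys


def build_command(url_file, task_type, area=None, object_type=None, script="Refresh.py"):
    """Return the argument list for one run of the script."""
    # sys.executable runs the script with the same interpreter as this program.
    command = [sys.executable, script, "-r", url_file, "-t", task_type]
    if area:
        command += ["-a", area]
    if object_type:
        command += ["-o", object_type]
    return command


def run_script(*args, **kwargs):
    """Run the script; raises CalledProcessError if it exits with a non-zero status."""
    # The child process inherits this process's environment, including the AccessKey variables.
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

For example, `run_script("urllist.txt", "clear")` is equivalent to the refresh command shown below.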

Refresh cached files

If urllist.txt is in the same directory as Refresh.py:

python Refresh.py -r urllist.txt -t clear

If the URL file is in a different directory:

python Refresh.py -r D:\example\filename\urllist.txt -t clear

Expected output:

{'RequestId': 'C1686DCA-F3B5-5575-ADD1-05F96617D770', 'RefreshTaskId': '18392588710'}
[18392588710] is doing... ...
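The first output line is the API response, printed as a Python dictionary. If you capture the output, for example to record task IDs, you can extract the ID with ast.literal_eval, as in the following sketch. The sketch assumes the line has exactly the format shown above. For prefetch tasks, the script reads the ID from the PushTaskId key instead of RefreshTaskId, so the helper checks both keys. `task_id_from_output` is a hypothetical name.

```python
import ast


def task_id_from_output(line):
    """Return the task ID from a printed response line, or None if the line is not one."""
    try:
        response = ast.literal_eval(line.strip())
    except (ValueError, SyntaxError):
        return None
    if not isinstance(response, dict):
        return None
    for key in ("RefreshTaskId", "PushTaskId"):
        if key in response:
            return response[key]
    return None
```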

Prefetch content

If urllist.txt is in the same directory as Refresh.py:

python Refresh.py -r urllist.txt -t push

If the URL file is in a different directory:

python Refresh.py -r D:\example\filename\urllist.txt -t push

Expected output:

{'RequestId': 'C1686DCA-F3B5-5575-ADD1-05F96617D771', 'PushTaskId': '18392588711'}
[18392588711] is doing... ...
If the script returns "[error]: Failed to initialize credentials and client. Please check environment variables.", set the AccessKey environment variables as described in Step 4 and run the command again in the same terminal window.