Mirror back-to-origin

To migrate data from a self-hosted origin server or a third-party cloud storage service to Alibaba Cloud Object Storage Service (OSS) without service disruption, you can configure mirroring-based back-to-origin. When a client requests an object that does not exist in OSS, OSS automatically fetches the object from your specified origin server, returns it to the client, and stores it in your bucket. All data therefore remains accessible while the migration is in progress.

How it works

The mirroring-based back-to-origin feature works as a server-side proxy. When a client sends a GET request for an object that does not exist in an OSS bucket, OSS checks whether the request matches the conditions of a back-to-origin rule, for example an object name prefix combined with an HTTP 404 result. If a rule is triggered, OSS sends an HTTP request to the specified origin server to fetch the object. If the origin server returns a 200 OK status code, OSS returns the object to the client and simultaneously stores it in the bucket. If the origin server returns a 404 Not Found or another error status code, OSS returns the corresponding error to the client. In this process, OSS acts as a proxy, enabling on-demand data migration and one-time caching. Note that once an object is stored in OSS, it is not automatically updated even if the source object on the origin server changes.
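The flow described above can be sketched as a small in-memory simulation. This is an illustration only, not how OSS is implemented: the `mirror_get` function, the dictionary that stands in for a bucket, and the `fake_origin` callback are all hypothetical names, and the error mapping is simplified to passing the origin status through.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Optional[bytes]]


def mirror_get(
    key: str,
    bucket: Dict[str, bytes],
    rule_prefix: str,
    fetch_from_origin: Callable[[str], Response],
) -> Response:
    """Return (status, body) for a GET request, mimicking the described flow."""
    # 1. The object already exists in the bucket: serve it, never contact the origin.
    if key in bucket:
        return 200, bucket[key]
    # 2. The object is missing and no rule matches: plain 404.
    if not key.startswith(rule_prefix):
        return 404, None
    # 3. A rule matches: ask the origin server for the object.
    status, body = fetch_from_origin(key)
    if status == 200 and body is not None:
        bucket[key] = body  # stored once; later origin changes are not picked up
        return 200, body
    # 4. The origin failed: hand an error back (simplified: same status code).
    return status, None


# Hypothetical origin content used only for this demonstration.
origin = {"examplefolder/example.txt": b"v1"}


def fake_origin(key: str) -> Response:
    return (200, origin[key]) if key in origin else (404, None)


bucket: Dict[str, bytes] = {}
first = mirror_get("examplefolder/example.txt", bucket, "examplefolder/", fake_origin)
origin["examplefolder/example.txt"] = b"v2"  # the source changes afterwards
second = mirror_get("examplefolder/example.txt", bucket, "examplefolder/", fake_origin)
print(first, second)  # both requests return the first fetched version
```

The second request returns the stored copy even though the origin content changed, which is the one-time caching behavior noted above.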

Fetch missing objects from a website

This is the most basic scenario for configuring mirroring-based back-to-origin. When a client requests an object that does not exist in OSS, OSS automatically fetches it from a specified origin server and stores it in the bucket. This example shows how to configure a rule to fetch objects from https://example.com/ when a requested object is not found in the examplefolder/ directory of the examplebucket bucket.

Step 1: Configure a mirror back-to-origin rule

  1. Go to the Buckets page and click the name of the target bucket.

  2. In the left-side navigation pane, choose Data Management > Mirroring-based Back-to-origin.

  3. On the Mirroring-based Back-to-origin page, click Create Rule.

  4. In the Create Rule panel, configure the parameters. Use the default values for any other parameters.

    Parameter

    Configuration

    Method

    Select Image.

    Condition

    Select Object Name Prefix and enter examplefolder/ in the text box.

    Origin URL

    In the first column (Protocol), select https. In the second column (Domain Name), enter example.com. Leave the third column (Path Prefix) empty. The path prefix is appended to the domain name to form the path of the origin URL.

  5. Click OK.

Step 2: Verify the rule

  1. Access https://examplebucket.oss-cn-hangzhou.aliyuncs.com/examplefolder/example.txt.

  2. If the examplefolder/example.txt object does not exist in the examplebucket bucket, OSS requests the object from https://example.com/examplefolder/example.txt.

  3. After fetching the object, OSS saves it as examplefolder/example.txt in the examplebucket bucket and returns the object to the client.
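To make the URL mapping in this example concrete, the following sketch joins the three Origin URL columns (protocol, domain name, path prefix) with the requested object name. The helper name `origin_url` is hypothetical, and the joining logic is an assumption based on the description of the path prefix above.

```python
def origin_url(protocol: str, domain: str, path_prefix: str, object_key: str) -> str:
    """Build the URL that would be requested from the origin for a given object key."""
    prefix = path_prefix.strip("/")
    # An empty path prefix means the object key is used as the full path.
    path = f"{prefix}/{object_key}" if prefix else object_key
    return f"{protocol}://{domain}/{path}"


# Values from the example: https, example.com, empty path prefix.
print(origin_url("https", "example.com", "", "examplefolder/example.txt"))
# https://example.com/examplefolder/example.txt
```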

Replace directory and verify integrity

In some scenarios, the directory structure in your OSS bucket may differ from that of your origin server. You may also need to ensure the integrity of the objects fetched from the origin server. This use case shows how to map directories and use MD5 verification to ensure reliable data transfer.

  • When a client requests an object that does not exist in the examplefolder directory of the bucket-01 bucket in the China (Hangzhou) region, OSS fetches the object from the destfolder directory of the https://example.com website.

  • OSS verifies the MD5 hash of the fetched object. Objects with a mismatched MD5 hash are not saved to the bucket-01 bucket.

Step 1: Configure a mirror back-to-origin rule

  1. Go to the Buckets page and click the name of the target bucket.

  2. In the left-side navigation pane, choose Data Management > Mirroring-based Back-to-origin.

  3. On the Mirroring-based Back-to-origin page, click Create Rule.

  4. In the Create Rule panel, configure the required parameters as described in the following table. Use the default values for other parameters.

    Parameter

    Configuration

    Method

    Select Image.

    Condition

    Select Object Name Prefix and set it to examplefolder/.

    Replace or Delete File Prefix

    Select Replace or Delete File Prefix and set it to destfolder/.

    Note

    This option is displayed only after you set Object Name Prefix for the back-to-origin condition.

    Origin URL

    Set the first column to https, the second column to example.com, and leave the third column empty.

    MD5 Verification

    Select Perform MD5 verification. If the response to the back-to-origin request contains the Content-MD5 header, OSS verifies whether the MD5 hash of the fetched object matches the value of the Content-MD5 header.

    • If the values match, the client receives the object, and OSS saves the object.

    • If the values do not match, the client still receives the object, but OSS does not save it. This is because calculating the MD5 hash requires the complete object data, and at that point, the object has already been streamed to the client.

  5. Click OK.

Step 2: Verify the rule

  1. Access https://bucket-01.oss-cn-hangzhou.aliyuncs.com/examplefolder/example.txt.

  2. If the examplefolder/example.txt object does not exist in the bucket-01 bucket, OSS requests the object from https://example.com/destfolder/example.txt.

  3. After fetching the object, OSS performs the following operations:

    • If the response to the back-to-origin request contains the Content-MD5 header, OSS calculates the MD5 hash of the fetched object and compares it with the value of the Content-MD5 header. If the values match, OSS saves the object as examplefolder/example.txt to the bucket-01 bucket and returns the object to the client. If the values do not match, OSS returns the object to the client but does not save it to the bucket-01 bucket.

    • If the response to the back-to-origin request does not contain the Content-MD5 header, OSS saves the object as examplefolder/example.txt to the bucket-01 bucket and returns the object to the client.
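The prefix replacement and the MD5 decision in this example can be sketched as follows. The function names are hypothetical, and the sketch assumes that the Content-MD5 header carries the Base64-encoded 128-bit MD5 digest, which is the standard encoding for that header.

```python
import base64
import hashlib
from typing import Optional


def replace_prefix(key: str, match_prefix: str, replacement: str) -> str:
    """Map an object name in the bucket to the path requested from the origin."""
    if not key.startswith(match_prefix):
        raise ValueError(f"{key!r} does not start with {match_prefix!r}")
    return replacement + key[len(match_prefix):]


def should_save(body: bytes, content_md5: Optional[str]) -> bool:
    """Decide whether the fetched object would be saved to the bucket."""
    if content_md5 is None:
        return True  # no Content-MD5 header: nothing to verify, object is saved
    digest = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return digest == content_md5.strip()


print(replace_prefix("examplefolder/example.txt", "examplefolder/", "destfolder/"))
# destfolder/example.txt

body = b"example"
good = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
print(should_save(body, good), should_save(body, "AAAAAAAAAAAAAAAAAAAAAA=="), should_save(body, None))
```

In every case the client still receives the object; only the save decision changes.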

Route requests based on directory

If your business involves multiple origin servers, you can route requests to different servers based on the requested object path. This scenario is useful for consolidating data from multiple sources or migrating from a distributed storage architecture. For example, you have two origin servers with identical directory structures, Origin Server A (https://example.com) and Origin Server B (https://example.org), and you want to implement the following behavior:

  • When a client requests an object that does not exist in the dir1 directory of the bucket-02 bucket in the China (Beijing) region, OSS fetches the object from the example1 directory of the https://example.com website.

  • When a client requests an object that does not exist in the dir2 directory of the bucket-02 bucket, OSS fetches the object from the example2 directory of the https://example.org website.

  • Depending on whether redirect policies are configured on Origin Server A and Origin Server B, OSS decides whether to request the object from the redirected address.

Step 1: Configure mirror back-to-origin rules

  1. Go to the Buckets page and click the name of the target bucket.

  2. In the left-side navigation pane, choose Data Management > Mirroring-based Back-to-origin.

  3. On the Mirroring-based Back-to-origin page, click Create Rule.

  4. In the Create Rule panel, configure two mirroring-based back-to-origin rules as described below. Use the default values for any other parameters.

    • Rule 1

      Parameter

      Configuration

      Method

      Select Image.

      Condition

      Select Object Name Prefix and set it to dir1/.

      Replace or Delete File Prefix

      Select Replace or Delete File Prefix and set it to example1/.

      Note

      This option is displayed only after you set Object Name Prefix for the back-to-origin condition.

      Origin URL

      Set the first column to https, the second column to example.com, and leave the third column empty.

      3xx Response

      Select Follow Origin to Redirect Request.

      Note

      If Follow Origin to Redirect Request is not selected, OSS returns the redirect address specified by the origin server directly to the client.

    • Rule 2

      Parameter

      Configuration

      Method

      Select Image.

      Condition

      Select Object Name Prefix and set it to dir2/.

      Replace or Delete File Prefix

      Select Replace or Delete File Prefix and set it to example2/.

      Note

      This option is displayed only after you set Object Name Prefix for the back-to-origin condition.

      Origin URL

      Set the first column to https, the second column to example.org, and leave the third column empty.

      3xx Response

      Select Follow Origin to Redirect Request.

  5. Click OK.

Step 2: Verify the rules

  1. Access https://bucket-02.oss-cn-beijing.aliyuncs.com/dir1/example.txt.

  2. If the example.txt object does not exist in the dir1 directory of the bucket-02 bucket, OSS sends a request for the object to https://example.com/example1/example.txt.

    • If Origin Server A has a redirect rule for example1/example.txt, OSS sends a new request to the redirected address. After fetching the object, OSS saves it as dir1/example.txt to the bucket-02 bucket and returns it to the client.

    • If Origin Server A does not have a redirect rule for example1/example.txt, OSS fetches the object, saves it as dir1/example.txt to the bucket-02 bucket, and returns it to the client.

  3. If a client requests https://bucket-02.oss-cn-beijing.aliyuncs.com/dir2/example.txt, the object fetched through the mirroring-based back-to-origin rule is stored as dir2/example.txt in the bucket-02 bucket.
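The two rules in this example can be modeled as a first-match routing table. This is a sketch under stated assumptions: the `MirrorRule` class and the `route` function are hypothetical, and the matching order (ascending rule number, first match wins) follows the rule-ordering behavior this document describes elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MirrorRule:
    rule_number: int
    key_prefix: str           # Object Name Prefix condition
    replace_prefix_with: str  # Replace or Delete File Prefix value
    origin: str               # protocol and domain of the origin server


RULES = [
    MirrorRule(1, "dir1/", "example1/", "https://example.com"),
    MirrorRule(2, "dir2/", "example2/", "https://example.org"),
]


def route(key: str, rules: List[MirrorRule]) -> Optional[str]:
    """Return the origin URL for a missing object, or None if no rule matches."""
    for rule in sorted(rules, key=lambda r: r.rule_number):
        if key.startswith(rule.key_prefix):
            suffix = key[len(rule.key_prefix):]
            return f"{rule.origin}/{rule.replace_prefix_with}{suffix}"
    return None


print(route("dir1/example.txt", RULES))  # https://example.com/example1/example.txt
print(route("dir2/example.txt", RULES))  # https://example.org/example2/example.txt
```

The object is always stored under the originally requested name (for example, dir1/example.txt), regardless of the path used on the origin server.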

Fetch from a private bucket and forward parameters

When your origin server is a private OSS bucket, you must configure the necessary access permissions. You may also need to forward specific parameters from the client request to the origin server. This use case shows how to configure back-to-origin for a private OSS bucket and forward parameters. For example, you have two buckets in the China (Shanghai) region: bucket-03 (public-read) and bucket-04 (private). You want to implement the following behavior:

  • When a client requests an object that does not exist in the examplefolder directory of the bucket-03 bucket, OSS fetches the object from the examplefolder directory of the bucket-04 bucket.

  • OSS passes the query string from the request URL to the origin server.

  • OSS passes the HTTP headers header1, header2, and header3 from the request to the origin server.

Step 1: Configure a mirror back-to-origin rule

  1. Go to the Buckets page and click the name of the target bucket.

  2. In the left-side navigation pane, choose Data Management > Mirroring-based Back-to-origin.

  3. On the Mirroring-based Back-to-origin page, click Create Rule.

  4. In the Create Rule panel, configure the required parameters as described in the following table. Use the default values for any other parameters.

    Parameter

    Configuration

    Method

    Select Image.

    Condition

    Select Object Name Prefix and set it to examplefolder/.

    Origin Type

    Select Back-to-origin to Private OSS Bucket, and then select bucket-04 from the Source Bucket drop-down list.

    After this option is configured, when a client requests an object that does not exist, OSS uses the default role AliyunOSSMirrorDefaultRole to fetch the data from the specified private origin bucket. This requires the AliyunOSSReadOnlyAccess permission, which ensures that OSS can only access the origin data in read-only mode and cannot modify or delete it.

    To configure mirroring-based back-to-origin for a private OSS bucket, a RAM user must have the ram:GetRole permission. This permission is used to check if the AliyunOSSMirrorDefaultRole role exists.

    • If the role exists, it is used directly.

    • If the role does not exist, we recommend that you use the primary Alibaba Cloud account associated with the RAM user to create the AliyunOSSMirrorDefaultRole role in advance and grant it the AliyunOSSReadOnlyAccess permission. This practice avoids granting high-risk permissions, such as creating roles (ram:CreateRole) and attaching policies to roles (ram:AttachPolicyToRole), to the RAM user. After the role is authorized, the RAM user can reuse the existing role, which reduces permission configuration risks.

    Origin URL

    Set the first column to https and leave the other fields empty.

    Origin Parameter

    Select Transfer with Query String.

    OSS passes the query string from the request URL to the origin server.

    Set Transmission Rule of HTTP Header

    Select Transmit Specific HTTP Headers and add the HTTP headers header1, header2, and header3. Back-to-origin rules do not support forwarding certain standard HTTP headers, such as authorization, authorization2, range, content-length, and date, or any headers that start with x-oss-, oss-, or x-drs-.

    Important

    When fetching from a private bucket, do not select the option to forward all HTTP headers. This causes the back-to-origin fetch to fail.

  5. Click OK.

Step 2: Verify the rule

  1. Access https://bucket-03.oss-cn-shanghai.aliyuncs.com/examplefolder/example.png?caller=lucas&production=oss.

  2. If the examplefolder/example.png object does not exist in the bucket-03 bucket, OSS sends a request for the object to https://bucket-04.oss-cn-shanghai.aliyuncs.com/examplefolder/example.png?caller=lucas&production=oss.

  3. The bucket-04 bucket returns the example.png object to OSS based on the forwarded ?caller=lucas&production=oss parameters.

  4. OSS saves the fetched object as examplefolder/example.png in the bucket-03 bucket.

If the request also carries the header1, header2, and header3 HTTP headers, OSS also passes them to the bucket-04 bucket.
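The forwarding behavior in this example can be approximated with two small helpers. Both function names are hypothetical. The blocked header names and prefixes are the ones listed in the rule description above, which presents them as examples, so the real list may be longer.

```python
from typing import Dict, Iterable
from urllib.parse import urlsplit

# Headers that the rule description says cannot be forwarded (not necessarily exhaustive).
BLOCKED_NAMES = {"authorization", "authorization2", "range", "content-length", "date"}
BLOCKED_PREFIXES = ("x-oss-", "oss-", "x-drs-")


def headers_to_forward(request_headers: Dict[str, str], allowed: Iterable[str]) -> Dict[str, str]:
    """Keep only the explicitly listed headers that are also permitted."""
    allowed_lower = {name.lower() for name in allowed}
    forwarded = {}
    for name, value in request_headers.items():
        lower = name.lower()
        if lower not in allowed_lower:
            continue
        if lower in BLOCKED_NAMES or lower.startswith(BLOCKED_PREFIXES):
            continue
        forwarded[name] = value
    return forwarded


def origin_request_url(client_url: str, origin_base: str) -> str:
    """Carry the path and query string of the client request over to the origin."""
    parts = urlsplit(client_url)
    url = f"{origin_base}{parts.path}"
    return f"{url}?{parts.query}" if parts.query else url


client_url = (
    "https://bucket-03.oss-cn-shanghai.aliyuncs.com"
    "/examplefolder/example.png?caller=lucas&production=oss"
)
print(origin_request_url(client_url, "https://bucket-04.oss-cn-shanghai.aliyuncs.com"))
print(headers_to_forward(
    {"header1": "a", "Authorization": "secret", "header9": "z"},
    ["header1", "header2", "header3", "authorization"],
))
```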

Production use cases

Seamless data migration

For more information about the migration solution, see Seamlessly migrate services to Alibaba Cloud OSS by using mirroring-based back-to-origin.

Refresh cached objects

Mirroring-based back-to-origin is a one-time caching mechanism. If an object on the origin server is updated, OSS does not automatically refresh or re-fetch it. You can use the following methods to manually refresh cached objects.

  • Manual deletion: Delete the object from the OSS bucket by using the console or an API. The next time the object is accessed, the back-to-origin rule is triggered again.

  • Lifecycle rules: Configure an expiration policy for the mirrored objects. They are automatically deleted after a specified period, enabling periodic refreshes.

  • Object name versioning: When you update an object on the origin server, use a new name (for example, style.v2.css). This is the recommended approach to avoid caching issues.

Risk prevention and fault tolerance

  • Origin server load: Ensure that your origin server has sufficient bandwidth and processing capacity to handle back-to-origin requests. During the initial migration phase, the volume of back-to-origin requests may be high. We recommend that you monitor the load on your origin server and consider pre-warming the data during off-peak hours.

  • Cost control: To avoid unexpected high costs, we recommend that you set up cost alerts in the Alibaba Cloud Management Console to monitor the volume of back-to-origin requests.

  • Security configuration: Ensure that your origin server is accessible to OSS. If the origin URL uses the HTTPS protocol, make sure the origin server's certificate is issued by a trusted Certificate Authority (CA), the domain name matches, and the certificate has not expired.

  • Log query: Use the real-time log query feature to view logs related to back-to-origin. The User-Agent for back-to-origin requests contains the string aliyun-oss-mirror.
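As a rough aid for monitoring origin load, the sketch below counts back-to-origin requests per path in already-parsed access-log records. The record layout (dictionaries with `path` and `user_agent` keys) is an assumption made for illustration; adapt it to whatever log format your origin server or log query actually produces. Only the substring aliyun-oss-mirror comes from this document.

```python
from collections import Counter
from typing import Dict, Iterable

MIRROR_UA_MARKER = "aliyun-oss-mirror"


def count_mirror_requests(records: Iterable[Dict[str, str]]) -> Counter:
    """Count requests per path whose User-Agent contains the mirror marker."""
    counts: Counter = Counter()
    for record in records:
        if MIRROR_UA_MARKER in record.get("user_agent", ""):
            counts[record.get("path", "")] += 1
    return counts


# Hypothetical sample records.
sample = [
    {"path": "/examplefolder/a.txt", "user_agent": "aliyun-oss-mirror"},
    {"path": "/examplefolder/a.txt", "user_agent": "Mozilla/5.0"},
    {"path": "/examplefolder/b.txt", "user_agent": "prefix aliyun-oss-mirror suffix"},
]
print(count_mirror_requests(sample))
```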

Quotas and limits

  • Number and order of rules: You can configure up to 20 back-to-origin rules for each bucket. Rules are matched in ascending order of their RuleNumber. Once a rule is matched, it is executed, and subsequent rules are not checked. You can adjust the matching priority by using the Up or Down options next to a rule.

  • QPS and bandwidth:

    • Regions in the Chinese mainland: The default total QPS is 2,000, and the total bandwidth is 2 Gbit/s.

    • Regions outside the Chinese mainland: The default total QPS is 1,000, and the total bandwidth is 1 Gbit/s.

    • This limit applies to the total mirroring-based back-to-origin capacity for all buckets that belong to a single Alibaba Cloud account in the corresponding region. Requests that exceed this limit are throttled, and a 503 error is returned. To request a higher quota, contact Technical Support.

  • Origin server address: The address must be a publicly accessible domain name or IP address that complies with RFC 3986 encoding standards. Internal network addresses are not supported.

  • Timeout: The default timeout for mirroring-based back-to-origin is 10 seconds.

  • Chunked back-to-origin: If your origin server supports range requests and you require the chunked back-to-origin feature, contact Technical Support.
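Because requests that exceed the QPS or bandwidth quota receive a 503 error, a client may want to retry with a delay. The following is a minimal sketch of exponential backoff, not an official recommendation: the function name, the attempt count, and the delays are assumptions, and `fetch` is any callable that returns a `(status, body)` pair.

```python
import time
from typing import Callable, Optional, Tuple

Response = Tuple[int, Optional[bytes]]


def get_with_backoff(
    fetch: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry a request while it is throttled with HTTP 503, doubling the delay each time."""
    status, body = fetch()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        sleep(base_delay * (2 ** (attempt - 1)))
        status, body = fetch()
    return status, body


# Demonstration with canned responses; delays are recorded instead of slept.
responses = iter([(503, None), (503, None), (200, b"ok")])
delays = []
result = get_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the function easy to test; in real use, leave the default so that the delays actually happen.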

FAQ

Mirrored object size differs from source

If you find a size discrepancy between the mirrored object and the source object, follow these steps to investigate.

  1. Check the Last-Modified timestamps of the mirrored object and the source object.

    import oss2
    import requests
    from datetime import datetime
    from oss2.credentials import EnvironmentVariableCredentialsProvider
    
    # Obtain credentials from environment variables. Before running this code,
    # make sure the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    
    # Specify the endpoint for the region where your bucket is located.
    # For example, for China (Hangzhou), the endpoint is https://oss-cn-hangzhou.aliyuncs.com.
    endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
    
    # Specify the region corresponding to the endpoint, e.g., cn-hangzhou.
    # This parameter is required for V4 signatures.
    region = "cn-hangzhou"
    
    # Replace "yourBucketName" with the name of the bucket where you configured the rule.
    bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
    # Specify the full path of the mirrored object.
    object_key = 'yourObjectKey'
    # Specify the full path of the source object.
    source_url = 'yourSourceUrl'
    
    # Get the Last-Modified timestamp of the mirrored object.
    oss_object_info = bucket.get_object_meta(object_key)
    oss_last_modified = oss_object_info.headers['last-modified']
    print(f"OSS Last-Modified: {oss_last_modified}")
    
    # Get the Last-Modified timestamp of the source object.
    response = requests.head(source_url)
    source_last_modified = response.headers.get('last-modified')
    print(f"Source Last-Modified: {source_last_modified}")
    
    # Convert the timestamp strings to datetime objects for comparison.
    oss_time = datetime.strptime(oss_last_modified, '%a, %d %b %Y %H:%M:%S %Z')
    source_time = datetime.strptime(source_last_modified, '%a, %d %b %Y %H:%M:%S %Z')
    
    if oss_time < source_time:
        print("The source object has been updated.")
    elif oss_time > source_time:
        print("The mirrored object is newer.")
    else:
        print("The timestamps of the two objects are identical.")

    • If the Last-Modified timestamp of the source object is later than that of the mirrored object, the source object may have been updated after the mirrored object was created.

      Note

      When OSS fetches an object from an origin server and writes it to a bucket, it does not preserve the Last-Modified timestamp of the source object. Instead, OSS sets the Last-Modified timestamp of the mirrored object to the time it was created or updated in OSS.

    • If the Last-Modified timestamp of the source object is earlier than or the same as that of the mirrored object, the source object has not been updated since the mirrored object was created. Next, compare the MD5 or CRC64 checksums of the two objects.

  2. Compare the MD5 or CRC64 checksums of the mirrored object and the source object.

    # -*- coding: utf-8 -*-
    import oss2
    import hashlib
    import requests
    # For CRC64 comparison, the Python standard library does not support CRC64.
    # You can use a third-party library like crcmod.
    # Install crcmod: pip install crcmod
    import crcmod
    from oss2.credentials import EnvironmentVariableCredentialsProvider
    
    # Obtain credentials from environment variables. Before running this code,
    # make sure the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    
    # Specify the endpoint for the region where your bucket is located.
    # For example, for China (Hangzhou), the endpoint is https://oss-cn-hangzhou.aliyuncs.com.
    endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
    
    # Specify the region corresponding to the endpoint, e.g., cn-hangzhou.
    # This parameter is required for V4 signatures.
    region = "cn-hangzhou"
    
    # Replace "yourBucketName" with the name of the bucket where you configured the rule.
    bucket = oss2.Bucket(auth, endpoint, "yourBucketName", region=region)
    # Specify the full path of the mirrored object.
    object_key = 'yourObjectKey'
    # Specify the full path of the source object.
    source_url = 'yourSourceUrl'
    
    # Get the metadata of the mirrored object.
    oss_object_info = bucket.get_object_meta(object_key)
    
    oss_md5 = oss_object_info.headers.get('etag', '').strip('"')  # ETag is usually the MD5 hash
    oss_crc64 = oss_object_info.headers.get('x-oss-hash-crc64ecma', '')
    
    print(f"OSS MD5: {oss_md5}")
    print(f"OSS CRC64: {oss_crc64}")
    
    # Get the content of the source object and calculate its MD5 and CRC64.
    response = requests.get(source_url)
    if response.status_code == 200:
        source_content = response.content
        source_md5 = hashlib.md5(source_content).hexdigest()
        print(f"Source MD5: {source_md5}")
    
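        # OSS reports CRC64 with the ECMA-182 polynomial, so build that function explicitly.
        # crcmod's predefined 'crc-64' uses a different polynomial and would never match.
        crc64_func = crcmod.mkCrcFun(0x142F0E1EBA9EA3693, initCrc=0, rev=True, xorOut=0xFFFFFFFFFFFFFFFF)
        # The x-oss-hash-crc64ecma header is an unsigned decimal string, so format the result the same way.
        source_crc64 = str(crc64_func(source_content))
        print(f"Source CRC64: {source_crc64}")
    
        # Compare the MD5 values. OSS returns the ETag in uppercase, so ignore case.
        if oss_md5.lower() == source_md5.lower():
            print("MD5 checksums are identical.")
        else:
            print("MD5 checksums do not match.")
    
        # Compare the CRC64 values as decimal strings.
        if oss_crc64 == source_crc64:
            print("CRC64 checksums are identical.")
        else:
            print("CRC64 checksums do not match.")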
    else:
        print(f"Failed to fetch source file. HTTP Status Code: {response.status_code}")
        
    • If the MD5 or CRC64 checksums are identical, the content of the two objects is the same. In this case, their sizes should also be identical.

    • If the MD5 or CRC64 checksums do not match, the content of the two objects is different. Proceed to the next step to check for special request headers.

  3. Check for special request headers.

    • Check if the back-to-origin request contains special HTTP request headers, such as Accept-Encoding: gzip, deflate, br. This header indicates that the client can accept compressed data.

    • If the back-to-origin request uses HTTP compression and the requested object meets the compression criteria, the sizes of the two objects will differ.

    • If the Accept-Encoding header is present, do not forward it.

      • If you have configured the rule to forward all HTTP headers, add accept-encoding to the list of prohibited headers.

      • If you have configured the rule to forward specific HTTP headers, ensure that accept-encoding is not included in the list of specified headers.

Troubleshoot back-to-origin failures

If you encounter an origin fetch failure (such as a 424 MirrorFailed error), you can troubleshoot the issue by following the steps below.

  1. Check the reachability of the origin server.

    # Replace the URL with your actual origin server address and file path
    curl -I "https://www.example.com/images/test.jpg"

  2. Check the DNS resolution.

    # Replace the domain name with your actual origin domain name
    nslookup www.example.com

  3. Check the HTTPS certificate (if the origin server uses HTTPS).

    # Replace the domain name with your actual origin domain name
    openssl s_client -connect www.example.com:443 -servername www.example.com

  4. Analyze the issue by using the real-time log query feature of OSS.

No mirrored object created

A client HEAD request retrieves only object metadata, such as size and type, without downloading the content. Therefore, HEAD requests do not trigger mirroring-based back-to-origin rules to fetch an object from the origin server and write it to the OSS bucket.

Unexpected status code from back-to-origin

When a request triggers mirroring-based back-to-origin, if the origin server returns a status code other than 404, 200, or 206, analyze the origin server's response.

  • Origin is OSS: Check the following configuration items.

    • Prohibit specific HTTP headers from being forwarded: Prohibit forwarding the host header to avoid exposing origin server information and to ensure back-to-origin requests are processed as expected. If you do not prohibit forwarding the host header, the back-to-origin request will pass the host value of the target bucket to the origin server. Because each bucket's host value is unique, if the requested host does not match the origin's actual host, the origin server returns a 403 error. OSS then returns a 424 error to the client.

    • Back-to-origin for a private OSS bucket: If permissions are not configured, check whether the ACL of the target bucket and its objects is set to public-read. If permissions are configured, check whether the role authorization policy for mirroring-based back-to-origin has changed, resulting in insufficient permissions. The default role for mirroring-based back-to-origin is AliyunOSSMirrorDefaultRole, and its default system policy is AliyunOSSReadOnlyAccess.

  • Origin is not OSS: Analyze the server-side logs and check the configurations for Server Name Indication (SNI), back-to-origin parameters, and header forwarding to identify the specific cause of the origin server error. The origin server may return status codes such as 401 (Unauthorized), 403 (Forbidden), or 5xx (server error).

Back-to-origin rule matching order

Rules are matched based on the rule number (RuleNumber) in ascending order. After the first rule that meets the conditions is matched, that rule is immediately executed and no subsequent rules are matched.

Fetch from a VPC or internal IP

Mirroring-based back-to-origin cannot fetch objects from a VPC or an internal IP address. The origin server must have a publicly accessible address. To reach a service in a VPC, expose it to the internet by using a NAT Gateway or an internet-facing SLB instance.

OSS object not updated after source update

Mirroring-based back-to-origin is a one-time pull mechanism and does not automatically synchronize updates from the origin server. To fetch an updated object, you must manually delete the mirrored object from OSS or use an object name versioning strategy.