Partial Downtime and Full Downtime Dataphin Upgrades

Last updated: 2026-06-24 14:43:05

You can upgrade your Dataphin version with partial or full downtime in Dataphin Manager.

Prerequisites

  • You understand the scope of impact for version upgrades. For more information, see Scope of Impact for Version Upgrades.

  • No configuration or upgrade is in progress. If an upgrade is already running, wait for it to complete. If necessary, you can forcibly terminate the ongoing version upgrade before you start a new one.

Background

A partial or full downtime upgrade consists of three main steps: selecting the target version and upgrade mode, performing the upgrade, and verifying the result. The system runs in maintenance mode throughout the upgrade process.

If the current Dataphin instance version is earlier than V5.1.1, only full downtime and zero-downtime upgrades are supported. If the current version is V5.1.1 or later, partial downtime, full downtime, and zero-downtime upgrades are all supported. The following figure shows the complete flow for partial and full downtime upgrades:

[Figure: upgrade flow for partial and full downtime upgrades]

  • Terminate upgrade: You can terminate an upgrade during the Forcibly terminate task execution phase or while the applications have not yet been stopped, that is, before downtime begins. After termination, the system resumes task dispatching and exits maintenance mode. You can choose whether to automatically re-run forcibly terminated instances. If automatic re-runs are not enabled, you must manually re-run them in the Task O&M module.

  • Forcibly terminate upgrade: You can forcibly terminate an upgrade at any stage after it starts, including forcibly terminating task execution, downtime, pre-upgrade, database backup, application upgrade, re-running tasks, and data update. You can mark the upgrade as successful or failed. However, the system cannot self-heal after a forced termination, which may cause Dataphin to become unavailable. Before forcing a termination, confirm that you have manually completed the upgrade or rollback. Only professional O&M engineers should perform forced terminations.
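The version rule described above can be expressed as a small check. The following is only a sketch: the function names and the assumption that version strings look like V5.1.1 are illustrative and are not part of Dataphin Manager.

```python
from typing import List, Tuple

# Mode names follow this page; all identifiers here are hypothetical.
PARTIAL = "partial downtime"
FULL = "full downtime"
ZERO = "zero-downtime"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as 'V5.1.1' into (5, 1, 1) so versions compare numerically."""
    parts = [int(p) for p in version.strip().lstrip("vV").split(".")]
    # Pad short versions such as 'V5.2' to three components.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supported_upgrade_modes(current_version: str) -> List[str]:
    """Return the upgrade modes available for the given current version."""
    modes = [FULL, ZERO]
    # Partial downtime upgrade is available from V5.1.1 onward.
    if parse_version(current_version) >= (5, 1, 1):
        modes.insert(0, PARTIAL)
    return modes
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating V5.10.0 as earlier than V5.2.0.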

Procedure

Step 1: Select the Target Version and Upgrade Mode

  1. Append /opsconsole/v2 to your Dataphin logon URL to go to the Dataphin Manager logon page.

  2. On the Dataphin Manager logon page, enter your Username and Password, then click Log on. Contact your Dataphin operations and maintenance engineer to obtain your username and password.

  3. On the Dataphin Manager homepage, click System Configuration.

  4. On the Upgrade Records page, click Upgrade Dataphin.

  5. On the Upgrade Dataphin page, select the target version and upgrade mode. The following table describes the parameters.

    Parameter

    Description

    Upgrade configuration

    Target version

    Select the target version from the version list.

    If the version list does not include your target version, click Upload version configuration to upload the configuration file for that version.

    After uploading, the system validates the configuration file. If the file content is invalid, validation fails and the system displays an error with the reason. If validation passes and the version is not yet in the system, the system imports the version configuration. If validation passes and the version is already in the system, the system overwrites the existing configuration.

    Configuration file

    • Standardized configuration: Click Upload file to upload a YAML or ZIP configuration file. After upload, you can download the file.

      After the configuration file is uploaded, the system performs a configuration file validation. If the configuration file does not contain all configuration items, the system reports an error. You can click View Details in the error message to view a list of Missing Configuration Items.

    • Non-standardized configuration: Before uploading, contact your Dataphin O&M team to get the required ZIP configuration file. Click Upload file to upload the file. After upload, the system performs these validations automatically:

      • Check whether the version standard configuration template (product/dataphin/...) in the file matches the uploaded version configuration (MD5 validation). If they do not match, the system shows: The configuration template in the file is incompatible with the selected version. Confirm with your Dataphin O&M team before continuing.

      • Check whether the overlay file and values.yaml file in the file are compatible with the version standard configuration template (product/dataphin/...). If they are incompatible, the system blocks the upgrade and shows: The configuration information in the file is incompatible with the selected version. Confirm with your Dataphin O&M team before continuing.

      • Check whether the values.yaml file in the file matches the values.yaml file currently running in your production environment. If they do not match, the system shows: The configuration in the file differs from your current production configuration. Continuing the upgrade will use the new configuration file.

      Note

      You do not need non-standardized configurations unless required.

    Upgrade mode

    Select Partial downtime upgrade or Full downtime upgrade. In both modes, DataService Studio and synchronous calls to DataService Studio APIs remain available during the upgrade.

    • Partial downtime upgrade: Reduces scheduling downtime significantly and does not stop running tasks.

    • Full downtime upgrade: Pauses task scheduling and stops all running tasks.

    Note

    During partial or full downtime upgrades, asynchronous calls to DataService Studio APIs for StarRocks, MaxCompute, Databricks, and OceanBase data sources remain available.

    Compatibility between your current and target versions determines whether a zero-downtime upgrade is supported. If a zero-downtime upgrade is not supported, full downtime upgrade is selected by default.

    Announcement configuration

    Estimated completion time

    Select the estimated time to exit maintenance mode. The default is the current time. Use the format YYYY-MM-DD hh:mm.

    Contact email

    Email address to contact during the upgrade.

    Contact phone number

    Phone number to contact during the upgrade. This can be a landline or mobile phone number.

  6. Select the risk statement, then click Enter maintenance mode.

    After you click Enter maintenance mode, the system records the time and performs the following actions:

    • Saves the current configuration and creates a record with the status Upgrading in the upgrade records list. A new upgrade cannot start until the current one is finished.

    • Pauses task dispatching. After the upgrade is complete, task dispatching resumes.

      Development or production environment: Task scheduling = Off, Task execution = On

      Note
      • Task scheduling toggle: Controls whether automatically triggered tasks are scheduled. When this toggle is off, automatically triggered tasks are not dispatched. Tasks that are already running continue to run. Data backfill and ad hoc queries are not affected.

      • Task execution toggle: Controls whether tasks are dispatched to resource scheduling. Data backfill, automatically triggered tasks, and ad hoc queries are all affected. Instances that are not yet running are not dispatched. Tasks that are already running continue to run.

      In development environments, the task scheduling toggle is off by default and cannot be turned on.

    • The frontend enters maintenance mode and blocks all operations.

    If you click Save, the system creates a record in the upgrade records list with a status of Configuring. You can click Continue Upgrade in the list to edit the configuration.
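The effect of the two toggles set in step 6 can be sketched as a small decision function. This is a simplified model of the behavior described in the note, not Dataphin code: the class, the task kind labels, and the function name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Toggles:
    """Hypothetical model of the two switches described in step 6."""
    task_scheduling: bool  # whether automatically triggered tasks are scheduled
    task_execution: bool   # whether tasks are dispatched to resource scheduling


# State applied when the system enters maintenance mode.
MAINTENANCE = Toggles(task_scheduling=False, task_execution=True)


def may_run(toggles: Toggles, task_kind: str, already_running: bool = False) -> bool:
    """Return True if a task of the given kind runs under the given toggles.

    task_kind is one of 'scheduled' (automatically triggered), 'backfill',
    or 'ad_hoc'; these labels are illustrative.
    """
    if already_running:
        # Neither toggle stops a task that is already running.
        return True
    if not toggles.task_execution:
        # Execution off: no new instance of any kind is dispatched.
        return False
    if task_kind == "scheduled" and not toggles.task_scheduling:
        # Scheduling off: only automatically triggered tasks are held back.
        return False
    return True
```

Under this model, maintenance mode holds back new automatically triggered tasks while data backfill and ad hoc queries can still be dispatched.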

Step 2: Start the Upgrade

Partial Downtime Upgrade

  1. Database backup

    • Use a self-managed PostgreSQL database

      Click Start backup. When the status shows Backing up, click Next.

      The status options are Not backed up, Starting, Backing up, Backup complete, and Backup failed.

    • Use RDS or another database type

      You can go to the RDS console to perform backups for RDS databases. Click Go to RDS backup to open the console.

      Select the risk statement, and then click Next to skip the database backup step.

    Note

    The database backup status does not affect the upgrade. The upgrade can be finished even if the backup is incomplete. If you forcibly terminate the upgrade, the backup process continues and is not interrupted.

  2. Pre-upgrade

    Click Start pre-upgrade. All applications begin the pre-upgrade process. A progress bar displays the completion percentage, and a service list below shows the status of each pre-upgrade. After the pre-upgrade completes, click Next.

  3. Upgrade applications

    Click Start upgrade. In the Prompt dialog box, click OK. The system stops task scheduling (running tasks continue) and starts the upgrade. This step takes approximately 30 minutes. Upon successful completion, click Exit maintenance mode and next. The system records the time when task dispatching resumes and maintenance mode ends. If this step fails, contact your Dataphin O&M engineer. You cannot exit maintenance mode until it succeeds.

    Note

    After you exit maintenance mode, the system resumes scheduling and becomes accessible. However, the upgrade is not yet complete. You must still run the data update step.

    The applications then start. OLTP applications in DataService Studio require rolling upgrades, which means that the new version starts and gradually replaces the old version. A progress bar shows the startup progress and status. After the applications start, that is, after the upgrade completes, the system proceeds automatically.

    If you have sufficient resources, you can start the new OLTP, mgmt, and Gateway applications first, and then stop the old ones. If you have limited resources, there may be moments when no applications are available. After you start the mgmt application, you can upgrade the OLTP and Gateway applications separately.

    If some applications fail to upgrade, click Java Thread Dump to diagnose the issue or click Restart to restart the Java process. You can click View log in the top-right corner to view detailed logs for the application upgrade, regardless of whether it succeeded or failed.

  4. Data update

    Click Update and Next. The system automatically runs all data update tasks that have failed or have not yet run. Tasks that are already running or have succeeded do not run again. The list excludes non-blocking tasks, which do not require execution.

    Click Run to start a task. Click Stop to stop a running task. Both Run and Stop support batch operations. Click Log details to view logs when the task status is Succeeded, Running, Stopped, or Failed.

    Asset refresh tasks support viewing Asset refresh details. If the metadata format changes after an upgrade, you must upgrade the asset metadata. Message triggers handle this upgrade. Asset refresh details show the total number of messages, the number of remaining messages, the number of succeeded messages, and the number of failed messages. The total number of messages equals the sum reported by each application in the initial phase. Failed messages are those that failed during processing. If failed messages appear during a data update, contact your Dataphin O&M team to confirm whether to ignore them before continuing.

    For metadata acquisition tasks, you must manually start the task. You can view the task status and log details. The logs show the total number of workflows and completed workflows. During the upgrade, existing acquisition instances run unchanged using the latest code. If scheduling changes are included, they take effect the day after the upgrade is complete.
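The selection rule in the Data update step of the partial downtime upgrade can be sketched as follows. The status string "Not run" and the function name are assumptions; the page names only the Succeeded, Running, Stopped, and Failed statuses.

```python
from typing import Dict, List

# Statuses that trigger execution when you click Update and Next.
# "Not run" is an assumed label for a task that has never been executed.
TRIGGERING_STATUSES = {"Failed", "Not run"}


def tasks_to_trigger(statuses: Dict[str, str]) -> List[str]:
    """Return the names of data update tasks that would be started.

    statuses maps a task name to its current status. Tasks that are
    Running or Succeeded are left alone, as described in the step above.
    """
    return [name for name, status in statuses.items()
            if status in TRIGGERING_STATUSES]
```

This sketch does not model metadata acquisition tasks, which you start manually.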

Note

At any time during the upgrade, you can click Forcibly terminate upgrade to end it. Forced termination prevents system self-healing and may make Dataphin unavailable. Before you forcibly terminate the upgrade, confirm that you have manually completed the upgrade or rolled back. Only professional O&M engineers should confirm and perform this action. When you forcibly terminate the upgrade, choose whether to mark it as successful or failed. In either case, the system records the forced completion time and adds the record to the upgrade records list.

Full Downtime Upgrade

Note

Click View log to view the service startup logs.

  1. Forcibly terminate task execution

    1. In the Running tasks list, select one or more tasks to forcibly terminate them. Alternatively, click Forcibly terminate all tasks and next to terminate all running tasks. After the upgrade, you can re-run forcibly terminated tasks. If a task cannot be re-run, wait for it to finish, then click View terminated tasks list. You must re-run tasks in the terminated list after the application upgrade. When all tasks are forcibly terminated, the system records the time.

      After you click Forcibly terminate, the task is removed from the list and added to the terminated list. If a task finished or failed before you forcibly terminated it, it does not appear in the list.

      Note
      • You can proceed only when the running tasks list is empty. This means all tasks are either forcibly terminated or completed. If the forcible termination fails for any task, the system displays an error. You can click Retry in the error message to try again. The list refreshes every 20 seconds.

      • Forcibly terminated tasks must be manually re-run in the Re-run tasks step.

    2. (Optional) If needed, you can terminate this upgrade from the upgrade records list.

  2. Downtime

    Click Downtime and next to stop all running pods. A progress bar displays the downtime progress.

    During the upgrade, DataService Studio remains available. The pod list shows the status of all applications except for DataService Studio.

  3. Database backup

    You can click Backup and next to start data backup. After the backup completes, the system proceeds automatically.

    A progress bar shows backup progress. If backup fails for any database, you can click View log to see failure logs, or click Re-backup to re-backup a single database. After re-backup, that database’s backup status resets. To stop backup for a specific database, click Forcibly terminate.

    Note
    • You can select the risk statement to skip the database backup.

    • For RDS databases, back up your data in the RDS console. Click Go to RDS backup to open the RDS console.

  4. Upgrade applications

    Click Upgrade applications and next to start the applications. OLTP applications in DataService Studio require rolling upgrades, which means that the new version starts and gradually replaces the old version. A progress bar displays the startup progress and status. After the applications start, that is, after the upgrade completes, the system proceeds automatically.

    If you have sufficient resources, you can start the new OLTP, mgmt, and Gateway applications first, and then stop the old ones. If you have limited resources, there may be moments when no applications are available. After you start the mgmt application, you can upgrade the OLTP and Gateway applications separately.

    The service list in the application upgrade process matches the application list in the Downtime step. If an application fails to upgrade, click Java Thread Dump to diagnose the issue, or click Restart to restart the Java process. Click View log in the top-right corner to view detailed logs of the application upgrade, regardless of the outcome.

  5. Re-run tasks

    Click Re-run tasks and next to automatically re-run all tasks that have not been re-run. The list displays instances that were forcibly terminated during this upgrade. After the re-run is complete, the system proceeds automatically. Click Run to re-run a single task. Click Stop to stop a running task. You can perform Run and Stop operations in batches. If you select tasks that are already complete or have failed, the system ignores them and automatically refreshes the list after the operation.

    If a re-run fails for any task, an error message appears. Click Retry in the error message, or click Re-run tasks and next to try again.

  6. Data update

    By default, Run tasks and exit maintenance mode is selected. Click Update and next to run all data update tasks that were neither skipped nor successful. After all blocking tasks finish, the system resumes task dispatching and exits maintenance mode. The system records the time when task dispatching resumes and maintenance mode ends. Non-blocking tasks continue running. Tasks that are already running or have succeeded will not run again. Failed and unrun tasks trigger execution. Skipped tasks are ignored and not run.

    Click Run to start a task. Click Skip to continue the process without being blocked by the current task. Click Stop to stop a running task. Run, Skip, and Stop all support batch operations. Click Log details to view logs when the task status is Succeeded, Running, Stopped, or Failed.

    Asset refresh tasks support viewing Asset refresh details. If metadata format changes after upgrade, asset metadata must be upgraded. Message triggers handle the upgrade. Asset refresh details show total messages, remaining messages, succeeded messages, and failed messages. Total messages equal the sum reported by each application in the initial phase. Failed messages are those that failed processing. If failed messages appear in data update, contact your Dataphin O&M team to confirm whether to ignore them before continuing.

    Metadata acquisition tasks cannot be skipped. You must manually start the task. You can view the task status and log details. The logs show the total number of workflows and completed workflows. During the upgrade, existing acquisition instances run unchanged using the latest code. If scheduling changes are included, they take effect the day after the upgrade is complete.
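The Data update step of the full downtime upgrade adds the Skip action and ties the exit from maintenance mode to blocking tasks. The sketch below models both rules. It assumes that a blocking task counts as finished once it has succeeded or been skipped; that reading, the "Not run" and "Skipped" status labels, and all identifiers are illustrative rather than taken from Dataphin.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpdateTask:
    """Hypothetical record for one data update task."""
    name: str
    blocking: bool
    status: str  # e.g. "Succeeded", "Running", "Failed", "Skipped", "Not run"


def tasks_to_run(tasks: List[UpdateTask]) -> List[str]:
    """Tasks started by Update and next: neither skipped, successful, nor running."""
    return [t.name for t in tasks
            if t.status not in {"Succeeded", "Skipped", "Running"}]


def can_exit_maintenance(tasks: List[UpdateTask]) -> bool:
    """True once every blocking task has succeeded or been skipped (assumed meaning of finished).

    Non-blocking tasks are ignored here because they may keep running
    after the system exits maintenance mode.
    """
    return all(t.status in {"Succeeded", "Skipped"}
               for t in tasks if t.blocking)
```

Keeping the two checks separate mirrors the page: starting tasks and leaving maintenance mode are decided by different conditions.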

Note

At any time during the upgrade, you can click Forcibly terminate upgrade to end the upgrade process. Forced termination prevents system self-healing and may make Dataphin unavailable. Before forcing termination, confirm that you have manually completed the upgrade or rolled back. Only professional O&M engineers should confirm and perform this action. When forcing termination, choose whether to mark the upgrade as successful or failed. Either way, the system records the forced completion time and adds the record to the upgrade records list.

Step 3: Verify the Upgrade Result

After completing all upgrade steps, you enter the verification phase. The Verify Upgrade Result page displays Instance count, Instance status comparison, and View failed instances.

Check the instance count, instance status comparison, and failed instances. After you confirm success, click Complete to finish the upgrade. The system records the current time as the completion time. If the results do not meet expectations, contact your Dataphin O&M team to resolve the issue.

  • Instance count: Shows the recent production instance count and the 7-day average instance count. The instance generation time is fixed at 23:00 daily.

    • Recent production instance count: Dynamically queries the latest instance generation and shows the count and business date. If no instance was generated, the status is Pending. The date format is YYYY-MM-DD.

    • 7-day average instance count: The average daily instance count from T-8 to T-2. Instances that were not generated are excluded. If no instances were generated in the past 7 days, the status is NA.

  • Instance status comparison: You can filter data by business date (default is T-1) and tenant. A donut chart shows the status distribution. A column chart shows the instance count for the past 7 days.

    • Status distribution: Shows the status distribution for the filtered instances.

    • 7-day instance count: Shows the instance count per status and the total count for the filtered instances.

  • View failed instances: Lists task instances that ran successfully yesterday but failed today. The columns include instance ID, node ID, task name, node type, scheduling type, last update time, and status. After the upgrade, you should monitor the system for 30 minutes. Re-run tasks that succeeded yesterday but failed today. If they still fail, check the operational logs to determine whether the task itself is abnormal. The task list refreshes every 20 seconds and updates the statuses.
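Two of the verification metrics above are simple enough to sketch in code. The example assumes that T is the current date, that a day with no generated instances is recorded as None, and that instances are matched across days by node ID. All function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Union


def seven_day_average(counts: Dict[date, Optional[int]],
                      today: date) -> Union[float, str]:
    """Average daily instance count over T-8 through T-2.

    Days without generated instances (missing or None) are excluded.
    Returns "NA" when no day in the window has instances.
    """
    window = [today - timedelta(days=offset) for offset in range(2, 9)]
    generated = [counts[d] for d in window if counts.get(d) is not None]
    if not generated:
        return "NA"
    return sum(generated) / len(generated)


def newly_failed(yesterday: Dict[str, str], today: Dict[str, str]) -> List[str]:
    """Node IDs whose instance succeeded yesterday but failed today."""
    return sorted(node for node, status in today.items()
                  if status == "Failed" and yesterday.get(node) == "Succeeded")
```

The nodes returned by newly_failed correspond to the instances that the page recommends re-running after the upgrade.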
