This topic describes how to use Serverless workflow to implement long-running distributed transactions, allowing you to focus on your business logic.
Overview
Complex business applications, such as e-commerce, hotel, or flight booking systems, often involve multiple remote services and require strong transactional semantics. This means all steps in a process must either complete successfully or fail together, leaving no intermediate states. In applications with low traffic and centralized data storage, the atomicity, consistency, isolation, and durability (ACID) properties of a relational database can fulfill this requirement. However, to achieve high availability and scalability for high-traffic scenarios, businesses often adopt a distributed microservices architecture. In such an architecture, ensuring transactional integrity typically requires complex solutions involving message queues and databases to persist messages and process state. This adds significant development and operational overhead. Serverless workflow simplifies this by providing built-in support for long-running distributed transactions.
Scenarios
Assume an application allows users to book train tickets, flights, and hotels, and requires these three operations to be transactional. This feature requires three remote calls (for example, booking a train ticket requires calling the 12306 API). If all three calls succeed, the order is successful. In practice, any remote call can fail. Therefore, the application must implement compensation logic to roll back completed operations for different failure scenarios, as shown in the following figure:
- If booking the train ticket (BuyTrainTicket) succeeds but reserving the flight (ReserveFlight) fails, the system must cancel the train ticket (CancelTrainTicket) and notify the user that the order has failed.
- If booking the train ticket (BuyTrainTicket) and reserving the flight (ReserveFlight) both succeed but booking the hotel (ReserveHotel) fails, the system must cancel the flight (CancelFlight) and the train ticket (CancelTrainTicket), and then notify the user that the order has failed.
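To make the required compensation order concrete, the following minimal Python sketch shows roughly what an application would have to implement by hand without a workflow engine. The `book_trip` function, the `call` callback, and the `STEPS` pairing table are illustrative assumptions and are not part of any Serverless workflow API.

```python
from typing import Callable, List, Optional, Tuple

# Each forward operation is paired with the operation that undoes it.
# The step names mirror the scenario; the pairing table itself is illustrative.
STEPS: List[Tuple[str, Optional[str]]] = [
    ("BuyTrainTicket", "CancelTrainTicket"),
    ("ReserveFlight", "CancelFlight"),
    ("ReserveHotel", None),  # last step: nothing of its own to undo if it fails
]


def book_trip(call: Callable[[str], bool]) -> List[str]:
    """Run the forward steps in order; on failure, undo completed steps in reverse.

    `call` is a hypothetical stand-in for a remote call that returns True on success.
    Returns the operations that were attempted, ending with the outcome.
    """
    trace: List[str] = []
    compensations: List[str] = []
    for step, undo in STEPS:
        trace.append(step)
        if not call(step):
            # Roll back in reverse order of completion.
            for comp in reversed(compensations):
                trace.append(comp)
                call(comp)
            trace.append("OrderFailed")
            return trace
        if undo is not None:
            compensations.append(undo)
    trace.append("OrderSucceeded")
    return trace


if __name__ == "__main__":
    # Simulate a hotel reservation failure.
    print(book_trip(lambda op: op != "ReserveHotel"))
```

Note that this sketch keeps the list of pending compensations only in memory. If the process crashes halfway through a rollback, that list is lost, which is the gap that persisted workflow state is meant to close.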
Implementation with Serverless workflow
The following example demonstrates how to orchestrate Function Compute (FC) functions into a Serverless workflow to implement a reliable, multi-step, long-running process. The example consists of three steps:
Step 1: Create Function Compute functions
This step simulates the three operations in the use case: booking a train ticket, reserving a flight, and booking a hotel.
- Service: fnf-demo
- Function: Operation
The Operation function simulates each operation, such as reserving a flight or hotel. It determines whether the operation succeeds or fails based on its input.
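    import json
    import logging
    import uuid


    def handler(event, context):
        # The event payload is a JSON document passed in by the flow.
        evt = json.loads(event)
        logger = logging.getLogger()
        txn_id = uuid.uuid4()  # simulated transaction ID
        # 'operation' names the simulated action, for example reserve_hotel.
        op = evt.get('operation', 'operation')
        # The value stored under the operation name decides whether the call fails.
        # The sample execution input uses the string "fail" for this purpose.
        if evt.get(op) in (False, "fail"):
            logger.info("%s failed", op)
            # Exit the process so that the invocation ends with an FC.Unknown error.
            exit()
        logger.info("%s succeeded, id %s", op, txn_id)
        return json.dumps({op: "success", "%s_txnID" % op: str(txn_id)})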
Step 2: Create a flow
Use the Serverless workflow console to create the following flow.
- Configure the flow's RAM role.

    {
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "fnf.aliyuncs.com"
            ]
          }
        }
      ],
      "Version": "1"
    }

- Define the flow.

    version: v1
    type: flow
    steps:
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: BuyTrainTicket
        inputMappings:
          - target: operation
            source: buy_train_ticket
          - target: buy_train_ticket
            source: $input.buy_train_ticket_result
        catch:
          - errors:
              - FC.Unknown
            goto: OrderFailed
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: ReserveFlight
        inputMappings:
          - target: operation
            source: reserve_flight
          - target: reserve_flight
            source: $input.reserve_flight_result
        catch: # If the ReserveFlight task fails with an FC.Unknown error, jump to the CancelTrainTicket step.
          - errors:
              - FC.Unknown
            goto: CancelTrainTicket
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: ReserveHotel
        inputMappings:
          - target: operation
            source: reserve_hotel
          - target: reserve_hotel
            source: $input.reserve_hotel_result
        retry: # Retry up to 3 times with exponential backoff for FC.Unknown errors. The initial interval is 1s, and subsequent intervals are doubled.
          - errors:
              - FC.Unknown
            intervalSeconds: 1
            maxAttempts: 3
            multiplier: 2
        catch: # If the ReserveHotel task fails with an FC.Unknown error after all retries, jump to the CancelFlight step.
          - errors:
              - FC.Unknown
            goto: CancelFlight
      - type: succeed
        name: OrderSucceeded
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: CancelFlight
        inputMappings:
          - target: operation
            source: cancel_flight
          - target: reserve_flight_txnID
            source: $local.reserve_flight_txnID
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: CancelTrainTicket
        inputMappings:
          - target: operation
            source: cancel_train_ticket
          - target: buy_train_ticket_txnID
            source: $local.buy_train_ticket_txnID
      - type: fail
        name: OrderFailed
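To help reason about which steps run for a given failure, the following Python sketch traces the control flow of the definition above. It is a simplified model, not the service's execution engine: the `FLOW` table is transcribed by hand, `trace_flow` is a hypothetical helper, retries are ignored, and it assumes that after a `goto` the execution continues sequentially from the target step.

```python
from typing import List, Set

# Hand-transcribed view of the flow: step name, step type, and the step to
# jump to when the task fails (None means the failure is not caught).
FLOW = [
    {"name": "BuyTrainTicket", "type": "task", "on_error": "OrderFailed"},
    {"name": "ReserveFlight", "type": "task", "on_error": "CancelTrainTicket"},
    {"name": "ReserveHotel", "type": "task", "on_error": "CancelFlight"},
    {"name": "OrderSucceeded", "type": "succeed", "on_error": None},
    {"name": "CancelFlight", "type": "task", "on_error": None},
    {"name": "CancelTrainTicket", "type": "task", "on_error": None},
    {"name": "OrderFailed", "type": "fail", "on_error": None},
]


def trace_flow(failing_steps: Set[str]) -> List[str]:
    """Return the names of the steps visited when the given tasks fail."""
    index = {step["name"]: i for i, step in enumerate(FLOW)}
    visited: List[str] = []
    i = 0
    while i < len(FLOW):
        step = FLOW[i]
        visited.append(step["name"])
        if step["type"] in ("succeed", "fail"):
            break  # terminal steps end the execution
        if step["name"] in failing_steps:
            if step["on_error"] is None:
                break  # uncaught error: the execution stops here
            i = index[step["on_error"]]  # catch: jump to the goto target
        else:
            i += 1  # success: continue with the next step in order
    return visited


if __name__ == "__main__":
    print(trace_flow({"ReserveHotel"}))
```

Under these assumptions, a hotel failure visits CancelFlight, then CancelTrainTicket, then OrderFailed, while a flight failure skips CancelFlight and goes straight to CancelTrainTicket.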
Step 3: Execute and view results
In the console, start a new execution for the flow that you created. The StartExecution API requires input in JSON format. The following JSON object can be used to simulate the success or failure of each step. For example, "reserve_hotel_result":"fail" simulates a failure in the hotel booking step. The StartExecution API is asynchronous. When called, Serverless workflow returns an execution name that you can use to query the execution's status.
    {
      "buy_train_ticket_result": "success",
      "reserve_flight_result": "success",
      "reserve_hotel_result": "fail"
    }
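If you want to try several failure scenarios, a small helper can generate the input document instead of editing it by hand. The sketch below is only an illustration: `build_execution_input` and `RESULT_KEYS` are hypothetical names, and the helper simply produces a JSON string in the shape shown above.

```python
import json
from typing import Optional

# The three keys that the flow definition reads through $input.
RESULT_KEYS = (
    "buy_train_ticket_result",
    "reserve_flight_result",
    "reserve_hotel_result",
)


def build_execution_input(failing_key: Optional[str] = None) -> str:
    """Return a JSON input in which at most one step is marked as failing."""
    if failing_key is not None and failing_key not in RESULT_KEYS:
        raise ValueError("unknown result key: %s" % failing_key)
    payload = {
        key: ("fail" if key == failing_key else "success")
        for key in RESULT_KEYS
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Reproduces the sample input: only the hotel reservation fails.
    print(build_execution_input("reserve_hotel_result"))
```

The returned string can be pasted into the console as the execution input.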
After the execution starts, you can view its progress and results in the Serverless workflow console. From the step details tab, you can see that because "reserve_hotel_result":"fail" and the ReserveHotel function call failed, Serverless workflow follows the flow definition and sequentially cancels the flight (CancelFlight) and the train ticket (CancelTrainTicket). Serverless workflow persists the state of each step transition, so network interruptions or process crashes do not affect the transactional integrity of the flow.
In this execution, the ReserveHotel step fails with the error FC.Unknown, the cause is Process exited unexpectedly, and the retry count is 3.
A flow execution generates execution history events. You can view these events in the console, or query them by calling the GetExecutionHistory API through the SDK or CLI.
For example, a task step that succeeds generates the events StepEntered, TaskScheduled, TaskStarted, TaskSucceeded, and StepExited. In the Execution History tab, you can view each event's ID, type, step name, timestamp, and relative duration.
Error handling and retries
- In the preceding example, remote calls such as reserving a flight and booking a hotel can fail due to network or service errors. Adding retries for transient errors can increase the order success rate. Serverless workflow provides a built-in retry feature for task type steps. For example, the ReserveHotel step is configured to use exponential backoff for FC.Unknown errors. If the ReserveHotel step still fails after reaching the maximum number of retries, the catch definition ensures that the FC.Unknown error is caught and the execution jumps to the CancelFlight step to run the defined compensation logic.

    - type: task
      resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
      name: ReserveHotel
      inputMappings:
        - target: operation
          source: reserve_hotel
      retry: # Retry up to 3 times with exponential backoff for FC.Unknown errors. The initial interval is 1s, and subsequent intervals are doubled.
        - errors:
            - FC.Unknown
          intervalSeconds: 1
          maxAttempts: 3
          multiplier: 2
      catch: # If the ReserveHotel task fails with an FC.Unknown error after all retries, jump to the CancelFlight step.
        - errors:
            - FC.Unknown
          goto: CancelFlight

- From the execution history, you can see that after adding retries, the ReserveHotel task was executed multiple times, up to the maximum retry count. Each retry attempt triggers a sequence of three events: TaskScheduled, TaskStarted, and TaskFailed. The execution history shows this cycle repeating, with each TaskScheduled event marking a new retry.
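The retry settings above can be read as a simple backoff schedule. The following Python sketch illustrates that reading. It assumes that the wait before retry n is `intervalSeconds * multiplier^(n-1)` and that `maxAttempts` counts retries after the first call, which matches the "retry up to 3 times" comment and the retry count of 3 in this example. `retry_intervals` and `call_with_retry` are hypothetical helpers, not part of the service.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_intervals(interval_seconds: float, max_attempts: int, multiplier: float) -> List[float]:
    """Return the wait time before each retry under exponential backoff."""
    return [interval_seconds * multiplier ** n for n in range(max_attempts)]


def call_with_retry(
    task: Callable[[], T],
    interval_seconds: float,
    max_attempts: int,
    multiplier: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `task`, retrying up to `max_attempts` times on any exception.

    If every retry fails, the last exception propagates to the caller,
    which is where a catch rule would take over in the flow.
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_attempts:
                raise
            sleep(interval_seconds * multiplier ** attempt)
            attempt += 1


if __name__ == "__main__":
    # intervalSeconds: 1, maxAttempts: 3, multiplier: 2
    print(retry_intervals(1, 3, 2))  # [1, 2, 4]
```

With the values from the flow definition, a task that keeps failing is called four times in total (one initial call plus three retries), with waits of 1, 2, and 4 seconds in between.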
Data transfer between steps
- If the hotel reservation fails, the workflow must cancel the flight and train ticket. These compensation actions require the transaction IDs (txnID) returned by the corresponding ReserveFlight and BuyTrainTicket steps. The following inputMappings object shows how to pass the output from a previous step as input to the CancelFlight step.

    - type: task
      resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
      name: CancelFlight
      inputMappings:
        - target: operation
          source: cancel_flight
        - target: reserve_flight_txnID
          source: $local.reserve_flight_txnID

- The output from each completed step is stored in the local object within the EventDetail of the corresponding StepExited event.

    {
      "input": {
        "operation": "reserve_hotel",
        "reserve_hotel_result": "fail"
      },
      "local": {
        "buy_train_ticket": "success",
        "buy_train_ticket_txnID": "d37412b3-bb68-4d04-9d90-c8c15643d45e",
        "reserve_flight_result": "success",
        "reserve_flight_txnID": "024caecf-cfa3-43a6-b561-9b6fe0571b55"
      },
      "resourceArn": "acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation",
      "cause": "{\"errorMessage\":\"Process exited unexpectedly before completing request (duration: 12ms, maxMemoryUsage: 9.18MB)\"}",
      "error": "FC.Unknown",
      "retryCount": 3,
      "goto": "CancelFlight"
    }

- After the mapping defined in inputMappings is applied to the data in the EventDetail, the input for the CancelFlight step becomes the following JSON object. This ensures the CancelFlight function receives the reserve_flight_txnID field.

    "input": {
      "operation": "cancel_flight",
      "reserve_flight_txnID": "024caecf-cfa3-43a6-b561-9b6fe0571b55"
    }
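The mapping step described above can be modeled in a few lines of Python. This is a simplified sketch rather than the service's actual resolver: `apply_input_mappings` is a hypothetical function that handles only single-level `$input.key` and `$local.key` references and treats every other source value as a constant.

```python
from typing import Any, Dict, List


def apply_input_mappings(
    mappings: List[Dict[str, Any]],
    step_input: Dict[str, Any],
    local: Dict[str, Any],
) -> Dict[str, Any]:
    """Build a step's input from its inputMappings list."""
    scopes = {"$input": step_input, "$local": local}
    result: Dict[str, Any] = {}
    for mapping in mappings:
        source = mapping["source"]
        if isinstance(source, str) and source.startswith("$"):
            # "$local.reserve_flight_txnID" -> scope "$local", key "reserve_flight_txnID"
            scope_name, _, key = source.partition(".")
            result[mapping["target"]] = scopes[scope_name].get(key)
        else:
            # Anything that is not a reference is passed through as a constant.
            result[mapping["target"]] = source
    return result


# Values copied from the example in this section.
LOCAL = {
    "buy_train_ticket": "success",
    "buy_train_ticket_txnID": "d37412b3-bb68-4d04-9d90-c8c15643d45e",
    "reserve_flight_result": "success",
    "reserve_flight_txnID": "024caecf-cfa3-43a6-b561-9b6fe0571b55",
}
CANCEL_FLIGHT_MAPPINGS = [
    {"target": "operation", "source": "cancel_flight"},
    {"target": "reserve_flight_txnID", "source": "$local.reserve_flight_txnID"},
]

if __name__ == "__main__":
    print(apply_input_mappings(CANCEL_FLIGHT_MAPPINGS, {}, LOCAL))
```

Only the mapped targets reach the function, so the CancelFlight input in this model contains just `operation` and `reserve_flight_txnID`, even though `local` holds more fields.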