## My Confusion
I have been reading up on the Try Confirm Cancel (TCC) protocol, and I understand the main idea behind it (for the happy flow). My confusion is about how to actually implement or manage it in the edge cases, such as when a service unexpectedly crashes.
Suppose we have 3 microservices: S1, S2, S3. Suppose S1 and S2 provide the TCC interface as APIs, while S3 is the service which actually calls S1 and S2 and manages the transaction. Taking `T` to represent time, I have listed some scenarios below which seem to be problematic:
Scenario 1
- T0: Client calls S3 to trigger a flow which uses TCC.
- T1: S3 calls TRY for S1. This succeeds.
- T2: S3 crashes.
In Scenario 1 above, when S3 recovers from the crash, how will it know to call CANCEL for S1?
Another scenario that faces the same issue:
Scenario 2
- T0: Client calls S3 to trigger a flow which uses TCC.
- T1: S3 calls TRY for S1 and S2. Both succeed.
- T2: S3 calls CONFIRM for S1.
- T3: S3 crashes.
In Scenario 2 above, when S3 recovers from the crash, how will it know to call CONFIRM for S2?
The articles I've read on TCC never seem to address the issues highlighted above.
## My Proposed Solution
I was thinking the only way to solve this is to somehow store a representation of the transaction that is supposed to take place. Taking the same example above, suppose S3 is a microservice trying to coordinate the transfer of money between 2 different banks S1 and S2. The flow would be something like this:
- T0: Client calls S3 to trigger transfer of money between 2 bank accounts.
- T1: S3 stores a record in the DB before making any calls to S1 and S2 (the `status` field will change according to which stage the TCC is in):
| from_user_id | to_user_id | amount | status |
| -------------- | ------------ | -------- | -------- |
| 100 | 101 | 56 | TO_TRY |
- T2: S3 calls TRY for S1 and S2.
- T3: If both TRY calls succeed, S3 should update the `status` field of the DB record to `TO_CONFIRM`. If either TRY fails, S3 should update the `status` field of the DB record to `TO_CANCEL`. Let's assume the cancellation scenario:
| from_user_id | to_user_id | amount | status |
| -------------- | ------------ | -------- | -------- |
| 100 | 101 | 56 | TO_CANCEL |
- T4: S3 calls CANCEL for S1 and S2.
- T5: If both CANCEL calls succeed, S3 should update the `status` field of the DB record to `COMPLETED`. Otherwise, just leave it in `TO_CANCEL`.
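To make the steps above concrete, here is a minimal Python sketch of the flow from T1 to T5. It is only an illustration under some assumptions: `db` is a plain dict standing in for the real database, `FakeParticipant` and `run_transfer` are hypothetical names, and the participants are in-memory stand-ins for S1 and S2 rather than real API calls. The confirm branch is not spelled out in the steps above and is assumed to mirror the cancel branch.

```python
class FakeParticipant:
    """Hypothetical in-memory stand-in for S1 or S2."""

    def __init__(self, name, try_ok=True):
        self.name = name
        self.try_ok = try_ok
        self.calls = []  # records which TCC calls were received

    def try_(self, tx_id):
        self.calls.append("TRY")
        return self.try_ok

    def confirm(self, tx_id):
        self.calls.append("CONFIRM")
        return True

    def cancel(self, tx_id):
        self.calls.append("CANCEL")
        return True


def run_transfer(db, tx_id, record, participants):
    """Run steps T1-T5, persisting the status before each phase."""
    # T1: store the record before calling any participant.
    db[tx_id] = {**record, "status": "TO_TRY"}

    # T2: call TRY on every participant.
    tried = [p.try_(tx_id) for p in participants]

    # T3: persist the decision before acting on it.
    db[tx_id]["status"] = "TO_CONFIRM" if all(tried) else "TO_CANCEL"

    # T4: call CONFIRM or CANCEL depending on the persisted decision.
    if db[tx_id]["status"] == "TO_CONFIRM":
        done = [p.confirm(tx_id) for p in participants]
    else:
        done = [p.cancel(tx_id) for p in participants]

    # T5: mark COMPLETED only if every call succeeded; otherwise leave as-is.
    if all(done):
        db[tx_id]["status"] = "COMPLETED"
    return db[tx_id]["status"]
```

For example, with `FakeParticipant("S2", try_ok=False)` as the second participant, the record moves through `TO_TRY` and `TO_CANCEL` and both participants receive a CANCEL. The point of writing the status before each phase is that a crash between any two lines leaves a record that says which phase was due next.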
The idea behind this solution is that if S3 crashes at some point, we can have a cronjob to pick up the records in the DB that are stuck in non-COMPLETED statuses, and try to push them to the COMPLETED status (making sure that either the TRY, CANCEL or CONFIRM phase eventually follows through).
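A rough Python sketch of what one pass of such a recovery job might look like is below. Everything in it is an assumption made for illustration: `db` is a dict standing in for the table, `IdempotentParticipant` and `recover` are hypothetical names, and a record stuck in `TO_TRY` is cancelled rather than retried, which is only one possible policy. The sketch also assumes that S1 and S2 tolerate repeated CONFIRM or CANCEL calls, since after a crash S3 cannot tell which calls already went through.

```python
class IdempotentParticipant:
    """Hypothetical stand-in for S1/S2 whose CONFIRM and CANCEL can be repeated."""

    def __init__(self):
        self.state = {}   # tx_id -> "TRIED" | "CONFIRMED" | "CANCELLED"
        self.effects = 0  # how many times a CONFIRM actually took effect

    def try_(self, tx_id):
        self.state[tx_id] = "TRIED"
        return True

    def confirm(self, tx_id):
        # Only the first CONFIRM after a TRY has an effect; repeats are no-ops.
        if self.state.get(tx_id) == "TRIED":
            self.state[tx_id] = "CONFIRMED"
            self.effects += 1
        return self.state.get(tx_id) == "CONFIRMED"

    def cancel(self, tx_id):
        # Cancelling an unknown or already-cancelled transaction succeeds.
        if self.state.get(tx_id) != "CONFIRMED":
            self.state[tx_id] = "CANCELLED"
        return self.state[tx_id] == "CANCELLED"


def recover(db, participants):
    """One pass of the recovery job over records that are not COMPLETED."""
    for tx_id, rec in db.items():
        if rec["status"] == "COMPLETED":
            continue
        # Assumed policy: a record stuck in TO_TRY never reached a decision,
        # so cancel it rather than guess which TRY calls went through.
        if rec["status"] == "TO_TRY":
            rec["status"] = "TO_CANCEL"
        if rec["status"] == "TO_CONFIRM":
            results = [p.confirm(tx_id) for p in participants]
        else:
            results = [p.cancel(tx_id) for p in participants]
        # Leave the record untouched on failure so the next pass retries it.
        if all(results):
            rec["status"] = "COMPLETED"
```

Under these assumptions, Scenario 2 corresponds to a record left in `TO_CONFIRM` with S1 already confirmed: the pass repeats CONFIRM on S1 (a no-op) and delivers the missing CONFIRM to S2. Scenario 1 corresponds to a record left in `TO_TRY`, which the pass turns into a CANCEL for both services.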
Is my proposed solution a legitimate solution? I'm assuming there are already existing solutions to this, but I can't seem to find anything on this issue. Any useful articles or resources would be greatly appreciated!
Asked by Ryn
(101 rep)
Jun 25, 2022, 06:36 AM
Last activity: Aug 4, 2025, 04:12 AM