Exactly Once Semantics with Non-Idempotent Requests

Problem Statement

Currently, in the case of non-idempotent requests eg. incrementing a simple counter, we don’t really have a good way to deal with failures/retries. This is because it is very difficult to make sure whether the request failed before the operation completed on the server or after. So which leads to some common known problems

  • Duplicate data in case of multiple retries
    • Before 1st Increment ( INCR ) Request: Counter X = 10
    • After 1st Increment Request: Counter X = 11. But the client did not receive the ack for the request due to some network issue.
    • Client Retries again, 2nd Increment Request: Counter X = 12. This time client did receive the ack and hence did not retry again.
  • Loss of data in case of no retries.
    • Before 1st Increment ( INCR ) Request: Counter X = 10
    • The increment request was not able to complete due to a networking issue. But the client did not retry again as the client was not sure if the request was successfully executed on the server

Untitled document (7).jpg

Today’s Distributed Systems use this mechanism to deal with this situation

  • Generating a RequestID
  • Making the request to the server with this RequestID
  • The server makes sure that this RequestID has not been processed before by looking up in an in-memory cache of RequestsIDs it has processed before.
  • If not processed, it processes the request and adds the RequestID to its cache.
  • After t seconds, it cleans the entries in this cache. So that this cache does not end up consuming a lot of space/memory.
def putRequest(id: RequestId, key: K, value: V) = {
  if (isDuplicateRequest(id)) {
    return ERROR_CODE_DUPLICATE_REQUEST;
  } else {
    store.put(key, value)
  }
}

But this above-mentioned methodology is not robust. There might be cases, where this strategy might not work out.  Let’s see a case where it might fail

  • A Client sends a request to the server. The server successfully serves the request and makes the update.
  • Request Times Out and now the client is network partitioned so the client cannot make the request to the server for the next t seconds.
  • Now when the client can talk back to the server, it tries to issue the request again. But unfortunately, the requestID has been expired from the store because of which the request will succeed again and the update will be applied again 😦 😦

So how can we ensure exactly once writes or updates in a data store?

Solution

So for that, we need to break our non-idempotent operation into these two operations.

  • PREPARE Operation
  • COMMIT Operation

 

Untitled document (10)
PREPARE and COMMIT Operations

Now let’s dig deep into implementation details for these two operations.

PREPARE Operation

  • Client Issuing the request generates a random requestID
    • In this, we just add an entry on the server against the requestID and along with the requestID, we also store the operation OP we want to carry i.e. INC or DEC.
    • This requestID represents an undergoing operation.
    • This adding of this requestID in the DataStore is idempotent i.e. we can make add this requestID to the datastore multiple times with the same outcome and hence idempotent.
  • Even if this PREPARE operation fails, we should be able to handle this as this PREPARE is an idempotent request which means that we can execute this operation multiple times without any side effect.
PREPARE(id: requestID, key: K)

Insert Entry (X, OP)  for requestID in Table Ongoing_Requests

COMMIT Operation

  • In this operation, we COMMIT the results for this requestID.
    • For this requestID in the datastore, we apply the operation OP on Key X.
    • After applying the operation, we remove the entry for requestID from the datastore.
  • Even if this COMMIT operation is repeated multiple times, still we won’t have any side effects or unwanted state. 
    • In case of COMMIT operation being repeated, we won’t carry the same operation again because we would have already deleted the requestID in case of the first COMMIT operation. Hence the COMMIT operation is idempotent.
COMMIT(id: requestID) 

Update Value = OP(X) for Key X in Table Data
Delete Entry(X, V) for requestID in Table Ongoing_Requests
  • In Ongoing_Requests Table, we store the entries Keys which are undergoing update operations
  • In the DATA Table, we have the actual Key Value Pairs ( Key, Value )

Conclusion

So breaking any non-idempotent requests to PREPARE and COMMIT operations, change the nature of this non-idempotent request to idempotent which means we are good with any number of failures or retries. Even in case of failures and network partitions, retries of operation ( PREPARE or COMMIT ) will always make sure that the idempotent nature is maintained.

References

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s