In this post, we are going to talk about one of the issues we faced in our production environment while partitioning tasks across a cluster.
Some Basic Terminology:
- Task
- It is the basic unit of work that gets executed on a node.
- Tasks are assigned to nodes based on hashing.
- Each task has a metadata field, which is a string.
- We take the hashCode of this string and, based on the hashCode and the cluster size, route the task to one of the nodes of the cluster (sketched in code after the example below).
- A Request is nothing but a collection of these tasks.
Let us take an example:
- Request
- R(T1, T2, T3, …, Tn)
- Sub-Requests
- R(TI1, …, TI2) for Node1
- R(TI3, …, TI4) for Node2
- R(TI5, …, TI6) for Node3
- …
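As a rough sketch of the routing just described (the helper name and the 8-node cluster size are illustrative assumptions, not our actual code):

```scala
// Route a task to a node of the main cluster, based on the hashCode of its
// metadata string and the cluster size.
def pickNode(taskMetadata: String, clusterSize: Int): Int =
  Math.floorMod(taskMetadata.hashCode, clusterSize)

// e.g. pickNode("some-task-metadata", 8) returns a node index in [0, 8)
```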
Now, since we are a cloud-based company, our nodes become unavailable fairly often. In scenarios where a node of the cluster is unavailable, we route the tasks destined for that node to an auxiliary cluster.
- Suppose we had assigned tasks Tk1, …, Tk2 to Node6.
- Node6 becomes unavailable due to some intermittent issue.
- We then route the tasks Tk1, …, Tk2 to the auxiliary nodes.
- The distribution of tasks on the auxiliary cluster is decided by the same hashing technique we used for routing the tasks to the main cluster.
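A minimal sketch of this failover path, with hypothetical helper names and parameters; the key point is that the very same hashCode drives both routing decisions:

```scala
// Route a task: if its main node is up, use it; otherwise fall back to the
// auxiliary cluster, reusing the same hashCode for the second routing step.
def routeTask(task: String,
              mainClusterSize: Int,
              auxClusterSize: Int,
              isMainNodeUp: Int => Boolean): (String, Int) = {
  val mainNode = Math.floorMod(task.hashCode, mainClusterSize)
  if (isMainNodeUp(mainNode)) ("main", mainNode)
  else ("aux", Math.floorMod(task.hashCode, auxClusterSize)) // same hash, smaller modulus
}
```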
Problem:
We started seeing an uneven distribution of tasks on the auxiliary nodes whenever any node in the main tier was unserviceable. Because of this, we constantly kept our auxiliary cluster over-provisioned, which led to a large increase in cost and operational overhead.
Explanation:
This uneven distribution of tasks on the auxiliary cluster came down to a simple fact: the tasks assigned to Node6 all shared one special property, namely that their hashCodes followed this format:
HashCode(Task assigned to Node6) = 8*k + 6 (for some integer k, given a main cluster of 8 nodes)
So when these tasks got routed to the auxiliary cluster, they would only ever land on specific nodes:
HashCode(Task assigned to Node6) % 4 = i, the index of the chosen auxiliary node (for an auxiliary cluster of 4 nodes)
Since HashCode(Task assigned to Node6) = 8*k + 6, we get
(8*k + 6) % 4 = 2, which is always even.
So the index i of the chosen auxiliary node is always even.
So all of these tasks would get routed only to even-numbered auxiliary nodes, hence the uneven distribution.
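A quick throwaway check of this arithmetic (not production code):

```scala
// Every hashCode of the form 8*k + 6, taken modulo an auxiliary cluster size
// of 4, lands on the same even node index.
val auxNodes = (0 to 1000).map(k => (8 * k + 6) % 4).toSet
println(auxNodes) // Set(2): only one (even-numbered) auxiliary node is ever chosen
```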
Solution:
This uneven distribution on the auxiliary cluster happened because both hashing functions (for the main cluster as well as the auxiliary cluster) were the same and used the same hashCode of the tasks.
We want these two hash functions to be independent of each other.
By independent, I mean that having any information about HashFunctionMain should not let us make any prediction about HashFunctionAux.
We first tried to solve the problem by using HashCodeBuilder (from Apache Commons Lang), which is supposed to provide a different way of computing the hash of an object (a string in our case).
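The attempted fix looked roughly like the sketch below (the method name and cluster-size parameter are illustrative assumptions):

```scala
import org.apache.commons.lang3.builder.HashCodeBuilder

// Auxiliary routing using HashCodeBuilder (default constants 17 and 37)
// instead of String.hashCode.
def pickAuxiliaryNode(task: String, auxClusterSize: Int): Int = {
  val builder = new HashCodeBuilder()
  builder.append(task) // hash the task's metadata string
  Math.floorMod(builder.toHashCode, auxClusterSize)
}
```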
The Problem Still Persists:
Even after using HashCodeBuilder to compute the hash of the object, we were still seeing an uneven distribution on our auxiliary nodes. We started digging into HashCodeBuilder's codebase and, to our surprise, found that every String (or char) hashCode reduces to this generic formula:
HashCodeBuilder's hashCode implementation:
s[0]*m^(n-1) + s[1]*m^(n-2) + ... + s[n-1]
where
m: an odd prime number
n: number of characters in the string
s[0], s[1], ..., s[n-1]: the characters of the string

Java's String hashCode implementation:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
where
n: number of characters in the string
s[0], s[1], ..., s[n-1]: the characters of the string
So we can clearly see that there is a dependency between these two hashCodes.
The dependency is this: since m and 31 are odd, every term s[i]*m^k or s[i]*31^k has the same parity as s[i]. So if s[0] + s[1] + ... + s[n-1] is even,
- Java's hashCode is also even, and
- HashCodeBuilder's hashCode is also even.
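The Java half of this claim is easy to verify empirically with a throwaway snippet; it relies only on the fact that 31^k is odd:

```scala
// String.hashCode has the same parity as the sum of the string's characters,
// because each term s(i) * 31^k has the same parity as s(i).
val r = new scala.util.Random(42)
val parityAlwaysMatches = (1 to 10000).forall { _ =>
  val s = r.nextString(16)
  Math.floorMod(s.hashCode, 2) == s.map(_.toInt).sum % 2
}
println(parityAlwaysMatches) // true
```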
Since there is such a relation between the two hash functions, we cannot use HashCodeBuilder to compute the hashCode used for routing to the auxiliary nodes.
So finally, we arrived at two candidate solutions:
- Using an SHA hash
  - More time consuming
  - Takes more CPU cycles
- Using the hashCode of the first (n-1) characters
  - We have to show that, given the hashCode of all n characters (which is essentially a constant for the tasks routed to a particular node), the hashCode of the first (n-1) characters is still uniformly distributed. This is due to a simple fact:
Proof sketch:
C: hashCode of all n characters
s[n-1]: the nth (last) character
H(n-1): hashCode of the first (n-1) characters

We know that
C = 31 * H(n-1) + s[n-1]
so
H(n-1) = (C - s[n-1]) / 31

Now, C is essentially a constant, while s[n-1] is uniformly distributed and can take any value in [0, 255). Hence subtracting a uniformly distributed value s[n-1] from the constant C (and dividing by the constant 31) keeps the resultant H(n-1) uniformly distributed over its range [0, x), instead of being pinned to a single residue class.
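The recurrence the proof relies on, C = 31 * H(n-1) + s[n-1], is exactly how Java computes String.hashCode, and it holds even when the 32-bit arithmetic overflows. A quick throwaway check:

```scala
// hash(s) == 31 * hash(s without its last char) + lastChar, in wrapping Int arithmetic.
val r = new scala.util.Random(7)
val recurrenceHolds = (1 to 10000).forall { _ =>
  val s = r.nextString(31)
  s.hashCode == 31 * s.substring(0, s.length - 1).hashCode + s.charAt(s.length - 1).toInt
}
println(recurrenceHolds) // true
```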
We wrote a small program to test this hypothesis as well:
```scala
import org.apache.commons.lang3.builder.HashCodeBuilder

object HashingTest {

  val r = new scala.util.Random(31)

  // Hash used for routing to the main cluster: plain Java String hashCode.
  def getHash(task: String): Int = task.hashCode

  // Modulo that always returns a non-negative result.
  private def mod(i: Int, modulus: Int): Int = {
    val result = i % modulus
    if (result >= 0) result else result + modulus
  }

  // Pick a node of the 18-node main cluster.
  private def pickInitialNode(task: String): Int = mod(getHash(task), 18)

  // Pick a node of the 18-node auxiliary cluster using HashCodeBuilder over
  // the first (n - 1) characters of the task.
  private def pickAuxiliaryNode(task: String): Int = {
    val hashCodeBuilder = new HashCodeBuilder
    hashCodeBuilder.append(task.substring(0, task.length - 1))
    mod(hashCodeBuilder.toHashCode, 18)
  }

  def main(args: Array[String]): Unit = {
    // For tasks that the main cluster would route to node 6, record which
    // auxiliary node they would land on.
    val auxNodes = (1 to 100000).map { _ =>
      val task = r.nextString(31)
      if (pickInitialNode(task) == 6) pickAuxiliaryNode(task) else -1
    }
    auxNodes.filter(_ != -1).toSet.toList.sorted.foreach(println)
  }
}
```
Generated Output:
```
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
```
In the above output we can see that the tasks were getting routed to all 18 nodes of the auxiliary cluster.
We also ran a performance comparison between SHA hashing and our home-made hashing technique.
Hashing 10,000 task strings took:

| Hashing technique | pct50 | pct90 |
| --- | --- | --- |
| SHA Hash | 25 ms | 47 ms |
| HashCode of first (n-1) characters | 21 ms | 38 ms |
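For reference, this kind of comparison can be reproduced with a sketch along the lines below (using SHA-256 via java.security.MessageDigest is an assumption, and the exact numbers will depend on hardware and string length, so it is not expected to match the table above exactly):

```scala
import java.security.MessageDigest
import java.nio.charset.StandardCharsets

object HashingBenchmark {
  def main(args: Array[String]): Unit = {
    val r = new scala.util.Random(31)
    val tasks = (1 to 10000).map(_ => r.nextString(31))
    val sha = MessageDigest.getInstance("SHA-256")

    def shaHash(task: String): Int =
      java.util.Arrays.hashCode(sha.digest(task.getBytes(StandardCharsets.UTF_8)))

    def prefixHash(task: String): Int =
      task.substring(0, task.length - 1).hashCode

    // Time one full pass over the 10,000 tasks, in milliseconds.
    def timeMs(hash: String => Int): Long = {
      val start = System.nanoTime()
      tasks.foreach(hash)
      (System.nanoTime() - start) / 1000000
    }

    // Repeat the pass many times and report p50 / p90 of the per-pass timings.
    def report(name: String, hash: String => Int): Unit = {
      val samples = (1 to 100).map(_ => timeMs(hash)).sorted
      println(s"$name: pct50 ${samples(49)} ms, pct90 ${samples(89)} ms")
    }

    report("SHA Hash", shaHash)
    report("HashCode of first (n-1) characters", prefixHash)
  }
}
```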
In this experiment we can clearly see that our hashing technique for routing to the auxiliary nodes performs better than SHA hashing. Also, after we rolled this out, the now-uniform distribution across the clusters allowed us to stop over-provisioning the auxiliary tier and save $$$.