Recently, I started working on a new project that involves building a cloud-native analytics platform. Achieving security compliance for the platform is one of the major concerns we are currently facing. In this blog post, we are going to talk about a few of the security concerns we needed to address before going ahead with these compliance requirements:
- Data at Rest Encryption
- Data in Transit Encryption
Let’s try to answer the WHAT, WHY and HOW for the above-mentioned terms.
What is Data at Rest / Data in Transit Encryption?
Before jumping into the core topic of this blog post, one should have a basic idea of encryption. From Wikipedia:
Encryption is the process of encoding a message or information in such a way that only authorised parties can access it and those who are not authorised cannot.
Today, data carries everything from credit card details to health care records to personal information. Because of all the information it contains, machine data is being leveraged by companies like Google, Facebook and Amazon, making it a huge source of revenue for them.
With data this valuable, a great deal of research worldwide goes into building ever more secure technologies. Most cloud solutions used by organizations all over the world have some form of customer data residing on their machines, and these machines may or may not be provided by a cloud service provider.
Think about it: what happens if one of these cloud providers has a security breach on their premises? Do you want the data residing on those virtual hosts to be compromised by the attackers? I HOPE NOT!
So what can you do on your end to ensure that your data is not compromised, even if those cloud providers or your own premises get breached? You must follow data encryption best practices, among them:
Data at Rest Encryption
In simple terms, data residing on your hard drives or SSDs must be encrypted, so that even if an attacker gets hold of the drives, the data stored on them is useless to the attacker.
Here is a brief lifecycle of data on a machine:
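To make the idea concrete, here is a minimal sketch of "encrypt before it hits the disk". It assumes a toy keystream cipher built from HMAC-SHA256 in counter mode. That cipher was chosen only because it needs nothing beyond the Python standard library, and real disk encryption uses a vetted cipher such as AES. The helper names `toy_encrypt` and `toy_decrypt` are hypothetical. Only ciphertext is written to the file, so anyone who reads the raw bytes without the key sees noise.

```python
import hashlib
import hmac
import os
import tempfile

# TOY CIPHER FOR ILLUSTRATION ONLY: an HMAC-SHA256 keystream in counter mode.
# It has no integrity protection; use a vetted cipher (e.g. AES) in practice.
def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh random nonce, stored in front of the ciphertext
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))

# In a real system this key must be protected separately
# (e.g. derived from a passphrase or held by a key management service).
key = os.urandom(32)
secret = b"patient record: blood type O+"

# Only ciphertext is ever written to "disk".
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(toy_encrypt(key, secret))
    path = f.name

# An attacker who steals the disk gets only these raw bytes...
with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

# ...while the key holder can recover the plaintext.
recovered = toy_decrypt(key, raw)
print(secret in raw, recovered == secret)
```

Decrypting with any other key yields garbage, which is exactly the property the paragraph above asks for.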
Data in Transit Encryption
In the above-mentioned diagram, the data keeps moving between the different components of a single system, such as RAM, processors, caches and disk drives. But one thing is missing from that diagram: network cables. More often than not, any program you build and run in production talks to different services sitting on different machines, inside or outside your infrastructure.
Communicating with those services requires exchanging information over the network. That information has to be encrypted so that a third party listening to every bit of data on those network cables cannot make sense of it.
Having gone through the WHY and WHAT of Data at Rest and Data in Transit Encryption, let’s understand HOW one can achieve these two kinds of encryption.
Data at Rest Encryption
Simply put, there are many ways to achieve Data at Rest Encryption. Let’s look at the different ways to encrypt data on disks and where they fit in the kernel stack.
- File System Level Encryption
  - This encryption technique works by stacking itself on top of an existing file system.
  - The stacked file system takes care of encrypting the contents of files and directories.
  - eCryptfs is an example of a stacked cryptographic file system that ships with the Linux kernel.
- Block Level Encryption
  - This encryption makes sure that every block of data on the block device is encrypted.
  - A block-level encryption module such as dm-crypt encrypts blocks on their way to the disk and decrypts the blocks requested by the file system and the layers above it.
  - dm-crypt together with LUKS is a popular choice for block-level encryption.
As we can see in this diagram, block-level encryption sits much closer to the block device than file-system-level encryption does. Also notice how seamlessly both techniques fit in with the existing kernel modules.
With these encryption techniques, we can make sure that even if an attacker gets hold of the disk, they cannot make any sense of the data. In this blog post, you can read about the performance of these two encryption techniques to pick the one best suited to you.
That said, if you are using a cloud service provider like AWS or GCP, most of the time it offers encrypted volumes: EBS volumes in the case of AWS and persistent disks in the case of GCP. Here is a detailed tutorial to get you started with an encrypted volume setup on AWS.
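The block-level approach can be sketched as a toy in-memory "block device" that encrypts each fixed-size sector independently. The sector number is used as the nonce, so the layer above can read any single sector without touching the others. This is a minimal sketch of the concept, not how dm-crypt is implemented. The class `ToyEncryptedBlockDevice`, the 512-byte sector size and the HMAC-SHA256 keystream are all assumptions for illustration. The keystream stands in for a real disk cipher such as AES in XTS mode.

```python
import hashlib
import hmac
import os

SECTOR_SIZE = 512  # assumed sector size for this toy

class ToyEncryptedBlockDevice:
    """Hypothetical stand-in for a dm-crypt-style mapping over a raw device.

    TOY ONLY: each sector's keystream depends only on the key and the sector
    number, so rewriting a sector reuses its keystream. Real disk encryption
    uses a tweakable block cipher such as AES-XTS to avoid that weakness.
    """

    def __init__(self, key: bytes, num_sectors: int):
        self._key = key
        # The "physical disk": only ciphertext is ever stored here.
        self.raw = [bytes(SECTOR_SIZE) for _ in range(num_sectors)]

    def _keystream(self, sector: int) -> bytes:
        out = bytearray()
        counter = 0
        while len(out) < SECTOR_SIZE:
            msg = sector.to_bytes(8, "big") + counter.to_bytes(8, "big")
            out += hmac.new(self._key, msg, hashlib.sha256).digest()
            counter += 1
        return bytes(out[:SECTOR_SIZE])

    def _xor(self, sector: int, data: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(data, self._keystream(sector)))

    def write_sector(self, sector: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector long")
        self.raw[sector] = self._xor(sector, data)  # encrypt on the way down

    def read_sector(self, sector: int) -> bytes:
        return self._xor(sector, self.raw[sector])  # decrypt only this sector

dev = ToyEncryptedBlockDevice(os.urandom(32), num_sectors=4)
block = b"same content".ljust(SECTOR_SIZE, b"\0")
dev.write_sector(0, block)
dev.write_sector(1, block)

# Identical plaintext in different sectors yields different ciphertext,
# and each sector decrypts on its own.
print(dev.raw[0] != dev.raw[1], dev.read_sector(1) == block)
```

Because every sector is handled independently, the file system above never sees ciphertext. This mirrors the position of block-level encryption in the kernel stack.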
Data in Transit Encryption
So far we have secured the data on our disks, so let’s jump to the next problem: encrypting the data exchanged between two services, which could be internal or external.
Encryption solves this problem too. But encrypting and decrypting data between two systems over an untrusted network is different from encrypting and decrypting data between block devices and user programs.
The difference is mainly due to key management. With block devices and user programs, the keys are already present on the system. But when encrypting and decrypting across systems, exchanging the keys securely becomes an involved topic in itself.
Let’s look at the different ways in which systems can securely share keys, so that they can encrypt and decrypt data with them:
- Burning the keys into the application itself, so that when the application bootstraps, it knows which secret keys to use to encrypt/decrypt the data.
- Using TLS/SSL to encrypt/decrypt the data between the services over an untrusted network.
The 1st option is generally avoided because we don’t want to hardcode secret keys in the code repository. This approach only adds an extra hop between the attacker and our application data: the attacker now just has to get hold of the code repository in some way, extract the secret, and use it to decrypt the data.
The 2nd option is the most popular one and is used by services worldwide. It is worth reading up on TLS/SSL to understand how it can be leveraged to send data securely over an untrusted network.
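As a rough illustration of the client side of TLS, here is a minimal sketch using Python's standard `ssl` module. The function names `make_client_context` and `fetch_over_tls` are hypothetical, and `fetch_over_tls` assumes network access to a server that speaks HTTPS on port 443. The context verifies the server's certificate and host name and refuses protocol versions older than TLS 1.2.

```python
import socket
import ssl

def make_client_context() -> ssl.SSLContext:
    # create_default_context() loads the system's trusted CA certificates and
    # enables certificate verification and host name checking.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse legacy SSL and early TLS versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

def fetch_over_tls(host: str, port: int = 443) -> bytes:
    """Send a minimal HTTP HEAD request over TLS and return the start of the reply."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # wrap_socket performs the TLS handshake; server_hostname enables SNI
        # and lets the context check the certificate against the host name.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode())
            return tls.recv(1024)
```

Everything passed to `sendall` is encrypted before it reaches the network cable. An eavesdropper on the wire sees only the handshake and ciphertext.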
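In the classic RSA key exchange used by SSL and by TLS up to version 1.2, the client generates a secret, encrypts it with the server's public key and sends it to the server. This encrypted secret can only be decrypted with the private key, which only the server holds. Both sides then derive the symmetric session keys from that secret. TLS 1.3 removes RSA key transport in favor of ephemeral Diffie-Hellman key exchange, and the server's key pair is used instead to sign the handshake.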
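The "encrypt with the public key, decrypt with the private key" step can be illustrated with textbook RSA and tiny primes. This is a toy sketch only. The numbers are the well-known textbook example and are trivially breakable, and real TLS uses keys of 2048 bits or more together with padding.

```python
# Textbook RSA with tiny primes: ILLUSTRATION ONLY, trivially breakable.
# Real deployments use 2048-bit+ keys with padding.

# Server side: generate a key pair.
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (n, e)        # shared with clients, e.g. inside the server certificate
private_key = (n, d)       # never leaves the server

# Client side: pick a secret and encrypt it with the server's public key.
client_secret = 65
ciphertext = pow(client_secret, public_key[1], public_key[0])

# Server side: only the holder of d can recover the secret.
recovered = pow(ciphertext, private_key[1], private_key[0])

print(n, d, ciphertext, recovered)  # 3233 2753 2790 65
```

An eavesdropper sees `public_key` and `ciphertext` but not `d`. In this toy, recovering `d` means factoring `n`. With real key sizes, that factoring is what makes the attack infeasible.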