Kafka topics are entities in the Kafka broker to which different applications produce data and from which they consume it. The data can be trivial or sensitive, and it can come in various formats such as Avro, JSON, XML, free text or anything else. Should any application within an organization be allowed to produce to and consume from any topic? In the end, the data is persisted at rest on a file system. So how important is it for an organization to take the necessary steps to secure that data?
Imagine a logistics company that has adopted Kafka. Its main need would be to track an order from initiation until delivery, with several notifications in between that have to be delivered to various parties. This is an ideal use case for Kafka. Reliable data delivery is very important, but what about data security? The data in these order notifications is sometimes not very sensitive, so it could be acceptable to leave it unsecured.
Now let's think about financial institutions such as banks, where data security is key to running the business. There are different kinds of data, such as audit logs, notifications, customer profiles, bank accounts and transactions. It is very important to secure customer data and to stay compliant with GDPR, risk and compliance requirements, customer security rules, etc.
It is crucial to decide whom you designate as consumers of the data, so restrictions are necessary. Producers need restrictions too, to prevent malicious or malformed data from being produced. Such unwanted situations damage the integrity of the system.
When Kafka is adopted in an organization, it usually starts with a handful of topics used by one or a few applications. But as requirements grow and applications scale, more topics are inevitably needed. As topics, producers and consumers multiply, it is important to decide at the right time to apply security, whether at the data level, at the application level and/or at the transport level with SSL.
If this decision is not taken at the beginning, companies will later face several problems in getting all teams and client applications to adhere to the enforced security guidelines. Depending on the number of topics, this could take months.
We now know that, depending on the kind of data and how it is transported, securing it is important. There are different ways to secure a Kafka message.
Data and Transport security
With symmetric key encryption, the producer encrypts a message with a key before producing it, and the consumer uses the same key to decrypt it. This logic is implemented in the Kafka serializers and deserializers.
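To make the serializer idea concrete, here is a minimal sketch of the encryption logic such a serializer/deserializer pair could delegate to. It assumes AES in GCM mode from the JDK's `javax.crypto`, a key that producer and consumer already share out of band, and a hypothetical wire format of a 12-byte random IV followed by the ciphertext. The class name `SymmetricPayloadCipher` is illustrative, and to stay self-contained it does not implement Kafka's `Serializer`/`Deserializer` interfaces itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: producer and consumer are both constructed with the SAME raw AES key.
// Assumed wire format: [12-byte IV][ciphertext + 16-byte GCM tag].
final class SymmetricPayloadCipher {
    private static final int IV_LENGTH = 12;   // 96-bit nonce, the recommended size for GCM
    private static final int TAG_BITS = 128;   // length of the authentication tag
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    SymmetricPayloadCipher(byte[] rawKey) {
        // rawKey must be 16, 24 or 32 bytes for AES-128/192/256
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    // Producer side: would be called from a Serializer before the record is sent.
    byte[] encrypt(String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        random.nextBytes(iv); // fresh IV per message; never reuse an IV with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        // Prepend the IV so the consumer can recover it
        return ByteBuffer.allocate(IV_LENGTH + ciphertext.length).put(iv).put(ciphertext).array();
    }

    // Consumer side: would be called from the matching Deserializer.
    // Throws AEADBadTagException if the key is wrong or the bytes were tampered with.
    String decrypt(byte[] payload) throws GeneralSecurityException {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        byte[] iv = new byte[IV_LENGTH];
        buffer.get(iv);
        byte[] ciphertext = new byte[buffer.remaining()];
        buffer.get(ciphertext);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
    }
}
```

In a real client, `encrypt` would sit inside `serialize()` of a class implementing `org.apache.kafka.common.serialization.Serializer`, and `decrypt` inside the matching `Deserializer`. GCM is used here because it also authenticates the payload, so a consumer holding the wrong key gets an exception rather than garbage. Distributing and rotating the shared key is the hard part and is not shown.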
Authentication protocols such as SASL (with the Kerberos/GSSAPI or PLAIN mechanisms) can be applied as well, so that the broker knows which client is connecting.
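As a rough illustration, the client settings for SASL can be assembled as plain `java.util.Properties`, which is how Kafka producers and consumers are normally configured. The property names (`security.protocol`, `sasl.mechanism`, `sasl.jaas.config`, `sasl.kerberos.service.name`, `ssl.truststore.*`) are standard Kafka client settings; the helper class `SecureClientConfig`, host names, usernames and file paths are hypothetical placeholders.

```java
import java.util.Properties;

// Hypothetical helper that builds Kafka client properties for SASL authentication over TLS.
final class SecureClientConfig {

    // Settings shared by both mechanisms: SASL_SSL = authenticate with SASL, encrypt with TLS.
    private static Properties base(String bootstrap, String truststorePath, String truststorePassword) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        props.put("security.protocol", "SASL_SSL");
        props.put("ssl.truststore.location", truststorePath);
        props.put("ssl.truststore.password", truststorePassword);
        return props;
    }

    // SASL/PLAIN: username and password, sent inside the TLS tunnel.
    static Properties plain(String bootstrap, String truststorePath, String truststorePassword,
                            String username, String password) {
        Properties props = base(bootstrap, truststorePath, truststorePassword);
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"" + username + "\" password=\"" + password + "\";");
        return props;
    }

    // SASL/GSSAPI (Kerberos): the client logs in with a keytab for its Kerberos principal.
    static Properties kerberos(String bootstrap, String truststorePath, String truststorePassword,
                               String keytabPath, String principal) {
        Properties props = base(bootstrap, truststorePath, truststorePassword);
        props.put("sasl.mechanism", "GSSAPI");
        // Must match the service name in the brokers' Kerberos principals ("kafka" is a common choice)
        props.put("sasl.kerberos.service.name", "kafka");
        props.put("sasl.jaas.config",
            "com.sun.security.auth.module.Krb5LoginModule required "
            + "useKeyTab=true storeKey=true "
            + "keyTab=\"" + keytabPath + "\" principal=\"" + principal + "\";");
        return props;
    }
}
```

`SASL_SSL` is used rather than `SASL_PLAINTEXT` because the PLAIN mechanism transmits the password as-is; wrapping it in TLS keeps credentials from crossing the network in clear text.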
Applying Access Controls (Authorization)
Kafka comes with a security feature called access control lists (ACLs) on topics. Basically, access is granted to the applications or client machines that produce and consume data.
For example, access can be granted or denied based on the IP address of a Kafka producer machine. Similarly, access can be granted or denied based on the certificate configured for the application. Certificate-based access controls are recommended: they are tied to the application's identity rather than to a machine's address, and several applications that use the same certificate in one environment can be covered by the same rule.
More details on these access controls (IP address based) are explained here.
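The decision rules behind topic ACLs can be sketched as a small model. It is a simplified illustration that assumes the semantics of Kafka's built-in authorizer: each ACL names a principal, a host, an operation and a topic; any matching DENY wins; otherwise a matching ALLOW grants access; and with no match at all, access is denied by default. The types `AclEntry`, `TopicAuthorizer`, `Operation` and `Permission` are hypothetical and model only the logic (no super users, prefixed patterns or other resource types). Real ACLs are managed with the `kafka-acls` tool or the Admin API. With certificate-based access, the principal is derived from the client certificate's distinguished name, e.g. `User:CN=orders-app`.

```java
import java.util.List;

// Simplified, hypothetical model of a Kafka-style ACL check (not the broker's actual code).
enum Permission { ALLOW, DENY }
enum Operation { READ, WRITE }

record AclEntry(String principal, String host, Operation operation, String topic, Permission permission) {
    // "User:*" and "*" act as wildcards for principal and host respectively.
    boolean matches(String requestPrincipal, String requestHost, Operation op, String requestTopic) {
        return (principal.equals("User:*") || principal.equals(requestPrincipal))
            && (host.equals("*") || host.equals(requestHost))
            && operation == op
            && topic.equals(requestTopic);
    }
}

final class TopicAuthorizer {
    private final List<AclEntry> acls;

    TopicAuthorizer(List<AclEntry> acls) {
        this.acls = List.copyOf(acls);
    }

    boolean authorize(String principal, String host, Operation op, String topic) {
        boolean allowed = false;
        for (AclEntry acl : acls) {
            if (acl.matches(principal, host, op, topic)) {
                if (acl.permission() == Permission.DENY) {
                    return false; // a matching DENY always overrides any ALLOW
                }
                allowed = true;
            }
        }
        return allowed; // no matching ALLOW means access is denied by default
    }
}
```

For instance, allowing `User:CN=billing-app` to READ the `orders` topic from any host, while denying every principal READ from one specific IP address, still blocks `billing-app` when it connects from that address, because DENY takes precedence.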
While securing your data is important, it comes at a performance cost, whether TLS is applied at the transport layer or encryption is done on the data-processing side (in the serializers). This overhead should be weighed against the actual need.
We looked at a couple of use cases, the importance of securing Kafka topics and data, ways to do it, and the performance costs involved.
If an organization decides early on to implement security at various levels, it becomes much easier to scale up and manage Kafka.
Apply security wisely.
Author : Murali Basani