Kafka is a distributed streaming platform. Kafka brokers host topics: producers publish events to topics, and consumers read them. Each topic is split into partitions, which can be spread across multiple broker nodes to scale. Consumers join consumer groups, and within a group each partition is read by only one consumer at a time.
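To make the consumer group idea concrete, here is a minimal sketch that gives each partition of a topic to exactly one consumer in a group, using a simple round-robin rule. It is an illustration under stated assumptions, not Kafka's implementation: Kafka ships its own assignors (for example RangeAssignor, RoundRobinAssignor and CooperativeStickyAssignor), and the topic name `orders` and the consumer names are hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer in the group.

    Simplified illustration only; Kafka's built-in assignors differ in detail.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    members = sorted(consumers)  # deterministic order for the example
    # Start every member with an empty list so idle consumers stay visible.
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


# Hypothetical topic "orders" with 6 partitions, read by a group of 3 consumers.
partitions = [("orders", p) for p in range(6)]
group = ["consumer-a", "consumer-b", "consumer-c"]
for consumer, owned in assign_round_robin(partitions, group).items():
    print(consumer, owned)
```

Each consumer ends up with two of the six partitions. With more consumers than partitions, the extra consumers receive an empty list, which mirrors the fact that a consumer group cannot usefully have more active members than the topic has partitions.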
A single stretched cluster, including its ZooKeeper nodes, spans two or more data centers or availability zones. For example, a production environment may have 24 Kafka broker nodes, while each of the lower environments (Dev, Test and UAT) may have around 4 to 16 broker nodes. Large organizations such as LinkedIn and Yahoo may run hundreds of brokers.
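One reason to stretch a cluster across data centers is to keep the replicas of each partition in more than one location; Kafka supports this when brokers are configured with `broker.rack`. The sketch below is a simplified illustration of the idea, not Kafka's actual placement code: it interleaves brokers from different racks (here, data centers) and then picks consecutive brokers for each partition's replicas. The broker ids and data center names are hypothetical, and the interleaving only guarantees distinct data centers when each one has a similar number of brokers.

```python
from itertools import zip_longest


def rack_alternated(brokers):
    """Order broker ids so that neighbours sit in different racks (here: DCs)."""
    by_rack = {}
    for broker_id, rack in sorted(brokers.items()):
        by_rack.setdefault(rack, []).append(broker_id)
    ordered = []
    # Take one broker from each rack in turn: dc1, dc2, dc1, dc2, ...
    for row in zip_longest(*(by_rack[r] for r in sorted(by_rack))):
        ordered.extend(b for b in row if b is not None)
    return ordered


def place_replicas(brokers, num_partitions, replication_factor):
    """Pick replication_factor consecutive brokers from the alternated list."""
    ordered = rack_alternated(brokers)
    if replication_factor > len(ordered):
        raise ValueError("replication factor exceeds the number of brokers")
    return {
        p: [ordered[(p + i) % len(ordered)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


# Hypothetical stretched cluster: 4 brokers split across two data centers.
brokers = {1: "dc1", 2: "dc1", 3: "dc2", 4: "dc2"}
print(place_replicas(brokers, num_partitions=4, replication_factor=2))
# {0: [1, 3], 1: [3, 2], 2: [2, 4], 3: [4, 1]}
```

Every partition has one replica in `dc1` and one in `dc2`, so losing a single data center still leaves a copy of each partition.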
Large companies with big Kafka implementations may have 2,000 to 10,000 Kafka topics across their Dev, Test, UAT, and Prod environments. Maintaining all these topics and their subscriptions, and preserving their configurations, eventually becomes cumbersome, particularly in terms of security and isolation.
Image courtesy: https://kafka.apache.org/
With, say, 2,000 Kafka topics, it is still manageable for teams to tell them apart based on the kind of events produced on them. For example, Team 1 owns 50 topics and produces to or consumes from another 50. Many other topics may be irrelevant to Team 1, or the team may not be allowed to view them, or to produce to or consume from them. For a more granular view, some isolation around the topics is needed, so that each team sees only the relevant topics and security and governance can be enforced. Sometimes teams only need to browse the available topics like a catalog, without requesting produce or consume access. Let’s see how multi-tenancy addresses these needs.
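The filtering described above can be sketched as a small function that shows a team only the topics it owns, produces to, or consumes from, and optionally lists other topics as catalog entries without access. This is a minimal sketch under assumptions: the `Team` record, the role names and the `hidden` set of topics that must never be listed are hypothetical and do not describe any particular tool's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Team:
    """Hypothetical record of a team's relationship to topics."""
    name: str
    owned: Set[str] = field(default_factory=set)
    produces: Set[str] = field(default_factory=set)
    consumes: Set[str] = field(default_factory=set)
    browse_catalog: bool = False  # may list other topics, without access


def topic_view(team, all_topics, hidden=frozenset()) -> Dict[str, List[str]]:
    """Map each topic the team may see to the team's roles on it."""
    view = {}
    for topic in sorted(all_topics):
        roles = [role for role, topics in (("owner", team.owned),
                                            ("producer", team.produces),
                                            ("consumer", team.consumes))
                 if topic in topics]
        if roles:
            view[topic] = roles
        elif team.browse_catalog and topic not in hidden:
            # Catalog entry only: listed, but no produce/consume access.
            view[topic] = ["browse"]
    return view


all_topics = {"orders", "payments", "shipments", "hr-salaries"}
team1 = Team("team1", owned={"orders"}, consumes={"payments"})
print(topic_view(team1, all_topics))
# {'orders': ['owner'], 'payments': ['consumer']}
```

Setting `browse_catalog=True` would additionally list `shipments` (and `hr-salaries`, unless it is passed in `hidden`) with the `browse` role only.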
Running one or more stretched clusters without any isolation between Kafka topics is the default layout in any company.
By default, every company with a Kafka implementation has a single-tenant layout: every team and every user can view the topics, and produce to or consume from them according to the access controls in place. Is some isolation of topics needed within this single-tenant layout? That depends on the needs of the organization: perhaps not all users should be able to browse all topics, either because the topics are not relevant to them or for security reasons.
Large companies such as LinkedIn, Walmart, Twitter, Netflix, and Facebook implement a multi-tenant architecture. There could be several stretched clusters (for example, DTAP: Development, Test, Acceptance and Production), each representing one tenant. What is the idea behind this? Isolation, and securing the topics. Security here is meant from a governance perspective rather than as access control: every tenant is isolated from the other tenants. Topics can be divided among the tenants based on functionality, sensitivity, region, domain, or some other requirement. With only about 2,000 topics, multiple tenants are not always advisable, as they require significant spending on infrastructure.
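Dividing topics among tenants by an attribute such as domain, region or sensitivity amounts to a grouping step. The sketch below assumes each topic carries a small set of hypothetical metadata attributes and places every topic in exactly one tenant keyed by the chosen attribute; the topic names and attribute values are made up for illustration.

```python
def split_into_tenants(topics, attribute):
    """Group topic names into tenants by one attribute, e.g. domain or region."""
    tenants = {}
    # Sorting keeps the output deterministic; each topic lands in one tenant.
    for name, attrs in sorted(topics.items()):
        tenants.setdefault(attrs[attribute], []).append(name)
    return tenants


# Hypothetical topics and attributes, for illustration only.
topics = {
    "orders.created": {"domain": "sales", "region": "eu", "sensitivity": "internal"},
    "orders.cancelled": {"domain": "sales", "region": "us", "sensitivity": "internal"},
    "payroll.run": {"domain": "hr", "region": "eu", "sensitivity": "confidential"},
}
print(split_into_tenants(topics, "domain"))
# {'sales': ['orders.cancelled', 'orders.created'], 'hr': ['payroll.run']}
```

Choosing a different attribute, such as `"sensitivity"`, yields a different tenant layout from the same topics, which is the design decision each organization has to make.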
Multi-tenancy creates a secure environment by enforcing governance and defining a controlled view of the topics.
Within a single-tenant architecture, Kafkawize can already place some restrictions on how teams access Kafka topics; for example, a team can be configured to only consume a topic rather than own it. Beyond that, Kafkawize supports multi-tenancy without depending on the underlying clusters. With this feature, teams and environments can be associated with a tenant, and teams can then own, produce to, or consume from topics within that tenant's environments, without any changes to the Kafka clusters themselves. A default installation has single tenancy enabled.
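The idea of tenancy without touching the clusters can be sketched as two lookup tables, team to tenant and environment to tenant, used to filter which topics a team sees. This is a hypothetical model for illustration only; it is not Kafkawize's data model or API, and the environment, team and topic names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    name: str
    environment: str


def topics_for_team(team, team_tenant, env_tenant, topics):
    """List topics whose environment belongs to the team's tenant.

    team_tenant and env_tenant are hypothetical lookup tables
    (team -> tenant, environment -> tenant). They are metadata only,
    so nothing on the Kafka clusters has to change.
    """
    tenant = team_tenant[team]
    return sorted(t.name for t in topics if env_tenant.get(t.environment) == tenant)


# Hypothetical setup: two tenants whose environments may even share clusters.
env_tenant = {"DEV": "retail", "PRD": "retail", "DEV-HR": "hr", "PRD-HR": "hr"}
team_tenant = {"checkout": "retail", "payroll": "hr"}
topics = [Topic("orders", "PRD"), Topic("orders-test", "DEV"), Topic("salaries", "PRD-HR")]
print(topics_for_team("checkout", team_tenant, env_tenant, topics))  # ['orders', 'orders-test']
print(topics_for_team("payroll", team_tenant, env_tenant, topics))   # ['salaries']
```

Because the tenant boundary lives only in these mappings, moving a team or an environment to another tenant is a metadata update rather than a cluster change.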
Multi-tenancy brings better visibility and security for Kafka topics, but it has a downside: without it, the number of brokers and ZooKeeper nodes can be reduced, which lowers costs. (Note that a cluster can hold only a limited number of partitions.) It is up to the organization to decide based on these trade-offs.
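The cost side of this trade-off can be estimated with simple arithmetic. The sketch below assumes a cap on partition replicas per broker (the value 4,000 is an illustrative assumption, not a Kafka constant), a replication factor of 3, and, for the multi-tenant case, that each tenant cluster runs its own 3-node ZooKeeper ensemble. The topic and partition counts are hypothetical.

```python
import math


def min_brokers(partitions, replication_factor, max_replicas_per_broker):
    """Smallest broker count that keeps replicas per broker under the cap.

    Never below the replication factor, because each replica of a
    partition must live on a different broker.
    """
    replicas = partitions * replication_factor
    return max(replication_factor, math.ceil(replicas / max_replicas_per_broker))


def total_nodes(partitions_per_cluster, replication_factor,
                max_replicas_per_broker, zookeepers_per_cluster):
    """Brokers plus ZooKeeper nodes summed over a list of clusters."""
    return sum(
        min_brokers(p, replication_factor, max_replicas_per_broker) + zookeepers_per_cluster
        for p in partitions_per_cluster
    )


LIMIT = 4000  # assumed replica cap per broker, for illustration only
# 2,000 topics x 6 partitions = 12,000 partitions in one shared cluster.
shared = total_nodes([12000], 3, LIMIT, 3)          # 9 brokers + 3 ZooKeepers
# The same partitions split evenly across 4 tenant clusters.
per_tenant = total_nodes([3000] * 4, 3, LIMIT, 3)  # 4 x (3 brokers + 3 ZooKeepers)
print(shared, per_tenant)  # 12 24
```

Under these assumptions, splitting the same workload into four tenant clusters doubles the node count, because each small cluster still needs a minimum number of brokers and its own ZooKeeper ensemble.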