This is a common question asked by many Kafka users. The goal of this post is to explain a few important determining factors and provide a few simple formulas.
More partitions lead to higher throughput
The first thing to understand is that a topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel. So expensive operations such as compression can utilize more hardware resources. On the consumer side, Kafka always gives a single partition’s data to one consumer thread. Thus, the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed. Therefore, in general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.
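To make the consumer-side bound concrete, here is a minimal sketch (a hypothetical helper, not the Kafka client API) of round-robin partition assignment within a consumer group. Because each partition is given to exactly one consumer thread, any consumers beyond the partition count sit idle:

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin partitions across consumers in one group; each partition
    goes to exactly one consumer, so parallelism is capped at num_partitions."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 4 partitions, 6 consumers in one group: two consumers receive nothing.
print(assign_partitions(4, ["c0", "c1", "c2", "c3", "c4", "c5"]))
# -> {'c0': [0], 'c1': [1], 'c2': [2], 'c3': [3], 'c4': [], 'c5': []}
```

This is why adding consumers past the partition count buys no extra throughput: the idle consumers have no partition to read from.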
A rough formula for picking the number of partitions is based on throughput. You measure the throughput…
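As a sketch of that sizing approach (assuming the commonly cited rule of thumb: at least max(t/p, t/c) partitions, where t is the target throughput and p and c are the measured per-partition produce and consume throughput):

```python
import math

def min_partitions(target_mb_s, produce_mb_s_per_partition, consume_mb_s_per_partition):
    """Lower bound on partition count from throughput measurements:
    the bottleneck side (produce or consume) determines the minimum."""
    return math.ceil(max(target_mb_s / produce_mb_s_per_partition,
                         target_mb_s / consume_mb_s_per_partition))

# Target 1000 MB/s; one partition sustains 50 MB/s produce, 100 MB/s consume:
print(min_partitions(1000, 50, 100))  # -> 20
```

The numbers here are illustrative; the per-partition figures must come from benchmarking your own producers and consumers on your hardware.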