Apache Kafka client library for Go

Hey, sarama contributor here. Some points of clarificarion:

  • The Kafka protocol allows concurrency; the server will only process one request at the time, but you can line up multiple requests while waiting for the response to the first one.
  • Sarama handles errors (leadership elections, network issues, etc). The error handling has been heavily tested, and can be configured quite a bit (number of attempts, backoff duration after a failed attempt, etc).

I'd like to clarify why we have opted to use a channel based API for the producer. For the consumer it's more a matter if taste, but for the producer you will always end up with a concurrent implementation if you require a decent amount of throughput.

The first realization is that messages normally do not occur in batches, but you usually want to produce one message at the time. This works great with a blocking API, but the performance of this approach is pretty terrible. Which is why Kafka's API supports batching. (Another reason is that compression works much better if you batch messages.)

With your API, you leave the batching completely up to the user. Batching is quite tricky though because there are conflicting requirements of low latency, high throughput, and staying within the maximum message size limits of the broker. For any batching implementation that wants to keep latency low (by forcing a batch flush after e.g. 250ms no matter how many messages are queued up), you have to use a goroutine.

Sarama implements this batching goroutine for you. You provide messages to a channel. A goroutine reads from it and queues up messages. It will flush batches either after a given number of messages or a given amount of time. It will also make sure it flushes before the batch goes over the maximum request size the broker is willing to handle. While it is sending the request to kafka and waiting for the response, the input channel is still accepting new messages that will be queued up for the next batch. This means that your application will not be blocked for a long time, e.g. when the Kafka cluster is in an election. This is important if you want short response times when handling HTTP requests.

Sarama will try really hard to produce the messages you provide, but in some cases it is impossible (e.g. the complete cluster is down). When this happens, the messages will eventually be returned on the errors channel. Your app can read from it and handle the failures any way it wants.

If you really need a synchronous way to produce messages, Sarama offers a sync producer: http://godoc.org/github.com/Shopify/sarama#SyncProducer. By using its SendMessage method, you get the result returned in a synchronous way. The nice part is that it is safe to use concurrently (e.g. when producing messages from different goroutines, common in HTTP servers). Internally it will still use batching to achieve higher throughput.

/r/golang Thread Link - github.com