Configuring Kafka Sources and Sinks in Kubernetes

Sebastien Goasguen
Published in ITNEXT · Jun 28, 2021

Streaming data into a Kafka cluster, and from a Kafka cluster to somewhere else, is usually done with a Kafka Connect cluster and its associated configuration.

This is usually a real pain point for Kafka users. It involves:

  • Deploying and running a Kafka Connect cluster
  • Updating and compiling connectors in Java
  • Uploading JARs to specific directories in your Kafka Connect cluster
Kafka Connect data pipeline (image from the Confluent blog)

While there is a nice collection of Kafka connectors, only a few of them are fully managed on Confluent Cloud, which leaves Kafka users with a lot of work to do if they want to manage their sources and sinks efficiently.

Following up on the previous two posts, where I showed how to produce messages to Kafka and how to consume messages from Kafka, I am now going to show you how to define your Kafka sources and sinks declaratively and manage them in your Kubernetes clusters.

Configuring a Kafka Source

From the point of view of a Kafka cluster, a Kafka source is something that produces events into a Kafka topic. With the help of Knative, we can configure a Kafka source using an object called a KafkaSink. This is super confusing, I know, and I am really sorry about it :) Everything is relative and depends on where you stand.

To create an addressable endpoint in your Kubernetes cluster that will become the source of messages into your Kafka cluster, you create an object like the one below:

apiVersion: eventing.knative.dev/v1alpha1
kind: KafkaSink
metadata:
  name: my-kafka-topic
spec:
  auth:
    secret:
      ref:
        name: kafkahackathon
  bootstrapServers:
    - pkc-456q9.us-east4.gcp.confluent.cloud:9092
  topic: hackathon
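
The kafkahackathon secret referenced under auth holds the Confluent Cloud credentials. Below is a minimal sketch, assuming the key layout commonly expected by the Knative KafkaSink (protocol, sasl.mechanism, user, password); the values are placeholders and the exact keys may differ in your Knative version:

apiVersion: v1
kind: Secret
metadata:
  name: kafkahackathon
stringData:
  protocol: SASL_SSL        # assumption: Confluent Cloud uses SASL over TLS
  sasl.mechanism: PLAIN     # assumption: PLAIN mechanism with an API key/secret pair
  user: "<confluent-api-key>"         # placeholder
  password: "<confluent-api-secret>"  # placeholder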

Note the bootstrap server on Confluent Cloud and the specified topic. Equipped with this, you can now create an AWS SQS source that will send messages directly into Kafka with a manifest like the one below:

apiVersion: sources.triggermesh.io/v1alpha1
kind: AWSSQSSource
metadata:
  name: samplequeue
spec:
  arn: arn:aws:sqs:us-west-2:1234567890:triggermeshqueue
  credentials:
    accessKeyID:
      valueFromSecret:
        name: awscreds
        key: aws_access_key_id
    secretAccessKey:
      valueFromSecret:
        name: awscreds
        key: aws_secret_access_key
  sink:
    ref:
      apiVersion: eventing.knative.dev/v1alpha1
      kind: KafkaSink
      name: my-kafka-topic

Note that the AWS SQS queue is represented by its ARN and that you can access it thanks to a Kubernetes secret containing your AWS API keys. The sink section points to the KafkaSink created previously. With this manifest, all SQS messages will be consumed and sent to Kafka using the CloudEvents specification.
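
If you do not already have the awscreds secret, a minimal sketch is below; the key names match the valueFromSecret references above and the values are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: awscreds
stringData:
  aws_access_key_id: "<your-access-key-id>"          # placeholder
  aws_secret_access_key: "<your-secret-access-key>"  # placeholder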

Two Kubernetes objects, and you have a declarative definition of your Kafka source.

Configuring a Kafka Sink

For your Kafka sink, we sadly run into the same potential confusion: what is a sink from the Kafka cluster's perspective is a source outside of that cluster. So to define a Kafka sink, you define a KafkaSource in your Kubernetes cluster.

For example, the manifest below specifies your bootstrap server pointing to a Kafka cluster in Confluent Cloud, a specific topic from which to consume messages, and a sink which is the end target of your Kafka messages. Here we point to a Kubernetes service called display:

apiVersion: sources.knative.dev/v1beta1
kind: KafkaSource
metadata:
  name: my-kafka
spec:
  bootstrapServers:
    - pkc-453q9.us-east4.gcp.confluent.cloud:9092
  net:
    sasl:
      enable: true
  sink:
    ref:
      apiVersion: v1
      kind: Service
      name: display
  topics:
    - hackathon
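
Since SASL is enabled here, the source also needs to know where its credentials live. A hedged sketch of a fuller net block, assuming a secret (reusing the kafkahackathon name as a stand-in) that holds user and password keys; double-check the field names against the KafkaSource reference for your Knative version:

  net:
    sasl:
      enable: true
      user:
        secretKeyRef:
          name: kafkahackathon   # assumption: your SASL credentials secret
          key: user
      password:
        secretKeyRef:
          name: kafkahackathon
          key: password
    tls:
      enable: true               # Confluent Cloud brokers require TLS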

So all you need for your Kafka sink definition is a single Kubernetes manifest and a target micro-service exposed as a traditional Kubernetes service. Note that it could also be a Knative service, in which case your target would benefit from auto-scaling, including scale to zero.
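
For completeness, here is a minimal sketch of such a display target as a Knative service; the event_display image is the standard Knative sample event sink, but this is only an assumption about your setup, and any HTTP service that accepts CloudEvents will do:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: display
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-releases/knative.dev/eventing/cmd/event_display  # logs received CloudEvents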

Conclusions

Instead of a complex workflow of compiling JARs and uploading them to a Kafka Connect cluster, you can leverage your Kubernetes cluster and define your Kafka sources and sinks as Kubernetes objects.

This brings a declarative mindset to your Kafka sources and sinks, gives you a source of truth, and unifies the management of your Kafka flows and your micro-services. In addition, it simplifies targeting micro-services running in Kubernetes when Kafka messages arrive and, with the addition of Knative, gives you the autoscaling necessary to handle changing stream volume.

If you need a collection of Kafka sources, suddenly all Knative event sources can be used (e.g. GitHub, GitLab, JIRA, Zendesk, Slack, Kinesis…).


@sebgoa, Kubernetes lover, R&D enthusiast, co-Founder of TriggerMesh, O’Reilly author, sports fan, husband and father…