Apache Flink
This tutorial shows how to integrate Upstash Kafka with Apache Flink.
Apache Flink is a distributed processing engine that can process streaming data.
Upstash Kafka Setup
Create a Kafka cluster using Upstash Console or Upstash CLI by following Getting Started.
Create two topics by following the creating topic steps. Name the first topic "input", since we are going to stream from it to the second topic, which we can name "output".
Project Setup
If you already have a project and want to integrate Upstash Kafka and Apache Flink into it, you can skip this section and continue with Add Apache Flink and Kafka into the Project.
Install Maven on your machine by following the Maven Installation Guide.
Run `mvn --version` in a terminal or command prompt to make sure you have Maven installed. It should print out the version of Maven you have:
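The exact output depends on your installation; it looks roughly like this (version numbers and paths will vary):

```
Apache Maven 3.9.x
Maven home: /usr/share/maven
Java version: 17.x.x
```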
To create the Maven project:

In your terminal or command prompt, go to the folder in which you want to create the project by running `cd <folder path>`.
Run the following command:
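A standard Maven quickstart archetype works here; the `groupId` and `artifactId` below are placeholders, so substitute your own:

```shell
mvn archetype:generate \
  -DgroupId=com.example.flinkkafka \
  -DartifactId=flink-kafka-tutorial \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DinteractiveMode=false
```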
Add Apache Flink and Kafka into the Project
Open the project folder in an IDE with Maven support, such as IntelliJ IDEA, Visual Studio Code, or Eclipse. Add the following Apache Flink dependencies inside the dependencies tag of the pom.xml file.
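A minimal set of dependencies looks like the following; the version numbers are illustrative, so check Maven Central for releases matching your Flink version:

```xml
<!-- Flink streaming runtime, client, and Kafka connector; versions are illustrative -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java</artifactId>
    <version>1.17.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>1.17.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>1.17.1</version>
</dependency>
```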
Streaming From One Topic to Another Topic
You need to create two more classes (LineSplitter and CustomSerializationSchema) for the word count example.
LineSplitter
This class will be a custom implementation of the FlatMapFunction interface from the Apache Flink client library. It takes a sentence, splits it into words, and emits a two-dimensional Tuple for each word in the format (<word>, 1).
Create the LineSplitter class as follows.
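A minimal sketch, following the classic Flink word count pattern:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {

    @Override
    public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) {
        // Split the sentence on whitespace and emit a (word, 1) tuple per word.
        for (String word : sentence.split(" ")) {
            out.collect(new Tuple2<>(word, 1));
        }
    }
}
```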
CustomSerializationSchema
This class will be a custom implementation of the KafkaRecordSerializationSchema interface from the Apache Flink Kafka connector library. It provides a schema for serializing the two-dimensional Tuple output of the word counting step into the Kafka record format.
Create the CustomSerializationSchema class as follows:
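One possible sketch, which keys each record by the word and writes the count as the value; the constructor taking the topic name is an assumption of this sketch, not something fixed by the connector:

```java
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CustomSerializationSchema
        implements KafkaRecordSerializationSchema<Tuple2<String, Integer>> {

    private final String topic;

    public CustomSerializationSchema(String topic) {
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(
            Tuple2<String, Integer> element, KafkaSinkContext context, Long timestamp) {
        // Use the word as the record key and the count as the record value.
        return new ProducerRecord<>(
                topic,
                element.f0.getBytes(StandardCharsets.UTF_8),
                element.f1.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```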
Integration
Import the following packages first:
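Assuming the pipeline sketched in the following steps, the imports would look roughly like this:

```java
import java.util.Properties;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
```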
Define the names of the topics you are going to work on:
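For example, with the topic names used earlier:

```java
String inputTopic = "input";
String outputTopic = "output";
```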
Create the following properties for the Apache Flink Kafka connector, replacing the UPSTASH-KAFKA-* placeholders with your cluster information.
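Upstash Kafka uses SASL_SSL with SCRAM authentication; a sketch of the properties, with the endpoint, username, and password taken from the Upstash Console:

```java
Properties props = new Properties();
props.put("bootstrap.servers", "UPSTASH-KAFKA-ENDPOINT:9092");
props.put("security.protocol", "SASL_SSL");
props.put("sasl.mechanism", "SCRAM-SHA-256");
props.put("sasl.jaas.config",
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        + "username=\"UPSTASH-KAFKA-USERNAME\" password=\"UPSTASH-KAFKA-PASSWORD\";");
```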
Get the stream execution environment, in which the pipeline will be created and executed.
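A single call on StreamExecutionEnvironment is enough:

```java
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
```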
Create the Kafka consumer.
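With the current Flink Kafka connector the consumer side is a KafkaSource; a sketch reading plain strings from the input topic, starting from the earliest offset:

```java
KafkaSource<String> source = KafkaSource.<String>builder()
        .setProperties(props)
        .setTopics(inputTopic)
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();
```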
Implement the stream processing part, which takes the input sentences from the source and counts words.
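A sketch of the word count pipeline, using the LineSplitter class created above:

```java
DataStream<String> stream = environment.fromSource(
        source, WatermarkStrategy.noWatermarks(), "Kafka Source");

DataStream<Tuple2<String, Integer>> counts = stream
        .flatMap(new LineSplitter())   // emit (word, 1) per word
        .keyBy(tuple -> tuple.f0)      // group by the word
        .sum(1);                       // keep a running sum of field 1
```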
You can see the output by printing the data stream.
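Printing attaches a stdout sink to the stream:

```java
counts.print();
```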
If you produce a message to the input topic from your console, you will see output like this:
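For instance, producing the sentence "hello world hello" would print running counts roughly like the following (Flink prefixes each line with the subtask index, which will vary):

```
3> (hello,1)
7> (world,1)
3> (hello,2)
```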
Next, create a Kafka producer to sink the data stream to the output Kafka topic.
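With the current connector this is a KafkaSink, using the CustomSerializationSchema created earlier (the topic-name constructor argument is the assumption made in that sketch):

```java
KafkaSink<Tuple2<String, Integer>> sink = KafkaSink.<Tuple2<String, Integer>>builder()
        .setKafkaProducerConfig(props)
        .setRecordSerializer(new CustomSerializationSchema(outputTopic))
        .build();

counts.sinkTo(sink);
```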
Finally, execute the stream execution environment that was retrieved, to run the pipeline.
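The job name passed to execute is arbitrary:

```java
environment.execute("Upstash Kafka Flink Integration");
```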