Apache Pinot
This tutorial shows how to integrate Upstash Kafka with Apache Pinot.
Apache Pinot is a real-time distributed OLAP (Online Analytical Processing) data store designed to run OLAP queries with low latency. It can ingest data from batch sources or from streaming sources such as Upstash Kafka.
Upstash Kafka Setup
Create a Kafka cluster using the Upstash Console or the Upstash CLI by following Getting Started.
Create one topic by following the creating topic steps. This topic will be the source for the Apache Pinot table. Let’s name it “transcript” for this tutorial.
Apache Pinot Setup
You need a host to run Apache Pinot. For this quick setup, you can run it on your local machine.
First, install Docker. Running Apache Pinot in a Docker container is more convenient than installing it directly on your machine.
Once you have Docker on your machine, you can follow the steps in Getting Started to run Apache Pinot in Docker.
In short, you will need to pull the Apache Pinot image by running the following command.
Create a file named docker-compose.yml with the following content.
In your terminal, go to the directory that contains docker-compose.yml and run the following command to start Pinot.
Apache Pinot should now be up and running. You can check it by running:
You should see output like this:
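As an additional check, the Pinot controller can be probed over HTTP. The following is a minimal Python sketch, not part of the original steps. It assumes the controller listens on http://localhost:9000 (the address of the Pinot console used in this tutorial) and that it exposes a /health endpoint, which may differ between Pinot versions. The function name is_pinot_healthy is hypothetical.

```python
import urllib.error
import urllib.request


def is_pinot_healthy(base_url: str = "http://localhost:9000", timeout: float = 3.0) -> bool:
    """Return True if the controller answers the health request with HTTP 200."""
    # Assumption: the controller serves a health check at /health.
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or HTTP error: treat as "not up yet".
        return False


if __name__ == "__main__":
    print("Pinot controller reachable:", is_pinot_healthy())
```

If this prints False right after startup, the containers may still be initializing, so it can be worth retrying after a short wait.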
Next, add a table to Pinot to store the data streamed from the Kafka topic.
Open http://localhost:9000/ in your browser.
Click on the “Tables” section.
First, click on “Add Schema” and fill in the form until the schema config shows the following JSON.
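To illustrate what a schema config for the “transcript” table can look like, here is a small Python sketch that builds one and prints it as JSON. The field names (studentID, firstName, lastName, gender, subject, score, timestamp) are assumptions modeled on a typical student-transcript data set; they are not taken from this page, so adjust them to the schema you actually define. The function name is hypothetical.

```python
import json


def build_transcript_schema() -> dict:
    """Build an illustrative Pinot schema config for the "transcript" table."""
    return {
        "schemaName": "transcript",
        # Dimension columns: attributes used for filtering and grouping.
        # All field names below are assumptions for this sketch.
        "dimensionFieldSpecs": [
            {"name": "studentID", "dataType": "INT"},
            {"name": "firstName", "dataType": "STRING"},
            {"name": "lastName", "dataType": "STRING"},
            {"name": "gender", "dataType": "STRING"},
            {"name": "subject", "dataType": "STRING"},
        ],
        # Metric columns: numeric values that get aggregated.
        "metricFieldSpecs": [{"name": "score", "dataType": "FLOAT"}],
        # Time column: epoch milliseconds of the event.
        "dateTimeFieldSpecs": [
            {
                "name": "timestamp",
                "dataType": "LONG",
                "format": "1:MILLISECONDS:EPOCH",
                "granularity": "1:MILLISECONDS",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_transcript_schema(), indent=2))
```

The printed JSON can be compared against what the “Add Schema” form shows before saving.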
Click save, then click “Add Realtime Table”, since the data will be streamed in real time.
On this page, the table name must match the schema name, which is “transcript” in this case.
Then scroll down the page and replace the “segmentsConfig” and “tableIndexConfig” sections of the table config with the following JSON. Do not forget to replace the UPSTASH-KAFKA-* placeholders with your cluster information.
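As a rough sketch of how these two sections fit together, the Python snippet below assembles them from the cluster details. Several things here are assumptions rather than content from this page: the stream config keys and plugin class names follow the naming used by Pinot's Kafka 2.x plugin and may differ in your Pinot version, the cluster is assumed to use SASL_SSL with SCRAM-SHA-256, and the time column is assumed to be called timestamp. The function name and the placeholder values in the usage section are hypothetical.

```python
import json


def build_realtime_sections(endpoint: str, username: str, password: str) -> dict:
    """Build illustrative "segmentsConfig" and "tableIndexConfig" sections."""
    # JAAS line for SCRAM authentication (assumed mechanism for the cluster).
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "segmentsConfig": {
            "schemaName": "transcript",
            "timeColumnName": "timestamp",  # assumed time column name
            "replication": "1",
            "replicasPerPartition": "1",
        },
        "tableIndexConfig": {
            "loadMode": "MMAP",
            "streamConfigs": {
                "streamType": "kafka",
                "stream.kafka.topic.name": "transcript",
                "stream.kafka.broker.list": endpoint,
                "stream.kafka.consumer.type": "lowlevel",
                "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
                # Plugin class names are version dependent (assumption).
                "stream.kafka.consumer.factory.class.name":
                    "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                "stream.kafka.decoder.class.name":
                    "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
                "security.protocol": "SASL_SSL",
                "sasl.mechanism": "SCRAM-SHA-256",
                "sasl.jaas.config": jaas,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values only; substitute your own cluster information.
    sections = build_realtime_sections(
        "UPSTASH-KAFKA-ENDPOINT:9092", "UPSTASH-KAFKA-USERNAME", "UPSTASH-KAFKA-PASSWORD"
    )
    print(json.dumps(sections, indent=2))
```

Generating the JSON from one function keeps the credentials in a single place, which makes it harder to leave a placeholder unreplaced.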
Test the Setup
Now, let’s produce some events to the Kafka topic. Go to the Upstash Console, click on your cluster, then Topics, and click “transcript”. Select the Messages tab, then click Produce a new message. Send a message in JSON format like the one below:
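A test message can also be generated programmatically and pasted into the console. The Python sketch below builds one JSON message; the field names and sample values are assumptions that match an illustrative student-transcript schema, not values taken from this page, and the function name is hypothetical. The message keys must match the columns of the schema you created.

```python
import json
import time
from typing import Optional


def build_transcript_message(
    student_id: int,
    first_name: str,
    last_name: str,
    gender: str,
    subject: str,
    score: float,
    timestamp_ms: Optional[int] = None,
) -> str:
    """Serialize one transcript event as a JSON string."""
    if timestamp_ms is None:
        # Default to the current time in epoch milliseconds.
        timestamp_ms = int(time.time() * 1000)
    event = {
        "studentID": student_id,
        "firstName": first_name,
        "lastName": last_name,
        "gender": gender,
        "subject": subject,
        "score": score,
        "timestamp": timestamp_ms,
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(build_transcript_message(205, "Natalie", "Jones", "Female", "Maths", 3.8))
```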
Now, go back to the Pinot console in your browser. Navigate to “Query Console” in the left sidebar. When you click on the “transcript” table, you will see the result of the following query automatically.
The query result should look like the following:
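The same kind of query can be issued outside the browser. The Python sketch below is a hedged example that posts SQL to a Pinot broker. It assumes the broker is reachable at http://localhost:8099 and accepts queries at /query/sql, which depends on how the containers expose their ports. The query text and both function names are illustrative assumptions.

```python
import json
import urllib.request


def build_query_request(sql: str, broker_url: str = "http://localhost:8099") -> urllib.request.Request:
    """Prepare a POST request carrying the SQL statement as JSON."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url.rstrip("/") + "/query/sql",  # assumed broker query path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, broker_url: str = "http://localhost:8099") -> dict:
    """Send the query and return the decoded JSON response."""
    request = build_query_request(sql, broker_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a running broker):
# result = run_query("select * from transcript limit 10")
```

Separating request construction from the network call makes it possible to inspect the URL and payload without a running cluster.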