Keynote - Why is Replace Fonts greyed out?
Very short & sweet this post, but Google turned up nothing when I was stuck so hopefully I’ll save someone else some head scratching by sharing this.
As you may already realise, Kafka is not just a fancy message bus, or a pipe for big data. It’s an event streaming platform! If this is news to you, I’ll wait here whilst you read this or watch this…
Streaming data from Kafka to Elasticsearch is easy with Kafka Connect - you can see how in this tutorial and video.
One of the things that sometimes causes issues though is how to get location data correctly indexed into Elasticsearch as geo_point fields to enable all that lovely location analysis. Unlike data types like dates and numerics, Elasticsearch’s Dynamic Field Mapping won’t automagically pick up geo_point data, and so you have to do two things:
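One of those is declaring the geo_point mapping to Elasticsearch yourself, for example with an index template. Here’s a minimal sketch using the index template API - the my_locations* index pattern and the *_GEO field-naming convention are placeholders for illustration rather than anything from the original post:

$ curl -s -XPUT "http://localhost:9200/_index_template/geo_point_template" \
       -H 'Content-Type: application/json' \
       -d '{ "index_patterns": ["my_locations*"],
             "template": {
               "mappings": {
                 "dynamic_templates": [
                   { "geo_fields": {
                       "match"  : "*_GEO",
                       "mapping": { "type": "geo_point" } } }
                 ] } } }'

With that in place, any new index matching my_locations* maps fields ending in _GEO as geo_point; the other half of the job is making sure the data itself arrives in a form Elasticsearch accepts for a geo_point, such as a "lat,lon" string or an object with lat and lon fields.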
There was a good question on StackOverflow recently in which someone was struggling to find the appropriate ksqlDB DDL to model a source topic in which there was a variable number of fields in a STRUCT.
Let’s imagine we have XML data on a queue in IBM MQ, and we want to ingest it into Kafka to then use downstream, perhaps in an application or maybe to stream to a NoSQL store like MongoDB.
We saw in the first post how to hack together an ingestion pipeline for XML into Kafka using a source such as curl piped through xq to wrangle the XML and stream it into Kafka using kafkacat, optionally using ksqlDB to apply and register a schema for it.
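For a flavour of what that hack looks like, here’s a rough sketch - the feed URL, broker address, and topic name are made up for illustration, and it assumes xq and kafkacat are installed:

$ curl -s "https://example.com/some/feed.xml" | \
    xq -c '.' | \
    kafkacat -b localhost:9092 -t xml_data -P

Each line of compact JSON that xq emits becomes one message on the xml_data topic.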
The second one showed the use of any Kafka Connect source connector plus the kafka-connect-transform-xml Single Message Transformation. Now we’re going to take a look at a source connector from the community that can also be used to ingest XML data into Kafka.
We previously looked at the background to getting XML into Kafka, and potentially how [not] to do it. Now let’s look at the proper way to build a streaming ingestion pipeline for XML into Kafka, using Kafka Connect.
If you’re unfamiliar with Kafka Connect, check out this quick intro to Kafka Connect. Kafka Connect’s excellent pluggable architecture means that we can pair any source connector to read XML from wherever we have it (for example, a flat file, or a MQ, or anywhere else), with a Single Message Transform to transform the XML into a payload with a schema, and finally a converter to serialise the data in a form that we would like to use, such as Avro or Protobuf.
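As a hypothetical illustration of how those pieces fit together - the connector class, file paths, topic name, and SMT settings below are assumptions made for the sketch rather than taken from the post - a connector created through the Kafka Connect REST API might look something like this:

$ # placeholder paths, topic, and SMT config - adjust for your own environment
$ curl -s -XPUT "http://localhost:8083/connectors/source_xml_file/config" \
       -H 'Content-Type: application/json' \
       -d '{ "connector.class"                     : "org.apache.kafka.connect.file.FileStreamSourceConnector",
             "file"                                : "/data/example.xml",
             "topic"                               : "xml_source",
             "transforms"                          : "xml",
             "transforms.xml.type"                 : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
             "transforms.xml.schema.path"          : "file:///data/example.xsd",
             "value.converter"                     : "io.confluent.connect.avro.AvroConverter",
             "value.converter.schema.registry.url" : "http://schema-registry:8081" }'

The SMT turns the raw XML string into a structured payload with a schema, and the converter then serialises that payload as Avro; swap the converter if you’d rather use Protobuf or JSON Schema.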
What would a blog post on rmoff.net be if it didn’t include the dirty hack option? 😁
The secret to dirty hacks is that they are often rather effective and when needs must, they can suffice. If you’re prototyping and need to JFDI, a dirty hack is just fine. If you’re looking for code to run in Production, then a dirty hack probably is not fine.
abcde - Error trying to calculate disc ids without lead-out information
Short & sweet to help out future Googlers. Trying to use abcde I got the error:
[WARNING] something went wrong while querying the CD... Maybe a DATA CD or the CD is not loaded?
[WARNING] Error trying to calculate disc ids without lead-out information.
I was running IBM MQ in a Docker container, and the client connecting to it was throwing repeated Channel was blocked errors.
One of my favourite hacks for getting data into Kafka is using kafkacat and stdin, often from jq. You can see this in action with Wi-Fi data, IoT data, and data from a REST endpoint. This is fine for getting values into a Kafka message - but Kafka messages are key/value, and being able to specify a key can often be important.
Here’s a way to do that, using a separator and some jq magic. Note that at the moment kafkacat only supports single-byte separator characters, so you need to choose carefully. If you pick a separator that also appears in your data, it’s possibly going to have unintended consequences.
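Here’s a rough sketch of the idea - the broker address, topic name, sample payload, and the choice of | as the separator are all just for illustration:

$ echo '{"sensor":"sensor-1","temperature":22.5}' | \
    jq -r '.sensor + "|" + tostring' | \
    kafkacat -b localhost:9092 -t sensor_readings -P -K'|'

kafkacat splits each line at the delimiter given with -K, so everything before the | becomes the message key and everything after it becomes the value.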
Readers of a certain age and RDBMS background will probably remember northwind, or HR, or OE databases - or quite possibly not just remember them but still be using them. Hardcoded sample data is fine, and it’s great for repeatable tutorials and examples - but it’s boring as heck if you want to build an example with something that isn’t using the same data set for the 100th time.
Here’s a collection of Kafka-related talks, just for you.
Each one has 🍿🎥 a recording, 📔 slides, and 👾 code to go and try out.
Kafka Connect is the integration API for Apache Kafka. Check out this video for an overview of what Kafka Connect enables you to do, and how to do it.
There’s ways, and then there’s ways, to count the number of records/events/messages in a Kafka topic. Most of them are potentially inaccurate, or inefficient, or both. Here’s one that falls into the potentially inefficient category: using kafkacat to read all the messages and pipe them to wc, which with the -l flag will tell you how many lines there are - and since each message is a line, how many messages you have in the Kafka topic:
$ kafkacat -b broker:29092 -t mytestopic -C -e -q | wc -l
3
Google Chrome automagically adds sites that you visit which support searching to a list of custom search engines. For each one you can set a keyword which activates it, so based on the above list, if I want to search Amazon I can just type a <tab> and then my search term.
I had the pleasure of presenting at DataEngBytes recently, and am delighted to share with you the 🗒️ slides, 👾 code, and 🎥 recording of my ✨brand new talk✨: