

Logstash for metric ingestion - considering custom plug-ins

15 Aug 2014

In my previous post, I mentioned that we'd adopted Logstash for ETL'ing data from performance metric sources to our analytics, and gave some of our motivations. One of the initial obstacles to overcome was that we needed two key bits of functionality that were not available out-of-the-box in Logstash: data pivoting, and production of custom CSV files (where both the syntax and the semantics of the CSVs are critical).

Data Pivoting

Pivoting along the lines shown here: Datastage Pivot. Our analytics expects data in what we affectionately call 'wide' format. This is primarily due to where we started from, with a focus on some IBM data sources, e.g. Tivoli Data Warehouse. However, a large and increasing proportion of data sources arrive in 'skinny' format, so, in the short term, we needed some basic pivoting capability in our Logstash-based mediation; a sketch of the transformation follows.
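To make the skinny-to-wide idea concrete, here is a minimal Python sketch of the transformation (outside Logstash, not our actual plug-in code). The column names and grouping keys are hypothetical, but the shape of the operation is what our pivot plug-in needs to do:

```python
from collections import defaultdict

# Skinny format: one row per (timestamp, resource, metric); the metric
# name and its value live in dedicated columns. Sample data is made up.
skinny_rows = [
    {"timestamp": "2014-08-15T10:00:00", "resource": "host1", "metric": "cpu_pct", "value": 12.5},
    {"timestamp": "2014-08-15T10:00:00", "resource": "host1", "metric": "mem_pct", "value": 63.0},
    {"timestamp": "2014-08-15T10:00:00", "resource": "host2", "metric": "cpu_pct", "value": 48.1},
]

def pivot(rows, keys=("timestamp", "resource")):
    """Group rows by the key columns and fold each metric into its own column."""
    wide = defaultdict(dict)
    for row in rows:
        group = tuple(row[k] for k in keys)
        wide[group][row["metric"]] = row["value"]
    # Wide format: one row per key combination, one column per metric name.
    return [dict(zip(keys, group), **metrics) for group, metrics in wide.items()]

for row in pivot(skinny_rows):
    print(row)
# {'timestamp': '2014-08-15T10:00:00', 'resource': 'host1', 'cpu_pct': 12.5, 'mem_pct': 63.0}
# {'timestamp': '2014-08-15T10:00:00', 'resource': 'host2', 'cpu_pct': 48.1}
```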

Custom CSV file production

To match the expectations of our analytics' non-DB input, we need the data flowing out of Logstash to be placed in files with a header and a particular naming convention (involving the start and end time of the data). The timestamps here are data time, not real-time processing time: we very often deal with backlog/old data, and we place particular emphasis on 'data time'. A sketch of this is shown below.
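As a rough illustration (again, not our actual plug-in), this Python sketch writes a batch of rows to a headered CSV whose name is derived from the earliest and latest data timestamps in the batch; the `metrics_<start>_<end>.csv` filename pattern is a hypothetical stand-in for our real convention:

```python
import csv

def write_batch(rows, columns, out_dir="."):
    """Write rows to a headered CSV named after the data-time window it covers."""
    # Use the timestamps carried in the data itself, not the wall-clock time
    # at which we happen to be processing (we often replay backlog data).
    start = min(r["timestamp"] for r in rows)
    end = max(r["timestamp"] for r in rows)
    path = "{}/metrics_{}_{}.csv".format(out_dir, start, end)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()  # the analytics input requires a header row
        writer.writerows(rows)
    return path
```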

In both cases, the obvious thing to me, as a (not yet) Logstash person, was to create a custom plug-in for each. We'll make these plug-ins available to our field community as soon as they are ready, and, hopefully, the pivot operator to the wider Logstash/open-source community once we get the legal aspects in order.

Creating both of these will plug two very specific gaps in off-the-shelf Logstash for us. With those in place, Logstash can serve as the basis for mediation solutions for a variety of our data sources.
