In a previous post (see Lots of data in the data center) I gave a few examples of volumes of IT Operations data from typical large data centers. Given these volumes and the need for real-time assessment of that data, some form of 'streaming analytics' is essential.
IT Operations data is a perfect subject for streaming analytics technologies. The data naturally flows from its sources to the point of assessment. Its value diminishes rapidly with the passage of time, so this data-in-motion must be analyzed while it is still in motion! Real-time analysis is key, and streaming analytics is the general technology that makes this analysis possible.
You can roll your own streaming-analytics infrastructure from the ground up, as I did more than a decade ago, when there weren't many (any?) platforms available. It was a tremendously fun exercise from which I learned a lot. However, I don't want to do it again, and you probably don't want to either at this stage, especially if you want to focus on solving ITOA problems. Now we're somewhat spoiled for choice, with plenty of commercial and open-source offerings out there (e.g., InfoSphere Streams, Storm). Take your pick. That gets you the basic platform but, just like a database platform, it doesn't do much on its own. It's the vertical application you create to run on top of the streaming analytics infrastructure that makes the difference.
In ITOA there are plenty of opportunities to create and apply new algorithms, particularly those that are inherently streaming. We have the data, lots of it; what we need are algorithms. So, if you are a data scientist with expertise in streaming analytics algorithms who hasn't previously looked at IT Operations data, I encourage you to take a look. You may well discover a rich area of applicability for your techniques.
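To make "inherently streaming" a little more concrete, here is a minimal sketch in Python. It is only an illustration, not taken from any particular platform: the names `RunningStats` and `flag_anomalies`, the threshold of 3 standard deviations, and the warm-up of 5 samples are all assumptions chosen for the example. It uses Welford's online method to maintain a running mean and variance, so each metric value is examined once, as it arrives, and nothing is stored beyond three numbers.

```python
import math


class RunningStats:
    """Running mean and variance via Welford's online update (O(1) memory)."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        # Sample standard deviation; defined as 0.0 until two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_anomalies(stream, threshold=3.0, warmup=5):
    """Yield values that deviate from the running mean by more than
    `threshold` standard deviations. Both parameters are illustrative."""
    stats = RunningStats()
    for x in stream:
        sd = stats.stddev()
        # Compare against statistics of the values seen *before* x.
        if stats.n >= warmup and sd > 0 and abs(x - stats.mean) > threshold * sd:
            yield x
        stats.update(x)


# Hypothetical response-time samples arriving one at a time.
samples = [10, 11, 9, 10, 12, 10, 11, 95, 10]
print(list(flag_anomalies(samples)))
```

A real ITOA algorithm would need much more than this (seasonality, drifting baselines, and the fact that a flagged outlier here still gets folded into the running statistics), but the shape is the point: state is updated incrementally and a decision is made per event, without ever going back over stored data.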