Thoughts on Data Science, IT Operations Analytics, programming and other random topics


  • 14 Mar 2016

    Extracting data from Wily Introscope using Logstash

    Extracting data, either on a once-off or ongoing basis, is a sometimes tedious, unglamorous activity. It is a necessary part of our end-to-end analytics processing. No data, no analytics. I've just finished some development work to support a few customer trials, extracting performance data from Wily's Introscope performance management system for use with our Predictive Analytics product. We'd started to see some increased demand for integration with Introscope, and so I decided to make this latest client engagement an opportunity to create a re-usable integration. In the same way as we'd done with some other integrations, I used Logstash as the platform.

    read more ...


  • 30 Sep 2015

    Why Analytics are Critical to IT Operations

    (first published on IBM Service Management 360 site)

    read more ...


  • 18 Aug 2015

    Unlock the Power of your IT data to Drive Innovation

    (first published on IBM Service Management 360 site)

    read more ...


  • 17 Jul 2015

    Tell Me Something I Don't Know About My Environment

    (first published on IBM Service Management 360 site)

    read more ...


  • 15 Jun 2015

    Reduce the Burden and Cost of Threshold Management

    (first published on IBM Service Management 360 site)

    read more ...


  • 13 Apr 2015

    Five whys of predictive analytics

    (first published on IBM Service Management 360 site)

    read more ...


  • 08 Apr 2015

    Building confidence in your predictive anomaly detection system

    (first published on IBM Service Management 360 site)

    read more ...


  • 15 Mar 2015

    Example use of partially applied functions in Scala

    A colleague asked me the other day about 'partially applied functions' in Scala. He knew basically what they were but was struggling to understand where they might be used. There are many great pages out there that describe this concept in detail, and yet I remember that when I was first trying to get my head around them, I'd seen a number of similar comments on the various sites, i.e. 'I get what they are, but why would I ever use them?'. I just happened to have a handy example of their use which I was working on, which I'll share.

    read more ...


  • 23 Dec 2014

    Using Predictive Insights to Maximize Your Chances of Success

    (first published on IBM Service Management 360 site)

    read more ...


  • 16 Dec 2014

    Combine perspectives to extract more value from your IT data

    (first published on IBM Service Management 360 site)

    read more ...


  • 16 Nov 2014

    A logstash delta filter

    I mentioned in a previous post (Logstash for metric ingestion - considering custom plugins) that we were going to start using logstash for metric extraction and preparation in some cases. I determined in short order that we'd need to create a few custom plugins for our own uses. Here I'll show you my very first one, written more as a learning exercise for myself than as something we actually need.
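
    As a rough illustration of the idea only: the sketch below is in Python, not the Ruby plugin from the post, and it assumes that a 'delta' filter replaces a running value with its change since the previous event from the same source. The function name delta_filter and the field names host and value are hypothetical.

```python
def delta_filter(events, key_field="host", value_field="value"):
    """Yield each event with an added 'delta' field: the change in
    value_field since the previous event that had the same key_field.

    The first event seen for a key has no predecessor, so it is
    remembered but not emitted.
    """
    last_seen = {}  # key -> most recent value for that key
    for event in events:
        key = event[key_field]
        value = event[value_field]
        if key in last_seen:
            yield {**event, "delta": value - last_seen[key]}
        last_seen[key] = value


# Hypothetical counter readings from two hosts.
sample_events = [
    {"host": "web01", "value": 100},
    {"host": "web02", "value": 40},
    {"host": "web01", "value": 130},
    {"host": "web02", "value": 45},
]

for event in delta_filter(sample_events):
    print(event)
```

    Dropping the first event per source is one possible design choice; emitting it with a delta of zero would be another.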

    read more ...


  • 27 Oct 2014

    Divide and conquer your IT environment with partitioning strategies

    (first published on IBM Service Management 360 site)

    read more ...


  • 27 Oct 2014

    Some of my other blog postings

    (Since originally posting this, the IBM Service Management 360 site has been retired; the postings can now be found here on this site under sm360)

    read more ...


  • 22 Oct 2014

    Exploring correlations in random timeseries with R cont.

    In my last post, Exploring correlations in random timeseries with R, I started to show how we could use R to explore a simple question: 'How many correlations would we expect between randomly produced timeseries?'. The emphasis is on applying code to the problem, rather than sweating the stats theory behind the problem.
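
    To give a flavour of that kind of experiment, here is a minimal sketch in Python (not the R code from the posts). It generates independent Gaussian series and counts how many pairs appear strongly correlated by chance. The series count, series length, threshold of 0.5 and the function names are all assumptions chosen for illustration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_strong_pairs(num_series=50, length=30, threshold=0.5, seed=42):
    """Generate independent random series and count the pairs whose
    absolute correlation is at least `threshold`.

    Returns (number of strong pairs, total number of pairs).
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0.0, 1.0) for _ in range(length)]
        for _ in range(num_series)
    ]
    pairs = list(combinations(range(num_series), 2))
    strong = sum(
        1 for i, j in pairs
        if abs(pearson(series[i], series[j])) >= threshold
    )
    return strong, len(pairs)


strong, total = count_strong_pairs()
print(f"{strong} of {total} pairs have |r| >= 0.5")
```

    Varying the series length and the threshold shows how quickly the number of chance correlations changes.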

    read more ...


  • 20 Oct 2014

    Exploring correlations in random timeseries with R

    I'd recently been thinking about aspects of 'the curse of big data' as they apply to processing timeseries data from IT operations systems. These timeseries represent things such as CPU utilization, disk I/O rates and webserver response times. Given the volumes of data we deal with, this 'curse' often pops up, or at least the threat of it, in discussions. So we were having one of those back-to-basics discussions, related to simple correlations between such timeseries, with a number of questions around how many strong correlations we were likely to see, and when we could be confident that the correlations were valid (by some of our subjective definitions).

    read more ...


  • 18 Oct 2014

    Programming around statistics

    I just watched this

    read more ...


  • 02 Oct 2014

    Are you ready for IBM SmartCloud Analytics - Predictive Insights?

    (first published on IBM Service Management 360 site)

    read more ...


  • 20 Aug 2014

    Enforced Ruby experiences

    Of course I've been aware of Ruby for many years. I think I probably installed it years ago for a bit of a play. I never did take it up, not because of any fundamental dislike of the language; rather, I was already a die-hard Smalltalker, and my thought was always: why bother? It's Smalltalk-lite, with different syntax, and maybe a bigger user base who don't know what they are missing!

    read more ...


  • 15 Aug 2014

    Logstash for metric ingestion - considering custom plug-ins

    In my previous post, I mentioned that we'd adopted Logstash for ETL'ing data from performance metric sources to our analytics, and gave some of our motivations. One of the initial obstacles to overcome was that we needed two key bits of functionality that were not available out-of-the-box in Logstash. These were Data Pivoting and Custom CSV format file production (where both the syntax and the semantics of the CSVs are critical).
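
    To make 'pivoting' concrete, here is a small sketch in Python (not Logstash or Ruby plugin code). It assumes pivoting means turning one-record-per-metric data into one CSV row per timestamp and resource, with a column per metric. The record layout, field names and the functions pivot and to_csv are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical "long" records: one metric value per record.
records = [
    {"timestamp": "2014-08-15T10:00:00Z", "host": "web01", "metric": "cpu", "value": 41.5},
    {"timestamp": "2014-08-15T10:00:00Z", "host": "web01", "metric": "mem", "value": 72.0},
    {"timestamp": "2014-08-15T10:00:00Z", "host": "web02", "metric": "cpu", "value": 12.25},
]


def pivot(records):
    """Group records by (timestamp, host), collecting metrics into one dict each."""
    rows = defaultdict(dict)
    for rec in records:
        rows[(rec["timestamp"], rec["host"])][rec["metric"]] = rec["value"]
    return dict(rows)


def to_csv(rows, metrics):
    """Render pivoted rows as CSV with a fixed column order.

    Metrics missing for a given row are written as empty fields.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "host"] + metrics)
    for (timestamp, host), values in sorted(rows.items()):
        writer.writerow([timestamp, host] + [values.get(m, "") for m in metrics])
    return out.getvalue()


print(to_csv(pivot(records), ["cpu", "mem"]))
```

    Fixing the column order up front is one way to keep the CSV layout stable when some metrics are missing from a given interval.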

    read more ...


  • 25 Jul 2014

    We adopt Logstash for ETL of metric data

    In the course of my daily analytics job, I regularly need to extract and prepare data from a variety of sources. Roughly speaking, I need to Extract, Transform and Load (ETL) data from source to analytics destinations, a major one at this time being Predictive Insights. Generally, this ETL activity is a continuous, near-realtime process, though there are plenty of occasions where it is a once-off activity.

    read more ...


  • 11 Jun 2014

    Shift to the Left with Predictive Insights and Log Analysis

    (first published on IBM Service Management 360 site)

    read more ...


  • 09 Apr 2014

    Anomalies, alarms and actionability

    (first published on IBM Service Management 360 site)

    read more ...


  • 03 Apr 2014

    First Twitter chat

    I took part in my first Twitter chat today. Fast and furious and great fun is my basic feeling. I was vaguely aware of such things, and a friend mentioned to me last night how he found them inspirational and a great way to meet new people with a well-developed interest in the area under discussion. So I was already eager and, somewhere in the back of my mind, waiting for an opportunity to participate.

    read more ...


  • 23 Mar 2014

    Back to Smalltalk and on to Scala

    Years ago, in college, as I was daydreaming in front of an actual 'terminal', one of my professors walked in and dropped a Digitalk Smalltalk book on my desk. He said, read this, it is very cool and it'll be big one day. He was right! I ripped through that book, eagerly learning all kinds of cool concepts: objects, message passing, hierarchies. Most of my college programming was carried out in Smalltalk, with some Lisp and Prolog for good measure. All this despite being a Mechanical Engineering major. Of course, sometimes I'd do the obligatory Fortran for some number crunching project.

    read more ...


  • 20 Feb 2014

    In BigData-land, algorithms are king

    In the realm of Big Data, we have many platforms available now, plenty of functional overlap, and each presumably with its own particular strengths. Gartner for example recently released this Top 16 list of Big Data platforms. Another list from ProfitBricks is Top 45 Big Data Tools, Amazon Kinesis. My own weapons of choice these days are a combo of Scala/Akka/Spark and R. We have lots of platforms and tooling available.

    read more ...


  • 24 Dec 2013

    Kaggle and IT Operations Analytics - no competition yet

    I first discovered Kaggle about a year ago when listening to TWiT.TV's wonderful Triangulation. I was immediately struck by the simplicity of the idea. Combining data science/analytics with a crowdsourced competitive element seemed like a winner to me, and it seems to have taken off. There is now a vibrant community of data science practitioners developing around it. If you haven't seen it yet, I encourage you to take a look.

    read more ...


  • 15 Dec 2013

    (ITOA) Data Science Venn Diagram

    A few days ago I stumbled upon this image (from The Data Science Venn Diagram)

    read more ...


  • 15 Dec 2013

    Planning for a podcast series focused on ITOA

    My colleague Doug, over at dougmcclure.net, and I had been throwing around the idea of starting a podcast focused on IT Operations Analytics. We think the time is right, with lots happening in this space at the moment and things moving very quickly indeed. So we've set ourselves the objective of getting things going in early 2014. We're in the planning stages, working out a format and some early agendas.

    read more ...


  • 14 Dec 2013

    We need more 'streaming' analytics

    In a previous post (see Lots of data in the data center) I gave a few examples of volumes of IT Operations data from typical large data centers. Given these volumes and the need for real-time assessment of that data, some form of 'streaming analytics' is essential.

    read more ...


  • 07 Dec 2013

    Lots of data in the data center

    Sometimes friends who are not techies, or even those who are but not in the same area of work, ask me what I do. I sometimes simply say 'I build stuff that monitors and analyzes the performance of IT environments'. I'll go on to say that we're not talking about a handful of computers or servers in your local small business, we're talking about lots of equipment and huge networks. In an attempt to relate it to something they might know, I'll refer to an example of some large financial institution, or some multinational that would have an environment of the scale we're dealing with. Then they sometimes get a sense of it. Even for us in the business, it's worth pausing occasionally and getting an updated reality check on the volumes of data we are talking about in the typical large IT Operations center. Understanding this is key, since simply being able to handle the volumes comes before analysing the volumes.

    read more ...


  • 01 Dec 2013

    Loving R

    About three years ago I started using R (see The R Project for Statistical Computing). I won't lie and say it was trivial to get going, but coming to it from a die-hard developer perspective, with many languages and programming paradigms under my belt, some of which I dare say I'm quite skilled with, it didn't take me long at all to get productive.

    read more ...


  • 11 Nov 2013

    IT Operations - hot again

    A recent study highlighted a number of interesting trends in analytics. It confirmed what I already instinctively knew, i.e. that a majority of the Big Data Analytics focus to date has been on those areas directly related to customer experience and management. You know, the kind of analytics which helps answer questions such as 'what products does my customer most like', 'what/who is influencing my customer', 'which customers are likely to leave me and go to a competitor any time soon'. Sometimes, this category of analytics is referred to as 'beer and diapers' analytics (see the reference here) and while the original story turns out to not be quite true, I like it as a convenient label for the kind of analytics where the focus is on data mining to understand customer behaviour, more often called 'Customer Sentiment Analysis'. What I found most interesting in the report was that after Customer Sentiment Analysis, the next largest category was analytics related to improving the operations of the underlying infrastructures supporting the business. This lines up nicely with my own area of specialization, i.e. IT Operations Analytics (ITOA), and I'm pretty pumped to see this area rising in importance.

    read more ...


  • 02 Nov 2013

    Welcome!

    I welcome you to this new blog and am excited to get it going. Before I get blogging here in earnest, let me give you some sense of where I'm coming from. Over the last few years I've spent much of my professional time bouncing between hardcore code-slinging and crunching domain-specific data related to products I've been developing. My primary focus has been in the area generally known as IT Operations, and in recent years, more specifically in IT Operations Analytics (ITOA), which is basically applying Analytics techniques to the operation of large IT systems and the related infrastructure. I've been involved in this space since the early '90s, though with my academic background in Mechanical Engineering, I guess in hindsight, I've been deep in 'analytics' of one sort or another for many years.

    read more ...