

Divide and conquer your IT environment with partitioning strategies

27 Oct 2014

(first published on IBM Service Management 360 site)

A typical IT environment produces a considerable volume of measurement and monitoring data in a variety of forms: events, logs, performance metrics and more. Applying analytics to this data to glean insight is a critical and challenging task, whether you’re working in real-time or offline modes. There are several reasons for the difficulty, the sheer volume of data being perhaps the most obvious one; the richness and complexity of the data, and the relationships within it, matter just as much. See my earlier post, Lots of data in the data center, for more of a sense of the volume and variety of data out there.

Bottom line: the scope of the problem can quickly become overwhelming, and there is a need to focus on a subset to make progress. This is where partitioning strategies come in.

The current crop of IT domain analytics tools generally has a particular area of applicability, and some recent products are evolving to combine different kinds of data for deeper insights. IBM SmartCloud Analytics – Predictive Insights, for instance, deals primarily with time series data: regularly sampled performance measurements such as cpuUtilization for server1 or userResponseTime for webPageX. Even a modest modern IT environment has a huge number of such time series to be collected, processed and stored by the deployed Performance Management (PM) systems. Millions of time series, each with data points arriving at intervals typically in the 1-to-15-minute range, quickly add up to a lot of data to analyze in real time.
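To put rough numbers on that, here’s a quick back-of-envelope calculation in Python. Both figures (one million series, five-minute sampling) are illustrative assumptions, not measurements from any particular deployment:

    # Back-of-envelope scale estimate; both figures below are
    # illustrative assumptions, not numbers from a real deployment.
    num_series = 1_000_000   # assumed count of monitored time series
    interval_minutes = 5     # assumed sampling interval

    points_per_series_per_day = 24 * 60 // interval_minutes
    total_points_per_day = num_series * points_per_series_per_day

    print(f"{points_per_series_per_day} points per series per day")   # 288
    print(f"{total_points_per_day:,} data points per day in total")   # 288,000,000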

SmartCloud Analytics – Predictive Insights and tools with similar analytics capabilities can determine, among other things, relationships between time series. They also generate forecasts about how the time series (and the systems being monitored) are expected to behave. The essential point here is that when dealing with this kind of analytics, we are dealing collectively with sets or groups of time series. The simplest approach is to lump all the time series together and let the analytics engines do their thing. Often this is desirable from a management, or even sheer convenience, point of view. I regularly use the melting pot metaphor. Let’s just throw the data in there, mix it up and see what comes out!
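Predictive Insights’ algorithms are, of course, more sophisticated than anything I’ll reproduce in a blog post, but as a rough illustration of what “finding relationships between time series” means in practice, here is a minimal sketch using plain Pearson correlation. The metric names, synthetic data and threshold are all invented for the example:

    import numpy as np
    import pandas as pd

    # Synthetic sample: three metrics sampled every 5 minutes for a day.
    # The metric names and data are invented for illustration only.
    rng = np.random.default_rng(42)
    idx = pd.date_range("2014-10-27", periods=288, freq="5min")
    load = rng.normal(50, 5, len(idx)).cumsum() / 10
    metrics = pd.DataFrame({
        "server1.cpuUtilization": load + rng.normal(0, 1, len(idx)),
        "webPageX.userResponseTime": 0.5 * load + rng.normal(0, 1, len(idx)),
        "server2.diskIO": rng.normal(30, 3, len(idx)),
    }, index=idx)

    # Pairwise Pearson correlation; pairs above an arbitrary threshold
    # are flagged as potentially related series.
    corr = metrics.corr()
    threshold = 0.8
    for a in corr.columns:
        for b in corr.columns:
            if a < b and abs(corr.loc[a, b]) > threshold:
                print(f"{a} <-> {b}: r = {corr.loc[a, b]:.2f}")

In this toy example the CPU and response-time series share an underlying load pattern, so they surface as related, while the independent disk series does not.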

However, this melting pot approach often breaks down in the real world. There are reasons why we might want to—or be forced to—partition the sets of time series and analyze them separately.

The two major reasons I usually encounter are:

  • Data volume/velocity
  • Environmental and organizational structure

Partitioning approaches

When faced with the need to partition our data for a Predictive Insights-style deployment, how might we go about it? Unfortunately, there’s no formulaic answer here, just approaches and factors to consider!

There are two main approaches to partitioning the data in an IT environment:

  • Organization
  • Technology

Partitioning by organization is conceptually obvious enough. A group deploying an instance of an analytics tool may decide to keep its data separate and run it through its local instance of that tool. However, things are getting more connected by the day, and the old silo walls are being broken down. Even if a group has nominal ownership of a set of IT components, those components and functions don’t exist in isolation; they are connected to other parts of the IT environment. Part of the value of these analytics is to help surface relationships and show what, in the overall context, is affecting what. So a simplistic separation by organization will work at some level, but it will also miss important insights. It’s a good first step, but keep an open mind about including other data in due course.

Partitioning by technology, such as focusing on all the application servers or on individual business applications, can work too. Again, the tradeoff is that the behavior of one technology tier is affected by the infrastructure it runs on, and potentially by other applications. So when we partition like this and analyze the data separately, we again lose the relationship insights across the partitions.
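Mechanically, either style of partitioning amounts to grouping metric streams by some attribute before feeding them to the analytics. Here is a minimal sketch, assuming a hypothetical metric catalog where each series carries made-up org and app tags:

    from collections import defaultdict

    # Hypothetical metric catalog; the tag names and values are invented
    # for illustration. Each entry is (metric_name, tags).
    catalog = [
        ("server1.cpuUtilization",    {"org": "infra-team", "app": "billing"}),
        ("db1.queryLatency",          {"org": "dba-team",   "app": "billing"}),
        ("webPageX.userResponseTime", {"org": "web-team",   "app": "storefront"}),
    ]

    def partition_by(catalog, tag):
        """Group metric names by a tag value, e.g. "org" or "app"."""
        partitions = defaultdict(list)
        for name, tags in catalog:
            partitions[tags[tag]].append(name)
        return dict(partitions)

    by_org = partition_by(catalog, "org")  # organizational partitioning
    by_app = partition_by(catalog, "app")  # technology/application partitioning
    print(by_app)  # {'billing': [...], 'storefront': [...]}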

Mitigating the loss of relationship insight

We have found a useful way to mitigate the loss of relationship insight across partitions: identify the sets of metrics that are clearly common to them. Many applications might share a common storage infrastructure, for example. So while we might be forced to partition by application, it may still be practical to include the performance data from that common storage in every partition. Then, when the analysis happens, the relationship and effect of that common infrastructure can be seen across all applications.

We do something similar with the metrics that directly measure customer experience, such as web response time. As in the common storage case, all the key customer experience metrics are fed into each partition. In this way, the direct effect of the data within each partition on those key metrics can be seen.
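Putting the two ideas together, the recipe is simple: build the partitions, then append the shared infrastructure and customer-experience series to every one of them. A small sketch, again with invented metric names:

    # Hypothetical per-application partitions, e.g. from the sketch above.
    partitions = {
        "billing":    ["server1.cpuUtilization", "db1.queryLatency"],
        "storefront": ["webPageX.userResponseTime"],
    }

    # Shared series duplicated into every partition: common storage
    # infrastructure plus a key customer-experience metric. Names invented.
    shared = ["san1.storageLatency", "webPageX.userResponseTime"]

    feeds = {
        name: members + [m for m in shared if m not in members]
        for name, members in partitions.items()
    }
    # Every partition now also carries the shared series, so the analytics
    # in each partition can relate its local metrics to the common
    # infrastructure and to customer experience, at the cost of
    # analyzing the shared series more than once.

The cost of this duplication is that the shared series get processed once per partition, but in my experience that overhead is small compared to the cross-partition insight it buys back.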

Partitioning is unavoidable. The trick is to partition where necessary but at the same time keep common data in all the relevant partitions so you maximize the insight available across the environment.

I’d be interested in hearing about your own partitioning motivations, challenges and thoughts, either here or on Twitter @rmckeown.

If you are at IBM Insight, stop by pedestal 420 to learn more about IBM SmartCloud Analytics – Predictive Insights, or watch the brief overview video.
