Thoughts on Data Science, IT Operations Analytics, programming and other random topics


In BigData-land, algorithms are king

20 Feb 2014

In the realm of Big Data, we now have many platforms available, with plenty of functional overlap, and each presumably with its own particular strengths. Gartner, for example, recently released this Top 16 list of Big Data platforms. Another list, from ProfitBricks, is Top 45 Big Data Tools. There is also Amazon Kinesis. My own weapons of choice these days are a combo of Scala/Akka/Spark and R. We have lots of platforms and tooling available.

We are spoilt for choice, with basic capabilities available for routing and orchestrating data, scaling horizontally, and the other fundamental building blocks necessary to build batch and streaming analytics applications. You make your choice based upon your personal preferences or organizational constraints and off you go. I've spent a good bit of my career developing such analytic platforms. This has always been a means to build actual products that solve real problems. It's great to now see a minimum level of infrastructure and capability finally available as a starting point. We get to start with the problem, and not the problem of building the infrastructure to support the applications needed to solve the problem.

With the same starting capabilities available to us all, how do we differentiate? Given a problem to solve, execution, time-to-market and other business considerations are, of course, all critical. They are a given. If you can't execute as well as the next guy, then you might as well pack up and move on; it's only a matter of time before you are passed. The real differentiation is to be found in the algorithms themselves. What new, more insightful, faster, more efficient algorithms can you bring to the table? That's where you get to demonstrate your leadership and win business.

Developing the algorithms requires a blend of skills, from general data science to domain specifics (e.g. in my own case, IT Operations) and, frequently, software development. The mix is essential to understand what is needed, what is possible, and then to be able to build it. It is not common to find the necessary set of skills in a single individual - usually a team must be assembled to bring the different perspectives together. There are exceptions though, and if you find people with that ideal mix, hold on to them!

As you look to create your big data solutions, keep a keen focus on the essential algorithms which are key to your solution. Ensure that you are well placed in this area. These are your intellectual property, your secret sauce. Avoid being distracted by building infrastructure - it's important, but rapidly becoming commoditized - unless, of course, your core business is building such infrastructure. Regularly ask yourself how your algorithms are differentiating you from the competition. You must have an advantage here!
