Four data analytics pitfalls

In the video embedded below, freelance data scientist Marck Vaisman runs through what he describes as common pitfalls for big data projects, taken from a chapter he wrote for the Bad Data Handbook, an O’Reilly book published last year. I think the list is better described as prerequisites for success, but either way, it’s a good list.

  • Know the data you’ve got – the data structure, the format, where it lives, how it was generated, etc.
  • Be able to get at the data – if production systems need to be tweaked to generate reports, everything will be too slow
  • Have a goal, a question you want to answer (even if you expect the project to wander and be exploratory) – don’t do analysis for analysis’ sake
  • Share knowledge across the organisation

Only the first three apply to most startups, but I see them all the time. Varying combinations of not having a good picture of the data available, difficult access to the data, and unclear goals make the likely benefits from projects less than the costs. Breaking that down a little, it is hard to have clear goals without a good understanding of the data asset, and if getting to the data is hard work then iteration is painful and you face the difficult challenge of having to design the project right first time.
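To make the first item a little more concrete, here is a minimal sketch in Python of a first-pass data profile: for each column in an export, how many values are filled in, how many are missing, and how many distinct values there are. The sample data, the column names and the `profile_rows` helper are all made up for illustration; a real project would point something like this at an actual export.

```python
import csv
import io
from collections import Counter


def profile_rows(rows):
    """Summarise each column: filled count, missing count, distinct values."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # surplus fields beyond the header row
            stats = summary.setdefault(
                column, {"filled": 0, "missing": 0, "values": Counter()}
            )
            text = (value or "").strip()
            if text == "":
                stats["missing"] += 1
            else:
                stats["filled"] += 1
                stats["values"][text] += 1
    return {
        column: {
            "filled": s["filled"],
            "missing": s["missing"],
            "distinct": len(s["values"]),
        }
        for column, s in summary.items()
    }


# Hypothetical sample standing in for a real export.
SAMPLE = """user_id,plan,signup_date
1,free,2013-01-04
2,paid,
3,free,2013-02-11
"""

profile = profile_rows(csv.DictReader(io.StringIO(SAMPLE)))
for column, stats in profile.items():
    print(column, stats)
```

If a quick profile like this is awkward to produce, that is usually a sign that the second item on the list, being able to get at the data, needs attention first.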

These days everybody is aware of the huge benefits that accrue to truly data-driven companies, but not that many businesses get there. Nearly everyone tries though, and if you are involved in a situation where the results are frustrating, or the data analytics effort is more a collection of rifle-shot projects than a deeply embedded capability, then the first three items on the list might be a good place to look for improvements. Note that if there are improvements to be made in these areas then you are in for a period of investment that won’t yield short-term benefits to the business and will need prioritisation over other items that will, e.g. feature releases.


  • http://twitter.com/fabiodebe Fabio De Bernardi

    Data can be lots of different things but the 4 points hold true for most types of data I guess. Surely the list makes sense for social media listening projects!