Opportunities created by the growth in data

Brad Feld wrote a post about the Defrag Conference over the weekend where he notes that the amount of information in the world is exploding and that there will be a wave of software innovation to help us make sense of all the data (a prediction he originally made in 2006).

Taking the underlying trend first – the volume of information growth is staggering.  I can’t find any easily digestible stats on the web right now, but numbers I’ve seen before show that the amount of data produced in the last few years exceeds that created in the entire rest of history.

For the doubters, Brad offers these explanations:

For the foreseeable future, there will be a continuous and rapid increase of information as more of the world gets digitized, more individuals become content creators, more systems open up and provide access to their data, and more infrastructure for creating, storing, and transmitting information (and data) gets built.

Implicit within Brad’s explanation, but worth bringing out, are the growth in sensor data, the fact that media consumption is becoming trackable for the first time, and the huge volumes of government data that are starting to be published (in the UK context I hear that Sir Tim Berners-Lee has secured a commitment from Gordon Brown which should make the UK a leading nation in this regard).

Some months ago I tweeted something about the problems of information overload, and my friend Jof Arnold replied that he didn’t see any of his non-techie buddies thinking or worrying about this problem, and questioned whether there is much of an opportunity here (excluding web search as ‘already done’).

I have returned to that thought a number of times, driven by the conviction that a trend this big has to be creating some kind of opening for startups, and my current view is that there are two types of opportunity, and that they might form the basis for investment themes.

First, there are the obvious tools and filters to help us manage all the data. Now that information is abundant, time is the new scarcity, and these tools and filters are really productivity aids. There is nothing new here: the spreadsheet is a good example of an innovation in this area and Google is another.  Going forward I think the startups will come in vertical markets, e.g. news, something about which I have written a lot on this blog and where we are seeing a lot of innovation at the moment.  Medical is another area where we will all soon need help interpreting the vast volumes of data that are becoming available (I have recently heard of two businesses that are taking the cost of sequencing individual human genomes down to mass market levels).

To Jof’s point – these tools will need to be *very* user friendly to get mass adoption, a hurdle at which many will fall, but which some will clear.  For example, an application which shows you the news stories most read by your friends could well get traction without the users ever thinking they were solving their information overload problem.
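To make that example concrete, here is a minimal Python sketch of such a filter. Everything in it is hypothetical and for illustration only: the function name `most_read_by_friends`, the shape of the `reads` data and the sample names are my assumptions, not a description of any real product. It simply counts how many of your friends have read each story and returns the most popular ones.

```python
from collections import Counter


def most_read_by_friends(reads, friends, top_n=3):
    """Rank stories by how many of the given friends have read them.

    `reads` maps a (hypothetical) user name to the list of stories
    that user has read; `friends` is the list of users to count.
    """
    counts = Counter()
    for friend in friends:
        # set() so that each friend counts at most once per story
        for story in set(reads.get(friend, [])):
            counts[story] += 1
    # Most-read first; ties are broken alphabetically for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Made-up sample data: "dave" is not a friend, so his reading is ignored
reads = {
    "alice": ["story-a", "story-b"],
    "bob": ["story-b", "story-c"],
    "carol": ["story-b", "story-a", "story-a"],
    "dave": ["story-d"],
}
ranking = most_read_by_friends(reads, friends=["alice", "bob", "carol"])
print(ranking)
```

The point of the sketch is that the user never has to do anything beyond reading: the filter works off behaviour that is already being recorded, which is what would keep the hard work out of sight.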

The following quote from the Foundry Group blog in 2008 gives further insight into how these tools and filters will work.  The eighteen months since it was written have maybe rendered the insights more obvious, but they are no less relevant, nor has the opportunity passed.

We think of the technologies that fall under the implicit web [which I have called ‘tools and filters’] theme as a next-generation set of applications, tools and infrastructure that stitch together a long list of interrelated and overlapping ideas: the academic and theoretical ideas behind the Semantic Web, the utility of social networks and social media, crowd sourcing/wisdom-of-crowds, folksonomy, user attention data, advanced search and content analysis tools, lifestream analysis and numerous others.

When combined, these technologies offer the promise of a more unified computing environment that spans the applications where a user consumes and creates information (email clients, web browsers, RSS readers, etc) and is aware of the user’s preferences, interests and interpersonal relationships without requiring a ton of heavy lifting on the user’s part to get useful work done.

The second area perhaps has more promise: building on the newly available information to create products and services which simply weren’t possible before.  GPS/SatNav devices are a good example here – building on digital maps and GPS data to offer real time driving instructions.  Similarly Last.fm built a music service based on data that only became available after people started using digital music players.  Looking forward, the concept of VRM is built on the notion of using data generated by online purchase and surfing behaviour to turn the advertising model on its head.  Businesses like this are hard to describe in the abstract because they are solving problems we didn’t necessarily know we had, and in their early stages can expect to encounter lots of naysayers arguing that it will never take off/people don’t need it/they will never pay etc. etc.

Simply writing this post has clarified my thinking, but ideas like these really come to life when they are discussed, so I look forward to your comments.

  • chrispadfield

    I am hoping someone comes along to solve the consumer data storage problem. I don't see “the cloud” being the solution here, residential upload speed is just too slow to back up hundreds of GBs of data – and as far as I can tell, the rate of data creation is growing at a far faster rate than internet connection speeds (particularly upload speed). The only really interesting company I have seen in this space is Drobo – your typical external USB drive but with RAID-like redundancy built in that almost anyone can use. But this does not solve the theft/fire/flood problem.

    This issue is only going to get worse. My latest digital camera's photos are 25MB and it creates GBs of video in a few minutes. There is no simple way for consumers to manage this level of data creation.

  • Interesting thought Chris. Tks

  • What about phone data? We are storming along with http://www.VoxAnalytics.com simply because the telephone which everyone still uses has been left behind by all the operators, and the tools at your fingertips are so useless in general that you have no data to make sense of because you can't access it. In a world where the web makes so much sense out of any web related data it is a joke.

    We see the rise of data as a huge business opportunity for off line and are working with some of the UK's largest Real Estates, Directories and Dating sites to bridge the gap in available analytics and functions which have until now been reserved for the web.

    It's strange when you read something like this and not only do the carriers forget that the telephone still exists as a standalone tool but that bloggers do too. Is that just a sign of how far behind traditional telephony has become?

  • Thanks Ryan, voice analytics is another good example.

  • It's also interesting how our brain's filtering system must be in overdrive these days. There is such an overabundance of information that you rarely get the time to just sit and think; there is a huge amount of stimulus which needs to be filtered out. Manufacturing Consent touches on some of this but nowadays there is so much info that it is outdated.

    I see some young people today who have no grasp of history and even recent pop history and world events. Now I'm not particularly old so the fact that it is such a massive difference in such a short time is a huge and largely unexplored phenomenon which I think is happening very fast. My take is that people are learning more crap and useless information due to the overabundance of irrelevant info (by this I mean irrelevant to everyday life) and have essentially not spent the time learning the stuff which is important.

    Or maybe, the younger people I see and talk to are just abnormal 😀

  • How old is 'not particularly old'? 🙂

    More seriously, the increasing volume of info does seem to be crowding out history and other things we traditionally regarded as important. I don't like the sound of that any more than you.

  • Stephen Upstone

    The growth in data, from enterprises looking at a few pieces of static data on each customer a year to seeing hundreds daily, and the challenge of linking all that knowledge to make successful decisions in real time across all channels, are staggering.

    Check out Causata from my old boss, Paul Phillips, former CEO and Founder of Touch Clarity (sold to OMTR). They are backed by Accel and are breaking new ground in real time customer knowledge and decisioning at all enterprise touch points. Great team that have a huge, very commercial vision and of course some cracking technology.

    Funding release – http://www.causata.com/press_release_01

    Good article – http://www.mercurynews.com/scott-harris/ci_1350

  • Thanks Stephen.

  • I'm still not convinced by the “news stories read by your friends” thing. Don't get me wrong – it can work, as we proved with Blog Friends – but I can't imagine it being a big business for the following reasons:
    – It's already a feature of twitter and facebook (and google reader, I suppose). A free one at that.
    – Almost all the early-stage startups I've met (and at least one funded one!) who are attempting social graph-based filtering fail when it comes to scalability. Directed Edge is one that doesn't; their solution is very sound. Broadersheet looks promising here too.
    – People still gravitate to branded sources of information. Despite the millions of blogs in the world, I still get my information from only about 10 sources. E.g. why do people still read Engadget, Techcrunch, BBC etc when there's plenty of other sites out there?

    However, managing information overload in the sense of data – CO2 emissions, finances, bills, fuel… now THAT is interesting. And there lies real opportunity to improve people's lives/finances and thus make money.

  • i think that “news stories read by your friends” is a feature. it shouldn't be the only metric, but its one of maybe, four, that matters. another one for example might be “what are people like *me* reading?” you could base that off your linkedin profile etc. people in the private equity scene read these ten stories this week and you didn't get shown them, etc.
    i'm certainly playing with the latter concept lots.

    i think its hard to put a value on the news b2c stuff. the long tail of content providers and suggesting that content to your readers is probably quite valuable, not an autonomy, but certainly a stumbleupon.

    the b2b stuff is where the big bucks are though.

  • The heart for me is making the approach seamless, so that a user doesn't realise the targeting in an overt way. Last.fm's strength was to take what people are doing anyway (listening to music) and create connections for other listeners from this new source of data. (It has its failings though – I listened to a lot of Christmas-y music last year, and couldn't get rid of it in January. It's time for me to fire up my “Christmas jukebox”, which is what Last.fm is now for me.) GameShadow tried to do the same for games, and with the coming explosion in direct-response marketing budgets for games I think that it could still be huge for someone. (Our problem was more execution than strategy.) So for me, the best way of “sorting” this information is to make it subtle and almost invisible. Mainly because I don't believe that anyone will actually *pay* to avoid information overload. After all, you can just stop listening for free.

  • Nicely put Nicholas. The problem people have (if it is a problem at all) is not information overload – as you say there is a very simple solution to that. The problem is that they are not getting all the value they could out of the information available, and the best way to get people to realise that there is value is to offer something they need and hide the hard work in the background.

  • Mobile news-discovery/recommendation is interesting as it solves an interface issue; i.e. browsing from site-to-site is currently a pain on all mobile devices – much better is just to get fed interesting stuff without needing much user input. This is what Nic alludes to, I think.

  • I think I agree with both of you – what my friends have read is an interesting feature which in addition to the social benefits when I see them will hopefully drop some interesting articles into my inbox. Low effort and easy to use filters will be critical to success.

  • Interesting point, Chris – and one that I was discussing with someone (*possibly* Flip Kromer from InfoChimps?) at Defrag last week.

    We ended up suggesting that there may be mileage in a trusted retail brand (Tesco, say?) entering this space; imagine a Tesco-branded USB drive that you back up to all week, then drop off in-store as you start your weekly shop. All the data is sucked off onto a Tesco server in-store as you shop, you pick the drive up at the checkout, and Tesco trickles the data overnight to some Cloud facility.

    The store only needs enough storage capacity to hold 24-48 hours' worth of customer data.

    The question we were struggling with was perceived value. You'd probably pay for the drive and then under £10 per month for the backup service… but would you pay any more? And what would the startup costs be for Tesco?

    It was a late-night conversation, and not fully worked through… but it might be worth exploring to solve an increasingly serious problem.

  • Enterprises are definitely sitting on mountains of data that drive their decision-making.
    One of the (as yet unrealised) promises of the whole Linked Data activity (and yes, Nic, there's some great stuff coming from the UK Government in that area) *should* be to give the Enterprise access to valuable intelligence from outside the firewall whilst reducing the quantity of 'context' data each company has to store in order to make the 'core' valuable.
    Enterprise A's relationship with Customer B *matters*. Customer B's new postal address, not so much (so long as you have ways to get it when you need it).

  • Hi Paul – this comment only just surfaced for some reason. Apologies for not replying. My two cents would be that for most consumers having secure backup isn't worth the hassle of driving anywhere on a weekly basis. Fully automated solutions like Mozy hit the sweet spot of 'set up and forget'.

  • Nic

    the hassle of driving to the supermarket is certainly a valid point (although we have to go in order to eat for the week ahead, not just to lodge our backup), but the Mozy-style automated solution raises real issues at the outset; that initial upload to the Cloud takes many days to complete…

  • True enough. I nearly gave up on Mozy when I was doing it. Maybe they should send out a supersized USB with a freepost envelope 🙂

  • …or point you to the upload service at your local supermarket… 😉