Musings on the wisdom of crowds and machine intelligence

This is a very provisional post.  It has been swirling around my head since I was talking to Saul Klein before Xmas.  These thoughts interest me as I think more about what it means to build social networks that do stuff, but they are new thoughts, so even more than usual I’d welcome your comments.

It seems to me that you can build a web service to derive insights from the wisdom of crowds, adopt a more mathematical approach, or do both.  Most social networks are about the wisdom of crowds – i.e. socially derived insights.  LastFM’s music recommendation service is a great example – finding music you might like by looking at what other people like.  Similarly, Crowdstorm tells you which products are hot at the moment by figuring out what people are talking about most.
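
To make the contrast concrete, here is a minimal sketch in Python of the crowd approach – invented users and data, and certainly not LastFM’s actual method – scoring the things played by people whose taste overlaps with yours:

    from collections import Counter

    # Listening histories: user -> set of artists they play (invented data)
    histories = {
        "alice": {"Radiohead", "Portishead", "Massive Attack"},
        "bob":   {"Radiohead", "Massive Attack", "Aphex Twin"},
        "carol": {"Coldplay", "Radiohead", "Portishead"},
    }

    def recommend(user, histories, n=3):
        """Suggest artists played by users whose taste overlaps with ours."""
        mine = histories[user]
        scores = Counter()
        for other, theirs in histories.items():
            if other == user:
                continue
            overlap = len(mine & theirs)   # how alike are our tastes?
            for artist in theirs - mine:   # things they play that we don't
                scores[artist] += overlap
        return [artist for artist, _ in scores.most_common(n)]

    print(recommend("alice", histories))   # e.g. ['Aphex Twin', 'Coldplay']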

The other, more unusual, approach is to use machine-based intelligence.  Pandora’s Music Genome Project is a great example of this in practice.  They have mapped out the different elements of music, much as the human genome maps out the different elements of our DNA, and are using that map to make music recommendations.  Leiki also take this kind of approach, putting taxonomies and categorisation at the heart of their recommendation engine.
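
And a minimal sketch of the machine approach – invented attribute names and values, not Pandora’s actual genome – where each song is described by hand-coded attributes and recommendations come from attribute similarity rather than from other listeners:

    import math

    # Hand-coded attributes per song, genome-style (invented values)
    songs = {
        "Song A": {"tempo": 0.8, "distortion": 0.9, "vocals": 0.3},
        "Song B": {"tempo": 0.7, "distortion": 0.8, "vocals": 0.4},
        "Song C": {"tempo": 0.2, "distortion": 0.1, "vocals": 0.9},
    }

    def cosine(a, b):
        """Cosine similarity between two attribute dicts."""
        keys = set(a) | set(b)
        dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def most_similar(seed, songs):
        """Rank the other songs by attribute similarity to a seed song."""
        return sorted((s for s in songs if s != seed),
                      key=lambda s: cosine(songs[seed], songs[s]),
                      reverse=True)

    # Works for the very first user, before any crowd data exists:
    print(most_similar("Song A", songs))   # ['Song B', 'Song C']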

People feel good about recommendations that come from people-based (i.e. socially intelligent) systems, particularly if they come from friends.  The problems come with systems below critical mass, where you either get no meaningful intelligence or, sometimes, unhelpful echo effects.  These systems are also typically less responsive to new categories/ideas/things than machine systems.  Competitive advantage is likely to come from scale, and from being the first in a space to hit scale.

Maths-based systems are quick to add value once live: they can provide value for the first user of a system and they can be adapted quickly to changing circumstances.  But they are more complicated to build in the first place and, I guess, easier to get completely wrong.  You have to code the intelligence at the start, and there will be no coming back from a big mistake.  Competitive advantage will come from scale AND the depth/uniqueness of your algorithm.

The other obvious thing is that the efficacy of each approach will vary with the problem you are trying to solve.  Some things lend themselves to mathematical approaches – e.g. the flight price forecasting on Farecast.  That said, if Pandora can bring maths to music then I guess the same can be true of pretty much anything else.

I guess the so-what of this is that the more value is crammed into a service, the stronger it will be, so a start-up which adds a machine-based component to its wisdom-of-crowds story might well be a more fundable proposition.  Assuming it works.
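
For what it’s worth, a sketch of what that combination might look like, reusing the histories and the cosine() function from the sketches above; the 50/50 weighting is an arbitrary illustration, not anyone’s actual method:

    def hybrid_score(user, item, histories, features, alpha=0.5):
        """Blend a crowd score with a machine score for one candidate item."""
        others = [o for o in histories if o != user]
        # Crowd signal: fraction of taste-overlapping users who play the item
        crowd = sum(1 for o in others
                    if item in histories[o] and histories[user] & histories[o])
        crowd = crowd / len(others) if others else 0.0
        # Machine signal: similarity to the closest item the user already plays
        machine = max((cosine(features[item], features[liked])
                       for liked in histories[user] if liked in features),
                      default=0.0)
        return alpha * crowd + (1 - alpha) * machine   # both terms in [0, 1]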

  • Google meets Yahoo Answers, or, as I mentioned today, LinkedIn Answers with a micro[format] search engine. Algorithmic search alone is not enough. Riya realised this when their face-matching software proved no quicker than a human at recognising faces.

  • I think you’re on to something, and I think we (my company) may be on to something too. Good to hear we could be fundable one day 😉 Cheers from across the pond,

    Paul

  • Nic,
    What you are talking about is Collaborative Filtering vs Content-Based. But mainly it’s Human Behaviour (Last.fm) vs Fixed Information (Pandora’s Genome).

    The fight has been alive for a very long time among academics. Both have pros & cons; Wikipedia has a good explanation of it.

    And from my point of view Pandora & Last.fm are a perfect couple!

    That’s why we’ve been mixing the two technologies together … when we have data we use Collaborative Filtering; when we don’t, we use Content-Based (sketched after this comment).

    But many challenges face both solutions. What if Pandora were Wikipedia-like? What if Last.fm knew that two songs with different titles are in fact the same one? …

    You need human computation to face the Riya problem (cf. Sam’s comment), and you need silicon computation to do what can be automated.

    I agree a good answer will be to embrace both, so you can use whichever is fastest.

    Think about the illustration pictures on Last.fm, and try changing one on U.[lik]. Who makes the categories on Leiki? Will they be right for me, or do I use a different tagging method?

    It’s definitely not a black & white problem. Finding the perfect grey is what really matters. I haven’t seen one yet (even @ home), but I’m looking for a Rothko painting: black on one side, white on the other … with a blurred transition that creates fascination.
    http://tn1-1.deviantart.com/fs4/100/i/2004/238/0/b/Dark_Rothko_Drawing.png
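
A sketch of the fallback this comment describes, reusing recommend() and most_similar() from the earlier sketches; the threshold is an invented example, and it assumes the listening histories and the attribute table use the same item names:

    def recommend_with_fallback(user, histories, songs, min_overlap=2):
        """Collaborative when the crowd is thick enough, content-based otherwise."""
        overlaps = [len(histories[user] & theirs)
                    for other, theirs in histories.items() if other != user]
        if overlaps and max(overlaps) >= min_overlap:
            return recommend(user, histories)        # enough crowd data
        seed = next(iter(histories[user]), None)     # cold start: fall back
        return most_similar(seed, songs) if seed else []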

  • Andy Weissman

    Also see Carmun (www.carmun.com) for another approach: using generally available taxonomies (the US Library of Congress, for example), then adding a layer of user-generated data or knowledge to make them even more useful. And of course with that user data, or metadata, comes the ability to add community aspects to what is traditionally an individual task: studying.

  • Nic

    I think there is an important sub-division in your “wisdom of crowds” category, between trust networks like LinkedIn and crowd sites, like Toptable reviews, which give you data on which you may or may not rely. Clearly the former is much stronger, since you can rely on the information based on your assessment of the source. In the latter, popularity is your main guide, and popularity doesn’t necessarily equate to wisdom. Sam Sethi and I recently chatted about this and its impact on the long-term success of “social networks”.

    As regards machine recommendations, I believe the ability of computers to spot correlations in behaviour and make “insightful” suggestions is a very powerful force, but only to the extent that the dataset and population it draws on are sizeable enough for meaningful conclusions to be drawn. Too often social networks have or use too few data points to identify genuine correlations: male & works in capital markets as our only shared points isn’t sufficient to guess that our interests and tastes will overlap. Add on music, hobbies, education… and we start to use something akin to “dating service” algorithms to identify people we “correspond” with, increasing the likelihood that the machine recommendations will resonate (sketched below).
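
A sketch of that point, with invented (hypothetical) profiles: on two shared attributes two people look identical, but measured across more dimensions the match turns out to be weak:

    def match_score(a, b):
        """Fraction of attributes on which two people agree."""
        keys = set(a) | set(b)
        return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)

    # Hypothetical profiles, invented for illustration
    nic  = {"sex": "male", "sector": "capital markets",
            "music": "jazz", "hobby": "sailing", "education": "Oxford"}
    john = {"sex": "male", "sector": "capital markets",
            "music": "metal", "hobby": "chess", "education": "LSE"}

    two = ("sex", "sector")
    print(match_score({k: nic[k] for k in two},
                      {k: john[k] for k in two}))   # 1.0 - a "perfect" match
    print(match_score(nic, john))                   # 0.4 - actually a weak one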

  • You can always do both, right? You take some basic explicit data, if the user provides it, combine it with some clever algorithms for measuring implicit behaviour, draw some conclusions, and then enhance them again and again as the user does more explicit things (a sketch follows).

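A sketch of that loop, with an invented decay factor and weights: fold each explicit or implicit signal into a running profile, decaying what was there before:

    def update_profile(profile, signal, weight, decay=0.9):
        """Fold one signal (explicit rating or implicit play) into a profile."""
        for attribute, value in signal.items():
            profile[attribute] = decay * profile.get(attribute, 0.0) + weight * value
        return profile

    profile = {}
    update_profile(profile, {"jazz": 1.0}, weight=1.0)    # explicit: "I like jazz"
    update_profile(profile, {"jazz": 0.5, "blues": 0.5},  # implicit: played a
                   weight=0.3)                            # jazz/blues track
    print(profile)   # {'jazz': 1.05, 'blues': 0.15}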