Channel: Enterprise Software Musings by Holger Mueller

What will Michael Wu have for dinner? Or: What's the right data - Small vs. Big

Following the AnalyticsChat earlier this week, Michael Wu (@mich8elwu) kept tweeting about a specific question he had brought up: is predicting what he will have for dinner a 'small' data problem (does it fit on his laptop?) or a BigData problem?

Here is a piece of the Twitter exchange:



So what would we need to predict Michael's dinner in particular - and any person's dinner in general?
  • Location - what food sources are available
  • Previous Meals
    • Grocery receipts (what's in the fridge if he's at home)
    • Restaurant receipts (where does he usually eat, what type of food he eats when eating out)
  • Restaurant data - which restaurants are around, how often he visits them, what's on the menu
  • Calendar data - who is he meeting with
  • Social data - Facebook or Twitter posts may give away the dinner plan
  • Couponsphere - he may have vouchers from Groupon and the like
  • Credit card balance - may determine how much he will spend on dinner tonight
  • ...
[Needless to say, we are not going into practicability here - it would be tough to get all the above data, there are privacy issues etc. - but this is a theoretical exercise.]

Of course we would throw all of this into a BigData cluster, and I would just 'start playing' with the data to see what it can predict. One practical problem is that we have no positives: without Michael's participation we would not know what he actually ate. Everything gets easier if we can get his actual dinner information for as many occasions as possible. We could make it easy for him to share that information by linking grocery items to recipes (for home cooking) and by linking restaurant credit / debit card payments to menu information. The beauty of this approach is that we could coax Michael into training and selecting his own model.
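To make the labeling idea a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `CardPayment` record, the `MENUS` and `RECIPES` lookups and the matching rules are assumptions for illustration, not an existing data source or API. A card payment at a known restaurant yields a restaurant-level label, plus a dish guess when the amount is close to a menu price; a grocery basket yields the recipes it fully covers.

```python
from dataclasses import dataclass


@dataclass
class CardPayment:
    merchant: str  # merchant name as it appears on the card statement
    amount: float  # amount charged, in dollars


# Hypothetical menu lookup: restaurant -> list of (dish, menu price)
MENUS = {
    "Shen Hua": [("Kung Pao Chicken", 12.95), ("Mapo Tofu", 10.95)],
}

# Hypothetical recipe lookup: recipe -> grocery items it needs
RECIPES = {
    "Pasta Pomodoro": {"spaghetti", "tomatoes", "basil"},
    "Omelette": {"eggs", "butter"},
}


def label_from_payment(p, tolerance=2.0):
    """Return (restaurant, dish or None) for a known restaurant, else None."""
    menu = MENUS.get(p.merchant)
    if menu is None:
        return None  # not a known restaurant: no dinner label from this payment
    # Naive guess: the dish whose menu price is closest to the amount paid
    dish, price = min(menu, key=lambda item: abs(item[1] - p.amount))
    return (p.merchant, dish if abs(price - p.amount) <= tolerance else None)


def candidate_recipes(basket):
    """Recipes whose ingredients are all contained in the grocery basket (a set)."""
    return sorted(name for name, items in RECIPES.items() if items <= basket)
```

Matching on price is deliberately naive (tax and tip will usually push the amount away from the menu price), which is exactly why having Michael confirm or correct these guesses would be so valuable.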

In general, I would not settle on specific analytical models at this point, given that the data sources will remain very dynamic and the amount of known dinner information is questionable. A bootstrap approach using different analytical models and cloud storage (AWS Redshift, anyone?) is my gut-feel approach here.
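If 'bootstrap' is read in the statistical sense, one way to compare candidate models on a small set of known dinners is out-of-bag bootstrap accuracy. The sketch below is an assumption-laden illustration: it uses a made-up history of (weekday, dinner) pairs and two toy models, a global most-frequent-dinner baseline and a per-weekday most-frequent model. Real features like the ones listed above would plug into the same `fit` / `predict` shape.

```python
import random
from collections import Counter


def fit_global_mode(train):
    """Baseline: always predict the most common dinner in the training data."""
    top = Counter(dinner for _, dinner in train).most_common(1)[0][0]
    return lambda weekday: top


def fit_weekday_mode(train):
    """Predict the most common dinner for that weekday, else the global mode."""
    fallback = fit_global_mode(train)
    by_day = {}
    for day, dinner in train:
        by_day.setdefault(day, Counter())[dinner] += 1

    def predict(weekday):
        counts = by_day.get(weekday)
        return counts.most_common(1)[0][0] if counts else fallback(weekday)

    return predict


def bootstrap_score(fit, data, rounds=200, seed=0):
    """Mean out-of-bag accuracy of a model fitted on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(idx)
        oob = [data[i] for i in range(n) if i not in in_bag]
        if not oob:
            continue  # every record was drawn; nothing left to evaluate on
        model = fit([data[i] for i in idx])
        hits = sum(model(day) == dinner for day, dinner in oob)
        scores.append(hits / len(oob))
    return sum(scores) / len(scores) if scores else float("nan")


if __name__ == "__main__":
    # Made-up history: Fridays are Chinese food, other evenings pasta
    history = [("Fri", "Chinese")] * 10 + [("Mon", "Pasta")] * 15 + [("Tue", "Pasta")] * 15
    for name, fit in [("global mode", fit_global_mode), ("weekday mode", fit_weekday_mode)]:
        print(name, round(bootstrap_score(fit, history), 3))
```

The point is less the toy models than the loop: any new data source or model can be dropped in and scored the same way, without committing to one model up front.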

Michael makes a good point that having the 'right' data is more important than having lots of data. Fair enough, but that thinking may keep data from being stored and available in the first place - and that could limit insight into dinner preferences. And while all the data to predict Michael's dinner tonight may fit on a laptop (with a big HDD), the data to predict dinner for any generic person certainly does not. Even in Michael's specific case - assuming he meets friends for dinner - we would need to predict their preferences too and then try to find out where the group will go. At this point we are beyond typical laptop storage capacity, by a lot. So beware of some of the fallacies and misunderstandings of analytics - read more here.
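For the group case, one simple assumption is to sum each diner's predicted restaurant probabilities and pick the highest total. This is only one possible aggregation rule, and the per-person probabilities below are made-up model outputs, with 'Restaurant B' a placeholder name.

```python
from collections import defaultdict


def predict_group_venue(preferences):
    """Return the restaurant with the highest summed probability across diners.

    `preferences` holds one dict per diner, mapping restaurant -> the
    (hypothetical) predicted probability that this diner picks it.
    """
    totals = defaultdict(float)
    for person in preferences:
        for venue, prob in person.items():
            totals[venue] += prob
    return max(totals, key=totals.get)


if __name__ == "__main__":
    group = [
        {"Shen Hua": 0.6, "Restaurant B": 0.4},  # Michael
        {"Shen Hua": 0.2, "Restaurant B": 0.8},  # friend 1
        {"Shen Hua": 0.5, "Restaurant B": 0.5},  # friend 2
    ]
    print(predict_group_venue(group))  # Restaurant B
```

Note how the group outcome can differ from Michael's own top pick: every friend adds another person's full history to the data we need.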

With that said, I am predicting Michael will eat out tonight (Friday, with friends) in Berkeley (where I guess he lives), at Shen Hua on College Avenue. Pretty sure about Shen Hua. Why? I am in the process of sending him a $10 voucher... valid only tonight! ;-)

But keep in mind - analytics is hard for everyone.
