Saturday, November 29, 2014

Machine Learning vs Statistics (Statistical Learning)

In a sense, the difference between Machine Learning and Statistics (Statistical Learning) is the difference between a computer scientist and a statistician. In my humble opinion, the two are becoming more similar as time goes by. Still, Machine Learning is probably more empirical and attacks problems with various optimization algorithms, while Statistics (Statistical Learning) puts more emphasis on assumptions and model validation: it is more mathematically rigorous and more likely to talk about VC dimension, KL divergence, conjugate priors, and so on. In practice, ML practitioners tend to use Matlab or Python, while SL favors R.

Rob Tibshirani, a co-author of "The Elements of Statistical Learning", gave a comparison of machine learning and statistics; here is a screenshot for your convenience:
In useR! 2004, Brian D. Ripley, another statistician, said: 'machine learning is statistics minus any checking of models and assumptions'.
A lecturer at Udacity gives a good one-sentence comparison: "Statistics focuses on analyzing the existing data and drawing valid conclusions, while Machine Learning focuses on making prediction and less worrying about the assumption as long as it makes good predictions."
Since 2001, when William Cleveland introduced "data science" as an independent discipline, things have changed and the comparison has become less important. As it has been put, "Machine learning and statistics may be the stars, but data science orchestrates the whole show."

In David Smith's blog, you can see CMU machine learning students "protest" at the G20 summit in Pittsburgh on September 25, 2009.
