Saturday, November 29, 2014

Machine Learning vs Statistics (Statistical Learning)

In a sense, the difference between Machine Learning and Statistics (Statistical Learning) is the difference between a computer scientist and a statistician. In my humble opinion, the two are becoming more similar as time goes by. Still, Machine Learning is probably more empirical and attacks problems with various optimization algorithms, while Statistics (Statistical Learning) puts more emphasis on assumptions and model validation; it is more mathematically rigorous and more likely to talk about VC dimensions, KL divergence, conjugate priors, etc. In practice, ML tends to use Matlab or Python while SL favors R.

Rob Tibshirani, a co-author of "The Elements of Statistical Learning", gave a comparison of machine learning and statistics; here is a screenshot for your convenience:
In useR! 2004, Brian D. Ripley, another statistician, said: 'machine learning is statistics minus any checking of models and assumptions'.
A lecturer at Udacity gives a good one-sentence comparison: "Statistics focuses on analyzing the existing data and drawing valid conclusions, while Machine Learning focuses on making predictions and worries less about the assumptions as long as it makes good predictions."
Things have changed since 2001, when William Cleveland introduced "Data Science" as an independent discipline, and the comparison has become less important. As has been said, "Machine learning and statistics may be the stars, but data science orchestrates the whole show."

On David Smith's blog, you can see CMU machine learning students "protest" at the G20 summit in Pittsburgh on September 25, 2009.

Monday, November 17, 2014

Network Analysis with NetworkX

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

NetworkX integrates with PyGraphviz, a Python interface to Graphviz (the excellent open-source graph drawing tools initiated by AT&T Labs Research).

Here are my notes on building PyGraphviz and integrating Graphviz with NetworkX.

1. Install NetworkX

    I have WinPython 2.7, so installing NetworkX with WinPython's control panel is simple.

2. Install Graphviz

    Install Graphviz to "C:\graphviz"
    Add "C:\graphviz\bin" to the PATH environment variable.
    Copy "C:\graphviz\include\graphviz" to "\python-2.7.x\include\"
    Copy cdt.lib and cgraph.lib from "C:\graphviz\lib\release\lib" to "\python-2.7.x\libs\"
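
Before moving on to the compiler, it can help to confirm that the copies above actually landed where the build will look for them. The following is only a sketch, not part of the original notes: the helper names find_on_path and missing_files are made up for illustration, and it assumes that sys.prefix of the interpreter you run it with is the "\python-2.7.x\" folder mentioned above. It sticks to the standard library so it should run on Python 2.7 as well as Python 3.

```python
import os
import sys

# Install location used in the notes above; change it if yours differs.
GRAPHVIZ_ROOT = r"C:\graphviz"


def find_on_path(program, path=None):
    """Return the first directory on PATH that contains `program`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted on Windows
        if directory and os.path.isfile(os.path.join(directory, program)):
            return directory
    return None


def missing_files(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    # Assumption: sys.prefix is the Python folder the headers and libs were copied into.
    expected = [
        os.path.join(GRAPHVIZ_ROOT, "bin"),
        os.path.join(sys.prefix, "include", "graphviz"),
        os.path.join(sys.prefix, "libs", "cdt.lib"),
        os.path.join(sys.prefix, "libs", "cgraph.lib"),
    ]
    print("dot.exe found in: %s" % find_on_path("dot.exe"))
    for path in missing_files(expected):
        print("missing: %s" % path)
```

If the script prints "None" for dot.exe or lists any missing entries, revisit the corresponding line of step 2 before building.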

3. Configure Visual C++ Compiler

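    Python 2.7's build tools look for Visual Studio 2008 through the VS90COMNTOOLS variable, so point it at the Visual Studio that is actually installed (11.0 here):
    set VS90COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\Tools\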

4. Build and Install PyGraphviz

    Open command prompt in WinPython and run
    pip install

Now import pygraphviz to test the installation, then start using it with NetworkX.
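
A small smoke test for that last step might look like the sketch below. It is an illustration rather than the only way to check the install: installed_version and render_sample are hypothetical helper names, and the output file name sample.png is arbitrary. The imports are guarded, so the script reports a missing package instead of crashing, and it only tries to draw when both packages import cleanly.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ ("unknown" if it has none), or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def render_sample(path="sample.png"):
    """Lay out a three-node path graph with Graphviz's dot and write it to `path`."""
    import networkx as nx
    from networkx.drawing.nx_agraph import to_agraph

    graph = nx.path_graph(3)       # nodes 0-1-2 joined in a line
    agraph = to_agraph(graph)      # NetworkX graph -> pygraphviz.AGraph
    agraph.layout(prog="dot")      # requires the Graphviz binaries on PATH
    agraph.draw(path)
    return path


if __name__ == "__main__":
    versions = dict((name, installed_version(name)) for name in ("networkx", "pygraphviz"))
    for name, version in sorted(versions.items()):
        print("%s: %s" % (name, version or "NOT importable"))
    if all(versions.values()):
        print("wrote %s" % render_sample())
```

If pygraphviz shows up as not importable even though the build succeeded, the usual suspect is that "C:\graphviz\bin" is not on PATH in the prompt you are running from.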