Friday, January 30, 2015

Financial Data for Statistical Learning

There are plenty of financial and economic data sources that can be used in Machine Learning and Data Mining exercises. Here is my list:

FRED - Federal Reserve Economic Data
CRSP - Center for Research in Security Prices
Ken French - Fama-French 3-factor model data
John Cochrane - Data and Programs, Liquidity factor, Grumpy Economist
Robert Shiller - Online data and other research data

Web scraping is the new way to get free "real-time" data. :)

Here is an introduction given by Christopher Reeves.

Python (urllib, re, scrapy) and R (quantmod) are my favorite languages for FIN/ECON data scraping.
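Here is a minimal sketch of the urllib + re approach. It assumes a hypothetical page that shows a quote as <span class="price">123.45</span>; real sites use different markup that changes often. The download and the parsing are kept in separate functions so the regex can be tested offline. The pattern, the markup, and the helper names fetch_html and extract_prices are illustrative assumptions.

```python
import re
from urllib.request import urlopen

# Hypothetical markup: assumes quotes appear as <span class="price">123.45</span>.
PRICE_PATTERN = re.compile(r'<span class="price">\s*([0-9][0-9,]*\.?[0-9]*)\s*</span>')


def fetch_html(url, timeout=10):
    """Download a page with the standard library (urllib) and decode it to text."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_prices(html):
    """Return every price matched by PRICE_PATTERN as a float (thousands commas removed)."""
    return [float(match.replace(",", "")) for match in PRICE_PATTERN.findall(html)]
```

A call would look like extract_prices(fetch_html(url)) for a page you are allowed to scrape. For anything beyond a toy page, scrapy or a proper HTML parser is sturdier than a regex.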

Saturday, January 24, 2015


Introduction to ffn - Financial Functions for Python

ffn is a Python library for quantitative finance. It stands on the shoulders of giants (Pandas, Numpy, Scipy, etc.) and provides a vast array of utilities, from performance measurement and evaluation to graphing and common data transformations.

Its APIs support data retrieval, data manipulation, performance measurement, numerical routines and financial functions.
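ffn provides helpers such as to_returns and calc_max_drawdown for this kind of performance measurement. The sketch below is not ffn's source. It is a plain pandas reimplementation of three basic metrics (simple returns, total return, maximum drawdown) that shows what they compute; the function names are chosen for illustration.

```python
import pandas as pd


def to_returns(prices):
    """Simple period returns p_t / p_{t-1} - 1; the first value is NaN."""
    return prices / prices.shift(1) - 1


def total_return(prices):
    """Growth from the first to the last price."""
    return prices.iloc[-1] / prices.iloc[0] - 1


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a negative fraction of the running peak."""
    return (prices / prices.cummax() - 1).min()


prices = pd.Series([100.0, 110.0, 99.0, 121.0])
print(to_returns(prices).round(4).tolist())
print(round(total_return(prices), 4), round(max_drawdown(prices), 4))
```

With ffn installed, its README demonstrates a shorter workflow, roughly prices = ffn.get('aapl,msft', start='2010-01-01') followed by prices.calc_stats(). This downloads the data and reports these metrics along with many others.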

Statistical Significance vs. Economic Significance

Statistical Significance                | Economic Significance
----------------------------------------+----------------------------------------
Is it fitted well?                      | Is it an important factor?
Large t-stat or small p-value           | Large b_j values
Low t-stat => need more sample data?    | Small b_j values => multicollinearity?
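The following minimal numpy sketch makes the two columns concrete. It assumes ordinary least squares with an intercept and classical (homoskedastic) standard errors, which are the same kind of numbers MATLAB's fitlm() reports. The t-stat b_j / SE(b_j) speaks to statistical significance. For economic significance, the sketch uses one common yardstick: the change in y implied by a one-standard-deviation move in x_j, i.e. b_j times the sample standard deviation of x_j. That yardstick and the helper names ols_tstats and economic_effect are illustrative assumptions, not the only way to judge whether a factor matters.

```python
import numpy as np


def ols_tstats(X, y):
    """OLS with an intercept: returns coefficients, standard errors and t-stats."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se                # t-stat = estimate / standard error


def economic_effect(b_j, x_j):
    """Change in y for a one-standard-deviation move in x_j (one common yardstick)."""
    return b_j * np.std(np.asarray(x_j, dtype=float), ddof=1)


beta, se, t = ols_tstats([0, 1, 2, 3], [1, 3, 4, 6])
print(beta, se, t)
print(economic_effect(beta[1], [0, 1, 2, 3]))
```

With the toy data x = [0, 1, 2, 3] and y = [1, 3, 4, 6], the fitted coefficients are 1.1 (intercept) and 1.6 (slope), and the slope's t-stat is about 11.3. Because SE(b_j) shrinks roughly like 1/sqrt(n), a low t-stat on a sizeable b_j can simply reflect too little data.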

This is a sample regression output from fitlm() in Matlab:

Linear regression model:
    y ~ 1 + x1

Estimated Coefficients:
                   Estimate     SE           tStat     pValue    
    (Intercept)    0.0028283    0.0023652    1.1958    0.23514
    x1             0.91903      0.045009     20.419    2.5487e-34

Number of observations: 86, Error degrees of freedom: 84
Root Mean Squared Error: 0.0143
R-squared: 0.832,  Adjusted R-Squared 0.83
F-statistic vs. constant model: 417, p-value = 2.55e-34
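The columns of this output are linked by simple identities, so a few lines of Python can check that the printed numbers are consistent. This sketch uses values copied by hand from the output, so the results match only up to rounding. Four identities are checked. tStat is Estimate / SE. The error degrees of freedom are n minus the number of coefficients. Adjusted R-squared penalizes R-squared for those coefficients. With a single regressor, the F-statistic against the constant model equals the slope's t-stat squared, which is also why the two p-values coincide.

```python
# Values copied from the fitlm() output (rounded as printed)
n_obs, n_coef = 86, 2                   # observations; intercept + x1
b0, se0 = 0.0028283, 0.0023652          # (Intercept) row
b1, se1 = 0.91903, 0.045009             # x1 row
r2 = 0.832

t0 = b0 / se0                           # tStat = Estimate / SE
t1 = b1 / se1
dof = n_obs - n_coef                    # error degrees of freedom
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / dof
f_from_t = t1 ** 2                      # one regressor: F = t^2
f_from_r2 = (r2 / (n_coef - 1)) / ((1 - r2) / dof)

print(f"t0={t0:.4f} t1={t1:.3f} dof={dof} adjR2={adj_r2:.3f} "
      f"F(t^2)={f_from_t:.1f} F(R^2)={f_from_r2:.1f}")
```

The F-statistic rebuilt from R-squared comes out near 416 rather than 417 because R-squared is printed to only three decimals. The t-stat route reproduces the reported 417.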