Yet Another Coder's Blog
This blog posts interesting things about software development and quantitative analysis. (T. Liu)
Sunday, April 14, 2019
Connect to Oracle Database from Python or R
1. To connect to Oracle in Python
1) conda install -c anaconda cx_oracle
2.1) Method 1:
import pandas as pd
import cx_Oracle
con = cx_Oracle.connect('pythonhol/welcome@127.0.0.1/orcl')
query = 'select * from departments order by department_id'
df_ora = pd.read_sql(query, con=con)
df_ora.head()
con.close()
2.2) Method 2:
import cx_Oracle
con = cx_Oracle.connect('pythonhol/welcome@127.0.0.1/orcl')
cur = con.cursor()
query = 'select * from departments order by department_id'
cur.execute(query)
for result in cur:
    print(result)
cur.close()
con.close()
2. To connect to Oracle in R
1) install.packages("ROracle")
2) Method 1:
library("ROracle")
drv = dbDriver("Oracle")
host = "hostname"
port = "1521"
sid = "SID_NAME"
connect.string = paste( "(DESCRIPTION=",
"(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=",port, "))",
"(CONNECT_DATA=(SID=", sid, ")))", sep="")
con = dbConnect(drv, username="uid", password="pwd",
dbname=connect.string, prefetch=FALSE,
bulk_read=1000L, stmt_cache=0L,
external_credentials=FALSE, sysdba=FALSE)
res = dbGetQuery(con, "select * from dual")
print(res)
dbDisconnect(con)
Wednesday, December 21, 2016
Quantopian (Python) vs Quantmod (R)
1. Side-by-side comparison
2. Code snippet:
from rpy2.robjects import r
from pandas_datareader import data, wb
import talib
import numpy
import matplotlib.pyplot as plt
IBM = r("getSymbols('IBM', src='google', from='2013-01-01')")
ticker = 'IBM'
f = data.DataReader(ticker, 'google')
f['SMA_20'] = talib.SMA(numpy.asarray(f['Close']), 20)
f['SMA_50'] = talib.SMA(numpy.asarray(f['Close']), 50)
f.plot(y=['Close', 'SMA_20', 'SMA_50'], title='IBM Close & Moving Averages')
plt.show()
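For readers who do not have TA-Lib installed, the same simple moving average can be computed with NumPy alone. A small sketch (the `sma` helper is hypothetical, written to mirror `talib.SMA`'s output shape, with leading NaNs until the window fills):

```python
import numpy as np

def sma(values, window):
    """Simple moving average via cumulative sums; NaN until window fills."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if window <= len(values):
        csum = np.cumsum(values)
        # windowed sums: csum[i] - csum[i - window], handled via a shifted copy
        out[window - 1:] = (csum[window - 1:] -
                            np.concatenate(([0.0], csum[:-window]))) / window
    return out

# sma([1, 2, 3, 4, 5], 3) -> NaN, NaN, then 2, 3, 4
```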
3. Module Installation
conda install rpy2
conda install -c quantopian ta-lib
# more interesting modules
conda install -c quantopian zipline
conda install -c quantopian pyfolio
conda install seaborn
conda install quandl
Sunday, August 21, 2016
Covariance formula with CDF (Hoeffding's Covariance Identity)
A complete proof of above lemma can be found on page 241 (Lemma 7.27) of Quantitative Risk Management: Concepts, Techniques and Tools.
Hint: \(2\,\mathrm{cov}(X_1, X_2) = E[(X_1-\tilde{X}_1)(X_2-\tilde{X}_2)]\),
where \( (\tilde{X_1}, \tilde{X_2}) \) is an independent copy with the same joint distribution function as \( (X_1, X_2) \).
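For reference, Hoeffding's covariance identity states that for random variables with finite covariance,

```latex
\[
\mathrm{cov}(X_1, X_2)
  = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}
    \bigl( F(x_1, x_2) - F_1(x_1)\,F_2(x_2) \bigr)\, dx_1\, dx_2 ,
\]
```

where \(F\) is the joint distribution function of \((X_1, X_2)\) and \(F_1, F_2\) are the marginal distribution functions.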
Link to MathJax
Monday, August 15, 2016
Mathematical Programming and its modeling languages
Python is widely used in Mathematical Programming as a modeling language.
In commercial products, Gurobi has built its interactive shell in Python.
In the open source world, Pyomo from Sandia National Laboratories uses Python to offer an AMPL-like modeling language. Pyomo uses the GLPK solver by default, but other solvers, such as Gurobi and COIN CBC, can also be selected.
GLPK (GNU Linear Programming Toolkit) supports MathProg, also referred to as GMPL (GNU Mathematical Programming Language). GLPK provides a command line tool, glpsol, which makes it convenient to solve various optimization problems with well-designed reports.
PuLP is an LP modeler written in Python. PuLP can generate MPS or LP files, and can call GLPK, COIN CLP/CBC, CPLEX, and Gurobi to solve linear problems.
SolverStudio makes it easy to develop models inside Excel using Python. Data entered into the spreadsheet is automatically available to the model. SolverStudio supports PuLP, COOPR/Pyomo, AMPL, GMPL, GAMS, Gurobi, CMPL, SimPy.
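As a taste of the MathProg language mentioned above, here is a minimal GMPL model (a hypothetical two-variable LP) that glpsol can solve directly, e.g. with `glpsol --math model.mod`:

```
/* tiny LP in GNU MathProg (GMPL) */
var x >= 0;
var y >= 0;

maximize profit: 3*x + 2*y;

s.t. labor:    x +   y <= 4;
s.t. material: x + 3*y <= 6;

solve;
display x, y;
end;
```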
Wednesday, August 10, 2016
OLS in Python
There are a few ways to perform linear regression (OLS) in Python.
Here is a short list of them:
1: > pandas.ols(y, x)            # deprecated; removed in pandas 0.20
2: > pandas.stats.api.ols(y, x)  # deprecated; removed in pandas 0.20
3: > scipy.stats.linregress(x, y)
4: > import statsmodels.formula.api as smf
> results = smf.ols('Close ~ Open + Volume', data=df).fit() # df is a DataFrame
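To see what OLS actually computes, here is a minimal NumPy sketch that solves the normal equations via least squares (the data is hypothetical, chosen so the fit is exact):

```python
import numpy as np

# y = 1 + 2x exactly, so OLS should recover intercept 1 and slope 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# least-squares solution of X @ beta = y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [1., 2.]
```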
Friday, July 29, 2016
Interior Point Methods
Interior Point Methods are a class of optimization algorithms for solving linear or nonlinear programming problems.
It finds an optimum by moving through the interior of the feasible polytope rather than along its boundary.
History:
In 1984, Karmarkar "invented" the interior-point method.
In 1985, the affine-scaling method was "invented" as an intuitive version of Karmarkar's algorithm.
In 1989, it was realized that Dikin (USSR) had already invented affine scaling in 1967.
The interior point method was the first practical polynomial-time algorithm for solving linear programming problems. The ellipsoid method also runs in polynomial time, but in practice interior point methods and variants of the simplex method are much faster.
Technical summary:
1. Primal objective with log barrier function: G(μ) = cx - μ Σ ln(xj)
2. Central path algorithm: μ from infinity to 0.
3. Min with constraint Ax = b?
∇G(μ) must be perpendicular to the plane Ax = b;
the vector <cj − μ/xj> is a linear combination of A's rows:
<cj − μ/xj> = yA for some y.
Let sj = μ/xj; then
yA + s = c ==> yA ≤ c (since s > 0); these are the dual constraints.
4. Duality gap: cx − yb = (yA + s)x − y(Ax)
= sx
= nμ (on the central path, since sj xj = μ for each j)
5. Conversely, if sj xj = μ for all j, then (x, s) is on the central path.
To follow the central path, use a "predictor-corrector" scheme.
6. Improvement direction? "Affine scaling":
From current x, s, μ ==> x + dx, s + ds, μ + dμ
==> sj dxj + xj dsj = dμ (1)
Also A(x + dx) = b ==> Adx = 0 (2)
yA + s = c ==> (dy)A + ds = 0 (3)
To solve (1)-(3), rescale ("affine scaling") so that all xj = 1 ==> sj = μ.
The equations then say
μ dx + ds = 1 dμ (where 1 denotes the all-ones vector)
Adx = 0 ==> dx lies in the null space of A
(dy)A + ds = 0 ==> ds lies in the row space of A
==> project 1 dμ onto the row space of A and its orthogonal complement.
Note: some of this information comes from the following "Advanced Algorithms" courses:
MIT 6.854/18.415J: Advanced Algorithms (Fall 2014, David Karger)
MIT 6.854/18.415 Advanced Algorithms (Spring 2016, Ankur Moitra)
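The projection in step 6 can be sketched numerically. A toy example (the matrix A, μ, and dμ are hypothetical), assuming the problem has already been rescaled so that x is the all-ones vector and s = μ:

```python
import numpy as np

# One equality constraint Ax = b in three variables (toy example).
A = np.array([[1.0, 2.0, 0.0]])
mu, dmu = 1.0, -0.1            # shrink the barrier parameter slightly
ones = np.ones(A.shape[1])     # the rescaled point x = 1

# Decompose the vector 1*dmu into row-space and null-space components:
#   mu*dx + ds = 1*dmu,  with A dx = 0 (null space)  and  ds = (dy)A (row space).
# P_row = A^T (A A^T)^{-1} A projects onto the row space of A.
P_row = A.T @ np.linalg.solve(A @ A.T, A)
ds = P_row @ (ones * dmu)      # row-space component
dx = (ones * dmu - ds) / mu    # null-space component, scaled by 1/mu

# dx keeps the iterate feasible: A (x + dx) = b since A dx = 0.
```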