Saturday, April 11, 2020

MongoDB: Analyzing Data with Aggregation

Here is the sample code to analyze data with the Aggregation Framework within MongoDB.
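
A minimal pymongo sketch of a typical pipeline follows; the local connection string, the test database, the orders collection, and the status/amount fields are assumptions for illustration.

    from pymongo import MongoClient

    client = MongoClient('mongodb://localhost:27017/')   # assumes a local mongod
    db = client['test']

    # Group orders by status, counting documents and summing amounts per group.
    pipeline = [
        {'$match': {'status': {'$ne': None}}},
        {'$group': {'_id': '$status',
                    'count': {'$sum': 1},
                    'total': {'$sum': '$amount'}}},
        {'$sort': {'total': -1}},
    ]
    for doc in db.orders.aggregate(pipeline):
        print(doc)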



Sunday, April 14, 2019

Connect to Oracle Database from Python or R

1. To connect to Oracle in Python

     1)  conda install -c anaconda cx_oracle 

2.1) Method 1:


    import pandas as pd

    import cx_Oracle


    con = cx_Oracle.connect('pythonhol/welcome@127.0.0.1/orcl')

    query = 'select * from departments order by department_id'     
    df_ora = pd.read_sql(query, con=con)
    df_ora.head()

    con.close()

2.2) Method 2:
    import cx_Oracle
    con = cx_Oracle.connect('pythonhol/welcome@127.0.0.1/orcl')
    cur = con.cursor()

    query = 'select * from departments order by department_id'
    cur.execute(query)
    for result in cur:
        print(result)
    cur.close()
    con.close()



2. To connect to Oracle in R
1)  install.packages("ROracle")

2) Method 1:
   library("ROracle")
   drv = dbDriver("Oracle")
   host = "hostname"
   port = "1521"
   sid = "SID_NAME"
   connect.string = paste("(DESCRIPTION=",
         "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
         "(CONNECT_DATA=(SID=", sid, ")))", sep="")
   con = dbConnect(drv, username="uid", password="pwd", 
                   dbname=connect.string, prefetch=FALSE,
                   bulk_read=1000L, stmt_cache=0L,
                   external_credentials=FALSE, sysdba=FALSE)
   res = dbGetQuery(con, "select * from dual")
   print(res)
   dbDisconnect(con)

Wednesday, December 21, 2016

Quantopian(Python) vs Quantmod(R)

1. Side-by-side comparison

2. Code snippet:

    from rpy2.robjects import r
    from pandas_datareader import data
    import talib
    import numpy
    import matplotlib.pyplot as plt

    # R side (quantmod): fetch the IBM price series via getSymbols
    r("library(quantmod)")
    IBM = r("getSymbols('IBM', src='google', from='2013-01-01', auto.assign=FALSE)")

    # Python side: fetch the same ticker and overlay TA-Lib moving averages
    ticker = 'IBM'
    f = data.DataReader(ticker, 'google')
    f['SMA_20'] = talib.SMA(numpy.asarray(f['Close']), 20)
    f['SMA_50'] = talib.SMA(numpy.asarray(f['Close']), 50)
    f.plot(y=['Close', 'SMA_20', 'SMA_50'], title=ticker + ' Close & Moving Averages')
    plt.show()

3. Module Installation

    conda install rpy2
    conda install -c quantopian ta-lib

    # more interesting modules
    conda install -c quantopian zipline
    conda install -c quantopian pyfolio
    conda install seaborn
    conda install quandl

Sunday, August 21, 2016

Covariance formula with CDF (Hoeffding's Covariance Identity)

\[ \operatorname{cov}(X,Y) = \int_{\mathbb{R}} \int_{\mathbb{R}} \bigl( F_{XY}(x,y) - F_X(x)\,F_Y(y) \bigr) \, dx \, dy \]


A complete proof of above lemma can be found on page 241 (Lemma 7.27) of Quantitative Risk Management: Concepts, Techniques and Tools.

Hint: \( 2\operatorname{cov}(X_1, X_2) = E[(X_1-\tilde{X}_1)(X_2-\tilde{X}_2)] \),
where \( (\tilde{X}_1, \tilde{X}_2) \) is an independent copy with the same joint distribution function as \( (X_1, X_2) \).
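
The key step behind the hint is to write each difference as an integral of indicator functions and apply Fubini's theorem. With \( \mathbf{1}\{\cdot\} \) denoting an indicator and \( F_{X_1 X_2}, F_{X_1}, F_{X_2} \) the joint and marginal CDFs, a sketch of the computation is:

\[
\begin{aligned}
2\operatorname{cov}(X_1,X_2)
  &= E\!\left[(X_1-\tilde{X}_1)(X_2-\tilde{X}_2)\right] \\
  &= E\!\left[\int_{\mathbb{R}}\int_{\mathbb{R}}
        \bigl(\mathbf{1}\{s<X_1\}-\mathbf{1}\{s<\tilde{X}_1\}\bigr)
        \bigl(\mathbf{1}\{t<X_2\}-\mathbf{1}\{t<\tilde{X}_2\}\bigr)\,ds\,dt\right] \\
  &= \int_{\mathbb{R}}\int_{\mathbb{R}}
        2\bigl(P(X_1>s,\,X_2>t)-P(X_1>s)\,P(X_2>t)\bigr)\,ds\,dt \\
  &= 2\int_{\mathbb{R}}\int_{\mathbb{R}}
        \bigl(F_{X_1X_2}(s,t)-F_{X_1}(s)\,F_{X_2}(t)\bigr)\,ds\,dt .
\end{aligned}
\]

The last line uses \( P(X_1>s, X_2>t) - P(X_1>s)P(X_2>t) = F_{X_1X_2}(s,t) - F_{X_1}(s)F_{X_2}(t) \), which follows from inclusion-exclusion.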

Link to MathJax

Monday, August 15, 2016

Mathematical Programming and its modeling languages

Python is widely used in Mathematical Programming as a modeling language.




Among commercial products, Gurobi has built its interactive shell in Python.




In the open-source world, Pyomo from Sandia National Laboratories uses Python to offer an AMPL-like modeling language. Pyomo uses the GLPK solver by default, but other solvers, such as Gurobi and COIN-OR CBC, can also be selected; a minimal example is sketched below.
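
Here is a minimal Pyomo sketch of a tiny LP, assuming pyomo and glpk are installed (the model and its numbers are made up for illustration):

    # Minimal Pyomo sketch: maximize 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x, y >= 0
    from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                               NonNegativeReals, maximize, SolverFactory)

    model = ConcreteModel()
    model.x = Var(domain=NonNegativeReals)
    model.y = Var(domain=NonNegativeReals)
    model.profit = Objective(expr=3 * model.x + 2 * model.y, sense=maximize)
    model.c1 = Constraint(expr=model.x + model.y <= 4)
    model.c2 = Constraint(expr=model.x + 3 * model.y <= 6)

    SolverFactory('glpk').solve(model)            # requires the glpsol binary on PATH
    print(model.x(), model.y(), model.profit())   # optimal: x = 4, y = 0, profit = 12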




GLPK (the GNU Linear Programming Kit) supports MathProg, which is also referred to as GMPL (GNU Mathematical Programming Language). GLPK provides a command-line tool, glpsol, which makes it convenient to solve various optimization problems and produces well-designed reports.




PuLP is an LP modeler written in Python. PuLP can generate MPS or LP files and can call GLPK, COIN-OR CLP/CBC, CPLEX, and Gurobi to solve linear problems; a small example follows below.
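
The same toy LP sketched with PuLP (assuming pulp is installed; prob.solve() falls back to the bundled CBC solver when no other solver is specified):

    # The same toy LP in PuLP: maximize 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6
    from pulp import LpProblem, LpVariable, LpMaximize

    prob = LpProblem("toy_lp", LpMaximize)
    x = LpVariable("x", lowBound=0)
    y = LpVariable("y", lowBound=0)
    prob += 3 * x + 2 * y               # the first expression added becomes the objective
    prob += x + y <= 4                  # constraint 1
    prob += x + 3 * y <= 6              # constraint 2
    prob.solve()                        # bundled CBC solver by default
    print(x.value(), y.value(), prob.objective.value())   # optimal: 4.0, 0.0, 12.0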


SolverStudio makes it easy to develop models inside Excel using Python. Data entered into the spreadsheet is automatically available to the model. SolverStudio supports PuLP, COOPR/Pyomo, AMPL, GMPL, GAMS, Gurobi, CMPL, SimPy.



Wednesday, August 10, 2016

OLS in Python

There are a few ways to perform Linear Regression (OLS) in Python.


Here is a short list of them:


1: > pandas.ols(y, x)
2: > pandas.stats.api.ols(y, x)
3: > scipy.stats.linregress(x, y)
4: > import statsmodels.formula.api as smf
    > results = smf.ols('Close ~ Open + Volume', data=df).fit()  # df is a DataFrame
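
For instance, option 4 end to end on synthetic data (the Close, Open, and Volume columns below are placeholders standing in for a real price DataFrame):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic data standing in for a price DataFrame
    np.random.seed(0)
    df = pd.DataFrame({'Open': np.random.normal(100, 5, 200),
                       'Volume': np.random.normal(1e6, 1e5, 200)})
    df['Close'] = 0.9 * df['Open'] + 1e-5 * df['Volume'] + np.random.normal(0, 1, 200)

    results = smf.ols('Close ~ Open + Volume', data=df).fit()
    print(results.params)      # fitted intercept and coefficients
    print(results.rsquared)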



Friday, July 29, 2016

Interior Point Methods

Interior Point Methods are a class of optimization algorithms for solving linear or nonlinear programming problems.


They find an optimal solution by moving through the interior of the feasible region (a polytope, in the linear case) rather than along its boundary, as the simplex method does.


History:


In 1984, Karmarkar "invented" the interior-point method.
In 1985, the affine-scaling method was "invented" as an intuitive version of Karmarkar's algorithm.
In 1989, it was realized that Dikin (USSR) had already invented affine scaling in 1967.


The interior point method was the first practical polynomial-time algorithm for solving linear programming problems. The ellipsoid method also runs in polynomial time, but in practice interior point methods and variants of the simplex method are much faster.


Here is a technical summary (a rough code sketch follows at the end):


1. Primal objective with log-barrier function: Gμ(x) = cx - μ Σj ln(xj)


2. Central path algorithm: drive μ from infinity down to 0; the minimizers x*(μ) trace out the central path.


3. Minimizing Gμ subject to Ax = b:

    ∇Gμ must be perpendicular to the feasible set {Ax = b}, i.e. lie in the row space of A;
    the gradient vector <cj - μ/xj> is a linear combination of A's rows;
    <cj - μ/xj> = yA for some y.

    Let sj = μ/xj (so s > 0); then
          yA + s = c  ==>  yA ≤ c, which are the dual constraints.


4. Duality gap: cx - yb = (yA + s)x - y(Ax)
                        = sx
                        = nμ

5. Conversely, if all sj xj = μ, then (x, y, s) is on the central path.
    To follow the central path, use a "predictor-corrector" scheme.


6. Improvement direction? ("Affine scaling")
    Move from the current x, s, μ to x+dx, s+ds, μ+dμ:
                                    ==> sj dxj + xj dsj = dμ     (1)
    Also A(x+dx) = b    ==> A dx = 0                             (2)
         yA + s = c     ==> (dy)A + ds = 0                       (3)


    To solve (1)-(3), rescale ("affine scaling") so that all xj = 1, hence sj = μ.
    The equations become
           μ dx + ds = 1 dμ       (1 denotes the all-ones vector)
           A dx = 0               ==> dx lies in the null space of A
           (dy)A + ds = 0         ==> ds lies in the row space of A


    ==> so split the vector 1 dμ into its component in the row space of A (giving ds)
        and its component in the null space of A (giving μ dx).
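
To make this concrete, here is a rough numpy sketch of a primal-dual path-following method under the standard-form assumptions min cx s.t. Ax = b, x ≥ 0; it is an illustrative toy, not a production solver, and the small LP at the bottom is made up for the example:

    import numpy as np

    def ipm_lp(c, A, b, tol=1e-8, max_iter=100, sigma=0.1):
        """Toy primal-dual path-following method for: min c@x  s.t.  A@x = b, x >= 0.
        Keeps x, s > 0 and follows the central path x_j*s_j = mu while driving mu -> 0."""
        m, n = A.shape
        x, s, y = np.ones(n), np.ones(n), np.zeros(m)
        for _ in range(max_iter):
            r_p = b - A @ x               # primal residual (Ax = b)
            r_d = c - A.T @ y - s         # dual residual (yA + s = c)
            mu = x @ s / n                # duality gap / n  (cx - yb = sx = n*mu)
            if max(np.linalg.norm(r_p), np.linalg.norm(r_d), mu) < tol:
                break
            r_c = sigma * mu - x * s      # push each x_j*s_j toward the reduced target sigma*mu
            # Newton step for (1)-(3): eliminate dx and ds, solve normal equations for dy.
            d = x / s
            M = A @ (d[:, None] * A.T)
            dy = np.linalg.solve(M, r_p + A @ (d * r_d - r_c / s))
            ds = r_d - A.T @ dy
            dx = (r_c - x * ds) / s
            # Damped steps so x and s stay strictly positive (stay in the interior).
            def frac(v, dv):
                neg = dv < 0
                return min(1.0, 0.99 * np.min(-v[neg] / dv[neg])) if neg.any() else 1.0
            ap, ad = frac(x, dx), frac(s, ds)
            x, y, s = x + ap * dx, y + ad * dy, s + ad * ds
        return x, y, s

    # Toy LP: maximize 3*x1 + 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 6  (x3, x4 are slacks)
    c = np.array([-3.0, -2.0, 0.0, 0.0])
    A = np.array([[1.0, 1.0, 1.0, 0.0],
                  [1.0, 3.0, 0.0, 1.0]])
    b = np.array([4.0, 6.0])
    x, y, s = ipm_lp(c, A, b)
    print(np.round(x, 4), c @ x)          # expect roughly x1 = 4, x2 = 0, objective = -12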




Note: some of the information comes from course "Advanced Algorithms" as follows:


MIT 6.854/18.415J: Advanced Algorithms (Fall 2014, David Karger)
MIT 6.854/18.415 Advanced Algorithms (Spring 2016, Ankur Moitra)