Pages

Thursday, September 25, 2014

Which sorting algorithm is used when the DMBS executes an 'ORDER BY" SQL statement?

The answer to my question may depend on which database we are using. My bet is that we can not find the answer from the official documentation of RDBMS, such as Oracle and SQL Server. This is somewhat consistent with SQL as a 4th generation language.

Another similar question is "how can I sort a 10G of data while I have only 4G of memory"?

The answer should be "correlated" to "External Sorting" and "Internal Sorting", and usually beyond the in-memory sorting taught in general algorithm textbook.

Here are the links to two articles:

How things works : SQL Order By Clause

"Implementing Sorting in Database Systems" by GOETZ GRAEFE from Microsoft provides more detailed discussion on the sorting algorithm in DBMS.


Sunday, September 21, 2014

Fun with LeetCode?

I got the link to Leetcode from my colleagues this Thursday and planned to give it a try this weekend. The listed problems are pretty fun to solve. Here are two examples(with the highest AC rate and the lowest AC rate) I tried today.

Max Points on a Line:

package my_java;

public class MaxNumPoint {
    public int maxPoints(Point[] points) {
    int maxnum = 0;
        int n = points.length;
        int n2 = n*(n-1)/2;
        double[] theta = new double[n2];
        double[] alpha = new double[n2];
        int k = 0;

        for (int i=0; i<n; i++) {
            for(int j=i+1; j<n; j++) {
                Point pt1 = points[i];
                Point pt2 = points[j];
                if (pt1.x != pt2.x) {
                    theta[k] = 1.0*(pt2.y - pt1.y)/(pt2.x - pt1.x);
                    alpha[k] = pt1.y - pt1.x*theta[k];
                } else {
                    theta[k] = Double.MAX_VALUE;
                    alpha[k] = pt1.x;
                }
                k++;
            }
        }

        for (int i = 0; i < n2; i++) {
        k = 0;
    if (theta[i] == Double.MAX_VALUE) {
            for (int j = 0 ; j < n; j++) {
            if (Math.abs(points[j].x - alpha[i]) < 0.01) {
                k++;
                if (k > maxnum) {
                maxnum = k;
                }            
            }
            }
    } else {
            for (int j = 0 ; j < n; j++) {
            double y = alpha[i] + theta[i]*points[j].x;
                if (Math.abs(y - points[j].y) < 0.01) {
                k++;
                if (k > maxnum) {
                maxnum = k;
                }
                }
            }    
    } // end if
        }
        if (maxnum == 0 && points != null && points.length == 1) {
        maxnum = 1;
        }
        return maxnum;
    }

    public static void main(String[] args) {
    Point[] points = new Point[4];
    points[0] = new Point(0, 2);
    points[1] = new Point(0, 2);
    points[2] = new Point(3, 10);
    points[3] = new Point(3, 10);
    int k = new MaxNumPoint().maxPoints(points);
    System.out.println(k);
    }
}

/**
 Definition for a point.
 */
class Point {
    int x;
    int y;
    Point() { x = 0; y = 0; }
    Point(int a, int b) { x = a; y = b; }
}

The following is what I submitted for Single Number (This is my first submission @LeetCode. It had the highest AC rate, then I thought it's somewhat harder at least, but it's just opposite!
So I find a lowest AC rate, which is Max Points on a Line:).

public class Solution {
    public int singleNumber(int[] A) {
        int r = 0;
        for (int i = 0; i < A.length; i++) {
            r ^= A[i];
        }
        return r;
    }
}


Wednesday, September 10, 2014

A bug in Java's Binary Search method

Lucky #6:

Binary Search may look simple, but a bug on it could live for a decade.


In 2006, Joshua Bloch posted "Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken".

The following is the code he posted:

1:     public static int binarySearch(int[] a, int key) {
2:         int low = 0;
3:         int high = a.length - 1;
4:
5:         while (low <= high) {
6:             int mid = (low + high) / 2;
7:             int midVal = a[mid];
8:
9:             if (midVal < key)
10:                 low = mid + 1
11:             else if (midVal > key)
12:                 high = mid - 1;
13:             else
14:                 return mid; // key found
15:         }
16:         return -(low + 1);  // key not found.
17:     }


The bug is in line 6 and it's fixed in Java 6.

Joshua Bloch proposed two approaches,

               int mid = low + ((high - low) / 2);
or
               int mid = (low + high) >>> 1;

In Java 6's src.zip, I can see the second approach is chosen.

Thursday, September 4, 2014

Top 5 Theorems/Inequalities in Functional Analysis

Top 5 Theorems in Functional Analysis

1. Bounded inverse theorem (a bijective bounded linear operator T from one Banach space to another has bounded inverse )

2. Hahn-Banach theorem (to extend bounded linear functions on a subspace/v.s. to the whole space)

3. Hellinger-Toeplitz theorem (...symmetric operator on a Hilbert space is bounded)

4. Riesz representation theorem (to map Hilbert space H towards its dual space H*)
        Fréchet-Riesz theorem: Φ: HH* defined by Φ(x) = φx is an isometric (anti-) isomorphism.        Quantum Mechanics: every ket |\psi\rangle has a corresponding bra \langle\psi|,

5. Banach-Steinhaus theorem (Uniform boundedness principle)
         For continuous linear operators whose domain is a Banach space, pointwise boundedness is
         equivalent to uniform boundedness in operator norm.
=================================================================

Top 5 Theorems in Real Analysis and Measure Theory

1. Bolzano–Weierstrass theorem (each bounded sequence in R^n has a convergent subsequence)

2. Fatou–Lebesgue theorem (a chain of inequalities relating  Lebesgue integrals)
         Dominated convergence theorem (a special case of Fatou–Lebesgue theorem.)

3. Fundamental theorem of calculus, Mean value theorem, Taylor's theorem (for Engineers like me):

           \int_a^b f(x)\,dx = F(b) - F(a).   
                  
             f ' (c) = \frac{f(b) - f(a)}{b - a}.

              f(x) = f(a) + f'(a)(x-a) + h_1(x)(x-a), \qquad \lim_{x\to a}h_1(x)=0.

4. Heine–Borel theorem (A subset of a metric space is compact iff it is complete and totally bounded)

5. Uniform limit theorem (the uniform limit of any sequence of continuous functions is continuous.)
        
==================================================================

Top 5 Inequalities in Real Analysis (with minimum words:)

1. Hölder's inequality: \|fg\|_1 \le \|f\|_p \|g\|_q.

           (A special case) Cauchy–Schwarz inequality:   |\langle x,y\rangle| \leq \|x\| \cdot \|y\|.\,


2. Minkowski inequality: \|f+g\|_p \le \|f\|_p + \|g\|_p
    
           (A special case) Triangle inequality:  \displaystyle \|x + y\| \leq \|x\| + \|y\| \quad \forall \, x, y \in V


3. Chebyshev's inequality: 
    \Pr(|X-\mu|\geq k\sigma) \leq \frac{1}{k^2}.

4. Inequality of arithmetic and geometric means: \frac{x_1+x_2+\cdots+x_n}{n} \ge \sqrt[n]{x_1x_2 \cdots x_n}
           
 5. Jensen's inequality: \varphi\left(\int_\Omega g\, d\mu\right) \le \int_\Omega \varphi \circ g\, d\mu.