Omgo's Blog

October 24, 2010

parallel quicksort with java fork join framework JSR 166y

Filed under: Core Java — aswin @ 9:39 pm
Fork Join (jsr 166y) framework is slated for Java 7 and is absolutely cool and is no surprise coming from Doug Lea.  I had to try something out with it since the packages compiled on java 6 is also availalbe. 

I tried parallizing the quicksort algorithm using this and looks like it compiles and even produces right results!!!  The important part here is the use of Phaser, a resettable CountdownLatch if you would.  So basically you create a Phaser and pass it along to the root (first) RecursiveAction object.
Each of the RecursiveAction object (in this example an instance of ParallelQuckSort) increments the Phaser counter by using the regiter() method and when they are done their work (sorting or spawning children to do
that) they "arrive" and "unregister" at the Phaser bringing down the phaser counter by one. When all the RecursiveAction (spawned by the recursion) are done, the root one waiting on the Phaser lock would resume and return to the client, at which
point the array would have been sorted. 

Any way here it is the source for what I have done, any comments and suggestions are welcome.
package test.aswin.concurrency.forkjoin;

import jsr166y.*;
import jsr166y.Phaser;

import java.util.Arrays;

/**
 * Example of using JSR 166 Y for a simple parallel quick sort.
 */
public class ParallelQuickSort extends RecursiveAction {
    Phaser phaser;
    int[] arr = null;
    int left;
    int right;

    ParallelQuickSort(Phaser phaser, int[] arr) {
        this(phaser, arr, 0, arr.length - 1);
    }

    ParallelQuickSort(Phaser phaser, int[] arr, int left, int right) {
        this.phaser = phaser;
        this.arr = arr;
        this.left = left;
        this.right = right;
        phaser.register();  //important
    }


    private ParallelQuickSort leftSorter(int pivotI) {
        return new ParallelQuickSort(phaser, arr, left, --pivotI);
    }

    private ParallelQuickSort rightSorter(int pivotI) {
        return new ParallelQuickSort(phaser, arr, pivotI, right);
    }

    private void recurSort(int leftI, int rightI) {
        if (rightI - leftI > 7) {
            int pIdx = partition(leftI, rightI, getPivot(arr, leftI, rightI));
            recurSort(leftI, pIdx - 1);
            recurSort(pIdx, rightI);
        } else if (rightI - leftI > 0) {
            insertionSort(leftI, rightI);
        }
    }


    @Override
    protected void compute() {
        if (right - left > 1000) {   // if more than 1000 (totally arbitrary number i chose) try doing it parallelly
            int pIdx = partition(left, right, getPivot(arr, left, right));
            leftSorter(pIdx).fork();
            rightSorter(pIdx).fork();

        } else if (right - left > 7) {  // less than 1000 sort recursively in this thread
            recurSort(left, right);

        } else if (right - left > 0) {  //if less than 7 try simple insertion sort
            insertionSort(left, right);
        }

        if (isRoot()) { //if this instance is the root one (the one that started the sort process), wait for others
                        // to complete.
            phaser.arriveAndAwaitAdvance();
        } else {  // all not root one just arrive and de register not waiting for others.
            phaser.arriveAndDeregister();
        }
    }

    /** Patition the array segment based on the pivot   **/
    private int partition(int startI, int endI, int pivot) {
        for (int si = startI - 1, ei = endI + 1; ; ) {
            for (; arr[++si] < pivot;) ;
            for (; ei > startI && arr[--ei] > pivot ; ) ;
            if (si >= ei) {
                return si;
            }
            swap(si, ei);
        }
    }

    private void insertionSort(int leftI, int rightI) {
        for (int i = leftI; i < rightI + 1; i++)
            for (int j = i; j > leftI && arr[j - 1] > arr[j]; j--)
                swap(j, j - 1);

    }

    private void swap(int startI, int endI) {
        int temp = arr[startI];
        arr[startI] = arr[endI];
        arr[endI] = temp;
    }

    /**
     * Check to see if this instance is the root, i.e the first one used to sort the array.
     * @return
     */
    private boolean isRoot() {
        return arr.length == (right - left) + 1;
    }

    /**
     * copied from java.util.Arrays
     */
    private int getPivot(int[] arr, int startI, int endI) {
        int len = (endI - startI) + 1;
        // Choose a partition element, v
        int m = startI + (len >> 1);       // Small arrays, middle element
        if (len > 7) {
            int l = startI;
            int n = startI + len - 1;
            if (len > 40) {        // Big arrays, pseudomedian of 9
                int s = len / 8;
                l = med3(arr, l, l + s, l + 2 * s);
                m = med3(arr, m - s, m, m + s);
                n = med3(arr, n - 2 * s, n - s, n);
            }
            m = med3(arr, l, m, n); // Mid-size, med of 3
        }
        int v = arr[m];
        return v;
    }

    /**
     * copied from java.util.Arrays
     */
    private static int med3(int x[], int a, int b, int c) {
        return (x[a] < x[b] ?
                (x[b] < x[c] ? b : x[a] < x[c] ? c : a) :
                (x[b] > x[c] ? b : x[a] > x[c] ? c : a));
    }

    public static void main(String[] args) throws InterruptedException {
        int[] arr = ArrayUtils.generateRandom(1000000);
        ForkJoinPool pool = new ForkJoinPool();


        for (int i = 0; i < 10; i++) {
            System.out.println("Sarting " + i);
            int[] arrclone1 = arr.clone();
            int[] arrclone2 = arr.clone();

            long l = System.nanoTime();
            Arrays.sort(arrclone1);
            System.out.printf("Times for seq %,8d%n", (System.nanoTime() - l));

/*          // sorted (asc and desc) are faster than unsorted one, while trying parallel sort
            for (int left=0,right=arrclone1.length-1; left<right; left++, right--) {
                // exchange the first and last
                int temp = arrclone1[left]; arrclone1[left]  = arrclone1[right]; arrclone1[right] = temp;
            }
*/


            l = System.nanoTime();
            Phaser phaser = new Phaser();
            pool.invoke(new ParallelQuickSort(phaser, arrclone2));
            System.out.printf("Times %,8d%n", (System.nanoTime() - l));
            System.out.println("Is sorted :>" + ArrayUtils.isSorted(arrclone2, true));
        }
    }
}
One thing I tried and failed was to parallelize the partition operation which is single threaded in this example. The thing I wrote for that worked, but had horrible performance so I am not including it here. What I
did basically was to have two threads parallely swapping even and odd indexed items respectively. I think processor locality of the data and synchronizing the elements across cpu cache may be the issue, but I am
investigating it further and with help of the experts out there, may find out why it is perforing bad. If anyone wants to help me out with this, please let me know :)

Advertisements

8 Comments »

  1. The F/J framework is really just an academic exercise. Perhaps this is why you code is so lengthy. Try the sort example included with the open source software described in this article:
    http://www.coopsoft.com/ar/ConquerArticle.html

    Comment by Edward Harned — November 12, 2010 @ 2:18 pm

    • I wouldn’t call it an academic exercise since it’s the framework that will be in Java 7 or is supposed to be.

      Comment by Michael — January 6, 2011 @ 9:08 pm

  2. Hi there,
    i’m investigating exactly that to atm (datalocality in workstealing )…
    Would you mind sharing with me how you (want to) measure the CPU and cache performance? Because I can’t seem to find a decent way of profiling this kind of information for Java-apps…
    Tnx

    Comment by Mattias De Wael — December 10, 2010 @ 10:13 am

    • @Mattias I am at a loss trying to figure out the CPU cache performance. I wrote a parallel partitioning scheme based on this paper (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.2.482&rep=rep1&type=pdf), but still did not receive the performance characters I was looking for. In a dual core CPU my version was much slower and in a quad core I had a 40% improvement. I did not get much time to work on finding the reason behind it, precisely due to the lack of any good tools (asfaik) on this.

      Comment by aswin — December 10, 2010 @ 1:11 pm

  3. How does this compair to a single threaded version of the sort?

    Comment by Michael — January 6, 2011 @ 9:10 pm

  4. Perhaps it will be included in Java7, perhaps not. See here:
    A Java Fork-Join
    Calamity
          

    Comment by Edward Harned — January 7, 2011 @ 2:19 pm

  5. Hello,

    I am also trying to use the jsr166y package. I have java version 6 installed but it keeps giving the package not found error. I tried to download the jar file n use it but in vain. Please help me out.

    PS: I am using ubuntu 10.04 as the OS.

    Comment by Sam — February 7, 2011 @ 5:26 pm

  6. Hi Omgo,

    you are calling

    ArrayUtils.generateRandom(int);
    ArrayUtils.isSorted(int[], boolean);

    Did you write those or where are they coming from?
    Would you mind sharing?

    Thx.,
    Hans

    Comment by Hans Horn — August 20, 2011 @ 5:29 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: