Class BlockSort


  • final class BlockSort
    extends java.lang.Object
    Encapsulates the Burrows-Wheeler sorting algorithm needed by BZip2CompressorOutputStream.

    This class is based on a Java port of Julian Seward's blocksort.c in his libbzip2

    The Burrows-Wheeler transform is a reversible transform of the original data that is supposed to group similar bytes close to each other. The idea is to sort all permutations of the input and only keep the last byte of each permutation. E.g. for "Commons Compress" you'd get:

      CompressCommons
     Commons Compress
     CompressCommons
     essCommons Compr
     mmons CompressCo
     mons CompressCom
     mpressCommons Co
     ns CompressCommo
     ommons CompressC
     ompressCommons C
     ons CompressComm
     pressCommons Com
     ressCommons Comp
     s CompressCommon
     sCommons Compres
     ssCommons Compre
     

    Which results in a new text "ss romooCCmmpnse", in adition the index of the first line that contained the original text is kept - in this case it is 1. The idea is that in a long English text all permutations that start with "he" are likely suffixes of a "the" and thus they end in "t" leading to a larger block of "t"s that can better be compressed by the subsequent Move-to-Front, run-length und Huffman encoding steps.

    For more information see for example:

    • Field Detail

      • FALLBACK_QSORT_STACK_SIZE

        private static final int FALLBACK_QSORT_STACK_SIZE
        See Also:
        Constant Field Values
      • STACK_SIZE

        private static final int STACK_SIZE
      • FALLBACK_QSORT_SMALL_THRESH

        private static final int FALLBACK_QSORT_SMALL_THRESH
        See Also:
        Constant Field Values
      • INCS

        private static final int[] INCS
      • workDone

        private int workDone
      • workLimit

        private int workLimit
      • firstAttempt

        private boolean firstAttempt
      • stack_ll

        private final int[] stack_ll
      • stack_hh

        private final int[] stack_hh
      • stack_dd

        private final int[] stack_dd
      • mainSort_runningOrder

        private final int[] mainSort_runningOrder
      • mainSort_copy

        private final int[] mainSort_copy
      • mainSort_bigDone

        private final boolean[] mainSort_bigDone
      • ftab

        private final int[] ftab
      • quadrant

        private final char[] quadrant
        Array instance identical to Data's sfmap, both are used only temporarily and indepently, so we do not need to allocate additional memory.
      • eclass

        private int[] eclass
    • Method Detail

      • med3

        private static int med3​(int a,
                                int b,
                                int c)
      • vswap

        private static void vswap​(int[] fmap,
                                  int p1,
                                  int p2,
                                  int n)
      • fallbackQSort3

        private void fallbackQSort3​(int[] fmap,
                                    int[] eclass,
                                    int loSt,
                                    int hiSt)
        Parameters:
        fmap - points to the index of the starting point of a permutation inside the block of data in the current partially sorted order
        eclass - points from the index of a character inside the block to the first index in fmap that contains the bucket of its suffix that is sorted in this step.
        loSt - lower boundary of the fmap-interval to be sorted
        hiSt - upper boundary of the fmap-interval to be sorted
      • fallbackSimpleSort

        private void fallbackSimpleSort​(int[] fmap,
                                        int[] eclass,
                                        int lo,
                                        int hi)
        Parameters:
        fmap - points to the index of the starting point of a permutation inside the block of data in the current partially sorted order
        eclass - points from the index of a character inside the block to the first index in fmap that contains the bucket of its suffix that is sorted in this step.
        lo - lower boundary of the fmap-interval to be sorted
        hi - upper boundary of the fmap-interval to be sorted
      • fallbackSort

        final void fallbackSort​(BZip2CompressorOutputStream.Data data,
                                int last)
        Adapt fallbackSort to the expected interface of the rest of the code, in particular deal with the fact that block starts at offset 1 (in libbzip2 1.0.6 it starts at 0).
      • fallbackSort

        final void fallbackSort​(int[] fmap,
                                byte[] block,
                                int nblock)
        Parameters:
        fmap - points to the index of the starting point of a permutation inside the block of data in the current partially sorted order
        block - the original data
        nblock - size of the block
      • fpop

        private int[] fpop​(int sp)
      • fpush

        private void fpush​(int sp,
                           int lz,
                           int hz)
      • fswap

        private void fswap​(int[] fmap,
                           int zz1,
                           int zz2)
        swaps two values in fmap
      • fvswap

        private void fvswap​(int[] fmap,
                            int yyp1,
                            int yyp2,
                            int yyn)
        swaps two intervals starting at yyp1 and yyp2 of length yyn inside fmap.
      • getEclass

        private int[] getEclass()
      • mainQSort3

        private void mainQSort3​(BZip2CompressorOutputStream.Data dataShadow,
                                int loSt,
                                int hiSt,
                                int dSt,
                                int last)
        Method "mainQSort3", file "blocksort.c", BZip2 1.0.2
      • mainSimpleSort

        private boolean mainSimpleSort​(BZip2CompressorOutputStream.Data dataShadow,
                                       int lo,
                                       int hi,
                                       int d,
                                       int lastShadow)
        This is the most hammered method of this class.

        This is the version using unrolled loops. Normally I never use such ones in Java code. The unrolling has shown a noticeable performance improvement on JRE 1.4.2 (Linux i586 / HotSpot Client). Of course it depends on the JIT compiler of the vm.