Re: Reduce task failing on job with error java.lang.IllegalStateException: Keys appended out-of-order

Josh Elser Thu, 06 Dec 2012 07:16:22 -0800

The point of bulk-ingest is that you can perform this work "out of band" from Accumulo. You can perform the work "somewhere else" and just tell Accumulo to bring files online. The only potential work Accumulo has to do at that point is maintain the internal tree of files (merging and splitting as the table is configured). Given that we have this massively popular tool for performing distributed sorting (cough MapReduce cough), I don't agree with your assertion.

If you don't want to be burdened with sorting output during the ingest task, use live ingest (BatchWriters). For reasonable data flows, live ingest tends to be faster; however, bulk ingest provides the ability to scale to much larger flows of data while not tanking Accumulo.
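
For the reducer quoted further down, the shuffle already delivers rows in sorted order, but the values within one row arrive in no particular order, so the column qualifiers can still reach the RFile writer out of order. A minimal sketch of one possible fix is to buffer each row's "Col|Data" pairs in a TreeMap and write them out in qualifier order. The class and method names here are hypothetical, plain Strings stand in for Text, and the String ordering only matches Accumulo's byte-wise ordering under the assumption that the qualifiers are ASCII:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedColumns {

    // Hypothetical helper: takes the "qualifier|value" strings for one row
    // and returns {qualifier, value} pairs ordered by qualifier.
    // Assumes ASCII qualifiers, so String order equals byte order.
    // Note that a repeated qualifier keeps only its last value.
    static List<String[]> sortByQualifier(Iterable<String> keyValues) {
        TreeMap<String, String> sorted = new TreeMap<>();
        for (String keyValue : keyValues) {
            // Limit of 2 keeps any further '|' characters inside the value.
            String[] parts = keyValue.split("\\|", 2);
            sorted.put(parts[0], parts[1]);
        }
        List<String[]> ordered = new ArrayList<>();
        for (Map.Entry<String, String> entry : sorted.entrySet()) {
            ordered.add(new String[] { entry.getKey(), entry.getValue() });
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Values as they might arrive in the reducer for a single row.
        List<String> fromShuffle = Arrays.asList("col3|c", "col16|p", "col1|a");
        for (String[] pair : sortByQualifier(fromShuffle)) {
            // In a real reducer, the Key and Value would be built and written here.
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Buffering a whole row in memory is only reasonable when rows are small. For wide rows, emitting the full Accumulo Key as the map output key, so that MapReduce sorts on it, is likely the better design.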


On 12/6/12 9:15 AM, Chris Burrell wrote:

Is this a limitation of the bulk ingest approach? Does the MapReduce job need to give the data to the AccumuloFileOutputFormat in a lexicographically-sorted manner? If so, is this not a rather big limitation of this approach, as you need to ensure your data comes in from your various data sources in a form such that the accumulo keys are then sorted.

This seems to suggest that although the bulk ingest would be very quick, you would lose most of the time trying to sort and adapt the source files themselves in the MR job.


Chris

On 6 December 2012 14:08, William Slacum wrote:


    Excuse me, 'col3' sorts lexicographically *after* 'col16'.


    On Thu, Dec 6, 2012 at 9:07 AM, William Slacum wrote:

        'col3' sorts lexicographically before 'col16'. you'll either
        need to encode your numerics or zero pad them.
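
A minimal sketch of the zero-padding suggestion, using plain Java Strings in place of Text. The `pad` helper and the width of 4 are assumptions chosen for illustration; the width needs to cover the largest column index expected. As the follow-up correction in this thread notes, the unpadded 'col3' sorts after 'col16', which the first comparison shows:

```java
import java.util.Locale;

public class PaddedQualifier {

    // Hypothetical helper: builds a qualifier such as "col0003" so that
    // lexicographic order matches numeric order for indexes of up to
    // `width` digits.
    static String pad(String prefix, int index, int width) {
        // Locale.ROOT avoids locale-specific digits in the output.
        return prefix + String.format(Locale.ROOT, "%0" + width + "d", index);
    }

    public static void main(String[] args) {
        // Unpadded: '3' is greater than '1' at the fourth character,
        // so "col3" sorts after "col16".
        System.out.println("col3".compareTo("col16") > 0);  // prints true

        // Padded: "col0003" sorts before "col0016".
        String three = pad("col", 3, 4);
        String sixteen = pad("col", 16, 4);
        System.out.println(three + " " + sixteen);
        System.out.println(three.compareTo(sixteen) < 0);   // prints true
    }
}
```

Padding only makes the intended order and the lexicographic order agree. The keys still have to be written to the RFile in that order.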


        On Thu, Dec 6, 2012 at 9:03 AM, Andrew Catterall wrote:

            Hi,


            I am trying to run a bulk ingest to import data into
            Accumulo but it is failing at the reduce task with the
            below error:

            java.lang.IllegalStateException: Keys appended
            out-of-order.  New key
            client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
            foo:col3 [myVis] 9223372036854775807 false, previous key
            client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
            foo:col16 [myVis] 9223372036854775807 false

            at org.apache.accumulo.core.file.rfile.RFile$Writer.append(RFile.java:378)

            Could this be caused by the order at which the writes are
            being done?


            *-- Background*


            The input file is a tab separated file.  A sample row
            would look like:

            Data1 Data2    Data3    Data4    Data5 …             DataN

            The map parses the data, for each row, into a Map<String,
            String>.  This will contain the following:

            Col1 Data1

            Col2 Data2

            Col3 Data3

            …

            ColN DataN


            An outputKey is then generated for this row in the format
            *client@timeStamp@randomUUID*

            Then for each entry in Map<String, String> an
            outputValue is generated in the format *ColN|DataN*

            The outputKey and outputValue are written to Context.

            This completes successfully, however, the reduce task fails.


            My ReduceClass is as follows:

            public static class ReduceClass
                    extends Reducer<Text, Text, Key, Value> {

                public void reduce(Text key, Iterable<Text> keyValues, Context output)
                        throws IOException, InterruptedException {
                    // for each value belonging to the key
                    for (Text keyValue : keyValues) {
                        // split the keyValue into Col and Data
                        String[] values = keyValue.toString().split("\\|");
                        // Generate key
                        Key outputKey = new Key(key, new Text("foo"),
                                new Text(values[0]), new Text("myVis"));
                        // Generate value
                        Value outputValue = new Value(values[1].getBytes(), 0,
                                values[1].length());
                        // Write to context
                        output.write(outputKey, outputValue);
                    }
                }
            }


            *-- Expected output*

            I am expecting the contents of the Accumulo table to be as
            follows:

            client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
            foo:Col1 [myVis] Data1

            client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
            foo:Col2 [myVis] Data2

            client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
            foo:Col3 [myVis] Data3

            client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
            foo:Col4 [myVis] Data4

            client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
            foo:Col5 [myVis] Data5

            …

            client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
            foo:ColN [myVis] DataN

            Thanks,

            Andrew
