'AS' is almost always dangerous. The loader already has a schema. Use a projection if you want to rename the fields.
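To make that advice concrete, here is a minimal sketch (reusing the $SEQFILE_LOADER and converter declarations that appear later in this thread; the `logs` alias is illustrative, not from the original scripts): let the loader supply its own schema, then rename and cast the fields explicitly with a FOREACH projection rather than an AS clause.

```pig
-- Load without AS so the loader's own schema is used as-is.
raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER');

-- Rename (and cast, where safe) via projection instead of forcing a schema with AS.
logs = FOREACH raw_logs GENERATE $0 AS key, (chararray) $1 AS value;
```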
On Fri, May 18, 2012 at 4:07 PM, Chris Diehl <cpdi...@gmail.com> wrote:
> With a little bit of luck, we managed to find an answer.
>
> Turns out we needed to remove the cast from key and run the script in Pig
> 0.10. I was running the script with Pig 0.8.1 up until today.
>
> raw_logs = LOAD '$INPUT_LOCATION'
>     USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
>     AS (key, value: chararray);
>
> Chris
>
> On Fri, May 18, 2012 at 2:27 PM, Chris Diehl <cpdi...@gmail.com> wrote:
>
> > Hi Andy,
> >
> > Here's what is in the log file.
> >
> > Pig Stack Trace
> > ---------------
> > ERROR 2244: Job failed, hadoop does not return any error message
> >
> > org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job
> > failed, hadoop does not return any error message
> >         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
> >         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
> >         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
> >         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
> >         at org.apache.pig.Main.run(Main.java:500)
> >         at org.apache.pig.Main.main(Main.java:107)
> > ================================================================================
> >
> > I am running it on the cluster. I could not find any additional
> > information on the job tracker.
> >
> > The keys in the sequence files are all null. The values are all JSON
> > strings. Given that information, I tried configuring the
> > SequenceFileLoader this way, to no avail.
> >
> > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> > %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> > %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';
> >
> > raw_logs = LOAD '$INPUT_LOCATION'
> >     USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
> >     AS (key: chararray, value: chararray);
> >
> > Is there another way I should be configuring it?
> >
> > Chris
> >
> > On Fri, May 18, 2012 at 11:24 AM, Andy Schlaikjer
> > <andrew.schlaik...@gmail.com> wrote:
> >
> >> Chris, the console output mentions file
> >> "/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log".
> >> Does this contain any kind of stack trace? Were you running the script
> >> in local mode or on a cluster? If the latter, there should be at least
> >> map task log output someplace that may also have some clues.
> >>
> >> Does path
> >> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> >> contain SequenceFile<Text, Text> data? If not, you'll have to configure
> >> SequenceFileLoader further to properly deserialize the key-value pairs.
> >>
> >> Andy
> >>
> >> On Thu, May 17, 2012 at 5:07 PM, Chris Diehl <cpdi...@gmail.com> wrote:
> >>
> >> > Andy,
> >> >
> >> > Here's what I'm seeing when I run the following script. There's no
> >> > information beyond what is here in the log file.
> >> >
> >> > Chris
> >> >
> >> > REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> >> > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> >> > %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> >> > %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';
> >> >
> >> > rmf /data/SearchLogJSON;
> >> >
> >> > -- Load raw log data
> >> > raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> >> >     USING $SEQFILE_LOADER ();
> >> >
> >> > -- Store the JSON
> >> > STORE raw_logs INTO '/data/SearchLogJSON/';
> >> >
> >> > -------------------
> >> >
> >> > -sh-3.2$ pig dump_log_json.pig
> >> > 2012-05-17 23:57:41,304 [main] INFO  org.apache.pig.Main - Logging error messages to:
> >> >   /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
> >> > 2012-05-17 23:57:41,586 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> >> >   Connecting to hadoop file system at: XXX
> >> > 2012-05-17 23:57:41,932 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> >> >   Connecting to map-reduce job tracker at: XXX
> >> > 2012-05-17 23:57:42,204 [main] INFO  org.apache.pig.tools.pigstats.ScriptState -
> >> >   Pig features used in the script: UNKNOWN
> >> > 2012-05-17 23:57:42,204 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> >> >   pig.usenewlogicalplan is set to true. New logical plan will be used.
> >> > 2012-05-17 23:57:42,301 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> >> >   (Name: raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) - scope-1 Operator Key: scope-1)
> >> > 2012-05-17 23:57:42,317 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> >> >   File concatenation threshold: 100 optimistic? false
> >> > 2012-05-17 23:57:42,349 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer -
> >> >   MR plan size before optimization: 1
> >> > 2012-05-17 23:57:42,349 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer -
> >> >   MR plan size after optimization: 1
> >> > 2012-05-17 23:57:42,529 [main] INFO  org.apache.pig.tools.pigstats.ScriptState -
> >> >   Pig script settings are added to the job
> >> > 2012-05-17 23:57:42,545 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler -
> >> >   mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> >> > 2012-05-17 23:57:44,706 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler -
> >> >   Setting up single store job
> >> > 2012-05-17 23:57:44,734 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -
> >> >   1 map-reduce job(s) waiting for submission.
> >> > 2012-05-17 23:57:45,053 [Thread-4] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat -
> >> >   Total input paths to process : 1
> >> > 2012-05-17 23:57:45,057 [Thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
> >> >   Total input paths (combined) to process : 1
> >> > 2012-05-17 23:57:45,236 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -
> >> >   0% complete
> >> > 2012-05-17 23:57:45,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -
> >> >   HadoopJobId: job_201205170527_0003
> >> > 2012-05-17 23:57:45,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -
> >> >   More information at: XXX
> >> > 2012-05-17 23:58:25,816 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -
> >> >   job job_201205170527_0003 has failed! Stop running all dependent jobs
> >> > 2012-05-17 23:58:25,821 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -
> >> >   100% complete
> >> > 2012-05-17 23:58:25,824 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> >> > 2012-05-17 23:58:25,825 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> >> >
> >> > HadoopVersion   PigVersion    UserId       StartedAt            FinishedAt           Features
> >> > 0.20.2-cdh3u2   0.8.1-cdh3u2  chris.diehl  2012-05-17 23:57:42  2012-05-17 23:58:25  UNKNOWN
> >> >
> >> > Failed!
> >> >
> >> > Failed Jobs:
> >> > JobId                  Alias     Feature   Message              Outputs
> >> > job_201205170527_0003  raw_logs  MAP_ONLY  Message: Job failed!
> >> > Error - NA  /data/SearchLogJSON,
> >> >
> >> > Input(s):
> >> > Failed to read data from
> >> > "/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"
> >> >
> >> > Output(s):
> >> > Failed to produce result in "/data/SearchLogJSON"
> >> >
> >> > Counters:
> >> > Total records written : 0
> >> > Total bytes written : 0
> >> > Spillable Memory Manager spill count : 0
> >> > Total bags proactively spilled: 0
> >> > Total records proactively spilled: 0
> >> >
> >> > Job DAG:
> >> > job_201205170527_0003
> >> >
> >> > 2012-05-17 23:58:25,825 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -
> >> >   Failed!
> >> > 2012-05-17 23:58:25,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
> >> >   ERROR 2244: Job failed, hadoop does not return any error message
> >> > Details at logfile:
> >> >   /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
> >> >
> >> > On Thu, May 17, 2012 at 1:20 PM, Andy Schlaikjer
> >> > <andrew.schlaik...@gmail.com> wrote:
> >> >
> >> > > Chris, could you send us any of your error logs? What kind of
> >> > > failures are you running into?
> >> > >
> >> > > Andy
> >> > >
> >> > > On Wed, May 16, 2012 at 11:47 AM, Chris Diehl <cpdi...@gmail.com> wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > I'm attempting to load sequence files for the first time using
> >> > > > Elephant Bird's sequence file loader and having absolutely no luck.
> >> > > >
> >> > > > I did a hadoop fs -text on one of the sequence files and noticed
> >> > > > all the keys are (null). Not sure if that is throwing things off here.
> >> > > >
> >> > > > Here are the various approaches I've tried; all of them have failed.
> >> > > >
> >> > > > REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> >> > > > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> >> > > > %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> >> > > > %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';
> >> > > >
> >> > > > raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> >> > > >     USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
> >> > > >     AS (key: bytearray, value: chararray);
> >> > > > -- raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> >> > > > --     USING $SEQFILE_LOADER ('-c $TEXT_CONVERTER', '-c $TEXT_CONVERTER')
> >> > > > --     AS (key: chararray, value: chararray);
> >> > > > -- raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> >> > > > --     USING $SEQFILE_LOADER ();
> >> > > >
> >> > > > STORE raw_logs INTO '/data/SearchLogJSON/';
> >> > > >
> >> > > > Any thoughts on what might be the problem? Anything else I should
> >> > > > try? I'm totally out of ideas.
> >> > > >
> >> > > > Appreciate any pointers!
> >> > > >
> >> > > > Chris
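[For reference, the combination that ultimately worked at the top of this thread (Pig 0.10, no cast on the key) reduces to a sketch like the following; since the keys are all null, the key field can simply be projected away before storing. The `values_only` alias is illustrative, not part of the original script.]

```pig
REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';

-- Leave the key untyped; casting the NullWritable-backed key was what failed on 0.8.1.
raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
    AS (key, value: chararray);

-- The keys are all null, so keep only the JSON values.
values_only = FOREACH raw_logs GENERATE value;
STORE values_only INTO '/data/SearchLogJSON/';
```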