Glad it's working for you! Also, I've started a GitHub project that might be helpful going forward. It's called Pygmalion and is for info, scripts, and UDFs to help with running Pig with Cassandra. It only has a few resources now, but I am planning on adding a couple more UDFs over the next couple of days. Feel free to add to it as well :).

https://github.com/jeromatron/pygmalion

Jeremy
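[A hypothetical sketch of how a UDF jar from a project like Pygmalion might be wired into a pig_cassandra run. The jar path and UDF class name below are placeholders, not taken from the project; only REGISTER, DEFINE, and CassandraStorage are standard Pig/Cassandra pieces.]

#!/bin/sh
# Hypothetical sketch: register a UDF jar and apply a UDF to rows loaded
# from Cassandra. Run from <cassandra_src>/contrib/pig with the connection
# environment variables already exported.
cat > udf_example.pig <<'EOF'
REGISTER /path/to/pygmalion-udfs.jar;   -- placeholder jar path
DEFINE MyUdf some.package.SomeUdf();    -- placeholder UDF class
rows = LOAD 'cassandra://msg_keyspace/messages'
       USING org.apache.cassandra.hadoop.pig.CassandraStorage();
converted = FOREACH rows GENERATE MyUdf($0);
DUMP converted;
EOF
bin/pig_cassandra -x local udf_example.pig
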
On Apr 6, 2011, at 4:15 AM, Fabio Souto wrote:
> It works. Thank you for your help Jeremy!!
>
> Cheers
> Fabio
>
> On 05/04/2011, at 20:08, Jeremy Hanna wrote:
>
>> Hmmm, if it's the same error then it's still not getting your PIG_RPC_PORT
>> variable.
>>
>> If you're running this in <cassandra_src>/contrib/pig:
>> 'bin/pig_cassandra -x local myscript.pig'
>> then you should only need to set PIG_HOME and the other environment
>> variables for connecting to Cassandra.
>>
>> If you want to run it against a cluster, what I've done is keep a Hadoop
>> configuration locally, point PIG_CONF to <hadoop_home>/conf, and put those
>> three variables in mapred-site.xml like this:
>> <property>
>>   <name>cassandra.thrift.address</name>
>>   <value>123.45.67.89</value>
>> </property>
>> <property>
>>   <name>cassandra.thrift.port</name>
>>   <value>9160</value>
>> </property>
>> <property>
>>   <name>cassandra.partitioner.class</name>
>>   <value>org.apache.cassandra.dht.RandomPartitioner</value>
>> </property>
>>
>> I would make sure you can get it to run locally first, though.
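[To make the local case above concrete, a minimal sketch of that setup, assuming the current directory is <cassandra_src>/contrib/pig, a local Cassandra node on the default Thrift port, and the keyspace/column family used elsewhere in this thread; the PIG_HOME path is a placeholder.]

#!/bin/sh
# Minimal local run of Pig against Cassandra, per the instructions above.
export PIG_HOME=/path/to/pig-0.8.0          # placeholder: wherever Pig is installed
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

# Tiny test script: load a column family and dump it.
cat > myscript.pig <<'EOF'
A = LOAD 'cassandra://msg_keyspace/messages'
    USING org.apache.cassandra.hadoop.pig.CassandraStorage();
DUMP A;
EOF

bin/pig_cassandra -x local myscript.pig
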
>> On Apr 5, 2011, at 10:29 AM, Fabio Souto wrote:
>>
>>> Hi,
>>>
>>> I had a bad environment variable,
>>> PIG_PARTITIONER=RandomPartitioner
>>> instead of
>>> PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>>> but I corrected this and it's still not working. I get the same error.
>>>
>>> Just in case, I have this in my ~/.bash_profile:
>>>
>>> export HADOOPDIR=/etc/hadoop-0.20/conf
>>> export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
>>> export CLASSPATH=$HADOOPDIR:$CLASSPATH
>>>
>>> export PIG_CONF_DIR=$HADOOPDIR
>>> export PIG_CLASSPATH=/etc/hadoop/conf
>>> export PIG_CONF_DIR=$HADOOPDIR
>>>
>>> export PIG_INITIAL_ADDRESS=localhost
>>> export PIG_RPC_PORT=9160
>>> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>>>
>>> BTW, I'm using the Pig version that comes with Cassandra, the one in
>>> cassandra/contrib/pig.
>>>
>>> Thanks for your time Jeremy! :)
>>> Fabio
>>>
>>> On 05/04/2011, at 17:04, Jeremy Hanna wrote:
>>>
>>>> Fabio,
>>>>
>>>> It looks like you need to set your environment variables to connect to
>>>> Cassandra. Check out the README. Quoting here:
>>>> Finally, set the following as environment variables (uppercase,
>>>> underscored), or as Hadoop configuration variables (lowercase, dotted):
>>>> * PIG_RPC_PORT or cassandra.thrift.port : the port Thrift is listening on
>>>> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
>>>> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>>>>
>>>> So you'll probably want to do:
>>>> export PIG_INITIAL_ADDRESS=localhost
>>>> export PIG_RPC_PORT=9160
>>>> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>>>>
>>>> All the best, and let me know if this doesn't work,
>>>>
>>>> Jeremy
>>>>
>>>> On Apr 5, 2011, at 9:38 AM, Fabio Souto wrote:
>>>>
>>>>> Hi Jeremy,
>>>>>
>>>>> Of course, here it is:
>>>>>
>>>>> Backend error message
>>>>> ---------------------
>>>>> java.lang.NumberFormatException: null
>>>>>     at java.lang.Integer.parseInt(Integer.java:417)
>>>>>     at java.lang.Integer.parseInt(Integer.java:499)
>>>>>     at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>>>>>     at org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown Source)
>>>>>     at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>>>>
>>>>> Pig Stack Trace
>>>>> ---------------
>>>>> ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>>>
>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>>>     at org.apache.pig.PigServer.openIterator(PigServer.java:742)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>>     at org.apache.pig.Main.run(Main.java:465)
>>>>>     at org.apache.pig.Main.main(Main.java:107)
>>>>> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>>>>>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>>>>>     at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>>>>>     at org.apache.pig.PigServer.store(PigServer.java:816)
>>>>>     at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>>>>     ... 7 more
>>>>> ================================================================================
>>>>>
>>>>> Thanks for everything,
>>>>> Fabio
>>>>>
>>>>> On 05/04/2011, at 16:19, Jeremy Hanna wrote:
>>>>>
>>>>>> Fabio,
>>>>>>
>>>>>> Could you post the full stack trace that's in the pig_<long number>.log file
>>>>>> in the directory where you ran pig?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>> On Apr 5, 2011, at 8:42 AM, Fabio Souto wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have installed Pig 0.8.0 and Cassandra 0.7.4 and I'm not able to read
>>>>>>> data from Cassandra. I wrote a simple query just to test:
>>>>>>>
>>>>>>> grunt> A = LOAD 'cassandra://msg_keyspace/messages' USING
>>>>>>> org.apache.cassandra.hadoop.pig.CassandraStorage();
>>>>>>>
>>>>>>> grunt> dump A;
>>>>>>>
>>>>>>> And I'm getting the following error:
>>>>>>> ==========================================================================
>>>>>>> 2011-04-05 15:33:57,669 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
>>>>>>> 2011-04-05 15:33:57,669 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
>>>>>>> 2011-04-05 15:33:57,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: A: Store(hdfs://localhost/tmp/temp2037710644/tmp-29784200:org.apache.pig.impl.io.InterStorage) - scope-1 Operator Key: scope-1)
>>>>>>> 2011-04-05 15:33:57,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
>>>>>>> 2011-04-05 15:33:57,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>>>>> 2011-04-05 15:33:57,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>>>>>> 2011-04-05 15:33:57,969 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>>>>>> 2011-04-05 15:33:57,990 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>>>>> 2011-04-05 15:34:03,376 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>>>>>> 2011-04-05 15:34:03,416 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>>>>> 2011-04-05 15:34:03,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>>> 2011-04-05 15:34:04,597 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
>>>>>>> 2011-04-05 15:34:05,942 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201104051459_0008
>>>>>>> 2011-04-05 15:34:05,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201104051459_0008
>>>>>>> 2011-04-05 15:34:35,912 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201104051459_0008 has failed! Stop running all dependent jobs
>>>>>>> 2011-04-05 15:34:35,918 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>>>>> 2011-04-05 15:34:35,931 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>>>>> 2011-04-05 15:34:35,931 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>>>>>>> 2011-04-05 15:34:35,933 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>>>>>>>
>>>>>>> HadoopVersion    PigVersion      UserId    StartedAt              FinishedAt             Features
>>>>>>> 0.20.2-CDH3B4    0.8.0-SNAPSHOT  root      2011-04-05 15:33:57    2011-04-05 15:34:35    UNKNOWN
>>>>>>>
>>>>>>> Failed!
>>>>>>>
>>>>>>> Failed Jobs:
>>>>>>> JobId                    Alias    Feature     Message                         Outputs
>>>>>>> job_201104051459_0008    A        MAP_ONLY    Message: Job failed! Error - NA hdfs://localhost/tmp/temp2037710644/tmp-29784200,
>>>>>>>
>>>>>>> Input(s):
>>>>>>> Failed to read data from "cassandra://msg_keyspace/messages"
>>>>>>>
>>>>>>> Output(s):
>>>>>>> Failed to produce result in "hdfs://localhost/tmp/temp2037710644/tmp-29784200"
>>>>>>> ==========================================================================
>>>>>>>
>>>>>>> Any idea how to fix this?
>>>>>>> Cheers
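
[For anyone landing on this thread with the same error: the NumberFormatException above is thrown when ConfigHelper.getRpcPort parses a port value that was never supplied, i.e. PIG_RPC_PORT (or cassandra.thrift.port) is not visible to the job. A small, hedged pre-flight check along those lines, using only the variable names from this thread:]

#!/bin/sh
# Pre-flight check before bin/pig_cassandra: fail early if any of the three
# connection variables discussed in this thread is missing from the environment.
for var in PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
    if [ -z "$(printenv "$var")" ]; then
        echo "ERROR: $var is not set; an unset PIG_RPC_PORT is what produces" >&2
        echo "the 'NumberFormatException: null' seen in the job logs above." >&2
        exit 1
    fi
done
echo "Cassandra connection variables for Pig:"
env | grep '^PIG_'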
