Glad it's working for you! Also, I've started a GitHub project that might be helpful going forward. It's called Pygmalion and is for info, scripts, and UDFs to help with running Pig with Cassandra. It only has a few resources now, but I am planning on adding a couple more UDFs over the next couple of days. Feel free to add to it as well :).

https://github.com/jeromatron/pygmalion

Jeremy
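[A hypothetical sketch of how a UDF jar from a project like Pygmalion might be wired into a pig_cassandra run. The jar path and UDF class name below are placeholders, not taken from the project; only REGISTER, DEFINE, and CassandraStorage are standard Pig/Cassandra pieces.]

#!/bin/sh
# Hypothetical sketch: register a UDF jar and apply a UDF to rows loaded
# from Cassandra. Run from <cassandra_src>/contrib/pig with the connection
# environment variables already exported.
cat > udf_example.pig <<'EOF'
REGISTER /path/to/pygmalion-udfs.jar;   -- placeholder jar path
DEFINE MyUdf some.package.SomeUdf();    -- placeholder UDF class
rows = LOAD 'cassandra://msg_keyspace/messages'
       USING org.apache.cassandra.hadoop.pig.CassandraStorage();
converted = FOREACH rows GENERATE MyUdf($0);
DUMP converted;
EOF
bin/pig_cassandra -x local udf_example.pig
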
On Apr 6, 2011, at 4:15 AM, Fabio Souto wrote:
> It works. Thank you for your help Jeremy!!
>
> Cheers
> Fabio
>
> On 05/04/2011, at 20:08, Jeremy Hanna wrote:
>
>> Hmmm, if it's the same error then it's still not getting your PIG_RPC_PORT
>> variable.
>>
>> If you're running this in <cassandra_src>/contrib/pig:
>> 'bin/pig_cassandra -x local myscript.pig'
>> then you should only need to set PIG_HOME and the other environment
>> variables for connecting to Cassandra.
>>
>> If you want to run it against a cluster, what I've done is keep a Hadoop
>> configuration locally, point PIG_CONF to <hadoop_home>/conf, and put those
>> three variables in mapred-site.xml like this:
>> <property>
>>   <name>cassandra.thrift.address</name>
>>   <value>123.45.67.89</value>
>> </property>
>> <property>
>>   <name>cassandra.thrift.port</name>
>>   <value>9160</value>
>> </property>
>> <property>
>>   <name>cassandra.partitioner.class</name>
>>   <value>org.apache.cassandra.dht.RandomPartitioner</value>
>> </property>
>>
>> I would make sure you can get it to run locally first, though.
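[To make the local case above concrete, a minimal sketch of that setup, assuming the current directory is <cassandra_src>/contrib/pig, a local Cassandra node on the default Thrift port, and the keyspace/column family used elsewhere in this thread; the PIG_HOME path is a placeholder.]

#!/bin/sh
# Minimal local run of Pig against Cassandra, per the instructions above.
export PIG_HOME=/path/to/pig-0.8.0          # placeholder: wherever Pig is installed
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

# Tiny test script: load a column family and dump it.
cat > myscript.pig <<'EOF'
A = LOAD 'cassandra://msg_keyspace/messages'
    USING org.apache.cassandra.hadoop.pig.CassandraStorage();
DUMP A;
EOF

bin/pig_cassandra -x local myscript.pig
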
>> On Apr 5, 2011, at 10:29 AM, Fabio Souto wrote:
>>
>>> Hi,
>>>
>>> I had a bad environment variable,
>>> PIG_PARTITIONER=RandomPartitioner
>>> instead of
>>> PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>>> but I corrected this and it's still not working. I get the same error.
>>>
>>> Just in case, I have this in my ~/.bash_profile:
>>>
>>> export HADOOPDIR=/etc/hadoop-0.20/conf
>>> export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
>>> export CLASSPATH=$HADOOPDIR:$CLASSPATH
>>>
>>> export PIG_CONF_DIR=$HADOOPDIR
>>> export PIG_CLASSPATH=/etc/hadoop/conf
>>> export PIG_CONF_DIR=$HADOOPDIR
>>>
>>> export PIG_INITIAL_ADDRESS=localhost
>>> export PIG_RPC_PORT=9160
>>> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>>>
>>> BTW, I'm using the Pig version that comes with Cassandra, the one in
>>> cassandra/contrib/pig.
>>>
>>> Thanks for your time Jeremy! :)
>>> Fabio
>>>
>>> On 05/04/2011, at 17:04, Jeremy Hanna wrote:
>>>
>>>> Fabio,
>>>>
>>>> It looks like you need to set your environment variables to connect to
>>>> Cassandra. Check out the README. Quoting here:
>>>> Finally, set the following as environment variables (uppercase,
>>>> underscored), or as Hadoop configuration variables (lowercase, dotted):
>>>> * PIG_RPC_PORT or cassandra.thrift.port : the port Thrift is listening on
>>>> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
>>>> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>>>>
>>>> So you'll probably want to do:
>>>> export PIG_INITIAL_ADDRESS=localhost
>>>> export PIG_RPC_PORT=9160
>>>> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>>>>
>>>> All the best, and let me know if this doesn't work,
>>>>
>>>> Jeremy
>>>>
>>>> On Apr 5, 2011, at 9:38 AM, Fabio Souto wrote:
>>>>
>>>>> Hi Jeremy,
>>>>>
>>>>> Of course, here it is:
>>>>>
>>>>> Backend error message
>>>>> ---------------------
>>>>> java.lang.NumberFormatException: null
>>>>>     at java.lang.Integer.parseInt(Integer.java:417)
>>>>>     at java.lang.Integer.parseInt(Integer.java:499)
>>>>>     at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>>>>>     at org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown Source)
>>>>>     at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>>>>
>>>>> Pig Stack Trace
>>>>> ---------------
>>>>> ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>>>
>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>>>     at org.apache.pig.PigServer.openIterator(PigServer.java:742)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>>     at org.apache.pig.Main.run(Main.java:465)
>>>>>     at org.apache.pig.Main.main(Main.java:107)
>>>>> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>>>>>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>>>>>     at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>>>>>     at org.apache.pig.PigServer.store(PigServer.java:816)
>>>>>     at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>>>>     ... 7 more
>>>>> ================================================================================
>>>>>
>>>>> Thanks for everything,
>>>>> Fabio
>>>>>
>>>>> On 05/04/2011, at 16:19, Jeremy Hanna wrote:
>>>>>
>>>>>> Fabio,
>>>>>>
>>>>>> Could you post the full stack trace that's in the pig_<long number>.log file
>>>>>> in the directory where you ran pig?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>> On Apr 5, 2011, at 8:42 AM, Fabio Souto wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have installed Pig 0.8.0 and Cassandra 0.7.4 and I'm not able to read
>>>>>>> data from Cassandra. I wrote a simple query just to test:
>>>>>>>
>>>>>>> grunt> A = LOAD 'cassandra://msg_keyspace/messages' USING
>>>>>>> org.apache.cassandra.hadoop.pig.CassandraStorage();
>>>>>>>
>>>>>>> grunt> dump A;
>>>>>>>
>>>>>>> And I'm getting the following error:
>>>>>>> ==========================================================================
>>>>>>> 2011-04-05 15:33:57,669 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
>>>>>>> 2011-04-05 15:33:57,669 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
>>>>>>> 2011-04-05 15:33:57,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: A: Store(hdfs://localhost/tmp/temp2037710644/tmp-29784200:org.apache.pig.impl.io.InterStorage) - scope-1 Operator Key: scope-1)
>>>>>>> 2011-04-05 15:33:57,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
>>>>>>> 2011-04-05 15:33:57,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>>>>> 2011-04-05 15:33:57,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>>>>>> 2011-04-05 15:33:57,969 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>>>>>> 2011-04-05 15:33:57,990 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>>>>> 2011-04-05 15:34:03,376 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>>>>>> 2011-04-05 15:34:03,416 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>>>>> 2011-04-05 15:34:03,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>>> 2011-04-05 15:34:04,597 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
>>>>>>> 2011-04-05 15:34:05,942 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201104051459_0008
>>>>>>> 2011-04-05 15:34:05,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201104051459_0008
>>>>>>> 2011-04-05 15:34:35,912 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201104051459_0008 has failed! Stop running all dependent jobs
>>>>>>> 2011-04-05 15:34:35,918 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>>>>> 2011-04-05 15:34:35,931 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>>>>> 2011-04-05 15:34:35,931 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>>>>>>> 2011-04-05 15:34:35,933 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>>>>>>>
>>>>>>> HadoopVersion    PigVersion      UserId    StartedAt              FinishedAt             Features
>>>>>>> 0.20.2-CDH3B4    0.8.0-SNAPSHOT  root      2011-04-05 15:33:57    2011-04-05 15:34:35    UNKNOWN
>>>>>>>
>>>>>>> Failed!
>>>>>>>
>>>>>>> Failed Jobs:
>>>>>>> JobId                    Alias    Feature     Message                         Outputs
>>>>>>> job_201104051459_0008    A        MAP_ONLY    Message: Job failed! Error - NA hdfs://localhost/tmp/temp2037710644/tmp-29784200,
>>>>>>>
>>>>>>> Input(s):
>>>>>>> Failed to read data from "cassandra://msg_keyspace/messages"
>>>>>>>
>>>>>>> Output(s):
>>>>>>> Failed to produce result in "hdfs://localhost/tmp/temp2037710644/tmp-29784200"
>>>>>>> ==========================================================================
>>>>>>>
>>>>>>> Any idea how to fix this?
>>>>>>> Cheers
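
[For anyone landing on this thread with the same error: the NumberFormatException above is thrown when ConfigHelper.getRpcPort parses a port value that was never supplied, i.e. PIG_RPC_PORT (or cassandra.thrift.port) is not visible to the job. A small, hedged pre-flight check along those lines, using only the variable names from this thread:]

#!/bin/sh
# Pre-flight check before bin/pig_cassandra: fail early if any of the three
# connection variables discussed in this thread is missing from the environment.
for var in PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
    if [ -z "$(printenv "$var")" ]; then
        echo "ERROR: $var is not set; an unset PIG_RPC_PORT is what produces" >&2
        echo "the 'NumberFormatException: null' seen in the job logs above." >&2
        exit 1
    fi
done
echo "Cassandra connection variables for Pig:"
env | grep '^PIG_'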
