Re: Reading sequence file in pig

2014-05-20 Thread Pradeep Gollakota
Sorry, missed the part about loading custom types from SequenceFiles. The LoadFunc from piggybank will only load Pig types. However (as you already know), you can use elephant-bird. Not sure why you need to build it; the artifact exists in Maven Central. http://search.maven.org/#artifactdetails%

How to set number of mappers when using HBaseStorage

2014-05-20 Thread leiwang...@gmail.com
When using HBaseStorage to read data from an HBase table, there will be one mapper per region. However, my HBase table has more than 1000 regions and I have capacity for only 80 mappers. Is there a way to set the number of mappers when using HBaseStorage? Thanks, Lei leiwang...@gmail.com
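For scale, the arithmetic behind this question can be sketched as below. This is only an illustration under stated assumptions: it takes exactly 1000 regions (the post says "more than 1000"), reads "80 mappers capacity" as 80 map tasks running at once, and uses a hypothetical helper name, map_waves, that is not part of Pig or HBase.

```python
import math


def map_waves(num_regions: int, map_slots: int) -> int:
    """Estimate how many scheduling waves a job needs.

    Assumes one map task per region (as described in the post) and that
    at most `map_slots` map tasks run at the same time.
    """
    if num_regions < 0 or map_slots <= 0:
        raise ValueError("num_regions must be >= 0 and map_slots must be > 0")
    return math.ceil(num_regions / map_slots)


# Figures from the post: about 1000 regions, room for 80 concurrent mappers.
waves = map_waves(1000, 80)
print(f"1000 map tasks on 80 slots need {waves} waves")
```

Under these assumptions the map tasks would queue and run in successive waves rather than all at once.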

CassandraStorage loader generating 2x as many records?

2014-05-20 Thread Kevin Burton
(accidentally cross-posted this to the Cassandra list… when I meant to post it here) This has to be a bug, or I'm insane. Here's my table in Cassandra: CREATE TABLE test_source ( id int , primary key(id) ); INSERT INTO test_source (ID) VALUES(1); INSERT INTO test_source (ID) V

Re: Is pig maddening to work with because it's so slow?

2014-05-20 Thread Dan DeCapria, CivicScience
Seconded for PigUnit. As for a faster debugging procedure, I've gone modular. First I JUnit test individual UDFs against their functional requirements and use cases a priori. Then I mock up my whiteboard workflow as multiple Pig script logical blocks (multiple Pig files to test), start a pig -x lo

Re: Is pig maddening to work with because it's so slow?

2014-05-20 Thread Suraj Nayak
Also, Pig is a data flow language where the statements get converted to Java and then run. In the case of Python, it's native, and thus runs faster. On 21-May-2014 12:52 AM, "Suraj Nayak" wrote: > Why not consider PigUnit? PigUnit gives flexibility to test locally. Also > debugging is pretty simple, almos

Re: Is pig maddening to work with because it's so slow?

2014-05-20 Thread Suraj Nayak
Why not consider PigUnit? PigUnit gives flexibility to test locally. Also debugging is pretty simple, almost similar to JUnit. -- Suraj On 21-May-2014 12:47 AM, "Paul Houle" wrote: > Slow iteration is a problem with Pig. > > I still write MR jobs mainly in Java because (1) I control the > execut

Re: Is pig maddening to work with because it's so slow?

2014-05-20 Thread Paul Houle
Slow iteration is a problem with Pig. I still write MR jobs mainly in Java because (1) I control the execution plan, (2) can do things nearly zero-copy, and (3) I can get a quick iteration cycle by using JUnit to test mappers, reducers, and other components. On Tue, May 20, 2014 at 3:02 PM, K

Is pig maddening to work with because it's so slow?

2014-05-20 Thread Kevin Burton
I've noticed that while working with Pig my stress level and frustration with the system are higher than with other systems I've worked with. I think it's because the iteration cycle is longer. Even pig -x local takes a while to execute. Is this just me? If you're trying to learn and debug python lis

Re: Reading sequence file in pig

2014-05-20 Thread abhishek dodda
This file is output from the org.apache.hcatalog.pig.HCatStorer function. On Tue, May 20, 2014 at 10:44 AM, abhishek dodda wrote: > I am getting this error > > A = load '/a/part-m-' using > org.apache.pig.piggybank.storage.SequenceFileLoader(); > > org.apache.pig.backend.BackendException: ERROR 0: U

Re: Reading sequence file in pig

2014-05-20 Thread abhishek dodda
I am getting this error: A = load '/a/part-m-' using org.apache.pig.piggybank.storage.SequenceFileLoader(); org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class org.apache.hadoop.io.NullWritable to a Pig datatype at org.apache.pig.piggybank.storage.SequenceFile

Re: Reading sequence file in pig

2014-05-20 Thread Pradeep Gollakota
You can use the SequenceFileLoader from the piggybank. http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html On Tue, May 20, 2014 at 2:46 AM, abhishek dodda wrote: > Hi All, > > I have trouble building code for this project. > > https://github.com/kevin