Sorry,
I missed the part about loading custom types from SequenceFiles. The LoadFunc
from piggybank will only load Pig types. However (as you already know),
you can use elephant-bird. Not sure why you need to build it. The artifact
exists in Maven Central.
http://search.maven.org/#artifactdetails%
When using HBaseStorage to read data from an HBase table, there will be one mapper
per region.
However, my HBase table has more than 1000 regions and capacity for only 80
mappers.
Is there a way to set the number of mappers when using HBaseStorage?
Thanks,
Lei
leiwang...@gmail.com
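A rough way to read the numbers in the question above, assuming (as the post states) one map task per region: when a job has more map tasks than available slots, Hadoop does not reject it but runs the tasks in successive waves. The sketch below is plain arithmetic and not an HBaseStorage setting; the function name `mapper_waves` is made up for illustration, and 1000 is used as a lower bound for "more than 1000 regions".

```python
def mapper_waves(num_map_tasks: int, mapper_slots: int) -> int:
    """Return how many scheduling waves are needed to run all map tasks.

    Assumes every task occupies exactly one slot and that all slots
    are free for this job (a simplification of a real cluster).
    """
    if num_map_tasks < 0:
        raise ValueError("num_map_tasks must be non-negative")
    if mapper_slots <= 0:
        raise ValueError("mapper_slots must be positive")
    # Ceiling division using integers only.
    return -(-num_map_tasks // mapper_slots)


if __name__ == "__main__":
    regions = 1000  # lower bound taken from the post
    slots = 80      # mapper capacity taken from the post
    print(mapper_waves(regions, slots))  # 13
```

So with these figures the map phase would need at least 13 waves, rather than failing for lack of capacity.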
(accidentally cross posted this to the cassandra list… when I meant to post
it here)
This has to be a bug, or else I'm insane.
Here's my table in Cassandra:
CREATE TABLE test_source (
id int ,
primary key(id)
);
INSERT INTO test_source (ID) VALUES(1);
INSERT INTO test_source (ID) V
Seconded for PigUnit.
As for a faster debugging procedure, I've gone modular. First I JUnit test
individual UDFs against their functional requirements and use cases a
priori. Then I mock up my whiteboard workflow as multiple Pig script
logical blocks (multiple pig files to test), start a pig -x lo
Also, Pig is a data flow language where the statements get converted to
Java and then run. In the case of Python, it's native, and thus runs faster.
On 21-May-2014 12:52 AM, "Suraj Nayak" wrote:
> Why not consider PigUnit? PigUnit gives flexibility to test locally. Also
> debugging is pretty simple, almos
Why not consider PigUnit? PigUnit gives flexibility to test locally. Also
debugging is pretty simple, almost similar to JUnit.
--
Suraj
On 21-May-2014 12:47 AM, "Paul Houle" wrote:
> Slow iteration is a problem with Pig.
>
> I still write MR jobs mainly in Java because (1) I control the
> execut
Slow iteration is a problem with Pig.
I still write MR jobs mainly in Java because (1) I control the
execution plan, (2) I can do things nearly zero-copy, and (3) I can
get a quick iteration cycle by using JUnit to test mappers, reducers,
and other components.
On Tue, May 20, 2014 at 3:02 PM, K
I've noticed that while working with Pig my stress level and frustration
with the system are higher than with other systems I've worked with.
I think it's because the iteration cycle is longer.
Even pig -x local takes a while to execute.
Is this just me?
If you're trying to learn and debug python lis
This file is output from the org.apache.hcatalog.pig.HCatStorer function.
On Tue, May 20, 2014 at 10:44 AM, abhishek dodda wrote:
> I am getting this error
>
> A = load '/a/part-m-' using
> org.apache.pig.piggybank.storage.SequenceFileLoader();
>
> org.apache.pig.backend.BackendException: ERROR 0: U
I am getting this error
A = load '/a/part-m-' using
org.apache.pig.piggybank.storage.SequenceFileLoader();
org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class
org.apache.hadoop.io.NullWritable to a Pig datatype
at
org.apache.pig.piggybank.storage.SequenceFile
You can use the SequenceFileLoader from the piggybank.
http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html
On Tue, May 20, 2014 at 2:46 AM, abhishek dodda wrote:
> Hi All,
>
> I have trouble building code for this project.
>
> https://github.com/kevin