Hi,
I'm trying to read in a comma-separated file with a simple command:
a = load 'myfile' using PigStorage(',');
However, some lines in my file have a comma inside a quoted string, and
Pig splits that field into 2 separate tokens:
Example:
A,b,c,d,"string-with,comma",F
I get the "string-with" and "
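For reference, the tokenization being asked for here is ordinary quoted-CSV parsing, where a comma inside double quotes does not end the field. A minimal sketch of that behavior outside of Pig, using Python's standard csv module (the helper name split_csv_line is made up for illustration and is not part of Pig):

```python
import csv

def split_csv_line(line):
    """Split one CSV line, keeping commas that sit inside double quotes."""
    # csv.reader accepts any iterable of lines, so pass a one-element list.
    return next(csv.reader([line]))

# The example line from the message above yields six fields, not seven.
fields = split_csv_line('A,b,c,d,"string-with,comma",F')
print(fields)  # ['A', 'b', 'c', 'd', 'string-with,comma', 'F']
```

A plain split on ',' would return seven tokens for the same line, which matches the behavior described in the question.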
No, no, you misunderstand. I didn't mean to contact zookeeper for every
single record.
Each map instance will contact zookeeper once for every X records it sees.
The mapper gets a block of numbers, and that block becomes available only
to that one mapper, the
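The block scheme described in this message can be sketched roughly as follows. This is only an illustration in Python: the BlockServer class is an in-memory stand-in for the shared counter (zookeeper in this thread), and all class and method names are made up rather than taken from any zookeeper or Pig API.

```python
import threading

class BlockServer:
    """In-memory stand-in for the shared counter (hypothetical, not a real API)."""
    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()
        self.requests = 0  # how many times a mapper contacted the server

    def reserve(self, size):
        # Atomically hand out the half-open range [start, start + size).
        with self._lock:
            start = self._next
            self._next += size
            self.requests += 1
            return start

class MapperIds:
    """Per-mapper id source that contacts the server once per block."""
    def __init__(self, server, block_size):
        self._server = server
        self._block_size = block_size
        self._current = 0
        self._end = 0  # empty block, so the first call reserves one

    def next_id(self):
        if self._current >= self._end:
            # Block exhausted: fetch a fresh block owned by this mapper only.
            self._current = self._server.reserve(self._block_size)
            self._end = self._current + self._block_size
        value = self._current
        self._current += 1
        return value

server = BlockServer()
m1, m2 = MapperIds(server, 3), MapperIds(server, 3)
ids1 = [m1.next_id() for _ in range(4)]
ids2 = [m2.next_id() for _ in range(4)]
```

With a block size of 3, four ids per mapper cost two server requests each. The ids are unique across mappers but are not globally ordered by record.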
On Apr 23, 2010, at 12:13 PM, hc busy wrote:
Is the Java class guaranteed to be unique? Or will I have to perform an
additional check after I join back?
I'd check the Java docs, but AFAIK it is guaranteed.
I don't know the performance of UUID vs Zookeeper, nor how Zookeeper generates its
You can certainly connect to zookeeper, but you don't really need to. Relying
on zookeeper to do atomic increments may not scale if you are doing this for
millions of records, though I haven't done timings. Y! people?
Just grab the task id from the jobconf and use it as a uuid prefix. Details
ab
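The prefix idea can be sketched like this, as an illustration only. It assumes each task id is a string that no other task shares; how the id is read from the jobconf is not shown, and the function name and the sample task id strings are made up.

```python
import itertools

def make_id_generator(task_id):
    """Return a function producing ids of the form '<task_id>-<n>'."""
    counter = itertools.count()  # local counter, no coordination needed

    def next_id():
        return f"{task_id}-{next(counter)}"

    return next_id

# Hypothetical task ids; real ones would come from the job configuration.
gen_a = make_id_generator("task_a")
gen_b = make_id_generator("task_b")
first_ids = [gen_a(), gen_a(), gen_b()]
```

Because the counter is local to the task, two tasks can both emit counter value 0 and the prefix keeps the resulting ids distinct.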
Is the Java class guaranteed to be unique? Or will I have to perform an
additional check after I join back?
I guess I see how I can connect to a zookeeper server inside my UDF to get a
block of, say, 50k IDs at a time and increment sequentially within the
block. Then the UDF connects again to get
Unique identifiers are easy enough. Row ids (monotonically increasing
values) are impossible because of the parallel nature of map reduce.
If you just want to generate a unique identifier you can write a UDF
to wrap Java's UUID class (or use the new GenericInvoker UDF if you're
working of
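As a rough illustration of the UUID approach, here is the same idea in Python rather than as a Java UDF. The function name globally_unique is borrowed from the question in this thread and the sample rows are made up. Note that a random (version 4) UUID has 122 random bits, so collisions are extremely unlikely rather than strictly impossible.

```python
import uuid

def globally_unique():
    """Return a random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())

# Append an id column to each row, similar in spirit to
# "foreach T generate *, globally_unique() as id".
rows = [("a", 1), ("b", 2)]
with_ids = [row + (globally_unique(),) for row in rows]
```

No coordination between tasks is needed, which is why this fits the parallel nature of map reduce better than sequential row ids.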
Guys, is there an easy way to generate a row id that is guaranteed to
be unique?
R = foreach T generate *, globally_unique() as id;
The reason why I need this is because I have a really nasty memory problem
here and I can't perform a group on the entire row, so all I can resort to
is to spl