Re: Custom partitioning and order for optimum hbase store

2013-11-14 Thread Dmitriy Lyubimov
. It will read everything off of the reduce iterator into memory (spilling if there is more than can fit) and then store it all to hbase. Alan. On Jan 24, 2011, at 2:06 PM, Dmitriy Lyubimov wrote: I guess I want to order the groups. The grouping is actually irrelevant in this case, it is only

Re: want to do Linear regression analysis to achieve Interpolation using PIG Scripts.

2012-03-12 Thread Dmitriy Lyubimov
No good public attempts that I know of exist to put ML kind of stuff on top of Pig (well, almost none). There are some statistical packages written at Yahoo, but afaik they don't directly do what you need. Pig is a somewhat excellent data prep pipeline, but IMO is not as excellent as something

Re: want to do Linear regression analysis to achieve Interpolation using PIG Scripts.

2012-03-12 Thread Dmitriy Lyubimov
: https://github.com/tdunning/pig-vector On Mon, Mar 12, 2012 at 1:02 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: No good public attempts that I know of exist to put ML kind of stuff on top of Pig (well, almost none). There are some statistical packages written at Yahoo, but afaik they don't

bytearray constants

2011-12-21 Thread Dmitriy Lyubimov
Hello, is there _any_ way to specify an empty byte array (but not NULL)? There also seems to be no way to specify byte array constants or convert other constants to bytearray. Is there any reason why constants and conversions to bytearray are disallowed? Thanks in advance. -Dmitriy

Re: Algebraic UDF with one bag and one non-bag parameter

2011-04-15 Thread Dmitriy Lyubimov
DEFINE func mypackage.myfunc(parameter); Thanks! This is so cool. Holy grail, literally. I think this was not available, at least in 0.6? Since when is this available for eval funcs? So you could also instantiate 2 versions. 2011/4/15 Dmitriy Lyubimov dlyubi...@apache.org Hi

Custom partitioning and order for optimum hbase store

2011-01-24 Thread Dmitriy Lyubimov
Hi, so it seems that storing to hbase is more efficient if the data is partitioned by region and ordered by hbase key. I see that pig 0.8 (PIG-282) added a custom partitioner in a group, but I am not sure if order is enforced there. Is there a way to run a single MR that orders and partitions data as per above

Re: Custom partitioning and order for optimum hbase store

2011-01-24 Thread Dmitriy Lyubimov
-summary.html (the bulk loading section). D On Mon, Jan 24, 2011 at 11:51 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Better yet, it would seem logical if partitioning and advice on partition #s were somehow tailored to a storefunc. It would stand to reason that for as long as we

Re: Custom partitioning and order for optimum hbase store

2011-01-24 Thread Dmitriy Lyubimov
the reduce iterator to the collect, but it won't. It will read everything off of the reduce iterator into memory (spilling if there is more than can fit) and then store it all to hbase. Alan. On Jan 24, 2011, at 2:06 PM, Dmitriy Lyubimov wrote: I guess I want to order the groups

Re: Managing pig script jar dependencies

2011-01-21 Thread Dmitriy Lyubimov
We have a bootstrap command that copies all libraries of our maven assembly to a location in HDFS (actually, we use the maven groupId and artifactId of our assembly in the hierarchical path to ensure each client has the jars of exactly the same assembly build available on the backend). We also use

Re: LZO Pig (Elephantbird?)

2011-01-20 Thread Dmitriy Lyubimov
in the 0.6 and the 0.8-compatible branches. Pointing to a description file is a good idea, we'll add that. D On Thu, Jan 20, 2011 at 12:16 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: We just OSSd some load and store funcs for pig 0.7 cdh3b3 supporting reading/writing protobuf from/to sequence