Re: Modifying databag on the fly

2012-09-07 Thread Dmitriy Ryaboy
FYI -- we wound up going with a much cleaner and more memory-friendly solution: returning a new DataBag implementation that simply proxied all calls to the original bag, but returned a special Iterator that applied the necessary transformation to tuples on the fly. That way, we don't need to hav
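Below is a minimal sketch of the proxy-with-transforming-iterator idea described above, in plain Java. It is not the code from that message. To keep it self-contained, a `List<Object>` stands in for a Pig `Tuple` and any `Iterable` stands in for the original `DataBag`. The names `TransformingBagView` and `dropField` are hypothetical, and the dropped field index (2) is an arbitrary choice because the thread does not say which field is removed. A real version would implement Pig's `DataBag` interface and delegate its other methods to the wrapped bag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical read-only view over an existing bag. Nothing is copied up
 * front: each element is transformed lazily as the iterator hands it out.
 */
final class TransformingBagView<T, R> implements Iterable<R> {
    private final Iterable<T> backing;      // the original bag, never modified
    private final Function<T, R> transform; // applied to each element on the fly

    TransformingBagView(Iterable<T> backing, Function<T, R> transform) {
        this.backing = backing;
        this.transform = transform;
    }

    @Override
    public Iterator<R> iterator() {
        final Iterator<T> it = backing.iterator();
        return new Iterator<R>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public R next() { return transform.apply(it.next()); }
            // remove() keeps Iterator's default and throws UnsupportedOperationException
        };
    }
}

class BagViewDemo {
    /** Returns a copy of the tuple without the field at position idx. */
    static List<Object> dropField(List<Object> tuple, int idx) {
        List<Object> out = new ArrayList<>(tuple);
        out.remove(idx); // remove(int): removes by position, not by value
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList(111, 222, 3, 121), // tuple from the thread
            Arrays.<Object>asList(7, 8, 9, 10));     // made-up tuple, for illustration
        // Index 2 is an arbitrary choice; the thread does not say which field goes.
        Iterable<List<Object>> view = new TransformingBagView<>(bag, t -> dropField(t, 2));
        for (List<Object> t : view) {
            System.out.println(t); // [111, 222, 121] then [7, 8, 10]
        }
    }
}
```

Because the view only wraps the original collection, the stored tuples are never copied or mutated. The trade-off is that the transformation is recomputed on every pass over the bag.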

Re: Modifying databag on the fly

2012-09-05 Thread Alan Gates
On Sep 5, 2012, at 6:30 PM, Prasanth J wrote:
> Ahh.. Now it makes more sense.
>
> I think I got the solution. I was adding to a List and then finally
> creating a DataBag with that list. Instead I should create a bag and keep
> adding to it! Is that correct?

Yes.

Alan.

> Thanks Alan.

Re: Modifying databag on the fly

2012-09-05 Thread Prasanth J
Ahh.. Now it makes more sense.

I think I got the solution. I was adding to a List and then finally creating a DataBag with that list. Instead I should create a bag and keep adding to it! Is that correct?

Thanks Alan.

Thanks
-- Prasanth

On Sep 5, 2012, at 9:24 PM, Alan Gates wrote:
> You

Re: Modifying databag on the fly

2012-09-05 Thread Alan Gates
You cannot modify a bag once it is written. The implementation is built around the assumption that bags are immutable after they are written. Creating a new bag should not cause an OOM exception, as bags are built to spill to disk when they grow too large. In fact it's this spilling feature that
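To illustrate why adding tuples to a bag one at a time avoids the OOM, while first collecting them in a `List` does not, here is a toy append-only bag that spills its in-memory buffer to a temporary file once it holds `threshold` elements. This is a simplification under stated assumptions: `SpillingBag` is a hypothetical class, elements are newline-free strings rather than Pig tuples, and spilling is triggered by a fixed element count, whereas Pig decides when to spill based on memory usage.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical append-only bag of newline-free strings. At most `threshold`
 * elements live on the heap; when the buffer fills, it is appended to a
 * temporary file and cleared.
 */
final class SpillingBag implements AutoCloseable {
    private final int threshold;
    private final List<String> inMemory = new ArrayList<>();
    private final Path spillFile;
    private long spilledCount = 0;

    SpillingBag(int threshold) throws IOException {
        if (threshold < 1) throw new IllegalArgumentException("threshold must be >= 1");
        this.threshold = threshold;
        this.spillFile = Files.createTempFile("bag", ".spill");
    }

    /** Adds one element, spilling the buffer to disk once it is full. */
    void add(String element) throws IOException {
        inMemory.add(element);
        if (inMemory.size() >= threshold) {
            spill();
        }
    }

    private void spill() throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(
                spillFile, StandardCharsets.UTF_8, StandardOpenOption.APPEND)) {
            for (String s : inMemory) {
                w.write(s);
                w.newLine();
            }
        }
        spilledCount += inMemory.size();
        inMemory.clear();
    }

    long size() { return spilledCount + inMemory.size(); }

    int inMemorySize() { return inMemory.size(); }

    /** Streams every element: spilled ones from disk first, then the in-memory tail. */
    void forEach(Consumer<String> action) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                action.accept(line);
            }
        }
        for (String s : inMemory) {
            action.accept(s);
        }
    }

    @Override
    public void close() throws IOException {
        Files.deleteIfExists(spillFile);
    }
}

class SpillingBagDemo {
    public static void main(String[] args) throws IOException {
        try (SpillingBag bag = new SpillingBag(3)) {
            // Add tuples one at a time instead of collecting them in a List first.
            for (int i = 0; i < 7; i++) {
                bag.add("tuple-" + i);
            }
            System.out.println(bag.size() + " elements, " + bag.inMemorySize() + " in memory");
            // prints: 7 elements, 1 in memory
            bag.forEach(System.out::println); // tuple-0 ... tuple-6, in insertion order
        }
    }
}
```

However many elements are added, at most `threshold` of them sit on the heap at once. A `List` built first and only then wrapped in a bag would hold every tuple in memory before the bag ever had a chance to spill.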

Modifying databag on the fly

2012-09-05 Thread Prasanth J
Hello devs, I have a specific case where I need to modify the contents of a DataBag (remove a field from each tuple), but I want to do it in place; I do not want to create another DataBag with a new set of tuples. The situation is: say I have the following input tuple for a UDF {(111,222,3,121), (1