FYI -- we wound up going with a much cleaner and more memory-friendly
solution: returning a new DataBag implementation that simply
proxies all calls to the original bag, but returns a special
Iterator which applies the necessary transformation to tuples on the
fly. That way, we don't need to hav
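
A minimal sketch of the proxy-plus-transforming-iterator idea described above, written against plain Java collections so it runs without Pig on the classpath. The names here are hypothetical: `FieldDroppingBag` is not a Pig class, `List<Object>` stands in for a tuple, and `Iterable<List<Object>>` stands in for the bag. The thread does not say which field was removed, so the index is a constructor argument. A real version would presumably implement Pig's DataBag interface and forward its remaining methods to the wrapped bag.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in types (assumption): List<Object> plays the role of a tuple and
// Iterable<List<Object>> the role of a bag. These are NOT Pig classes.
final class FieldDroppingBag implements Iterable<List<Object>> {
    private final Iterable<List<Object>> original; // the wrapped "bag"
    private final int dropIndex;                   // position of the field to hide

    FieldDroppingBag(Iterable<List<Object>> original, int dropIndex) {
        this.original = original;
        this.dropIndex = dropIndex;
    }

    @Override
    public Iterator<List<Object>> iterator() {
        final Iterator<List<Object>> inner = original.iterator();
        return new Iterator<List<Object>>() {
            @Override
            public boolean hasNext() {
                // Same number of tuples as the wrapped bag.
                return inner.hasNext();
            }

            @Override
            public List<Object> next() {
                // Build the transformed tuple only when it is requested.
                // The tuple held by the original bag is left untouched.
                List<Object> source = inner.next();
                List<Object> result = new ArrayList<>();
                for (int i = 0; i < source.size(); i++) {
                    if (i != dropIndex) {
                        result.add(source.get(i));
                    }
                }
                return result;
            }
        };
    }
}
```

With this sketch, iterating `new FieldDroppingBag(bag, 2)` over a bag containing (111,222,3,121) yields (111,222,121). Only one transformed tuple exists at a time, and no second bag is ever materialized.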
On Sep 5, 2012, at 6:30 PM, Prasanth J wrote:
> Ahh.. Now it makes more sense.
>
> I think I got the solution. I was adding to a List and then finally
> creating a DataBag from that list. Instead, I should create a bag and keep
> adding to it. Is that correct?
Yes.
Alan.
> Thanks Alan.
>
Ahh.. Now it makes more sense.
I think I got the solution. I was adding to a List and then finally
creating a DataBag from that list. Instead, I should create a bag and keep
adding to it. Is that correct?
Thanks Alan.
Thanks
-- Prasanth
On Sep 5, 2012, at 9:24 PM, Alan Gates wrote:
You cannot modify a bag once it is written. The implementation is written
around the assumption that bags are immutable after they are written.
Creating a new bag should not create an OOM exception, as bags are built to
spill when they grow too large. In fact it's this spilling feature that
Hello devs
I have a specific case where I need to modify the contents of a DataBag
(remove a field from each tuple), but I want to do it in place and do not
want to create another DataBag with a new set of tuples.
The situation is, say I have the following input tuple for a UDF
{(111,222,3,121), (1