Alright! Do you happen to have some reference code that I can refer to? I am a newbie and I am not sure if by caching, aggregating, and merge sorting you mean to use an Accumulo wrapper or to write simple Java code.
Best regards,
Yamini Joshi

On Thu, Oct 20, 2016 at 2:49 PM, ivan bella <i...@ivan.bella.name> wrote:

> That is essentially the same thing, but instead of doing it within an
> iterator, you are letting Accumulo do the work! Perfect.
>
> On October 20, 2016 at 3:38 PM yamini.1...@gmail.com wrote:
>
> I am wondering what the complexity would be for this and also how it
> compares to creating a new table with the required reversed data and
> calculating the sum using an iterator.
>
> Sent from my iPhone
>
> On Oct 20, 2016, at 2:07 PM, ivan bella <i...@ivan.bella.name> wrote:
>
> You could cache results in an internal map. Once the number of entries in
> your map reaches a certain point, you could dump them to a separate file in
> HDFS and then start building a new map. Once you have completed the
> underlying scan, do a merge sort and aggregation of the written files to
> start returning the keys. I did something similar to this and it seems to
> work well. You might want to use RFiles as the underlying format, which
> would enable reuse of some Accumulo code when doing the merge sort. It
> would also allow more efficient reseeking into the RFiles if your iterator
> gets torn down and reconstructed, provided you detect this and at least
> avoid redoing the entire scan.
>
> On October 20, 2016 at 1:22 PM Yamini Joshi <yamini.1...@gmail.com> wrote:
>
> Hello all
>
> I am trying to find the number of times a set of column families appears in
> a set of records (irrespective of the rowIds). Is it possible to do this on
> the server side? My concern is that if the set of column families is huge,
> it might face memory constraints on the server side. Also, we might need to
> generate new keys with the column family name as the key and the count as
> the value.
>
> Best regards,
> Yamini Joshi
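[For readers of the archive: the cache/spill/merge idea described in the quoted messages can be sketched in plain Java. This is a hypothetical illustration only, not the code the posters used: it counts column-family occurrences, spills the in-memory map to sorted temp files once it passes a threshold, and then k-way merge-sorts the spill files while summing counts for equal keys. Sorted text files and a local temp directory stand in for the RFiles and HDFS mentioned in the thread, and all class and method names are made up for the sketch.]

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Sketch of the spill-and-merge counting pattern from the thread.
public class SpillMergeCount {
    private final int spillThreshold;
    private final List<Path> runs = new ArrayList<>();
    private TreeMap<String, Long> cache = new TreeMap<>();

    public SpillMergeCount(int spillThreshold) {
        this.spillThreshold = spillThreshold;
    }

    // Count one occurrence of a key (e.g. a column family name);
    // spill the map to disk once it grows past the threshold.
    public void add(String key) throws IOException {
        cache.merge(key, 1L, Long::sum);
        if (cache.size() >= spillThreshold) spill();
    }

    // TreeMap iterates in sorted order, so each run file is already sorted.
    private void spill() throws IOException {
        Path run = Files.createTempFile("run", ".tsv");
        try (BufferedWriter w = Files.newBufferedWriter(run)) {
            for (Map.Entry<String, Long> e : cache.entrySet())
                w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        runs.add(run);
        cache = new TreeMap<>();
    }

    // Merge-sort the sorted runs with a priority queue, summing counts
    // for equal keys and emitting each (key, total) pair in key order.
    public void mergeTo(BiConsumer<String, Long> sink) throws IOException {
        if (!cache.isEmpty()) spill();
        PriorityQueue<Cursor> pq =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.key));
        List<BufferedReader> readers = new ArrayList<>();
        for (Path run : runs) {
            BufferedReader r = Files.newBufferedReader(run);
            readers.add(r);
            Cursor c = Cursor.next(r);
            if (c != null) pq.add(c);
        }
        String curKey = null;
        long total = 0;
        while (!pq.isEmpty()) {
            Cursor c = pq.poll();
            if (!c.key.equals(curKey)) {
                if (curKey != null) sink.accept(curKey, total);
                curKey = c.key;
                total = 0;
            }
            total += c.count;
            Cursor nxt = Cursor.next(c.reader);
            if (nxt != null) pq.add(nxt);
        }
        if (curKey != null) sink.accept(curKey, total);
        for (BufferedReader r : readers) r.close();
    }

    // One read position in one sorted run file.
    private static final class Cursor {
        final String key; final long count; final BufferedReader reader;
        Cursor(String k, long c, BufferedReader r) { key = k; count = c; reader = r; }
        static Cursor next(BufferedReader r) throws IOException {
            String line = r.readLine();
            if (line == null) return null;
            String[] p = line.split("\t");
            return new Cursor(p[0], Long.parseLong(p[1]), r);
        }
    }

    public static void main(String[] args) throws IOException {
        SpillMergeCount counter = new SpillMergeCount(2); // tiny threshold to force spills
        for (String cf : new String[] {"name", "age", "name", "city", "name", "age"})
            counter.add(cf);
        counter.mergeTo((k, v) -> System.out.println(k + "=" + v));
        // prints: age=2, city=1, name=3 (one per line, in key order)
    }
}
```

In a real iterator you would replace the text files with RFile writers/readers so the merge can reuse Accumulo's own key comparison and seeking, as suggested in the thread.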