+1 and we'll add you to the contributors list for doing so, if you want and aren't already on it.
On Fri, Jul 15, 2016, 20:18 Dylan Hutchison <[email protected]> wrote:

> Hi Mario,
>
> As you gain more experience with Accumulo, feel free to write or modify
> Accumulo's documentation in the places you find it lacking and send a PR.
> If you find a topic confusing, probably many others do too.
>
> Cheers, Dylan
>
> On Fri, Jul 15, 2016 at 4:04 PM, Christopher <[email protected]> wrote:
>
>> Ah, I thought you were doing WholeRowIterator -> RowCounterIterator.
>> I now understand you're doing WholeRowIterator -> SomeCustomFilter
>> (column predicate) -> RowCounterIterator.
>>
>> That's okay to do, but it may be better to have an iterator that creates
>> a clone of its source at the beginning of each row, advances to do the
>> filtering, and then informs the spawning iterator to either accept or
>> reject. This is, admittedly, far more complicated than WholeRowIterator,
>> but it can be safer if you have really big rows which don't fit in memory.
>>
>> To your question about WholeRowIterator, yes, it's fine. The iterator
>> will always see sorted data (unless it's sitting on top of another iterator
>> which breaks this... which is possible, but not recommended at all), even
>> though the client may not. And yes, rows are never split (but if the query
>> range doesn't include the full row, it may return early). WholeRowIterator
>> and BatchScanner are orthogonal, and can be used together or not.
>>
>> On Fri, Jul 15, 2016 at 6:35 PM Mario Pastorelli <[email protected]> wrote:
>>
>>> The WholeRowIterator is for filtering: I need all the columns that the
>>> filter requires so that the filter can see whether the row matches the
>>> query or not. That's the only proper way I found to implement logic
>>> operators on predicates over columns of the same row.
>>>
>>> Actually I do have a question about WholeRowIterator, while we are
>>> talking about them. Do they make sense when used with a BatchScanner? My
>>> guess is yes because while the BatchScanner can return data non-sorted to
>>> the client, when it is scanning a single tablet the data is sorted. Because
>>> the data of the same rowId is never split (right?) then there is no problem
>>> in using a WholeRowIterator with a BatchScanner. Is this correct? I really
>>> can't find much documentation for Accumulo and the book doesn't help enough.
>>>
>>> On Sat, Jul 16, 2016 at 12:29 AM, Christopher <[email protected]> wrote:
>>>
>>>> It'd be more efficient to use the FirstEntryInRowIterator to just grab
>>>> one each, rather than the WholeRowIterator which could use up a lot of
>>>> memory unnecessarily.
>>>>
>>>> On Fri, Jul 15, 2016 at 6:20 PM Mario Pastorelli <[email protected]> wrote:
>>>>
>>>>> I'm actually using this after a WholeRowIterator, which is used to
>>>>> filter rows with the same rowId.
>>>>>
>>>>> On Fri, Jul 15, 2016 at 10:02 PM, William Slacum <[email protected]> wrote:
>>>>>
>>>>>> The iterator in the gist also counts cells/entries/KV pairs, not
>>>>>> unique rows. You'll want to have some way to skip to the next row value
>>>>>> if you want the count to be reflective of the number of rows being read.
>>>>>>
>>>>>> On Fri, Jul 15, 2016 at 3:34 PM, Shawn Walker <[email protected]> wrote:
>>>>>>
>>>>>>> My read is that you're mistaking the sequence of calls Accumulo will
>>>>>>> be making to your iterator. The sequence isn't quite the same as a Java
>>>>>>> iterator (initially positioned "before" the first element), and is more
>>>>>>> like a C++ iterator:
>>>>>>>
>>>>>>> 0. Accumulo calls seek(...)
>>>>>>> 1. Is there more data? Accumulo calls hasTop(). You return yes.
>>>>>>> 2. Ok, so there's data. Accumulo calls getTopKey(), getTopValue()
>>>>>>>    to retrieve the data. You return a key indicating 0 columns seen
>>>>>>>    (since next() hasn't yet been called)
>>>>>>> 3. First datum done, Accumulo calls next()
>>>>>>> ...
>>>>>>>
>>>>>>> I imagine that if you pull the second item out of your scan result,
>>>>>>> it'll have the number you expect. Alternately, you might consider
>>>>>>> performing the count computation during an override of the seek(...)
>>>>>>> method, instead of in the next(...) method.
>>>>>>>
>>>>>>> --
>>>>>>> Shawn Walker
>>>>>>>
>>>>>>> On Fri, Jul 15, 2016 at 2:24 PM, Mario Pastorelli <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm trying to create a RowCounterIterator that counts all the rows
>>>>>>>> and returns only one key-value with the counter inside. The problem is
>>>>>>>> that I can't get it to work. The Scala code is available in the gist
>>>>>>>> <https://gist.github.com/melrief/5f2ca248f1a980ddead2f2eeb19e6389>
>>>>>>>> together with some pseudo-code of a test. The problem is that if I add
>>>>>>>> an entry to my table, this iterator will return 0 instead of 1 and
>>>>>>>> apparently the reason is that super.hasTop() is always false. I've tried
>>>>>>>> without the iterator and the scanner returns 1 element. Any idea of what
>>>>>>>> I'm doing wrong here? Is WrappingIterator the right class to extend for
>>>>>>>> this kind of behaviour?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mario
>>>>>>>>
>>>>>>>> --
>>>>>>>> Mario Pastorelli | TERALYTICS
>>>>>>>>
>>>>>>>> *software engineer*
>>>>>>>>
>>>>>>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>>>>>>>> phone: +41794381682
>>>>>>>> email: [email protected]
>>>>>>>> www.teralytics.net
>>>>>>>>
>>>>>>>> Company registration number: CH-020.3.037.709-7 | Trade register
>>>>>>>> Canton Zurich
>>>>>>>> Board of directors: Georg Polzer, Luciano Franceschina, Mark
>>>>>>>> Schmitz, Yann de Vries
>>>>>>>>
>>>>>>>> This e-mail message contains confidential information which is for
>>>>>>>> the sole attention and use of the intended recipient. Please notify us
>>>>>>>> at once if you think that it may not be intended for you and delete it
>>>>>>>> immediately.
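
For anyone reading this thread later, the call sequence Shawn describes (seek, then hasTop, getTopKey/getTopValue, next) and his suggestion to do the counting during seek can be sketched in plain Java. This is only a minimal sketch under stated assumptions: SeekableIterator, ListSource, RowCounter and drain are hypothetical names made up for illustration, not Accumulo classes. Accumulo's real iterator interface also takes a range and column families in seek(...) and returns keys and values, all of which is left out here. The counter also skips repeated row IDs, per William's point about counting rows rather than entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowCountSketch {

    /** Hypothetical, simplified stand-in for a seekable iterator (not the Accumulo API). */
    interface SeekableIterator<T> {
        void seek();       // position on the first element, if any
        boolean hasTop();  // is there a current element?
        T getTop();        // current element; only valid while hasTop() is true
        void next();       // advance past the current element
    }

    /** Source over a sorted list of row IDs, one list element per entry (cell). */
    static final class ListSource implements SeekableIterator<String> {
        private final List<String> rows;
        private int pos;

        ListSource(List<String> rows) {
            this.rows = rows;
            this.pos = rows.size(); // no top until seek() is called
        }

        public void seek() { pos = 0; }
        public boolean hasTop() { return pos < rows.size(); }
        public String getTop() { return rows.get(pos); }
        public void next() { pos++; }
    }

    /** Emits exactly one element: the number of distinct rows in the source. */
    static final class RowCounter implements SeekableIterator<Long> {
        private final SeekableIterator<String> source;
        private Long top; // null means "no top"

        RowCounter(SeekableIterator<String> source) { this.source = source; }

        public void seek() {
            // The count is computed here, so the very first hasTop()/getTop()
            // after seek() already sees the final value.
            source.seek();
            long count = 0;
            String previous = null;
            while (source.hasTop()) {
                String row = source.getTop();
                if (!row.equals(previous)) { // sorted input: a new row ID means a new row
                    count++;
                    previous = row;
                }
                source.next();
            }
            top = count;
        }

        public boolean hasTop() { return top != null; }
        public Long getTop() { return top; }
        public void next() { top = null; } // the single result has been consumed
    }

    /** Drives an iterator in the order from the thread: seek, then hasTop/getTop/next. */
    static <T> List<T> drain(SeekableIterator<T> it) {
        List<T> out = new ArrayList<>();
        it.seek();
        while (it.hasTop()) {
            out.add(it.getTop());
            it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("row1", "row1", "row2");
        System.out.println(drain(new RowCounter(new ListSource(cells)))); // prints [2]
    }
}
```

If the count were instead accumulated in next(), the first getTop() after seek() would be read before any counting had happened, which matches the "0 instead of 1" symptom in the original question.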
