On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp <[email protected]> wrote:
> Hi there,
>
> currently we're experimenting with a two-node Accumulo cluster (two tablet
> servers) set up for document storage.
> These documents are decomposed down to the sentence level.
>
> Now I'm using a BatchScanner to assemble the full document like this:
>
>     val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10)
>     // ARTIFACTS table currently hosts ~30GB of data, ~200M entries on ~45 tablets
>     bscan.setRanges(ranges) // there are about 3000 Range.exact's in the ranges list
>     for (entry <- bscan.asScala) yield {
>       val key = entry.getKey()
>       val value = entry.getValue()
>       // etc.
>     }
>
> For larger full documents (e.g. 3000 exact ranges), this operation takes
> about 12 seconds.
> But shorter documents are assembled blazingly fast...
>
> Is that too much for a BatchScanner / am I misusing the BatchScanner?
> Is that a normal time for such a (seek) operation?
> Can I do something to get better seek performance?
How many threads did you configure the batch scanner with, and did you try varying this?

> Note: I have already enabled bloom filtering on that table.
>
> Thank you for any advice!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> [email protected]
> www.scai.fraunhofer.de
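For concreteness, here is a minimal sketch of what varying the thread count looks like. It reuses the names from your snippet (`instance`, `ARTIFACTS`, `auths`, `ranges`), assumes a live Accumulo cluster, and the thread-count value is just an illustrative starting point, not a tuned recommendation:

```scala
import scala.collection.JavaConverters._

// Sketch: the third argument to createBatchScanner is the number of query
// threads the client uses to fan requests out to tablet servers. With ~45
// tablets, 10 threads may be a bottleneck for 3000 ranges; try larger values
// (e.g. one thread per tablet) and measure.
val numQueryThreads = 45 // experiment: 10, 20, 45, ...
val bscan = instance.createBatchScanner(ARTIFACTS, auths, numQueryThreads)
try {
  bscan.setRanges(ranges.asJava)
  for (entry <- bscan.asScala) {
    val key = entry.getKey()
    val value = entry.getValue()
    // assemble document from sentence-level entries
  }
} finally {
  bscan.close() // always release the scanner's threads and resources
}
```

Closing the scanner in a `finally` block matters here: each BatchScanner holds its own thread pool, so leaking scanners while experimenting with thread counts can itself distort timings.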
