As John mentioned, the specs of the three nodes will have a significant
effect. Since you already have the hardware selected, I would build it, run
it, and then add nodes if the performance is low. You can use things like
the Accumulo file output format to write directly to map files that you
subsequently bulk import.
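A minimal sketch of that bulk-ingest path, assuming the Accumulo 1.x MapReduce API; the table name, paths, reducer, and `connector` variable are placeholders, not part of the original advice:

```java
// Sketch: write RFiles ("map files") from a MapReduce job, then bulk import them.
// Assumes Accumulo 1.x; paths and the table name are placeholders.
Job job = Job.getInstance(new Configuration(), "bulk-ingest-sketch");
job.setOutputFormatClass(AccumuloFileOutputFormat.class);
job.setOutputKeyClass(Key.class);     // the reducer must emit Keys in fully sorted order
job.setOutputValueClass(Value.class);
AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/bulk/files"));

// After the job completes, hand the finished files to Accumulo:
connector.tableOperations().importDirectory(
    "mytable",            // destination table (placeholder)
    "/tmp/bulk/files",    // directory the job wrote
    "/tmp/bulk/failures", // must exist and be empty before the import
    false);               // false = keep the timestamps in the files
```

The win over live ingest is that the tablet servers just bring the files online rather than re-processing every mutation through the write path.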
John and Ed, thank you both for your responses.
Using Solr for search is a requirement. When we process data, there's quite
a bit of information we are interested in indexing (dates, locations,
etc.), and we use Solr for that. All the data will be stored in Accumulo
after processing and then indexed in Solr.
With respect to indexing, what are you trying to achieve? I have not used
Solr with Accumulo, but I have done indexing directly in Accumulo,
leveraging Lucene libraries as appropriate. You can get very good
performance specific to your domain by doing so, and it's less O&M
overhead. Of course, you then have to build and maintain that indexing
logic yourself.
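To make the "index directly in Accumulo" idea concrete, here is a plain-Java sketch of the usual term-index layout (row = term, one column qualifier per document id). `TermIndexSketch` and its method are invented for illustration; a real implementation would emit these entries as Accumulo Mutations:

```java
import java.util.*;

public class TermIndexSketch {
    // Build a simple inverted index: term -> sorted set of document ids.
    // In an Accumulo table this maps naturally to rows keyed by term,
    // with one column qualifier per document id.
    static SortedMap<String, SortedSet<String>> index(Map<String, String> docs) {
        SortedMap<String, SortedSet<String>> idx = new TreeMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            for (String term : e.getValue().toLowerCase().split("\\W+")) {
                if (term.isEmpty()) continue;
                idx.computeIfAbsent(term, t -> new TreeSet<>()).add(e.getKey());
            }
        }
        return idx;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("doc1", "rainy day in Paris");
        docs.put("doc2", "sunny day in Lisbon");
        SortedMap<String, SortedSet<String>> idx = index(docs);
        System.out.println(idx.get("day")); // prints [doc1, doc2]
    }
}
```

A range scan over the rows for a term then returns every matching document id, which is the query pattern the "good performance specific to your domain" comment is about.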
1. This is quite variable. It depends on your hardware specs, primarily CPU
and disk throughput. It also depends on how your system is configured to
use those resources and on your typical mutation size. How your mutations
are distributed across the cluster is another factor.
2. Under the hood, the output format uses a BatchWriter to send mutations
to the tablet servers.
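Since both answers hinge on BatchWriter behavior and mutation distribution, here is a hedged sketch of the knobs involved (Accumulo 1.5+ API; the numbers, table name, and `job`/`connector` variables are illustrative, not recommendations):

```java
// Tune the BatchWriter that AccumuloOutputFormat uses under the hood.
BatchWriterConfig bwc = new BatchWriterConfig()
    .setMaxMemory(50 * 1024 * 1024L)    // buffer up to 50 MB of mutations
    .setMaxLatency(2, TimeUnit.MINUTES) // flush at least every 2 minutes
    .setMaxWriteThreads(4);             // threads sending to tablet servers
AccumuloOutputFormat.setBatchWriterOptions(job, bwc);

// Distribution (point 1) is helped by pre-splitting the destination table
// so writes fan out across tablet servers from the start:
SortedSet<Text> splits = new TreeSet<>();
for (char c = 'b'; c <= 'y'; c++) splits.add(new Text(String.valueOf(c)));
connector.tableOperations().addSplits("mytable", splits);
```

The split points should reflect your actual row-key distribution; single-character splits only make sense for keys that are roughly uniform over the alphabet.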
Hello,
I am investigating how well Accumulo will handle MapReduce jobs. I am
interested in hearing about any known issues from anyone running MapReduce
with Accumulo as their source and sink. Specifically, I want to hear your
thoughts about the following:
Assume the cluster has 50 nodes.
Accumulo running on all of them.