Re: Mapreduce, Indexing and Logging

2013-03-03 Thread Ed Kohlwey
As John mentioned, the specs of the three nodes will have a significant effect. Since you already have hardware selected, I would build it, run it, and then add nodes if the performance is low. You can use things like the Accumulo file output format to write directly to map files that you subsequentl

Re: Mapreduce, Indexing and Logging

2013-03-03 Thread Aji Janis
John and Ed, thank you both for your responses. Using Solr for search is a requirement. When we process data, there's quite a bit of information we are interested in indexing (dates, locations, etc.), and we use Solr for that. All the data will be stored in Accumulo after processing and then indexed in

Re: Mapreduce, Indexing and Logging

2013-03-03 Thread Ed Kohlwey
With respect to indexing, what are you trying to achieve? I have not used Solr with Accumulo but have done indexing directly in Accumulo, leveraging Lucene libraries as appropriate. You can get very good performance specific to your domain by doing so, and it's less O&M overhead. Of course then you

Re: Mapreduce, Indexing and Logging

2013-03-02 Thread John Vines
1. This is quite variable. It depends on your hardware specs, primarily CPU and disk throughput. It also depends on how your system is configured for these resources and your typical mutation size. How your mutations are distributed is another factor. 2. Under the hood, the output format uses a Bat

Mapreduce, Indexing and Logging

2013-03-02 Thread Aji Janis
Hello, I am investigating how well Accumulo will handle MapReduce jobs. I am interested in hearing about any known issues from anyone running MapReduce with Accumulo as their source and sink. Specifically, I want to hear your thoughts about the following: Assume the cluster has 50 nodes. Accumulo ru