Thanks for the input, Roland. I share a similar use case. @Renato, the gora.write.buffer.limit property can be overridden within the Hadoop Configuration. AFAIK we can override it in nutch-site.xml when using Nutch, or in core-site.xml when using Gora over Hadoop. This is how I have been tinkering with it. I was curious about what performance gains could be obtained.
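To make the trade-off concrete, here is a minimal sketch of how a write buffer limit changes flush behaviour. It is not Gora's code: BufferLimitSketch, BufferedStore, readLimit and run are hypothetical names, java.util.Properties only stands in for the Hadoop Configuration, and the property key is used as written in this thread, so check it against the Gora source before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class BufferLimitSketch {
    // Property key as written in this thread; verify it against the Gora source.
    static final String LIMIT_KEY = "gora.write.buffer.limit";
    // Default of 10K mentioned in the thread.
    static final int DEFAULT_LIMIT = 10000;

    /** Hypothetical stand-in for a store that buffers writes and flushes in batches. */
    static final class BufferedStore {
        private final int limit;
        private final List<String> buffer = new ArrayList<>();
        private int flushCount = 0;

        BufferedStore(int limit) {
            if (limit < 1) {
                throw new IllegalArgumentException("limit must be >= 1");
            }
            this.limit = limit;
        }

        /** Buffers one record and flushes once the buffer reaches the limit. */
        void write(String record) {
            buffer.add(record);
            if (buffer.size() >= limit) {
                flush();
            }
        }

        /** A real store would send the batch to the backend here. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            buffer.clear();
            flushCount++;
        }

        int flushCount() { return flushCount; }

        /** Records still waiting in the buffer, i.e. work left for the final flush. */
        int pending() { return buffer.size(); }
    }

    /** Reads the limit from the configuration, falling back to the default. */
    static int readLimit(Properties conf) {
        return Integer.parseInt(conf.getProperty(LIMIT_KEY, String.valueOf(DEFAULT_LIMIT)));
    }

    /** Writes the given number of records without a final flush. */
    static BufferedStore run(Properties conf, int records) {
        BufferedStore store = new BufferedStore(readLimit(conf));
        for (int i = 0; i < records; i++) {
            store.write("url-" + i);
        }
        return store;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        BufferedStore dflt = run(conf, 25000);
        System.out.println("limit=default flushes=" + dflt.flushCount()
                + " pending=" + dflt.pending());

        conf.setProperty(LIMIT_KEY, "1000"); // override, as one would in nutch-site.xml
        BufferedStore small = run(conf, 25000);
        System.out.println("limit=1000 flushes=" + small.flushCount()
                + " pending=" + small.pending());
    }
}
```

With 25000 records, the default limit gives 2 flushes and leaves 5000 records pending at the end of the run, while a limit of 1000 gives 25 smaller flushes and leaves nothing pending. The sketch says nothing about throughput, which still has to be measured against the real backend.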

On Tuesday, March 5, 2013, Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com> wrote:
> This is a very interesting topic to discuss; thank you for starting it, Lewis (:
> I think we have to consider two different application types: the ones doing real-time processing and the ones doing batch processing. For the former, a smaller flush threshold is probably a better choice; for the latter, a value depending on the application should be used, i.e. different applications might consider "batch operations" differently.
> Just one quick question here, Lewis: is it possible to set this parameter through the configuration file, or is it always hard-coded? I think it should be settable from outside Gora without having to recompile Gora every time we want to change it. What do you guys think?
>
> Renato M.
>
> On Mar 5, 2013 7:23 AM, "Roland" <rol...@rvh-gmbh.de> wrote:
>> Hi Lewis,
>>
>> for me (Nutch use case) a lower value is better, for 3 main reasons:
>> a) load is better distributed for the db backend
>> b) when running the Nutch fetcherJob, towards the end of the job you don't have to wait for Gora to flush all data to the backend, because it was mostly done during the fetching
>> c) during debugging you'll get Gora/Cassandra flushing errors much earlier
>>
>> I'm running with a 1k write buffer for Cassandra.
>>
>> --Roland
>>
>> On 01.03.2013 02:01, Lewis John Mcgibbney wrote:
>>
>> Hi,
>> We use the above class for write operations in the Nutch InjectorJob.
>> I am writing large URL lists to Cassandra using Gora and wonder if I can get it working better.
>> Currently I am getting around 10000 writes per 90 seconds. Don't get me wrong, I am working from a very primitive laptop and right now I am merely attempting to push the software.
>> What I want to know is: what is the consequence of altering the BUFFER_LIMIT_WRITE_VALUE?
>> Currently we set a default value of 10K for this limit, meaning that Gora batches flushes to reflect this value.
>> Is a higher or lower value better? Is there any evidence of better performance from changing this value?
>> I see it as pretty critical, so I want to understand more about this.
>> Thanks
>> Lewis
>>
>> --
>> Lewis
>>
>

--
*Lewis*