Thanks for the input, Roland. I have a similar use case.
@Renato, the gora.write.buffer.limit property can be overridden within the
Hadoop Configuration. AFAIK we can override it in nutch-site.xml if using
Nutch, or in core-site.xml if using Gora over Hadoop.
This is the approach I have been tinkering with.
I was curious whether changing the value yields any performance gains.
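
For reference, a minimal sketch of what such an override might look like. The
property name is copied as written above and has not been verified, so check it
against the key your Gora version actually reads (it is defined alongside
BUFFER_LIMIT_WRITE_VALUE). The value 1000 simply mirrors the 1k buffer Roland
mentions and is not a recommendation.

```xml
<!-- Goes inside <configuration> in nutch-site.xml, or in core-site.xml
     when running Gora directly over Hadoop. -->
<property>
  <!-- Assumption: key name taken from this thread; verify it in the Gora source. -->
  <name>gora.write.buffer.limit</name>
  <!-- Flush to the backend after this many buffered writes (default is 10K). -->
  <value>1000</value>
</property>
```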


On Tuesday, March 5, 2013, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:
> This is a very interesting topic to discuss. Thank you for starting
it, Lewis (:
> I think we have to consider two different application types: the ones
doing real-time processing and the ones doing batch processing. For the
former, a smaller flush threshold is probably a better choice, and for
the latter, a value depending on the application should be used, i.e.
different applications might consider "batch operations" differently.
> Just one quick question here, Lewis: is it possible to set this
parameter through the configuration file, or is it always hard-coded? I
think it should be settable from outside Gora without having to recompile
Gora every time we want to change it. What do you guys think?
>
>
> Renato M.
>
> On Mar 5, 2013 7:23 AM, "Roland" <rol...@rvh-gmbh.de> wrote:
>> Hi Lewis,
>>
>> For me (Nutch use case), a lower value is better, for 3 main
reasons:
>> a) load is better distributed for the db backend
>> b) when running the Nutch fetcherJob, towards the end of the job you
don't have to wait for Gora to flush all data to the backend, because it was
mostly done during the fetching
>> c) during debugging you'll get Gora/Cassandra flushing errors much
earlier
>>
>> I'm running with a 1k write buffer for Cassandra.
>>
>> --Roland
>>
>> Am 01.03.2013 02:01, schrieb Lewis John Mcgibbney:
>>
>> Hi,
>> We use the above class for write operations in the Nutch InjectorJob.
>> I am writing large URL lists to Cassandra using Gora and wonder if I can
get it working better.
>> Currently I am getting around 10000 writes per 90 seconds. Don't get me
wrong, I am working from a very primitive laptop and right now I am merely
attempting to push the software.
>> What I want to know is: what is the consequence of altering the
BUFFER_LIMIT_WRITE_VALUE?
>> Currently we set a default value of 10K for this limit,
meaning that Gora batches flushes to reflect this value.
>> Is a higher or lower value better? Is there any evidence of better
performance from changing this value?
>> I see it as pretty critical, so I want to understand more about this.
>> Thanks
>> Lewis
>>
>> --
>> Lewis
>>
>>
>

-- 
*Lewis*
