Re: RF=1

Patrik Modesto Wed, 17 Aug 2011 04:29:30 -0700

Hi,

While investigating this issue, I found that Hadoop + Cassandra doesn't
work if you stop even one node in the cluster, regardless of RF.
ColumnFamilyRecordReader gets a list of nodes (according to the RF) but
uses only the local host, and if Cassandra is not running locally it
throws a RuntimeException, which in turn marks the MapReduce task as
failed.


I've created a patch that makes ColumnFamilyRecordReader try the local
node first and, if that fails, try the other nodes in its list. The
patch is here: http://pastebin.com/0RdQ0HMx (I think attachments are
not allowed on this ML).
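
The fallback idea described above can be sketched roughly as follows. This
is not the patch itself and does not use any Cassandra or Hadoop API: the
class ReplicaFailover, the method connectWithFallback, and the host
addresses are all hypothetical names chosen for illustration, and the
sketch assumes a Java version with lambdas (newer than what 0.7.8
targeted).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not part of Cassandra or of the patch.
class ReplicaFailover {

    /**
     * Tries the local host first (if it is one of the replicas), then the
     * remaining replicas in list order. Returns the first successful result
     * and throws only when every replica has failed.
     */
    static <T> T connectWithFallback(String localHost,
                                     List<String> replicas,
                                     Function<String, T> connect) {
        List<String> ordered = new ArrayList<>();
        if (replicas.contains(localHost)) {
            ordered.add(localHost);
        }
        for (String replica : replicas) {
            if (!replica.equals(localHost)) {
                ordered.add(replica);
            }
        }

        RuntimeException lastFailure = null;
        for (String host : ordered) {
            try {
                return connect.apply(host);
            } catch (RuntimeException e) {
                // Remember the failure and move on to the next replica.
                lastFailure = e;
            }
        }
        throw new RuntimeException("no replica reachable among " + ordered, lastFailure);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // Simulate the local node (10.0.0.2) being down.
        String used = connectWithFallback("10.0.0.2", replicas, host -> {
            if (host.equals("10.0.0.2")) {
                throw new RuntimeException("connection refused: " + host);
            }
            return host;
        });
        System.out.println("connected to " + used);
    }
}
```

Note that with RF=1 the replica list for a range holds a single node, so a
fallback like this only helps when that node is not the one that is down.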

Please test it and apply it. It is against version 0.7.8.

Regards,
P.


On Wed, Aug 3, 2011 at 13:59, aaron morton <aa...@thelastpickle.com> wrote:
> If you want to take a look, o.a.c.hadoop.ColumnFamilyRecordReader.getSplits()
> is the function that gets the splits.
>
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3 Aug 2011, at 16:18, Patrik Modesto wrote:
>
>> On Tue, Aug 2, 2011 at 23:10, Jeremiah Jordan
>> <jeremiah.jor...@morningstar.com> wrote:
>>> If you have RF=1, taking one node down is going to cause 25% of your
>>> data to be unavailable.  If you want to tolerate a machine going down
>>> you need at least RF=2; if you want to use quorum and have a
>>> machine go down, you need at least RF=3.
>>
>> I know I can have RF > 1, but I have limited resources and I don't care
>> about losing 25% of the data. RF > 1 basically means that if a node
>> goes down I have the data elsewhere, but what I need is to just ignore
>> a node's range when it goes down. I can handle that in my applications
>> using Thrift, but Hadoop MapReduce can't handle it. It just fails with
>> "Exception in thread "main" java.io.IOException: Could not get input
>> splits". Is there a way to tell Hadoop to ignore this range?
>>
>> Regards,
>> P.
>
>
