[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

Ivan Kelly (JIRA) Wed, 18 Aug 2010 02:48:45 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899806#action_12899806
 ]


Ivan Kelly commented on ZOOKEEPER-702:
--------------------------------------

Hi Abmar,

The code looks good. Excellent job! I have a few comments about the API. 

API:
Ping and Heartbeat are used interchangably. 

There are heartbeatReceived & appMessageReceived and pingSent & appMessageSent. 
In the actual implementations these do the same or very similar things. It 
would make sense to compress them into one api, and have the type of message 
received/sent specified by a parameters. Whats more, if a new type of message 
comes along, the failure detectors will already handle it without each needing 
to have their code updated.

void messageReceived(String id, long now, MessageType type);
void messageSent(String id, long now, MessageType type);

The getFailedObjects and getObjectsToPing are used to return lists, which in 
all but one case have contains called on them once and are then discarded. I 
think these would be better as:

bool isFailed(String id, long now)
bool shouldPing(String id, long now)

So that:
a) You don't construct a list which is going to be discarded very soon.
b) The internals of the FailureDetector are better hidden from the user of the 
API.

In fact, shouldPing could be discarded and getTimeToNextPing(String id, long 
now) == 0 used.

release should be releaseMonitored to make it symmetrical with 
registerMonitored.

General:
The failure detection factory could use newInstance to create arbitary failure 
detectors. So after if (fdName.equals("slicedhb")) fails, you could check if 
fdName is a fully qualified class name, and if so find the class and create a 
newInstance. This way its possible to add a failure detector without having to 
modify the zookeeper code. 

There's a lot of duplication in the javadoc for the Zookeeper class. A lot of 
this could be eliminated by using the @see directive.

FailureDetector.java L131, typo, pint instead of ping.


> GSoC 2010: Failure Detector Model
> ---------------------------------
>
>                 Key: ZOOKEEPER-702
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
>             Project: Zookeeper
>          Issue Type: Wish
>            Reporter: Henry Robinson
>            Assignee: Abmar Barros
>         Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
> chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
> ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed 
> systems protocols
> Description
> ZooKeeper servers detects the failure of other servers and clients by 
> counting the number of 'ticks' for which it doesn't get a heartbeat from 
> other machines. This is the 'timeout' method of failure detection and works 
> very well; however it is possible that it is too aggressive and not easily 
> tuned for some more unusual ZooKeeper installations (such as in a wide-area 
> network, or even in a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated 
> Java module, and implement several failure detectors to compare and contrast 
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
> is much more tunable and has some very interesting properties. This is a 
> great project if you are interested in distributed algorithms, or want to 
> help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

Reply via email to