[jira] [Comment Edited] (YARN-5269) Bubble exceptions and errors all the way up the calls, including to clients.

2017-03-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903540#comment-15903540
 ] 

Haibo Chen edited comment on YARN-5269 at 3/9/17 10:08 PM:
---

Based on today's discussion, the questions we need to answer are:
1) For the synchronous putEntities() API, what do we promise if no 
error/exception is returned to the client? In which scenarios do we bubble 
exceptions/errors up to clients?
2) The same questions for the asynchronous write API.

The goal is to make the semantics and guarantees of our write API explicit so 
that clients have correct expectations. I'll check the existing code base and 
share my findings. [~vrushalic], [~jrottinghuis] can chime in on more 
complicated scenarios where the spooled-buffered-mutator is involved.



> Bubble exceptions and errors all the way up the calls, including to clients.
> 
>
> Key: YARN-5269
> URL: https://issues.apache.org/jira/browse/YARN-5269
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Joep Rottinghuis
>Assignee: Haibo Chen
>  Labels: YARN-5355, yarn-5355-merge-blocker
>
> Currently we ignore (swallow) exceptions from the HBase side in many cases 
> (reads and writes).
> Also, on the client side, neither TimelineClient#putEntities (the v2 flavor) 
> nor the #putEntitiesAsync method returns any value.
> For the second drop we may want to consider how to properly bubble up 
> exceptions throughout the write and read call paths, and whether we want to 
> return a response from putEntities and some kind of future result from 
> putEntitiesAsync.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5269) Bubble exceptions and errors all the way up the calls, including to clients.

2017-03-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937223#comment-15937223
 ] 

Varun Saxena edited comment on YARN-5269 at 3/22/17 9:54 PM:
-

We actually discussed TimelinePutResponse a long time ago.
I think we expected TimelinePutResponse to be useful because 
HBaseTimelineWriterImpl#write writes entities one by one, so it is possible 
for one set of writes to succeed while another set fails.

But frankly, unless the buffer size is exceeded, writes are not persisted 
until flush is called.
Should we attempt a flush for a sync put even if an exception occurs on write? 
Can a flush succeed in such a scenario?
cc [~vrushalic]
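A minimal sketch of one possible policy for the sync path, assuming a hypothetical FakeMutator in place of HBase's buffered mutator: keep writing the remaining entities after a failure, still attempt the flush, and then throw the first error with any later ones attached as suppressed exceptions. Whether a real flush can succeed after a failed write is exactly the open question above; the sketch only shows the control flow.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a synchronous put that still flushes after a
// per-entity write failure, then reports every failure to the caller.
class FlushOnErrorDemo {
  public static void main(String[] args) {
    FakeMutator m = new FakeMutator("bad");
    try {
      syncPut(m, List.of("e1", "bad", "e2"));
    } catch (IOException e) {
      System.out.println(e.getMessage() + ", suppressed=" + e.getSuppressed().length);
    }
    System.out.println("persisted: " + m.persisted);
  }

  static void syncPut(FakeMutator mutator, List<String> entities) throws IOException {
    IOException failure = null;
    for (String entity : entities) {
      try {
        mutator.mutate(entity);           // buffered, not yet persisted
      } catch (IOException e) {
        if (failure == null) {
          failure = e;                    // remember the first write error
        } else {
          failure.addSuppressed(e);
        }
      }
    }
    try {
      mutator.flush();                    // persist whatever was buffered
    } catch (IOException e) {
      if (failure == null) {
        failure = e;
      } else {
        failure.addSuppressed(e);
      }
    }
    if (failure != null) {
      throw failure;                      // never report success on partial failure
    }
  }
}

// Stand-in for a buffered mutator: rejects one "poison" entity, buffers the rest.
class FakeMutator {
  private final String poison;
  private final List<String> buffer = new ArrayList<>();
  final List<String> persisted = new ArrayList<>();

  FakeMutator(String poison) {
    this.poison = poison;
  }

  void mutate(String entity) throws IOException {
    if (entity.equals(poison)) {
      throw new IOException("write failed for " + entity);
    }
    buffer.add(entity);
  }

  void flush() throws IOException {
    persisted.addAll(buffer);
    buffer.clear();
  }
}
```

In this sketch the caller gets an exception even though e1 and e2 were persisted, so a failed sync put does not imply that nothing was written.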






[jira] [Comment Edited] (YARN-5269) Bubble exceptions and errors all the way up the calls, including to clients.

2017-03-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937308#comment-15937308
 ] 

Haibo Chen edited comment on YARN-5269 at 3/22/17 10:48 PM:


Per offline discussion with [~vrushalic] and [~jrottinghuis] in the last weekly 
sync: for putEntities() requests, an exception can occur even after the data has 
already been persisted in the backend (i.e., an exception thrown by putEntities() 
does not necessarily mean the data was not written to the backend), so 
TimelineClients must always make sure their writes are idempotent if they ever 
want to retry.
Whether we want to throw an exception only if an error occurred specifically 
because of particular entities, or in any case where there was a problem writing 
the whole batch, is another question. The errors we get from HBase may be too 
coarse to let us report errors on a per-entity basis.
[~vrushalic] [~jrottinghuis] Please correct me if I have misunderstood anything.
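To illustrate why retries require idempotent writes, here is a minimal sketch with a hypothetical FlakyStore whose first call persists the row and then throws, modelling an error that surfaces after the data reached the backend. Because the write is an upsert keyed by entity id, retrying leaves exactly one row; an append-style write would have produced a duplicate.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

class IdempotentRetryDemo {
  public static void main(String[] args) throws IOException {
    FlakyStore store = new FlakyStore(1);   // first call persists, then throws
    putWithRetry(store, "app_1", "FINISHED", 3);
    System.out.println(store.rows);         // {app_1=FINISHED}
  }

  // Retries a keyed (idempotent) write; re-applying it cannot create duplicates.
  static void putWithRetry(FlakyStore store, String id, String value, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        store.put(id, value);
        return;
      } catch (IOException e) {
        last = e;                           // data may or may not be persisted
      }
    }
    throw last;
  }
}

// Hypothetical store whose first N puts persist the row but then throw,
// modelling an error that surfaces after the data reached the backend.
class FlakyStore {
  final Map<String, String> rows = new LinkedHashMap<>();
  private int failuresLeft;

  FlakyStore(int failuresLeft) {
    this.failuresLeft = failuresLeft;
  }

  void put(String id, String value) throws IOException {
    rows.put(id, value);                    // upsert keyed by id: idempotent
    if (failuresLeft > 0) {
      failuresLeft--;
      throw new IOException("ack lost after persist");
    }
  }
}
```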










[jira] [Comment Edited] (YARN-5269) Bubble exceptions and errors all the way up the calls, including to clients.

2017-04-12 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967126#comment-15967126
 ] 

Haibo Chen edited comment on YARN-5269 at 4/13/17 4:46 AM:
---

bq.  let's restrict the focus for this jira to showing any exception back to 
the client as and when we determine that a problem occurred writing to HBase
On the client side, any error in the response is wrapped in an exception 
(after retries) and thrown correctly for TimelineV2Client.putEntities() calls. 
On the server side, TimelineCollectorWebService wraps the 
TimelineCollector.putEntities() and TimelineCollector.putEntitiesAsync() calls 
in a try-catch clause, so any problem writing to HBase that TimelineCollector 
signals by throwing an exception from putEntities()/putEntitiesAsync() is 
returned to the client as a 500 error response.
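A framework-free sketch of the server-side pattern described above. Collector and HttpResult are hypothetical stand-ins for TimelineCollector and the JAX-RS response used by TimelineCollectorWebService, so this shows only the exception-to-500 mapping, not the real web service code.

```java
import java.io.IOException;
import java.util.List;

class CollectorEndpointDemo {
  // Hypothetical stand-in for TimelineCollector#putEntities.
  interface Collector {
    void putEntities(List<String> entities) throws IOException;
  }

  static final class HttpResult {
    final int status;
    final String body;

    HttpResult(int status, String body) {
      this.status = status;
      this.body = body;
    }

    @Override
    public String toString() {
      return status + " " + body;
    }
  }

  // Any exception from the collector becomes a 500 carrying its message.
  static HttpResult handlePut(Collector collector, List<String> entities) {
    try {
      collector.putEntities(entities);
      return new HttpResult(200, "OK");
    } catch (Exception e) {
      return new HttpResult(500, String.valueOf(e.getMessage()));
    }
  }

  public static void main(String[] args) {
    System.out.println(handlePut(entities -> { }, List.of("e1")));
    System.out.println(handlePut(entities -> {
      throw new IOException("HBase unavailable");
    }, List.of("e1")));
  }
}
```

The client learns that the whole request failed and why, but not which entities, if any, were written.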

What we are missing, though, is a check of the TimelineWriteResponse returned by 
TimelineCollector.putEntities(). Per the discussion above with Varun and its 
javadoc, TimelineWriteResponse can report errors in detail, together with the 
individual entities they are associated with. To comply with the current 
TimelineCollector API, it seems that we should inspect the TimelineWriteResponse 
and return all errors found to clients.
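If the collector did inspect the write response, the missing step might look like the following sketch. WriteResponse and WriteError are simplified, hypothetical stand-ins modelled loosely on the javadoc description of TimelineWriteResponse (errors plus the entities they belong to), not its actual API.

```java
import java.util.ArrayList;
import java.util.List;

class WriteResponseDemo {
  // Hypothetical per-entity error record.
  static final class WriteError {
    final String entityId;
    final int errorCode;

    WriteError(String entityId, int errorCode) {
      this.entityId = entityId;
      this.errorCode = errorCode;
    }
  }

  // Hypothetical response: an empty error list means "no errors reported".
  static final class WriteResponse {
    final List<WriteError> errors = new ArrayList<>();
  }

  // The step the web service would add: surface every reported error instead
  // of treating a normal return as success.
  static List<String> collectErrors(WriteResponse response) {
    List<String> messages = new ArrayList<>();
    for (WriteError err : response.errors) {
      messages.add(err.entityId + ": error code " + err.errorCode);
    }
    return messages;
  }

  public static void main(String[] args) {
    WriteResponse response = new WriteResponse();
    response.errors.add(new WriteError("container_1", 1));
    List<String> problems = collectErrors(response);
    System.out.println(problems.isEmpty() ? "all written" : "errors: " + problems);
  }
}
```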

However, TimelineWriteResponse is of little use in our current implementation, 
for two reasons:

1) The TimelineWriteResponse returned by TimelineCollector.putEntities() is the 
one returned by TimelineWriter.write(). HBaseTimelineWriterImpl.write() is 
essentially an asynchronous operation, and the TimelineWriteResponse it returns 
never contains any error. Problems are signaled solely by throwing IOException:
{code}
public class HBaseTimelineWriterImpl extends AbstractService implements
    TimelineWriter {
  public TimelineWriteResponse write(String clusterId, String userId,
      String flowName, String flowVersion, long flowRunId, String appId,
      TimelineEntities data) throws IOException {

    TimelineWriteResponse putStatus = new TimelineWriteResponse();
    // defensive coding to avoid NPE during row key construction
    if ((flowName == null) || (appId == null) || (clusterId == null)
        || (userId == null)) {
      LOG.warn("Found null for one of: flowName=" + flowName + " appId=" + appId
          + " userId=" + userId + " clusterId=" + clusterId
          + " . Not proceeding with writing to hbase");
      return putStatus;
    }
    ...
    return putStatus;
  }
}
{code}
For example, when any of flowName, appId, userId or clusterId is null, the 
write is skipped with only a warning logged, and no error is included in the 
response.

2) If we did honor TimelineWriteResponse, we could no longer just return a 500 
with an exception message to clients. A more involved structure would be needed 
to return errors from TimelineCollector to TimelineV2Client, and callers of 
TimelineV2Client.putEntities(TimelineEntity... entities) would be expected to 
handle every kind of error, including partial failures, unless we change 
putEntities() to accept only one entity per call.
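The "one entity per call" alternative in 2) can be sketched on the client side as follows, assuming a hypothetical SinglePut interface that writes a single entity and throws IOException on failure. Each failure is then attributable to exactly one entity, so the caller knows precisely what to retry, at the cost of one request per entity.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PerEntityPutDemo {
  // Hypothetical single-entity write; not a real TimelineV2Client method.
  interface SinglePut {
    void put(String entity) throws IOException;
  }

  // Sends each entity in its own call and records which ones failed.
  static Map<String, IOException> putEach(SinglePut client, List<String> entities) {
    Map<String, IOException> failures = new LinkedHashMap<>();
    for (String entity : entities) {
      try {
        client.put(entity);
      } catch (IOException e) {
        failures.put(entity, e);      // the caller knows exactly what to retry
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    SinglePut flaky = entity -> {
      if (entity.startsWith("bad")) {
        throw new IOException("rejected " + entity);
      }
    };
    System.out.println(putEach(flaky, List.of("e1", "bad2", "e3")).keySet()); // [bad2]
  }
}
```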

It is OK not to handle TimelineWriteResponse for now, given that 
HBaseTimelineWriterImpl always returns a dummy one and is the only backend 
implementation. Thus, I am removing the yarn-5355-merge-blocker label.

But once we expect alternative backend/TimelineWriter implementations, we will 
need to either add support in TimelineCollectorWebService and TimelineV2Client 
for handling detailed error responses, or remove TimelineWriteResponse from the 
TimelineWriter.write() signature entirely if we cannot make use of it as 
intended.

