[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-6376:
-------------------------------
    Fix Version/s: 2.9.0

> Exceptions caused by synchronous putEntities requests can be swallowed
> ----------------------------------------------------------------------
>
>                 Key: YARN-6376
>                 URL: https://issues.apache.org/jira/browse/YARN-6376
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Critical
>              Labels: atsv2-hbase, yarn-5355-merge-blocker
>             Fix For: 2.9.0, YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>
>         Attachments: YARN-6376.00.patch
>
>
> TimelineCollector.putEntities() is currently implemented by calling
> TimelineWriter.write() followed by TimelineWriter.flush(). Given that
> HBaseTimelineWriter.write() is an asynchronous operation, it is possible that
> TimelineClient sends a synchronous putEntities() request for critical data
> but never gets back an exception, even though the HBase write request to
> store the entities may have failed.
> This is due to a race condition between the WriterFlushThread in
> TimelineCollectorManager and the web threads handling synchronous
> putEntities() requests. Entities are first put into the buffer by the web
> thread. It is possible that, before the web thread invokes writer.flush(),
> the WriterFlushThread fires and flushes the writer. If the entities are not
> successfully written to the backend during that flush, the WriterFlushThread
> simply logs an error, whereas the web thread never gets an exception from
> its own writer.flush() invocation. This is bad because the reason
> TimelineClient sends putEntities() synchronously is to retry upon any
> exception.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
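The interleaving described in the issue can be reproduced with a small, self-contained simulation. This is only a sketch: BufferingWriter and SwallowedExceptionDemo are hypothetical stand-ins, not Hadoop classes, and the sketch assumes that a failed flush consumes the buffered entities, so a later flush has nothing left to fail on.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a buffering, asynchronous writer (not a Hadoop class). */
class BufferingWriter {
  private final List<String> buffer = new ArrayList<>();
  private final boolean backendDown;

  BufferingWriter(boolean backendDown) {
    this.backendDown = backendDown;
  }

  /** Only buffers the entity; nothing reaches the backend yet. */
  synchronized void write(String entity) {
    buffer.add(entity);
  }

  /** Sends whatever is buffered; a failure is visible only to the caller of this flush. */
  synchronized void flush() throws IOException {
    if (buffer.isEmpty()) {
      return; // nothing pending, so nothing can fail
    }
    int pending = buffer.size();
    buffer.clear(); // assumption: pending entities are consumed even if the send fails
    if (backendDown) {
      throw new IOException("backend rejected " + pending + " entities");
    }
  }
}

class SwallowedExceptionDemo {
  /** Records which thread observed the backend failure. */
  static final class Outcome {
    boolean flushThreadSawFailure;
    boolean webThreadSawFailure;
  }

  static Outcome run() {
    BufferingWriter writer = new BufferingWriter(true);
    Outcome outcome = new Outcome();

    // Step 1: the "web thread" buffers the entities of a synchronous request.
    writer.write("critical-entity");

    // Step 2: the periodic flush thread runs before the web thread reaches flush().
    Thread flushThread = new Thread(() -> {
      try {
        writer.flush();
      } catch (IOException e) {
        outcome.flushThreadSawFailure = true; // the failure is only logged here
        System.err.println("flush thread: " + e.getMessage());
      }
    });
    flushThread.start();
    try {
      flushThread.join(); // forces the problematic ordering deterministically
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }

    // Step 3: the web thread flushes an already-empty buffer and sees no error.
    try {
      writer.flush();
    } catch (IOException e) {
      outcome.webThreadSawFailure = true;
    }
    return outcome;
  }

  public static void main(String[] args) {
    Outcome o = run();
    System.out.println("flush thread saw failure: " + o.flushThreadSawFailure);
    System.out.println("web thread saw failure:   " + o.webThreadSawFailure);
  }
}
```

Under these assumptions the failure is recorded on the flush thread only, so the caller of the synchronous request has nothing to retry on.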
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vrushali C updated YARN-6376:
-----------------------------
    Labels: atsv2-hbase yarn-5355-merge-blocker  (was: yarn-5355-merge-blocker)

> Exceptions caused by synchronous putEntities requests can be swallowed
> ----------------------------------------------------------------------
>
>                 Key: YARN-6376
>                 URL: https://issues.apache.org/jira/browse/YARN-6376
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Critical
>              Labels: atsv2-hbase, yarn-5355-merge-blocker
>             Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>
>         Attachments: YARN-6376.00.patch
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-6376:
-----------------------------
    Attachment: YARN-6376.00.patch

> Exceptions caused by synchronous putEntities requests can be swallowed
> ----------------------------------------------------------------------
>
>                 Key: YARN-6376
>                 URL: https://issues.apache.org/jira/browse/YARN-6376
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker
>         Attachments: YARN-6376.00.patch
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-6376:
-----------------------------
    Summary: Exceptions caused by synchronous putEntities requests can be swallowed  (was: Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector)

> Exceptions caused by synchronous putEntities requests can be swallowed
> ----------------------------------------------------------------------
>
>                 Key: YARN-6376
>                 URL: https://issues.apache.org/jira/browse/YARN-6376
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-6376:
-----------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: YARN-5355

> Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-6376
>                 URL: https://issues.apache.org/jira/browse/YARN-6376
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-6376:
-----------------------------
    Description: 
TimelineCollector.putEntities() is currently implemented by calling TimelineWriter.write() followed by TimelineWriter.flush(). Given that HBaseTimelineWriter.write() is an asynchronous operation, it is possible that TimelineClient sends a synchronous putEntities() request for critical data but never gets back an exception, even though the HBase write request to store the entities may have failed.

This is due to a race condition between the WriterFlushThread in TimelineCollectorManager and the web threads handling synchronous putEntities() requests. Entities are first put into the buffer by the web thread. It is possible that, before the web thread invokes writer.flush(), the WriterFlushThread fires and flushes the writer. If the entities are not successfully written to the backend during that flush, the WriterFlushThread simply logs an error, whereas the web thread never gets an exception from its own writer.flush() invocation. This is bad because the reason TimelineClient sends putEntities() synchronously is to retry upon any exception.

> Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-6376
>                 URL: https://issues.apache.org/jira/browse/YARN-6376
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: ATSv2
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker
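One possible way to close the race described in the issue is to make the synchronous path perform write() and flush() as a single critical section that the periodic flusher also has to enter. The sketch below only illustrates that idea; it is an assumption for discussion and does not describe what YARN-6376.00.patch actually does. SketchWriter and SynchronizedPutSketch are hypothetical names, and the writer is hard-wired to fail whenever it has something to flush.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffering writer whose backend always fails (not a Hadoop class). */
class SketchWriter {
  private final List<String> buffer = new ArrayList<>();

  void write(String entity) {
    buffer.add(entity);
  }

  void flush() throws IOException {
    if (buffer.isEmpty()) {
      return; // nothing pending, nothing to fail
    }
    buffer.clear();
    throw new IOException("simulated backend failure");
  }
}

class SynchronizedPutSketch {
  private final SketchWriter writer = new SketchWriter();

  /** Synchronous path: write and flush under one lock, so no other flush can slip in between. */
  void putEntitiesSync(String entity) throws IOException {
    synchronized (writer) {
      writer.write(entity);
      writer.flush(); // any failure for this entity is thrown to this caller
    }
  }

  /** Periodic path: takes the same lock; returns true if the flush failed (logged only). */
  boolean periodicFlush() {
    synchronized (writer) {
      try {
        writer.flush();
        return false;
      } catch (IOException e) {
        System.err.println("periodic flush failed: " + e.getMessage());
        return true;
      }
    }
  }

  /** Runs a periodic flush concurrently with a synchronous put and reports what the caller saw. */
  static boolean callerSeesFailure() {
    SynchronizedPutSketch collector = new SynchronizedPutSketch();
    Thread flusher = new Thread(collector::periodicFlush);
    flusher.start();

    boolean seen = false;
    try {
      collector.putEntitiesSync("critical-entity");
    } catch (IOException e) {
      seen = true; // the caller can now retry
    }

    try {
      flusher.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen;
  }
}
```

Whichever thread takes the lock first, the periodic flusher only ever sees an empty buffer in this sketch, so the failure always reaches the caller of putEntitiesSync(). The trade-off is that synchronous requests and periodic flushes serialize on one lock.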
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated YARN-6376:
-----------------------------
    Labels: yarn-5355-merge-blocker  (was: )

> Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-6376
>                 URL: https://issues.apache.org/jira/browse/YARN-6376
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: ATSv2
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Haibo Chen
>            Priority: Critical
>              Labels: yarn-5355-merge-blocker