[ 
https://issues.apache.org/jira/browse/YARN-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530564#comment-15530564
 ] 

Joep Rottinghuis commented on YARN-4561:
----------------------------------------

For late-arriving data that was spooled using whatever we come up with in 
YARN-4061, we can still recognize records that came in after we got rid of the 
finalized application's last write.
If we can mark the _first_ write with a special value, just as the final 
value is marked, then we can later distinguish the case where we have a 
late-arriving update.

Aside from the very first record (marked special), we will always have an 
existing record that will slide in front of or behind existing values. In other 
words, we see either a new value that is the first, or we see multiple 
values. We could use that to detect whether late values come in. The trick will 
be to see if we can do this after the fact (in the read or flush compaction) 
rather than during writes. This may have to be a write-time coprocessor that 
processes data only if it is older than the normal time window (more than a day 
old). For those cases we might have to do reads during writes, or at least mark 
records as suspicious for later analysis.
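
To make the idea above concrete, here is a minimal sketch of the two checks, 
under stated assumptions. It is not taken from YARN or HBase: the class, enum, 
and method names are hypothetical, the one-day window is only inferred from 
"more than a day old", and the read-time rule is one possible reading of 
"a new value and it is the first, or we see multiple values".

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; none of these names exist in YARN or HBase. */
final class LateArrivalSketch {

    enum WriteKind { FIRST, IN_WINDOW, SUSPICIOUS }

    // Assumption: the "normal time window" is one day.
    static final long NORMAL_WINDOW_MS = TimeUnit.DAYS.toMillis(1);

    /** Write-time check: only data older than the window is flagged. */
    static WriteKind classifyAtWrite(boolean markedFirst,
                                     long cellTimestampMs,
                                     long nowMs) {
        if (markedFirst) {
            return WriteKind.FIRST;
        }
        long ageMs = nowMs - cellTimestampMs;
        return ageMs > NORMAL_WINDOW_MS
            ? WriteKind.SUSPICIOUS
            : WriteKind.IN_WINDOW;
    }

    /**
     * After-the-fact check (read or flush): a lone value that does not
     * carry the first-write marker suggests its neighbours were already
     * removed, so it probably arrived late.
     */
    static boolean looksLateAtRead(int valuesSeen, boolean anyMarkedFirst) {
        return valuesSeen == 1 && !anyMarkedFirst;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long twoDaysAgo = now - TimeUnit.DAYS.toMillis(2);
        System.out.println(classifyAtWrite(false, twoDaysAgo, now));
        System.out.println(looksLateAtRead(1, false));
    }
}
```

The write-time variant needs no read of existing data; it only marks old 
records as suspicious so that a later pass can analyze them.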

> Compaction coprocessor enhancements: On/Off, whitelisting, blacklisting
> -----------------------------------------------------------------------
>
>                 Key: YARN-4561
>                 URL: https://issues.apache.org/jira/browse/YARN-4561
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>              Labels: YARN-5355
>
> YARN-4062 deals with the basic flush- and compaction-related coprocessor 
> functionality. We also need to ensure we can turn compaction on/off as a 
> whole (for example, when dealing with production issues) as well as provide a 
> way to blacklist and whitelist certain records for compaction processing.
> For instance, we may want to compact only those records which belong to 
> applications in that datacenter. This way we do not interfere with HBase 
> replication, which could otherwise cause coprocessors to process the same 
> record in more than one datacenter at the same time.
> Also, we might want to not compact/process certain records, perhaps those 
> whose rowkey matches certain criteria.
> Filing JIRA to track these enhancements.
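
The three controls described in the issue (global on/off, a datacenter 
whitelist, and a rowkey blacklist) could be combined roughly as follows. This 
is only an illustrative sketch: the class and parameter names are hypothetical, 
rowkeys are simplified to strings (HBase rowkeys are byte arrays), and nothing 
here reflects the actual coprocessor code.

```java
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical gate deciding whether a record should be compacted. */
final class CompactionGateSketch {

    private final boolean enabled;                  // global on/off switch
    private final Set<String> allowedDatacenters;   // whitelist; empty = all
    private final Predicate<String> rowKeyBlacklist; // true = skip record

    CompactionGateSketch(boolean enabled,
                         Set<String> allowedDatacenters,
                         Predicate<String> rowKeyBlacklist) {
        this.enabled = enabled;
        this.allowedDatacenters = allowedDatacenters;
        this.rowKeyBlacklist = rowKeyBlacklist;
    }

    boolean shouldProcess(String datacenter, String rowKey) {
        if (!enabled) {
            return false; // compaction processing switched off as a whole
        }
        if (!allowedDatacenters.isEmpty()
                && !allowedDatacenters.contains(datacenter)) {
            return false; // record belongs to another datacenter
        }
        return !rowKeyBlacklist.test(rowKey);
    }
}
```

Checking the global switch first keeps the "turn it off in production" path 
independent of any whitelist or blacklist configuration.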


