[ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290991#comment-15290991 ]
Varun Saxena commented on YARN-5109:
------------------------------------

Thanks [~sjlee0]. I did realize this issue with timestamps in row keys. I was trying to work around it on the parse side, in the individual row key classes: do not encode the timestamp, and on the read side treat a separator byte as part of the long/int if that long/int hasn't been fully read yet. But your suggestion of a split looks great and reduces the changes, so I will code it that way.

For column qualifiers, though, this change isn't required and we could actually do the encoding, because for columns we will have either a single column value filter or a qualifier filter with prefixes applied. If I am not mistaken, we do not need to preserve ordering for column qualifiers. However, we can keep it consistent, because the split method can be changed as per your suggestion. I will remove the byte-encoding routine then.

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
> Key: YARN-5109
> URL: https://issues.apache.org/jira/browse/YARN-5109
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Varun Saxena
> Priority: Blocker
> Labels: yarn-2928-1st-milestone
>
> When we store timestamps (for example as part of the row key or part of the
> column name for an event), the bytes are used as is without any encoding. If
> the byte value happens to contain a separator character we use (e.g. "!" or
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
> incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event)
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
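The failure and the split-based fix discussed in the comment can be illustrated with a minimal Java sketch. It is hypothetical: `SizedSplitSketch`, `qualifier`, `splitOnSeparator`, `splitWithSizes` and `VARIABLE_SIZE` are illustrative names, not the actual Hadoop `Separator`/`TimelineStorageUtils` API, and the sketch ignores the `i:e!` column prefix. It rebuilds the qualifier from the log using the 8 timestamp bytes `7F FF FE AB 44 59 3D 99` (0x44, 0x59 and 0x3D are `D`, `Y` and `=`), shows that a naive split on `=` yields four parts instead of three, and then parses it correctly by reading the timestamp as a fixed 8-byte segment without scanning it for separators.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SizedSplitSketch {
  // Separator between qualifier segments, as in the issue description.
  static final byte SEP = (byte) '=';
  // Hypothetical marker for a segment whose length is not known up front.
  static final int VARIABLE_SIZE = -1;

  /** Builds (event id)=(8-byte timestamp)=(event info key); the i:e! prefix is omitted. */
  static byte[] qualifier(String eventId, long ts, String infoKey) {
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    // ByteBuffer is big-endian by default, so ts is written most significant byte first.
    return ByteBuffer.allocate(id.length + 1 + Long.BYTES + 1 + key.length)
        .put(id).put(SEP).putLong(ts).put(SEP).put(key).array();
  }

  /** Naive parse: cut at every separator byte, including ones inside the timestamp. */
  static List<byte[]> splitOnSeparator(byte[] source, byte sep) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < source.length; i++) {
      if (source[i] == sep) {
        parts.add(Arrays.copyOfRange(source, start, i));
        start = i + 1;
      }
    }
    parts.add(Arrays.copyOfRange(source, start, source.length));
    return parts;
  }

  /**
   * Size-aware parse: a fixed-size segment is copied verbatim without looking
   * for separators inside it; a VARIABLE_SIZE segment ends at the next
   * separator, or at the end of the input if it is the last segment.
   */
  static List<byte[]> splitWithSizes(byte[] source, byte sep, int[] sizes) {
    List<byte[]> parts = new ArrayList<>();
    int pos = 0;
    for (int k = 0; k < sizes.length; k++) {
      boolean last = (k == sizes.length - 1);
      int end;
      if (sizes[k] != VARIABLE_SIZE) {
        end = pos + sizes[k];
        if (end > source.length) {
          throw new IllegalArgumentException("segment " + k + " runs past the input");
        }
      } else if (last) {
        end = source.length;
      } else {
        end = pos;
        while (end < source.length && source[end] != sep) {
          end++;
        }
      }
      parts.add(Arrays.copyOfRange(source, pos, end));
      pos = end;
      if (!last) {
        // Every segment except the last must be followed by exactly one separator.
        if (pos >= source.length || source[pos] != sep) {
          throw new IllegalArgumentException("expected separator after segment " + k);
        }
        pos++;
      }
    }
    if (pos != source.length) {
      throw new IllegalArgumentException("trailing bytes after last segment");
    }
    return parts;
  }

  public static void main(String[] args) {
    byte[] q = qualifier("YARN_RM_CONTAINER_CREATED", 0x7FFFFEAB44593D99L,
        "YARN_CONTAINER_ALLOCATED_HOST");
    // The 0x3D ('=') byte inside the timestamp produces an extra, bogus part.
    System.out.println("naive parts: " + splitOnSeparator(q, SEP).size());
    List<byte[]> parts =
        splitWithSizes(q, SEP, new int[] {VARIABLE_SIZE, Long.BYTES, VARIABLE_SIZE});
    long ts = ByteBuffer.wrap(parts.get(1)).getLong();
    System.out.println("sized parts: " + parts.size() + ", ts=0x" + Long.toHexString(ts));
  }
}
```

Running `main` should print `naive parts: 4` followed by `sized parts: 3, ts=0x7ffffeab44593d99`. Only fixed-width fields such as longs and ints benefit from the size-aware split; in this sketch the variable-length string segments are assumed not to contain `=`, so in practice they would still need the separator encoding.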