[jira] [Commented] (YARN-5109) timestamps are stored unencoded causing parse errors

Varun Saxena (JIRA) Thu, 19 May 2016 12:24:56 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291983#comment-15291983
 ]


Varun Saxena commented on YARN-5109:
------------------------------------

Writing down what we discussed in the meeting regarding this.

I have attached a WIP patch with what I have done so far.
Some things may still need fine-tuning (null checks, etc.). It is probably not 
necessary to pass both limit and sizes into splitRanges. In any case, this 
works well for row keys as per my tests.

For column qualifiers, we could either encode the bytes (representing longs, 
ints, app IDs) before writing them to the backend, or use the same approach 
we adopted for row keys, i.e. do not encode, but ignore separators while 
splitting until each long, int, etc. has been fully read.
I had a patch for the former too, but the consensus seems to be towards the 
latter.
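
To make the second option concrete, here is a minimal sketch of splitting 
without encoding. It is not the code from the attached patch: the class 
FixedWidthSplit, the method split, the VARIABLE marker and sampleQualifier are 
hypothetical names used only for illustration. The assumption is that the 
reader knows the width of every numeric segment up front, so it can consume 
those bytes without looking for separators inside them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the patch.
class FixedWidthSplit {
  /** Marks a segment of unknown length that ends at the next separator. */
  static final int VARIABLE = -1;

  /**
   * Splits source into sizes.length segments. Fixed-size segments are read
   * blindly, so a separator byte inside them is treated as data. The last
   * segment, if VARIABLE, takes the rest of the array.
   */
  static List<byte[]> split(byte[] source, byte sep, int[] sizes) {
    List<byte[]> parts = new ArrayList<>();
    int pos = 0;
    for (int i = 0; i < sizes.length; i++) {
      boolean last = (i == sizes.length - 1);
      int end;
      if (sizes[i] != VARIABLE) {
        end = pos + sizes[i];
      } else if (last) {
        end = source.length;
      } else {
        end = pos;
        while (end < source.length && source[end] != sep) {
          end++;
        }
      }
      boolean separatorFollows = end < source.length && source[end] == sep;
      if (end > source.length || (!last && !separatorFollows)
          || (last && end != source.length)) {
        throw new IllegalArgumentException("malformed segment " + i);
      }
      parts.add(Arrays.copyOfRange(source, pos, end));
      pos = end + 1; // skip the separator that follows this segment
    }
    return parts;
  }

  /** Rebuilds the qualifier from the report (the part after "e!"). */
  static byte[] sampleQualifier() {
    byte[] id = "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.UTF_8);
    byte[] ts = {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB,
        'D', 'Y', '=', (byte) 0x99};
    byte[] key =
        "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(id.length + ts.length + key.length + 2)
        .put(id).put((byte) '=').put(ts).put((byte) '=').put(key).array();
  }

  public static void main(String[] args) {
    int[] sizes = {VARIABLE, Long.BYTES, VARIABLE};
    List<byte[]> parts = split(sampleQualifier(), (byte) '=', sizes);
    System.out.println("segments: " + parts.size());
    System.out.println("timestamp bytes as long: "
        + ByteBuffer.wrap(parts.get(1)).getLong());
  }
}
```

With the sizes {VARIABLE, Long.BYTES, VARIABLE} the "=" inside the timestamp 
is read as part of the long, and the qualifier splits into three segments. The 
trade-off is that the reader must know the layout of each key in advance.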

Sangjin and Joep thought that we should reorganize our row key and column 
qualifier parsing logic into a set of converters. It was decided that this 
approach will be explored further.
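
One possible shape for such converters is sketched below. This is only an 
illustration of the idea, not an agreed design: the interface Converter and 
the classes EventColumnName and EventColumnNameConverter are hypothetical 
names, and the sketch assumes that the event id itself never contains the 
separator byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical interface: one converter per key or qualifier layout.
interface Converter<T> {
  byte[] encode(T value);
  T decode(byte[] bytes);
}

// Value object for (event id)=(timestamp)=(event info key).
final class EventColumnName {
  final String eventId;
  final long timestamp;
  final String infoKey;

  EventColumnName(String eventId, long timestamp, String infoKey) {
    this.eventId = eventId;
    this.timestamp = timestamp;
    this.infoKey = infoKey;
  }
}

// Keeps the layout knowledge (order, widths, separator) in one place.
final class EventColumnNameConverter implements Converter<EventColumnName> {
  private static final byte SEP = (byte) '=';

  @Override
  public byte[] encode(EventColumnName value) {
    byte[] id = value.eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = value.infoKey.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(id.length + 1 + Long.BYTES + 1 + key.length)
        .put(id).put(SEP).putLong(value.timestamp).put(SEP).put(key).array();
  }

  @Override
  public EventColumnName decode(byte[] bytes) {
    // The event id is variable length, so scan for its terminating separator.
    int idEnd = 0;
    while (idEnd < bytes.length && bytes[idEnd] != SEP) {
      idEnd++;
    }
    // The timestamp is always 8 bytes; separators inside it are data.
    int tsStart = idEnd + 1;
    int tsEnd = tsStart + Long.BYTES;
    if (tsEnd >= bytes.length || bytes[tsEnd] != SEP) {
      throw new IllegalArgumentException("malformed event column name");
    }
    String id = new String(bytes, 0, idEnd, StandardCharsets.UTF_8);
    long ts = ByteBuffer.wrap(bytes, tsStart, Long.BYTES).getLong();
    String key = new String(bytes, tsEnd + 1, bytes.length - tsEnd - 1,
        StandardCharsets.UTF_8);
    return new EventColumnName(id, ts, key);
  }
}
```

A caller would then write new EventColumnNameConverter().decode(qualifier) 
instead of splitting the bytes itself, so each row key or column qualifier 
layout is described once and reused by both the writer and the reader.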

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.01.patch
>
>
> When we store timestamps (for example as part of the row key or part of the 
> column name for an event), the bytes are used as is without any encoding. If 
> the byte value happens to contain a separator character we use (e.g. "!" or 
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
>  incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) 
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event 
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
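
The failure in the quoted report can be reproduced with a small, standalone 
sketch. It does not use the timeline service code: NaiveSplitDemo, 
splitOnSeparator and reportedQualifier are hypothetical names, and the split 
is a plain "cut at every separator byte" loop, assumed here as a stand-in for 
separator-based parsing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, unrelated to the actual reader code.
class NaiveSplitDemo {
  /** Cuts source at every occurrence of sep, with no knowledge of types. */
  static List<byte[]> splitOnSeparator(byte[] source, byte sep) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= source.length; i++) {
      if (i == source.length || source[i] == sep) {
        parts.add(Arrays.copyOfRange(source, start, i));
        start = i + 1;
      }
    }
    return parts;
  }

  /** The qualifier from the report, without the "e!" column prefix. */
  static byte[] reportedQualifier() {
    byte[] id = "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.UTF_8);
    byte[] ts = {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB,
        'D', 'Y', '=', (byte) 0x99};
    byte[] key =
        "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(id.length + ts.length + key.length + 2)
        .put(id).put((byte) '=').put(ts).put((byte) '=').put(key).array();
  }

  public static void main(String[] args) {
    List<byte[]> parts = splitOnSeparator(reportedQualifier(), (byte) '=');
    // Three segments are expected, but the "=" inside the timestamp adds one.
    System.out.println("segments: " + parts.size());
  }
}
```

The split yields four segments instead of three because the 8-byte timestamp 
is cut into a 6-byte and a 1-byte piece, which is why the column name is 
rejected as incorrectly formatted.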



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
