[ 
https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291498#comment-15291498
 ] 

Sangjin Lee commented on YARN-5109:
-----------------------------------

Good point about the column qualifiers not needing the correct order. One other 
complication with the column qualifiers is that the call chain is several 
levels deep. The changes to use the new {{split()}} method there would be a bit bigger.

If we were to go the route of encoding the bytes as well, there is one other 
issue we need to be mindful of. We need to guard against the occurrences of the 
encoded equivalent in the original bytes. For example, "=" would be encoded 
into "%1$". A problem would arise if the original bytes already contained "%1$", 
however unlikely that may be. Consider the following original bytes (totally 
made up with ASCII characters):
{noformat}
t=h%1$ig
{noformat}

If we simply encode "=", then we get
{noformat}
t%1$h%1$ig
{noformat}

Now, if we read this back and decode it, we would decode it to
{noformat}
t=h=ig
{noformat}

To do this properly, we'd need to "escape" the existing patterns *before* 
encoding for the separator. The reverse should be done when decoding it.
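
To make the escape-then-encode idea concrete, here is a minimal sketch. It works on strings rather than bytes for readability. The "=" to "%1$" mapping is the one discussed above; the "%0$" escape token for a literal "%" and all class and method names are assumptions made up for this illustration, not existing code.

```java
public class SeparatorEscapeSketch {
  // "=" -> "%1$" is the encoding discussed in the comment.
  static final String SEPARATOR = "=";
  static final String ENCODED_SEPARATOR = "%1$";
  // Hypothetical escape token for a literal "%"; not part of any existing code.
  static final String PERCENT = "%";
  static final String ESCAPED_PERCENT = "%0$";

  // Encodes only the separator, with no escaping (the problematic approach).
  static String naiveEncode(String s) {
    return s.replace(SEPARATOR, ENCODED_SEPARATOR);
  }

  static String naiveDecode(String s) {
    return s.replace(ENCODED_SEPARATOR, SEPARATOR);
  }

  // Escapes every existing "%" first, then encodes the separator.
  static String encode(String s) {
    return s.replace(PERCENT, ESCAPED_PERCENT).replace(SEPARATOR, ENCODED_SEPARATOR);
  }

  // Reverses the two steps of encode() in the opposite order.
  static String decode(String s) {
    return s.replace(ENCODED_SEPARATOR, SEPARATOR).replace(ESCAPED_PERCENT, PERCENT);
  }

  public static void main(String[] args) {
    String original = "t=h%1$ig";
    System.out.println(naiveDecode(naiveEncode(original))); // t=h=ig (corrupted)
    System.out.println(encode(original));                   // t%1$h%0$1$ig
    System.out.println(decode(encode(original)));           // t=h%1$ig (round trip)
  }
}
```

Because every "%" in the encoded form starts either "%0$" or "%1$", the decoder can no longer confuse an escaped literal with an encoded separator.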

To be clear, this is an existing issue (even with strings). We went ahead 
without handling this case, as we felt it was unlikely to occur in a 
string. But if we're going to revisit encoding, we might want to address that 
as well.

We can discuss the details offline if needed.

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>
> When we store timestamps (for example as part of the row key or part of the 
> column name for an event), the bytes are used as is without any encoding. If 
> the byte value happens to contain a separator character we use (e.g. "!" or 
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
>  incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) 
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event 
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
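
The failure in the description above can be reproduced in a few lines. This is a minimal sketch, assuming a naive parser that splits the column qualifier on every "=" byte; the class and method names are hypothetical and are not taken from the timeline service code. The timestamp bytes are the ones from the column name quoted in the description.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RawTimestampSplitSketch {
  // Splits on every occurrence of the separator byte, as a naive parser would.
  static List<byte[]> splitOn(byte[] source, byte separator) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= source.length; i++) {
      if (i == source.length || source[i] == separator) {
        byte[] part = new byte[i - start];
        System.arraycopy(source, start, part, 0, part.length);
        parts.add(part);
        start = i + 1;
      }
    }
    return parts;
  }

  // Builds (event id)=(timestamp)=(event info key) with the raw, unencoded timestamp.
  static byte[] buildQualifier() {
    byte[] eventId = "e!YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.US_ASCII);
    // \x7F\xFF\xFE\xABDY=\x99 from the description; the 7th byte is 0x3D, i.e. "=".
    byte[] timestamp = {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB, 'D', 'Y', '=', (byte) 0x99};
    byte[] infoKey = "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.US_ASCII);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(eventId, 0, eventId.length);
    out.write('=');
    out.write(timestamp, 0, timestamp.length);
    out.write('=');
    out.write(infoKey, 0, infoKey.length);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    List<byte[]> parts = splitOn(buildQualifier(), (byte) '=');
    // Three parts are expected, but the "=" inside the timestamp yields a fourth.
    System.out.println(parts.size()); // 4
  }
}
```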



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

