[ 
https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297099#comment-15297099
 ] 

Sangjin Lee commented on YARN-5109:
-----------------------------------

Thanks [~varun_saxena] for the patch! I think it's almost there. I have a few 
mostly minor comments. It would be great if you could address them.

In terms of the package placement of the key converter classes, should they be 
in the respective packages instead of all in common? For example, 
{{ApplicationRowKeyConverter}} is really used by the application table classes, 
so it would be more natural to have it in the application package, and so on. 
Thoughts?

Also, do we have a test that covers an encoded long whose bytes contain a 
separator? After all, that's what led us to uncover this issue. :)
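
A test along those lines might look like the sketch below. It does not use the classes from the patch: {{SeparatorInLongSketch}} and its methods are hypothetical, the (event id)=(timestamp)=(event info key) layout is taken from the issue description, and the timestamp value is picked only because its big-endian bytes contain both "!" (0x21) and "=" (0x3D). The parse is size-aware (it reads exactly 8 bytes for the long), which is an assumption about how the fix avoids the problem.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SeparatorInLongSketch {
  static final byte SEP = '=';

  // Builds (event id)=(8 raw timestamp bytes)=(event info key).
  static byte[] join(String eventId, long timestamp, String infoKey) {
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(id.length + 1 + Long.BYTES + 1 + key.length)
        .put(id).put(SEP).putLong(timestamp).put(SEP).put(key).array();
  }

  // Counts separator bytes anywhere in the column name.
  static int countSeparators(byte[] bytes) {
    int count = 0;
    for (byte b : bytes) {
      if (b == SEP) {
        count++;
      }
    }
    return count;
  }

  // Index of the first separator; assumes the event id itself has none.
  static int firstSeparator(byte[] columnName) {
    int i = 0;
    while (columnName[i] != SEP) {
      i++;
    }
    return i;
  }

  // Reads exactly Long.BYTES after the first separator, ignoring any
  // separator bytes that happen to occur inside the long.
  static long parseTimestamp(byte[] columnName) {
    int start = firstSeparator(columnName) + 1;
    return ByteBuffer.wrap(columnName, start, Long.BYTES).getLong();
  }

  // Everything after the fixed-width long and its trailing separator.
  static String parseInfoKey(byte[] columnName) {
    int start = firstSeparator(columnName) + 1 + Long.BYTES + 1;
    return new String(columnName, start, columnName.length - start,
        StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    long ts = 0x00000154213D2166L; // bytes: 00 00 01 54 21 3D 21 66
    byte[] col = join("EVT", ts, "KEY");
    System.out.println("separator bytes: " + countSeparators(col));
    System.out.println("round trip ok: " + (parseTimestamp(col) == ts));
    System.out.println("info key: " + parseInfoKey(col));
  }
}
```

A real test would presumably exercise the patch's converter classes instead, with a value like this one as the input.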

(AppIdKeyConverter.java)
- one small suggestion: how about adding a method {{getKeySize()}} to return 
the expected size of the key so that users of {{AppIdKeyConverter}} do not need 
to hard-code the size themselves?
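
To illustrate the {{getKeySize()}} suggestion, here is a minimal sketch. {{AppIdKeySketch}} is a hypothetical stand-in, and the layout (an 8-byte cluster timestamp followed by a 4-byte sequence id) is an assumption; the real converter may transform the values differently. The point is only that callers ask the converter for the size rather than hard-coding it.

```java
import java.nio.ByteBuffer;

public class AppIdKeySketch {
  // Assumed layout: cluster timestamp (long) followed by sequence id (int).
  static int getKeySize() {
    return Long.BYTES + Integer.BYTES;
  }

  static byte[] encode(long clusterTimestamp, int sequenceId) {
    return ByteBuffer.allocate(getKeySize())
        .putLong(clusterTimestamp)
        .putInt(sequenceId)
        .array();
  }

  static long decodeClusterTimestamp(byte[] key) {
    checkSize(key);
    return ByteBuffer.wrap(key).getLong();
  }

  static int decodeSequenceId(byte[] key) {
    checkSize(key);
    // Absolute read at the offset just past the long.
    return ByteBuffer.wrap(key).getInt(Long.BYTES);
  }

  private static void checkSize(byte[] key) {
    if (key.length != getKeySize()) {
      throw new IllegalArgumentException(
          "expected " + getKeySize() + " bytes, got " + key.length);
    }
  }

  public static void main(String[] args) {
    byte[] key = encode(1463437148774L, 7);
    // A caller that needs the segment size asks the converter for it.
    System.out.println(key.length == getKeySize());
  }
}
```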

(FlowActivityRowKeyConverter.java)
- l.55: Should we replace "" with {{Separator.EMPTY_BYTES}}? That should be 
equivalent, right?

(FlowRunRowKeyConverter.java)
- l.56: same as above

(EventColumnName.java)
- l.31: the {{super()}} call is superfluous; can we remove it?

(ColumnHelper.java)
- l.259: nit: typo ({{converteColumnKey}} -> {{converterColumnKey}}, or something similar?)

(Separator.java)
- l.71: I think {{NO_LIMIT_SPLIT}} and {{VARIABLE_SIZE}} are getting confusing. 
Since we're using {{VARIABLE_SIZE}} for the most part, can we remove 
{{NO_LIMIT_SPLIT}}?
- l.491: I think this now calls {{split(byte[], byte[], int[])}}, not 
{{split(byte[], byte[], int)}}, which is quite confusing. We should eliminate 
the ambiguity here by explicitly passing the right argument instead of 
just {{null}}.
- we should make {{splitRanges()}} private or package-private
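
The overload point on l.491 can be reproduced in isolation. In the sketch below, the two {{split}} methods are hypothetical stand-ins that only mirror the two signatures named above and return a label instead of doing any splitting. Because {{null}} cannot be converted to the primitive {{int}}, a bare {{null}} argument can only bind to the {{int[]}} overload; a cast or a named constant makes that choice visible to the reader.

```java
public class OverloadSketch {
  // Stand-in for split(byte[], byte[], int): limit-based variant.
  static String split(byte[] source, byte[] separator, int limit) {
    return "int";
  }

  // Stand-in for split(byte[], byte[], int[]): sizes-based variant.
  static String split(byte[] source, byte[] separator, int[] sizes) {
    return "int[]";
  }

  // A bare null silently selects the int[] overload.
  static String callWithNull() {
    return split(new byte[0], new byte[0], null);
  }

  // Same overload, but the intent is explicit at the call site.
  static String callWithCast() {
    return split(new byte[0], new byte[0], (int[]) null);
  }

  // An int argument selects the limit-based overload.
  static String callWithLimit() {
    return split(new byte[0], new byte[0], -1);
  }

  public static void main(String[] args) {
    System.out.println(callWithNull());  // prints "int[]"
    System.out.println(callWithCast());  // prints "int[]"
    System.out.println(callWithLimit()); // prints "int"
  }
}
```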

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.003.patch, 
> YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch, 
> YARN-5109-YARN-2928.03.patch
>
>
> When we store timestamps (for example as part of the row key or part of the 
> column name for an event), the bytes are used as-is without any encoding. If 
> those bytes happen to contain a separator character we use (e.g. "!" or 
> "="), parsing fails when we read the value back.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
>  incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) 
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event 
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
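
The failure described above can be reproduced with the reported bytes. The sketch below is not the timeline service code: {{NaiveSplitSketch}} and {{naiveSplit}} are hypothetical, and the "i:e!" prefix is left out so that only the (event id)=(timestamp)=(event info key) portion is split. A splitter that treats every "=" byte as a boundary yields four segments instead of three, and the timestamp is cut into a 6-byte piece and a 1-byte piece.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaiveSplitSketch {
  // Timestamp bytes from the report: \x7F\xFF\xFE\xABDY=\x99
  static final byte[] REPORTED_TIMESTAMP = {
      0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB, 'D', 'Y', '=', (byte) 0x99
  };

  // Rebuilds the column name portion after the "i:e!" prefix.
  static byte[] reportedColumnName() {
    byte[] id = "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.US_ASCII);
    byte[] key =
        "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.US_ASCII);
    return ByteBuffer
        .allocate(id.length + 1 + REPORTED_TIMESTAMP.length + 1 + key.length)
        .put(id).put((byte) '=')
        .put(REPORTED_TIMESTAMP).put((byte) '=')
        .put(key).array();
  }

  // Splits on every occurrence of the separator byte, with no size awareness.
  static List<byte[]> naiveSplit(byte[] source, byte separator) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= source.length; i++) {
      if (i == source.length || source[i] == separator) {
        parts.add(Arrays.copyOfRange(source, start, i));
        start = i + 1;
      }
    }
    return parts;
  }

  public static void main(String[] args) {
    List<byte[]> parts = naiveSplit(reportedColumnName(), (byte) '=');
    System.out.println("segments: " + parts.size());
    System.out.println("second segment length: " + parts.get(1).length);
  }
}
```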


