[ https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314445#comment-15314445 ]

Sangjin Lee edited comment on YARN-5167 at 6/3/16 5:25 PM:
-----------------------------------------------------------

I'm not quite sure. My thinking was to encode {{%}} first, implicitly, before 
encoding the series of separators. But what about the following sequence? 
{{SPACE.encode() -> Separator.encode(..., TAB, VALUES, QUALIFIERS)}}

When we encode for SPACE, we would first encode PERCENT. Then, in the second 
call, when we encode for the series of separators, we would encode PERCENT 
once more, including the {{%}} characters produced by the first call. So the 
encoding would be {{PERCENT -> SPACE -> PERCENT -> TAB, VALUES, QUALIFIERS}}. 
I'm not sure whether it is correct or efficient to encode percent multiple 
times. If that is not desirable, we must somehow require that callers of the 
Separator API call the {{encode()}} method only once.
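The sequence above can be reproduced with a small, self-contained sketch. Everything in it is hypothetical: a simplified stand-in enum, a PERCENT entry encoded as {{%9$}}, and encodings for QUALIFIERS, VALUES and TAB chosen for illustration (only SPACE -> {{%2$}} comes from the issue text). It shows that the second call re-escapes the {{%}} characters produced by the first, and that the result decodes correctly only when the two calls are undone in exact reverse order.

```java
/**
 * Hypothetical, simplified stand-in for the timeline service Separator enum.
 * Encodings are illustrative assumptions (only SPACE -> "%2$" comes from the
 * issue text); PERCENT is a hypothetical entry used to escape '%' itself.
 */
public class NestedPercentEncoding {

  enum Sep {
    PERCENT("%", "%9$"),
    QUALIFIERS("!", "%0$"),
    VALUES("=", "%1$"),
    SPACE(" ", "%2$"),
    TAB("\t", "%3$");

    final String value;
    final String encoded;

    Sep(String value, String encoded) {
      this.value = value;
      this.encoded = encoded;
    }
  }

  /** Escapes '%' first, then replaces each given separator. */
  static String encode(String token, Sep... seps) {
    String result = token.replace(Sep.PERCENT.value, Sep.PERCENT.encoded);
    for (Sep s : seps) {
      result = result.replace(s.value, s.encoded);
    }
    return result;
  }

  /** Mirror image of encode: restore separators, then unescape '%' last. */
  static String decode(String token, Sep... seps) {
    String result = token;
    for (Sep s : seps) {
      result = result.replace(s.encoded, s.value);
    }
    return result.replace(Sep.PERCENT.encoded, Sep.PERCENT.value);
  }

  public static void main(String[] args) {
    String original = "a b%c";
    // First call, like SPACE.encode(): escapes '%', then ' '.
    String once = encode(original, Sep.SPACE);
    // Second call escapes '%' again, including the '%' inside "%2$" and "%9$".
    String twice = encode(once, Sep.TAB, Sep.VALUES, Sep.QUALIFIERS);
    System.out.println(once);  // a%2$b%9$c
    System.out.println(twice); // a%9$2$b%9$9$c
    // Undoing both calls in reverse order recovers the original.
    String back = decode(decode(twice, Sep.TAB, Sep.VALUES, Sep.QUALIFIERS), Sep.SPACE);
    System.out.println(back);  // a b%c
    // A single decode over all separators does not: one '%' layer remains.
    System.out.println(decode(twice, Sep.SPACE, Sep.TAB, Sep.VALUES, Sep.QUALIFIERS)); // a%2$b%9$c
  }
}
```

The round trip still works, but only if the decoder knows exactly how many times, and in what grouping, {{encode()}} was applied, which is the contract the comment questions.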

Furthermore, the current call hierarchy is challenging. We expose the 
individual {{Separator.encode(String)}} as public, but we also expose 
{{Separator.encode(String, Separator...)}}, which in turn calls the former 
method. Some refactoring could tame this, but I wanted to point it out.
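One possible shape for such a refactoring, sketched under assumptions: only a varargs {{encode}}/{{decode}} pair is public, the per-separator replacement is a private helper, and {{%}} is escaped exactly once per public call. The enum, the {{%9$}} escape for {{%}}, and the encodings other than SPACE -> {{%2$}} are hypothetical, not the actual Separator API.

```java
/**
 * Hypothetical refactoring sketch: a single public encode/decode pair, with
 * per-separator replacement kept private so '%' escaping cannot stack up.
 * Names and encodings are illustrative assumptions.
 */
public class SingleEntryEncoding {

  enum Sep {
    QUALIFIERS("!", "%0$"),
    VALUES("=", "%1$"),
    SPACE(" ", "%2$"),
    TAB("\t", "%3$");

    // Hypothetical escape for '%' itself; applied only by the public methods.
    private static final String PERCENT = "%";
    private static final String ENCODED_PERCENT = "%9$";

    private final String value;
    private final String encoded;

    Sep(String value, String encoded) {
      this.value = value;
      this.encoded = encoded;
    }

    /** Single public entry point: escape '%' once, then each separator. */
    public static String encode(String token, Sep... seps) {
      String result = token.replace(PERCENT, ENCODED_PERCENT);
      for (Sep s : seps) {
        result = s.replaceValue(result);
      }
      return result;
    }

    /** Inverse of encode: restore separators first, then '%' last. */
    public static String decode(String token, Sep... seps) {
      String result = token;
      for (Sep s : seps) {
        result = result.replace(s.encoded, s.value);
      }
      return result.replace(ENCODED_PERCENT, PERCENT);
    }

    // Private on purpose: no '%' escaping here, so callers cannot
    // trigger an extra round of it through a per-separator method.
    private String replaceValue(String token) {
      return token.replace(value, encoded);
    }
  }

  public static void main(String[] args) {
    String raw = "a b\t=!%";
    String enc = Sep.encode(raw, Sep.SPACE, Sep.TAB, Sep.VALUES, Sep.QUALIFIERS);
    System.out.println(enc); // a%2$b%3$%1$%0$%9$
    String dec = Sep.decode(enc, Sep.SPACE, Sep.TAB, Sep.VALUES, Sep.QUALIFIERS);
    System.out.println(dec.equals(raw)); // true
  }
}
```

With only one public path, each value gets exactly one layer of {{%}} escaping, so the encode-once rule is enforced by the API rather than by caller discipline.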


was (Author: sjlee0):
I'm not quite sure. My thinking was to encode {{%}} first, implicitly, before 
encoding the series of separators. But what about the following sequence? 
{{SPACE.encode() -> Separator.encode(..., TAB, VALUES, QUALIFIERS)}}

When we encode for SPACE, we would first encode PERCENT. Then, in the second 
call, when we encode for the series of separators, we would encode PERCENT 
once more. So the encoding would be {{PERCENT -> SPACE -> PERCENT -> 
TAB, VALUES, QUALIFIERS}}. I'm not sure whether it is correct or efficient to 
encode percent multiple times.

Furthermore, the current call hierarchy is challenging. We expose the 
individual {{Separator.encode(String)}} as public, but we also expose 
{{Separator.encode(String, Separator...)}}, which in turn calls the former 
method.

> Escaping occurrences of encodedValues
> -------------------------------------
>
>                 Key: YARN-5167
>                 URL: https://issues.apache.org/jira/browse/YARN-5167
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Sangjin Lee
>            Priority: Critical
>              Labels: yarn-2928-1st-milestone
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in the case 
> of a "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding and then decoding "String with %2$ in it" would yield "String with   in it".
> We thought we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash, that should be 
> ok). Then we should encode all occurrences of the separators.
> On the way out, we should replace all occurrences of our encoded strings with 
> the original, except when prefixed by an escape character. Lastly, we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> Adding the following entry to TestSeparator#testEncodeDecode() demonstrates 
> what this JIRA should accomplish:
> {code}
>     testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  \\\\%2$ = no problem!", Separator.QUALIFIERS,
>         Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}
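A small sketch contrasting plain separator replacement with the percent-first idea from the comment above. Note that this swaps in that technique in place of the backslash scheme the description proposes. The separator encodings other than SPACE -> {{%2$}}, and the {{%9$}} escape for {{%}}, are assumptions for illustration. The naive version reproduces the corruption described for "String with %2$ in it", while the percent-first version round-trips the test string from the description.

```java
/**
 * Hypothetical sketch: naive separator replacement vs. escaping '%' first.
 * Encodings other than SPACE -> "%2$" (and "%9$" for '%') are assumptions.
 */
public class EscapeEncodedValues {

  // {value, encoded}: QUALIFIERS, VALUES, SPACE, TAB (assumed encodings).
  static final String[][] SEPARATORS = {
      {"!", "%0$"}, {"=", "%1$"}, {" ", "%2$"}, {"\t", "%3$"}};

  /** Replaces separators only; literal "%2$" in the input is left as-is. */
  static String naiveEncode(String s) {
    for (String[] sep : SEPARATORS) {
      s = s.replace(sep[0], sep[1]);
    }
    return s;
  }

  /** Cannot tell a literal "%2$" from an encoded space. */
  static String naiveDecode(String s) {
    for (String[] sep : SEPARATORS) {
      s = s.replace(sep[1], sep[0]);
    }
    return s;
  }

  /** Percent-first: after this, every '%' starts an encoded sequence. */
  static String encode(String s) {
    return naiveEncode(s.replace("%", "%9$"));
  }

  /** Restore separators, then unescape '%' last. */
  static String decode(String s) {
    return naiveDecode(s).replace("%9$", "%");
  }

  public static void main(String[] args) {
    String issueExample = "String with %2$ in it";
    // Naive round trip turns the literal "%2$" into a space.
    System.out.println(naiveDecode(naiveEncode(issueExample))); // String with   in it
    String testValue =
        "Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  \\\\%2$ = no problem!";
    System.out.println(decode(encode(testValue)).equals(testValue)); // true
  }
}
```

Because every {{%}} in the escaped string begins a {{%<digit>$}} sequence, a literal {{%2$}} in the input can no longer be mistaken for an encoded space, so no backslash handling is needed in this sketch.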



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
