[jira] [Comment Edited] (YARN-5167) Escaping occurences of encodedValues

2016-06-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315946#comment-15315946
 ] 

Varun Saxena edited comment on YARN-5167 at 6/5/16 5:07 PM:


Latest patch LGTM.
Will wait for Joep to have a look as well before committing it, in case he has 
any more comments.

Not related to the patch but noticed a typo in the javadoc below. It should be 
"byte" instead of "bye".
If Joep has any further comments, this can be fixed.
{code:title=Separator.java|borderStyle=solid}
/*
 * The bye representation of value.
 */
private final byte[] bytes;
{code}


was (Author: varun_saxena):
Latest patch LGTM.
Will wait for Joep to have a look as well before committing it, in case he has 
any more comments.

Not related to the patch but noticed a typo in the comment below. It should be 
"byte" instead of "bye".
If Joep has any further comments, this can be fixed.
{code}
/*
 * The bye representation of value.
 */
private final byte[] bytes;
{code}

> Escaping occurences of encodedValues
> 
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-5167-YARN-2928.01.patch, 
> YARN-5167-YARN-2928.02.patch, YARN-5167-YARN-2928.03.patch
>
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of 
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We though we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash that should be 
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to 
> the original except when it is prefixed by an escape character. Lastly we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> If we add the following entry to TestSeparator#testEncodeDecode() that 
> demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  
> %2$ = no problem!", Separator.QUALIFIERS,
> Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5167) Escaping occurences of encodedValues

2016-06-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315628#comment-15315628
 ] 

Varun Saxena edited comment on YARN-5167 at 6/4/16 8:53 PM:


Maybe we can add a note in javadoc stating that Separator#encode version with 
varargs should be preferred while encoding multiple separators, instead of 
calling single separator version of Separator#encode multiple times (because 
that will produce a longer encoded string).

Also, as we are using pre-compiled patterns now, we can probably use it while 
calling split in Separator#splitEncoded as well.


was (Author: varun_saxena):
Maybe we can add a note in javadoc stating that Separator#encode version with 
varargs should be preferred while encoding multiple separators instead of 
calling single separator version of Separator#encode multiple times.

Also, as we are using pre-compiled patterns now, we can probably use it while 
calling split in Separator#splitEncoded as well.

> Escaping occurences of encodedValues
> 
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-5167-YARN-2928.01.patch
>
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of 
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We though we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash that should be 
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to 
> the original except when it is prefixed by an escape character. Lastly we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> If we add the following entry to TestSeparator#testEncodeDecode() that 
> demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  
> %2$ = no problem!", Separator.QUALIFIERS,
> Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5167) Escaping occurences of encodedValues

2016-06-04 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315362#comment-15315362
 ] 

Joep Rottinghuis edited comment on YARN-5167 at 6/4/16 6:47 AM:


After initial scan the patch seems to be correct (but too tired to tell for 
sure, will look at it again with fresh eyes).
The javadoc for #encode
"This means that when encoding is already present in the token itself, this
   * is not a reversible process."
is no longer true right? The encoded sequence can now appear in the existing 
token and that will be totally fine with the % encoding business right?


was (Author: jrottinghuis):
After initial scan the patch seems to be correct (but too tired to tell for 
sure, will look at it again with fresh eyes).
The javadoc for #encode
"This means that when encoding is already present in the token itself, this
   * is not a reversible process."
is no longer true right? The encoded sequence can now appear in the existing 
token and that will be totally fine with the % escaping business right?

> Escaping occurences of encodedValues
> 
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-5167-YARN-2928.01.patch
>
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of 
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We though we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash that should be 
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to 
> the original except when it is prefixed by an escape character. Lastly we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> If we add the following entry to TestSeparator#testEncodeDecode() that 
> demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  
> %2$ = no problem!", Separator.QUALIFIERS,
> Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5167) Escaping occurences of encodedValues

2016-06-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314486#comment-15314486
 ] 

Varun Saxena edited comment on YARN-5167 at 6/3/16 6:28 PM:


What I was thinking was that we will encode % once. That is sanitize the 
incoming string. 
And then apply other separators. 
I wasn't thinking of applying it automatically inside the encode methods. 
However, till we repeat the same sequence in reverse as we used while encoding, 
that should work as well.  Will need to try out few examples though. It will 
however be unnecessary to encode % multiple times.

Cant we just document how to use it and the intention of it ? Just document 
that if you expect encoded values in your incoming strings, encode % as well.

After all, the sequence of separators used while encoding and decoding has to 
be taken care by the user of Separator enum anyways. Maybe users of this enum 
can take care of encoding % as well, if they expect the same sequence of 
characters in the incoming string, as the characters encountered after encoding 
other separators. 
We just provide them a mechanism to encode % as well by adding it in the 
Separator enum.


was (Author: varun_saxena):
What I was thinking was that we will encode % once. That is sanitize the 
incoming string. 
And then apply other separators. 
I wasn't thinking of applying it automatically inside the encode methods. 
However, till we repeat the same sequence in reverse as we used while encoding, 
that should work as well.  Will need to try out few examples though. It will 
however be unnecessary to encode % multiple times.

Cant we just document how to use it and the intention of it ? Just document 
that if you expect encoded values in your incoming strings, encode % as well.

After all, the sequence of separators used while encoding and decoding has to 
be taken care by the user of Separator enum anyways. Maybe users of this enum 
can take care of encoding % as well, if they expect the same sequence of 
characters as the characters encountered after encoding other separators. 
We just provide them a mechanism to encode % as well by adding it in the 
Separator enum.

> Escaping occurences of encodedValues
> 
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of 
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We though we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash that should be 
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to 
> the original except when it is prefixed by an escape character. Lastly we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> If we add the following entry to TestSeparator#testEncodeDecode() that 
> demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  
> %2$ = no problem!", Separator.QUALIFIERS,
> Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5167) Escaping occurences of encodedValues

2016-06-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314486#comment-15314486
 ] 

Varun Saxena edited comment on YARN-5167 at 6/3/16 5:56 PM:


What I was thinking was that we will encode % once. That is sanitize the 
incoming string. 
And then apply other separators. 
I wasn't thinking of applying it automatically inside the encode methods. 
However, till we repeat the same sequence in reverse as we used while encoding, 
that should work as well.  Will need to try out few examples though. It will 
however be unnecessary to encode % multiple times.

Cant we just document how to use it and the intention of it ? Just document 
that if you expect encoded values in your incoming strings, encode % as well.

After all, the sequence of separators used while encoding and decoding has to 
be taken care by the user of Separator enum anyways. Maybe users of this enum 
can take care of encoding % as well, if they expect the same sequence of 
characters as the characters encountered after encoding other separators. 
We just provide them a mechanism to encode % as well by adding it in the 
Separator enum.


was (Author: varun_saxena):
What I meant was that we will encode % once. That is sanitize the incoming 
string. 
And then apply other separators. 
I wasn't thinking of applying it automatically inside the encode methods. 
However, till we repeat the same sequence in reverse as we used while encoding, 
that should work as well.  Will need to try out few examples though. It will 
however be unnecessary to encode % multiple times.

Cant we just document how to use it and the intention of it ? Just document 
that if you expect encoded values in your incoming strings, encode % as well.

After all, the sequence of separators used while encoding and decoding has to 
be taken care by the user of Separator enum anyways. Maybe users of this enum 
can take care of encoding % as well, if they expect the same sequence of 
characters as the characters encountered after encoding other separators. 
We just provide them a mechanism to encode % as well by adding it in the 
Separator enum.

> Escaping occurences of encodedValues
> 
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of 
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We though we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash that should be 
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to 
> the original except when it is prefixed by an escape character. Lastly we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> If we add the following entry to TestSeparator#testEncodeDecode() that 
> demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  
> %2$ = no problem!", Separator.QUALIFIERS,
> Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5167) Escaping occurences of encodedValues

2016-06-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314445#comment-15314445
 ] 

Sangjin Lee edited comment on YARN-5167 at 6/3/16 5:25 PM:
---

I'm not quite sure. My thinking was to encode {{%}} first implicitly before 
encoding the series of separators. But what about the following sequence? 
{{SPACE.encode() -> Separator.encode(..., TAB, VALUES, QUALIFIERS)}}

When we encode for SPACE, we would first encode for PERCENT. But then in the 
second call when we encode for those series of separators, we would encode for 
PERCENT once more. So the encoding would be {{PERCENT -> SPACE -> PERCENT -> 
TAB, VALUES, QUALIFIERS}}. I'm not sure if it is correct/efficient to encode 
percent multiple times. If that is not desirable, we somehow must find a way to 
mandate that the caller of the Separator API must call the {{encode()}} method 
only once.

Furthermore, the current call hierarchy is challenging. We expose individual 
{{Separator.encode(String)}} as public. But we also expose 
{{Separator.encode(String, Separator...)}}, which in turn calls the former 
method. This could be tamed by making some refactoring, but wanted to point 
that out.


was (Author: sjlee0):
I'm not quite sure. My thinking was to encode {{%}} first implicitly before 
encoding the series of separators. But what about the following sequence? 
{{SPACE.encode() -> Separator.encode(..., TAB, VALUES, QUALIFIERS)}}

When we encode for SPACE, we would first encode for PERCENT. But then in the 
second call when we encode for those series of separators, we would encode for 
PERCENT once more. So the encoding would be {{PERCENT -> SPACE -> PERCENT -> 
TAB, VALUES, QUALIFIERS}}. I'm not sure if it is correct/efficient to encode 
percent multiple times.

Furthermore, the current call hierarchy is challenging. We expose individual 
{{Separator.encode(String)}} as public. But we also expose 
{{Separator.encode(String, Separator...)}}, which in turn calls the former 
method.

> Escaping occurences of encodedValues
> 
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Sangjin Lee
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of 
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We though we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash that should be 
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to 
> the original except when it is prefixed by an escape character. Lastly we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> If we add the following entry to TestSeparator#testEncodeDecode() that 
> demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  
> %2$ = no problem!", Separator.QUALIFIERS,
> Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5167) Escaping occurences of encodedValues

2016-05-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304596#comment-15304596
 ] 

Varun Saxena edited comment on YARN-5167 at 5/27/16 7:16 PM:
-

This we will target for 1st milestone / drop on trunk ? Personally I think we 
can.


was (Author: varun_saxena):
This we will target for 1st milestone ? Personally I think we can.

> Escaping occurences of encodedValues
> 
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Sangjin Lee
>Priority: Critical
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of 
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We though we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash that should be 
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to 
> the original except when it is prefixed by an escape character. Lastly we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> If we add the following entry to TestSeparator#testEncodeDecode() that 
> demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  
> %2$ = no problem!", Separator.QUALIFIERS,
> Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org