Hi Kevin -

You may see some change related to
https://issues.apache.org/jira/browse/KNOX-709.

thanks,

--larry

On Wed, May 24, 2017 at 6:24 PM, Kevin Risden <[email protected]>
wrote:

> Just saw this as I was submitting a potentially related WebHBase url
> encoding email to the knox-user list. Curious if they are related.
>
> Alex - out of curiosity, did you use Knox with HDP 2.4 or prior and not
> see this issue?
>
> Kevin Risden
>
> On Wed, May 24, 2017 at 4:08 PM, larry mccay <[email protected]>
> wrote:
>
>> Thank you, Alex.
>>
>> Please file a JIRA for this with the above details.
>> I will try to reproduce and investigate, and see if we can get it fixed
>> or find a workaround for the 0.13.0 release.
>> That release is planned for the end of next week.
>>
>> On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <
>> [email protected]> wrote:
>>
>>> Hi Larry,
>>>
>>> The same file does work directly from WebHDFS (see below). Looking more
>>> closely at the logs I sent previously, it looks like Knox (or something in
>>> the chain I'm unaware of) is decoding the %20 encoded spaces, then
>>> reencoding them as + encoded, i.e.
>>>
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>> ..
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>
>>> With thanks, Alex
>>>
>>>
>>> Direct WebHDFS request (hostnames redacted)
>>>
>>> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
>>> HTTP/1.1 401 Authentication required
>>> Cache-Control: must-revalidate,no-cache,no-store
>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>> Pragma: no-cache
>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>> Pragma: no-cache
>>> X-FRAME-OPTIONS: SAMEORIGIN
>>> WWW-Authenticate: Negotiate
>>> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
>>> Content-Type: text/html; charset=iso-8859-1
>>> Content-Length: 1533
>>> Server: Jetty(6.1.26.hwx)
>>>
>>> HTTP/1.1 307 TEMPORARY_REDIRECT
>>> Cache-Control: no-cache
>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>> Pragma: no-cache
>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>> Pragma: no-cache
>>> X-FRAME-OPTIONS: SAMEORIGIN
>>> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
>>> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
>>> Content-Type: application/octet-stream
>>> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
>>> Content-Length: 0
>>> Server: Jetty(6.1.26.hwx)
>>>
>>> HTTP/1.1 200 OK
>>> Access-Control-Allow-Methods: GET
>>> Access-Control-Allow-Origin: *
>>> Content-Type: application/octet-stream
>>> Connection: close
>>> Content-Length: 13365618
>>>
>>> %����1.6
>>> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
>>> ...
>>>
>>>
>>> ------------------------------
>>> *From:* larry mccay [[email protected]]
>>> *Sent:* 24 May 2017 18:05
>>> *To:* [email protected]
>>> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>>>
>>> Hi Alex -
>>>
>>> I notice from the audit log that the 404 is actually coming from WebHDFS
>>> not from Knox.
>>> Can you confirm that direct access to WebHDFS without going through Knox
>>> works with the same URL?
>>>
>>> thanks,
>>>
>>> --larry
>>>
>>> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) <
>>> [email protected]> wrote:
>>>
>>>> How should I encode space characters in the URL when I make a request
>>>> to WebHDFS through Knox? Or should I be enabling/configuring something in
>>>> Knox to handle them?
>>>>
>>>> I'm making the following (redacted values in <>) request to WebHDFS,
>>>> through Knox
>>>>
>>>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>>>>      -u <username>:<password> -k -s
>>>>
>>>> However Knox is returning HTTP 404 with the following body
>>>> (whitespace/formatting added by me)
>>>>
>>>> {"exception":"FileNotFoundException",
>>>>  "javaClassName":"java.io.FileNotFoundException",
>>>>  "message":"File /docs/filename+with+spaces.pdf not found."}}
>>>>
>>>> I've tried encoding the spaces as + (same result), and not encoding
>>>> them (HTTP 400 Unknown Version).
>>>> If I request a file for which the path does not contain spaces then it
>>>> works.
>>>>
>>>> Any ideas?
>>>>
>>>> With thanks, Alex
>>>>
>>>>
>>>>
>>>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>>>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is
>>>> enabled in the cluster.
>>>>
>>>> The (redacted) response headers for the %20 encoded request
>>>>
>>>> < HTTP/1.1 404 Not Found
>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>>>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>>>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
>>>> < Cache-Control: no-cache
>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>> < Pragma: no-cache
>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>> < Pragma: no-cache
>>>> < X-FRAME-OPTIONS: SAMEORIGIN
>>>> < Content-Type: application/json; charset=UTF-8
>>>> < Server: Jetty(6.1.26.hwx)
>>>> < Content-Length: 252
>>>>
>>>> The (redacted) Knox logs for the %20 encoded request
>>>>
>>>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
>>>>
>>>> ==> /var/log/hadoop/knox/gateway.log <==
>>>> 2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
>>>> 2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
>>>>
>>>> The (redacted) topology
>>>>
>>>> <topology>
>>>>     <gateway>
>>>>         <provider>
>>>>             <role>authentication</role>
>>>>             <name>ShiroProvider</name>
>>>>             <enabled>true</enabled>
>>>>             <param>
>>>>                 <name>sessionTimeout</name>
>>>>                 <value>30</value>
>>>>             </param>
>>>>             <param>
>>>>                 <name>main.ldapRealm</name>
>>>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
>>>>             </param>
>>>>             <param>
>>>>                 <name>main.ldapContextFactory</name>
>>>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
>>>>             </param>
>>>>             <param>
>>>>                 <name>main.ldapRealm.contextFactory</name>
>>>>                 <value>$ldapContextFactory</value>
>>>>             </param>
>>>>             <param>
>>>>                 <name>main.ldapRealm.userDnTemplate</name>
>>>>                 <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>>>             </param>
>>>>             <param>
>>>>                 <name>main.ldapRealm.contextFactory.url</name>
>>>>                 <value>ldap://<freeipa_node>:389</value>
>>>>             </param>
>>>>             <param>
>>>>                 <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
>>>>                 <value>simple</value>
>>>>             </param>
>>>>             <param>
>>>>                 <name>urls./**</name>
>>>>                 <value>authcBasic</value>
>>>>             </param>
>>>>         </provider>
>>>>         <provider>
>>>>             <role>authorization</role>
>>>>             <name>AclsAuthz</name>
>>>>             <enabled>true</enabled>
>>>>             <param>
>>>>                 <name>knox.acl</name>
>>>>                 <value>admin;*;*</value>
>>>>             </param>
>>>>         </provider>
>>>>         <provider>
>>>>             <role>identity-assertion</role>
>>>>             <name>Default</name>
>>>>             <enabled>true</enabled>
>>>>         </provider>
>>>>         <provider>
>>>>             <role>hostmap</role>
>>>>             <name>static</name>
>>>>             <enabled>false</enabled>
>>>>             <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
>>>>         </provider>
>>>>     </gateway>
>>>>
>>>>     <service>
>>>>         <role>WEBHDFS</role>
>>>>         <url>http://<namenode>:50070/webhdfs</url>
>>>>     </service>
>>>>
>>>>     <service>
>>>>         <role>SOLRAPI</role>
>>>>         <url>http://<solrnode>:6083/solr</url>
>>>>     </service>
>>>> </topology>
>>>>
>>>>
>>>
>>
>
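
The decode-then-re-encode behaviour Alex describes in the thread (%20 on the
way in, + on the way out) can be illustrated outside Knox. The sketch below is
not Knox or WebHDFS code; it only uses Python's standard urllib.parse to show
the two encoding styles involved. The assumption, consistent with the
FileNotFoundException message quoted in the thread, is that the receiving
server decodes the URL path with path rules, under which + is a literal plus
sign rather than a space.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# File name taken from the thread.
name = "filename with spaces.pdf"

# Path-style percent-encoding: a space becomes %20.
path_encoded = quote(name)

# Form-style encoding (application/x-www-form-urlencoded): a space becomes +.
form_encoded = quote_plus(name)

print(path_encoded)  # filename%20with%20spaces.pdf
print(form_encoded)  # filename+with+spaces.pdf

# Decoding with path rules leaves + untouched, so the form-encoded name
# no longer matches the original file name.
decoded_as_path = unquote(form_encoded)

# Decoding with form rules turns + back into a space.
decoded_as_form = unquote_plus(form_encoded)

print(decoded_as_path)  # filename+with+spaces.pdf
print(decoded_as_form)  # filename with spaces.pdf
```

If the path is re-encoded with form rules somewhere in the chain, a server
applying path rules would look for a file literally named
"filename+with+spaces.pdf", which matches the 404 seen in the dispatch audit
entries.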
