Thank you, Alex.

Please file a JIRA for this with the above details.
I will try to reproduce and investigate it, and see whether we can get a fix
or a workaround into the 0.13.0 release, which is planned for the end of
next week.
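
For anyone following the thread: the symptom in the quoted message below (a %20 in the request path arriving at WebHDFS as +) is what you would expect if a path were percent-decoded and then re-encoded with form-style rules. A minimal Python sketch of that effect follows. It is an illustration only, not Knox's code, and the variable names are made up for the example.

```python
from urllib.parse import quote, quote_plus, unquote

# Path as sent by the client, with spaces percent-encoded.
requested_path = "/webhdfs/v1/docs/filename%20with%20spaces.pdf"

# Step 1: decode the path (what the audit log's "access" line shows).
decoded = unquote(requested_path)

# Step 2a: form-style re-encoding turns each space into "+".
form_encoded = quote_plus(decoded, safe="/")

# Step 2b: path-style re-encoding keeps spaces as "%20".
path_encoded = quote(decoded)

# A server that percent-decodes the path treats "+" as a literal plus sign,
# so it looks for a file whose name really contains "+".
seen_by_server = unquote(form_encoded)

print(form_encoded)
print(path_encoded)
print(seen_by_server)
```

"+" means a space only in form-encoded data such as query strings. In a path segment it is an ordinary character, which would explain the FileNotFoundException for /docs/filename+with+spaces.pdf.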

On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <
[email protected]> wrote:

> Hi Larry,
>
> The same file does work directly from WebHDFS (see below). Looking more
> closely at the logs I sent previously, it looks like Knox (or something in
> the chain I'm unaware of) is decoding the %20-encoded spaces, then
> re-encoding them as +, i.e.
>
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
> ..
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>
> With thanks, Alex
>
>
> Direct WebHDFS request (hostnames redacted)
>
> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
> HTTP/1.1 401 Authentication required
> Cache-Control: must-revalidate,no-cache,no-store
> Date: Wed, 24 May 2017 19:01:41 GMT
> Pragma: no-cache
> Date: Wed, 24 May 2017 19:01:41 GMT
> Pragma: no-cache
> X-FRAME-OPTIONS: SAMEORIGIN
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
> Content-Type: text/html; charset=iso-8859-1
> Content-Length: 1533
> Server: Jetty(6.1.26.hwx)
>
> HTTP/1.1 307 TEMPORARY_REDIRECT
> Cache-Control: no-cache
> Expires: Wed, 24 May 2017 19:01:42 GMT
> Date: Wed, 24 May 2017 19:01:42 GMT
> Pragma: no-cache
> Expires: Wed, 24 May 2017 19:01:42 GMT
> Date: Wed, 24 May 2017 19:01:42 GMT
> Pragma: no-cache
> X-FRAME-OPTIONS: SAMEORIGIN
> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
> Content-Type: application/octet-stream
> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
> Content-Length: 0
> Server: Jetty(6.1.26.hwx)
>
> HTTP/1.1 200 OK
> Access-Control-Allow-Methods: GET
> Access-Control-Allow-Origin: *
> Content-Type: application/octet-stream
> Connection: close
> Content-Length: 13365618
>
> %PDF-1.6
> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
> ...
>
>
> ------------------------------
> *From:* larry mccay [[email protected]]
> *Sent:* 24 May 2017 18:05
> *To:* [email protected]
> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>
> Hi Alex -
>
> I notice from the audit log that the 404 is actually coming from WebHDFS
> not from Knox.
> Can you confirm that direct access to WebHDFS without going through Knox
> works with the same URL?
>
> thanks,
>
> --larry
>
> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) <
> [email protected]> wrote:
>
>> How should I encode space characters in the URL when I make a request to
>> WebHDFS through Knox? Or should I be enabling/configuring something in Knox
>> to handle them?
>>
>> I'm making the following (redacted values in <>) request to WebHDFS,
>> through Knox
>>
>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>>      -u <username>:<password> -k -s
>>
>> However Knox is returning HTTP 404 with the following body
>> (whitespace/formatting added by me)
>>
>> {"exception":"FileNotFoundException",
>>  "javaClassName":"java.io.FileNotFoundException",
>>  "message":"File /docs/filename+with+spaces.pdf not found."}}
>>
>> I've tried encoding the spaces as + (same result), and not encoding them
>> (HTTP 400  Unknown Version).
>> If I request a file for which the path does not contain spaces then it
>> works.
>>
>> Any ideas?
>>
>> With thanks, Alex
>>
>>
>>
>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is
>> enabled in the cluster.
>>
>> The (redacted) response headers for the %20 encoded request
>>
>> < HTTP/1.1 404 Not Found
>> < Date: Wed, 24 May 2017 15:34:26 GMT
>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
>> < Cache-Control: no-cache
>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>> < Date: Wed, 24 May 2017 15:34:26 GMT
>> < Pragma: no-cache
>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>> < Date: Wed, 24 May 2017 15:34:26 GMT
>> < Pragma: no-cache
>> < X-FRAME-OPTIONS: SAMEORIGIN
>> < Content-Type: application/json; charset=UTF-8
>> < Server: Jetty(6.1.26.hwx)
>> < Content-Length: 252
>>
>> The (redacted) Knox logs for the %20 encoded request
>>
>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
>>
>> ==> /var/log/hadoop/knox/gateway.log <==
>> 2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
>> 2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
>>
>> The (redacted) topology
>>
>> <topology>
>>     <gateway>
>>         <provider>
>>             <role>authentication</role>
>>             <name>ShiroProvider</name>
>>             <enabled>true</enabled>
>>             <param>
>>                 <name>sessionTimeout</name>
>>                 <value>30</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm</name>
>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapContextFactory</name>
>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm.contextFactory</name>
>>                 <value>$ldapContextFactory</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm.userDnTemplate</name>
>>                 <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm.contextFactory.url</name>
>>                 <value>ldap://<freeipa_node>:389</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
>>                 <value>simple</value>
>>             </param>
>>             <param>
>>                 <name>urls./**</name>
>>                 <value>authcBasic</value>
>>             </param>
>>         </provider>
>>         <provider>
>>             <role>authorization</role>
>>             <name>AclsAuthz</name>
>>             <enabled>true</enabled>
>>             <param>
>>                 <name>knox.acl</name>
>>                 <value>admin;*;*</value>
>>             </param>
>>         </provider>
>>         <provider>
>>             <role>identity-assertion</role>
>>             <name>Default</name>
>>             <enabled>true</enabled>
>>         </provider>
>>         <provider>
>>             <role>hostmap</role>
>>             <name>static</name>
>>             <enabled>false</enabled>
>>             <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
>>         </provider>
>>     </gateway>
>>
>>     <service>
>>         <role>WEBHDFS</role>
>>         <url>http://<namenode>:50070/webhdfs</url>
>>     </service>
>>
>>     <service>
>>         <role>SOLRAPI</role>
>>         <url>http://<solrnode>:6083/solr</url>
>>     </service>
>> </topology>
>>
>>
>
