Thank you, Alex. Please file a JIRA for this with the above details. I will try to reproduce and investigate it, and see whether we can get a fix or a workaround into the 0.13.0 release, which is planned for the end of next week.
On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <[email protected]> wrote:
> Hi Larry,
>
> The same file does work directly from WebHDFS (see below). Looking more
> closely at the logs I sent previously, it looks like Knox (or something in
> the chain I'm unaware of) is decoding the %20 encoded spaces, then
> reencoding them as + encoded, i.e.
>
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
> ..
> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>
> With thanks, Alex
>
>
> Direct WebHDFS request (hostnames redacted)
>
> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
> HTTP/1.1 401 Authentication required
> Cache-Control: must-revalidate,no-cache,no-store
> Date: Wed, 24 May 2017 19:01:41 GMT
> Pragma: no-cache
> Date: Wed, 24 May 2017 19:01:41 GMT
> Pragma: no-cache
> X-FRAME-OPTIONS: SAMEORIGIN
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
> Content-Type: text/html; charset=iso-8859-1
> Content-Length: 1533
> Server: Jetty(6.1.26.hwx)
>
> HTTP/1.1 307 TEMPORARY_REDIRECT
> Cache-Control: no-cache
> Expires: Wed, 24 May 2017 19:01:42 GMT
> Date: Wed, 24 May 2017 19:01:42 GMT
> Pragma: no-cache
> Expires: Wed, 24 May 2017 19:01:42 GMT
> Date: Wed, 24 May 2017 19:01:42 GMT
> Pragma: no-cache
> X-FRAME-OPTIONS: SAMEORIGIN
> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
> Content-Type: application/octet-stream
> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
> Content-Length: 0
> Server: Jetty(6.1.26.hwx)
>
> HTTP/1.1 200 OK
> Access-Control-Allow-Methods: GET
> Access-Control-Allow-Origin: *
> Content-Type: application/octet-stream
> Connection: close
> Content-Length: 13365618
>
> %����1.6
> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
> ...
>
>
> ------------------------------
> *From:* larry mccay [[email protected]]
> *Sent:* 24 May 2017 18:05
> *To:* [email protected]
> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>
> Hi Alex -
>
> I notice from the audit log that the 404 is actually coming from WebHDFS
> not from Knox.
> Can you confirm that direct access to WebHDFS without going through Knox
> works with the same URL?
>
> thanks,
>
> --larry
>
> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) <[email protected]> wrote:
>
>> How should I encode space characters in the URL when I make a request to
>> WebHDFS through Knox? Or should I be enabling/configuring something in Knox
>> to handle them?
>>
>> I'm making the following (redacted values in <>) request to WebHDFS,
>> through Knox
>>
>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>> -<username>:<password> -k -s
>>
>> However Knox is returning HTTP 404 with the following body
>> (whitespace/formatting added by me)
>>
>> {"exception":"FileNotFoundException",
>> "javaClassName":"java.io.FileNotFoundException",
>> "message":"File /docs/filename+with+spaces.pdf not found."}}
>>
>> I've tried encoding the spaces as + (same result), and not encoding them
>> (HTTP 400 Unknown Version).
>> If I request a file for which the path does not contain spaces then it
>> works.
>>
>> Any ideas?
>>
>> With thanks, Alex
>>
>>
>>
>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is
>> enabled in the cluster.
>>
>> The (redacted) response headers for the %20 encoded request
>>
>> < HTTP/1.1 404 Not Found
>> < Date: Wed, 24 May 2017 15:34:26 GMT
>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
>> < Cache-Control: no-cache
>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>> < Date: Wed, 24 May 2017 15:34:26 GMT
>> < Pragma: no-cache
>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>> < Date: Wed, 24 May 2017 15:34:26 GMT
>> < Pragma: no-cache
>> < X-FRAME-OPTIONS: SAMEORIGIN
>> < Content-Type: application/json; charset=UTF-8
>> < Server: Jetty(6.1.26.hwx)
>> < Content-Length: 252
>>
>> The (redacted) Knox logs for the %20 encoded request
>>
>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
>>
>> ==> /var/log/hadoop/knox/gateway.log <==
>> 2017-05-24 15:51:05,254 INFO hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
>> 2017-05-24 15:51:05,259 INFO hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
>>
>> The (redacted) topology
>>
>> <topology>
>>     <gateway>
>>         <provider>
>>             <role>authentication</role>
>>             <name>ShiroProvider</name>
>>             <enabled>true</enabled>
>>             <param>
>>                 <name>sessionTimeout</name>
>>                 <value>30</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm</name>
>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapContextFactory</name>
>>                 <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm.contextFactory</name>
>>                 <value>$ldapContextFactory</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm.userDnTemplate</name>
>>                 <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm.contextFactory.url</name>
>>                 <value>ldap://<freeipa_node>:389</value>
>>             </param>
>>             <param>
>>                 <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
>>                 <value>simple</value>
>>             </param>
>>             <param>
>>                 <name>urls./**</name>
>>                 <value>authcBasic</value>
>>             </param>
>>         </provider>
>>         <provider>
>>             <role>authorization</role>
>>             <name>AclsAuthz</name>
>>             <enabled>true</enabled>
>>             <param>
>>                 <name>knox.acl</name>
>>                 <value>admin;*;*</value>
>>             </param>
>>         </provider>
>>         <provider>
>>             <role>identity-assertion</role>
>>             <name>Default</name>
>>             <enabled>true</enabled>
>>         </provider>
>>         <provider>
>>             <role>hostmap</role>
>>             <name>static</name>
>>             <enabled>false</enabled>
>>             <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
>>         </provider>
>>     </gateway>
>>
>>     <service>
>>         <role>WEBHDFS</role>
>>         <url>http://<namenode>:50070/webhdfs</url>
>>     </service>
>>
>>     <service>
>>         <role>SOLRAPI</role>
>>         <url>http://<solrnode>:6083/solr</url>
>>     </service>
>> </topology>
>>
>>
>
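
For anyone following the thread, the %20 versus + mismatch reported in the audit log can be reproduced outside Knox. What follows is only a rough sketch using Python's standard `urllib.parse`; it does not use or reflect Knox's actual code, and the helper names `reencode_like_form_data` and `reencode_as_path` are hypothetical, introduced purely for illustration. It assumes the rewrite behaves like a decode followed by a form-style re-encode, which matches the symptom in the logs but has not been confirmed as the cause.

```python
from urllib.parse import quote, quote_plus, unquote

# Path portion of the request from the thread, with spaces encoded as %20.
ORIGINAL_PATH = "/webhdfs/v1/docs/filename%20with%20spaces.pdf"


def reencode_like_form_data(encoded_path: str) -> str:
    """Hypothetical stand-in for the suspected behaviour (an assumption):
    decode the path, then re-encode it with form-style rules, where a
    space becomes '+' instead of '%20'."""
    decoded = unquote(encoded_path)
    return quote_plus(decoded, safe="/")


def reencode_as_path(encoded_path: str) -> str:
    """Decode and re-encode with path rules, where a space stays '%20'."""
    decoded = unquote(encoded_path)
    return quote(decoded, safe="/")


if __name__ == "__main__":
    rewritten = reencode_like_form_data(ORIGINAL_PATH)
    print(rewritten)  # /webhdfs/v1/docs/filename+with+spaces.pdf

    # In a URL path '+' is a literal plus sign, so path decoding the
    # rewritten URL no longer yields the original file name.
    print(unquote(rewritten))  # /webhdfs/v1/docs/filename+with+spaces.pdf

    # Re-encoding with path rules round-trips the original request.
    print(reencode_as_path(ORIGINAL_PATH))  # /webhdfs/v1/docs/filename%20with%20spaces.pdf
```

The '+' for space convention belongs to form encoding of query strings, not to URL paths, which is consistent with WebHDFS reporting "File /docs/filename+with+spaces.pdf not found" for the dispatched request.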
