Hi Kevin - You may see some change related to https://issues.apache.org/jira/browse/KNOX-709.
thanks, --larry

On Wed, May 24, 2017 at 6:24 PM, Kevin Risden <[email protected]> wrote:

> Just saw this as I was submitting a potentially related WebHBase url
> encoding email to the knox-user list. Curious if they are related.
>
> Alex - out of curiosity did you use Knox with HDP 2.4 or prior and not
> see this issue?
>
> Kevin Risden
>
> On Wed, May 24, 2017 at 4:08 PM, larry mccay <[email protected]> wrote:
>
>> Thank you, Alex.
>>
>> Please file a JIRA for this with the above details.
>> I will try to reproduce and investigate, and see if we can't get it
>> fixed, or a workaround in place, for the 0.13.0 release.
>> This is planned for the end of next week.
>>
>> On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <[email protected]> wrote:
>>
>>> Hi Larry,
>>>
>>> The same file does work directly from WebHDFS (see below). Looking more
>>> closely at the logs I sent previously, it looks like Knox (or something in
>>> the chain I'm unaware of) is decoding the %20 encoded spaces, then
>>> re-encoding them as + encoded, i.e.
>>>
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>> ..
>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>
>>> With thanks, Alex
>>>
>>>
>>> Direct WebHDFS request (hostnames redacted)
>>>
>>> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
>>> HTTP/1.1 401 Authentication required
>>> Cache-Control: must-revalidate,no-cache,no-store
>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>> Pragma: no-cache
>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>> Pragma: no-cache
>>> X-FRAME-OPTIONS: SAMEORIGIN
>>> WWW-Authenticate: Negotiate
>>> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
>>> Content-Type: text/html; charset=iso-8859-1
>>> Content-Length: 1533
>>> Server: Jetty(6.1.26.hwx)
>>>
>>> HTTP/1.1 307 TEMPORARY_REDIRECT
>>> Cache-Control: no-cache
>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>> Pragma: no-cache
>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>> Pragma: no-cache
>>> X-FRAME-OPTIONS: SAMEORIGIN
>>> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
>>> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
>>> Content-Type: application/octet-stream
>>> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
>>> Content-Length: 0
>>> Server: Jetty(6.1.26.hwx)
>>>
>>> HTTP/1.1 200 OK
>>> Access-Control-Allow-Methods: GET
>>> Access-Control-Allow-Origin: *
>>> Content-Type: application/octet-stream
>>> Connection: close
>>> Content-Length: 13365618
>>>
>>> %����1.6
>>> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
>>> ...
>>>
>>>
>>> ------------------------------
>>> *From:* larry mccay [[email protected]]
>>> *Sent:* 24 May 2017 18:05
>>> *To:* [email protected]
>>> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>>>
>>> Hi Alex -
>>>
>>> I notice from the audit log that the 404 is actually coming from WebHDFS,
>>> not from Knox.
>>> Can you confirm that direct access to WebHDFS, without going through
>>> Knox, works with the same URL?
>>>
>>> thanks,
>>>
>>> --larry
>>>
>>> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) <[email protected]> wrote:
>>>
>>>> How should I encode space characters in the URL when I make a request
>>>> to WebHDFS through Knox? Or should I be enabling/configuring something
>>>> in Knox to handle them?
>>>>
>>>> I'm making the following (redacted values in <>) request to WebHDFS,
>>>> through Knox
>>>>
>>>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>>>>   -u <username>:<password> -k -s
>>>>
>>>> However Knox is returning HTTP 404 with the following body
>>>> (whitespace/formatting added by me)
>>>>
>>>> {"exception":"FileNotFoundException",
>>>>  "javaClassName":"java.io.FileNotFoundException",
>>>>  "message":"File /docs/filename+with+spaces.pdf not found."}
>>>>
>>>> I've tried encoding the spaces as + (same result), and not encoding
>>>> them (HTTP 400 Unknown Version).
>>>> If I request a file whose path does not contain spaces, it works.
>>>>
>>>> Any ideas?
>>>>
>>>> With thanks, Alex
>>>>
>>>>
>>>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>>>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster.
>>>> Kerberos is enabled in the cluster.
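
[Editorial note: the audit log entries quoted in this thread show the path arriving with literal spaces (decoded from %20) and being dispatched with + signs. A minimal Python sketch of that decode/re-encode mismatch, purely illustrative and not a claim about Knox's actual code path:]

```python
from urllib.parse import quote, quote_plus, unquote

# The path segment as the client sends it (percent-encoded spaces):
sent = "filename%20with%20spaces.pdf"

# If a gateway decodes the path...
decoded = unquote(sent)          # 'filename with spaces.pdf'

# ...and then re-encodes it with form-style (+) encoding, the spaces
# become literal '+' characters, which in a URL *path* are not spaces:
rewritten = quote_plus(decoded)  # 'filename+with+spaces.pdf'

# Path-safe re-encoding would have preserved the original request:
correct = quote(decoded)         # 'filename%20with%20spaces.pdf'

print(rewritten)  # filename+with+spaces.pdf
print(correct)    # filename%20with%20spaces.pdf
```

The `+` form is only equivalent to a space in `application/x-www-form-urlencoded` query strings, which is why HDFS then looks for a file literally named `filename+with+spaces.pdf` and returns 404.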
>>>>
>>>> The (redacted) response headers for the %20 encoded request
>>>>
>>>> < HTTP/1.1 404 Not Found
>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>>>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>>>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
>>>> < Cache-Control: no-cache
>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>> < Pragma: no-cache
>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>> < Pragma: no-cache
>>>> < X-FRAME-OPTIONS: SAMEORIGIN
>>>> < Content-Type: application/json; charset=UTF-8
>>>> < Server: Jetty(6.1.26.hwx)
>>>> < Content-Length: 252
>>>>
>>>> The (redacted) Knox logs for the %20 encoded request
>>>>
>>>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
>>>>
>>>> ==> /var/log/hadoop/knox/gateway.log <==
>>>> 2017-05-24 15:51:05,254 INFO hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
>>>> 2017-05-24 15:51:05,259 INFO hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
>>>>
>>>> The (redacted) topology
>>>>
>>>> <topology>
>>>>   <gateway>
>>>>     <provider>
>>>>       <role>authentication</role>
>>>>       <name>ShiroProvider</name>
>>>>       <enabled>true</enabled>
>>>>       <param>
>>>>         <name>sessionTimeout</name>
>>>>         <value>30</value>
>>>>       </param>
>>>>       <param>
>>>>         <name>main.ldapRealm</name>
>>>>         <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
>>>>       </param>
>>>>       <param>
>>>>         <name>main.ldapContextFactory</name>
>>>>         <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
>>>>       </param>
>>>>       <param>
>>>>         <name>main.ldapRealm.contextFactory</name>
>>>>         <value>$ldapContextFactory</value>
>>>>       </param>
>>>>       <param>
>>>>         <name>main.ldapRealm.userDnTemplate</name>
>>>>         <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>>>       </param>
>>>>       <param>
>>>>         <name>main.ldapRealm.contextFactory.url</name>
>>>>         <value>ldap://<freeipa_node>:389</value>
>>>>       </param>
>>>>       <param>
>>>>         <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
>>>>         <value>simple</value>
>>>>       </param>
>>>>       <param>
>>>>         <name>urls./**</name>
>>>>         <value>authcBasic</value>
>>>>       </param>
>>>>     </provider>
>>>>     <provider>
>>>>       <role>authorization</role>
>>>>       <name>AclsAuthz</name>
>>>>       <enabled>true</enabled>
>>>>       <param>
>>>>         <name>knox.acl</name>
>>>>         <value>admin;*;*</value>
>>>>       </param>
>>>>     </provider>
>>>>     <provider>
>>>>       <role>identity-assertion</role>
>>>>       <name>Default</name>
>>>>       <enabled>true</enabled>
>>>>     </provider>
>>>>     <provider>
>>>>       <role>hostmap</role>
>>>>       <name>static</name>
>>>>       <enabled>false</enabled>
>>>>       <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
>>>>     </provider>
>>>>   </gateway>
>>>>
>>>>   <service>
>>>>     <role>WEBHDFS</role>
>>>>     <url>http://<namenode>:50070/webhdfs</url>
>>>>   </service>
>>>>
>>>>   <service>
>>>>     <role>SOLRAPI</role>
>>>>     <url>http://<solrnode>:6083/solr</url>
>>>>   </service>
>>>> </topology>
>>>>
>>>
>>
>
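
[Editorial note: as a client-side sketch of the encoding WebHDFS expects, here is a hypothetical Python helper. The hostname, port, and cluster name are invented placeholders standing in for the redacted values in the thread; the point is that `urllib.parse.quote` keeps spaces as %20 in the path, never +:]

```python
from urllib.parse import quote, urlencode


def webhdfs_url(gateway, cluster, path, **params):
    """Build a Knox WebHDFS URL with a percent-encoded path.

    Hypothetical helper for illustration only; 'gateway' and 'cluster'
    stand in for the redacted values in this thread.
    """
    # quote() leaves '/' intact and encodes each space as %20, not '+'
    encoded_path = quote(path, safe="/")
    query = urlencode(params)  # query parameters such as op=OPEN
    return (f"https://{gateway}:18443/gateway/{cluster}"
            f"/webhdfs/v1{encoded_path}?{query}")


url = webhdfs_url("knox.example.com", "mycluster",
                  "/docs/filename with spaces.pdf", op="OPEN")
print(url)
# https://knox.example.com:18443/gateway/mycluster/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN
```

Until the gateway-side rewrite is fixed, a helper like this at least guarantees the request leaves the client in the form that worked against WebHDFS directly.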
