I doubt that it is related, actually. We do need to determine whether we have any blocker bugs, as we are going to try to close down 0.13.0 over the next week.

On Wed, May 24, 2017 at 9:53 PM, Kevin Risden <[email protected]> wrote:

> Thanks Larry yea I had stumbled upon KNOX-709. Way more detail is in
> thread "WebHBase URL Encoding Issue". Didn't want to hijack this thread if
> the WebHBase issue isn't related.
>
> Kevin Risden
>
> On Wed, May 24, 2017 at 6:08 PM, larry mccay <[email protected]> wrote:
>
>> Hi Kevin -
>>
>> You may see some change related to
>> https://issues.apache.org/jira/browse/KNOX-709.
>>
>> thanks,
>>
>> --larry
>>
>> On Wed, May 24, 2017 at 6:24 PM, Kevin Risden <[email protected]> wrote:
>>
>>> Just saw this as I was submitting a potentially related WebHBase url
>>> encoding email to the knox-user list. Curious if they are related.
>>>
>>> Alex - out of curiosity did you use Knox with HDP 2.4 or prior and not
>>> see this issue?
>>>
>>> Kevin Risden
>>>
>>> On Wed, May 24, 2017 at 4:08 PM, larry mccay <[email protected]> wrote:
>>>
>>>> Thank you, Alex.
>>>>
>>>> Please file a JIRA for this with the above details.
>>>> I will try and reproduce and investigate and see if we can't get it
>>>> fixed or a workaround for the 0.13.0 release.
>>>> This is planned for the end of next week.
>>>>
>>>> On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Larry,
>>>>>
>>>>> The same file does work directly from WebHDFS (see below). Looking
>>>>> more closely at the logs I sent previously, it looks like Knox (or
>>>>> something in the chain I'm unaware of) is decoding the %20 encoded spaces,
>>>>> then reencoding them as + encoded, i.e.
>>>>>
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>>>> ..
>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>>>
>>>>> With thanks, Alex
>>>>>
>>>>>
>>>>> Direct WebHDFS request (hostnames redacted)
>>>>>
>>>>> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
>>>>> HTTP/1.1 401 Authentication required
>>>>> Cache-Control: must-revalidate,no-cache,no-store
>>>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>>>> Pragma: no-cache
>>>>> Date: Wed, 24 May 2017 19:01:41 GMT
>>>>> Pragma: no-cache
>>>>> X-FRAME-OPTIONS: SAMEORIGIN
>>>>> WWW-Authenticate: Negotiate
>>>>> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
>>>>> Content-Type: text/html; charset=iso-8859-1
>>>>> Content-Length: 1533
>>>>> Server: Jetty(6.1.26.hwx)
>>>>>
>>>>> HTTP/1.1 307 TEMPORARY_REDIRECT
>>>>> Cache-Control: no-cache
>>>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>>>> Pragma: no-cache
>>>>> Expires: Wed, 24 May 2017 19:01:42 GMT
>>>>> Date: Wed, 24 May 2017 19:01:42 GMT
>>>>> Pragma: no-cache
>>>>> X-FRAME-OPTIONS: SAMEORIGIN
>>>>> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
>>>>> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
>>>>> Content-Type: application/octet-stream
>>>>> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
>>>>> Content-Length: 0
>>>>> Server: Jetty(6.1.26.hwx)
>>>>>
>>>>> HTTP/1.1 200 OK
>>>>> Access-Control-Allow-Methods: GET
>>>>> Access-Control-Allow-Origin: *
>>>>> Content-Type: application/octet-stream
>>>>> Connection: close
>>>>> Content-Length: 13365618
>>>>>
>>>>> %����1.6
>>>>> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
>>>>> ...
>>>>>
>>>>>
>>>>> ------------------------------
>>>>> *From:* larry mccay [[email protected]]
>>>>> *Sent:* 24 May 2017 18:05
>>>>> *To:* [email protected]
>>>>> *Subject:* Re: Encoding/escaping whitespace in WebHDFS requests
>>>>>
>>>>> Hi Alex -
>>>>>
>>>>> I notice from the audit log that the 404 is actually coming from
>>>>> WebHDFS not from Knox.
>>>>> Can you confirm that direct access to WebHDFS without going through
>>>>> Knox works with the same URL?
>>>>>
>>>>> thanks,
>>>>>
>>>>> --larry
>>>>>
>>>>> On Wed, May 24, 2017 at 12:32 PM, Willmer, Alex (UK Defence) <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> How should I encode space characters in the URL when I make a
>>>>>> request to WebHDFS through Knox? Or should I be enabling/configuring
>>>>>> something in Knox to handle them?
>>>>>>
>>>>>> I'm making the following (redacted values in <>) request to WebHDFS,
>>>>>> through Knox
>>>>>>
>>>>>> curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
>>>>>>     -u <username>:<password> -k -s
>>>>>>
>>>>>> However Knox is returning HTTP 404 with the following body
>>>>>> (whitespace/formatting added by me)
>>>>>>
>>>>>> {"exception":"FileNotFoundException",
>>>>>> "javaClassName":"java.io.FileNotFoundException",
>>>>>> "message":"File /docs/filename+with+spaces.pdf not found."}}
>>>>>>
>>>>>> I've tried encoding the spaces as + (same result), and not encoding
>>>>>> them (HTTP 400 Unknown Version).
>>>>>> If I request a file for which the path does not contain spaces then
>>>>>> it works.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> With thanks, Alex
>>>>>>
>>>>>>
>>>>>>
>>>>>> PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK
>>>>>> 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos
>>>>>> is enabled in the cluster.
>>>>>>
>>>>>> The (redacted) response headers for the %20 encoded request
>>>>>>
>>>>>> < HTTP/1.1 404 Not Found
>>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>>> < Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
>>>>>> < Expires: Thu, 01 Jan 1970 00:00:00 GMT
>>>>>> < Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
>>>>>> < Cache-Control: no-cache
>>>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>>> < Pragma: no-cache
>>>>>> < Expires: Wed, 24 May 2017 15:34:26 GMT
>>>>>> < Date: Wed, 24 May 2017 15:34:26 GMT
>>>>>> < Pragma: no-cache
>>>>>> < X-FRAME-OPTIONS: SAMEORIGIN
>>>>>> < Content-Type: application/json; charset=UTF-8
>>>>>> < Server: Jetty(6.1.26.hwx)
>>>>>> < Content-Length: 252
>>>>>>
>>>>>> The (redacted) Knox logs for the %20 encoded request
>>>>>>
>>>>>> ==> /var/log/hadoop/knox/gateway-audit.log <==
>>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
>>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
>>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
>>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
>>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
>>>>>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404
>>>>>>
>>>>>> ==> /var/log/hadoop/knox/gateway.log <==
>>>>>> 2017-05-24 15:51:05,254 INFO hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
>>>>>> 2017-05-24 15:51:05,259 INFO hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
>>>>>>
>>>>>> The (redacted) topology
>>>>>>
>>>>>> <topology>
>>>>>>   <gateway>
>>>>>>     <provider>
>>>>>>       <role>authentication</role>
>>>>>>       <name>ShiroProvider</name>
>>>>>>       <enabled>true</enabled>
>>>>>>       <param>
>>>>>>         <name>sessionTimeout</name>
>>>>>>         <value>30</value>
>>>>>>       </param>
>>>>>>       <param>
>>>>>>         <name>main.ldapRealm</name>
>>>>>>         <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
>>>>>>       </param>
>>>>>>       <param>
>>>>>>         <name>main.ldapContextFactory</name>
>>>>>>         <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
>>>>>>       </param>
>>>>>>       <param>
>>>>>>         <name>main.ldapRealm.contextFactory</name>
>>>>>>         <value>$ldapContextFactory</value>
>>>>>>       </param>
>>>>>>       <param>
>>>>>>         <name>main.ldapRealm.userDnTemplate</name>
>>>>>>         <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
>>>>>>       </param>
>>>>>>       <param>
>>>>>>         <name>main.ldapRealm.contextFactory.url</name>
>>>>>>         <value>ldap://<freeipa_node>:389</value>
>>>>>>       </param>
>>>>>>       <param>
>>>>>>         <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
>>>>>>         <value>simple</value>
>>>>>>       </param>
>>>>>>       <param>
>>>>>>         <name>urls./**</name>
>>>>>>         <value>authcBasic</value>
>>>>>>       </param>
>>>>>>     </provider>
>>>>>>     <provider>
>>>>>>       <role>authorization</role>
>>>>>>       <name>AclsAuthz</name>
>>>>>>       <enabled>true</enabled>
>>>>>>       <param>
>>>>>>         <name>knox.acl</name>
>>>>>>         <value>admin;*;*</value>
>>>>>>       </param>
>>>>>>     </provider>
>>>>>>     <provider>
>>>>>>       <role>identity-assertion</role>
>>>>>>       <name>Default</name>
>>>>>>       <enabled>true</enabled>
>>>>>>     </provider>
>>>>>>     <provider>
>>>>>>       <role>hostmap</role>
>>>>>>       <name>static</name>
>>>>>>       <enabled>false</enabled>
>>>>>>       <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
>>>>>>     </provider>
>>>>>>   </gateway>
>>>>>>
>>>>>>   <service>
>>>>>>     <role>WEBHDFS</role>
>>>>>>     <url>http://<namenode>:50070/webhdfs</url>
>>>>>>   </service>
>>>>>>
>>>>>>   <service>
>>>>>>     <role>SOLRAPI</role>
>>>>>>     <url>http://<solrnode>:6083/solr</url>
>>>>>>   </service>
>>>>>> </topology>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
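
For anyone reading this thread in the archive: the symptom Alex reports (a %20 in the request path arriving at WebHDFS as +) can be illustrated outside of Knox. The snippet below is only a minimal Python sketch of the two encoding styles, not Knox's rewrite code. It assumes that the backend treats "+" in a URL path as a literal plus sign, which is consistent with the "File /docs/filename+with+spaces.pdf not found" message quoted above. The variable names are made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# File name taken from the thread.
name = "filename with spaces.pdf"

# Path-style percent-encoding: a space becomes %20.
path_encoded = quote(name)

# Form-style (application/x-www-form-urlencoded) encoding: a space becomes "+".
form_encoded = quote_plus(name)

# A path-style decoder leaves "+" untouched, so the form-encoded name
# is read back as a different file name containing plus signs.
seen_by_path_decoder = unquote(form_encoded)

# Only a form-style decoder maps "+" back to a space.
seen_by_form_decoder = unquote_plus(form_encoded)

print("path-encoded:        ", path_encoded)
print("form-encoded:        ", form_encoded)
print("path decoder reads:  ", seen_by_path_decoder)
print("form decoder reads:  ", seen_by_form_decoder)
```

If something in the chain decodes the path and then re-encodes it form-style, a server that decodes path-style would look up a name with literal plus signs, which matches the 404 in the audit log.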
