Hello,

A colleague and I have been setting up a Knox service for Livy, so that an external Jupyter setup can manage Spark sessions without having to handle Kerberos auth itself; we are basically following this guide:
https://community.hortonworks.com/articles/70499/adding-livy-server-as-service-to-apache-knox.html

However, Livy doesn't seem to accept the calls coming through Knox, whereas if we POST directly to Livy using 'curl', all is good. From a quick 'tcpdump' session, one difference seems to be that Knox uses chunked transfers and compression, so I tried out those options myself (see details further down), and there definitely appears to be a problem with compressing the request.

Is there a way to disable compression for a particular service in Knox?

NOTE: I know about 'gateway.gzip.compress.mime.types', but according to the docs it only affects compression when sending data to the browser; we tried it nonetheless, and it didn't seem to help.

TESTING DETAILS

First, create some JSON to send to Livy, and compress it:

  $ cat > session_johwar.json
  {"proxyUser":"johwar","kind": "pyspark"}
  $ gzip -n session_johwar.json

Next, try a chunked and compressed POST request to /sessions:

  $ curl -u : --negotiate -v -s --trace-ascii http_trace_chunked_gz.log \
      --data-binary @session_johwar.json.gz \
      -H "Content-Type: application/json" \
      -H 'Content-Encoding: gzip' \
      -H 'Transfer-Encoding: chunked' \
      http://myserver:8999/sessions
  "Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: HttpInputOverHTTP@756a5d6c; line: 1, column: 2]"

Nope. Log excerpt:

  040e: User-Agent: curl/7.47.0
  0427: Accept: */*
  0434: Content-Type: application/json
  0454: Content-Encoding: gzip
  046c: Transfer-Encoding: chunked
  0488:
  048a: 3d
  => Send data, 68 bytes (0x44)
  0000: ...........V*(....-N-R.R...(O,R.Q...KQ.RP*.,.H,.V.....7..)...
  003f: 0
  0042:
  == Info: upload completely sent off: 68 out of 61 bytes
  <= Recv header, 26 bytes (0x1a)
  0000: HTTP/1.1 400 Bad Request
  <= Recv header, 37 bytes (0x25)
  0000: Date: Thu, 24 Aug 2017 07:20:57 GMT
  <= Recv header, 362 bytes (0x16a)
  0000: WWW-Authenticate: Negotiate ...
  <= Recv header, 132 bytes (0x84)
  0000: Set-Cookie: hadoop.auth="u=johwar&..."; HttpOnly
  <= Recv header, 47 bytes (0x2f)
  0000: Content-Type: application/json; charset=UTF-8
  <= Recv header, 21 bytes (0x15)
  0000: Content-Length: 172
  <= Recv header, 33 bytes (0x21)
  0000: Server: Jetty(9.2.16.v20160414)
  <= Recv header, 2 bytes (0x2)
  0000:
  <= Recv data, 172 bytes (0xac)
  0000: "Illegal character ((CTRL-CHAR, code 31)): only regular white sp
  0040: ace (\\r, \\n, \\t) is allowed between tokens\n at [Source: Http
  0080: InputOverHTTP@583564e8; line: 1, column: 2]"

OK, so let's try with just compression, no chunking:

  $ curl -u : --negotiate -v -s --trace-ascii http_trace_gz.log \
      --data-binary @session_johwar.json.gz \
      -H "Content-Type: application/json" \
      -H 'Content-Encoding: gzip' \
      http://myserver:8999/sessions
  "Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: HttpInputOverHTTP@188893c9; line: 1, column: 2]"

Still no luck. The log is mostly the same, except for the missing chunking:

  040e: User-Agent: curl/7.47.0
  0427: Accept: */*
  0434: Content-Type: application/json
  0454: Content-Encoding: gzip
  046c: Content-Length: 61
  0480:
  => Send data, 61 bytes (0x3d)
  0000: ...........V*(....-N-R.R...(O,R.Q...KQ.RP*.,.H,.V.....7..)...
  == Info: upload completely sent off: 61 out of 61 bytes
  <= Recv header, 26 bytes (0x1a)
  0000: HTTP/1.1 400 Bad Request

Decompress the file again:

  $ gunzip session_johwar.json.gz

Then a plain old request, already known to work:

  $ curl -u : --negotiate -v -s --trace-ascii http_trace.log \
      --data @session_johwar.json \
      -H "Content-Type: application/json" \
      http://myserver:8999/sessions
  {"id":5,"appId":null,"owner":"johwar","proxyUser":"johwar","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}

Yep.
The log is looking a lot better:

  040e: User-Agent: curl/7.47.0
  0427: Accept: */*
  0434: Content-Type: application/json
  0454: Content-Length: 40
  0468:
  => Send data, 40 bytes (0x28)
  0000: {"proxyUser":"johwar","kind": "pyspark"}
  == Info: upload completely sent off: 40 out of 40 bytes
  <= Recv header, 22 bytes (0x16)
  0000: HTTP/1.1 201 Created

And with chunking?

  $ curl -u : --negotiate -v -s --trace-ascii http_trace_chunked.log \
      --data @session_johwar.json \
      -H "Content-Type: application/json" \
      -H 'Transfer-Encoding: chunked' \
      http://myserver:8999/sessions
  {"id":6,"appId":null,"owner":"johwar","proxyUser":"johwar","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}

Still works.
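Incidentally, the "CTRL-CHAR, code 31" in the error message lines up exactly with the gzip file format: every gzip stream starts with the magic bytes 0x1f 0x8b, and 0x1f is decimal 31. So the 400 errors above look like the JSON parser being handed the still-compressed body, i.e. nothing between curl and Jackson inflated it. A quick sketch in Python (standard library only) that rebuilds the same payload and checks the first bytes:

```python
import gzip
import json

# The session payload from the tests above, gzip-compressed the same
# way as `gzip -n session_johwar.json`.
payload = json.dumps({"proxyUser": "johwar", "kind": "pyspark"}).encode()
compressed = gzip.compress(payload)

# A gzip stream always begins with the magic bytes 0x1f 0x8b.
# 0x1f is decimal 31 -- the "CTRL-CHAR, code 31" that Jackson
# reports at line 1, column 2 when parsing the compressed body as JSON.
print(compressed[0], compressed[1])  # 31 139
assert compressed[:2] == b"\x1f\x8b"
```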

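For completeness, this is roughly how the working plain request would be assembled from the Jupyter side. A minimal sketch using only Python's standard library; the hostname/port are the direct Livy endpoint from the traces above, and the SPNEGO negotiation that `curl -u : --negotiate` performs is omitted, so this only illustrates the request shape, not the auth:

```python
import json
import urllib.request

# Same payload and headers as the working curl request: an
# uncompressed JSON body and no Content-Encoding header.
payload = json.dumps({"proxyUser": "johwar", "kind": "pyspark"}).encode()
req = urllib.request.Request(
    "http://myserver:8999/sessions",
    data=payload,
    headers={"Content-Type": "application/json"},
)

print(req.get_method())  # POST (implied by the presence of a body)
# urllib.request.urlopen(req) would actually send it; left out here
# since the real setup additionally needs Kerberos/SPNEGO auth.
```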