Hello,

A colleague and I have been setting up a Knox service for Livy, so that an
external Jupyter setup can manage Spark sessions without having to deal with
Kerberos auth itself; we are basically following this guide:

https://community.hortonworks.com/articles/70499/adding-livy-server-as-service-to-apache-knox.html
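
For context, this is roughly what we want the Jupyter side to end up doing.
A minimal sketch only; the Knox host and port, the 'default' topology, the
'/livy/v1' path and the credentials below are assumptions from our own setup,
not anything prescribed by the guide:

# Create a Livy session through Knox using plain basic auth, so the
# notebook host never needs a Kerberos ticket. Host/topology/path assumed.
import json
import requests

KNOX_LIVY = "https://knoxhost:8443/gateway/default/livy/v1"

resp = requests.post(
    KNOX_LIVY + "/sessions",
    auth=("johwar", "secret"),                  # basic auth, terminated by Knox
    headers={"Content-Type": "application/json"},
    data=json.dumps({"proxyUser": "johwar", "kind": "pyspark"}),
    verify=False,                               # self-signed Knox cert in our test setup
)
print(resp.status_code, resp.text)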

However, Livy doesn't seem to accept the calls coming through Knox, whereas
if we POST directly to Livy using 'curl', all is good.

From a quick 'tcpdump' session, one difference is that Knox sends the request
chunked and gzip-compressed, so I tried the same variations directly against
Livy (see details further down), and there definitely appears to be a problem
with compressing the request body.

Is there a way to disable compression for a particular service in Knox?

NOTE: I know about 'gateway.gzip.compress.mime.types', but according to the
docs it only affects compression of responses sent back to the client; we
tried it nonetheless, and it didn't seem to help.

TESTING DETAILS

First, create some JSON to send to Livy:

$ cat > session_johwar.json
{"proxyUser":"johwar","kind": "pyspark"}
$ gzip -n session_johwar.json

Next, try a chunked and compressed POST request to /sessions:

$ curl -u : --negotiate -v -s --trace-ascii http_trace_chunked_gz.log \
    --data-binary @session_johwar.json.gz \
    -H "Content-Type: application/json" \
    -H 'Content-Encoding: gzip' \
    -H 'Transfer-Encoding: chunked' \
    http://myserver:8999/sessions
"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r,
\\n, \\t) is allowed between tokens\n at [Source: HttpInputOverHTTP@756a5d6c;
line: 1, column: 2]"

Nope. Note that CTRL-CHAR code 31 is 0x1f, the first byte of the gzip magic
number, so it looks like the body is being parsed as JSON without being
decompressed first. Log excerpt:

040e: User-Agent: curl/7.47.0
0427: Accept: */*
0434: Content-Type: application/json
0454: Content-Encoding: gzip
046c: Transfer-Encoding: chunked
0488:
048a: 3d
=> Send data, 68 bytes (0x44)
0000: ...........V*(....-N-R.R...(O,R.Q...KQ.RP*.,.H,.V.....7..)...
003f: 0
0042:
== Info: upload completely sent off: 68 out of 61 bytes
<= Recv header, 26 bytes (0x1a)
0000: HTTP/1.1 400 Bad Request
<= Recv header, 37 bytes (0x25)
0000: Date: Thu, 24 Aug 2017 07:20:57 GMT
<= Recv header, 362 bytes (0x16a)
0000: WWW-Authenticate: Negotiate ...
<= Recv header, 132 bytes (0x84)
0000: Set-Cookie: hadoop.auth="u=johwar&..."; HttpOnly
<= Recv header, 47 bytes (0x2f)
0000: Content-Type: application/json; charset=UTF-8
<= Recv header, 21 bytes (0x15)
0000: Content-Length: 172
<= Recv header, 33 bytes (0x21)
0000: Server: Jetty(9.2.16.v20160414)
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 172 bytes (0xac)
0000: "Illegal character ((CTRL-CHAR, code 31)): only regular white sp
0040: ace (\\r, \\n, \\t) is allowed between tokens\n at [Source: Http
0080: InputOverHTTP@583564e8; line: 1, column: 2]"

Ok, so let's try with just compression:

$ curl -u : --negotiate -v -s --trace-ascii http_trace_gz.log \
    --data-binary @session_johwar.json.gz \
    -H "Content-Type: application/json" \
    -H 'Content-Encoding: gzip' \
    http://myserver:8999/sessions
"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r,
\\n, \\t) is allowed between tokens\n at [Source: HttpInputOverHTTP@188893c9;
line: 1, column: 2]"

Ok, no luck. The log is mostly the same, except that the request is not chunked:

040e: User-Agent: curl/7.47.0
0427: Accept: */*
0434: Content-Type: application/json
0454: Content-Encoding: gzip
046c: Content-Length: 61
0480:
=> Send data, 61 bytes (0x3d)
0000: ...........V*(....-N-R.R...(O,R.Q...KQ.RP*.,.H,.V.....7..)...
== Info: upload completely sent off: 61 out of 61 bytes
<= Recv header, 26 bytes (0x1a)
0000: HTTP/1.1 400 Bad Request

Decompress the file again:

$ gunzip session_johwar.json.gz

Then a plain old request, already known to work:

$ curl -u : --negotiate -v -s --trace-ascii http_trace.log \
    --data @session_johwar.json \
    -H "Content-Type: application/json" \
    http://myserver:8999/sessions
{"id":5,"appId":null,"owner":"johwar","proxyUser":"johwar","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}

Yep. Log is looking a lot better:

040e: User-Agent: curl/7.47.0
0427: Accept: */*
0434: Content-Type: application/json
0454: Content-Length: 40
0468:
=> Send data, 40 bytes (0x28)
0000: {"proxyUser":"johwar","kind": "pyspark"}
== Info: upload completely sent off: 40 out of 40 bytes
<= Recv header, 22 bytes (0x16)
0000: HTTP/1.1 201 Created

And with chunking?

$ curl -u : --negotiate -v -s --trace-ascii http_trace_chunked.log \
    --data @session_johwar.json \
    -H "Content-Type: application/json" \
    -H 'Transfer-Encoding: chunked' \
    http://myserver:8999/sessions
{"id":6,"appId":null,"owner":"johwar","proxyUser":"johwar","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}

Still works. So chunked transfer is fine on its own; it's the gzip-compressed
request body that Livy rejects, which matches what we see when the request
comes through Knox.
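
For completeness, the same four direct-to-Livy cases can also be scripted.
A minimal sketch, assuming the requests and requests-kerberos packages are
installed and a valid Kerberos ticket is present (as in the curl tests above);
force_preemptive is used so the chunked body isn't consumed by the initial
401 round trip:

# Reproduce the four cases above: plain, chunked, gzip, gzip + chunked.
# If Livy ignores Content-Encoding, only the gzip variants should come back
# with the 400 "CTRL-CHAR, code 31" error.
import gzip
import json
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

LIVY = "http://myserver:8999/sessions"
auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, force_preemptive=True)
payload = json.dumps({"proxyUser": "johwar", "kind": "pyspark"}).encode()

def post(body, extra_headers=None, chunked=False):
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    # requests switches to chunked transfer encoding when 'data' is a generator.
    data = (chunk for chunk in [body]) if chunked else body
    r = requests.post(LIVY, data=data, headers=headers, auth=auth)
    print(r.status_code, r.text[:80])

post(payload)                                                  # plain: 201
post(payload, chunked=True)                                    # chunked: 201
post(gzip.compress(payload), {"Content-Encoding": "gzip"})     # gzip: 400
post(gzip.compress(payload), {"Content-Encoding": "gzip"}, chunked=True)  # both: 400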
