Some updates for the benefit of the list

On 28/03/12 17:22, Sergey Beryozkin wrote:
Hi Chris
On 28/03/12 15:15, chrismattmann wrote:
Hi Sergey,



Sergey Beryozkin-5 wrote

UnpackerResource does not have any specific requirements about the
incoming Content-Type; it has no @Consumes annotation.

415 is returned if the Content-Type is not supported, but right now
UnpackerResource effectively has @Consumes("*/*"). I would not be
surprised if Jersey also assumed the default wildcard; otherwise, given
that UnpackerResource has no explicit @Consumes, it would not be
possible to POST/PUT to it with Jersey clients.
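To make the implicit-wildcard point concrete, here is a small stdlib-only Java model of the check a JAX-RS runtime performs before dispatching a request. It is a hypothetical sketch, not CXF's or Jersey's matching code; the class and method names are made up, and the only behaviour taken from the thread is that a resource without @Consumes acts as if it declared "*/*".

```java
// Toy model of the Content-Type check a JAX-RS runtime makes before
// dispatching to a resource method. Hypothetical names; this is neither
// CXF's nor Jersey's actual implementation.
final class ConsumesCheck {

    private ConsumesCheck() {}

    // Returns true if the request would be dispatched, false if the
    // runtime would answer 415 Unsupported Media Type.
    static boolean accepts(String contentType, String... consumes) {
        // No @Consumes at all behaves like @Consumes("*/*").
        String[] effective = consumes.length == 0 ? new String[] {"*/*"} : consumes;
        String[] actual = split(contentType);
        for (String declared : effective) {
            String[] d = split(declared);
            boolean typeOk = d[0].equals("*") || d[0].equalsIgnoreCase(actual[0]);
            boolean subOk = d[1].equals("*") || d[1].equalsIgnoreCase(actual[1]);
            if (typeOk && subOk) {
                return true;
            }
        }
        return false;
    }

    // Splits "type/subtype; params" into {type, subtype}, dropping parameters.
    private static String[] split(String mediaType) {
        String bare = mediaType.split(";", 2)[0].trim();
        int slash = bare.indexOf('/');
        if (slash < 0) {
            return new String[] {bare, ""};
        }
        return new String[] {bare.substring(0, slash), bare.substring(slash + 1)};
    }
}
```

With no declared types, "xxx/xxx" is dispatched; declaring only application/msword would turn the same request into a 415.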


Yeah, what's weird is that Jersey didn't let the request through if you
specified "xxx/xxx" as the Accept header (via a call to
.accept("xxx/xxx")), which our tests used to do. Maybe the test was
exposing Jersey-specific behavior more than anything else, but I have
to ask whether or not that's the correct behavior in general (which it
kind of seems like it should be).

For example, there is probably a set of known media types (the IANA
types), and if a type is not in that list, then maybe it should return
415, dunno...


Maybe we can introduce a CXF property which would require the media
type parser to do some kind of strict validation of the main type
against the well-known types (text, application, image, a few others),
such as rejecting 'xxx' in 'xxx/aaa'. Subtypes are tricky to validate,
as they can be dynamic, wildcards, composite (as in xxx/a+b), etc. That
may be a good enhancement.
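A rough sketch of what such an optional strict check might look like, assuming it would only validate the main type and leave subtypes alone. The class name, the boolean flag standing in for the suggested CXF property, and the list of top-level types are all assumptions; no such CXF property is implied to exist.

```java
import java.util.Locale;
import java.util.Set;

// Sketch of an optional strict check on the main media type.
// Hypothetical: the class, the flag and the type list are assumptions,
// not an existing CXF property.
final class StrictMediaTypeCheck {

    // A fixed list of common IANA top-level types (assumed, may not be exhaustive).
    private static final Set<String> KNOWN_MAIN_TYPES = Set.of(
            "application", "audio", "example", "font", "image",
            "message", "model", "multipart", "text", "video");

    private StrictMediaTypeCheck() {}

    // Lenient mode accepts any "type/subtype"; strict mode also requires
    // the main type to be one of the well-known ones.
    static boolean isAcceptable(String contentType, boolean strict) {
        String bare = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        int slash = bare.indexOf('/');
        // Basic shape: non-empty type, a single '/', non-empty subtype.
        if (slash <= 0 || slash == bare.length() - 1 || bare.indexOf('/', slash + 1) >= 0) {
            return false;
        }
        String mainType = bare.substring(0, slash);
        // The subtype is intentionally not checked: it may be vendor-specific,
        // a wildcard or composite such as "a+b".
        return !strict || KNOWN_MAIN_TYPES.contains(mainType);
    }
}
```

Note that "text/123" still passes in strict mode, since only the main type is inspected.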

The issue was caused by the Tika ExceptionMapper being 'lost' during the migration: CXF requires providers to be registered explicitly, whereas Jersey auto-discovers them.

While I've always been pessimistic about class scanning support, I appreciate that it can simplify the 'life' of applications depending on simple, non-configurable providers, hence

https://issues.apache.org/jira/browse/CXF-4199
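Why a missing mapper shows up as a different status code can be illustrated with a toy model of exception-mapper selection (hypothetical names, stdlib only, not CXF's API). As in JAX-RS, the mapper registered for the nearest superclass of the thrown exception wins; if none is registered, the error ends up as a 500. Mapping IllegalArgumentException to 415 is an assumed stand-in for whatever the Tika mapper actually did.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of JAX-RS exception-mapper selection; hypothetical, not CXF's API.
final class MapperRegistry {

    private final Map<Class<?>, Function<Throwable, Integer>> mappers = new HashMap<>();

    // Explicit registration, mirroring how CXF expects providers to be listed.
    void register(Class<? extends Throwable> type, Function<Throwable, Integer> toStatus) {
        mappers.put(type, toStatus);
    }

    // Uses the mapper registered for the nearest superclass of the thrown
    // exception; with no mapper in the hierarchy, the error ends up as 500.
    int statusFor(Throwable error) {
        for (Class<?> c = error.getClass(); c != null; c = c.getSuperclass()) {
            Function<Throwable, Integer> mapper = mappers.get(c);
            if (mapper != null) {
                return mapper.apply(error);
            }
        }
        return 500;
    }
}
```

In real CXF code the mapper instance is instead passed explicitly, for example via JAXRSServerFactoryBean.setProviders(...) or a jaxrs:providers element in Spring configuration.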

I'm still contemplating though whether we should optionally validate main media types or not :-)

Thanks, Sergey





Sergey Beryozkin-5 wrote

"xxx/xxx" is a valid MediaType format, so CXF parses it and lets it
through. I think it really boils down to whether "xxx/xxx" can be
treated as a valid media type or not. We can add a check for "xxx/xxx",
but then someone will set "text/123": that is acceptable enough, but I
guess not something the tika server wants to accept :-)


True, good point. In fact, what we ought to do with both communities is
inject Tika in there :) and use Tika for detection, then drive what we
do in CXF from there (and thus in tika-server). Cross-fertilization of
projects! :)

Sounds good :-)

Sergey Beryozkin-5 wrote

I guess I can update UnpackerResource to guard against this and throw
HTTP 415 itself, but it seems a bit intrusive, and I was really hoping
for more of a seamless Jar/Maven drop-in fix.

I think the proper fix is to explicitly list the supported content
types, using wildcards where possible, for example image/*, a/b+*, etc.
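A hypothetical sketch of the "explicit list with wildcards" idea: a matcher that accepts exact types, full-subtype wildcards such as image/*, and a composite-suffix pattern in the spirit of a/b+*. The type list is an example, and the "+*" semantics here are an assumption rather than a description of how CXF matches @Consumes values.

```java
import java.util.Locale;

// Hypothetical matcher for an explicit list of supported Content-Types.
final class SupportedTypes {

    // Example list only; in a resource this would be the @Consumes value.
    static final String[] SUPPORTED = {
        "application/msword", "image/*", "application/json+*"
    };

    private SupportedTypes() {}

    // Returns false when no pattern matches (the runtime would answer 415).
    static boolean isSupported(String contentType) {
        String[] actual = split(contentType);
        for (String pattern : SUPPORTED) {
            String[] p = split(pattern);
            if (p[0].equals(actual[0]) && subtypeMatches(p[1], actual[1])) {
                return true;
            }
        }
        return false;
    }

    // "*" matches any subtype; "prefix+*" matches "prefix+<something non-empty>".
    private static boolean subtypeMatches(String pattern, String subtype) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.endsWith("+*")) {
            String prefix = pattern.substring(0, pattern.length() - 1); // keeps the '+'
            return subtype.length() > prefix.length() && subtype.startsWith(prefix);
        }
        return pattern.equals(subtype);
    }

    // Splits "type/subtype; params" into lower-cased {type, subtype}.
    private static String[] split(String mediaType) {
        String bare = mediaType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        int slash = bare.indexOf('/');
        return slash < 0
                ? new String[] {bare, ""}
                : new String[] {bare.substring(0, slash), bare.substring(slash + 1)};
    }
}
```

In a real resource the same list would be declared via @Consumes, and anything outside it, such as "xxx/xxx", would be rejected with 415 before the method runs.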


One other question I had in this situation, though. It seems like CXF
breaks the incoming InputStream if I specify "xxx/xxx". For example,
Paul and I have tested specifying a *known* Accept type, with the same
stream and Content-Type: application/msword, and we have seen the
request go through via CXF and Tika parse the stream correctly using
its AutoDetectParser. Now, if we change the Content-Type to "xxx/xxx",
then Tika and its AutoDetectParser get the stream, but it appears to
have been changed. Is CXF changing the InputStream, or defaulting to a
different encoding, or something else?

It seems like if CXF is going to let "xxx/xxx" through and the stream
gets to Tika, then instead of the 500 we're getting in test415 right
now, we should be getting HTTP 200 OK at the very least. We do get that
if we change the Content-Type to something with the same encoding as
the input stream, in which case it looks like CXF doesn't modify the
stream. For example, we tried changing the Content-Type to
application/json just to see what would happen on test415, and that
returned a 204 (No Content).


It's a bug somewhere in the Tika Detector implementation. I did not
debug it, but I narrowed it down: Tika keeps a repository of well-known
types, and if the repository is not aware of the given media type, it
attempts to detect the type by peeking into the input stream.
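The fallback described above (trust a declared type the repository knows, otherwise peek into the stream) can be sketched as follows. This is a simplified stand-in, not Tika's Detector API: the registry contents, the two magic numbers checked and the class name are assumptions. It uses mark/reset so that peeking does not consume bytes the parser will need later; a detector that peeked without rewinding would hand the parser a truncated stream, which is one plausible way a stream can look "changed", though the thread did not establish that as the cause here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Set;

// Simplified stand-in for a detector: trust a known declared type,
// otherwise peek at the first bytes. Hypothetical; not Tika's Detector API.
final class SimpleDetector {

    // Assumed registry contents, for illustration only.
    private static final Set<String> KNOWN = Set.of("application/msword", "application/pdf");

    // OLE2 compound document header (used by .doc, but also by .xls/.ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};

    private SimpleDetector() {}

    // The stream must support mark/reset; it is rewound after peeking so the
    // parser that reads it next still sees every byte.
    static String detect(InputStream in, String declared) {
        if (declared != null && KNOWN.contains(declared)) {
            return declared; // the registry knows this type: trust it
        }
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[OLE2.length];
        int n;
        try {
            in.mark(head.length);
            n = in.readNBytes(head, 0, head.length);
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (startsWith(head, n, OLE2)) {
            return "application/msword"; // simplification: OLE2 is not only Word
        }
        if (startsWith(head, n, PDF)) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] head, int n, byte[] magic) {
        return n >= magic.length && Arrays.equals(Arrays.copyOf(head, magic.length), magic);
    }
}
```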

I reproduced the 500 again by providing "application/json+cloud", a
custom media type with a composite subtype.

Cheers, Sergey

Thoughts?

Cheers,
Chris & Paul



--
View this message in context:
http://cxf.547215.n5.nabble.com/TIKA-593-odd-behavior-related-to-CXF-JAX-RS-services-and-415-Http-response-codes-tp5600131p5600648.html

Sent from the cxf-user mailing list archive at Nabble.com.




--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/

Blog: http://sberyozkin.blogspot.com
