Thanks, Sergey, for all your help!

Cheers,
Chris

On Mar 29, 2012, at 9:17 AM, Sergey Beryozkin wrote:

> Some updates for the benefit of the list
> 
> On 28/03/12 17:22, Sergey Beryozkin wrote:
>> Hi Chris
>> On 28/03/12 15:15, chrismattmann wrote:
>>> Hi Sergey,
>>> 
>>> 
>>> 
>>> Sergey Beryozkin-5 wrote
>>>> 
>>>> UnpackerResource does not have any specific requirements about the
>>>> incoming Content-Type, it has no specific @Consumes.
>>>> 
>>>> 415 is returned if Content-Type is not supported but right now
>>>> UnpackerResource has effectively @Consumes("*/*"). I would not be
>>>> surprised if Jersey also assumed the default wildcard, otherwise, given
>>>> that UnpackerResource does not have explicit @Consumes, it would not be
>>>> possible to post/put to it with Jersey clients.
>>>> 
>>> 
>>> Yeah, what's weird is that Jersey didn't let the request through if you
>>> specified "xxx/xxx" as the Accept header (via a call to .accept("xxx/xxx")),
>>> which our tests used to do. Maybe the test was exposing a Jersey behavior
>>> more than anything else, but I have to ask whether or not that's the
>>> correct behavior in general (which it kind of seems like it should be).
>>> 
>>> For example, there is probably a set of known media types (the IANA
>>> types), and if a type is not in that list, then maybe it should 415, dunno...
>>> 
>> 
>> Maybe we can introduce a CXF property which requires the media type
>> parser to do some kind of strict validation of the main type against the
>> well-known types (text, application, image, a few others), such as 'xxx'
>> in 'xxx/aaa'. Subtypes are tricky to validate: they can be dynamic,
>> wildcards, composite (as in xxx/a+b), etc. That may be a good
>> enhancement.
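
A minimal sketch of the strict main-type check proposed above, in plain Java. The class name, the method name, and the exact list of well-known types are assumptions made for illustration; this is not an existing CXF property or API. Only the part before the '/' is validated, and the subtype is merely required to be non-empty.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of CXF: validates the main (top-level) type only.
final class MainTypeValidator {

    // Assumed list of well-known main types; adjust to whatever list is agreed on.
    private static final Set<String> WELL_KNOWN_MAIN_TYPES = Set.of(
            "application", "audio", "image", "message",
            "model", "multipart", "text", "video");

    private MainTypeValidator() {}

    // Returns true when the main type is the wildcard or a well-known type.
    // The subtype is not validated beyond being non-empty, because it may be
    // dynamic, a wildcard, or composite (as in xxx/a+b).
    static boolean hasWellKnownMainType(String mediaType) {
        if (mediaType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before splitting.
        int semi = mediaType.indexOf(';');
        String bare = (semi >= 0 ? mediaType.substring(0, semi) : mediaType).trim();
        int slash = bare.indexOf('/');
        if (slash <= 0 || slash == bare.length() - 1) {
            return false; // missing main type or missing subtype
        }
        String mainType = bare.substring(0, slash).toLowerCase(Locale.ROOT);
        return mainType.equals("*") || WELL_KNOWN_MAIN_TYPES.contains(mainType);
    }
}
```

With this check, "xxx/aaa" would be rejected, while "text/123" and "application/json+cloud" would still pass, so it would not replace an explicit list of supported types on the resource.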
> 
> The issue was caused by the Tika ExceptionMapper being 'lost' during the
> migration: CXF requires explicit registration of providers, whereas
> Jersey auto-discovers them.
> 
> While I've always been pessimistic about the class scanning support, I
> appreciate that it can simplify the 'life' of applications depending on
> simple, non-configurable providers, hence
> 
> https://issues.apache.org/jira/browse/CXF-4199
> 
> I'm still contemplating though whether we should optionally validate 
> main media types or not :-)
> 
> Thanks, Sergey
> 
> 
>> 
>>> 
>>> 
>>> Sergey Beryozkin-5 wrote
>>>> 
>>>> "xxx/xxx" is a valid MediaType format so CXF parses it and lets it through.
>>>> I think it really boils down to whether "xxx/xxx" can be treated as a
>>>> valid media type or not. We can add a check for "xxx/xxx" but then
>>>> someone will set "text/123" - that is acceptable enough but I guess not
>>>> something the tika server wants to accept :-)
>>>> 
>>> 
>>> True, good point. In fact, what we ought to do with both communities is
>>> to inject Tika in there :) and to use Tika for detection and then drive
>>> what we do in CXF from there (and thus in tika-server). Cross-fertilization
>>> of projects! :)
>>> 
>> Sounds good :-)
>>> 
>>> Sergey Beryozkin-5 wrote
>>>> 
>>>>> I guess I can update UnpackerResource to guard against this and throw
>>>>> HTTP 415 itself, but it seems a bit intrusive and I was really hoping
>>>>> for more of a seamless Jar/Maven drop-in fix.
>>>> 
>>>> I think the proper fix is to explicitly list supported content-types,
>>>> using wildcards when possible, for example image/*, or a/b+*, etc.
>>>> 
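
In JAX-RS the list itself would normally be declared with @Consumes on the resource. The plain-Java sketch below only mimics the matching side, to show how patterns such as image/* or a/b+* would accept or reject an incoming Content-Type. The class, its methods, and the prefix-style handling of a trailing '*' are assumptions for illustration, not CXF's actual matching algorithm.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical matcher: checks an incoming Content-Type against supported patterns.
final class ConsumesMatcher {

    private ConsumesMatcher() {}

    // True when the content type matches at least one supported pattern.
    static boolean isSupported(String contentType, List<String> patterns) {
        for (String pattern : patterns) {
            if (matches(pattern, contentType)) {
                return true;
            }
        }
        return false;
    }

    // Compares main type and subtype separately; malformed input never matches.
    static boolean matches(String pattern, String contentType) {
        String[] p = split(pattern);
        String[] c = split(contentType);
        if (p == null || c == null) {
            return false;
        }
        return partMatches(p[0], c[0]) && partMatches(p[1], c[1]);
    }

    // "*" matches anything; a trailing "*" (as in "b+*") is treated as a prefix match.
    private static boolean partMatches(String pattern, String value) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.endsWith("*")) {
            return value.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(value);
    }

    // Splits "type/subtype; params" into {type, subtype}, or returns null if malformed.
    private static String[] split(String mediaType) {
        if (mediaType == null) {
            return null;
        }
        int semi = mediaType.indexOf(';');
        String bare = (semi >= 0 ? mediaType.substring(0, semi) : mediaType)
                .trim().toLowerCase(Locale.ROOT);
        int slash = bare.indexOf('/');
        if (slash <= 0 || slash == bare.length() - 1) {
            return null;
        }
        return new String[] {bare.substring(0, slash), bare.substring(slash + 1)};
    }
}
```

With a supported list of image/*, application/msword and a/b+*, a request carrying "xxx/xxx" matches nothing, which is the case where the server would answer 415.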
>>> 
>>> One other question I had though in this situation. It seems like CXF
>>> breaks the incoming InputStream if I specify "xxx/xxx". For example, Paul
>>> and I have tested specifying a *known* Accept type, with the same stream
>>> and the Content-type: application/msword, and we have seen the request go
>>> through via CXF and Tika parses the stream correctly using its
>>> AutoDetectParser. Now, if we change the Content-type to "xxx/xxx", then
>>> Tika and its AutoDetectParser get the stream, but it appears to have been
>>> changed. Is CXF changing the InputStream or defaulting to a different
>>> encoding or whatever? It seems like if CXF is going to let through
>>> "xxx/xxx" and then the stream gets to Tika, instead of the 500 we're
>>> getting in test415 right now, we should be getting HTTP 200 OK at the
>>> very least, which we do get if we change the Content-type to something
>>> with the same encoding as the input stream, in which case it looks like
>>> CXF doesn't modify the stream. For example, we tried changing
>>> Content-type to application/json just to see what would happen on
>>> test415, and that returned a 204.
>>> 
>> 
>> It's a bug somewhere in the Tika Detector implementation. I did not
>> debug it but narrowed it down. Tika keeps a repository of well-known
>> types and, if the repository is not aware of the given media type, it
>> attempts to detect the type by peeking into the input stream.
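
The lookup-then-peek flow described above can be sketched in plain Java as follows. This is not Tika's Detector code: the class name, the small set of known types, and the two magic-number checks are assumptions for illustration. The point of the sketch is the mark/reset pair, which lets a detector look at the leading bytes and still hand the stream on unchanged.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Set;

// Hypothetical detector: trusts known declared types, otherwise peeks at the stream.
final class PeekingDetector {

    // Assumed stand-in for the repository of well-known types.
    private static final Set<String> KNOWN_TYPES =
            Set.of("application/msword", "application/pdf", "text/plain");

    // First four bytes of an OLE2 container (used by legacy .doc files) and of a PDF.
    private static final byte[] OLE2_MAGIC = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F'};

    private PeekingDetector() {}

    static String detect(String declaredType, InputStream in) {
        // Step 1: a declared type the repository knows is taken as-is.
        if (declaredType != null && KNOWN_TYPES.contains(declaredType)) {
            return declaredType;
        }
        // Step 2: otherwise peek at the leading bytes without consuming them.
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[4];
        int read;
        try {
            in.mark(head.length);
            read = in.readNBytes(head, 0, head.length);
            in.reset(); // rewind so the parser still sees the stream from the start
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (read == head.length) {
            // Simplification: OLE2 is shared by several Office formats, not only Word.
            if (Arrays.equals(head, OLE2_MAGIC)) {
                return "application/msword";
            }
            if (Arrays.equals(head, PDF_MAGIC)) {
                return "application/pdf";
            }
        }
        return "application/octet-stream";
    }
}
```

If the reset step were skipped, or the stream did not support mark/reset, the parser would receive a stream that looks modified, so that path is one plausible place to look when only unknown declared types fail.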
>> 
>> I reproduced the 500 again by providing the custom media type
>> "application/json+cloud", which has a composite subtype.
>> 
>> Cheers, Sergey
>> 
>>> Thoughts?
>>> 
>>> Cheers,
>>> Chris & Paul
>>> 
>>> 
>>> 
>>> --
>>> View this message in context:
>>> http://cxf.547215.n5.nabble.com/TIKA-593-odd-behavior-related-to-CXF-JAX-RS-services-and-415-Http-response-codes-tp5600131p5600648.html
>>> 
>>> Sent from the cxf-user mailing list archive at Nabble.com.
>> 
>> 
> 
> 
> -- 
> Sergey Beryozkin
> 
> Talend Community Coders
> http://coders.talend.com/
> 
> Blog: http://sberyozkin.blogspot.com


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
