Re: Multipart values are not trimed

Sergey Beryozkin Tue, 06 Nov 2012 07:38:50 -0800

Hi,


To give the bigger picture, let me explain what I am actually trying to
build:

In a multipart POST request, I'd like to have form params and a file
attachment (like the example above), and I would like to handle the
InputStream of the file myself, in order to do things like:
    - checking some headers on one of the attachments, for example
      Content-Length, Content-Disposition, etc.
    - consuming the content of this part's InputStream to store it in a
      temporary file

However, in the MessageBodyReader, the entityStream looks like it has
already been consumed and closed. Debugging reveals that an
AttachmentDeserializer already consumed the stream and created an
Attachment collection, but my provider wasn't called at that time. If
the opportunity is available, I would like to copy these bytes to another
OutputStream.

The provider for TemporaryBinaryFile is called later, when individual
parts are deserialized.


Is it possible, or should I use attachments? As much as possible, I'd
like to avoid technical code in the resource and just have a reference to
a TemporaryBinaryFile.


You can use org.apache.cxf.jaxrs.ext.multipart.Attachment instead of
TemporaryBinaryFile, check Content-Type and Content-Disposition, and then
do 'attachment.getObject(TemporaryBinaryFile.class)':


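public void post(@Multipart("someid") Attachment attachment) {
     // inspect the part's headers first
     attachment.getContentType();
     attachment.getContentDisposition();
     // then let the registered MessageBodyReader build the object
     attachment.getObject(TemporaryBinaryFile.class);
}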

Actually, you can optimize it slightly by adding a 'type' parameter to
@Multipart(value = "someid", type = "text/plain")

Ok, thx for that :)
Do you think it will be possible to stream the content of the attachment
directly to another OutputStream? The attachment can be large, 20 MB or
maybe more, and I'd like to keep memory consumption as low as possible.

CXF will internally manage saving the stream to the temp folder if the
part is large.

You can do

attachment.getObject(InputStream.class),

in which case you will have to deal with the InputStream directly, or you
can do it within your own TemporaryBinaryFile MBR when you do

attachment.getObject(TemporaryBinaryFile.class)
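
As an illustration of the "deal with the InputStream directly" option,
here is a minimal sketch in plain Java, independent of CXF, that spools a
stream to a temporary file without buffering the whole part in memory.
The class name TempFileSpooler, the method name and the "upload-" prefix
are hypothetical; the stream is assumed to be the one obtained from the
attachment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of CXF.
final class TempFileSpooler {
    private TempFileSpooler() {
    }

    /** Copies the stream to a new temporary file, closes it, and returns the file path. */
    static Path spoolToTempFile(InputStream in) {
        try {
            Path target = Files.createTempFile("upload-", ".part");
            try (InputStream source = in) {
                // createTempFile already created the file, so it must be replaced.
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller remains responsible for deleting the temporary file once it is
no longer needed.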


Fantastic :)
I would have preferred to avoid dealing with technical code directly, so
I will probably keep a reference to the InputStream in a renamed
StreamableBinaryFile.

Is it possible to get the size of the attachment in a safer way than this
(if the Content-Length isn't present)?

((AttachmentDataSource)
attachment.getDataHandler().getDataSource()).cache.size()

Note that the cache field would have to be accessed via reflection.

I think the better option, assuming you'd like to enforce a certain limit, is to use the attachment-max-size property:


http://cxf.apache.org/docs/security.html#Security-Multiparts
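
Separately from that property, the size of a part can also be learned
without reflection by counting bytes while copying the stream. The sketch
below is plain Java and makes no claim about how CXF enforces its own
limit; BoundedCopy, copyWithLimit and the choice of
IllegalStateException are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of CXF.
final class BoundedCopy {
    private BoundedCopy() {
    }

    /**
     * Copies in to out and returns the number of bytes copied.
     * Throws IllegalStateException as soon as more than maxBytes have been read.
     */
    static long copyWithLimit(InputStream in, OutputStream out, long maxBytes) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
                if (total > maxBytes) {
                    throw new IllegalStateException("Part exceeds " + maxBytes + " bytes");
                }
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

The returned count is the actual size of the part, whether or not a
Content-Length header was sent.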


I'm crafting a resource that should accept multipart POST requests.

Here's the method :

====================================================
      @POST
      @Produces({MediaType.APPLICATION_JSON})
      @Consumes(MediaType.MULTIPART_FORM_DATA)
      public MetaData archive(@FormParam("title") String title,
                              @FormParam("revision") String revision,
                              @Multipart("archive") TemporaryBinaryFile temporaryBinaryFile) {
====================================================



Also I tried with @Multipart instead of @FormParam

====================================================
      @POST
      @Produces({MediaType.APPLICATION_JSON})
      @Consumes(MediaType.MULTIPART_FORM_DATA)
      public DocumentMetaData archive(@Multipart(value = "title", required = false) @FormParam("title") String title,
                                      @Multipart(value = "revision", required = false) String revision,
                                      @Multipart("archive") TemporaryBinaryFile temporaryBinaryFile) {


You have both @FormParam and @Multipart attached to 'title'; drop
@FormParam. I think it only works because 'title' is a simple parameter.



Yes, I wrongly copied/modified the code in the mail; however, I tested
both setups separately.
Anyway, as you advised, I will only use @Multipart now.




====================================================

And here is the raw request :
====================================================
Address: http://localhost:8080/api/v1.0/document/archive
Encoding: ISO-8859-1
Http-Method: POST
Content-Type: multipart/form-data;boundary=partie
Headers: {Accept=[*/*], accept-charset=[ISO-8859-1,utf-8;q=0.7,*;q=0.3],
accept-encoding=[gzip,deflate,sdch], Content-Length=[301],
content-type=[multipart/form-data;boundary=partie]}


Payload:
--partie
Content-Disposition: form-data; name="title"
Content-ID: title

the.title
--partie
Content-Disposition: form-data; name="revision"
Content-ID: revision

some.revision
--partie
Content-Disposition: form-data; name="archive"; filename="file.txt"
Content-Type: text/plain

I've got a woman, way over town...
--partie
====================================================



However, the title and revision values are incorrect because they end
with a newline character '\n'. Hence these parameters are rejected by my
validator (which is using Message.getContent).

I don't think this is normal behavior, but I might be wrong, maybe about
the specs or about my request. Note that I had to add the Content-ID when
using the @Multipart annotation.


What CXF version is it? The Content-Disposition 'name' is definitely
checked too.

I also found the part of the code that should check the
Content-Disposition. However, I found that the first letter 'C'
disappeared, and the key in the attachment headers is now
'ontent-Disposition', which complicates things further and probably
explains why I needed a Content-ID header in each part. The first part,
however, always has its Content-Disposition header decoded correctly.
Adding an extra new line after the boundary seems to work around it, but
I'd rather not impose this on the API users :/

I couldn't figure out yet where the code is consuming the additional
char. I just know that at some point the LazyAttachmentCollection has the
remaining attachment (AttachmentImpl), and its first header is wrong.


I think it is a bug in the code that posts the multipart; I recall
exactly the same issue being reported when RESTClient was used.

Isn't it this issue?
https://issues.apache.org/jira/browse/CXF-2704


Looks like so, but I also do recall the same issue with RESTClient payloads
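
For reference, RFC 2046 requires CRLF line endings in multipart bodies,
and the last boundary must carry a trailing "--"; the logged payload
above ends with a plain "--partie". Below is a minimal sketch of a client
building a text-only form-data body that way. It assumes, and this is not
confirmed in the thread, that the trailing '\n' and the missing first
header letter come from a client sending bare LF. MultipartBodySketch and
formDataBody are hypothetical names.

```java
import java.util.Map;

// Hypothetical client-side helper, not part of CXF.
final class MultipartBodySketch {
    private static final String CRLF = "\r\n";

    private MultipartBodySketch() {
    }

    /** Builds a multipart/form-data body for simple text fields, using CRLF throughout. */
    static String formDataBody(String boundary, Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.append("--").append(boundary).append(CRLF)
                .append("Content-Disposition: form-data; name=\"")
                .append(field.getKey()).append('"').append(CRLF)
                .append(CRLF)                      // empty line separates headers from content
                .append(field.getValue()).append(CRLF);
        }
        // The closing delimiter has "--" on both sides of the boundary.
        body.append("--").append(boundary).append("--").append(CRLF);
        return body.toString();
    }
}
```

With this layout the CRLF before each delimiter belongs to the delimiter,
not to the field value, so a conforming parser should not hand back
values with a trailing line break.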

About the Content-Disposition name: it is checked only if there is no
Content-ID. However, it seems that at some point a default Content-ID is
added ("[email protected]"), which defeats the purpose of the following
code.

       private static boolean matchAttachmentId(Attachment at, Multipart mid,
                                                MediaType multipartType) {
           if (at.getContentId().equals(mid.value())) {
               return true;
           }
           ContentDisposition cd = at.getContentDisposition();
           if (cd != null && mid.value().equals(cd.getParameter("name"))) {
               return true;
           }
           return false;
       }

The default Content-ID is added on output; it is not added during the
read...

I'm not 100% sure how everything works, but at some point
MultipartProvider.readFrom is called from
JAXRSUtils.readFromMessageBodyReader, which will indirectly call the
above code:

      public Object readFrom(Class<Object> c, Type t, Annotation[] anns,
                             MediaType mt,
                             MultivaluedMap<String, String> headers,
                             InputStream is)
              throws IOException, WebApplicationException {

          // ...

          Multipart id = AnnotationUtils.getAnnotation(anns, Multipart.class);
          Attachment multipart = AttachmentUtils.getMultipart(c, id, mt, infos);

          if (multipart != null) {
              return fromAttachment(multipart, c, t, anns);
          } else if (id != null && !id.required()) {

          // ...

      }



      public static Attachment getMultipart(Class<Object> c,
                                            Multipart id,
                                            MediaType mt,
                                            List<Attachment> infos)
              throws IOException {

          if (id != null) {
              for (Attachment a : infos) {
                  if (matchAttachmentId(a, id, mt)) {
                      checkMediaTypes(a.getContentType(), id.type());
                      return a;
                  }
              }
          // ...
      }

I'm not sure of the implications, but it might be possible to fix this
with the following code:

      private static boolean matchAttachmentId(Attachment at, Multipart mid,
                                               MediaType multipartType) {
          ContentDisposition cd = at.getContentDisposition();
          boolean matchContentDispositionName =
              cd != null && mid.value().equals(cd.getParameter("name"));
          boolean matchContentId = at.getContentId().equals(mid.value());

          return matchContentId || matchContentDispositionName;
      }

What exactly are you proposing to fix, though?


Damn, forgive me, I stayed too long at work last night and missed
things, and it seems that affected my mail this morning as well! I was
misled by the fact that the first letter of the first header in the
second and following attachments is missing, hence in my case
Content-Disposition isn't parsed by CXF.

Anyway the above code works correctly. ....shame on me !


Again, thank you very much, I owe you a beer or two!


No problems at all :-), thanks for stressing the code :-)

Cheers, Sergey


  Cheers
-- Brice



--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/

Blog: http://sberyozkin.blogspot.com
