Hi,

We received a heap dump from our client in which a single thread holds roughly 260 MB of memory (about 268 million bytes of retained heap, see below) while trying to activate some seemingly large content:
  at java.io.FileInputStream.readBytes([BII)I (Native Method)
  at java.io.FileInputStream.read([B)I (FileInputStream.java:177)
  at org.apache.commons.io.IOUtils.copyLarge(Ljava/io/InputStream;Ljava/io/OutputStream;)J (IOUtils.java:1025)
  at org.apache.commons.io.IOUtils.copy(Ljava/io/InputStream;Ljava/io/OutputStream;)I (IOUtils.java:999)
  at info.magnolia.module.exchangesimple.Transporter.transport(Ljava/net/HttpURLConnection;Linfo/magnolia/module/exchangesimple/ActivationContent;)V (Transporter.java:134)
  at info.magnolia.module.exchangesimple.SimpleSyndicator.activate(Linfo/magnolia/cms/exchange/Subscriber;Linfo/magnolia/module/exchangesimple/ActivationContent;)Ljava/lang/String; (SimpleSyndicator.java:173)
  at info.magnolia.module.exchangesimple.SimpleSyndicator$2.run()V (SimpleSyndicator.java:120)
  at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run()V (Unknown Source)
  at java.lang.Thread.run()V (Thread.java:662)
Accumulated Objects
Class Name Shallow Heap Retained Heap Percentage
176 268.476.352 33,87%
\ 40 268.435.520 33,87%
.\ 268.435.480 268.435.480 33,87%

Since PosterOutputStream extends ByteArrayOutputStream, it appears to buffer the whole activation request in memory. Judging from the Magnolia code, for a single activation this happens once for every subscriber. That puts a lot of load on the GC when a single large binary is activated, and it gets worse when several users activate large binaries simultaneously.

There has been a similar discussion here earlier: http://www.mail-archive.com/user-l...@magnolia-cms.com/msg01809.html, where Jan was wondering why a ByteArrayOutputStream was used by java.net.HttpURLConnection, resulting in an OutOfMemoryError (OOME) for large activations. The problem could be avoided if "chunked transfer coding" was used for the activation requests from the author to the public servers.

I think it would be really great if chunking was used reliably, since system stability is currently at risk whenever large binaries are activated. So I started digging a bit.

There is a method java.net.HttpURLConnection.setChunkedStreamingMode(int) to enable chunking. As far as I can tell it must be invoked explicitly for chunking to happen, i.e. I don't think that chunking can be enabled by means of configuration alone. The method's javadoc says that the request could fail if the server does not support chunking. RFC 2616 (section 3.6.1), on the other hand, says that 'All HTTP/1.1 applications MUST be able to receive and decode the "chunked" transfer-coding'.
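
To illustrate what I mean, here is a minimal sketch of a chunked upload with plain HttpURLConnection. This is not Magnolia's actual Transporter code: the class name ChunkedUploadSketch, the /activate path and the 8192-byte chunk size are made up for the example, and demo() only starts a throwaway server on localhost (the JDK's built-in com.sun.net.httpserver) to show which Transfer-Encoding header arrives. It uses try-with-resources and lambdas, so it assumes a current JDK rather than the Java 6 from the stack trace.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

public class ChunkedUploadSketch {

    /** POSTs the stream to url using chunked transfer coding and returns the HTTP status. */
    static int upload(URL url, InputStream in) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        // Has to be called before the connection is opened. Without a streaming
        // mode, HttpURLConnection buffers the complete request body in memory.
        conn.setChunkedStreamingMode(8192); // chunk size is an arbitrary choice
        try (OutputStream out = conn.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // goes out chunk by chunk, not into a byte array
            }
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Hypothetical local demo: reports what a throwaway localhost server received. */
    static String demo(int size) throws IOException {
        AtomicReference<String> encoding = new AtomicReference<>();
        AtomicLong received = new AtomicLong();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/activate", exchange -> {
            encoding.set(exchange.getRequestHeaders().getFirst("Transfer-Encoding"));
            InputStream body = exchange.getRequestBody();
            byte[] buf = new byte[8192];
            int n;
            while ((n = body.read(buf)) != -1) {
                received.addAndGet(n); // count bytes instead of keeping them
            }
            exchange.sendResponseHeaders(200, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort()
                    + "/activate").toURL();
            int status = upload(url, new ByteArrayInputStream(new byte[size]));
            return status + " " + encoding.get() + " " + received.get();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Prints the status, the Transfer-Encoding header seen by the server,
        // and the number of body bytes it read.
        System.out.println(demo(1 << 20));
    }
}
```

The only line that matters for the memory problem is the setChunkedStreamingMode call; everything in demo() is scaffolding to make the sketch runnable without a public instance.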

So if the Magnolia code always invoked HttpURLConnection.setChunkedStreamingMode(int), this would only add the requirement that an HTTP/1.1 capable server is used for the public instances. I'd think that this shouldn't be a big problem? Alternatively, there could be a fallback to non-chunked mode in case of failure.
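
The fallback could look roughly like the following sketch. It rests on assumptions of mine: the payload is available as a file of known size (so it can be re-read for the second attempt), and a refusal shows up as an IOException or as status 411 (Length Required) or 501 (Not Implemented). The non-chunked attempt here uses setFixedLengthStreamingMode(long), which also avoids buffering; falling back to the current buffered behaviour would be another option. The class name, the /activate path and the demo server that rejects chunked requests are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedWithFallbackSketch {

    /** Sends the file once, either chunked or with a declared Content-Length. */
    static int send(URL url, Path file, boolean chunked) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        if (chunked) {
            conn.setChunkedStreamingMode(8192);
        } else {
            // Streams as well, but needs the exact body length up front.
            conn.setFixedLengthStreamingMode(Files.size(file));
        }
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = conn.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Tries chunked first, then retries without chunking if that was refused. */
    static String sendWithFallback(URL url, Path file) throws IOException {
        try {
            int status = send(url, file, true);
            // Assumption: 411 and 501 are the statuses that signal "no chunking here".
            if (status != 411 && status != 501) {
                return "chunked:" + status;
            }
        } catch (IOException e) {
            // Assumption: an I/O failure may also mean chunking was not accepted.
        }
        return "fixed-length:" + send(url, file, false);
    }

    /** Hypothetical demo against a localhost server that refuses chunked requests. */
    static String demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/activate", exchange -> {
            boolean chunked =
                    exchange.getRequestHeaders().getFirst("Transfer-Encoding") != null;
            InputStream body = exchange.getRequestBody();
            byte[] buf = new byte[8192];
            while (body.read(buf) != -1) {
                // drain the request body before answering
            }
            exchange.sendResponseHeaders(chunked ? 411 : 200, -1);
            exchange.close();
        });
        server.start();
        Path file = Files.createTempFile("activation", ".bin");
        try {
            Files.write(file, new byte[100_000]);
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort()
                    + "/activate").toURL();
            return sendWithFallback(url, file);
        } finally {
            Files.deleteIfExists(file);
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

One design point to keep in mind: the retry only works because the content can be read a second time, so a plain InputStream that has already been consumed would not be enough.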

What do you think of this improvement?

Regards,
Jörg


--
Dipl. inf. Jörg von Frantzius, System Architect
Email mailto:joerg.frantz...@aperto.de
Phone +49 30 283921-318
Fax +49 30 283921-29
Aperto AG - In der Pianofabrik
Chausseestraße 5, D-10115 Berlin-Mitte
Web http://www.aperto.de
HRB 77049, AG Berlin Charlottenburg
Vorstand: Dirk Buddensiek


