Is the CAS.size() larger than the serialized version or smaller?
What are you actually doing to the CAS? Just serializing/deserializing
a couple of times in a row, or do you actually add feature structures?
The sample code you show doesn't give any hint about where the CAS comes
from and what is being done with it.
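One way to answer the size question is to serialize into a fresh in-memory buffer for each document and log the byte count next to `cas.size()`. The sketch below assumes a hypothetical `Serializer` callback so that it runs without UIMA on the classpath. In a real pipeline, the lambda would wrap the existing call, `out -> Serialization.serializeWithCompression(cas, out)`.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

public final class SerializedSizeProbe {

    /** Hypothetical callback wrapping whatever serializer is being measured. */
    @FunctionalInterface
    public interface Serializer {
        void writeTo(OutputStream out) throws Exception;
    }

    /** Serializes into a new in-memory buffer and returns the number of bytes produced. */
    public static int serializedSize(Serializer serializer) throws Exception {
        // A new buffer per call, so bytes from earlier documents cannot accumulate here.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        serializer.writeTo(buffer);
        return buffer.size();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in payload. With UIMA available, this lambda would instead be
        // out -> Serialization.serializeWithCompression(cas, out)
        byte[] payload = new byte[1234];
        int size = serializedSize(out -> out.write(payload));
        System.out.println("serialized bytes: " + size);
    }
}
```

Logging this number next to `cas.size()` over a run of documents would show directly whether the serialized form grows while the CAS itself stays the same size.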

-- Richard

> On 12.01.2016, at 03:06, D. Heinze <> wrote:
> I'm having a problem with CAS serializeWithCompression.  I am processing
> a few million text documents on an IBM P8 with 16 physical SMT8 CPUs, 200GB
> RAM, Ubuntu 14.04.3 LTS and IBM Java 1.8.
> I run 55 UIMA pipelines concurrently.  I'm using UIMA 2.6.0.
> I use serializeWithCompression to save the final state of the processing on
> each document to a file for later processing.
> However, the size of the serialized CAS just keeps growing.  The size of the
> CAS is stable, but the serialized CASes just keep getting bigger. I even
> switched to creating a new CAS for each process instead of using cas.reset().  I
> have also tried writing the serialized CAS to a byte array output stream
> first and then to a file, but it is serializeWithCompression that causes
> the size problem, not writing the file.
> Here's what the code looks like.  Flushing or not flushing does not make a
> difference.  Closing or not closing the file output stream does not make a
> difference (other than leaking memory).  I've also tried doing
> serializeWithCompression with type filtering.  Wanted to try using a Marker,
> but cannot see how to do that.  The problem exists regardless of doing 1 or
> 55 pipelines concurrently.
>        File fout = new File(documentPath);
>        fos = new FileOutputStream(fout);
>        org.apache.uima.cas.impl.Serialization.serializeWithCompression(cas, fos);
>        fos.flush();
>        fos.close();
>        // the logging call was cut off in the archived message; println is a stand-in
>        System.out.println("serializedCas size " + cas.size() + " ToFile " + documentPath);
> Suggestions will be appreciated.
> Thanks / Dan
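On the point about closing the file stream: a sketch of the snippet restructured with try-with-resources, so that each document gets its own stream and the stream is closed even if serialization throws. As above, `Serializer` is a hypothetical callback standing in for `Serialization.serializeWithCompression(cas, out)`, and `CasFileWriter`/`writeDocument` are illustrative names, not part of UIMA.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public final class CasFileWriter {

    /** Hypothetical callback standing in for the actual serialization call. */
    @FunctionalInterface
    public interface Serializer {
        void writeTo(OutputStream out) throws Exception;
    }

    /**
     * Opens a stream for this document only, closes it even on failure,
     * and returns the length of the file that was written.
     */
    public static long writeDocument(String documentPath, Serializer serializer) throws Exception {
        File fout = new File(documentPath);
        // FileOutputStream(File) truncates an existing file; it does not append.
        try (OutputStream fos = new BufferedOutputStream(new FileOutputStream(fout))) {
            serializer.writeTo(fos);
        } // close() flushes the buffer and releases the file handle
        return fout.length();
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("cas", ".bin");
        tmp.deleteOnExit();
        long len = writeDocument(tmp.getPath(), out -> out.write(new byte[42]));
        System.out.println("file length: " + len);
    }
}
```

Because `close()` on the `BufferedOutputStream` flushes before closing, no explicit `flush()` or `close()` is needed, and the returned file length can be logged alongside `cas.size()` in place of the truncated log line.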
