Is CAS.size() larger or smaller than the serialized version? What are you actually doing to the CAS? Are you just serializing/deserializing it a couple of times in a row, or do you actually add feature structures? The sample code you show doesn't give any hint about where the CAS comes from or what is being done with it.
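
To compare the two sizes, note that the log statement in the sample prints cas.size(), not the number of bytes that were written. A minimal sketch of one way to measure the written size follows, using only the JDK. CountingOutputStream and SerializedSizeCheck are hypothetical names, not part of UIMA, and the byte array stands in for whatever the serializer writes:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper: counts the bytes passed through to the wrapped stream. */
class CountingOutputStream extends FilterOutputStream {
    private long count = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Forward the whole chunk at once instead of byte by byte.
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

class SerializedSizeCheck {
    public static void main(String[] args) throws IOException {
        // Stand-in sink; in the pipeline this would be the FileOutputStream.
        try (CountingOutputStream cos =
                 new CountingOutputStream(new ByteArrayOutputStream())) {
            // Assumption: in the pipeline, the call
            // Serialization.serializeWithCompression(cas, cos) would go here.
            cos.write(new byte[] {1, 2, 3, 4});
            System.out.println("serialized bytes: " + cos.getCount());
        }
    }
}
```

Logging cos.getCount() next to cas.size() for each document would show directly whether the serialized form grows while the CAS itself stays stable.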
-- Richard

> On 12.01.2016, at 03:06, D. Heinze <dhei...@gnoetics.com> wrote:
>
> I'm having a problem with CAS serializeWithCompression. I am processing
> a few million text documents on an IBM P8 with 16 physical SMTP 8 cpus, 200GB
> RAM, Ubuntu 14.04.3 LTS and IBM Java 1.8.
>
> I run 55 UIMA pipelines concurrently. I'm using UIMA 2.6.0.
>
> I use serializeWithCompression to save the final state of the processing on
> each document to a file for later processing.
>
> However, the size of the serialized CAS just keeps growing. The size of the
> CAS is stable, but the serialized CASes just keep getting bigger. I even
> went to creating a new CAS for each process instead of using cas.reset(). I
> have also tried writing the serialized CAS to a byte array output stream
> first and then to a file, but it is the serializeWithCompression that causes
> the size problem, not writing the file.
>
> Here's what the code looks like. Flushing or not flushing does not make a
> difference. Closing or not closing the file output stream does not make a
> difference (other than leaking memory). I've also tried doing
> serializeWithCompression with type filtering. Wanted to try using a Marker,
> but cannot see how to do that. The problem exists regardless of doing 1 or
> 55 pipelines concurrently.
>
>     File fout = new File(documentPath);
>     fos = new FileOutputStream(fout);
>     org.apache.uima.cas.impl.Serialization.serializeWithCompression(
>         cas, fos);
>     fos.flush();
>     fos.close();
>     logger.info( "serializedCas size " + cas.size() + " ToFile " +
>         documentPath);
>
> Suggestions will be appreciated.
>
> Thanks / Dan