>> The extracted text produced by the new .NET version (build from revision
>> 979772) is very slightly inferior to that of 1.2.1.
>
> Hmm, are you using the 1.2.1 version + the trunk version of the build.xml
> or do you use the everything from the trunk including the last changes I
> made for improving text extraction?

a. Complete trunk revision 979772, built with the build.NET target
b. 1.2.1 binary download, converted with ikvmc

(i.e., no mix between different revisions)

> Can you be more specific about the differences?

There are some useful line breaks in the older version. An example:

BMC Medical Genetic 2010, 11:55References

vs.

BMC Medical Genetic 2010, 11:55
References

>> The extracted text produced by Java and .NET (both 1.2.1) slightly differs.
>
> That's strange, perhaps an encoding issue ...

Maybe, though I explicitly set the output to UTF-8 in both Java and .NET.
I just compared the output again, and it looks like it has something to do
with line breaks.
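In case it helps to reproduce the comparison, here is a minimal sketch of one way to check whether two extracted texts differ only in their line breaks. The class and method names (LineBreakDiff, stripLineBreaks, differOnlyInLineBreaks) are made up for illustration and are not part of PDFBox; the sample strings come from the example above, with the position of the line break assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineBreakDiff {

    /** Removes CR and LF characters and leaves everything else untouched. */
    public static String stripLineBreaks(String text) {
        return text.replace("\r", "").replace("\n", "");
    }

    /** True if the texts are not identical but become identical once line breaks are removed. */
    public static boolean differOnlyInLineBreaks(String a, String b) {
        return !a.equals(b) && stripLineBreaks(a).equals(stripLineBreaks(b));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Compare two extracted text files, both read as UTF-8 (requires Java 11+).
            String first = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
            String second = Files.readString(Path.of(args[1]), StandardCharsets.UTF_8);
            System.out.println(differOnlyInLineBreaks(first, second));
            return;
        }
        // Fallback: the example from this message (line break position assumed).
        String older = "BMC Medical Genetic 2010, 11:55\nReferences";
        String newer = "BMC Medical Genetic 2010, 11:55References";
        System.out.println(differOnlyInLineBreaks(older, newer));
    }
}
```

If this reports true for a pair of outputs, an encoding problem is unlikely to be the cause, since every character other than CR and LF matches.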

