>> The extracted text produced by the new .NET version (built from revision
>> 979772) is very slightly inferior to that of 1.2.1.
>
> Hmm, are you using the 1.2.1 version + the trunk version of the build.xml,
> or are you using everything from the trunk, including the latest changes I
> made to improve text extraction?

a. Complete trunk revision 979772 built with build.NET target
b. 1.2.1 binary download converted with ikvmc

(i.e., no mix of different revisions)

> Can you be more specific about the differences?

There are some useful line breaks in the older version. An example:

BMC Medical Genetic  2010, 11:55References

vs.

BMC Medical Genetic  2010, 11:55
References


>> The extracted text produced by Java and .NET (both 1.2.1) slightly
>> differs.
>
> That's strange, perhaps an encoding issue ...

Maybe, though I explicitly set the output to UTF-8 in both Java and .NET. I
just compared the output again, and the difference seems to be related to
line breaks.
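
One way to check this would be to compare the two outputs after removing
every line break; if they match then, line breaks are the only difference.
A minimal sketch in Java, assuming both outputs were saved as UTF-8 files
(the class and method names are made up for illustration and are not part
of PDFBox):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: checks whether two extracted
// texts are identical once all line breaks are removed.
class LineBreakDiff {

    // \R matches any line break sequence (\n, \r\n, \r, ...) in Java 8+.
    private static final Pattern LINE_BREAK = Pattern.compile("\\R");

    // Returns the text with every line break removed.
    static String stripLineBreaks(String text) {
        return LINE_BREAK.matcher(text).replaceAll("");
    }

    // True if the two texts are equal when line breaks are ignored.
    static boolean sameIgnoringLineBreaks(String a, String b) {
        return stripLineBreaks(a).equals(stripLineBreaks(b));
    }

    // Reads both files as UTF-8 and compares them ignoring line breaks.
    static boolean compareFiles(Path first, Path second) throws IOException {
        String a = new String(Files.readAllBytes(first), StandardCharsets.UTF_8);
        String b = new String(Files.readAllBytes(second), StandardCharsets.UTF_8);
        return sameIgnoringLineBreaks(a, b);
    }
}
```

If compareFiles returns true for the Java and .NET outputs, encoding can be
ruled out. Note that this only removes line breaks; a line break in one
output that shows up as a space in the other would still count as a
difference.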
