https://bugs.freedesktop.org/show_bug.cgi?id=70423

          Priority: medium
            Bug ID: 70423
          Assignee: libreoffice-bugs@lists.freedesktop.org
           Summary: FILEOPEN: Unexpected Addition Of Windows Line Breaks
                    to Linux Text File
          Severity: normal
    Classification: Unclassified
                OS: Windows (All)
          Reporter: john876...@gmail.com
          Hardware: Other
        Whiteboard: BSA
            Status: UNCONFIRMED
           Version: 4.1.2.3 rc
         Component: Writer
           Product: LibreOffice

Problem description: 

When loading a large Linux text file (where end-of-line characters are
represented in hexadecimal by 0a) into the Windows version of Writer, Writer
spontaneously adds a Windows end-of-line sequence (represented in
hexadecimal by 0d 0a) approximately every 9900 characters in the text.  It is
"approximately" every 9900 characters because Writer appears to deliberately
place the end-of-line sequence at the first white space after the 9900
character mark.  In the file I was working with, the first extra sequence
appeared at character 9904, then again 9905 characters later, then again 9901,
9900, 9905, 9900, and 9903 characters later.  (I stopped counting at this
point.)  Re-saving the file as a text file writes these extra bytes into the
saved text file, and a binary compare will reveal them.
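
The spacing described above can be checked without counting by hand.  The
following is a minimal Python sketch, not part of the original report; the
function names crlf_offsets and gaps are made up for illustration.  It assumes
the original file contained no 0d 0a pairs, so every pair found in the
re-saved file is one that Writer added.

```python
def crlf_offsets(data: bytes) -> list[int]:
    """Return the byte offset of every 0d 0a pair found in data."""
    offsets = []
    pos = data.find(b"\r\n")
    while pos != -1:
        offsets.append(pos)
        # Continue searching just past the pair that was found.
        pos = data.find(b"\r\n", pos + 2)
    return offsets


def gaps(offsets: list[int]) -> list[int]:
    """Return the distance in bytes between consecutive offsets."""
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]
```

Reading the re-saved file in binary mode (open(path, "rb")) and passing its
contents to crlf_offsets should list the positions of the extra sequences.
Note that gaps measures from one 0d 0a pair to the next, so each value includes
the two bytes of the preceding pair.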

This means that lines of text are broken in the middle of sentences.
 There is no conversion of any kind between the Linux end-of-line and Windows
end-of-line characters.  Extra characters are merely added.

If I convert all Linux end-of-line characters into Windows end-of-line
characters BEFORE loading the text file into Writer, Writer does not appear to
alter the file unexpectedly.
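
For anyone who wants to try that workaround, here is a minimal Python sketch of
the conversion.  It is not the tool I used; the function name lf_to_crlf is
made up for illustration.  It converts every bare 0a into 0d 0a and leaves any
existing 0d 0a pairs alone.

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    """Convert bare LF (0a) to CRLF (0d 0a) without doubling existing CRLF."""
    # The negative lookbehind skips any 0a that is already preceded by 0d.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)
```

The file should be read and written in binary mode so that Python does not
translate the line endings itself.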

Steps to reproduce:
1. Get a sufficiently large text file with no Windows end-of-line characters
(hexadecimal representation 0d 0a).  A few hundred kilobytes should do,
although I first noticed it on an 81 MB file.
2. Copy the text file so you have an original "Linux copy" and a "Windows copy"
that you can play with in Writer.  (Eventually, you will perform a binary
compare of the two.)
3. Load the Windows copy of the file into the Windows version of Writer.  **The
changes are visible at this point if you know where to look in the text file.**
 I will continue on so that you can easily see where the changes are made. 
(Side Note: I used the portable version of Writer.  I did not test the Linux
version.)
4. Re-save the file.  You'll probably have to "alter the file" by deleting one
character and then typing that exact character back in.  (Do not use "undo".) 
Save the file as a text file.
5. Do a binary compare between the Linux copy and the Windows copy to find the
exact places where Writer has altered the file.  Even a text compare should
reveal the problem spots.  I used Frhed for looking at the file in binary and
WinMerge for my file compares.  Frhed helped me figure out that the error was
occurring approximately every 9900 characters / bytes.
6. Note how the extra characters occur at the white spaces and not necessarily
next to the Linux end-of-line characters.
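
The compare in step 5 can also be scripted.  Below is a minimal Python sketch
as an alternative to Frhed and WinMerge; the function name compare_copies is
made up for illustration.  It assumes the two copies differ only by 0d 0a pairs
that Writer inserted, and it reports whether that assumption actually holds.

```python
def compare_copies(original: bytes, saved: bytes) -> tuple[int, bool]:
    """Compare the Linux copy with the copy re-saved by Writer.

    Returns (extra_pairs, only_insertions):
      extra_pairs      how many more 0d 0a pairs the saved copy contains
      only_insertions  True if removing every 0d 0a pair from both copies
                       makes them identical, i.e. nothing else was changed
    """
    extra_pairs = saved.count(b"\r\n") - original.count(b"\r\n")
    only_insertions = (
        saved.replace(b"\r\n", b"") == original.replace(b"\r\n", b"")
    )
    return extra_pairs, only_insertions
```

If only_insertions comes back False, something other than pure insertion
happened (for example, a space being replaced rather than kept), and a
byte-level tool is still needed to see exactly what changed.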

Current behavior:

Extra end-of-line characters are added to the file next to white spaces
approximately every 9900 characters / bytes.  This can even be seen after the
Linux text file is loaded for the first time but before the file is re-saved.

Expected behavior:

No addition of end-of-line characters, even in a file as large as 81 MB.  One
possible option is to convert all Linux end-of-line characters into Windows
end-of-line characters.  This would require an option so the user can decide
how to output end-of-line characters during the save process.

Special Note 1: This bug was reproduced not only with the 81 MB file, but a 250
kB file as well.
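
If no suitable file is at hand, a test file of about that size can be
generated.  This is a minimal Python sketch; the function name make_lf_text,
the word list, and the line length are arbitrary choices for illustration.  I
have not verified that synthetic text like this triggers the bug in the same
way as the files I used.

```python
def make_lf_text(size_bytes: int, words_per_line: int = 12) -> bytes:
    """Build at least size_bytes of text that uses only 0a line endings."""
    words = [b"alpha", b"bravo", b"charlie", b"delta", b"echo"]
    lines = []
    total = 0
    line_no = 0
    while total < size_bytes:
        # Rotate the starting word so that consecutive lines differ.
        line = b" ".join(
            words[(line_no + k) % len(words)] for k in range(words_per_line)
        ) + b"\n"
        lines.append(line)
        total += len(line)
        line_no += 1
    return b"".join(lines)
```

Writing make_lf_text(250_000) to disk in binary mode (open(path, "wb")) gives
a file of roughly 250 kB with no 0d bytes in it.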

Special Note 2: Although unnecessary to reproduce the bug, the 81 MB file I was
using came from here: http://www.imdb.com/interfaces .  Click one of the FTP
sites under Plain Text Data Files.  I used the trivia.list file.

Operating System: Windows 8
Version: 4.1.2.3 rc

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
