Re: [patch] Respect file's EOL/NOEOL settings

Roland Eggner Mon, 06 Jul 2015 17:29:14 -0700

Hi Ben

thank you for your detailed reply and friendly advice.  Sorry if my message
was misunderstood:  my intent was to add another perspective, which might help
Pavel and Bram save working time or extend the scope of solved problems.

On 2015-06-24 Wednesday at 07:49 -0700 Ben Fritz wrote:
> On Wednesday, June 24, 2015 at 5:12:01 AM UTC-5, Roland Eggner wrote:
> > There are more cases where vim tries to fix “errors” or “irregularities”
> > of files and thereby damages them, unless “++binary” has been included in
> > the reading command.  Just two examples:
> > 
> > (1)  viminfo files with register contents resulting from alternating
> >      fileencodings, e.g. utf-8, latin1, latin9:  When viewing or editing
> >      such viminfo files, including “++binary” in the reading command
> >      avoids data damage.
> 
> First of all, .viminfo files are not really intended to be hand-edited, 
> although the format is simple enough that it's certainly possible most of the 
> time.

Maybe I am a “villain” for calling vim via a wrapper script which edits
.viminfo files.  This gives me a buffer list which keeps entries until the
referenced files disappear from the filesystem or their atime is older than
3 weeks.  And it decreases the frequency of losing text marks `a - `z by
deletion of jump lists.
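
The pruning rule can be sketched as follows.  This is only an illustration of
the rule described above, not the actual wrapper script:  the names
keep_entry() and prune() are hypothetical, and parsing of the .viminfo file
itself is left out.

```python
import os
import time

THREE_WEEKS = 21 * 24 * 60 * 60  # maximum age in seconds


def keep_entry(path, now=None, max_age=THREE_WEEKS):
    """Return True if a buffer-list entry for `path` should be kept."""
    now = time.time() if now is None else now
    try:
        atime = os.stat(path).st_atime
    except OSError:
        return False  # the file disappeared from the filesystem
    return now - atime <= max_age


def prune(paths, now=None):
    """Keep only entries whose file exists and was accessed recently."""
    return [p for p in paths if keep_entry(p, now)]
```

Note that the result depends on how the filesystem maintains atime (mount
options such as noatime change what "recently accessed" means).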

> However, you're expecting something fairly unreasonable.  Vim has no way of 
> marking different regions of a file as having different encodings.  In fact I 
> am not aware of any text editors that DO allow this.  How does the editor 
> know what encoding to apply to any new text?  How are the regions delimited, 
> especially if the delimiter could have different representations in different 
> encodings?  In the case that you have multiple encodings in a file, the file 
> really and truly *IS* a binary file.

That is exactly the conclusion I intended readers to draw.

Even if line-specific encodings could be implemented properly, they would be of
little use in the case of my example (2):  the patch files would appear
redundant.

Bram added this probably related entry to the todo list more than 5 years ago:
> When a register contains illegal bytes, writing viminfo in utf-8 and reading
> it back doesn't result in utf-8. (Devin Bayer)

The last sentence of “:help viminfo-encoding” appears to hint that different
encodings for different lines might be intentional under certain
circumstances:
> …
>         :set viminfo+=c
> Vim will then attempt to convert the text in the viminfo file from the
> 'encoding' value it was written with to the current 'encoding' value.
> This requires Vim to be compiled with the |+iconv| feature.  Filenames
> are not converted.

> But why do you have multiple encodings in the file?  The encoding of text in 
> the _viminfo file should only depend on the 'encoding' option of Vim, it 
> should not depend on the fileencoding option of the various files.  Are you 
> setting 'encoding' differently as you open files in different files?  You 
> should not be doing that...you should keep 'encoding' set to utf-8 and change 
> 'fileencoding' as needed.

Yes, in theory it should, but practice differs, even though I have been doing
exactly what you recommend for many years.  Detailed bug reports on this topic
must wait until I find more spare time.

> > How can I specify the binary attribute, when vim tries to restore these 
> >      register contents in a later session?  vim-7.4 appears to ignore the 
> >      line “*encoding=utf-8” on reading of viminfo files.
> 
> Vim does not, by default, detect any encoding from any text in the file, 
> except for reading a BOM to detect certain Unicode encodings.

If you distinguish the  _guessing_  of encodings performed in 
src/fileio.c:readfile() from  _detection_  of declared encodings:  fully agreed.

> For that, you need a plugin.  I am fond of Autofenc:  
> http://www.vim.org/scripts/script.php?script_id=2721

Thank you for this reference.  I will check it the next time I update my vim
installations (not in the near future).

> > (2)  The patch file resulting from the diff between old and new files after
> >      a command similar to “iconv -f ISO-8859-1 -t utf-8 …” usually needs to
> >      be treated as binary.  vim damages such patch files on writing, unless
> >      the reading command includes “++binary”.
> > 
> 
> See above.  Such a file *is* a binary file, it cannot be anything else.

That is exactly the conclusion I intended readers to draw.
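
To make example (2) concrete, here is a minimal sketch in Python.  It builds
a unified diff between a latin1 line and the same line in utf-8 and checks
whether the result is valid in a single encoding.  The file names old.txt and
new.txt and the helper decodes_as() are made up for illustration, and the
last step only simulates a "decode as latin1, write as utf-8" round trip;  it
is not a claim about what vim does internally.

```python
import difflib

line = "Grüße\n"
old = [line.encode("latin-1")]  # the line before the iconv run
new = [line.encode("utf-8")]    # the same line after the iconv run

# diff_bytes() compares raw bytes without assuming any encoding.
patch = b"".join(
    difflib.diff_bytes(difflib.unified_diff, old, new, b"old.txt", b"new.txt")
)


def decodes_as(data, encoding):
    """Return True if `data` decodes cleanly in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The "-" line is latin1 and the "+" line is utf-8, so the patch as a whole
# is not valid utf-8.
print(decodes_as(patch, "utf-8"))   # False

# Treating the whole patch as latin1 and writing it out as utf-8 changes
# the bytes of both lines, so the patch no longer matches either file.
reencoded = patch.decode("latin-1").encode("utf-8")
print(reencoded == patch)           # False
```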

> If you forget to read it with ++binary in the first place, you can always ":e 
> ++binary" after loading.
> 
> > A concept which can be reused  _consistently_  for the solution of all
> > problems of this class probably can save a lot of future work time.
> > 
> 
> I disagree that these problems are in the same class at all, …

Two points of disagreement (here and in the last-but-one paragraph below),
alongside agreement on every other point:  maybe we can live with that?

> … but the option for preventing such issues already exists: ":e ++binary".  
> The binary option tells Vim not to mess with any of the bytes in the file.  
> That's what you want, right?  Is something wrong with this option?

The option is fine for me.  But vim should use it  _automatically_  whenever
an invalid utf-8 sequence occurs in a file being read.  This would protect the
user from data loss much better than the heuristic “if not valid utf-8 then
latin1” resulting from the default value of option “fileencodings”.
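
The behaviour I have in mind can be sketched in a few lines of Python.  The
function name choose_read_mode() is hypothetical and this is not how
readfile() is implemented;  it only shows the decision rule "valid utf-8, or
else binary" instead of "valid utf-8, or else latin1".

```python
def choose_read_mode(data: bytes):
    """Return ("utf-8", None) if `data` is valid utf-8, otherwise
    ("binary", offset) with the offset of the first invalid byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Do not guess another encoding; keep the bytes untouched.
        return ("binary", err.start)
    return ("utf-8", None)


print(choose_read_mode("Grüße\n".encode("utf-8")))  # ('utf-8', None)
print(choose_read_mode(b"Gr\xfc\xdfe\n"))           # ('binary', 2)
```

Reporting the offset would also let the editor tell the user where the first
offending byte is.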

Conversion from any commonly used multibyte encoding to any extended latin
encoding is always a partial data loss,  _even_  if all used characters
actually have codepoints in the latin encoding.  For this reason e.g. the
“recode” utility in such cases warns “ambiguous output” and refuses to perform
the conversion, unless option “--force” is used.  Similarly, and in line with
its many other efforts to protect users from data loss, vim should abort and
give the following warning when the execution of a command would involve the
conversion from a multibyte encoding to an extended latin encoding:
  “Encoding conversion from … to latinX would cause data loss.  Aborted.
  If you are absolutely sure, add option "I-want-to-lose-some-data" and retry.”
When I was a less experienced user many years ago, this would have helped me
much more than the mere mention of data loss in “:help 'fileencodings'”.
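
A sketch of the proposed guard, again in Python.  The function convert(), its
"force" parameter and the two encoding sets are assumptions made for this
illustration;  they do not correspond to any existing vim option or to the
way “recode” is implemented.

```python
MULTIBYTE = {"utf-8", "utf-16", "utf-32"}
EXTENDED_LATIN = {"latin-1", "iso8859-15"}


def convert(data: bytes, src: str, dst: str, force: bool = False) -> bytes:
    """Convert `data` from `src` to `dst`, refusing multibyte -> latin
    conversions unless the caller explicitly forces them."""
    if src in MULTIBYTE and dst in EXTENDED_LATIN and not force:
        raise ValueError(
            f"Encoding conversion from {src} to {dst} "
            "would cause data loss.  Aborted."
        )
    # With force, unrepresentable characters are replaced by "?".
    return data.decode(src).encode(dst, errors="replace")


price = "10 €".encode("utf-8")
print(convert(price, "utf-8", "latin-1", force=True))     # b'10 ?'
print(convert(price, "utf-8", "iso8859-15", force=True))  # b'10 \xa4'
```

The forced latin1 result shows the most visible kind of loss:  the Euro sign
has no codepoint in latin1 and is silently replaced, whereas latin9
(ISO-8859-15) can still represent it.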

> > Transparent decompression and compression of *.gz *.bz2 … files is
> > implemented with certain autocommands.  These autocommands require far
> > fewer lines of code than the patch proposed by Pavel.  Why not solve the
> > “missing trailing EOL” problem with similar autocommands?  Fewer lines of
> > code means less time to wait until Bram can merge this and other patches.
> > 
> 
> The autocmds to prevent messing with the EOL are here:  
> http://vim.wikia.com/wiki/Preserve_missing_end-of-line_at_end_of_text_files
> 
> > Are the additional lines of code in Pavel's patch outweighed by better
> > reusability, compared to autocommands?
> > 
> 
> ":set respecteol" is much easier for the end user than installing a plugin or 
> writing a few dozen lines of vimscript to work around the editor's lack of an 
> option to turn off an undesired behavior.

Working with *.csproj files is a topic for software engineers, not for “end
users”.  The missing EOL problem is and will remain rare, because rarity is
the precondition to benefit from breaking a standard.  Bacteria learned this
several billion years ago.  Microsoft apparently just reuses this knowledge
for its business policy.

>  … And probably less fragile as well, since it will be baked into the editor 
> code.  And autocmds can be bypassed or interfere with each other.  And see 
> the discussion on that link about weird behavior with respect to undo for the 
> autocmds...that's not very comforting that we never figured that out.

“… less fragile …” seems to me a commonplace rather than an argument:  Driving
faster  _always_  increases risk and requires additional attention.  You can
use seat belts or go on foot, as you prefer.  In software engineering it is
much the same.  And the undo discussion does not matter:  after an
_automatically_  added EOL there is no desire for an undo.
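
For reference, the behaviour this thread is about (keeping a file's EOL/NOEOL
state across an edit) is small when written down.  The following Python
sketch is neither Pavel's patch nor the wiki autocommands;  rewrite_lines()
is a hypothetical helper which handles "\n" line ends only.

```python
def rewrite_lines(data: bytes, edit) -> bytes:
    """Apply `edit` to every line of `data` and write the result with the
    same trailing-EOL state the original had."""
    had_eol = data.endswith(b"\n")
    lines = data.split(b"\n")
    if had_eol:
        lines.pop()  # drop the empty element after the final "\n"
    out = b"\n".join(edit(line) for line in lines)
    return out + b"\n" if had_eol else out


print(rewrite_lines(b"a\nb", bytes.upper))    # b'A\nB'
print(rewrite_lines(b"a\nb\n", bytes.upper))  # b'A\nB\n'
```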

If he does not like autocommands for this task, maybe the second alternative
I proposed in my reply to Pavel in this thread can give him a satisfying
solution with less long-term effort.

-- 
Best regards,
Roland Eggner
