Hi all, I've been tasked with converting the repository of one of our main products from Mercurial to Git. I've already done some test conversions using fast-export (https://github.com/frej/fast-export), and the results look very good.
After running the conversion, I'll also use the BFG Repo Cleaner (https://rtyley.github.io/bfg-repo-cleaner/) to remove some big binary files that should never have been committed. I've tested that as well, and it seems to work as intended.

The conversion should also change the encoding of all source code (Java) files from ISO-8859-1/15 to UTF-8, and this is where I need advice. Right now, I have a script that guesses the encoding of each file using `file --mime-encoding $file` and converts it from the guessed encoding to UTF-8 using `iconv`. I'd run it on every active branch after the general hg-to-git conversion and commit the results individually.

The "on every active branch" part is the problem. It's time-consuming manual work with plenty of chances to shoot yourself in the foot. Also, developers might reintroduce encoding problems when switching between converted and non-converted branches, because the IDE sets the default encoding per project (root directory), not per branch...

Long story short: can I do the ISO-8859 to UTF-8 conversion as part of the hg-to-git conversion, so that the result looks as if the project had used UTF-8 from the very beginning?

Sadly, not every Java file has always used ISO-8859. Some files have been switched between ISO-8859 and UTF-8 several times due to broken or missing editor configurations. This introduced encoding errors. For example, some files contain the Unicode Replacement Character (U+FFFD), or whatever you get when such a file is saved again as ISO-8859. Since these errors are only in comments and string literals, they usually went unnoticed at first because the project still compiled...

Bonus question: I guess I can somehow configure our Git server (we use a GitLab instance, if that matters) to reject pushes containing non-UTF-8 changes to Java files. How would I do that?
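For illustration, here is a minimal Python sketch of the per-file step. It assumes that any file which does not decode as UTF-8 is ISO-8859-15, so a strict UTF-8 decode attempt replaces the `file --mime-encoding` guess. The helper `to_utf8` could be called by the per-branch script. Alternatively, it could be applied once per blob during the export, so that every historical revision ends up in UTF-8. That second option assumes fast-export's plugin mechanism (see its README) can rewrite file contents; I haven't verified its exact interface. `refs_to_check` sketches the first half of a server-side pre-receive hook, assuming the GitLab instance allows custom server hooks. It parses the `<old> <new> <refname>` lines Git passes on stdin and skips deletions and a hypothetical set of legacy branches. All names here (`to_utf8`, `is_utf8`, `refs_to_check`, `FALLBACK_ENCODING`) are made up for the example.

```python
import sys

# Assumption: any file that is not valid UTF-8 is ISO-8859-15. A strict
# decode attempt stands in for the `file --mime-encoding` guess.
FALLBACK_ENCODING = "iso-8859-15"


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (pure ASCII does too)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes) -> bytes:
    """Re-encode data as UTF-8; input that is already valid UTF-8 is returned as-is."""
    if is_utf8(data):
        return data
    # Python's iso-8859-15 codec maps every byte value, so this decode cannot fail.
    return data.decode(FALLBACK_ENCODING).encode("utf-8")


def refs_to_check(hook_input: str, legacy_branches: set) -> list:
    """Parse pre-receive stdin ("<old> <new> <refname>" per line) and return the
    (old, new, refname) updates that should be validated. Deleted refs and
    branches in the (hypothetical) legacy set are skipped."""
    updates = []
    prefix = "refs/heads/"
    for line in hook_input.splitlines():
        if not line.strip():
            continue
        old, new, refname = line.split()
        if set(new) == {"0"}:
            continue  # an all-zero new value means the ref is being deleted
        if refname.startswith(prefix) and refname[len(prefix):] in legacy_branches:
            continue  # legacy branch: not subject to the UTF-8 rule
        updates.append((old, new, refname))
    return updates


if __name__ == "__main__":
    # Usage (hypothetical script name): python to_utf8.py src/Foo.java src/Bar.java
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            original = f.read()
        converted = to_utf8(original)
        if converted != original:
            with open(path, "wb") as f:
                f.write(converted)
```

The sketch does not repair files that already contain U+FFFD or double-encoded text. Those are valid UTF-8 and pass through unchanged. In an actual hook, each update returned by `refs_to_check` would still need its pushed `.java` blobs read (e.g. with `git cat-file`) and checked with `is_utf8`. That part is left out here, including the handling of a new branch, whose old value is all zeros.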
And in case I need to do the "convert each active branch to UTF-8 using some extra commit" approach, is there a way to exclude a list of legacy branches from this rule?

Thanks a lot in advance,
Tassilo