Re: Import/Export as a fast way to purge files from Git?

2018-10-31 Thread Lars Schneider
> On Sep 24, 2018, at 7:24 PM, Elijah Newren wrote: > > On Sun, Sep 23, 2018 at 6:08 AM Lars Schneider > wrote: >> >> Hi, >> >> I recently had to purge files from large Git repos (many files, many >> commits). >> The usual recommenda

Re: Import/Export as a fast way to purge files from Git?

2018-09-23 Thread Lars Schneider
> On Sep 23, 2018, at 4:55 PM, Eric Sunshine wrote: > > On Sun, Sep 23, 2018 at 9:05 AM Lars Schneider > wrote: >> I recently had to purge files from large Git repos (many files, many >> commits). >> The usual recommendation is to use `git filter-branc

Import/Export as a fast way to purge files from Git?

2018-09-23 Thread Lars Schneider
Hi, I recently had to purge files from large Git repos (many files, many commits). The usual recommendation is to use `git filter-branch --index-filter` to purge files. However, this is *very* slow for large repos (e.g. it takes 45min to remove the `builtin` directory from git core). I realized

Re: Find commit that referenced a blob first

2018-07-19 Thread Lars Schneider
> On Jul 19, 2018, at 11:19 PM, Stefan Beller wrote: > > On Thu, Jul 19, 2018 at 2:02 PM Lars Schneider > wrote: >> >> Hi, >> >> I have a blob hash and I would like to know what commit referenced >> this blob first in a given Git repo. > >

Find commit that referenced a blob first

2018-07-19 Thread Lars Schneider
Hi, I have a blob hash and I would like to know what commit referenced this blob first in a given Git repo. I could iterate through all commits sorted by date (or generation number) and then recursively search in the referenced trees until I find my blob. I wonder, is this the most efficient

Re: [PATCH v1 2/2] convert: add alias support for 'working-tree-encoding' attributes

2018-07-08 Thread Lars Schneider
> On Jul 8, 2018, at 8:30 PM, larsxschnei...@gmail.com wrote: > > From: Lars Schneider > > In 107642fe26 ("convert: add 'working-tree-encoding' attribute", > 2018-04-15) we added an attribute which defines the working tree > encoding of a file. > >

Re: [PATCH v1 1/2] convert: refactor conversion driver config parsing

2018-07-08 Thread Lars Schneider
> On Jul 8, 2018, at 8:30 PM, larsxschnei...@gmail.com wrote: > > From: Lars Schneider > > Refactor conversion driver config parsing to ease the parsing of new > configs in a subsequent patch. > > No functional change intended. > > Signed-off-by: Lars Sch

Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-02 Thread Lars Schneider
> -----Lars Schneider wrote: - > To: Jeff King > From: Lars Schneider > Date: 06/28/2018 18:21 > Cc: "brian m. carlson" , Steve Groeger > , git@vger.kernel.org > Subject: Re: Use of new .gitattributes working-tree-encoding attribute across > different

Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-28 Thread Lars Schneider
> On Jun 28, 2018, at 4:34 PM, Jeff King wrote: > > On Thu, Jun 28, 2018 at 02:44:47AM +, brian m. carlson wrote: > >> On Wed, Jun 27, 2018 at 07:54:52AM +, Steve Groeger wrote: >>> We have common code that is supposed to be usable across different >>> platforms and hence different

Re: [RFC PATCH v1] http: add http.keepRejectedCredentials config

2018-06-07 Thread Lars Schneider
> On 04 Jun 2018, at 11:55, Jeff King wrote: > > On Mon, Jun 04, 2018 at 12:18:59PM -0400, Martin-Louis Bright wrote: > >> Why must the credentials be deleted after receiving the 401 (or >> any) error? What's the rationale for this? > > Because Git only tries a single credential per

Re: [ANNOUNCE] Git v2.18.0-rc1

2018-06-04 Thread Lars Schneider
> On 04 Jun 2018, at 06:53, Junio C Hamano wrote: > > A release candidate Git v2.18.0-rc1 is now available for testing > at the usual places. It is comprised of 842 non-merge commits > since v2.17.0, contributed by 65 people, 20 of which are new faces. > > ... > > * The new

[RFC PATCH v1] http: add http.keepRejectedCredentials config

2018-06-04 Thread lars . schneider
From: Lars Schneider If a Git HTTP server responds with 401 or 407, then Git tells the credential helper to reject and delete the credentials. In general this is good. However, in certain automation environments it is not desired to remove credentials automatically. This is in particular

Re: worktrees vs. alternates

2018-05-16 Thread Lars Schneider
> On 16 May 2018, at 11:29, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote: > > > On Wed, May 16 2018, Lars Schneider wrote: > >> I am looking into different options to cache Git repositories on build >> machines. The two most promising ways seem to be git-w

worktrees vs. alternates

2018-05-16 Thread Lars Schneider
Hi, I am looking into different options to cache Git repositories on build machines. The two most promising ways seem to be git-worktree [1] and git-alternates [2]. I wonder if you see an advantage of one over the other? My impression is that git-worktree supersedes git-alternates. Would that

Re: Optimizing writes to unchanged files during merges?

2018-04-17 Thread Lars Schneider
> On 16 Apr 2018, at 19:45, Jacob Keller <jacob.kel...@gmail.com> wrote: > > On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller <jacob.kel...@gmail.com> wrote: >> On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider >> <larsxschnei...@gmail.com> wrote: >>>

Re: Optimizing writes to unchanged files during merges?

2018-04-17 Thread Lars Schneider
> On 16 Apr 2018, at 19:04, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote: > > > On Mon, Apr 16 2018, Lars Schneider wrote: > >>> On 16 Apr 2018, at 04:03, Linus Torvalds <torva...@linux-foundation.org> >>> wrote: >>> >>> On

Re: Optimizing writes to unchanged files during merges?

2018-04-16 Thread Lars Schneider
> On 16 Apr 2018, at 04:03, Linus Torvalds > wrote: > > On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano wrote: >> >> I think Elijah's corrected was_tracked() also does not care "has >> this been renamed". > > I'm perfectly happy with the

[PATCH v13 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are

[PATCH v13 09/10] convert: add tracing for 'working-tree-encoding' attribute

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider <la

[PATCH v13 07/10] convert: add 'working-tree-encoding' attribute

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as mo

[PATCH v13 06/10] utf8: add function to detect a missing UTF-16/32 BOM

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard ins

[PATCH v13 03/10] strbuf: add a case insensitive starts_with()

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Check in a case insensitive manner if one string is a prefix of another string. This function is used in a subsequent commit. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- git-compat-util.h | 1 + strbuf.c | 9
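For readers unfamiliar with the idea, a case-insensitive prefix check can be sketched as below. This is only an illustration, not the code from the patch: the name `prefix_icase_sketch` is hypothetical, and the sketch assumes plain ASCII case folding via `tolower()`.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Hypothetical helper (not Git's implementation): returns 1 if `prefix`
 * is a prefix of `str` when ASCII case is ignored, 0 otherwise.
 */
static int prefix_icase_sketch(const char *str, const char *prefix)
{
	assert(str && prefix);
	for (; *prefix; str++, prefix++) {
		/* A NUL in `str` never matches a non-NUL prefix byte. */
		if (tolower((unsigned char)*str) != tolower((unsigned char)*prefix))
			return 0;
	}
	return 1;
}
```

With this sketch, a lower-case encoding name such as "utf-16le" would match the prefix "UTF-16", while the shorter string "UTF" would not.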

[PATCH v13 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> The function same_encoding() could only recognize alternative names for UTF-8 encodings. Teach it to recognize all kinds of alternative UTF encoding names (e.g. utf16). While we are at it, fix a crash that would occur if same_encoding() was

[PATCH v13 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Sign

[PATCH v13 05/10] utf8: add function to detect prohibited UTF-16/32 BOM

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bo
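The rule described in this commit message can be sketched roughly as follows. The function and helper names are hypothetical, and the sketch assumes the encoding name is already in its canonical upper-case form; it is not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does `data` begin with the given BOM bytes? */
static int starts_with_bytes(const char *data, size_t len,
			     const char *bom, size_t bom_len)
{
	return len >= bom_len && memcmp(data, bom, bom_len) == 0;
}

/*
 * Sketch (not Git's code): returns 1 if the encoding name fixes the
 * byte order (UTF-16BE/LE, UTF-32BE/LE) and the data still starts
 * with a BOM, which is prohibited in that case. Returns 0 otherwise.
 */
static int has_prohibited_bom_sketch(const char *enc, const char *data,
				     size_t len)
{
	assert(enc && data);
	if (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE"))
		return starts_with_bytes(data, len, "\xFE\xFF", 2) ||
		       starts_with_bytes(data, len, "\xFF\xFE", 2);
	if (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE"))
		return starts_with_bytes(data, len, "\x00\x00\xFE\xFF", 4) ||
		       starts_with_bytes(data, len, "\xFF\xFE\x00\x00", 4);
	return 0;
}
```

Plain "UTF-16" or "UTF-32" (no byte order in the name) is deliberately not flagged here, since a BOM is allowed for those names.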

[PATCH v13 00/10] convert: add support for different encodings

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Hi, Patches 1-6,9 are preparation and helper functions. Patch 7,8,10 are the actual change. This series is based on v2.16.0 and Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13). The seri

[PATCH v13 02/10] strbuf: add xstrdup_toupper()

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- strbuf.c | 12 ++
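A minimal sketch of such an upper-casing copy, using only the C standard library, might look like this. The name `dup_toupper_sketch` is hypothetical, and unlike a die-on-failure allocator such as the xmallocz() mentioned elsewhere in this series, the sketch returns NULL when allocation fails.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's implementation): returns a newly
 * allocated upper-case copy of `s`, or NULL if allocation fails.
 * The caller owns the result and must free() it.
 */
static char *dup_toupper_sketch(const char *s)
{
	size_t i, len;
	char *out;

	assert(s);
	len = strlen(s);
	out = malloc(len + 1);
	if (!out)
		return NULL;
	for (i = 0; i < len; i++)
		out[i] = (char)toupper((unsigned char)s[i]);
	out[len] = '\0';
	return out;
}
```

The cast to `unsigned char` before calling `toupper()` avoids undefined behavior for bytes with the high bit set.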

[PATCH v13 08/10] convert: check for detectable errors in UTF encodings

2018-04-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- convert.c| 61

Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-04-15 Thread Lars Schneider
> On 05 Apr 2018, at 18:41, Torsten Bögershausen <tbo...@web.de> wrote: > > On 01.04.18 15:24, Lars Schneider wrote: >>> TRUE or false are values, but just wrong ones. >>> If this test is removed, the user will see "failed to encode "TRUE"

Re: [PATCH v2 1/5] core.aheadbehind: add new config setting

2018-04-03 Thread Lars Schneider
> On 04 Jan 2018, at 20:26, Jeff King wrote: > > On Wed, Dec 27, 2017 at 09:41:30AM -0800, Junio C Hamano wrote: > >> Jeff King writes: >> >>> I, too, had a funny feeling about calling this "core". But I didn't have >>> a better name, as I'm not sure what other

Re: [PATCH v12 00/10] convert: add support for different encodings

2018-04-03 Thread Lars Schneider
> On 02 Apr 2018, at 20:31, Lars Schneider <larsxschnei...@gmail.com> wrote: > > >> On 29 Mar 2018, at 20:37, Junio C Hamano <gits...@pobox.com> wrote: >> >> lars.schnei...@autodesk.com writes: >> >>> From: Lars Schneider <larsxsc

Re: [PATCH v12 00/10] convert: add support for different encodings

2018-04-02 Thread Lars Schneider
> On 29 Mar 2018, at 20:37, Junio C Hamano <gits...@pobox.com> wrote: > > lars.schnei...@autodesk.com writes: > >> From: Lars Schneider <larsxschnei...@gmail.com> >> >> Patches 1-6,9 are preparation and helper functions. Patch 4 is new. >> Patch

Re: [GSoC] [PATCH] travis-ci: added clang static analysis

2018-04-01 Thread Lars Schneider
> On 13 Mar 2018, at 18:45, Siddhartha Mishra <sidm1...@gmail.com> wrote: > > On Mon, Mar 12, 2018 at 3:49 PM, Lars Schneider > <larsxschnei...@gmail.com> wrote: >> Hi, >> >> That looks interesting but I agree with Dscho that we should not limit >>

Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-04-01 Thread Lars Schneider
> On 16 Mar 2018, at 19:19, Eric Sunshine wrote: > > On Fri, Mar 16, 2018 at 1:50 PM, Junio C Hamano wrote: >> Eric Sunshine writes: >>> However, I'm having a tough time imagining cases in which callers >>> would want

Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-04-01 Thread Lars Schneider
> On 18 Mar 2018, at 08:24, Torsten Bögershausen <tbo...@web.de> wrote: > > Some comments inline > > On Fri, Mar 09, 2018 at 06:35:32PM +0100, lars.schnei...@autodesk.com wrote: >> From: Lars Schneider <larsxschnei...@gmail.com> >> >> Git reco

Re: What's cooking in git.git (Mar 2018, #05; Wed, 28)

2018-04-01 Thread Lars Schneider
> On 30 Mar 2018, at 12:32, Lars Schneider <larsxschnei...@gmail.com> wrote: > > >> On 30 Mar 2018, at 11:24, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote: >> >> >> On Wed, Mar 28 2018, Junio C. Hamano wrote: >> >>> * ls/checko

Re: What's cooking in git.git (Mar 2018, #05; Wed, 28)

2018-03-30 Thread Lars Schneider
> On 30 Mar 2018, at 11:24, Ævar Arnfjörð Bjarmason wrote: > > > On Wed, Mar 28 2018, Junio C. Hamano wrote: > >> * ls/checkout-encoding (2018-03-16) 10 commits >> - convert: add round trip check based on 'core.checkRoundtripEncoding' >> - convert: add tracing for

Re: [PATCH v2] travis-ci: enable more warnings on travis linux-gcc job

2018-03-17 Thread Lars Schneider
> On 17 Mar 2018, at 09:01, Duy Nguyen wrote: > > On Fri, Mar 16, 2018 at 10:22 PM, Jeff King wrote: >>> diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh >>> index 3735ce413f..f6f346c468 100755 >>> --- a/ci/run-build-and-tests.sh >>> +++

Re: [PATCH v6 00/14] Serialized Git Commit Graph

2018-03-16 Thread Lars Schneider
> On 14 Mar 2018, at 21:43, Junio C Hamano wrote: > > Derrick Stolee writes: > >> This v6 includes feedback around csum-file.c and the rename of hashclose() >> to finalize_hashfile(). These are the first two commits of the series, so >> they could be

Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-03-15 Thread Lars Schneider
kinds of alternative UTF encoding >> names. >> >> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> >> --- >> diff --git a/utf8.c b/utf8.c >> @@ -401,11 +401,27 @@ void strbuf_utf8_replace(struct strbuf *sb_src, int >>

Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Lars Schneider
> On 15 Mar 2018, at 20:18, Lars Schneider <larsxschnei...@gmail.com> wrote: > > >> On 15 Mar 2018, at 02:34, Junio C Hamano <gits...@pobox.com> wrote: >> >> ... >> >> * ls/checkout-encoding (2018-03-09) 10 commits >> - convert: a

[PATCH v12 08/10] convert: check for detectable errors in UTF encodings

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- convert.c| 61

[PATCH v12 09/10] convert: add tracing for 'working-tree-encoding' attribute

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider <la

[PATCH v12 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are

[PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> The function same_encoding() checked only for alternative UTF-8 encoding names. Teach it to check for all kinds of alternative UTF encoding names. This function is used in a subsequent commit. Signed-off-by: Lars Schneider <la

[PATCH v12 07/10] convert: add 'working-tree-encoding' attribute

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as mo

[PATCH v12 06/10] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard ins

[PATCH v12 02/10] strbuf: add xstrdup_toupper()

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- strbuf.c | 12 ++

[PATCH v12 00/10] convert: add support for different encodings

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Hi, Patches 1-6,9 are preparation and helper functions. Patch 4 is new. Patch 7,8,10 are the actual change. This series depends on Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is a

[PATCH v12 05/10] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bo

[PATCH v12 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Sign

[PATCH v12 03/10] strbuf: add a case insensitive starts_with()

2018-03-15 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Check in a case insensitive manner if one string is a prefix of another string. This function is used in a subsequent commit. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- git-compat-util.h | 1 + strbuf.c | 9

Re: [PATCH v11 08/10] convert: advise canonical UTF encoding names

2018-03-15 Thread Lars Schneider
> On 09 Mar 2018, at 20:11, Junio C Hamano <gits...@pobox.com> wrote: > > lars.schnei...@autodesk.com writes: > >> From: Lars Schneider <larsxschnei...@gmail.com> >> >> The canonical name of a UTF encoding has the format UTF, dash, number, >>

Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-03-15 Thread Lars Schneider
> On 09 Mar 2018, at 20:10, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +static const char *default_encoding = "UTF-8"; >> + >> ... >> +static const char *git_path_check_encoding(struct attr_check_item *check) >> +{ >> +const char *value =

Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Lars Schneider
> On 15 Mar 2018, at 02:34, Junio C Hamano wrote: > > ... > > * ls/checkout-encoding (2018-03-09) 10 commits > - convert: add round trip check based on 'core.checkRoundtripEncoding' > - convert: add tracing for 'working-tree-encoding' attribute > - convert: advise canonical

Re: How to debug a "git merge"?

2018-03-15 Thread Lars Schneider
> On 14 Mar 2018, at 23:20, Jeff King <p...@peff.net> wrote: > > On Wed, Mar 14, 2018 at 05:56:04PM +0100, Lars Schneider wrote: > >> I am investigating a Git merge (a86dd40fe) in which an older version of >> a file won over the newer version. I try to understa

Re: How to debug a "git merge"?

2018-03-14 Thread Lars Schneider
> On 14 Mar 2018, at 18:02, Derrick Stolee <sto...@gmail.com> wrote: > > On 3/14/2018 12:56 PM, Lars Schneider wrote: >> Hi, >> >> I am investigating a Git merge (a86dd40fe) in which an older version of >> a file won over the newer version. I try to un

How to debug a "git merge"?

2018-03-14 Thread Lars Schneider
Hi, I am investigating a Git merge (a86dd40fe) in which an older version of a file won over the newer version. I try to understand why this is the case. I can reproduce the merge with the following commands: $ git checkout -b test a02fa3303 $ GIT_MERGE_VERBOSITY=5 git merge --verbose c1b82995c

Re: [git-sizer] Implications of a large commit object

2018-03-14 Thread Lars Schneider
> On 14 Mar 2018, at 09:33, Michael Haggerty <mhag...@alum.mit.edu> wrote: > > On Wed, Mar 14, 2018 at 9:14 AM, Lars Schneider > <larsxschnei...@gmail.com> wrote: >> I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*] >> and it

[git-sizer] Implications of a large commit object

2018-03-14 Thread Lars Schneider
Hi, I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*] and it detected a very large commit of 7.33 MiB in my repo (see chart below). This large commit is expected. I've imported that repo from another version control system but excluded all binary files (e.g. images) and some

Re: [GSoC] [PATCH] travis-ci: added clang static analysis

2018-03-12 Thread Lars Schneider
Hi, That looks interesting but I agree with Dscho that we should not limit this to master/maint. I assume you did run this on TravisCI already? Can you share a link? I assume you did find errors? Can we fix them or are there too many? If there are existing errors, how do we define a "successful"

Re: [GSoC][PATCH] git-ci: use pylint to analyze the git-p4 code

2018-03-12 Thread Lars Schneider
Hi Viet, > On 12 Mar 2018, at 03:20, Viet Hung Tran wrote: > > This is my submission as a microproject for the Google Summer of code. > I apologize for not setting the [GSoC] in my previous email > at <20180312020855.7950-1-viethtran1...@gmail.com>. > Please ignore it.

Re: [PATCH v11 07/10] convert: check for detectable errors in UTF encodings

2018-03-09 Thread Lars Schneider
> On 09 Mar 2018, at 20:00, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +const char *advise_msg = _( >> +"The file '%s' contains a byte order " >> +"mark (BOM). Please use %.6s

[PATCH v11 04/10] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bo

[PATCH v11 07/10] convert: check for detectable errors in UTF encodings

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- convert.c| 48 +++

[PATCH v11 09/10] convert: add tracing for 'working-tree-encoding' attribute

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider <la

[PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as mo

[PATCH v11 03/10] strbuf: add a case insensitive starts_with()

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Check in a case insensitive manner if one string is a prefix of another string. This function is used in a subsequent commit. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- git-compat-util.h | 1 + strbuf.c | 9

[PATCH v11 05/10] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard ins

[PATCH v11 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Sign

[PATCH v11 08/10] convert: advise canonical UTF encoding names

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> The canonical name of a UTF encoding has the format UTF, dash, number, and an optional byte order in upper case (e.g. UTF-8 or UTF-16BE). Some iconv versions support alternative names without a dash or with lower case characters. To

[PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are

[PATCH v11 02/10] strbuf: add xstrdup_toupper()

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- strbuf.c | 12 ++

[PATCH v11 00/10] convert: add support for different encodings

2018-03-09 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Hi, Patches 1-5,9 are preparation and helper functions. Patch 6-8,10 are the actual change. Patch 8 is new. This series depends on Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is a

Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-09 Thread Lars Schneider
> On 07 Mar 2018, at 19:04, Eric Sunshine <sunsh...@sunshineco.com> wrote: > > On Wed, Mar 7, 2018 at 12:30 PM, <lars.schnei...@autodesk.com> wrote: >> Check that new content is valid with respect to the user defined >> 'working-tree-encoding' attribute. >

Re: [PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-09 Thread Lars Schneider
> On 09 Mar 2018, at 00:12, Junio C Hamano wrote: > > Duy Nguyen writes: > >>> extern int starts_with(const char *str, const char *prefix); >>> +extern int startscase_with(const char *str, const char *prefix); >> >> This name is a bit hard to read. Boost

Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 23:57, Junio C Hamano <gits...@pobox.com> wrote: > > Lars Schneider <larsxschnei...@gmail.com> writes: > >> At this point I thought it would make sense to make the advised >> encoding name uppercase in both situations. OK with

Re: [PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 23:52, Junio C Hamano <gits...@pobox.com> wrote: > > Lars Schneider <larsxschnei...@gmail.com> writes: > >> I don't think HT makes too much sense. However, isspace() is nice >> and I will use it. Being more permissive on the inputs should

Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread Lars Schneider
content is added to the index, then Git converts the >> content to a canonical UTF-8 representation. On checkout Git will >> reverse the conversion. >> >> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> >> --- >> Documentation/gitattributes.txt

Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 23:32, Junio C Hamano <gits...@pobox.com> wrote: > > Lars Schneider <larsxschnei...@gmail.com> writes: > >> I also would have liked to advise "UTF-16" instead of "UTF16" as >> you suggested. However, that requ

Re: [PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 20:59, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +static int check_roundtrip(const char* enc_name) > > The asterisk sticks to the variable, not type. Argh. I need to put this check into Travis CI ;-) >> +{ >> +/* >> +

Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 20:49, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +static int validate_encoding(const char *path, const char *enc, >> + const char *data, size_t len, int die_on_error) >> +{ >> +/* We only check for UTF here

[PATCH v10 8/9] convert: add tracing for 'working-tree-encoding' attribute

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider <la

[PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as mo

[PATCH v10 5/9] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard ins

[PATCH v10 4/9] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bo

[PATCH v10 0/9] convert: add support for different encodings

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Hi, Patches 1-5,8 are preparation and helper functions. Patch 3 is new. Patch 6,7,9 are the actual change. This series depends on Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is a

[PATCH v10 1/9] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Sign

[PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- convert.c| 55

[PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are

[PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Check in a case insensitive manner if one string is a prefix of another string. This function is used in a subsequent commit. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- git-compat-util.h | 1 + strbuf.c | 9
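A case-insensitive prefix check of this kind can be sketched in a few lines. The function name below is hypothetical and the body is an illustration under that assumption, not the code added to strbuf.c.

```c
#include <ctype.h>

/*
 * Returns 1 if `prefix` is a prefix of `str`, ignoring ASCII case,
 * and 0 otherwise. An empty prefix matches every string.
 */
int sketch_istarts_with(const char *str, const char *prefix)
{
    for (; *prefix; str++, prefix++) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        if (tolower((unsigned char)*str) != tolower((unsigned char)*prefix))
            return 0;
    }
    return 1;
}
```

If `str` is shorter than `prefix`, the loop compares the terminating NUL of `str` against a non-NUL prefix character and returns 0, so it never reads past the end of `str`.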

[PATCH v10 2/9] strbuf: add xstrdup_toupper()

2018-03-07 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider <larsxschnei...@gmail.com> --- strbuf.c | 12 ++
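For readers unfamiliar with the helper, here is a minimal standalone sketch of an upper-casing string copy. It uses plain malloc() and returns NULL on allocation failure, which is an assumption made to keep the example self-contained; the function name is hypothetical and this is not the strbuf.c implementation.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly allocated upper-case copy of `string`, or NULL if
 * the allocation fails. The caller is responsible for free()ing it.
 */
char *sketch_strdup_toupper(const char *string)
{
    size_t len = strlen(string);
    char *result = malloc(len + 1);
    size_t i;

    if (!result)
        return NULL;
    for (i = 0; i < len; i++)
        result[i] = (char)toupper((unsigned char)string[i]);
    result[len] = '\0';
    return result;
}
```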

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Thread Lars Schneider
> On 07 Mar 2018, at 00:07, Junio C Hamano <gits...@pobox.com> wrote: > > Junio C Hamano <gits...@pobox.com> writes: > >> Lars Schneider <larsxschnei...@gmail.com> writes: >> >>>> Also "UTF16" or other spelling >

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Thread Lars Schneider
> On 06 Mar 2018, at 23:53, Junio C Hamano <gits...@pobox.com> wrote: > > Lars Schneider <larsxschnei...@gmail.com> writes: > >>> Also "UTF16" or other spelling >>> the platform may support but this code fails to recognise will go >>>

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Thread Lars Schneider
> On 06 Mar 2018, at 21:50, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +int is_missing_required_utf_bom(const char *enc, const char *data, size_t >> len) >> +{ >> +return ( >> + !strcmp(enc, "UTF-16") && >> + !(has_bom_prefix(data,

Re: [PATCH v9 5/8] convert: add 'working-tree-encoding' attribute

2018-03-06 Thread Lars Schneider
. All other encodings are usually >> interpreted as binary and consequently built-in Git text processing >> tools (e.g. 'git diff') as well as most Git web front ends do not >> visualize the content. >> [...] >> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com>

Re: [PATCH v9 6/8] convert: check for detectable errors in UTF encodings

2018-03-06 Thread Lars Schneider
> On 06 Mar 2018, at 02:23, Junio C Hamano <gits...@pobox.com> wrote: > > Lars Schneider <larsxschnei...@gmail.com> writes: > >>> On 05 Mar 2018, at 22:50, Junio C Hamano <gits...@pobox.com> wrote: >>> >>> lars.schnei...@autodesk.com w

Re: [PATCH v9 6/8] convert: check for detectable errors in UTF encodings

2018-03-05 Thread Lars Schneider
> On 05 Mar 2018, at 22:50, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +static int validate_encoding(const char *path, const char *enc, >> + const char *data, size_t len, int die_on_error) >> +{ >> +if (!memcmp("UTF-", enc, 4)) {

Re: Contributor Summit planning

2018-03-05 Thread Lars Schneider
> On 03 Mar 2018, at 11:39, Jeff King wrote: > > On Sat, Mar 03, 2018 at 05:30:10AM -0500, Jeff King wrote: > >> As in past years, I plan to run it like an unconference. Attendees are >> expected to bring topics for group discussion. Short presentations are >> also welcome.

[PATCH v9 7/8] convert: add tracing for 'working-tree-encoding' attribute

2018-03-04 Thread lars . schneider
From: Lars Schneider <larsxschnei...@gmail.com> Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider <la
