Re: [PATCH v2 5/5] Revert "repository: pre-initialize hash algo pointer"

2018-02-25 Thread brian m. carlson
On Sun, Feb 25, 2018 at 10:29:29AM +0700, Duy Nguyen wrote:
> When I set unknown hash algo here, I think some test failed
> mysteriously because it used rawsz field (which has value zero), it
> didn't match some expectation, the code went to the error handling
> path, which eventually failed with some error message, but it's not
> obvious that the problem was rawsz being zero and back tracking that
> took me some time.
> 
> With NULL hash_algo, any dereferencing fails immediately with a nice
> stack trace. Another reason to push me towards NULL hash algo is, even
> if we prefer nice messages over segmentation faults, we can't avoid it
> completely anyway (empty_tree and empty_blob are still NULL in unknown
> hash algo and will cause segfaults). Might as well make things
> consistent and always segfault.

That makes sense.  Thanks for the explanation.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 5/5] Revert "repository: pre-initialize hash algo pointer"

2018-02-24 Thread Duy Nguyen
On Sun, Feb 25, 2018 at 5:58 AM, brian m. carlson
 wrote:
>> diff --git a/repository.c b/repository.c
>> index 4ffbe9bc94..0d715f4fdb 100644
>> --- a/repository.c
>> +++ b/repository.c
>> @@ -5,7 +5,7 @@
>>
>>  /* The main repository */
>>  static struct repository the_repo = {
>> - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, 
>> _algos[GIT_HASH_SHA1], 0, 0
>> + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, 
>> NULL, 0, 0
>
> I'm wondering, now that you have the name field for the unknown value,
> if that might be a better choice here than NULL.  I don't have a strong
> preference either way, so whatever you decide here is fine.

I did try that first, but for the purpose of catching uninitialized
algo use, NULL is better.

When I set unknown hash algo here, I think some test failed
mysteriously because it used rawsz field (which has value zero), it
didn't match some expectation, the code went to the error handling
path, which eventually failed with some error message, but it's not
obvious that the problem was rawsz being zero and back tracking that
took me some time.

With NULL hash_algo, any dereferencing fails immediately with a nice
stack trace. Another reason to push me towards NULL hash algo is, even
if we prefer nice messages over segmentation faults, we can't avoid it
completely anyway (empty_tree and empty_blob are still NULL in unknown
hash algo and will cause segfaults). Might as well make things
consistent and always segfault.
-- 
Duy


Re: [PATCH v2 5/5] Revert "repository: pre-initialize hash algo pointer"

2018-02-24 Thread brian m. carlson
On Sat, Feb 24, 2018 at 10:34:29AM +0700, Nguyễn Thái Ngọc Duy wrote:
> This reverts commit e26f7f19b6c7485f04234946a59ab8f4fd21d6d1. The root
> problem, git clone not setting up the_hash_algo, has been fixed in the
> previous patch.
> 
> Since this is a dangerous move and could potentially break stuff after
> release (and leads to workaround like the reverted commit), the
> workaround technically remains, but is hidden behind a new environment
> variable GIT_HASH_FIXUP. This should let the users continue to use git
> while we fix the problem. This variable can be deleted after one or two
> releases.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  common-main.c| 10 ++
>  repository.c |  2 +-
>  t/helper/test-dump-split-index.c |  2 ++
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/common-main.c b/common-main.c
> index 6a689007e7..fbfa98c3b8 100644
> --- a/common-main.c
> +++ b/common-main.c
> @@ -1,6 +1,7 @@
>  #include "cache.h"
>  #include "exec_cmd.h"
>  #include "attr.h"
> +#include "repository.h"
>  
>  /*
>   * Many parts of Git have subprograms communicate via pipe, expect the
> @@ -40,5 +41,14 @@ int main(int argc, const char **argv)
>  
>   restore_sigpipe_to_default();
>  
> + /*
> +  * Temporary recovery measure while hashing code is being
> +  * refactored. This variable should be gone after everybody
> +  * has used the_hash_algo in one or two releases and nobody
> +  * complains anything.
> +  */
> + if (getenv("GIT_HASH_FIXUP"))
> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +
>   return cmd_main(argc, argv);
>  }
> diff --git a/repository.c b/repository.c
> index 4ffbe9bc94..0d715f4fdb 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -5,7 +5,7 @@
>  
>  /* The main repository */
>  static struct repository the_repo = {
> - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, 
> _algos[GIT_HASH_SHA1], 0, 0
> + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, NULL, 
> 0, 0

I'm wondering, now that you have the name field for the unknown value,
if that might be a better choice here than NULL.  I don't have a strong
preference either way, so whatever you decide here is fine.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v2 5/5] Revert "repository: pre-initialize hash algo pointer"

2018-02-23 Thread Nguyễn Thái Ngọc Duy
This reverts commit e26f7f19b6c7485f04234946a59ab8f4fd21d6d1. The root
problem, git clone not setting up the_hash_algo, has been fixed in the
previous patch.

Since this is a dangerous move and could potentially break stuff after
release (and leads to workaround like the reverted commit), the
workaround technically remains, but is hidden behind a new environment
variable GIT_HASH_FIXUP. This should let the users continue to use git
while we fix the problem. This variable can be deleted after one or two
releases.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 common-main.c| 10 ++
 repository.c |  2 +-
 t/helper/test-dump-split-index.c |  2 ++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/common-main.c b/common-main.c
index 6a689007e7..fbfa98c3b8 100644
--- a/common-main.c
+++ b/common-main.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "exec_cmd.h"
 #include "attr.h"
+#include "repository.h"
 
 /*
  * Many parts of Git have subprograms communicate via pipe, expect the
@@ -40,5 +41,14 @@ int main(int argc, const char **argv)
 
restore_sigpipe_to_default();
 
+   /*
+* Temporary recovery measure while hashing code is being
+* refactored. This variable should be gone after everybody
+* has used the_hash_algo in one or two releases and nobody
+* complains anything.
+*/
+   if (getenv("GIT_HASH_FIXUP"))
+   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
return cmd_main(argc, argv);
 }
diff --git a/repository.c b/repository.c
index 4ffbe9bc94..0d715f4fdb 100644
--- a/repository.c
+++ b/repository.c
@@ -5,7 +5,7 @@
 
 /* The main repository */
 static struct repository the_repo = {
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, 
_algos[GIT_HASH_SHA1], 0, 0
+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, NULL, 
0, 0
 };
 struct repository *the_repository = _repo;
 
diff --git a/t/helper/test-dump-split-index.c b/t/helper/test-dump-split-index.c
index e44430b699..b759fe3fc1 100644
--- a/t/helper/test-dump-split-index.c
+++ b/t/helper/test-dump-split-index.c
@@ -12,6 +12,8 @@ int cmd_main(int ac, const char **av)
struct split_index *si;
int i;
 
+   setup_git_directory();
+
do_read_index(_index, av[1], 1);
printf("own %s\n", sha1_to_hex(the_index.sha1));
si = the_index.split_index;
-- 
2.16.1.435.g8f24da2e1a