bug#50271: In the src/dfasearch.c file I find the function dfasupported undefined

2021-09-01 Thread Norihiro Tanaka
On Mon, 30 Aug 2021 20:30:36 +0800 (CST) yangzhuangzhuang wrote: > The function dfasupported is referenced in the submission below, but > is not found with the > definition. commit: https://git.savannah.gnu.org/cgit/grep.git/commit/src?id=ae65513edc80a1b65f19264b9bed95d870602967 dfasupported

bug#45432: Use both --include and --exclude at the same time

2021-01-05 Thread Norihiro Tanaka
On Mon, 4 Jan 2021 09:55:48 -0800 Jim Meyering wrote: > tags 45432 moreinfo > stop > > On Fri, Dec 25, 2020 at 8:57 AM Fred .Flintstone wrote: > > It seems --exclude does nothing when --include is used. It would be useful > > to be able to use both together, in order to do things such as

bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6

2020-12-06 Thread Norihiro Tanaka
On Sat, 5 Dec 2020 10:06:27 -0800 Jim Meyering wrote: > Thank you for that patch. Can you say a little more about the domain > of the problem? > I.e., is it specific to invocations with "-w"? > Can you provide an example that exhibits the performance improvement, > with timings? The test case

bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6

2020-12-03 Thread Norihiro Tanaka
compared to version 3.3, and that can be remedied. It converts to grep only if the potential match does not match the word frequently. From 1bfcdca658bd91dd6b8e6e3a96c9e77678bb4d2e Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Thu, 3 Dec 2020 17:22:50 +0900 Subject: [PATCH] grep: improvemen

bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax

2020-11-01 Thread Norihiro Tanaka
On Sun, 1 Nov 2020 11:39:55 -0800 Jim Meyering wrote: > We must accept the fact that extreme regular expressions will cause > resource exhaustion like that when processed by classical regex_* > functions. This is yet another good reason to prefer PCRE and to use > grep's -P option. In that

bug#44351: Bug in grep v3.2 onwards in regular expression matching

2020-11-01 Thread Norihiro Tanaka
Hi, By the way, I was wondering whether to add the test to ere.tests or spencer1.tests or to a new file. How should they be used properly?

bug#44351: Bug in grep v3.2 onwards in regular expression matching

2020-11-01 Thread Norihiro Tanaka
lar nodes in series are not merged. From 88bad5597445650f4e1bca663a82d4e4d14c93f3 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Sun, 1 Nov 2020 16:31:38 +0900 Subject: [PATCH] dfa: remain similar nodes in series in optimization DFA was merging similar nodes illegally, example a+a+a as a+a. Now similar nodes in series are
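
The difference between `a+a+a` and `a+a` can be checked outside grep. The sketch below uses Python's `re` module rather than grep's DFA, so it only illustrates why the two expressions describe different languages; the helper name `accepts` is made up for this example.

```python
import re


def accepts(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


# "aa" tells the two expressions apart: a+a accepts it, a+a+a does not,
# because a+a+a needs at least three occurrences of "a".
merged_accepts_two = accepts(r"a+a", "aa")
original_accepts_two = accepts(r"a+a+a", "aa")
original_accepts_three = accepts(r"a+a+a", "aaa")
```

Any optimization that rewrites `a+a+a` as `a+a` therefore changes which inputs match.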

bug#43863: [PATCH] grep: remove unusable code

2020-10-10 Thread Norihiro Tanaka
On Fri, 9 Oct 2020 12:53:47 +0300 Shlomi Fish wrote: > Hi Norihiro Tanaka! > > On Thu, 08 Oct 2020 18:55:50 +0900 > Norihiro Tanaka wrote: Thanks, not 'unusable' but 'unused' is right. From 4d91494963ab1645417682af548d162021607f40 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka

bug#43527: [PATCH] grep: avoid unneeded compilation of regex

2020-09-26 Thread Norihiro Tanaka
On Sat, 26 Sep 2020 18:12:37 -0700 Paul Eggert wrote: > The patch should be harmless (though this fact isn't trivial) and I can > see it being useful for plausible future performance improvements, so it > would make sense to install it after the next release. No longer need the patch. This

bug#43623: [PATCH] dfa: remove unused the member of structure

2020-09-25 Thread Norihiro Tanaka
Hi, Now the member 'first_end' in struct dfa is unused. It should be removed. Thanks, Norihiro From ce3f6337b651128d405137a58656e623579cf17d Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Sat, 26 Sep 2020 09:50:01 +0900 Subject: [PATCH] dfa: remove unused the member of structure * lib

bug#43577: wrong result for grep -io in turkish locale

2020-09-23 Thread Norihiro Tanaka
I attach the fix for the bug. Regex is fixed in Paul, thank you. From 884c46aadbe6a2f7203f84d4173a515ca4ccf8de Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Thu, 24 Sep 2020 10:39:46 +0900 Subject: [PATCH] grep: fix ignore-case Turkish bug * src/grep.c (fgrep_icase_charlen): Do

bug#43577: wrong result for grep -io in turkish locale

2020-09-23 Thread Norihiro Tanaka
In the Turkish locale, upper and lower case are mapped as follows. U0049 <-> U0131 U0069 <-> U0130 It's expected that both of the following test cases return U0130, but the latter returns nothing. $ printf '\304\260\n' >I # U0130 $ env LC_ALL=tr_TR.utf8 grep -i i I ? # U0130 $ env LC_ALL=tr_TR.utf8
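
The mapping in this report can be written out as a small table. This is a sketch only: Python's `str.lower()` is not locale-aware, so the Turkish pairs are listed by hand, and the names `TR_LOWER` and `tr_equal_ignoring_case` are invented for the example. It also shows that the two members of a pair can have different UTF-8 lengths, which matters to any code that assumes case folding preserves byte length.

```python
# Turkish case pairs as listed in the report: U+0049 <-> U+0131, U+0069 <-> U+0130.
TR_LOWER = {"\u0049": "\u0131", "\u0130": "\u0069"}  # I -> dotless i, dotted I -> i


def tr_fold(ch: str) -> str:
    """Lower-case one character using the Turkish pairs, else the default rule."""
    return TR_LOWER.get(ch, ch.lower())


def tr_equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two single characters under the Turkish mapping above."""
    return tr_fold(a) == tr_fold(b)


# U+0130 is the two bytes \304\260 in UTF-8, while "i" is a single byte.
dotted_capital_i_bytes = "\u0130".encode("utf-8")
len_i = len("i".encode("utf-8"))
len_dotted_capital_i = len(dotted_capital_i_bytes)
```

Under this table `i` matches U+0130 but not ASCII `I`, which is the behaviour the report expects from `grep -i i` in `tr_TR.utf8`.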

bug#43527: [PATCH] grep: avoid unneeded compilation of regex

2020-09-22 Thread Norihiro Tanaka
409624 in Fcompile (pattern=0x23c1240 "i\n", size=1, ignored=0, exact=true) at kwsearch.c:56 #4 0x00409378 in main (argc=4, argv=0x7ffe76048388) at grep.c:2977 From 6118c3ee14c6131ec544244b1fabf05c3a913bd6 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Wed, 23 Sep 2020 07:33:32 +0900

bug#43527: [PATCH] grep: avoid unneeded compilation of regex

2020-09-22 Thread Norihiro Tanaka
On Tue, 22 Sep 2020 08:50:03 -0700 Jim Meyering wrote: > On Tue, Sep 22, 2020 at 7:54 AM Norihiro Tanaka wrote: > > On Mon, 21 Sep 2020 17:33:25 -0700 > > Jim Meyering wrote: > ... > > > Here are the two patches (tested on top of a third that updates to > > >

bug#43527: [PATCH] grep: avoid unneeded compilation of regex

2020-09-22 Thread Norihiro Tanaka
On Mon, 21 Sep 2020 17:33:25 -0700 Jim Meyering wrote: > On Sun, Sep 20, 2020 at 6:34 PM Jim Meyering wrote: > > > > On Sun, Sep 20, 2020 at 12:17 AM Norihiro Tanaka wrote: > > > Hi, > > > Performance for the following case was fixed in bug#43040. > > >

bug#43527: [PATCH] grep: avoid unneeded compilation of regex

2020-09-20 Thread Norihiro Tanaka
ecute (void *, char const *, size_t, size_t *, char const *); /* grep.c */ -- 1.7.1 From ca0d0c9e79478df4645c15a5a885955d1c6221c9 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Sun, 20 Sep 2020 16:00:04 +0900 Subject: [PATCH] dfa: change dfasupported() to global function * lib/dfa.c

bug#41700: grep -v always exiting with 1 for empty file

2020-06-04 Thread Norihiro Tanaka
On Wed, 3 Jun 2020 20:26:41 -0700 Andi Kleen wrote: > > % grep --version > grep (GNU grep) 3.4 > ... > % echo -n > foo > % grep -v foo foo ; echo $? > 1 > > Would expect it to exit with zero in this case, since foo is not in the > file. > > When the file is one byte it works as expected: >
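
A simplified model of the exit status helps explain the report. The sketch assumes fixed-string matching and uses a hypothetical helper `grep_v_exit_status`; it is not grep's code. The rule it encodes is the documented one: the exit status is 0 when at least one line is selected and 1 otherwise, and with `-v` the selected lines are the non-matching ones.

```python
def grep_v_exit_status(pattern: str, data: str) -> int:
    """Model `grep -v pattern`: 0 if any line is selected, else 1."""
    selected = [line for line in data.splitlines() if pattern not in line]
    return 0 if selected else 1


# An empty file has no lines at all, so nothing can be selected.
status_empty = grep_v_exit_status("foo", "")
# A one-byte file holding a newline has one (empty) line, which -v selects.
status_one_byte = grep_v_exit_status("foo", "\n")
```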

bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.

2020-04-18 Thread Norihiro Tanaka
On Sun, 19 Apr 2020 07:41:49 +0900 Norihiro Tanaka wrote: > > On Sat, 18 Apr 2020 00:22:26 +0900 > Norihiro Tanaka wrote: > > > > > On Fri, 17 Apr 2020 10:24:42 +0900 > > Norihiro Tanaka wrote: > > > > > > > > On Fri, 1

bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.

2020-04-18 Thread Norihiro Tanaka
On Sat, 18 Apr 2020 00:22:26 +0900 Norihiro Tanaka wrote: > > On Fri, 17 Apr 2020 10:24:42 +0900 > Norihiro Tanaka wrote: > > > > > On Fri, 17 Apr 2020 09:35:36 +0900 > > Norihiro Tanaka wrote: > > > > > > > > On Th

bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.

2020-04-17 Thread Norihiro Tanaka
On Fri, 17 Apr 2020 10:24:42 +0900 Norihiro Tanaka wrote: > > On Fri, 17 Apr 2020 09:35:36 +0900 > Norihiro Tanaka wrote: > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > Paul Eggert wrote: > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrot

bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.

2020-04-16 Thread Norihiro Tanaka
On Fri, 17 Apr 2020 09:35:36 +0900 Norihiro Tanaka wrote: > > On Thu, 16 Apr 2020 16:00:29 -0700 > Paul Eggert wrote: > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > >

bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.

2020-04-16 Thread Norihiro Tanaka
On Thu, 16 Apr 2020 16:00:29 -0700 Paul Eggert wrote: > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > will come back. > > Yes, I'd rather not revert if we can help it. > > My

bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.

2020-04-16 Thread Norihiro Tanaka
On Thu, 16 Apr 2020 09:31:32 -0700 Paul Eggert wrote: > On 4/15/20 11:56 PM, Norihiro Tanaka wrote: > > > It seems that a lot of time is spent in dfa.c:replace(). > > It was added at d6df3873c7abc243683d0e8fccbfde4e76f23e53 in gnulib. > > It would be pretty dras

bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.

2020-04-16 Thread Norihiro Tanaka
+ grep-2.2/src/grep -E -v -m1 -f grep-patterns.txt /dev/null grep-2.2/src/grep: invalid option -- 'm' Usage: grep [OPTION]... PATTERN [FILE]... Try `grep --help' for more information. real 0.00 user 0.00 sys 0.00 + grep-2.3/src/grep -E -v -m1 -f grep-patterns.txt /dev/null grep-2.3/src/grep:

bug#33249: [PATCH] grep: grouping of patterns including back reference

2019-12-23 Thread Norihiro Tanaka
On Sun, 22 Dec 2019 16:57:12 -0800 Paul Eggert wrote: > On 11/3/18 9:25 PM, Norihiro Tanaka wrote: > > > $ seq -f '%040g' 0 | sed '1s/$/\\(0\\)\\1/' >pat > > Thanks for the test case and sorry about the delay. And thanks for spotting > the > speedup

bug#34053: [PATCH] grep: fix slow for multiple word matching

2019-12-19 Thread Norihiro Tanaka
On Wed, 18 Dec 2019 18:55:01 -0800 Jim Meyering wrote: > On Tue, Nov 26, 2019 at 2:38 PM Norihiro Tanaka wrote: > > On Sun, 13 Jan 2019 08:45:47 +0900 > > Norihiro Tanaka wrote: > > > grep uses KWset matcher for multiple word matching. It is very slow when > >

bug#34053: [PATCH] grep: fix slow for multiple word matching

2019-11-26 Thread Norihiro Tanaka
On Sun, 13 Jan 2019 08:45:47 +0900 Norihiro Tanaka wrote: > Hi, > > grep uses KWset matcher for multiple word matching. It is very slow when > most of the parts matched to a pattern are not words. So, if a part firstly > matched to pattern is not a word, use the grep matcher t
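
The cost described here comes from fixed-string hits that fail the word test and force another search. The following sketch shows that retry loop under simplifying assumptions: word characters are letters, digits and underscore, the search uses `str.find` instead of KWset, and the function names are invented for the illustration.

```python
def is_word_char(ch: str) -> bool:
    """Word constituents in this sketch: letters, digits and underscore."""
    return ch.isalnum() or ch == "_"


def is_word_match(text: str, start: int, end: int) -> bool:
    """True if text[start:end] is not adjacent to a word character."""
    before_ok = start == 0 or not is_word_char(text[start - 1])
    after_ok = end == len(text) or not is_word_char(text[end])
    return before_ok and after_ok


def find_word(text: str, needle: str) -> int:
    """Return the offset of the first whole-word hit of `needle`, or -1."""
    pos = text.find(needle)
    while pos != -1:
        if is_word_match(text, pos, pos + len(needle)):
            return pos
        # Each rejected hit costs another search: this is the slow path.
        pos = text.find(needle, pos + 1)
    return -1
```

With input such as a long run of `0` characters and the needle `0`, almost every hit is rejected, so the loop runs once per character.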

bug#38223: grep >=2.28 cannot handle -wF correctly under LANG=ja_JP.eucjp

2019-11-17 Thread Norihiro Tanaka
On Sat, 16 Nov 2019 22:45:56 -0800 Jim Meyering wrote: > On Sat, Nov 16, 2019 at 8:36 PM Jim Meyering wrote: > > On Sat, Nov 16, 2019 at 4:02 PM Norihiro Tanaka wrote: > > > On Sat, 16 Nov 2019 11:00:38 -0800 > > > Jim Meyering wrote: > > > >

bug#38223: grep >=2.28 cannot handle -wF correctly under LANG=ja_JP.eucjp

2019-11-16 Thread Norihiro Tanaka
hed, I found an extreme slowdown. yes $(printf %040d 0) | head -100 >k time -p env LC_ALL=ja_JP.eucjp src/grep -F -w 0 k First patch fixes it, and second improves performance more. From 0202a83b3d0de224a5d606958e3719244d546548 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Sun, 17

bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union)

2019-10-19 Thread Norihiro Tanaka
On Tue, 15 Oct 2019 12:48:17 +1100 "Trent W. Buck" wrote: > Package: grep > Version: 3.3-1 > Severity: wishlist > > This bug was originally reported as > https://bugs.debian.org/940464 > > Trent W. Buck wrote: > > (Surely someone has already asked for this, but I can't see where. > > I may

bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union)

2019-10-16 Thread Norihiro Tanaka
On Tue, 15 Oct 2019 12:48:17 +1100 "Trent W. Buck" wrote: > Package: grep > Version: 3.3-1 > Severity: wishlist > > This bug was originally reported as > https://bugs.debian.org/940464 > > Trent W. Buck wrote: > > (Surely someone has already asked for this, but I can't see where. > > I may
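
The distinction in the subject line (intersection versus union of patterns) can be sketched in a few lines. This uses Python's `re` module as a stand-in for grep's matcher, and the sample lines and patterns are invented for the example; no `--and` option is implied to exist.

```python
import re


def matches_any(line: str, patterns: list) -> bool:
    """Union: what repeated -e options give today."""
    return any(re.search(p, line) for p in patterns)


def matches_all(line: str, patterns: list) -> bool:
    """Intersection: what the wishlist item asks for."""
    return all(re.search(p, line) for p in patterns)


lines = ["alpha beta", "beta gamma", "alpha beta gamma"]
patterns = ["alpha", "gamma"]

union = [line for line in lines if matches_any(line, patterns)]
intersection = [line for line in lines if matches_all(line, patterns)]
```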

bug#34951: [PATCH] grep: a kwset matcher not work in a grep matcher

2019-03-22 Thread Norihiro Tanaka
On Sat, 23 Mar 2019 08:06:35 +0900 Norihiro Tanaka wrote: > A kwset matcher is not built in a grep matcher after token re-order is > introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98 in dfa. > It caused performance degradation in some typical cases. This bug is > introd

bug#34951: [PATCH] grep: a kwset matcher not work in a grep matcher

2019-03-22 Thread Norihiro Tanaka
rom fca6a4c3b9e0757637b7a2009ca8b9070a6874f5 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Sat, 23 Mar 2019 07:18:37 +0900 Subject: [PATCH] dfa: separate parse and compile phase DFAMUST() must be called after parse and before tokens re-order which is introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0

bug#34054: Error in compilation of pcresearch if we have no pcre library

2019-01-12 Thread Norihiro Tanaka
Hi, I pulled current master of grep from git repository and built it on Fedora 29, and received the following error. When we have no pcre library, DIE() in Pcompile and Pexecute is called, but the noreturn attribute is set on those functions. Thanks, Norihiro $ make .. depbase=`echo pcresearch.o

bug#34053: [PATCH] grep: fix slow for multiple word matching

2019-01-12 Thread Norihiro Tanaka
-p src/grep -wf pat inp real 0.32 user 0.31 sys 0.00 Thanks, Norihiro From b4f07fa0288ad68932fc606ed760fd61db9df6d0 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Sun, 13 Jan 2019 07:53:32 +0900 Subject: [PATCH] grep: fix slow for multiple word matching grep uses KWset matcher for multipl

bug#33249: [PATCH] grep: grouping of patterns including back reference

2018-11-03 Thread Norihiro Tanaka
On Sat, 3 Nov 2018 21:02:19 -0700 Paul Eggert wrote: > Norihiro Tanaka wrote: > > Even the pattern has no back-references, compilation by regex run for > > each line. So Syntax errors will be detected as even your present. > > OK, but then I'm afraid I don't unde

bug#33249: [PATCH] grep: grouping of patterns including back reference

2018-11-03 Thread Norihiro Tanaka
On Sat, 3 Nov 2018 08:29:39 -0700 Paul Eggert wrote: > Norihiro Tanaka wrote: > > By this change, each fragment is divided into > > groups by whether the fragment includes back reference in a pattern or > > not. a fragment which includes back reference constitutes group,
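
One way to picture the grouping is the sketch below. It is an assumption-laden illustration, not the patch: it treats `\1` to `\9` as the only back-references, puts each fragment that contains one into a group of its own, and merges runs of consecutive fragments without back-references into a shared group. The quoted text is cut off before it says how the remaining fragments are grouped, so that last rule is a guess. The names `has_backref` and `group_fragments` are hypothetical.

```python
def has_backref(fragment: str) -> bool:
    """True if the fragment contains \\1 .. \\9 (escaped backslashes are skipped)."""
    i = 0
    while i < len(fragment):
        if fragment[i] == "\\" and i + 1 < len(fragment):
            if fragment[i + 1] in "123456789":
                return True
            i += 2  # skip the escaped character, e.g. "\\(" or "\\\\"
        else:
            i += 1
    return False


def group_fragments(pattern: str) -> list:
    """Split a multi-line pattern and group the fragments as described above."""
    groups = []
    for fragment in pattern.split("\n"):
        if has_backref(fragment):
            groups.append([fragment])
        elif groups and not has_backref(groups[-1][0]):
            groups[-1].append(fragment)
        else:
            groups.append([fragment])
    return groups
```

For the pattern lines `a`, `b`, `\(c\)\1`, `d` this yields three groups: `a` and `b` together, the back-reference fragment alone, then `d`.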

bug#33249: [PATCH] grep: grouping of patterns including back reference

2018-11-03 Thread Norihiro Tanaka
17 00:00:00 2001 From: Norihiro Tanaka Date: Sat, 3 Nov 2018 18:56:18 +0900 Subject: [PATCH] grep: grouping of a pattern with multiple lines When grep uses regex, it splits a pattern with multiple lines by newline character into fragments. Compilation and execution run for each fragment.

bug#33116: [PATCH 1/6] dfa: remove unneeded code

2018-10-22 Thread Norihiro Tanaka
env LC_ALL=C src/grep -vf in in real 39.20 user 20.35 sys 18.78 (After) $ time -p env LC_ALL=C src/grep -vf in in real 6.87 user 6.38 sys 0.48 Thanks, Norihiro From 65f156cd0e605c11a40877d8c070a185def699e5 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Mon, 22 Oct 2018 23:22:40 +0900 Subj

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-19 Thread Norihiro Tanaka
On Tue, 18 Sep 2018 22:13:38 -0700 Jim Meyering wrote: > Also, when I compared grep compiled at > 123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your > changes) and that compiled at the prior commit, > 9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version > takes

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-18 Thread Norihiro Tanaka
Paul Eggert wrote: > Thanks for the patch. A quick question: what does the identifier > "dfautf8noss" stand for? I couldn't figure it out. It means "No use superset for utf8". I thought of various things for the name of the function, but I could not think of a good name.

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-17 Thread Norihiro Tanaka
rom 3193191730d6ecb3a0c4e38b461484deaf819f87 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka Date: Mon, 17 Sep 2018 22:20:37 +0900 Subject: [PATCH 1/2] dfa: simplify initial state To simplify initial state enables to be easy to optimization of NFA. dfa.c (enum token): Add new element BEG. (prtok): Adjust due to add

bug#29668: grep: Fatal problem with (big) file

2017-12-15 Thread Norihiro Tanaka
On Wed, 13 Dec 2017 16:03:57 -0800 Paul Eggert <egg...@cs.ucla.edu> wrote: > On 12/13/2017 03:25 PM, Norihiro Tanaka wrote: > > I don't seem that that's problem. the user pass output of grep to wc -l, > > so `Binary file ... matches' line is also counted by `wc' as one

bug#29668: grep: Fatal problem with (big) file

2017-12-13 Thread Norihiro Tanaka
On Tue, 12 Dec 2017 16:28:09 -0800 Paul Eggert <egg...@cs.ucla.edu> wrote: > On 12/11/2017 03:36 PM, Norihiro Tanaka wrote: > > Perhaps, characters not to be able to recognize in your locale included > > in Tieliikenne 5.0.csv and volvot.csv are included. > >

bug#29668: grep: Fatal problem with (big) file

2017-12-11 Thread Norihiro Tanaka
On Mon, 11 Dec 2017 23:45:25 +0200 pg wrote: > $ awk '/Volvo/' Tieliikenne5.0.csv | wc -l > 266175 > $ grep Volvo Tieliikenne5.0.csv | wc -l > 1638 > $ awk '/N3/' volvot.csv | wc -l > 17822 > $ grep N3 volvot.csv | wc -l > 1701 Perhaps, characters not to be able to

bug#26832: bug on grep 3.0

2017-05-08 Thread Norihiro Tanaka
On Mon, 8 May 2017 16:56:31 +0900 Masataka Kawasaki wrote: > I found a bug on grep 3.0 on 64bit cygwin. > It seems that '\/' before '$' causes problems. > > grep 2.25(correct) > >echo rr/| grep '^.*\/$' > rr/ > >echo rr/| gawk '/^.*\/$/' > rr/ > >echo rr/|

bug#25499: [PATCH] grep: fix matching not longest pattern with grep -Fo

2017-01-21 Thread Norihiro Tanaka
On Sat, 21 Jan 2017 08:09:00 -0800 Jim Meyering wrote: > Nice. I am glad you caught that. > I've adjusted some wording and will push this soon: Thanks for replying and adjusting quickly. Your adjustment is also very useful for me to learn English.

bug#25499: [PATCH] grep: fix matching not longest pattern with grep -Fo

2017-01-21 Thread Norihiro Tanaka
grep -Fo may not match longest pattern in grep 2.26 or later including current master. $ printf 'abce\n' > in $ printf 'abcd\nc\nbce\n' > pat $ LC_ALL=C src/grep -Fof pat in c We expect "bce" in this case. From 2e75efbf90869abfeafc0ab9fcd4fa4b453c0b2a Mon Sep 17 00:00:00 200
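
The expected result follows from the usual leftmost-longest rule: take the earliest position where any pattern matches, and among the patterns matching there, the longest. A minimal sketch of that rule (a brute-force scan, not the KWset algorithm; `first_match_longest` is a name invented here):

```python
def first_match_longest(text: str, patterns: list):
    """Return the longest pattern matching at the leftmost matching offset."""
    for start in range(len(text)):
        candidates = [p for p in patterns if p and text.startswith(p, start)]
        if candidates:
            return max(candidates, key=len)
    return None


# The test case from the report: input "abce", patterns abcd, c and bce.
result = first_match_longest("abce", ["abcd", "c", "bce"])
```

Nothing matches at offset 0 (`abcd` differs in the last character), `bce` matches at offset 1, and `c` would only match at offset 2, so `bce` is the answer.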

bug#25479: memory leaks in dfa

2017-01-18 Thread Norihiro Tanaka
(main.c:459) > > There may be other paths as well. > > Can y'all track this down and fix? > > Thanks, > > Arnold Thanks for the report. It is caused by temporarily allocated memory not freed. From 3479bce8542f75c11e6b0b9907e22b26d91865ca Mon Sep 17 00:00:00 2001 From: Norihiro Ta

bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-28 Thread Norihiro Tanaka
On Tue, 27 Dec 2016 22:37:25 -0800 Paul Eggert <egg...@cs.ucla.edu> wrote: > Norihiro Tanaka wrote: > > So I wrote the patch to use fgrep matcher for both. > > Thanks, I installed that after tweaking the commit message and omitting > unnecessary parens. Thanks, I confirmed it.

bug#22239: bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-27 Thread Norihiro Tanaka
On Mon, 26 Dec 2016 12:07:49 -0800 Paul Eggert <egg...@cs.ucla.edu> wrote: > Norihiro Tanaka wrote: > > Hmm, how about the following test cases, although it is extreme? > > I don't think we need to worry about performance for the case when -w > is given, an

bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-26 Thread Norihiro Tanaka
On Fri, 23 Dec 2016 17:38:42 -0800 Paul Eggert wrote: > No. Thanks, I hadn't considered that possibility. I looked into the > slowdown and installed the attached patches, which cause 'grep' to > run about as fast on this test case as grep 2.25 (though not as fast > as grep

bug#21763: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-21 Thread Norihiro Tanaka
On Tue, 20 Dec 2016 21:17:01 -0800 Paul Eggert wrote: > I installed the attached patches into grep master. These fix the > performance regressions noted at the start of Bug#22357. I see that > the related performance problems noted in Bug#21763 seem to be fixed > too, I

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-20 Thread Norihiro Tanaka
On Mon, 19 Dec 2016 15:38:12 -0800 Paul Eggert wrote: > but the old 'replace' called 'delete' up to N times, Yes, but constraint == 0 does not happen mostly, so in delete() in "while" does not pass normally. > Anyway, I verified that the change improved performance on the

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-19 Thread Norihiro Tanaka
On Sun, 18 Dec 2016 23:48:10 -0800 Paul Eggert wrote: > >> 'delete' is > >> O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2). > >> 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3). > >> I haven't looked into how likely the worst-case

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-17 Thread Norihiro Tanaka
On Wed, 14 Dec 2016 17:19:27 -0800 Paul Eggert wrote: > I was referring to code with his proposed patch installed. 'delete' is > O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2). > 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3). > I

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Norihiro Tanaka
On Sun, 11 Dec 2016 05:28:56 -0600 Trevor Cordes wrote: > On my box the above runs for >2m (never completes before I ^C) on the > version **AFTER** the commits (v2.22). On the test build just *BEFORE* > the commits (2.21.73-8058), it runs in <2s. So for me, I had a

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-10 Thread Norihiro Tanaka
/dev/null Thanks, Norihiro From 19502d13120d612fc89b922c9b28cc3030ea0674 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Sun, 11 Dec 2016 09:35:50 +0900 Subject: [PATCH] dfa: performance improvement for removal of epsilon closure * lib/dfa.c (delete): Use binary search to find deleted index
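
The idea in the commit message, finding the element to delete by binary search instead of a linear scan, can be sketched as follows. The sketch assumes the set is a list of integer indexes kept in ascending order; the real `position_set` in dfa.c holds structures and may be ordered differently, and `delete_index` is a hypothetical name.

```python
from bisect import bisect_left


def delete_index(indexes: list, target: int) -> bool:
    """Remove `target` from the sorted list `indexes`; return True if found."""
    pos = bisect_left(indexes, target)  # O(log n) lookup
    if pos < len(indexes) and indexes[pos] == target:
        del indexes[pos]  # shifting the tail is still linear
        return True
    return False


positions = [2, 5, 9, 14]
removed = delete_index(positions, 9)
missing = delete_index(positions, 7)
```

Only the lookup becomes logarithmic; closing the gap after removal still moves the following elements.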

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-28 Thread Norihiro Tanaka
haracters. Thanks, Norihiro From 67484a67d7d310d76a2eb80b68a8ec8eb5c6a7fc Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Mon, 28 Nov 2016 22:26:07 +0900 Subject: [PATCH] dfa: avoid match middle in multibyte character * lib/dfa.c (transit_state): If fails in matchin

bug#24941: Early termination bug in grep 2.26

2016-11-15 Thread Norihiro Tanaka
On Tue, 15 Nov 2016 11:35:15 -0800 Jim Meyering wrote: > I suppose you mean in addition to the S_ISFIFO test? That sounds good. > We should retain the optimization when reading from stdin that is a > non-pipe. This can also happen in stdin. If we redirect stdout to

bug#24609: egrep '2\.?[0?9]' datafile does not work as expected

2016-10-04 Thread Norihiro Tanaka
On Tue, 4 Oct 2016 15:38:00 +0800 Lam Bruce wrote: > Dear Sir/Madam: > > I put all files in the atttachment. > >cat datafile > > northwest NW Charles Main 3.0 .98 3 34 > western WE Sharon Gray 5.3 .97 5 23 > southwest SW Lewis Dalsass 2.7 .8 2 18 >

bug#24458: [PATCH] grep: add news entry for fix to bug#24233

2016-09-18 Thread Norihiro Tanaka
On Sun, 18 Sep 2016 10:25:29 -0700 Jim Meyering <j...@meyering.net> wrote: > On Sun, Sep 18, 2016 at 2:27 AM, Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > > I wrote a test case, but did not add new entry. I want to add it to > > news, as the bug is fixed after grep 2.

bug#24458: [PATCH] grep: add news entry for fix to bug#24233

2016-09-18 Thread Norihiro Tanaka
I wrote a test case, but did not add new entry. I want to add it to news, as the bug is fixed after grep 2.25 release. The bug is fixed in commit ad468bbe3df027f29ecb236283084fb60b734f68 by chance. From c27a4ecadd867286730c6b5b96b8bb36dda138c4 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <n

bug#24262: [PATCH 2/2] dfa: support not newline_anchor of regex

2016-09-08 Thread Norihiro Tanaka
On Thu, 8 Sep 2016 09:48:03 -0700 Paul Eggert wrote: > I installed that on Savannah master grep. Thanks. > I assume this is something I messed up when merging the DFA changes? If so, > sorry about that. It's too bad this part of the code can't be exercised by >

bug#24262: [PATCH 2/2] dfa: support not newline_anchor of regex

2016-09-08 Thread Norihiro Tanaka
On Fri, 2 Sep 2016 22:07:18 -0700 Paul Eggert <egg...@cs.ucla.edu> wrote: > Norihiro Tanaka wrote: > > You say we can simplified by the changes for > > multithreading, but two changes in the patch are needed. > > Thanks, I missed that. I installed your patch, al

bug#24262: [PATCH 2/2] dfa: support not newline_anchor of regex

2016-09-03 Thread Norihiro Tanaka
On Fri, 2 Sep 2016 22:07:18 -0700 Paul Eggert <egg...@cs.ucla.edu> wrote: > Norihiro Tanaka wrote: > > You say we can simplified by the changes for > > multithreading, but two changes in the patch are needed. > > Thanks, I missed that. I installed your patch, al

bug#23932: dfa: use algorithm for single byte character to any single byte character in input text always

2016-09-02 Thread Norihiro Tanaka
On Fri, 2 Sep 2016 20:00:12 -0700 Paul Eggert <egg...@cs.ucla.edu> wrote: > Norihiro Tanaka wrote: > > > I seem that you lost a part > > of my proposition on rebase. If it is not intentional, would you review > > the part again? > > Thanks for catc

bug#24262: [PATCH 2/2] dfa: support not newline_anchor of regex

2016-09-02 Thread Norihiro Tanaka
On Fri, 2 Sep 2016 15:35:22 -0700 Paul Eggert <egg...@cs.ucla.edu> wrote: > Norihiro Tanaka wrote: > > However, the patch adds an argument to dfasyntax(). To synchronize > > between grep and dfa easily, I expect it is applied before dfa is moved > > to gnulib. >

bug#23752: [PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale

2016-09-02 Thread Norihiro Tanaka
On Thu, 1 Sep 2016 09:50:11 -0700 Paul Eggert wrote: > Suppose all the multibyte characters in the pattern are non-letters, so that > case-folding does not affect them. Could grep -iF be fast in that case? I prefer DFA matcher to KWset matcher due to low memory. grep -F

bug#24009: [PATCH] grep: use fastmap in regex

2016-09-02 Thread Norihiro Tanaka
On Thu, 1 Sep 2016 22:32:12 -0700 Paul Eggert <egg...@cs.ucla.edu> wrote: > Norihiro Tanaka wrote: > > I think this patch should be suspended because of this issue. > > I reported it to glibc developers. > > https://sourceware.org/bugzilla/show_bug.cgi?id=2038

bug#24260: [PATCH 1/6] dfa: thread-safety: remove 'dfa' global in dfa.c

2016-08-20 Thread Norihiro Tanaka
On Fri, 19 Aug 2016 18:03:19 -0500 Zev Weiss wrote: > Okay -- so your question is about the necessity of making operations other > than dfaexec() thread-safe? That's reasonable, though (obviously) I went > ahead made the other operations thread-safe anyway. > > 1) It

bug#24260: [PATCH 1/6] dfa: thread-safety: remove 'dfa' global in dfa.c

2016-08-19 Thread Norihiro Tanaka
On Sat, 20 Aug 2016 07:25:06 +0900 Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > Hi Zev, > > Thanks for replying. I say a reverse thing. > > I believe that there is no problem if only dfaexec() is thread safe. In > other words, I think that variables that we mus

bug#24260: [PATCH 1/6] dfa: thread-safety: remove 'dfa' global in dfa.c

2016-08-19 Thread Norihiro Tanaka
On Fri, 19 Aug 2016 16:46:16 -0500 Zev Weiss wrote: > I'm not sure I understand -- the first patch in my series just removes the > global dfa variable and instead passes it as a parameter. This alone doesn't > make the whole thing thread-safe, it's just a first step

bug#24260: [PATCH 1/6] dfa: thread-safety: remove 'dfa' global in dfa.c

2016-08-19 Thread Norihiro Tanaka
On Thu, 18 Aug 2016 05:50:14 -0500 Zev Weiss wrote: > * src/dfa.c: remove global dfa struct. A pointer to a struct dfa is > instead added as a parameter to the functions that had been using the > global. Hi, Why we move global variable DFA into struct dfa, Although

bug#24262: [PATCH 2/2] dfa: support not newline_anchor of regex

2016-08-19 Thread Norihiro Tanaka
On Thu, 18 Aug 2016 09:21:56 -0600 arn...@skeeve.com wrote: > Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > > > The patch introduces not newline_anchor option of regex to dfa. grep is > > always newline_anchor, so newer codes is not used. I expect it is used > > by

bug#24262: [PATCH 2/2] dfa: support not newline_anchor of regex

2016-08-19 Thread Norihiro Tanaka
On Thu, 18 Aug 2016 23:57:27 +0900 Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > The patch introduces not newline_anchor option of regex to dfa. grep is > always newline_anchor, so newer codes is not used. I expect it is used > by sed and gawk. > > However, the p

bug#24262: [PATCH 2/2] dfa: support not newline_anchor of regex

2016-08-18 Thread Norihiro Tanaka
On Thu, 18 Aug 2016 23:57:27 +0900 Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > The patch introduces not newline_anchor option of regex to dfa. grep is > always newline_anchor, so newer codes is not used. I expect it is used > by sed and gawk. > > However, the p

bug#24262: [PATCH 2/2] dfa: support not newline_anchor of regex

2016-08-18 Thread Norihiro Tanaka
is moved to gnulib. From b31ebd2bb5aae54ba46ac3bc88161872b50f9513 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Thu, 11 Aug 2016 11:53:24 +0900 Subject: [PATCH 2/2] dfa: support not newline_anchor of regex * src/dfa.c (char_context): Define context for not newline_

bug#24261: [PATCH 1/2] dfa: simplify and optimize at initial state in execution

2016-08-18 Thread Norihiro Tanaka
Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Sun, 14 Aug 2016 11:21:48 +0900 Subject: [PATCH 1/2] dfa: simplify and optimize at initial state in execution * src/dfa.c (skip_remains_mb): Remove argument *pwc. Update caller. (dfaexec_main): Simplify and optimize at i

bug#24250: [PATCH] dfa: simplify to find state index for state 0

2016-08-17 Thread Norihiro Tanaka
Now, state indexes for state 0 are 0 for CTX_NEWLINE context, D->initstate_notbol for CTX_NONE context and D->min_trcount - 1 for CTX_LETTER. The patch uses them instead of calling state_index(). From bb5fc2fa08e9f2b17d147c3649328254deb84166 Mon Sep 17 00:00:00 2001 From: Norihiro

bug#23932: dfa: use algorithm for single byte character to any single byte character in input text always

2016-08-17 Thread Norihiro Tanaka
On Tue, 16 Aug 2016 23:35:22 +0900 Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > I updated the patch due to change in bug#21486, and added a patch > including a minor change. I wrote third patch. After first patch, we do not have to separate next state by context, transit_state()

bug#21486: [PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales

2016-08-16 Thread Norihiro Tanaka
On Tue, 16 Aug 2016 00:51:52 -0700 Paul Eggert wrote: > Thanks for writing that patch. I installed it in grep master (after > tweaking the commit message a bit) and am marking this bug report as > done. > > I noticed what appears to be a problem in the patch, in the code: >

bug#24159: [PATCH] dfa: minor fix for whether dfa is fast or not

2016-08-06 Thread Norihiro Tanaka
On Fri, 5 Aug 2016 22:02:31 -0700 Jim Meyering wrote: > I have examined the logs, which suggest it was a false positive in a > parallelized "make check" run, due to that test's 3-second timeout. I > have tried repeatedly to reproduce that failure, so far without > success, but

bug#24159: [PATCH] dfa: minor fix for whether dfa is fast or not

2016-08-05 Thread Norihiro Tanaka
On Fri, 5 Aug 2016 13:29:43 -0700 Jim Meyering <j...@meyering.net> wrote: > On Fri, Aug 5, 2016 at 4:30 AM, Norihiro Tanaka <nori...@kcn.ne.jp> wrote: > > dfaoptimize() is not set fast flag even if it is success, but it is wrong. > > If success, dfa matcher use

bug#24159: [PATCH] dfa: minor fix for whether dfa is fast or not

2016-08-05 Thread Norihiro Tanaka
dfaoptimize() does not set the fast flag even when it succeeds, which is wrong. When it succeeds, the dfa matcher uses the algorithm for single bytes, which is fast. I think this bug does not affect grep, but it will have an effect with the patch that I just sent to gawk.

bug#24009: [PATCH] grep: use fastmap in regex

2016-07-16 Thread Norihiro Tanaka
d be one. From 1337006597a7d7e14993af14e57d47d6b483fb0d Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Sun, 17 Jul 2016 01:25:18 +0900 Subject: [PATCH] grep: use fastmap in regex * src/dfasearch.c (GEAcompile): Use fastmap in regex. --- src/dfasearch.c |3

bug#24009: [PATCH] grep: use fastmap in regex

2016-07-16 Thread Norihiro Tanaka
real 0.46 user 0.38 sys 0.07 However, if grep uses fastmap, it fails the case-fold-titlecase test. This means that grep's behavior differs from sed and gawk, as they use fastmap, although it seems to be a bug in regex. From 1337006597a7d7e14993af14e57d47d6b483fb0d Mon Sep 17 00:00:00 2001 From: Norihiro

bug#23989: [PATCH] dfa: Reindent dfa.c

2016-07-15 Thread Norihiro Tanaka
On Fri, 15 Jul 2016 12:29:38 +0200 Paul Eggert wrote: > Thanks. I think the internal tabs are deliberate, so let's leave those alone. > (Admittedly the code is not consistent in this area.) I installed the other > white-space changes. Thanks. Although I also felt that

bug#23989: [PATCH] dfa: Reindent dfa.c

2016-07-14 Thread Norihiro Tanaka
Reindent this like: indent with indent --no-tabs -l79 -Tsize_t -Tbool -Twint_t -Tposition_set -Tmust dfa.c and adjust it. From 0f36f5c5072caafecf1c774fe60d2cc25ab849a9 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Fri, 15 Jul 2016 07:44:32 +0900 Subject: [PATC

bug#23983: [PATCH] grep: fix crash with a pattern of alternation of two same characters

2016-07-14 Thread Norihiro Tanaka
1 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Thu, 14 Jul 2016 23:45:45 +0900 Subject: [PATCH] grep: fix crash with a pattern of alternation of two same characters grep -F crashes with pattern as 0\n0. This bug is introduced in 966f6586fbce3081ce6e5e2f9b55301b0ec3d2b4. * src/kwset.c (m

bug#23932: dfa: use algorithm for single byte character to any single byte character in input text always

2016-07-10 Thread Norihiro Tanaka
05 [locale C (ref.)] $ time -p env LC_ALL=C src/grep .a.b in real 0.23 user 0.11 sys 0.09 $ time -p env LC_ALL=C src/grep '.\{41\}' in real 0.22 user 0.13 sys 0.06 From 3d0c130808c974f1271561c7433b2aa661c49507 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Sun, 10 Jul

bug#21486: [PATCH 3/3] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales

2016-07-09 Thread Norihiro Tanaka
I now rebased previous patch. From 3646ea4418e9dd63706f84f2da13ea0428d8ab75 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Sat, 12 Sep 2015 12:28:09 +0900 Subject: [PATCH] dfa: cache transition from a state with dot expression in non-UTF8 multibyte locales In no

bug#23752: [PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale

2016-06-12 Thread Norihiro Tanaka
.03 If a pattern has any multibyte character, grep -F is still slow. $ printf '\xb3\xa4\n' >>pat $ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in real 103.38 user 93.81 sys 2.46 From fe6fe68f0098704846da9e64f56073a5d5171ce5 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp>

bug#18800: dfa: prefer bool at DFA interfaces

2016-04-24 Thread Norihiro Tanaka
On Wed, 20 Apr 2016 23:57:46 -0700 Paul Eggert wrote: > In updating Bug#18000's patches to the current grep source, I couldn't build > with just the first patch installed, so I squashed the first two patches into > one. Also, I changed a few more 'int's into 'bool's and

bug#23234: unexpected results with charset handling in GNU grep 2.23

2016-04-09 Thread Norihiro Tanaka
On Wed, 6 Apr 2016 18:25:16 -0700 Paul Eggert wrote: > On 04/06/2016 04:15 PM, Eric Blake wrote: > > And yes, maybe we could change grep to print the "Binary file matches" > > message to stderr, but that in turn will probably break other scripts, > > and lead to even more

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-03-14 Thread Norihiro Tanaka
On Mon, 14 Mar 2016 14:31:50 +0800 JQK wrote: > # env time grep -w -f <(seq 20) <(shuf -i 1-20 -n 250) > : > 288.77user 64.23system 10:35.71elapsed 55%CPU (0avgtext+0avgdata > 3492784maxresident)k > 8967032inputs+0outputs (154389major+1493890minor)pagefaults 0swaps The

bug#22103: bug#20526: grep BUG: text file is detected as binary

2016-01-08 Thread Norihiro Tanaka
On Wed, 6 Jan 2016 09:57:46 -0800 Paul Eggert wrote: > On 01/06/2016 12:32 AM, Paul Eggert wrote: > > I installed the attached patch, which fixed this performance bug for me. > Whoops! I forgot to 'git add src/search.h' before committing. We also need > the attached

bug#20526: grep BUG: text file is detected as binary

2016-01-08 Thread Norihiro Tanaka
On Wed, 6 Jan 2016 09:57:46 -0800 Paul Eggert wrote: > On 01/06/2016 12:32 AM, Paul Eggert wrote: > > I installed the attached patch, which fixed this performance bug for me. > Whoops! I forgot to 'git add src/search.h' before committing. We also need > the attached

bug#20526: grep BUG: text file is detected as binary

2016-01-02 Thread Norihiro Tanaka
bug I recently > introduced here. Thanks, I see that it is good idea, but I propose minor change for your fix. Perhaps, it will be what you want. From d36cf4208363c0f56ff32d38a9fea422342036fe Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <nori...@kcn.ne.jp> Date: Sat, 2 Jan 2016 00:20:43 +09
