On Mon, 30 Aug 2021 20:30:36 +0800 (CST)
yangzhuangzhuang wrote:
> The function dfasupported is referenced in the commit below, but its
> definition cannot be found.
> commit: https://git.savannah.gnu.org/cgit/grep.git/commit/src?id=ae65513edc80a1b65f19264b9bed95d870602967
dfasupported
On Mon, 4 Jan 2021 09:55:48 -0800
Jim Meyering wrote:
> tags 45432 moreinfo
> stop
>
> On Fri, Dec 25, 2020 at 8:57 AM Fred .Flintstone wrote:
> > It seems --exclude does nothing when --include is used. It would be useful
> > to be able to use both together, in order to do things such as
On Sat, 5 Dec 2020 10:06:27 -0800
Jim Meyering wrote:
> Thank you for that patch. Can you say a little more about the domain
> of the problem?
> I.e., is it specific to invocations with "-w"?
> Can you provide an example that exhibits the performance improvement,
> with timings?
The test case
compared to version 3.3, and that can be remedied.
It converts to grep only if the potential match does not match the word
frequently.
From 1bfcdca658bd91dd6b8e6e3a96c9e77678bb4d2e Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Thu, 3 Dec 2020 17:22:50 +0900
Subject: [PATCH] grep: improvemen
On Sun, 1 Nov 2020 11:39:55 -0800
Jim Meyering wrote:
> We must accept the fact that extreme regular expressions will cause
> resource exhaustion like that when processed by classical regex_*
> functions. This is yet another good reason to prefer PCRE and to use
> grep's -P option. In that
Hi,
By the way, I was wondering whether to add the test to ere.tests or
spencer1.tests or to a new file. How should they be used properly?
lar nodes in series are not merged.
From 88bad5597445650f4e1bca663a82d4e4d14c93f3 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Sun, 1 Nov 2020 16:31:38 +0900
Subject: [PATCH] dfa: remain similar nodes in series in optimization
DFA was merging similar nodes incorrectly, for example treating a+a+a as a+a.
Now similar nodes in series are
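To see why a+a+a must not be collapsed into a+a, here is a minimal sketch
using the POSIX regcomp()/regexec() interface rather than dfa.c itself. The
helper name matches_whole_line is hypothetical and only for illustration; it
anchors the expression so it has to cover the whole text.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not part of grep or gnulib): return true if the
   whole of TEXT matches the POSIX extended regular expression ERE.  */
bool
matches_whole_line (const char *ere, const char *text)
{
  char anchored[256];
  regex_t re;
  bool ok;

  assert (ere != NULL && text != NULL);

  /* Anchor the expression so that it must cover the entire text.  */
  int n = snprintf (anchored, sizeof anchored, "^(%s)$", ere);
  assert (n > 0 && (size_t) n < sizeof anchored);

  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  ok = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

With this helper, "aa" is accepted by a+a but rejected by a+a+a, which needs
at least three characters, so the two expressions are not interchangeable.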
On Fri, 9 Oct 2020 12:53:47 +0300
Shlomi Fish wrote:
> Hi Norihiro Tanaka!
>
> On Thu, 08 Oct 2020 18:55:50 +0900
> Norihiro Tanaka wrote:
Thanks, not 'unusable' but 'unused' is right.
From 4d91494963ab1645417682af548d162021607f40 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
On Sat, 26 Sep 2020 18:12:37 -0700
Paul Eggert wrote:
> The patch should be harmless (though this fact isn't trivial) and I can
> see it being useful for plausible future performance improvements, so it
> would make sense to install it after the next release.
No longer need the patch.
This
Hi,
The member 'first_end' in struct dfa is now unused.
It should be removed.
Thanks,
Norihiro
From ce3f6337b651128d405137a58656e623579cf17d Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Sat, 26 Sep 2020 09:50:01 +0900
Subject: [PATCH] dfa: remove unused the member of structure
* lib
I attach the fix for the bug. Regex was fixed by Paul, thank you.
From 884c46aadbe6a2f7203f84d4173a515ca4ccf8de Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Thu, 24 Sep 2020 10:39:46 +0900
Subject: [PATCH] grep: fix ignore-case Turkish bug
* src/grep.c (fgrep_icase_charlen): Do
In the Turkish locale, upper and lower case are mapped as follows.
U0049 <-> U0131
U0069 <-> U0130
Both of the following test cases are expected to return U0130, but the
latter returns nothing.
$ printf '\304\260\n' >I # U0130
$ env LC_ALL=tr_TR.utf8 grep -i i I
? # U0130
$ env LC_ALL=tr_TR.utf8
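The mapping above can be written down as a small table-driven sketch. This
is only an illustration of the four code points listed, not the code in
src/grep.c; the function names are hypothetical and every other code point
is returned unchanged, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: lower-case mapping for the Turkish 'i' pairs.  */
uint32_t
turkish_tolower (uint32_t c)
{
  switch (c)
    {
    case 0x0049: return 0x0131;   /* I -> dotless i */
    case 0x0130: return 0x0069;   /* dotted capital I -> i */
    default:     return c;        /* assumption: everything else unchanged */
    }
}

/* Hypothetical helper: upper-case mapping for the Turkish 'i' pairs.  */
uint32_t
turkish_toupper (uint32_t c)
{
  switch (c)
    {
    case 0x0131: return 0x0049;   /* dotless i -> I */
    case 0x0069: return 0x0130;   /* i -> dotted capital I */
    default:     return c;        /* assumption: everything else unchanged */
    }
}

/* Compare two code points ignoring case under the mapping above.  */
bool
turkish_icase_equal (uint32_t a, uint32_t b)
{
  assert (a <= 0x10FFFF && b <= 0x10FFFF);
  return turkish_tolower (a) == turkish_tolower (b);
}
```

Under this mapping 'i' (U0069) and U0130 compare equal, while 'i' and 'I'
(U0049) do not, which is why a pattern of i is expected to match U0130.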
409624 in Fcompile (pattern=0x23c1240 "i\n", size=1, ignored=0,
exact=true) at kwsearch.c:56
#4 0x00409378 in main (argc=4, argv=0x7ffe76048388) at grep.c:2977
From 6118c3ee14c6131ec544244b1fabf05c3a913bd6 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Wed, 23 Sep 2020 07:33:32 +0900
On Tue, 22 Sep 2020 08:50:03 -0700
Jim Meyering wrote:
> On Tue, Sep 22, 2020 at 7:54 AM Norihiro Tanaka wrote:
> > On Mon, 21 Sep 2020 17:33:25 -0700
> > Jim Meyering wrote:
> ...
> > > Here are the two patches (tested on top of a third that updates to
> > >
On Mon, 21 Sep 2020 17:33:25 -0700
Jim Meyering wrote:
> On Sun, Sep 20, 2020 at 6:34 PM Jim Meyering wrote:
> >
> > On Sun, Sep 20, 2020 at 12:17 AM Norihiro Tanaka wrote:
> > > Hi,
> > > Performance for the following case is fixed in bug#43040.
> > >
ecute (void *, char const *, size_t, size_t *, char const *);
/* grep.c */
--
1.7.1
From ca0d0c9e79478df4645c15a5a885955d1c6221c9 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Sun, 20 Sep 2020 16:00:04 +0900
Subject: [PATCH] dfa: change dfasupported() to global function
* lib/dfa.c
On Wed, 3 Jun 2020 20:26:41 -0700
Andi Kleen wrote:
>
> % grep --version
> grep (GNU grep) 3.4
> ...
> % echo -n > foo
> % grep -v foo foo ; echo $?
> 1
>
> Would expect it to exit with zero in this case, since foo is not in the
> file.
>
> When the file is one byte it works as expected:
>
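The status in the report follows from how the exit status is defined: it is
0 when at least one line is selected and 1 when none is, and an empty file
has no lines to select, even with -v. A minimal sketch of that rule for a
fixed-string pattern, assuming the input is a NUL-terminated buffer (the
function name is hypothetical and this is not grep's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the exit status expected from
   'grep -v -F NEEDLE' on BUF: 0 if at least one line does not contain
   NEEDLE, 1 if no line is selected.  */
int
invert_match_status (const char *buf, const char *needle)
{
  size_t selected = 0;
  size_t nlen;

  assert (buf != NULL && needle != NULL);
  nlen = strlen (needle);

  while (*buf != '\0')
    {
      /* A line ends at the next newline or at the end of the buffer.  */
      size_t len = strcspn (buf, "\n");
      bool found = false;

      for (size_t i = 0; i + nlen <= len; i++)
        if (memcmp (buf + i, needle, nlen) == 0)
          {
            found = true;
            break;
          }

      /* With -v, the lines that do NOT contain the needle are selected.  */
      if (!found)
        selected++;

      buf += len;
      if (*buf == '\n')
        buf++;
    }

  return selected > 0 ? 0 : 1;
}
```

An empty buffer never enters the loop, so nothing is selected and the
status is 1; a one-byte buffer holds one line, which is selected.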
On Sun, 19 Apr 2020 07:41:49 +0900
Norihiro Tanaka wrote:
>
> On Sat, 18 Apr 2020 00:22:26 +0900
> Norihiro Tanaka wrote:
>
> >
> > On Fri, 17 Apr 2020 10:24:42 +0900
> > Norihiro Tanaka wrote:
> >
> > >
> > > On Fri, 1
On Sat, 18 Apr 2020 00:22:26 +0900
Norihiro Tanaka wrote:
>
> On Fri, 17 Apr 2020 10:24:42 +0900
> Norihiro Tanaka wrote:
>
> >
> > On Fri, 17 Apr 2020 09:35:36 +0900
> > Norihiro Tanaka wrote:
> >
> > >
> > > On Th
On Fri, 17 Apr 2020 10:24:42 +0900
Norihiro Tanaka wrote:
>
> On Fri, 17 Apr 2020 09:35:36 +0900
> Norihiro Tanaka wrote:
>
> >
> > On Thu, 16 Apr 2020 16:00:29 -0700
> > Paul Eggert wrote:
> >
> > > On 4/16/20 3:53 PM, Norihiro Tanaka wrot
On Fri, 17 Apr 2020 09:35:36 +0900
Norihiro Tanaka wrote:
>
> On Thu, 16 Apr 2020 16:00:29 -0700
> Paul Eggert wrote:
>
> > On 4/16/20 3:53 PM, Norihiro Tanaka wrote:
> >
> > > I have had no idea to solve the problem yet. If we revert it, bug#33357
> >
On Thu, 16 Apr 2020 16:00:29 -0700
Paul Eggert wrote:
> On 4/16/20 3:53 PM, Norihiro Tanaka wrote:
>
> > I have had no idea to solve the problem yet. If we revert it, bug#33357
> > will come back.
>
> Yes, I'd rather not revert if we can help it.
>
> My
On Thu, 16 Apr 2020 09:31:32 -0700
Paul Eggert wrote:
> On 4/15/20 11:56 PM, Norihiro Tanaka wrote:
>
> > It seems that a lot of time is spent in dfa.c:replace().
> > It was added at d6df3873c7abc243683d0e8fccbfde4e76f23e53 in gnulib.
>
> It would be pretty dras
+ grep-2.2/src/grep -E -v -m1 -f grep-patterns.txt /dev/null
grep-2.2/src/grep: invalid option -- 'm'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
real 0.00
user 0.00
sys 0.00
+ grep-2.3/src/grep -E -v -m1 -f grep-patterns.txt /dev/null
grep-2.3/src/grep:
On Sun, 22 Dec 2019 16:57:12 -0800
Paul Eggert wrote:
> On 11/3/18 9:25 PM, Norihiro Tanaka wrote:
>
> > $ seq -f '%040g' 0 | sed '1s/$/\\(0\\)\\1/' >pat
>
> Thanks for the test case and sorry about the delay. And thanks for spotting
> the
> speedup
On Wed, 18 Dec 2019 18:55:01 -0800
Jim Meyering wrote:
> On Tue, Nov 26, 2019 at 2:38 PM Norihiro Tanaka wrote:
> > On Sun, 13 Jan 2019 08:45:47 +0900
> > Norihiro Tanaka wrote:
> > > grep uses KWset matcher for multiple word matching. It is very slow when
> >
On Sun, 13 Jan 2019 08:45:47 +0900
Norihiro Tanaka wrote:
> Hi,
>
> grep uses KWset matcher for multiple word matching. It is very slow when
> most of the parts matched to a pattern are not words. So, if a part firstly
> matched to pattern is not a word, use the grep matcher t
On Sat, 16 Nov 2019 22:45:56 -0800
Jim Meyering wrote:
> On Sat, Nov 16, 2019 at 8:36 PM Jim Meyering wrote:
> > On Sat, Nov 16, 2019 at 4:02 PM Norihiro Tanaka wrote:
> > > On Sat, 16 Nov 2019 11:00:38 -0800
> > > Jim Meyering wrote:
> > >
>
hed, I found an extreme slowdown.
yes $(printf %040d 0) | head -100 >k
time -p env LC_ALL=ja_JP.eucjp src/grep -F -w 0 k
First patch fixes it, and second improves performance more.
From 0202a83b3d0de224a5d606958e3719244d546548 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Sun, 17
On Tue, 15 Oct 2019 12:48:17 +1100
"Trent W. Buck" wrote:
> Package: grep
> Version: 3.3-1
> Severity: wishlist
>
> This bug was originally reported as
> https://bugs.debian.org/940464
>
> Trent W. Buck wrote:
> > (Surely someone has already asked for this, but I can't see where.
> > I may
On Sat, 23 Mar 2019 08:06:35 +0900
Norihiro Tanaka wrote:
> A kwset matcher is not built in a grep matcher after token re-order is
> introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98 in dfa.
> It caused performance degradation in some typical cases. This bug is
> introd
From fca6a4c3b9e0757637b7a2009ca8b9070a6874f5 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Sat, 23 Mar 2019 07:18:37 +0900
Subject: [PATCH] dfa: separate parse and compile phase
DFAMUST() must be called after parse and before tokens re-order which is
introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0
Hi,
I pulled the current master of grep from the git repository, built it on
Fedora 29, and received the following error.
When there is no pcre library, DIE() is called in Pcompile and Pexecute,
but the noreturn attribute is set on those functions.
Thanks,
Norihiro
$ make
..
depbase=`echo pcresearch.o
-p src/grep -wf pat inp
real 0.32
user 0.31
sys 0.00
Thanks,
Norihiro
From b4f07fa0288ad68932fc606ed760fd61db9df6d0 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Sun, 13 Jan 2019 07:53:32 +0900
Subject: [PATCH] grep: fix slow for multiple word matching
grep uses KWset matcher for multipl
On Sat, 3 Nov 2018 21:02:19 -0700
Paul Eggert wrote:
> Norihiro Tanaka wrote:
> > Even the pattern has no back-references, compilation by regex run for
> > each line. So Syntax errors will be detected as even your present.
>
> OK, but then I'm afraid I don't unde
On Sat, 3 Nov 2018 08:29:39 -0700
Paul Eggert wrote:
> Norihiro Tanaka wrote:
> > By this change, each fragment is divided into
> > groups by whether the fragment includes back reference in a pattern or
> > not. A fragment which includes a back reference constitutes a group,
17 00:00:00 2001
From: Norihiro Tanaka
Date: Sat, 3 Nov 2018 18:56:18 +0900
Subject: [PATCH] grep: grouping of a pattern with multiple lines
When grep uses regex, it splits a pattern with multiple lines by newline
character into fragments. Compilation and execution run for each
fragment.
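As a rough illustration of the fragment handling described in this commit
message, here is a sketch built on POSIX regcomp()/regexec(). It is not the
code in src/dfasearch.c; the function name is hypothetical, each fragment is
compiled as a basic regular expression, and a fragment that fails to compile
is simply skipped, which is a simplification.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split PATTERN at newlines and return true if
   LINE matches at least one of the resulting fragments.  */
bool
line_matches_any_fragment (const char *pattern, const char *line)
{
  const char *p = pattern;
  bool matched = false;

  assert (pattern != NULL && line != NULL);

  while (!matched)
    {
      /* The current fragment runs up to the next newline or the end.  */
      size_t len = strcspn (p, "\n");
      char *fragment = malloc (len + 1);
      regex_t re;

      if (fragment == NULL)
        return false;
      memcpy (fragment, p, len);
      fragment[len] = '\0';

      /* Compile and execute this fragment on its own.  */
      if (regcomp (&re, fragment, REG_NOSUB) == 0)
        {
          matched = regexec (&re, line, 0, NULL, 0) == 0;
          regfree (&re);
        }
      free (fragment);

      if (p[len] == '\0')
        break;                  /* that was the last fragment */
      p += len + 1;             /* skip the newline separator */
    }

  return matched;
}
```

Every fragment costs one compilation and, per input line, up to one
execution, so the amount of work grows with the number of fragments.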
env LC_ALL=C src/grep -vf in in
real 39.20
user 20.35
sys 18.78
(After)
$ time -p env LC_ALL=C src/grep -vf in in
real 6.87
user 6.38
sys 0.48
Thanks,
Norihiro
From 65f156cd0e605c11a40877d8c070a185def699e5 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Mon, 22 Oct 2018 23:22:40 +0900
Subj
On Tue, 18 Sep 2018 22:13:38 -0700
Jim Meyering wrote:
> Also, when I compared grep compiled at
> 123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your
> changes) and that compiled at the prior commit,
> 9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version
> takes
Paul Eggert wrote:
> Thanks for the patch. A quick question: what does the identifier
> "dfautf8noss" stand for? I couldn't figure it out.
It means "No use superset for utf8".
I thought of various things for the name of the function, but I could
not think of a good name.
From 3193191730d6ecb3a0c4e38b461484deaf819f87 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Mon, 17 Sep 2018 22:20:37 +0900
Subject: [PATCH 1/2] dfa: simplify initial state
Simplifying the initial state makes the NFA easier to optimize.
dfa.c (enum token): Add new element BEG.
(prtok): Adjust due to add
On Wed, 13 Dec 2017 16:03:57 -0800
Paul Eggert <egg...@cs.ucla.edu> wrote:
> On 12/13/2017 03:25 PM, Norihiro Tanaka wrote:
> > I don't think that's the problem. The user passes the output of grep to wc -l,
> > so `Binary file ... matches' line is also counted by `wc' as one
On Tue, 12 Dec 2017 16:28:09 -0800
Paul Eggert <egg...@cs.ucla.edu> wrote:
> On 12/11/2017 03:36 PM, Norihiro Tanaka wrote:
> > Perhaps Tieliikenne 5.0.csv and volvot.csv include characters that
> > cannot be recognized in your locale.
>
>
On Mon, 11 Dec 2017 23:45:25 +0200
pg wrote:
> $ awk '/Volvo/' Tieliikenne5.0.csv | wc -l
> 266175
> $ grep Volvo Tieliikenne5.0.csv | wc -l
> 1638
> $ awk '/N3/' volvot.csv | wc -l
> 17822
> $ grep N3 volvot.csv | wc -l
> 1701
Perhaps, characters not to be able to
On Mon, 8 May 2017 16:56:31 +0900
Masataka Kawasaki wrote:
> I found a bug on grep 3.0 on 64bit cygwin.
> It seems that '\/' before '$' causes problems.
>
> grep 2.25(correct)
> >echo rr/| grep '^.*\/$'
> rr/
> >echo rr/| gawk '/^.*\/$/'
> rr/
> >echo rr/|
On Sat, 21 Jan 2017 08:09:00 -0800
Jim Meyering wrote:
> Nice. I am glad you caught that.
> I've adjusted some wording and will push this soon:
Thanks for replying and adjusting quickly. Your adjustment is also very
useful for me to learn English.
grep -Fo may not match the longest pattern in grep 2.26 or later, including
current master.
$ printf 'abce\n' > in
$ printf 'abcd\nc\nbce\n' > pat
$ LC_ALL=C src/grep -Fof pat in
c
We expect "bce" in this case.
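The expected result comes from the leftmost-longest rule: the match that
starts earliest wins, and among matches starting at the same offset the
longest one wins. A brute-force sketch of that rule for fixed strings is
below. It is not the kwset matcher, and the function name and signature are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the leftmost match of any of the NPATTERNS
   fixed strings in TEXT, preferring the longest pattern when several
   match at the same offset.  Return the index of the chosen pattern and
   store the match offset in *START, or return -1 if nothing matches.  */
int
leftmost_longest (const char *text, const char *const *patterns,
                  size_t npatterns, size_t *start)
{
  size_t tlen;

  assert (text != NULL && patterns != NULL && start != NULL);
  tlen = strlen (text);

  for (size_t pos = 0; pos < tlen; pos++)
    {
      int best = -1;
      size_t best_len = 0;

      for (size_t i = 0; i < npatterns; i++)
        {
          size_t plen = strlen (patterns[i]);

          /* Keep the longest non-empty pattern that matches at POS.  */
          if (plen > best_len && plen <= tlen - pos
              && memcmp (text + pos, patterns[i], plen) == 0)
            {
              best = (int) i;
              best_len = plen;
            }
        }

      if (best >= 0)
        {
          *start = pos;
          return best;
        }
    }

  return -1;
}
```

For the text abce and the patterns abcd, c and bce, nothing matches at
offset 0, bce matches at offset 1, and c would only match at offset 2, so
bce is the leftmost match.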
From 2e75efbf90869abfeafc0ab9fcd4fa4b453c0b2a Mon Sep 17 00:00:00 200
(main.c:459)
>
> There may be other paths as well.
>
> Can y'all track this down and fix?
>
> Thanks,
>
> Arnold
Thanks for the report. It is caused by temporarily allocated memory not
being freed.
From 3479bce8542f75c11e6b0b9907e22b26d91865ca Mon Sep 17 00:00:00 2001
From: Norihiro Ta
On Tue, 27 Dec 2016 22:37:25 -0800
Paul Eggert <egg...@cs.ucla.edu> wrote:
> Norihiro Tanaka wrote:
> > So I wrote the patch to use fgrep matcher for both.
>
> Thanks, I installed that after tweaking the commit message and omitting
> unnecessary parens.
Thanks, I confirmed it.
On Mon, 26 Dec 2016 12:07:49 -0800
Paul Eggert <egg...@cs.ucla.edu> wrote:
> Norihiro Tanaka wrote:
> > Hmm, how about the following test cases, although they are extreme?
>
> I don't think we need to worry about performance for the case when -w
> is given, an
On Fri, 23 Dec 2016 17:38:42 -0800
Paul Eggert wrote:
> No. Thanks, I hadn't considered that possibility. I looked into the
> slowdown and installed the attached patches, which cause 'grep' to
> run about as fast on this test case as grep 2.25 (though not as fast
> as grep
On Tue, 20 Dec 2016 21:17:01 -0800
Paul Eggert wrote:
> I installed the attached patches into grep master. These fix the
> performance regressions noted at the start of Bug#22357. I see that
> the related performance problems noted in Bug#21763 seem to be fixed
> too, I
On Mon, 19 Dec 2016 15:38:12 -0800
Paul Eggert wrote:
> but the old 'replace' called 'delete' up to N times,
Yes, but constraint == 0 rarely happens, so the "while" loop in delete()
is normally not entered.
> Anyway, I verified that the change improved performance on the
On Sun, 18 Dec 2016 23:48:10 -0800
Paul Eggert wrote:
> >> 'delete' is
> >> O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2).
> >> 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3).
> >> I haven't looked into how likely the worst-case
On Wed, 14 Dec 2016 17:19:27 -0800
Paul Eggert wrote:
> I was referring to code with his proposed patch installed. 'delete' is
> O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2).
> 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3).
> I
On Sun, 11 Dec 2016 05:28:56 -0600
Trevor Cordes wrote:
> On my box the above runs for >2m (never completes before I ^C) on the
> version **AFTER** the commits (v2.22). On the test build just *BEFORE*
> the commits (2.21.73-8058), it runs in <2s. So for me, I had a
/dev/null
Thanks,
Norihiro
From 19502d13120d612fc89b922c9b28cc3030ea0674 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sun, 11 Dec 2016 09:35:50 +0900
Subject: [PATCH] dfa: performance improvement for removal of epsilon closure
* lib/dfa.c (delete): Use binary search to find deleted index
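The idea in this ChangeLog line can be sketched as follows. This is not the
gnulib code: it assumes the elements are plain size_t indexes kept in
ascending order, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove VALUE from the ascending array ELEMS of
   *NELEM entries.  Return true and shrink *NELEM if VALUE was present.  */
bool
sorted_delete (size_t *elems, size_t *nelem, size_t value)
{
  size_t lo = 0;
  size_t hi;

  assert (elems != NULL && nelem != NULL);
  hi = *nelem;

  /* Binary search for the first element that is not less than VALUE.  */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (elems[mid] < value)
        lo = mid + 1;
      else
        hi = mid;
    }

  if (lo == *nelem || elems[lo] != value)
    return false;

  /* Close the gap left by the deleted element.  */
  memmove (elems + lo, elems + lo + 1, (*nelem - lo - 1) * sizeof *elems);
  --*nelem;
  return true;
}
```

The lookup takes O(log N) comparisons instead of a linear scan. Closing the
gap still moves up to N elements, so in this sketch only the search part of
the cost goes down.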
characters.
Thanks,
Norihiro
From 67484a67d7d310d76a2eb80b68a8ec8eb5c6a7fc Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Mon, 28 Nov 2016 22:26:07 +0900
Subject: [PATCH] dfa: avoid match middle in multibyte character
* lib/dfa.c (transit_state): If fails in matchin
On Tue, 15 Nov 2016 11:35:15 -0800
Jim Meyering wrote:
> I suppose you mean in addition to the S_ISFIFO test? That sounds good.
> We should retain the optimization when reading from stdin that is a
> non-pipe.
This can also happen in stdin. If we redirect stdout to
On Tue, 4 Oct 2016 15:38:00 +0800
Lam Bruce wrote:
> Dear Sir/Madam:
>
> I put all files in the attachment.
>
>cat datafile
>
> northwest NW Charles Main 3.0 .98 3 34
> western WE Sharon Gray 5.3 .97 5 23
> southwest SW Lewis Dalsass 2.7 .8 2 18
>
On Sun, 18 Sep 2016 10:25:29 -0700
Jim Meyering <j...@meyering.net> wrote:
> On Sun, Sep 18, 2016 at 2:27 AM, Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> > I wrote a test case, but did not add new entry. I want to add it to
> > news, as the bug is fixed after grep 2.
I wrote a test case, but did not add a new entry. I want to add it to
news, as the bug was fixed after the grep 2.25 release.
The bug was fixed by chance in commit
ad468bbe3df027f29ecb236283084fb60b734f68.
From c27a4ecadd867286730c6b5b96b8bb36dda138c4 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <n
On Thu, 8 Sep 2016 09:48:03 -0700
Paul Eggert wrote:
> I installed that on Savannah master grep.
Thanks.
> I assume this is something I messed up when merging the DFA changes? If so,
> sorry about that. It's too bad this part of the code can't be exercised by
>
On Fri, 2 Sep 2016 22:07:18 -0700
Paul Eggert <egg...@cs.ucla.edu> wrote:
> Norihiro Tanaka wrote:
> > You say we can simplify it with the changes for
> > multithreading, but two changes in the patch are needed.
>
> Thanks, I missed that. I installed your patch, al
On Fri, 2 Sep 2016 20:00:12 -0700
Paul Eggert <egg...@cs.ucla.edu> wrote:
> Norihiro Tanaka wrote:
>
> > It seems that you lost a part
> > of my proposition on rebase. If it is not intentional, would you review
> > the part again?
>
> Thanks for catc
On Fri, 2 Sep 2016 15:35:22 -0700
Paul Eggert <egg...@cs.ucla.edu> wrote:
> Norihiro Tanaka wrote:
> > However, the patch adds an argument to dfasyntax(). To synchronize
> > between grep and dfa easily, I expect it is applied before dfa is moved
> > to gnulib.
>
On Thu, 1 Sep 2016 09:50:11 -0700
Paul Eggert wrote:
> Suppose all the multibyte characters in the pattern are non-letters, so that
> case-folding does not affect them. Could grep -iF be fast in that case?
I prefer DFA matcher to KWset matcher due to low memory. grep -F
On Thu, 1 Sep 2016 22:32:12 -0700
Paul Eggert <egg...@cs.ucla.edu> wrote:
> Norihiro Tanaka wrote:
> > I think this patch should be suspended because of this issue.
> > I reported it to glibc developers.
> > https://sourceware.org/bugzilla/show_bug.cgi?id=2038
On Fri, 19 Aug 2016 18:03:19 -0500
Zev Weiss wrote:
> Okay -- so your question is about the necessity of making operations other
> than dfaexec() thread-safe? That's reasonable, though (obviously) I went
> ahead made the other operations thread-safe anyway.
>
> 1) It
On Sat, 20 Aug 2016 07:25:06 +0900
Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> Hi Zev,
>
> Thanks for replying. I mean the reverse.
>
> I believe that there is no problem if only dfaexec() is thread safe. In
> other words, I think that variables that we mus
On Fri, 19 Aug 2016 16:46:16 -0500
Zev Weiss wrote:
> I'm not sure I understand -- the first patch in my series just removes the
> global dfa variable and instead passes it as a parameter. This alone doesn't
> make the whole thing thread-safe, it's just a first step
On Thu, 18 Aug 2016 05:50:14 -0500
Zev Weiss wrote:
> * src/dfa.c: remove global dfa struct. A pointer to a struct dfa is
> instead added as a parameter to the functions that had been using the
> global.
Hi,
Why do we move the global variable DFA into struct dfa, although
On Thu, 18 Aug 2016 09:21:56 -0600
arn...@skeeve.com wrote:
> Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
>
> > The patch introduces regex's non-newline_anchor option to dfa. grep
> > always uses newline_anchor, so the newer code is not used. I expect it is used
> > by
On Thu, 18 Aug 2016 23:57:27 +0900
Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> The patch introduces regex's non-newline_anchor option to dfa. grep
> always uses newline_anchor, so the newer code is not used. I expect it is used
> by sed and gawk.
>
> However, the p
is moved
to gnulib.
From b31ebd2bb5aae54ba46ac3bc88161872b50f9513 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Thu, 11 Aug 2016 11:53:24 +0900
Subject: [PATCH 2/2] dfa: support not newline_anchor of regex
* src/dfa.c (char_context): Define context for not newline_
Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sun, 14 Aug 2016 11:21:48 +0900
Subject: [PATCH 1/2] dfa: simplify and optimize at initial state in execution
* src/dfa.c (skip_remains_mb): Remove argument *pwc. Update caller.
(dfaexec_main): Simplify and optimize at i
Now, state indexes for state 0 are 0 for CTX_NEWLINE context,
D->initstate_notbol for CTX_NONE context and D->min_trcount - 1 for
CTX_LETTER. The patch uses them instead of calling state_index().
From bb5fc2fa08e9f2b17d147c3649328254deb84166 Mon Sep 17 00:00:00 2001
From: Norihiro
On Tue, 16 Aug 2016 23:35:22 +0900
Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> I updated the patch due to change in bug#21486, and added a patch
> including a minor change.
I wrote third patch. After first patch, we do not have to separate next
state by context, transit_state()
On Tue, 16 Aug 2016 00:51:52 -0700
Paul Eggert wrote:
> Thanks for writing that patch. I installed it in grep master (after
> tweaking the commit message a bit) and am marking this bug report as
> done.
>
> I noticed what appears to be a problem in the patch, in the code:
>
On Fri, 5 Aug 2016 22:02:31 -0700
Jim Meyering wrote:
> I have examined the logs, which suggest it was a false positive in a
> parallelized "make check" run, due to that test's 3-second timeout. I
> have tried repeatedly to reproduce that failure, so far without
> success, but
On Fri, 5 Aug 2016 13:29:43 -0700
Jim Meyering <j...@meyering.net> wrote:
> On Fri, Aug 5, 2016 at 4:30 AM, Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> > dfaoptimize() does not set the fast flag even when it succeeds, which is wrong.
> > If success, dfa matcher use
dfaoptimize() does not set the fast flag even when it succeeds, which is wrong.
On success, the dfa matcher uses the single-byte algorithm, which is fast.
I think this bug does not affect grep, but it will have an effect with the
patch that I just sent to gawk.
d be one.
From 1337006597a7d7e14993af14e57d47d6b483fb0d Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sun, 17 Jul 2016 01:25:18 +0900
Subject: [PATCH] grep: use fastmap in regex
* src/dfasearch.c (GEAcompile): Use fastmap in regex.
---
src/dfasearch.c |3
real 0.46
user 0.38
sys 0.07
However, if grep uses fastmap, the case-fold-titlecase test fails. It
means that grep's behavior differs from sed and gawk, as they use fastmap,
although it seems to be a bug in regex.
From 1337006597a7d7e14993af14e57d47d6b483fb0d Mon Sep 17 00:00:00 2001
From: Norihiro
On Fri, 15 Jul 2016 12:29:38 +0200
Paul Eggert wrote:
> Thanks. I think the internal tabs are deliberate, so let's leave those alone.
> (Admittedly the code is not consistent in this area.) I installed the other
> white-space changes.
Thanks. Although I also felt that
Reindent this like:
indent with indent --no-tabs -l79 -Tsize_t -Tbool -Twint_t
-Tposition_set -Tmust dfa.c
and adjust it.
From 0f36f5c5072caafecf1c774fe60d2cc25ab849a9 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Fri, 15 Jul 2016 07:44:32 +0900
Subject: [PATC
1
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Thu, 14 Jul 2016 23:45:45 +0900
Subject: [PATCH] grep: fix crash with a pattern of alternation of two same
characters
grep -F crashes with a pattern such as 0\n0. This bug was introduced in
966f6586fbce3081ce6e5e2f9b55301b0ec3d2b4.
* src/kwset.c (m
05
[locale C (ref.)]
$ time -p env LC_ALL=C src/grep .a.b in
real 0.23
user 0.11
sys 0.09
$ time -p env LC_ALL=C src/grep '.\{41\}' in
real 0.22
user 0.13
sys 0.06
From 3d0c130808c974f1271561c7433b2aa661c49507 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sun, 10 Jul
I have now rebased the previous patch.
From 3646ea4418e9dd63706f84f2da13ea0428d8ab75 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sat, 12 Sep 2015 12:28:09 +0900
Subject: [PATCH] dfa: cache transition from a state with dot expression in
non-UTF8 multibyte locales
In no
.03
If a pattern has any multibyte character, grep -F is still slow.
$ printf '\xb3\xa4\n' >>pat
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 103.38
user 93.81
sys 2.46
From fe6fe68f0098704846da9e64f56073a5d5171ce5 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
On Wed, 20 Apr 2016 23:57:46 -0700
Paul Eggert wrote:
> In updating Bug#18000's patches to the current grep source, I couldn't build
> with just the first patch installed, so I squashed the first two patches into
> one. Also, I changed a few more 'int's into 'bool's and
On Wed, 6 Apr 2016 18:25:16 -0700
Paul Eggert wrote:
> On 04/06/2016 04:15 PM, Eric Blake wrote:
> > And yes, maybe we could change grep to print the "Binary file matches"
> > message to stderr, but that in turn will probably break other scripts,
> > and lead to even more
On Mon, 14 Mar 2016 14:31:50 +0800
JQK wrote:
> # env time grep -w -f <(seq 20) <(shuf -i 1-20 -n 250)
> :
> 288.77user 64.23system 10:35.71elapsed 55%CPU (0avgtext+0avgdata
> 3492784maxresident)k
> 8967032inputs+0outputs (154389major+1493890minor)pagefaults 0swaps
The
On Wed, 6 Jan 2016 09:57:46 -0800
Paul Eggert wrote:
> On 01/06/2016 12:32 AM, Paul Eggert wrote:
> > I installed the attached patch, which fixed this performance bug for me.
> Whoops! I forgot to 'git add src/search.h' before committing. We also need
> the attached
bug I recently
> introduced here.
Thanks, I see that it is a good idea, but I propose a minor change to your
fix. Perhaps it will be what you want.
From d36cf4208363c0f56ff32d38a9fea422342036fe Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sat, 2 Jan 2016 00:20:43 +09