On Sun, Nov 18, 2018 at 02:14:07AM +0100, Ingo Schwarze wrote:
>
> currently, when you call apropos(1) in the default mode without
> explicitly specifying '=' for substring search or '~' for regular
> expression search, page names and one-line descriptions are
> searched case-insensitively for the substring specified.
>
> It appears that traditionally, FreeBSD apropos used to treat
> the argument as a regular expression in this mode, and so does
> the apropos contained in the man-db package which is common on
> Linux; see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223556
>
> Yuri Pankov suggests an "#ifdef __FreeBSD__" stunt in portable
> mandoc, but i think switching to regular expressions by default
> would be beneficial for OpenBSD as well: more powerful, and closer
> to what other systems do.
>
> It is quite rare that one wants to search for words including regular
> expression special characters. After the change, it will still be
> possible by either escaping them, as in
>
> $ apropos 'c\+\+'
> $ apropos '\|x\|' # yields trunc(3)
> $ apropos '\$\[' # yields arybase(3p)
>
> or by explicitly requesting substring search with the already
> existing and already documented '=' operator, as in
>
> $ apropos =c++
> $ apropos '=|x|'
> $ apropos =$[
>
> Any concerns about committing the patch below?
>
> Note that i am *not* proposing to change the behaviour with respect
> to case sensitivity. Default behaviour will remain case insensitive,
> substring search will remain always case insensitive. The
> explicit '~' operator will remain case-sensitive, unless the
> already existing and documented option -i is specified.

Unsure about the full implications of breaking backwards compatibility
for the interpretation of special characters, but for the typical usage
and the vast majority of manpage titles I think this makes the default
behavior more powerful without laying a minefield of "gotchas" for the
user. This is nice.

I mean, I guess there's c++(1)/g++(1). Currently "apropos c++" just
finds what you're looking for instead of complaining about RE syntax
like this:

$ apropos c++
apropos: regcomp /c++/: repetition-operator operand invalid
apropos: ignoring trailing

... but that's the best annoying breakage I've got.
ok cheloha@, with one pseudo-nit in-line.

P.S. I had never looked into it before, but this is the behavior
specified for man(1)'s '-k' option since at least SUSv2. That is,
arguments to "man -k" should, according to the spec, be interpreted
as case-insensitive extended regular expressions and not merely
string literals.

So, as 'man -k' is just apropos(1), this change would make man(1)
more compliant with POSIX.1-2008, which we claim now in man.1 with
its current (apparently non-compliant?) behavior anyway.

Unclear if this is accidental or what.
> Index: apropos.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/apropos.1,v
> retrieving revision 1.39
> diff -u -p -r1.39 apropos.1
> --- apropos.1 23 Feb 2018 18:53:49 -0000 1.39
> +++ apropos.1 18 Nov 2018 00:33:47 -0000
> @@ -51,8 +51,7 @@ searches for
> .Xr makewhatis 8
> databases in the default paths stipulated by
> .Xr man 1
> -and uses case-insensitive substring matching
> -.Pq the Cm = No operator
> +and uses case-insensitive regular expression matching

You could specify that these are extended, i.e. not basic, regular
expressions. I always appreciate when it's spelled out, but my
guess is that most people assume EREs when it isn't specified.
Up to you.
> over manual names and descriptions
> .Pq the Li \&Nm No and Li \&Nd No macro keys .
> Multiple terms imply pairwise
> Index: mansearch.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/mansearch.c,v
> retrieving revision 1.60
> diff -u -p -r1.60 mansearch.c
> --- mansearch.c 22 Aug 2017 17:50:02 -0000 1.60
> +++ mansearch.c 18 Nov 2018 00:33:47 -0000
> @@ -764,8 +764,9 @@ exprterm(const struct mansearch *search,
> cs = 0;
> } else if ((val = strpbrk(argv[*argi], "=~")) == NULL) {
> e->bits = TYPE_Nm | TYPE_Nd;
> - e->match.type = DBM_SUB;
> - e->match.str = argv[*argi];
> + e->match.type = DBM_REGEX;
> + val = argv[*argi];
> + cs = 0;
> } else {
> if (val == argv[*argi])
> e->bits = TYPE_Nm | TYPE_Nd;
>