[PATCH 0/10] search: more mairix prefix compatibility

2016-09-08 Thread Eric Wong
This brings us closer to the behavior of mairix(1) for search by supporting n:, t:, c:, f:, tc:, tcf:, n:, b:, and bs: prefixes as documented in the mairix(1) manpage. We also introduce the use of q: and nq: prefixes for quoted and non-quoted text, respectively. There is a schema version change i

[PATCH 07/10] search: fix compatibility with Debian wheezy

2016-09-08 Thread Eric Wong
Specifying the "d:" field only worked for NumberValueRangeProcessor in older versions of Xapian, such as the one in Debian wheezy (libsearch-xapian-perl=1.2.10.0-1). This slipped through since I rarely use wheezy anymore, and perhaps nobody else does, either. Perhaps wheezy support may be dropped

[PATCH 02/10] search: drop longer subject: prefix for search

2016-09-08 Thread Eric Wong
We only document the "s:" anyways. While the long name is more descriptive, the ambiguity makes agnostic caching (by Varnish or similar) slightly harder and longer URLs are more likely to be accidentally truncated when shared. --- lib/PublicInbox/Search.pm | 1 - t/search.t| 14 +

[PATCH 08/10] search: avoid mindlessly calling body_set

2016-09-08 Thread Eric Wong
It's not worth entering a complex codepath in Email::MIME to save some (probably immeasurable amount of) memory, here. We've already stopped doing this in our WWW code a while back, too. If we really cared enough about it, we'd prioritize work on a streaming replacement for Email::MIME. --- lib/P

[PATCH 03/10] search: more granular message body searching

2016-09-08 Thread Eric Wong
"bs:" and "b:" are adapted from mairix(1) We will also support searching explicitly for quoted vs non-quoted text via "q:" and "nq:" prefixes since sometimes readers will not care for quoted text. In the future, we will support parsing diffs (perhaps when repobrowse integration is complete). Not

[PATCH 10/10] search: index attachment filenames

2016-09-08 Thread Eric Wong
And while we're at it, ensure searching inside displayable attachment bodies works. --- lib/PublicInbox/Search.pm| 3 ++- lib/PublicInbox/SearchIdx.pm | 4 t/search.t | 44 3 files changed, 50 insertions(+), 1 deletion(-) d

[PATCH 09/10] search: match the behavior of WWW for indexing text

2016-09-08 Thread Eric Wong
The basic rule is that if it is displayable via our WWW interface, it should be indexable text for Xapian search. --- lib/PublicInbox/SearchIdx.pm | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm

[PATCH 04/10] search: fix space regressions from recent changes

2016-09-08 Thread Eric Wong
As of Xapian 1.0.4 (from 2007), it is possible to use Search::Xapian::QueryParser::add_prefix multiple times with the same user field name but different term prefixes. This brings my current git@vger mirror from 6.5GB to 2.1GB (both sizes are after xapian-compact). --- lib/PublicInbox/Search.pm|
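
The idea of one user field name backed by several term prefixes can be pictured with a small Python sketch. This is an illustration only, not the Perl code from the patch: the prefix strings (`XNQ`, `XQUOT`), the mapping, and the helper name `expand_term` are all made up here. The assumption is that each piece of text is indexed once under its own prefix and a broader field is resolved at query time, which would explain the smaller index.

```python
# Hypothetical mapping: one user-facing field name -> several term prefixes.
# The prefix strings are invented for this sketch.
USER_PREFIXES = {
    "b": ["XNQ", "XQUOT"],  # whole body = non-quoted + quoted text
    "nq": ["XNQ"],          # non-quoted text only
    "q": ["XQUOT"],         # quoted text only
}


def expand_term(field, word):
    """Return every prefixed index term that a query like 'b:word' may match."""
    return [prefix + word.lower() for prefix in USER_PREFIXES[field]]
```

With this layout, a "b:" query is an OR over the terms returned for "nq:" and "q:", so the body text does not need a third copy under its own prefix.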

[PATCH 06/10] search: increase term positions for each quoted hunk

2016-09-08 Thread Eric Wong
We pay a storage cost for storing positional information in Xapian, so make good use of it by attempting to preserve it for (hopefully) better search results. --- lib/PublicInbox/SearchIdx.pm | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox
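
One plausible reading of "increase term positions for each quoted hunk" is sketched below in Python. It is a toy model, not Xapian's API and not the patch itself: the function name `index_hunks` and the `gap` parameter are assumptions for illustration. Each word gets an increasing position, and the counter is bumped by an extra gap between hunks so that words from neighbouring hunks are never adjacent in the index.

```python
def index_hunks(hunks, gap=1):
    """Assign a position to every word, leaving `gap` unused positions between hunks."""
    postings = []
    pos = 0
    for hunk in hunks:
        for word in hunk.split():
            pos += 1
            postings.append((word.lower(), pos))
        # Skip ahead so the next hunk does not start right after this one.
        pos += gap
    return postings
```

In this model a phrase query, which needs consecutive positions, could match inside one hunk but not across the boundary between a quoted hunk and the reply text that follows it.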

[PATCH 05/10] search: match quote detection behavior of view

2016-09-08 Thread Eric Wong
This is stricter than the mutt quote_regexp default ("^([ \t]*[|>:}#])+" on Debian jessie), but matches what we have in View.pm. I prefer the stricter quote detection since it is less ambiguous and less likely to hide/obscure important details. --- lib/PublicInbox/SearchIdx.pm | 2 +- 1 file chan
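
To see why a looser pattern can misclassify lines, here is a short Python comparison. The first pattern is the mutt default quoted in the message above. The second is only a stand-in for "stricter": the actual expression in View.pm is not shown in this excerpt, so `STRICT_QUOTE` is an assumption made for this sketch.

```python
import re

# mutt's quote_regexp default on Debian jessie, as quoted in the message.
MUTT_QUOTE = re.compile(r"^([ \t]*[|>:}#])+")

# Assumption for illustration only: treat a line as quoted when it
# starts with '>' in the first column. Not the real View.pm pattern.
STRICT_QUOTE = re.compile(r"^>")


def is_quoted(line, pattern):
    """Return True if `pattern` classifies `line` as quoted text."""
    return pattern.match(line) is not None
```

Under the mutt default, a line such as "# not a quote" or ": shell no-op" counts as quoted, while the stricter stand-in only accepts lines that begin with ">".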

[PATCH 01/10] search: allow searching user fields (To/Cc/From)

2016-09-08 Thread Eric Wong
Sometimes it can be useful to search based on who the message was sent to, sent by, or Cc:-ed. Of course, headers can be faked, but they usually are not... Anyways this mostly matches the behavior of mairix(1). --- lib/PublicInbox/Search.pm| 10 +++- lib/PublicInbox/SearchIdx.pm | 59 +++

[PATCH 2/2] import: run "git gc --auto" when done

2016-09-08 Thread Eric Wong
We need to prevent excessive repository growth for public-inbox-watch and public-inbox-mda users. --- lib/PublicInbox/Import.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm index 083fb1b..611f7b1 100644 --- a/lib/PublicInbox/Import.pm ++
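
The pattern described here, a shared "run or die" helper followed by a garbage-collection call once the import is done, might look roughly like this in Python. This is a sketch under assumptions, not a translation of Import.pm: `finish_import` and its `git_dir` argument are hypothetical names, and only `run_die` echoes a name from the patch series.

```python
import subprocess


def run_die(cmd, cwd=None):
    """Run a command and raise if it exits with a non-zero status."""
    result = subprocess.run(cmd, cwd=cwd)
    if result.returncode != 0:
        raise RuntimeError(f"{cmd!r} failed with exit code {result.returncode}")


def finish_import(git_dir):
    """Hypothetical hook called once all messages have been imported."""
    # "git gc --auto" only repacks when git decides the repository needs it,
    # so calling it after every import is cheap in the common case.
    run_die(["git", f"--git-dir={git_dir}", "gc", "--auto"])
```

Because `--auto` lets git decide whether any work is needed, the call can be made unconditionally after each import run.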

[PATCH 1/2] import: hoist out common run_die subroutine

2016-09-08 Thread Eric Wong
We will be reusing this in the next commit, too. --- lib/PublicInbox/Import.pm | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm index 09dd38d..083fb1b 100644 --- a/lib/PublicInbox/Import.pm +++ b/lib/P

[PATCH 0/2] import: run "git gc --auto"

2016-09-08 Thread Eric Wong
This change is way overdue :x Better late than never, I guess. Eric Wong (2): import: hoist out common run_die subroutine import: run "git gc --auto" when done lib/PublicInbox/Import.pm | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-)

[PATCH] doc: document PERL_INLINE_DIRECTORY usage

2016-09-08 Thread Eric Wong
For now, we will document this since it allows better performance without the burden of extensions. Perhaps one day far in the future Perl can natively support vfork(2) AND that version of Perl will be widely available, but I suspect that day is at least a decade away, if not two: https:/

[PATCH] import: hoist out _check_path function

2016-09-08 Thread Eric Wong
This reduces duplication, slightly. We may be using it yet again in a to-be-introduced function (or we may not introduce it). --- lib/PublicInbox/Import.pm | 37 ++--- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/lib/PublicInbox/Import.pm b/lib/P

[PATCH] view: handle missing Content-Type in message

2016-09-08 Thread Eric Wong
Email::MIME internally assumes "text/plain" for messages missing a Content-Type, but does not expose that in the Email::MIME::content_type API method. We must assume it ourselves to avoid uninitialized value warnings for the rare (nowadays) MUAs which do not set it. --- lib/PublicInbox/View.pm |
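
The same defensive default can be shown with Python's standard `email` package. This is an analogy, not the Perl fix: the helper name `effective_content_type` is made up for this sketch, and it simply falls back to "text/plain" when the header is absent instead of passing an undefined value along.

```python
from email import message_from_string


def effective_content_type(msg):
    """Return the declared media type, or 'text/plain' if the header is missing."""
    declared = msg.get("Content-Type")
    if declared is None:
        # No header at all: assume plain text rather than propagate None.
        return "text/plain"
    # Drop parameters such as "; charset=utf-8" and normalize case.
    return declared.split(";", 1)[0].strip().lower()
```

For example, `effective_content_type(message_from_string("Subject: hi\n\nbody\n"))` yields "text/plain" even though the message carries no Content-Type header.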