Re: [PATCH] disallow NUL characters in Message-ID and List-Id

2023-11-27 Thread Eric W. Biederman
Eric Wong writes: > While MTAs seem to stop '\0' from appearing in headers, users > fetching archives via git remain susceptible to having '\0' land > in archives. So we'll filter them out of Xapian and SQLite DBs > to avoid interoperability problems with CLI tools since there's no > known message

Re: Read emails in the archive

2021-10-15 Thread Eric W. Biederman
Eric Wong writes: > Jεan Sacren wrote: >> public-inbox developers, >> >> I'm totally new to public-inbox. But I checked out the whole tree and >> built using the master branch[0]. >> >> If I execute this[1]: >> >> git clone --mirror http://lore.kernel.org/netdev/0 netdev/git/0.git >>

Could public-inbox do something helpful with .mailmap?

2020-08-17 Thread Eric W. Biederman
I just dug up some old emails and I got at least one person's current email address wrong because they have changed their email address frequently. They have an update to their preferred email address in the .mailmap in the linux-kernel source. Is there any chance public-inbox could look at .mai

Re: [PATCH] t/import: test for nasty characters

2020-07-04 Thread Eric W. Biederman
Eric Wong writes: > Eric Wong wrote: >> "Eric W. Biederman" wrote: >> > - $name =~ tr/<>//d; >> > + $name =~ tr/\n\r<>$/ /d; >> >> Is getting rid of '$' an effort to avoid double interpolation by Per

[PATCH] Import: Be more careful with names in email

2020-07-03 Thread Eric W. Biederman
n the future transform a few more characters into spaces, and don't use string interpolation, use comma separated variables instead. Signed-off-by: "Eric W. Biederman" --- I honestly don't know if I have closed all of the holes when implementing the code this way. But this chang

Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox

2020-05-16 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Eric Wong writes: >> > "Eric W. Biederman" wrote: >> >> > The email messages are placed without modification into the public >> >> > inbox repository so minimize changes

Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox

2020-05-16 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> ebied...@xmission.com (Eric W. Biederman) writes: >> >> > The command imap_fetch connects to the specified imap mailbox and >> > fetches any unfetched messages then waits with imap idle until there ar

Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox

2020-05-15 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes: > The command imap_fetch connects to the specified imap mailbox and > fetches any unfetched messages then waits with imap idle until there are > more messages to fetch. > > By default messages are placed in the specified public inbox

[PATCH] PublicInbox::Inbox.pm: Default unset address to a one element array

2020-05-15 Thread Eric W. Biederman
PublicInbox::Config.pm::_fill() assumes that address is an array. Therefore when handling an unset address use an array containing a single string, instead of a single string. Signed-off-by: "Eric W. Biederman" --- I accidentally created a public inbox without an address at some

[PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox

2020-05-15 Thread Eric W. Biederman
mirror so I don't want automation to accidentally cause something important to be lost. No email messages are deleted from the server instead IMAPTracker is used to remember which messages were downloaded. Signed-off-by: "Eric W. Biederman" --- scripts/i

[PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes

2020-05-15 Thread Eric W. Biederman
multiple IMAP mailboxes you will need multiple trackers. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/IMAPTracker.pm | 73 ++ 1 file changed, 73 insertions(+) create mode 100644 lib/PublicInbox/IMAPTracker.pm diff --git a/lib/PublicInbox/IMAPTr

Re: I have figured out IMAP IDLE

2020-05-14 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Eric Wong writes: >> > Is your stuff based on Mail::IMAPClient still working well? >> >> Yes. I have fixed a few things to make the code more robust. >> Which is mostly me learning how Mail::IMA

Re: I have figured out IMAP IDLE

2020-05-13 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> A few days ago I stumbled upon this magic decoder ring for IMAP. >> The "Ten Commandments of How to Write an IMAP client" >> >> https://www.washington.edu/imap/documentation/commndmt.txt

Re: I have figured out IMAP IDLE

2019-11-03 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> A few days ago I stumbled upon this magic decoder ring for IMAP. >> The "Ten Commandments of How to Write an IMAP client" >> >> https://www.washington.edu/imap/documentation/commndmt.tx

Re: RFC: monthly epochs for v2

2019-10-29 Thread Eric W. Biederman
Eric Wong writes: > Konstantin Ryabitsev wrote: >> On Fri, Oct 25, 2019 at 12:22:14PM +, Eric Wong wrote: >> > > I'm not sure about a libpublicinbox... I have been really >> > > hesitant to depend on shared C/C++ libraries whenever I use Perl >> > > or Ruby because of build and install compl

I have figured out IMAP IDLE

2019-10-29 Thread Eric W. Biederman
A few days ago I stumbled upon this magic decoder ring for IMAP. The "Ten Commandments of How to Write an IMAP client" https://www.washington.edu/imap/documentation/commndmt.txt The part I was most clearly missing was that for IMAP it is better to open multiple sockets (one per mail folder on t

Re: [PATCH 12/14] mda: support multiple List-ID matches

2019-10-28 Thread Eric W. Biederman
l whose List-ID won't be configured. > Cc: Eric W. Biederman > Link: https://public-inbox.org/meta/87pniltscf@x220.int.ebiederm.org/ > --- > lib/PublicInbox/MDA.pm| 19 +-- > script/public-inbox-learn | 5 +++-- > script/public-inbox-mda | 7 +

Re: what should happen when mda sees multiple List-ID headers?

2019-10-25 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> There are two reasonable things that can be done, and I suggest >> we do them both. >> - Print a warning. (To be deleted if this case turns out to be common). >> - Deliver to all of the lists you have mail

Re: what should happen when mda sees multiple List-ID headers?

2019-10-24 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: > > ... nothing? Just checked my mail server and it's not out of > space and I'm not seeing any errors in logs. Anyways I'm > offline for a bit and will be back (hopefully :x) Apologies, I accidentally

Re: what should happen when mda sees multiple List-ID headers?

2019-10-24 Thread Eric W. Biederman
Eric Wong writes: > Given my recent traumatic experience[*] around multiple > From/To/Cc/Subject headers; I guess we should prepare for the > possibility of multiple List-ID headers showing up in -mda. > > Right now, we handle the first one (and I'm updating -learn to > support List-ID, too); but

Re: what should happen when mda sees multiple List-ID headers?

2019-10-24 Thread Eric W. Biederman
-- unsubscribe: meta+unsubscr...@public-inbox.org archive: https://public-inbox.org/meta/

Re: [PATCH 1/4] PublicInbox::Import Smuggle a raw message into add

2019-10-15 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> Date: Tue, 15 Jan 2019 16:36:42 -0600 >> >> I don't trust the MIME type to not munge my email messages in horrible >> ways upon occasion. Therefore allow for passing in the raw message

Re: ibx->{listid} autoviv fixup [was: [PATCH 0/4] Various bits to support import_imap_mailbox]

2019-10-10 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Eric Wong, >> >> These should be all of my generic patches to support my import_imap_mailbox >> script. The really important patch that adds to the support for List-ID >> to public inbox configuration

Re: [PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order

2019-10-10 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> Date: Thu, 16 May 2019 19:26:47 -0500 >> >> To make the results reproducible and comprehensible when >> a large number of mail boxes are being processed, process the >> mail boxes

Re: Do I need multiple publicinbox..address values?

2019-10-10 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Eric Wong writes: >> >> > "Eric W. Biederman" wrote: >> >> my $tracker = PublicInbox::IMAPTracker->new(); >> > >> > Thanks. What's PublicInbox::IMAPTracker?

[PATCH 4/4] IMAPTracker: Add a helper to track our place in reading imap mailboxes

2019-10-09 Thread Eric W. Biederman
Date: Fri, 27 Jul 2018 20:54:27 -0500 This removes the need to delete from an imap mailbox when downloading its messages. Signed-off-by: "Eric W. Biederman" --- This is simple and potentially very useful. lib/PublicInbox/IMAPTracker.pm | 73 +++

[PATCH 3/4] Config.pm: Add support for looking up repos by their directories

2019-10-09 Thread Eric W. Biederman
Date: Sun, 29 Jul 2018 15:52:57 -0500 Signed-off-by: "Eric W. Biederman" --- Hmm. I thought I was using this but now that I am quickly checking I don't see this being used anywhere. I think PublicInbox::Admin::resolve_inboxes has superseded this functionality. Please feel fr

[PATCH 2/4] PublicInbox::Config: Process mailboxes in sorted order

2019-10-09 Thread Eric W. Biederman
Date: Thu, 16 May 2019 19:26:47 -0500 To make the results reproducible and comprehensible when a large number of mail boxes are being processed, process the mail boxes in sorted order, instead of in random hash order. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/Config.pm

[PATCH 1/4] PublicInbox::Import Smuggle a raw message into add

2019-10-09 Thread Eric W. Biederman
Date: Tue, 15 Jan 2019 16:36:42 -0600 I don't trust the MIME type to not munge my email messages in horrible ways upon occasion. Therefore allow for passing in the raw message value instead of trusting the mime object to preserve it. Signed-off-by: "Eric W. Biederman" ---

[PATCH 0/4] Various bits to support import_imap_mailbox

2019-10-09 Thread Eric W. Biederman
Eric Wong, These should be all of my generic patches to support my import_imap_mailbox script. The really important patch that adds to the support for List-ID to public inbox configuration file I have already sent. I haven't written tests and I get the following test failure when I run make test

Re: Do I need multiple publicinbox..address values?

2019-10-09 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> my $tracker = PublicInbox::IMAPTracker->new(); > > Thanks. What's PublicInbox::IMAPTracker? Something that keeps the last fetched UID in an sqlite database. I will follow up with a patch for that as well

Re: Do I need multiple publicinbox..address values?

2019-10-08 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Eric Wong writes: >> >> > Alyssa Ross wrote: >> > >> >> Subject: Do I need multiple publicinbox..address values? >> > >> > Absolutely not >> > >>

[PATCH] Config.pm: Add support for mailing list information

2019-10-08 Thread Eric W. Biederman
the configuration that is configured to handle that mailing list. Signed-off-by: "Eric W. Biederman" --- The relevant snippet from my imap import program looks like: sub list_hdr_ibx($$) { my ($config, $list_hdr) = @_; my $list_id; if ($list_hdr =~ m/\0/) {

Re: Do I need multiple publicinbox..address values?

2019-10-08 Thread Eric W. Biederman
Eric Wong writes: > Alyssa Ross wrote: > >> Subject: Do I need multiple publicinbox..address values? > > Absolutely not > >> Suppose I have a mailing list, foo-disc...@example.org, and a >> public-inbox set up, subscribed to that mailing list, that is subscribed >> to that list as public-inbox+f

Re: Git-only operation mode

2019-09-25 Thread Eric W. Biederman
Eric Wong writes: > Konstantin Ryabitsev wrote: >> On Wed, Sep 25, 2019 at 07:45:03PM +, Eric Wong wrote: >> > > Is there a way to run just the archiver component of public-inbox -- >> > > just >> > > writing to git repos without any of the indexing/frontend bits? One of >> > > the >> > > i

Re: Q: Did you do something to message number recently?

2019-06-25 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> The add method computes the number using num_for, which uses >> Msgmap::mid_insert. >> >> Short of the sequence number for msgmap getting scrambled I don't >> see how that can go wrong. Sigh.

Re: Q: Did you do something to message number recently?

2019-06-24 Thread Eric W. Biederman
Eric Wong writes: > Eric Wong wrote: >> "Eric W. Biederman" wrote: >> > >> > Eric, >> > >> > I am just starting to dig into this, I just noticed that I have several >> > inboxes that are seeing huge skips in message numbers

Q: Did you do something to message number recently?

2019-06-24 Thread Eric W. Biederman
Eric, I am just starting to dig into this, I just noticed that I have several inboxes that are seeing huge skips in message numbers assigned in msgmap. Do you have any idea why this would be? If not I will dig in and figure this out. I just figured I would ask in case you have any handy canid

Re: [PATCH] PublicInbox::Import Extend add with a optional raw message parameter

2019-05-19 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Eric Wong writes: >> >> > "Eric W. Biederman" wrote: >> >> >> >> I don't trust the MIME type to not munge my email messages in horrible >> >> ways u

Re: [RFC][PATCH] Config.pm: Add support for mailing list information

2019-05-18 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> The world has turned since I first started following mailing lists and >> to my surprise every mailing list that I am subscribed to properly >> sets the "List-ID:" mailing list header. So inst

Re: [PATCH] PublicInbox::Import Extend add with a optional raw message parameter

2019-05-18 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> I don't trust the MIME type to not munge my email messages in horrible >> ways upon occasion. Therefore allow for passing in the raw message >> value instead of trusting the mime object to

[PATCH] PublicInbox::Import Extend add with a optional raw message parameter

2019-05-16 Thread Eric W. Biederman
I don't trust the MIME type to not munge my email messages in horrible ways upon occasion. Therefore allow for passing in the raw message value instead of trusting the mime object to preserve it. Signed-off-by: "Eric W. Biederman" --- The context here is because the only c

[PATCH] PublicInbox::Import::add: Consolidate subject handling

2019-05-16 Thread Eric W. Biederman
Consolidate subject handling in the add function to make it easier to read and understand. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/Import.pm | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Impor

[RFC][PATCH] Config.pm: Add support for mailing list information

2019-05-16 Thread Eric W. Biederman
handle that mailing list. Signed-off-by: "Eric W. Biederman" --- Some context. I have my mailing list email coming in via imap, and I have a script that looks at List-ID and delivers them to the appropriate public-inbox. I am hoping to get my script at least into the PublicInbox sc

Re: IMAP server [was: Q: V2 format]

2018-10-01 Thread Eric W. Biederman
Johannes Berg writes: > On Fri, 2018-09-28 at 23:01 +0200, Eric W. Biederman wrote: >> >> I have looked at gnus and there is support in there for performing >> searches via the old gmane web interface. Public inbox already provides >> an attribute that tells you what

Re: IMAP server [was: Q: V2 format]

2018-09-28 Thread Eric W. Biederman
Johannes Berg writes: > Sorry to just jump into an old thread; I was wondering about IMAP server > support as well, in particular because unlike NNTP that allows pushing > the search to the server, and that would be useful for local archives. > >> Hosting an IMAP/POP3 server is way more overhead

Re: [PATCH] Import.pm: When purging replace a purged file with a zero length file

2018-08-10 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> This ensures that the number of added files remains the same and thus >> the article numbers derived from a repository will remain the same. >> >> I think this is the last place in public-inbox

[PATCH] Import.pm: When purging replace a purged file with a zero length file

2018-08-09 Thread Eric W. Biederman
archive. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/Import.pm | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm index bfa7a8053297..3df7d98f298b 100644 --- a/lib/PublicInbox/Import.pm +++ b/lib/P

Re: [WIP] searchidx: support incremental indexing on indexlevel=basic

2018-08-02 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes: > ebied...@xmission.com (Eric W. Biederman) writes: > >> Eric Wong writes: >> >>> I wrote: >>>> While testing this, it looks like I introduced a bug to >>>> indexlevel=basic which broke increment

Re: [WIP] searchidx: support incremental indexing on indexlevel=basic

2018-08-02 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes: > Eric Wong writes: > >> I wrote: >>> While testing this, it looks like I introduced a bug to >>> indexlevel=basic which broke incremental indexing when I made it >>> possible to upgrade to (medium|ful

Re: [WIP] searchidx: support incremental indexing on indexlevel=basic

2018-08-02 Thread Eric W. Biederman
Eric Wong writes: > I wrote: >> While testing this, it looks like I introduced a bug to >> indexlevel=basic which broke incremental indexing when I made it >> possible to upgrade to (medium|full). Patch coming for that in >> a bit... > > Eep, I think there's deeper problems with indexlevel=basic

Re: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned

2018-08-02 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> --- a/lib/PublicInbox/Msgmap.pm >> +++ b/lib/PublicInbox/Msgmap.pm >> @@ -51,6 +51,10 @@ sub new_file { >> $dbh->begin_work; >> $self->created_at(time) unles

[PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes

2018-08-01 Thread Eric W. Biederman
syncs. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/SearchIdx.pm | 3 ++- lib/PublicInbox/V2Writable.pm | 10 ++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 2532c8dfd10d..54f82aa8e

[PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap

2018-08-01 Thread Eric W. Biederman
Now that we track the num highwater mark it is safe to remove messages from msgmap that have been previously allocated. Removing even the highest numbered article will no longer cause new message numbers to move backwards. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/V2W

[PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap

2018-08-01 Thread Eric W. Biederman
number to the msgmap in case it is not currently present. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/V2Writable.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index 92d2672c78c4..4dd14331a78f 100644

[PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected

2018-08-01 Thread Eric W. Biederman
Instrument the tests to verify the num highwater mark is where it is expected. Signed-off-by: "Eric W. Biederman" --- t/v1reindex.t | 10 ++ t/v2reindex.t | 7 +++ 2 files changed, 17 insertions(+) diff --git a/t/v1reindex.t b/t/v1reindex.t index 87

[PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests

2018-08-01 Thread Eric W. Biederman
Place the expected value second in is and isnt tests because when these tests fail they report the second value as the expected value. A report saying got 0 expected 8 'no Xapian search results' can be confusing. Signed-off-by: "Eric W. Biederman" --- t/v1reindex.t | 6 +

[PATCH 02/13] t/v1reindex.t: Isolate the test cases

2018-08-01 Thread Eric W. Biederman
While inspecting the tests I realized that because we have been reusing variables there can be a memory between one test case and another. Add scopes and local variables to prevent an unintended memory between one test and another. Signed-off-by: "Eric W. Biederman" --- t/v1reind

[PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected

2018-08-01 Thread Eric W. Biederman
Deeply inspect the entire message map in the reindexing tests as the actual message order is significant and can result in surprises. Signed-off-by: "Eric W. Biederman" --- t/v1reindex.t | 35 +++ t/v2reindex.t | 33

[PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works

2018-08-01 Thread Eric W. Biederman
because things don't yet work as they should. Signed-off-by: "Eric W. Biederman" --- t/v1reindex.t | 194 + t/v2reindex.t | 195 ++ 2 files changed, 389 insertions(+) diff --git a/t/

[PATCH 03/13] t/v2reindex.t: Isolate the test cases more

2018-08-01 Thread Eric W. Biederman
While inspecting the tests I realized that because we have been reusing variables there can be a memory between one test case and another. Add scopes and local variables to prevent an unintended memory between one test case and another. Signed-off-by: "Eric W. Biederman" --- t/v2rein

[PATCH 09/13] t/v[12]reindex.t Verify num_highwater

2018-08-01 Thread Eric W. Biederman
Signed-off-by: "Eric W. Biederman" --- t/v1reindex.t | 7 +++ t/v2reindex.t | 7 +++ 2 files changed, 14 insertions(+) diff --git a/t/v1reindex.t b/t/v1reindex.t index 8e78aa761333..876c9db3441a 100644 --- a/t/v1reindex.t +++ b/t/v1reindex.t @@ -246,6 +246,7 @@ ok(!-d $xa

[PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned

2018-08-01 Thread Eric W. Biederman
update the indexers to use this value. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/Msgmap.pm | 23 +-- lib/PublicInbox/SearchIdx.pm | 8 lib/PublicInbox/V2Writable.pm | 4 ++-- 3 files changed, 27 insertions(+), 8 deletions(-) diff -

[PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing

2018-08-01 Thread Eric W. Biederman
When walking messages newest to oldest, assigning the larger numbers before smaller numbers ensures older messages get smaller numbers. This leads to the possibility of a msgmap that can be regenerated when needed. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/SearchI

[PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist

2018-08-01 Thread Eric W. Biederman
the code reveals this can happen anytime gfi_start has not been called. So just fix atfork_child to skip closing file descriptors that have not yet been setup. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/Import.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/PublicIn

[PATCH 00/13]: Incremental index fixes

2018-08-01 Thread Eric W. Biederman
that can be very confusing as old messages show up before newer ones. Finally in v2 deleted messages have not been being deleted from the msgmap. Which while great for keeping message numbers from going backwards it means things still show up that shouldn't. Eric W. Biederman (13): Impo

Re: [RFC][PATCH] ProcessPipe.pm: Use read not sysread

2018-07-30 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> While playing with git fast export I discovered that mixing <> and >> read would give inconsistent results. I tracked the issue down to >> using sysread in ProcessPipe instead of plain read. >

[RFC][PATCH] ProcessPipe.pm: Use read not sysread

2018-07-29 Thread Eric W. Biederman
cient needs to use buffered I/O. Signed-off-by: "Eric W. Biederman" --- Am I missing something or was this a fundamental bug from the beginning? lib/PublicInbox/ProcessPipe.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/ProcessPipe.pm b/lib/Publi

Re: Searching via git grep?

2018-07-20 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes: > Eric Wong writes: > >> "Eric W. Biederman" wrote: >>> My current goal is to make it pleasant to read linux-kernel and possibly >>> other large archives on my personal machine. Right now the git

Re: Searching via git grep?

2018-07-20 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> My current goal is to make it pleasant to read linux-kernel and possibly >> other large archives on my personal machine. Right now the git >> trees for linux-kernel are about 6.8G. Small enough to fit in RAM

Re: Searching via git grep?

2018-07-19 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Have you considered searching public inboxes via git grep? > > Not yet... > >> For a big server lore.kernel.org with a lot of searches and a lot of >> clients it might not make sense. But for home use wh

Re: [PATCH] Import.pm: Deal with potentially missing From and Sender headers

2018-07-19 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> Use ||= '' to ensure that if the From or Sender header is not present >> the code sees an empty string instead of undefined. >> >> I had some email messages with a From field without

Searching via git grep?

2018-07-19 Thread Eric W. Biederman
Have you considered searching public inboxes via git grep? For a big server lore.kernel.org with a lot of searches and a lot of clients it might not make sense. But for home use where searches are rare and the indexes can not be kept in ram, but the mailbox might fit git grep sounds attractive?

[PATCH] Import.pm: Deal with potentially missing From and Sender headers

2018-07-19 Thread Eric W. Biederman
Use ||= '' to ensure that if the From or Sender header is not present the code sees an empty string instead of undefined. I had some email messages with a From field without an @ (because the sender was local) and without a Sender which were causing errors when imported. I think this was ba

Re: [PATCH v2 1/3] Making the search indexes optional

2018-07-19 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> >> This is my respin of these patches. I have used the levels: >> full, medium, basic. >> >> I think basic conveys the message that it is ok to run with and you can >> expect most things to wor

[PATCH v2 3/4] public-inbox-init: Initialize indexlevel

2018-07-18 Thread Eric W. Biederman
If indexlevel is specified on the command line prefer that. If indexlevel is specified in the config file prefer that. If indexlevel is not specified anywhere default to full. This should make indexlevel somewhat approachable. Signed-off-by: "Eric W. Biederman" --- I believe t

[PATCH v2 3/3] SearchIdx: Allow the amount of indexing be configured

2018-07-18 Thread Eric W. Biederman
themselves. Update the reindex tests to exercise the full medium and basic code paths Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/Config.pm| 2 +- lib/PublicInbox/SearchIdx.pm | 8 +++ t/v1reindex.t| 43 +++- t/v2rein

[PATCH v2 2/3] SearchIdx: Add the mechanism for making all Xapian indexing optional

2018-07-18 Thread Eric W. Biederman
call is made conditional upon index levels of 'full' and 'medium'. These are the index levels that index positions and terms, the two things public-inbox uses Xapian to index. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/SearchIdx.pm | 172 ++---

[PATCH v2 1/3] SearchIdx.pm: Make indexing search positions optional

2018-07-18 Thread Eric W. Biederman
os as well. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/SearchIdx.pm | 94 +++- 1 file changed, 49 insertions(+), 45 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 0e0796c12c12..b19618c71508 100644 -

[PATCH v2 1/3] Making the search indexes optional

2018-07-18 Thread Eric W. Biederman
o run with all 3 different levels so at least these code paths get exercised. Eric W. Biederman (3): SearchIdx.pm: Make indexing search positions optional SearchIdx: Add the mechanism for making all Xapian indexing optional SearchIdx: Allow the amount of indexing be configured

Re: [PATCH 3/3] SearchIdx: Allow the amount of indexing be configured

2018-07-18 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> This adds a new inbox configuration option 'indexlevel' that can take >> the values 'positions', 'terms', and 'over'. > > The names of these user-facing configuration

[PATCH 3/4] t/v2reindex.t: Don't reuse $ibx as two different kinds of variable

2018-07-17 Thread Eric W. Biederman
Signed-off-by: "Eric W. Biederman" --- t/v2reindex.t | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/v2reindex.t b/t/v2reindex.t index f8e2b1b2d46e..5bc307f1cac1 100644 --- a/t/v2reindex.t +++ b/t/v2reindex.t @@ -14,13 +14,13 @@ foreach my $mod (qw(DBD::SQL

[PATCH 4/4] t/v2reindex.t: Swap the order of minmax tests so errors make sense

2018-07-17 Thread Eric W. Biederman
Previously if a minmax test failed it would say it was expecting the incorrect value, which is confusing when looking into why the test fails. Signed-off-by: "Eric W. Biederman" --- t/v2reindex.t | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/v2rei

[PATCH 2/4] t/search.t t/v2writable.t: Teach search tests to fail more cleanly.

2018-07-17 Thread Eric W. Biederman
Now that some of the indexes are optional these tests might fail so teach them to fail more cleanly. Signed-off-by: "Eric W. Biederman" --- t/search.t | 45 ++--- t/v2writable.t | 2 +- 2 files changed, 27 insertions(+), 20 deletions(-)

[PATCH 1/4] t/v2reindex.t: Ensure the numbers 1 to 10 are used

2018-07-17 Thread Eric W. Biederman
Signed-off-by: "Eric W. Biederman" --- t/v2reindex.t | 1 + 1 file changed, 1 insertion(+) diff --git a/t/v2reindex.t b/t/v2reindex.t index 9bc271fc2d35..f8e2b1b2d46e 100644 --- a/t/v2reindex.t +++ b/t/v2reindex.t @@ -48,6 +48,7 @@ if ('test remove later') { $im->done;

[PATCH 0/4] minor test cleanups

2018-07-17 Thread Eric W. Biederman
While developing the ability to disable the indexes I found a few places where the existing tests could be slightly improved. Here are my improvements. Eric W. Biederman (4): t/v2reindex.t: Ensure the numbers 1 to 10 are used t/search.t t/v2writable.t: Teach search tests to fail

[PATCH 2/3] SearchIdx: Add the mechanism for making all Xapian indexing optional

2018-07-17 Thread Eric W. Biederman
. The new call is made conditional upon index levels of 'position' and 'terms' The two things public-inbox uses Xapian to index. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/SearchIdx.pm | 171 ++- 1 file changed, 88 insert

[PATCH 3/3] SearchIdx: Allow the amount of indexing be configured

2018-07-17 Thread Eric W. Biederman
except the positions of terms is indexed. When set to 'over' terms and positions are not indexed. Just the Overview database for NNTP is created. Which is still quite good and allows searching for messages by Message-ID. But there are no indexes to support searching inside the em

[PATCH 1/3] SearchIdx.pm: Make indexing search positions optional

2018-07-17 Thread Eric W. Biederman
os as well. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/SearchIdx.pm | 94 +++- 1 file changed, 49 insertions(+), 45 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 0e0796c12c12..cc92c389a152 100644 -

[PATCH 0/3] Making the search indexes optional

2018-07-17 Thread Eric W. Biederman
change. Eric W. Biederman (3): SearchIdx.pm: Make indexing search positions optional SearchIdx: Add the mechanism for making all Xapian indexing optional SearchIdx: Allow the amount of indexing be configured lib/PublicInbox/Config.pm| 2 +- lib/PublicInbox/SearchIdx.pm | 255

[PATCH] SearchIdx: Decrement regen_down even for added messages that are later deleted.

2018-07-17 Thread Eric W. Biederman
v2 trees already do this and when the indexes are deleted and rebuilt they maintain their commit numbers. Add a v1 version of the v2reindex test to verify that reindexing is working properly on v1 as well as v2. Signed-off-by: "Eric W. Biederman" --- lib/PublicInbox/SearchIdx.pm |

Re: msgmap serial number regeneration [was: Q: V2 format]

2018-07-16 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> I believe we can modify the msg number assignment to assign numbers to >> deletes as well as adds. Short of the same Message-ID coming up twice >> that should be enough for the current backwards loop to assig

Re: msgmap serial number regeneration [was: Q: V2 format]

2018-07-14 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> ebied...@xmission.com (Eric W. Biederman) writes: >> > Eric Wong writes: >> >> "Eric W. Biederman" wrote: >> >>> >> >>> Because of the parallelism in V2 I have n

Re: IMAP server [was: Q: V2 format]

2018-07-13 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> > "Eric W. Biederman" wrote: >> >> Eric Wong writes: >> > As far as personal mail goes, I wouldn't want serial numbers at all >> > (more unnecessary state to keep track of). &

Re: bug: v2 deletes on incremental fetch [was: Q: V2 format]

2018-07-13 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Eric Wong writes: >> > "Eric W. Biederman" wrote: >> >> Then I am going to report a probable bug. In V2 in public-inbox-index >> >> I can not find a path from finding a

Re: Q: V2 format

2018-07-13 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes: > Eric Wong writes: > >> "Eric W. Biederman" wrote: >>> >>> Because of the parallelism in V2 I have noticed messages in numbered >>> in an order that does not correspond to their commit order. S

Re: Q: V2 format

2018-07-13 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> Eric Wong writes: >> > "Eric W. Biederman" wrote: >> >> I have been digging through the code looking so I can understand the v2 >> >> format and I have some ideas on how things mi

Re: Warnings from git fsck after lkml import

2018-07-12 Thread Eric W. Biederman
Konstantin Ryabitsev writes: > On Thu, Jul 05, 2018 at 11:13:46PM +, Eric Wong wrote: >>"Eric W. Biederman" wrote: >>> It looks like public-inbox has some challenges when importing some >>> questionable emails. The import of lkml has resulted in severa

Re: Q: V2 format

2018-07-12 Thread Eric W. Biederman
Eric Wong writes: > "Eric W. Biederman" wrote: >> I have been digging through the code looking so I can understand the v2 >> format and I have some ideas on how things might be improved, and some >> questions so that I understand. > > Great to know you

Re: Q: V2 format

2018-07-11 Thread Eric W. Biederman
Konstantin Ryabitsev writes: > On Wed, Jul 11, 2018 at 03:01:53PM -0500, Eric W. Biederman wrote: >> Names. Is there a good reason not to use message numbers as the names >> in the git repositories? (Other than the cost to change the code?) That >> would remove the need
