Re: GDPR compliance

2023-11-28 Thread Patrick Zhai
It's not that insane; it's about several weeks. However, a big segment can stay there for quite a long time if there aren't enough updates for a merge policy to pick it up. On Tue, Nov 28, 2023, 17:14 Dongyu Xu wrote: > What is the expected grace time for the data-deletion request to take > place? > >

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1199 - Unstable!

2023-11-28 Thread Michael McCandless
OK I pushed a fix. Mike On Tue, Nov 28, 2023 at 7:32 PM Michael McCandless < luc...@mikemccandless.com> wrote: > I think maybe LuceneTestCase.newSearcher is turning on concurrency > (setting the executor randomly). Since this test explicitly passes a "no > concurrency" collector manager I

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1199 - Unstable!

2023-11-28 Thread Michael McCandless
I think maybe LuceneTestCase.newSearcher is turning on concurrency (setting the executor randomly). Since this test explicitly passes a "no concurrency" collector manager I think we should switch to "new IndexSearcher(...)". Mike On Tue, Nov 28, 2023 at 7:29 PM Michael McCandless <

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1199 - Unstable!

2023-11-28 Thread Michael McCandless
This reproduces for me. Maybe related to LUCENE-10002 / #240? Mike On Tue, Nov 28, 2023 at 1:58 AM Apache Jenkins Server < jenk...@builds.apache.org> wrote: > Build: > https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1199/ > > 1 tests failed. > FAILED:

Re: GDPR compliance

2023-11-28 Thread Patrick Zhai
Thanks Robert and Dawid. What you said sounds reasonable to me; I can keep the MP private then, I guess (and it's not hard to code anyway, so people can still figure it out easily if they're facing a similar situation). For our case I think we do have some other constraints so we

Re: GDPR compliance

2023-11-28 Thread Robert Muir
And if you delete those segments, will that data ever actually be removed from the underlying physical storage? Equally uncertain. Deleting a file from the filesystem is similar to what Lucene is doing: it doesn't really delete anything from the disk, it just allows it to be overwritten by future

Re: GDPR compliance

2023-11-28 Thread Dongyu Xu
What is the expected grace time for the data-deletion request to take place? I'm not an expert on the policy, but I think something like "I need my data to be gone in the next 2 seconds" is unreasonable. Tony X From: Robert Muir Sent: Tuesday, November 28, 2023

Re: GDPR compliance

2023-11-28 Thread Ilan Ginzburg
Are larger and older segments even certain to ever be merged in practice? I was assuming that if there is not a lot of newly indexed content and not a lot of older documents being deleted, large older segments might never have to be merged. On Tue 28 Nov 2023 at 20:53, Robert Muir wrote: > I

Re: GDPR compliance

2023-11-28 Thread Robert Muir
I don't think there's any problem with GDPR, and I don't think users should be running unnecessary "optimize". GDPR just says data should be erased without "undue" delay. Waiting for a merge to nuke the deleted docs isn't "undue"; there is a good reason for it. On Tue, Nov 28, 2023 at 2:40 PM

GDPR compliance

2023-11-28 Thread Patrick Zhai
Hi Folks, At LinkedIn we need to comply with GDPR for a large part of our data, and an important part of that is being sure we have completely deleted the data a user asked us to delete within a certain period of time. The approach we have come up with so far is to: 1. Record the segment

Re: Lucene 9.9.0 Release

2023-11-28 Thread Chris Hegarty
Hi Guo, Thanks for the update. Let’s push the 9.9.0 branch cut until tomorrow (rather than today as previously suggested), which should allow time to determine the outstanding issues you mentioned below. That should be more straightforward all round. New 9.9.0 branch cut 12:00 29th Nov 2023