Ah yes, brilliant, thanks Joel! I think this is exactly what I was
looking for; I wasn't aware of the executor decorator.

Thanks all again for the suggestions and the interesting possibilities
available.

Kind regards,
Dan

On Tue, 7 Sept 2021 at 13:51, Joel Bernstein <[email protected]> wrote:

> A design for large-scale alerting was implemented with Streaming
> Expressions and is described here:
>
>
> https://joelsolr.blogspot.com/2017/01/deploying-solrs-new-parallel-executor.html
>
> In this design you would store each alert in Solr as a topic expression.
> Then a single daemon can run all the topics or it can be parallelized.
>
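As a rough illustration of the design described above (not taken from the linked post), the sketch below renders one topic expression per stored alert and splits the alerts across workers. The topic parameters are the ones used in the pseudocode further down this thread; the helper names `topic_expression` and `partition_alerts`, the CRC32 hashing, and the backslash escaping of quotes are assumptions made for illustration and are not claimed to match what Solr does internally.

```python
import zlib


def topic_expression(alert_id, query, checkpoint_collection="email_alerts",
                     collection="doc_collection", fields="id,title,abstract"):
    """Render one stored alert as a topic() streaming expression string."""
    # Assumption: a backslash escapes an embedded double quote in the q parameter.
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'topic({checkpoint_collection}, {collection}, q="{escaped}", '
        f'fl="{fields}", id="{alert_id}", initialCheckpoint=0)'
    )


def partition_alerts(alert_ids, workers):
    """Assign each alert id to exactly one worker using a stable hash."""
    buckets = [[] for _ in range(workers)]
    for alert_id in alert_ids:
        buckets[zlib.crc32(alert_id.encode("utf-8")) % workers].append(alert_id)
    return buckets


# Hypothetical stored alerts: alert id -> user query.
alerts = {"alert-1": 'title:"solr jobs"', "alert-2": "abstract:kafka"}
expressions = {aid: topic_expression(aid, q) for aid, q in alerts.items()}
buckets = partition_alerts(sorted(alerts), workers=2)
print(expressions["alert-2"])
```

Each bucket could then be handed to its own daemon, or everything kept in a single bucket if one daemon is enough.
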
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Sep 7, 2021 at 6:32 AM Charlie Hull <[email protected]> wrote:
>
> > Hi Dan,
> >
> > Yuval's suggestion and mine both rely on the same underlying code (Luwak,
> > now called Lucene Monitor). This lets you store a set of Lucene queries
> > and run them against every new document.
> >
> > The Lucene Monitor allows for very high-performance matching (I know of
> > situations with around 1m stored queries, monitoring 1m new documents a
> > day running on a few tens of nodes) and it does this with some clever
> > optimisations: effectively it builds an index of your stored queries,
> > and turns each new document into a query across this index (I know it
> > sounds confusing!). It's a 'reverse search'. Check out the original
> > Luwak project as it's got links to several presentations and blogs
> > showing how others have implemented these systems.
> >
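To make the 'reverse search' idea above a bit more concrete, here is a toy sketch in plain Python. It is not how Lucene Monitor is implemented and it uses none of its API: the `ReverseSearchIndex` class, its method names, and the restriction to queries where every term must appear are all assumptions for illustration.

```python
from collections import defaultdict


class ReverseSearchIndex:
    """Toy reverse search: stored queries are indexed, documents act as queries."""

    def __init__(self):
        self.required_terms = {}         # query id -> terms that must all appear
        self.by_term = defaultdict(set)  # term -> ids of queries indexed under it

    def register(self, query_id, terms):
        terms = frozenset(t.lower() for t in terms)
        if not terms:
            raise ValueError("a stored query needs at least one term")
        self.required_terms[query_id] = terms
        # Index the query under one of its terms only: a document that lacks
        # this term cannot match, so one posting is enough to find candidates.
        self.by_term[min(terms)].add(query_id)

    def match(self, text):
        doc_terms = set(text.lower().split())
        # Step 1: the document's terms select candidate queries from the index.
        candidates = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # Step 2: verify each candidate fully against the document.
        return sorted(q for q in candidates if self.required_terms[q] <= doc_terms)


index = ReverseSearchIndex()
index.register("alert-1", ["solr", "streaming"])
index.register("alert-2", ["kafka"])
index.register("alert-3", ["solr", "monitor"])
matches = index.match("Solr streaming expressions and the Lucene monitor")
print(matches)  # ['alert-1', 'alert-3']
```

The point of the two steps is that only the candidate queries are evaluated against each new document, rather than every stored query.
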
> > The bit you'll have to build is the Solr layer and then the code that
> > uses this to generate alerts - and Solcolator and
> > https://github.com/o19s/solr-monitor are two examples of how to do the
> > first part, which you can build on. The facility to do a reverse search
> > is not built into Solr - yet, unlike Elasticsearch's Percolator.
> >
> > Best
> >
> > Charlie
> >
> > On 07/09/2021 10:24, Dan Rosher wrote:
> > > Thanks Eric, Charlie and Yuval for all the feedback and suggestions.
> > >
> > > Eric: Yes, I thought the monitoring might be a bit of a pain, especially with
> > > millions of them. I'll have to check out the topic code, but I wondered if
> > > I could look at the checkpoint collections for uniqueIds that haven't been
> > > updated for a 'while', which might suggest the daemon had stopped/died,
> > > rather than checking each daemon individually?
> > >
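A minimal sketch of the staleness check suggested above. It assumes each checkpoint record can be paired with a last-updated timestamp; the `last_updated` field and the `stale_topic_ids` helper are hypothetical, and fetching the records from the checkpoint collection is left out.

```python
from datetime import datetime, timedelta, timezone


def stale_topic_ids(checkpoints, now, max_age):
    """Return ids of checkpoint records not updated within max_age of now."""
    cutoff = now - max_age
    return sorted(c["id"] for c in checkpoints if c["last_updated"] < cutoff)


# Hypothetical records; in practice these would come from the checkpoint collection.
now = datetime(2021, 9, 7, 12, 0, tzinfo=timezone.utc)
checkpoints = [
    {"id": "alert-1", "last_updated": now - timedelta(minutes=2)},
    {"id": "alert-2", "last_updated": now - timedelta(hours=3)},
]
stale = stale_topic_ids(checkpoints, now, max_age=timedelta(minutes=30))
print(stale)  # ['alert-2']
```

A single periodic check like this would flag every topic whose daemon appears to have stopped, without polling each daemon. Note that a topic with no new matching documents may also look stale if its checkpoint is only written on change.
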
> > > I was also wondering whether it's possible, or a useful enhancement, to look
> > > at the replica index version (as opposed to _version_) for the topic
> > > streaming expression, to skip queries where the replica index version is the
> > > same as what we might store in the checkpoint collection? For collections that
> > > update infrequently I think this might be useful.
> > >
> > > Charlie: It was for email alerts, so a user stores a query for collection
> > > docs to match against, and then the system emails matches to the user. Do
> > > you think solr-monitor can be used for this purpose?
> > >
> > > Yuval: I like the idea of using the UpdateProcessor; at least there's no
> > > need for daemons or for monitoring them, but would this scale to millions
> > > of email queries?
> > >
> > > Many thanks again to all.
> > >
> > > Kind regards,
> > > Dan
> > >
> > >
> > >
> > >
> > > On Mon, 6 Sept 2021 at 18:47, Yuval Paz <[email protected]> wrote:
> > >
> > >> My team and I are building upon this Solcolator:
> > >> https://github.com/SOLR4189/solcolator
> > >>
> > >> Currently the processor is built for Solr 6.5.1. We are working on updating
> > >> our Solr, and I hope to release a complete version of our Solcolator as
> > >> open source then (it will be for version 8.6.x).
> > >>
> > >> We make it an update processor, either as the last element, replacing
> > >> the usual processor that indexes the document, or as the next-to-last
> > >> processor in the chain, which also allows monitoring atomic
> > >> updates (which is relatively costly).
> > >>
> > >> By making it an update processor we don't rely on the streaming daemon,
> > >> which we found unsatisfying, as we wish to allow users to define their own
> > >> monitors over the index.
> > >>
> > >> On Mon, Sep 6, 2021, 8:25 PM Charlie Hull <[email protected]> wrote:
> > >>
> > >>> Are you trying to monitor a stream of emails for certain patterns? In
> > >>> which case you might look at the Lucene Monitor
> > >>> https://lucene.apache.org/core/8_2_0/monitor/index.html?overview-summary.html
> > >>> https://issues.apache.org/jira/browse/LUCENE-8766, which was originally
> > >>> Luwak - at my previous company Flax we helped build several large-scale
> > >>> monitoring systems with this https://github.com/flaxsearch/luwak . It's
> > >>> not officially surfaced in Solr yet although my colleague Scott Stults
> > >>> has been working on some ideas: https://github.com/o19s/solr-monitor
> > >>>
> > >>> best
> > >>> Charlie
> > >>>
> > >>> On 06/09/2021 14:32, Dan Rosher wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I was wondering if anyone had tried email alerts with streaming
> > >>>> expressions, and what their experience was if attempting this with say 12
> > >>>> million emails / day? Traditionally this might have been done with a
> > >>>> database cursor iterator daily.
> > >>>>
> > >>>> I was thinking of something like the following pseudocode expression, with
> > >>>> 'kafka' as a custom push expression:
> > >>>>
> > >>>> daemon(id="alertId",
> > >>>>        runInterval="1000",
> > >>>>        kafka(
> > >>>>          kafka_topic,
> > >>>>          alertId,
> > >>>>          topic(email_alerts,
> > >>>>                doc_collection,
> > >>>>                q="email query",
> > >>>>                fl="id, title, abstract",
> > >>>>                id="alertId",
> > >>>>                initialCheckpoint=0)
> > >>>>        )
> > >>>> )
> > >>>>
> > >>>> If you have done something like this, 'where' would you typically run the
> > >>>> daemon: on replicas away from replicas running web queries?
> > >>>>
> > >>>> Many thanks in advance for any advice / suggestions,
> > >>>>
> > >>>> Dan
> > >>>>
> > >>> --
> > >>> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> > >>> <www.o19s.com>
> > >>> Founding member of The Search Network <https://thesearchnetwork.com/>
> > >>> and co-author of Searching the Enterprise
> > >>> <https://opensourceconnections.com/about-us/books-resources/>
> > >>> tel/fax: +44 (0)8700 118334
> > >>> mobile: +44 (0)7767 825828
> > >>>
> > >>> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> > >>> Amtsgericht Charlottenburg | HRB 230712 B
> > >>> Geschäftsführer: John M. Woodell | David E. Pugh
> > >>> Finanzamt: Berlin Finanzamt für Körperschaften II
> > >>>
> >
>
