Ah yes, brilliant, thanks Joel! I think this is exactly what I was looking for; I wasn't aware of the executor decorator.
Thanks all again for the suggestions and the interesting possibilities available.

Kind regards,
Dan

On Tue, 7 Sept 2021 at 13:51, Joel Bernstein <[email protected]> wrote:

> There was a design implemented in Streaming Expression for large scale
> alerting described here:
>
> https://joelsolr.blogspot.com/2017/01/deploying-solrs-new-parallel-executor.html
>
> In this design you would store each alert in Solr as a topic expression.
> Then a single daemon can run all the topics or it can be parallelized.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Sep 7, 2021 at 6:32 AM Charlie Hull <[email protected]> wrote:
>
> > Hi Dan,
> >
> > Yuval and my suggestions both rely on the same underlying code (Luwak,
> > now called Lucene Monitor). This lets you store a set of Lucene queries
> > and run them against every new document.
> >
> > The Lucene Monitor allows for very high-performance matching (I know of
> > situations with around 1m stored queries, monitoring 1m new documents a
> > day running on a few tens of nodes) and it does this with some clever
> > optimisations: effectively it builds an index of your stored queries,
> > and turns each new document into a query across this index (I know it
> > sounds confusing!). It's a 'reverse search'. Check out the original
> > Luwak project as it's got links to several presentations and blogs
> > showing how others have implemented these systems.
> >
> > The bit you'll have to build is the Solr layer and then the code that
> > uses this to generate alerts - and Solcolator and
> > https://github.com/o19s/solr-monitor are two examples of how to do the
> > first part, which you can build on. The facility to do a reverse search
> > is not built into Solr - yet, unlike Elasticsearch's Percolator.
> >
> > Best
> >
> > Charlie
> >
> > On 07/09/2021 10:24, Dan Rosher wrote:
> > > Thanks Eric, Charlie and Yuval for all the feedback and suggestions.
> > >
> > > Eric: Yes, I thought the monitoring might be a bit of a pain, esp. with
> > > millions of them. I'll have to check out the topic code, but I wondered if
> > > I can look at the checkpoint collections for uniqueIds that haven't been
> > > updated for a 'while', which might suggest the daemon had stopped/died,
> > > rather than checking each daemon individually?
> > >
> > > I was also wondering whether it's possible, or a useful enhancement, to look
> > > at the replica index version (as opposed to _version_) for the topic
> > > streaming expression to skip queries where the replica index is the same as
> > > what we might store in the checkpoint collection? For collections that
> > > update infrequently I think this might be useful.
> > >
> > > Charlie: It was for email alerts, so a user stores a query for collection
> > > docs to match against, and then the system emails matches to the user. Do
> > > you think solr-monitor can be used for this purpose?
> > >
> > > Yuval: I like the idea of using the UpdateProcessor, at least there's no
> > > need for daemons or monitoring of them, but would this scale for millions
> > > of email queries though?
> > >
> > > Many thanks again to all.
> > >
> > > Kind regards,
> > > Dan
> > >
> > > On Mon, 6 Sept 2021 at 18:47, Yuval Paz <[email protected]> wrote:
> > >
> > >> My team and I are building upon this solcolator:
> > >> https://github.com/SOLR4189/solcolator
> > >>
> > >> Currently the processor is built for Solr 6.5.1. We are working on updating
> > >> our Solr and I hope to release a complete version of our Solcolator as
> > >> open source then (it will be for version 8.6.x).
> > >>
> > >> We make it an update processor (either as the last element, replacing the
> > >> usual processor that indexes the document, or as the one-from-last
> > >> processor in the collection, which also allows monitoring atomic
> > >> updates [which is relatively costly]).
> > >>
> > >> By making it an update processor we don't rely on the streaming daemon,
> > >> which we found unsatisfying as we wish to allow users to define their own
> > >> monitors over the index.
> > >>
> > >> On Mon, Sep 6, 2021, 8:25 PM Charlie Hull <[email protected]> wrote:
> > >>
> > >>> Are you trying to monitor a stream of emails for certain patterns? In
> > >>> which case you might look at the Lucene Monitor
> > >>> https://lucene.apache.org/core/8_2_0/monitor/index.html?overview-summary.html
> > >>> https://issues.apache.org/jira/browse/LUCENE-8766, which was originally
> > >>> Luwak - at my previous company Flax we helped build several large-scale
> > >>> monitoring systems with this https://github.com/flaxsearch/luwak . It's
> > >>> not officially surfaced in Solr yet although my colleague Scott Stults
> > >>> has been working on some ideas: https://github.com/o19s/solr-monitor
> > >>>
> > >>> best
> > >>> Charlie
> > >>>
> > >>> On 06/09/2021 14:32, Dan Rosher wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I was wondering if anyone had tried email alerts with streaming
> > >>>> expressions, and what their experience was if attempting this with say
> > >>>> 12 million emails / day? Traditionally this might have been done with a
> > >>>> database cursor iterator daily.
> > >>>>
> > >>>> I was thinking of something like the following pseudocode expression,
> > >>>> with 'kafka' as a custom push expression:
> > >>>>
> > >>>> daemon(id="alertId",
> > >>>>        runInterval="1000",
> > >>>>        kafka(
> > >>>>          kafka_topic,
> > >>>>          alertId,
> > >>>>          topic(email_alerts,
> > >>>>                doc_collection,
> > >>>>                q="email query",
> > >>>>                fl="id, title, abstract",
> > >>>>                id="alertId",
> > >>>>                initialCheckpoint=0)
> > >>>>        )
> > >>>> )
> > >>>>
> > >>>> If you have done something like this, 'where' would you typically run
> > >>>> the daemon: on replicas away from replicas running web queries?
> > >>>>
> > >>>> Many thanks in advance for any advice / suggestions,
> > >>>>
> > >>>> Dan
> > >>>>
> > >>> --
> > >>> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> > >>> <www.o19s.com>
> > >>> Founding member of The Search Network <https://thesearchnetwork.com/>
> > >>> and co-author of Searching the Enterprise
> > >>> <https://opensourceconnections.com/about-us/books-resources/>
> > >>> tel/fax: +44 (0)8700 118334
> > >>> mobile: +44 (0)7767 825828
> > >>>
> > >>> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> > >>> Amtsgericht Charlottenburg | HRB 230712 B
> > >>> Geschäftsführer: John M. Woodell | David E. Pugh
> > >>> Finanzamt: Berlin Finanzamt für Körperschaften II
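
For anyone reading this thread later: the 'reverse search' idea Charlie describes (index the stored queries, then treat each new document as the lookup) can be illustrated with a toy sketch in Python. This is not Lucene Monitor or Luwak code. The class name QueryMonitor and its methods are made up for illustration, and it assumes every stored query is a plain "all of these words" query, which is far simpler than real Lucene queries.

```python
from collections import defaultdict


class QueryMonitor:
    """Toy reverse search (hypothetical, not the Lucene Monitor API)."""

    def __init__(self):
        self.queries = {}                # query id -> set of required terms
        self.by_term = defaultdict(set)  # term -> ids of queries filed under it

    def register(self, query_id, text):
        terms = set(text.lower().split())
        if not terms:
            raise ValueError("query must contain at least one term")
        self.queries[query_id] = terms
        # File the query under one of its terms only. A document that lacks
        # this term cannot match, so the query is never examined for it.
        self.by_term[min(terms)].add(query_id)

    def match(self, document):
        doc_terms = set(document.lower().split())
        # Step 1: the document acts as a query over the index of stored queries.
        candidates = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # Step 2: verify each candidate fully against the document.
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)


monitor = QueryMonitor()
monitor.register("alert-1", "solr streaming")
monitor.register("alert-2", "lucene monitor")
monitor.register("alert-3", "kafka")
print(monitor.match("Lucene Monitor is not yet surfaced in Solr"))  # ['alert-2']
```

Only queries filed under a term that appears in the document are checked at all, which is the reason this style of matching can cope with a very large number of stored queries. Picking the alphabetically smallest term as the filing key is an arbitrary choice made for this sketch.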
