Re: Use stream result like a query (alternative to innerJoin)

Joel Bernstein Mon, 23 Nov 2020 12:22:35 -0800

There are two streams that behave like that.

One is the "nodes" expression, which is not going to work for this use case
because it does everything in memory.


The second is the "fetch" expression, which behaves like a nested loop
join with some limitations. Unfortunately, the main limitation is likely to
be a blocker for you: it doesn't support one-to-many joins yet.
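
As a rough sketch only, assuming each deletedItemId matches at most one
document in items (so the join is not one-to-many), and assuming
deletedItemId has docValues so it can be read through the /export handler,
a fetch over the collections from the question might look like this. The
q="*:*" query and the fl values are placeholders, not taken from a real
setup:

```
fetch(items,
      search(deletedItems,
             q="*:*",
             qt="/export",
             fl="deletedItemId",
             sort="deletedItemId asc"),
      fl="name",
      on="deletedItemId=id")
```

Here the inner search streams the small deletedItems result, and fetch
looks up the matching items documents in batches and adds the requested
fields (name) to each tuple, so the large items result is never streamed
in full to a worker.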

Joel Bernstein
http://joelsolr.blogspot.com/


On Sun, Nov 22, 2020 at 10:37 AM ufuk yılmaz <[email protected]>
wrote:

> Hi all,
>
> I’m looking for a way to query two collections and find documents that
> exist in both. I know this can be done with the innerJoin streaming
> expression, but I want to avoid it, since one of the collection streams
> can have billions of results.
>
> Let’s say two collections are:
>
> deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}...]
> items = [
>         {
>                 id: 1,
>                 name: "a"
>         },
>         {
>                 id: 2,
>                 name: "b"
>         },
>         {
>                 id: 3,
>                 name: "c"
>         }.....
> ]
>
> “deletedItems” contains few documents compared to the “items” collection
> (about 1 million vs. 2-3 billion). If I query both with a typical query in
> our system, deletedItems gives a few thousand results but items gives tens
> or hundreds of millions. To use innerJoin, I have to stream the whole items
> result to a worker node over the network.
>
> Is there a way to avoid this, something like using “deletedItems” result
> as a query to “items” stream?
>
> Thanks in advance for the help
>
