Hi Mikhail et al,

Thanks a lot for a very thorough answer. This is an impressive piece of
knowledge you just shared.

Not surprisingly, I was caught unprepared by the 'v=...' part of the
answer. This brought me to the links you posted (starts with http). From
those links I went to the more updated link (starts with https), which
brought me to other very resourceful links. Combined with some meditation
session, it came into my mind that it is not possible to express block
queries using mathematical logic only. The format of the input document is
deeply built into the query expression and answering. Expressing these
queries mathematically / logically may give an impression that solr is
capable of answering (NP-?) hard problems. I have a feeling though that
solr answers to queries in polynomial (or even almost linear) times.

Just to connect the remaining dots.. What's happening under the hood of
solr in answering query [1] from [2]? Is it really so that inverted index
is used to identify the vectors of ids, that are scanned linearly in a hope
to get matches on _root_ and other internal variables?

[1] q=+{!parent which=type_s:product v=$skuq} +{!parent
which=type_s:product v=$vendorq}&skuq=+COLOR_s:Blue +SIZE_s:XL +{!parent
which=type_s:sku v='+QTY_i:[10 TO *] +STATE_s:CA'}&vendorq=+NAME_s:Bob
+PRICE_i:[20 TO 25]
[2]
https://blog.griddynamics.com/searching-grandchildren-and-siblings-with-solr-block-join/

Thanks!
Arturas

On Wed, Apr 4, 2018 at 12:36 PM, Mikhail Khludnev <m...@apache.org> wrote:

> q=+{!parent which=ntype:p v='+msg:Hello +person:Arturas'} +{!parent which=
> ntype:p v='+msg:ciao +person:Vai'}
>
> On Wed, Apr 4, 2018 at 12:19 PM, Arturas Mazeika <maze...@gmail.com>
> wrote:
>
> > Hi Mikhail et al,
> >
> > It seems to me that the nested documents must include nodes that encode
> the
> > level of nodes (within the document). Therefore, the minimal example must
> > include the node type. Is the following structure sufficient?
> >
> > {
> >     "id":1,
> >     "ntype":"p",
> >     "_childDocuments_":
> >     [
> >         {"id":"1_1", "ntype":"c", "person":"Vai",     "time":"3:14",
> > "msg":"Hello"},
> >         {"id":"1_2", "ntype":"c", "person":"Arturas", "time":"3:14",
> > "msg":"Hello"},
> >         {"id":"1_3", "ntype":"c", "person":"Vai",     "time":"3:15",
> > "msg":"Coz Mathias is working on another system- different screen."},
> >         {"id":"1_4", "ntype":"c", "person":"Vai",     "time":"3:15",
> > "msg":"It can get annoying"},
> >         {"id":"1_5", "ntype":"c", "person":"Arturas", "time":"3:15",
> > "msg":"Thank you. this is very nice of you"},
> >         {"id":"1_6", "ntype":"c", "person":"Vai",     "time":"3:16",
> > "msg":"ciao"},
> >         {"id":"1_7", "ntype":"c", "person":"Arturas", "time":"3:16",
> > "msg":"ciao"}
> >     ]
> > },
> > {
> >     "id":2,
> >     "ntype":"p",
> >     "_childDocuments_":
> >     [
> >         {"id":"2_1", "ntype":"c", "person":"Vai",     "time":"4:14",
> > "msg":"Hi"},
> >         {"id":"2_2", "ntype":"c", "person":"Arturas", "time":"4:14",
> > "msg":"IBM Watson"},
> >         {"id":"2_3", "ntype":"c", "person":"Vai",     "time":"4:15",
> > "msg":"need to retain content"},
> >         {"id":"2_4", "ntype":"c", "person":"Vai",     "time":"4:15",
> > "msg":"It can get annoying"},
> >         {"id":"2_5", "ntype":"c", "person":"Arturas", "time":"4:15",
> > "msg":"You can make all your meetings more access"},
> >         {"id":"2_6", "ntype":"c", "person":"Vai",     "time":"4:16",
> > "msg":"Make every meeting a Skype meeting"},
> >         {"id":"2_7", "ntype":"c", "person":"Arturas", "time":"4:16",
> > "msg":"ciao"}
> >     ]
> > }
> >
> > How would a query look like that has a Hello from Person Arturas and ciao
> > from Person Vai?
> >
> > Cheers,
> > Arturas
> >
> >
> > On Tue, Apr 3, 2018 at 5:21 PM, Arturas Mazeika <maze...@gmail.com>
> wrote:
> >
> > > Hi Mikhail,
> > >
> > > Thanks a lot for the reply.
> > >
> > > You mentioned that
> > >
> > > q=+{!parent which.. v='+text:hello +person:A'} +{!parent
> > > which..v='+text:ciao +person:B'}
> > >
> > > is the way to go. How would it look like precisely for the following
> > > collection?
> > >
> > > {
> > >     "id":1,
> > >     "_childDocuments_":
> > >     [
> > >         {"id":"1_1", "person":"Vai"         , "time":"3:14",
> > > "msg":"Hello"},
> > >         {"id":"1_2", "person":"Arturas"     , "time":"3:14",
> > > "msg":"Hello"},
> > >         {"id":"1_3", "person":"Vai"         , "time":"3:15", "msg":"Coz
> > > Mathias is working on another system- different screen."},
> > >         {"id":"1_4", "person":"Vai"         , "time":"3:15", "msg":"It
> > can
> > > get annoying"},
> > >         {"id":"1_5", "person":"Arturas"     , "time":"3:15",
> "msg":"Thank
> > > you. this is very nice of you"},
> > >         {"id":"1_6", "person":"Vai"         , "time":"3:16",
> > "msg":"ciao"},
> > >         {"id":"1_7", "person":"Arturas"     , "time":"3:16",
> > "msg":"ciao"}
> > >     ]
> > > },
> > > {
> > >     "id":2,
> > >     "_childDocuments_":
> > >     [
> > >         {"id":"2_1", "person":"Vai"         , "time":"4:14",
> > > "msg":"Hello"},
> > >         {"id":"2_2", "person":"Arturas"     , "time":"4:14", "msg":"IBM
> > > Watson"},
> > >         {"id":"2_3", "person":"Vai"         , "time":"4:15",
> "msg":"need
> > > to retain content"},
> > >         {"id":"2_4", "person":"Vai"         , "time":"4:15", "msg":"It
> > can
> > > get annoying"},
> > >         {"id":"2_5", "person":"Arturas"     , "time":"4:15", "msg":"You
> > > can make all your meetings more access"},
> > >         {"id":"2_6", "person":"Vai"         , "time":"4:16",
> "msg":"Make
> > > every meeting a Skype meeting"},
> > >         {"id":"2_7", "person":"Arturas"     , "time":"4:16",
> > "msg":"ciao"}
> > >     ]
> > > }
> > >
> > > Cheers,
> > > Arturas
> > >
> > >
> > > On Tue, Apr 3, 2018 at 4:33 PM, Mikhail Khludnev <m...@apache.org>
> > wrote:
> > >
> > >> Hello, Arturas.
> > >>
> > >> TLDR; Please find inline below.
> > >>
> > >> On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeika <maze...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Solr Fans,
> > >> >
> > >> > I am trying to make sense of information retrieval using expressions
> > >> like
> > >> > "some parent", "*only parent*", " *all parent*". I am also trying to
> > >> > understand the syntax "!parent which" and "!child of". On the
> > technical
> > >> > level, I am reading the following documents:
> > >> >
> > >> > [1]
> > >> > https://lucene.apache.org/solr/guide/7_2/other-parsers.
> > >> > html#block-join-query-parsers
> > >> > [2]
> > >> > https://lucene.apache.org/solr/guide/7_2/uploading-data-
> > >> > with-index-handlers.html#nested-child-documents
> > >> > [3] http://yonik.com/solr-nested-objects/
> > >> >
> > >> > and I am confused to read:
> > >> >
> > >> > This parser takes a query that matches some parent documents and
> > returns
> > >> > their children. The syntax for this parser is: q={!child
> > >> > of=<allParents>}<someParents>. The parameter allParents is a filter
> > that
> > >> > matches *only parent documents*; here you would define the field and
> > >> value
> > >> > that you used to identify *all parent documents*. The parameter
> > >> someParents
> > >> > identifies a query that will match some of the parent documents. The
> > >> output
> > >> > is the children.
> > >> >
> > >> > The first sentence talks about "matching" but does not define what
> > that
> > >> > means (and why it is only some parents matching?). The second
> sentence
> > >> > introduces a syntax of the parser, but blurs the understanding as
> > "some"
> > >> > and "all" of parents are combined into one sentence. My
> understanding
> > is
> > >> > that all documents are retrieve that satisfy a query. The query must
> > >> > express some constraints on the parent node and some on the child
> > node.
> > >> I
> > >> > have a feeling that "only parent documents" reads "criteria is
> > >> formulated
> > >> > over the parent part of {parent document}->{child document} of
> entity.
> > >> > My simplified conceptual world of solr looks in the following way:
> > >> >
> > >> > 1. Every document has an ID.
> > >> > 2. Every document may have additional attributes
> > >> > 3. Text attributes is what's at stake in solr. Sure we can search
> for
> > >> > products that costs at most X, but this is the added functionality.
> > For
> > >> > simplicity I am neglecting those here.
> > >> > 4. The user has an information need. She expresses it with
> (key)words
> > >> and
> > >> > hopes to find matching documents. For simplicity, I am skipping all
> > >> issues
> > >> > related to the information presentation of the documents
> > >> > 5. Analysis chain (and inverse index) are the key technologies solr
> is
> > >> > based upon. Once the chain-processing is applied, mathematical logic
> > >> kicks
> > >> > in, retrieving the documents (that are a set of processed,
> normalized,
> > >> > enriched tokens) matching the query (processed, normalized and
> > enriched
> > >> > tokens). Clearly, the logic function can be a fancy one (at least
> one
> > of
> > >> > query token is in the document set of tokens, etc.), ranking is used
> > to
> > >> > sort the results.
> > >> > 6. A nested document concept is introduced in solr. It needs to be
> > >> uploaded
> > >> > into the index structure using a specific handlers [2]. A nested
> > >> documents
> > >> > is a tree. A root may contain children documents, which may be
> parents
> > >> of
> > >> > grandchildren documents.
> > >> > 7. Querying nested documents is supported in the following manner:
> > >> >     7.1 Child documents are return that satisfies {parent
> > >> > document}->{document}
> > >> >     7.2 Parent documents are return that satisfy {document}->{child
> > >> > document}
> > >> >
> > >> > Would I be very wrong to have this conceptual picture?
> > >> >
> > >> > From this point, the situation is a bit bury in my head. At the
> core,
> > I
> > >> do
> > >> > not really understand what "a document" is anymore (since the
> complete
> > >> json
> > >> > or xml, so is a sub-json and sub-xml are documents, every document
> > must
> > >> > have an ID, does that meant the the subdocuments must have and ID
> too,
> > >> or
> > >> > sub-ids are also fine?), how to formulate mathematical expressions
> > over
> > >> > documents and what it means that the document satisfies my (key)word
> > >> query?
> > >> > Can we define a document to be the largest entity of information
> that
> > >> does
> > >> > not contain any other nested documents [4]? If this is defined and
> > >> > communicated like this already where can I find it? There is a use
> of
> > >> the
> > >> > clarification, as the concept of the document means different things
> > in
> > >> > different contexts (e.g., you can update only the "complete
> document"
> > in
> > >> > the index vs. parent document, etc.).
> > >> >
> > >> > Is it possible to formulate what's going on using mathematical
> logic?
> > >> Can
> > >> > one express something like
> > >> >
> > >> > { give documents d : d is a document, d is parent of document c, d
> > >> > satisfies logical criteria C1,....,CN, c satisfies logical criteria
> > >> > C1',...,CM'}
> > >> > { give documents c : c is a document, d is parent of document c, d
> > >> > satisfies logical criteria C1,....,CN, c satisfies logical criteria
> > >> > C1',...,CM'}
> > >> >
> > >> > here the meaning of document is as in definition [4] above.
> > >> >
> > >> > 1. Is it possible to retrieve all parent documents that have two
> > >> children
> > >> > c1 and c2? Consider a document that is a skype chat, and children
> are
> > >> > individual lines of communication in the chat. I would be looking
> for
> > >> the
> > >> > (parent) documents that have "hello" said by person A and "ciao"
> said
> > by
> > >> > person B (as two different sub-documents).
> > >> >
> > >>
> > >> q=+{!parent which.. v='+text:hello +person:A'} +{!parent which..
> > >> v='+text:ciao +person:B'}
> > >> The query syntax is really tricky and cumbersome.
> > >>
> > >>
> > >> >
> > >> > 2. Is it possible to search for documents such that they have a
> > >> grandchild
> > >> > and the grandchild has the word "hello"?
> > >> >
> > >>
> > >> http://blog-archive.griddynamics.com/2013/12/grandchildren-
> > >> and-siblings-with-block.html
> > >>
> > >>
> > >> >
> > >> > 3. Is it possible to search for documents that do not have children?
> > >> >
> > >> q=-{!parent which..}type:child
> > >> Beware that mixing parents and childfree products is not supported and
> > >> causes pain. as a workaround you need to put empty child placeholder
> > doc.
> > >> Sic. Sorry.
> > >>
> > >>
> > >> > Is this the right venue to discuss documentation of solr?
> > >> >
> > >> > Thanks!
> > >> > Arturas
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Sincerely yours
> > >> Mikhail Khludnev
> > >>
> > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>

Reply via email to