Hello, Arturas. TL;DR: please find my answers inline below.

On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeika <maze...@gmail.com> wrote:

> Hi Solr Fans,
>
> I am trying to make sense of information retrieval using expressions like
> "some parent", "*only parent*", "*all parent*". I am also trying to
> understand the syntax "!parent which" and "!child of". On the technical
> level, I am reading the following documents:
>
> [1] https://lucene.apache.org/solr/guide/7_2/other-parsers.html#block-join-query-parsers
> [2] https://lucene.apache.org/solr/guide/7_2/uploading-data-with-index-handlers.html#nested-child-documents
> [3] http://yonik.com/solr-nested-objects/
>
> and I am confused to read:
>
> This parser takes a query that matches some parent documents and returns
> their children. The syntax for this parser is:
> q={!child of=<allParents>}<someParents>. The parameter allParents is a
> filter that matches *only parent documents*; here you would define the
> field and value that you used to identify *all parent documents*. The
> parameter someParents identifies a query that will match some of the
> parent documents. The output is the children.
>
> The first sentence talks about "matching" but does not define what that
> means (and why is it only some parents matching?). The second sentence
> introduces the syntax of the parser, but blurs the understanding, as "some"
> and "all" parents are combined into one sentence. My understanding is
> that all documents that satisfy a query are retrieved. The query must
> express some constraints on the parent node and some on the child node. I
> have a feeling that "only parent documents" reads "the criteria are
> formulated over the parent part of the {parent document}->{child document}
> entity".
>
> My simplified conceptual world of solr looks the following way:
>
> 1. Every document has an ID.
> 2. Every document may have additional attributes.
> 3. Text attributes are what is at stake in solr. Sure, we can search for
> products that cost at most X, but this is added functionality. For
> simplicity I am neglecting those here.
> 4. The user has an information need. She expresses it with (key)words and
> hopes to find matching documents. For simplicity, I am skipping all issues
> related to the presentation of the documents.
> 5. The analysis chain (and the inverted index) are the key technologies
> solr is based upon. Once the chain processing is applied, mathematical
> logic kicks in, retrieving the documents (that are a set of processed,
> normalized, enriched tokens) matching the query (processed, normalized and
> enriched tokens). Clearly, the logic function can be a fancy one (at least
> one of the query tokens is in the document's set of tokens, etc.); ranking
> is used to sort the results.
> 6. A nested document concept is introduced in solr. It needs to be uploaded
> into the index structure using specific handlers [2]. A nested document
> is a tree. A root may contain child documents, which may be parents of
> grandchild documents.
> 7. Querying nested documents is supported in the following manner:
> 7.1 Child documents are returned that satisfy
> {parent document}->{document}
> 7.2 Parent documents are returned that satisfy
> {document}->{child document}
>
> Would I be very wrong to have this conceptual picture?
>
> From this point, the situation is a bit blurry in my head. At the core, I
> do not really understand what "a document" is anymore (since the complete
> JSON or XML, as well as a sub-JSON and sub-XML, are documents; every
> document must have an ID, so does that mean that the subdocuments must
> have an ID too, or are sub-IDs also fine?), how to formulate mathematical
> expressions over documents, and what it means that the document satisfies
> my (key)word query.
> Can we define a document to be the largest entity of information that does
> not contain any other nested documents [4]? If this is defined and
> communicated like this already, where can I find it? Such a clarification
> would be useful, as the concept of the document means different things in
> different contexts (e.g., you can update only the "complete document" in
> the index vs. parent document, etc.).
>
> Is it possible to formulate what's going on using mathematical logic? Can
> one express something like
>
> { give documents d : d is a document, d is parent of document c, d
> satisfies logical criteria C1,....,CN, c satisfies logical criteria
> C1',...,CM'}
> { give documents c : c is a document, d is parent of document c, d
> satisfies logical criteria C1,....,CN, c satisfies logical criteria
> C1',...,CM'}
>
> here the meaning of document is as in definition [4] above.
>
> 1. Is it possible to retrieve all parent documents that have two children
> c1 and c2? Consider a document that is a skype chat, and children are
> individual lines of communication in the chat. I would be looking for the
> (parent) documents that have "hello" said by person A and "ciao" said by
> person B (as two different sub-documents).
>

q=+{!parent which.. v='+text:hello +person:A'} +{!parent which.. v='+text:ciao +person:B'}

The query syntax is really tricky and cumbersome.

>
> 2. Is it possible to search for documents such that they have a grandchild
> and the grandchild has the word "hello"?
>

http://blog-archive.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html

>
> 3. Is it possible to search for documents that do not have children?
>

q=-{!parent which..}type:child

Beware that mixing parents and childfree products is not supported and
causes pain. As a workaround, you need to add an empty placeholder child
doc. Sic. Sorry.

> Is this the right venue to discuss documentation of solr?
>
> Thanks!
> Arturas
>

--
Sincerely yours
Mikhail Khludnev
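
P.S. On the "all parents" vs. "some parents" confusion and the question about set notation: below is a toy Python model of what the two parsers compute. It is not Solr code and makes no claim about Solr internals beyond one assumption: documents of a block are stored next to each other, with the parent right after its children, so the allParents filter is what marks where each block ends. The field names (type, person, text), the ids, and the helper names blocks, parent_which and child_of are all made up for the chat example, and only two levels (parent and children) are modelled.

```python
# Toy model of block-join semantics. NOT Solr code; every name here is
# hypothetical and only two levels (parent/children) are modelled.

# Flat "index": each parent is stored right after its own children.
DOCS = [
    {"id": "c1", "type": "line", "person": "A", "text": "hello"},
    {"id": "c2", "type": "line", "person": "B", "text": "ciao"},
    {"id": "chat1", "type": "chat"},
    {"id": "c3", "type": "line", "person": "A", "text": "hello"},
    {"id": "c4", "type": "line", "person": "B", "text": "hello"},
    {"id": "chat2", "type": "chat"},
]


def is_parent(doc):
    """The allParents filter: must match every parent and nothing else."""
    return doc["type"] == "chat"


def blocks(docs, all_parents):
    """Split the flat list into (parent, children) pairs."""
    children = []
    for doc in docs:
        if all_parents(doc):
            yield doc, children
            children = []
        else:
            children.append(doc)


def parent_which(docs, all_parents, some_children):
    """{ d : d is a parent, some child c of d satisfies some_children }"""
    return [p["id"] for p, kids in blocks(docs, all_parents)
            if any(some_children(k) for k in kids)]


def child_of(docs, all_parents, some_parents):
    """{ c : c is a child of a parent d, d satisfies some_parents }"""
    return [k["id"] for p, kids in blocks(docs, all_parents)
            if some_parents(p) for k in kids]


# Question 1: chats where A said "hello" AND B said "ciao" (two different
# child documents), i.e. the intersection of two parent queries.
hello_by_a = parent_which(
    DOCS, is_parent,
    lambda d: d.get("text") == "hello" and d.get("person") == "A")
ciao_by_b = parent_which(
    DOCS, is_parent,
    lambda d: d.get("text") == "ciao" and d.get("person") == "B")
both = [p for p in hello_by_a if p in ciao_by_b]

# The opposite direction: all lines of one particular chat.
lines_of_chat2 = child_of(DOCS, is_parent, lambda d: d["id"] == "chat2")

print(both)            # ['chat1']
print(lines_of_chat2)  # ['c3', 'c4']
```

In this model, "some parents" (or "some children") is simply the user's actual query, while "all parents" only tells the parser how to cut the index into blocks. If is_parent missed a parent or matched a child, blocks() would glue or split families incorrectly, which is the toy-model reason the filter has to match all parents and only parents.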