Re: some parent documents

Mikhail Khludnev Tue, 03 Apr 2018 07:33:28 -0700

Hello, Arturas.

TL;DR: please find my answers inline below.


On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeika <maze...@gmail.com> wrote:

> Hi Solr Fans,
>
> I am trying to make sense of information retrieval using expressions like
> "some parent", "*only parent*", " *all parent*". I am also trying to
> understand the syntax "!parent which" and "!child of". On the technical
> level, I am reading the following documents:
>
> [1]
> https://lucene.apache.org/solr/guide/7_2/other-parsers.html#block-join-query-parsers
> [2]
> https://lucene.apache.org/solr/guide/7_2/uploading-data-with-index-handlers.html#nested-child-documents
> [3] http://yonik.com/solr-nested-objects/
>
> and I am confused to read:
>
> This parser takes a query that matches some parent documents and returns
> their children. The syntax for this parser is: q={!child
> of=<allParents>}<someParents>. The parameter allParents is a filter that
> matches *only parent documents*; here you would define the field and value
> that you used to identify *all parent documents*. The parameter someParents
> identifies a query that will match some of the parent documents. The output
> is the children.
>
> The first sentence talks about "matching" but does not define what that
> means (and why it is only some parents matching?). The second sentence
> introduces a syntax of the parser, but blurs the understanding as "some"
> and "all" of parents are combined into one sentence. My understanding is
> that all documents that satisfy a query are retrieved. The query must
> express some constraints on the parent node and some on the child node. I
> have a feeling that "only parent documents" reads as "the criteria are
> formulated over the parent part of the {parent document}->{child document}
> entity".
> My simplified conceptual world of solr looks in the following way:
>
> 1. Every document has an ID.
> 2. Every document may have additional attributes
> 3. Text attributes are what's at stake in solr. Sure we can search for
> products that cost at most X, but this is the added functionality. For
> simplicity I am neglecting those here.
> 4. The user has an information need. She expresses it with (key)words and
> hopes to find matching documents. For simplicity, I am skipping all issues
> related to the information presentation of the documents
> 5. The analysis chain (and the inverted index) are the key technologies solr is
> based upon. Once the chain-processing is applied, mathematical logic kicks
> in, retrieving the documents (that are a set of processed, normalized,
> enriched tokens) matching the query (processed, normalized and enriched
> tokens). Clearly, the logic function can be a fancy one (at least one of
> query token is in the document set of tokens, etc.), ranking is used to
> sort the results.
> 6. A nested document concept is introduced in solr. It needs to be uploaded
> into the index structure using a specific handler [2]. A nested document
> is a tree. A root may contain children documents, which may be parents of
> grandchildren documents.
> 7. Querying nested documents is supported in the following manner:
>     7.1 Child documents are returned that satisfy {parent
> document}->{document}
>     7.2 Parent documents are returned that satisfy {document}->{child
> document}
>
> Would I be very wrong to have this conceptual picture?
>
> From this point, the situation is a bit blurry in my head. At the core, I do
> not really understand what "a document" is anymore (the complete json
> or xml is a document, and so is each sub-json and sub-xml; every document must
> have an ID, so does that mean that the subdocuments must have an ID too, or
> are sub-ids also fine?), how to formulate mathematical expressions over
> documents, and what it means that a document satisfies my (key)word query.
> Can we define a document to be the largest entity of information that does
> not contain any other nested documents [4]? If this is defined and
> communicated like this already, where can I find it? Such a clarification
> would be useful, as the concept of the document means different things in
> different contexts (e.g., you can update only the "complete document" in
> the index vs. parent document, etc.).
>
> Is it possible to formulate what's going on using mathematical logic? Can
> one express something like
>
> { give documents d : d is a document, d is parent of document c, d
> satisfies logical criteria C1,....,CN, c satisfies logical criteria
> C1',...,CM'}
> { give documents c : c is a document, d is parent of document c, d
> satisfies logical criteria C1,....,CN, c satisfies logical criteria
> C1',...,CM'}
>
> here the meaning of document is as in definition [4] above.
>
> 1. Is it possible to retrieve all parent documents that have two children
> c1 and c2? Consider a document that is a skype chat, and children are
> individual lines of communication in the chat. I would be looking for the
> (parent) documents that have "hello" said by person A and "ciao" said by
> person B (as two different sub-documents).
>

q=+{!parent which.. v='+text:hello +person:A'} +{!parent which.. v='+text:ciao +person:B'}

The query syntax is really tricky and cumbersome.
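
Since the syntax is easy to get wrong by hand, here is a minimal Python sketch that assembles such a query string. It is only an illustration: the helper names are made up, and `type:parent` is a hypothetical stand-in for the parent filter elided as `which..` above; substitute whatever field and value mark all parent documents in your index.

```python
from urllib.parse import urlencode

def parent_clause(all_parents: str, child_query: str) -> str:
    """Build one mandatory {!parent} clause (hypothetical helper).

    all_parents must match every parent document and nothing else;
    child_query is the condition a single child has to satisfy.
    """
    # Refuse quotes instead of guessing at escaping rules.
    if '"' in all_parents or "'" in child_query:
        raise ValueError("quotes are not handled in this sketch")
    return '+{!parent which="' + all_parents + '" v=' + "'" + child_query + "'" + "}"

def parents_with_all_children(all_parents, child_queries):
    """Join one clause per required child, so each child may be a different doc."""
    return " ".join(parent_clause(all_parents, cq) for cq in child_queries)

ALL_PARENTS = "type:parent"  # assumption: not taken from the original mail

q = parents_with_all_children(
    ALL_PARENTS,
    ["+text:hello +person:A", "+text:ciao +person:B"],
)
print(q)
print(urlencode({"q": q}))  # URL-encoded form for an HTTP request
```

Each required child gets its own {!parent} clause, which is what lets "hello" and "ciao" come from two different child documents of the same parent.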


>
> 2. Is it possible to search for documents such that they have a grandchild
> and the grandchild has the word "hello"?
>

http://blog-archive.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html


>
> 3. Is it possible to search for documents that do not have children?
>
q=-{!parent which..}type:child
Beware that mixing parents and childfree documents is not supported and
causes pain. As a workaround, you need to add an empty placeholder child doc.
Sic. Sorry.
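
If it helps to think in sets, the three answers above can be modelled roughly as follows. This is a toy in-memory sketch in Python, not Solr code: the `type` and `parent` fields and all function names are assumptions made for the illustration, and Solr itself relies on how the block was indexed rather than on an explicit parent pointer.

```python
# Toy model of block-join semantics over plain dicts (not Solr code).
docs = [
    {"id": "chat1", "parent": None, "type": "parent"},
    {"id": "c1", "parent": "chat1", "type": "child", "person": "A", "text": "hello"},
    {"id": "c2", "parent": "chat1", "type": "child", "person": "B", "text": "ciao"},
    {"id": "chat2", "parent": None, "type": "parent"},
    {"id": "c3", "parent": "chat2", "type": "child", "person": "A", "text": "hello"},
    {"id": "chat3", "parent": None, "type": "parent"},  # a parent without children
]

def is_parent(d):
    """Plays the role of allParents: matches every parent and nothing else."""
    return d["type"] == "parent"

def to_parent(child_pred):
    """Like {!parent}: ids of parents having at least one child matching child_pred."""
    return {d["parent"] for d in docs if not is_parent(d) and child_pred(d)}

def to_child(parent_pred):
    """Like {!child}: ids of children whose parent matches parent_pred (someParents)."""
    some_parents = {d["id"] for d in docs if is_parent(d) and parent_pred(d)}
    return {d["id"] for d in docs if d["parent"] in some_parents}

all_parents = {d["id"] for d in docs if is_parent(d)}

# 1. Parents with a "hello" from A and a "ciao" from B, possibly in different children.
q1 = (to_parent(lambda c: c["text"] == "hello" and c["person"] == "A")
      & to_parent(lambda c: c["text"] == "ciao" and c["person"] == "B"))

# 3. Parents without any child: all parents minus parents that have some child.
q3 = all_parents - to_parent(lambda c: True)

# Children of one selected parent.
kids = to_child(lambda p: p["id"] == "chat2")

print(q1, q3, kids)
```

Note that the model restricts question 3 to parents explicitly by starting from `all_parents`; whether the negative query above needs an extra parent filter in your setup is worth checking against your own index.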


> Is this the right venue to discuss documentation of solr?
>
> Thanks!
> Arturas
>



-- 
Sincerely yours
Mikhail Khludnev
