Re: embedded documents

Erik Hatcher Mon, 25 Aug 2014 07:49:01 -0700

SOLR-6304 flattens a single JSON object into a single Solr document.  See 
Noble’s blog http://searchhub.org/2014/08/12/indexing-custom-json-data/ which 
states:


        split: This parameter is required if you wish to transform the input 
JSON. This is the path at which the JSON must be split. If the entire JSON 
makes a single Solr document, the path must be “/”.
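
To make the quoted parameter concrete, here is a minimal Python sketch that only builds the request URL for the whole-object case (split=/). The host, port, collection name, and the two field mappings are assumptions chosen for illustration; the /update/json/docs path and the f=name:/path mapping syntax follow the blog post linked above, so check them against your Solr version before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint: local Solr with a collection named "collection1".
BASE_URL = "http://localhost:8983/solr/collection1/update/json/docs"


def build_update_url(split="/", field_mappings=()):
    """Build the update URL with a split path and optional f=name:/path mappings."""
    params = [("split", split)]
    # Each mapping becomes its own repeated "f" query parameter.
    params.extend(("f", mapping) for mapping in field_mappings)
    return f"{BASE_URL}?{urlencode(params)}"


# split="/" means the entire JSON object becomes one Solr document.
# The mappings below are hypothetical examples, not taken from the thread.
url = build_update_url("/", ["id:/id", "guid:/guid"])
print(url)
```

The JSON body would then be POSTed to that URL with a JSON content type; sending the request is left out so the sketch runs without a Solr instance.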

The purpose of that issue was exactly this, to make Solr more “approachable” in 
that arbitrary (albeit structured, not random) JSON could be ingested into Solr 
without writing code.  Mission accomplished there :)

Your mention of block/join does pique my curiosity though.  There may need to 
be some additional tweaks to make this JSON loading be able to index things 
just right for that feature.

        Erik
          @ Lucidworks



On Aug 25, 2014, at 6:45 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Thanks, Erik, but... I've read that Jira several times over the past month, 
> and it is far too cryptic for me to make any sense out of what it is really 
> trying to do. A simpler approach is clearly needed.
> 
> My perception of SOLR-6304 is not that it indexes a single JSON object as a 
> single Solr document, but that it generates a collection of separate 
> documents, somewhat analogous to Lucene block/child documents, but... not 
> quite.
> 
> I understood the request on this message thread to be the flattening of a 
> single nested JSON object to a single Solr document.
> 
> IMHO, we need to be trying to make Solr more automatic and more approachable, 
> not an even more complicated "toolkit".
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Erik Hatcher
> Sent: Monday, August 25, 2014 9:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: embedded documents
> 
> Jack et al - there’s now this, which is available in the any-minute release 
> of Solr 4.10: https://issues.apache.org/jira/browse/SOLR-6304
> 
> Erik
> 
> On Aug 25, 2014, at 5:01 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> 
>> That's a completely different concept, I think - the ability to return a 
>> single field value as a structured JSON object in the "writer", rather than 
>> simply "loading" from a nested JSON object and distributing the key values 
>> to normal Solr fields.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Bill Bell
>> Sent: Sunday, August 24, 2014 7:30 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: embedded documents
>> 
>> See my Jira. It supports it via json.fsuffix=_json&wt=json
>> 
>> http://mail-archives.apache.org/mod_mbox/lucene-dev/201304.mbox/%3CJIRA.12641293.1365394604231.125944.1365397875874@arcas%3E
>> 
>> Bill Bell
>> Sent from mobile
>> 
>> 
>>> On Aug 24, 2014, at 6:43 AM, "Jack Krupansky" <j...@basetechnology.com> 
>>> wrote:
>>> 
>>> Indexing and query of raw JSON would be a valuable addition to Solr, so 
>>> maybe you could simply explain more precisely your data model and 
>>> transformation rules. For example, when multi-level nesting occurs, what 
>>> does your loader do?
>>> 
>>> Maybe if the field names were derived by concatenating the full path of 
>>> JSON key names, like titles_json.FR, field_naming nesting could be handled 
>>> in a fully automated manner.
>>> 
>>> I had been thinking of filing a Jira proposing exactly that, so that even 
>>> the most deeply nested JSON maps could be supported, although combinations 
>>> of arrays and maps would be problematic.
>>> 
>>> -- Jack Krupansky
>>> 
>>> -----Original Message----- From: Michael Pitsounis
>>> Sent: Wednesday, August 20, 2014 7:14 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: embedded documents
>>> 
>>> Hello everybody,
>>> 
>>> I had a requirement to store complicated json documents in solr.
>>> 
>>> I have modified the JsonLoader to accept complicated json documents with
>>> arrays/objects as values.
>>> 
>>> It stores the object/array, then flattens it and indexes the fields.
>>> 
>>> e.g. a basic example document
>>> 
>>> {
>>>     "titles_json":{"FR":"This is the FR title" , "EN":"This is the EN
>>> title"} ,
>>>     "id": 1000003,
>>>     "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
>>> }
>>> 
>>> It will store titles_json:{"FR":"This is the FR title" , "EN":"This is the
>>> EN title"}
>>> and then index fields
>>> 
>>> titles.FR:"This is the FR title"
>>> titles.EN:"This is the EN title"
>>> 
>>> 
>>> Do you see any problems with this approach?
>>> 
>>> 
>>> 
>>> Regards,
>>> Michael Pitsounis
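
The flattening discussed in this thread can be sketched in a few lines of Python. This is an illustration only, not the modified JsonLoader from the original message: the function name flatten is hypothetical, field names are built by concatenating the full key path (the titles_json.FR naming suggested in the reply, rather than the titles.FR naming in the original message), and arrays are left untouched as values because the thread notes that combinations of arrays and maps are problematic.

```python
import json


def flatten(obj, prefix="", sep="."):
    """Flatten nested JSON objects into one dict keyed by the full key path."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects, extending the path.
            flat.update(flatten(value, path, sep))
        else:
            # Scalars (and arrays, by assumption) are kept as-is.
            flat[path] = value
    return flat


# The example document from the original message.
doc = json.loads("""
{
    "titles_json": {"FR": "This is the FR title", "EN": "This is the EN title"},
    "id": 1000003,
    "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
}
""")

fields = flatten(doc)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Because the path is built recursively, deeper nesting simply yields longer names such as a.b.c, which is what makes the naming fully automatic for maps.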
