Re: index new discovered fields of different types

Jan Høydahl Mon, 10 Jul 2017 02:14:22 -0700

I think Thaer’s answer clarifies how they do it.
When they assemble the full Solr doc to index, it may contain a field name 
not known in advance. But as I understand it, the RDF source carries 
information on the type (otherwise they could not map to a dynamic field 
either), so adding a field to the managed schema on the fly once an unknown 
field is detected should work just fine!
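
To make that concrete, here is a minimal Python sketch of the detect-and-add 
flow using the Schema API (GET .../schema/fields to list fields, POST 
.../schema with an "add-field" command to add them). It is an illustration 
under assumptions, not Thaer's actual code: the base URL, the helper names, 
and the XSD-to-Solr type mapping are all hypothetical, and the field type 
names ("pint", "pdouble", "pdate", ...) must match whatever your own schema 
defines.

```python
import json
import urllib.request

# Assumed mapping from RDF literal datatypes to Solr field type names.
# The type names on the right are placeholders; use the ones in your schema.
XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_TO_SOLR_TYPE = {
    XSD + "integer": "pint",
    XSD + "double": "pdouble",
    XSD + "dateTime": "pdate",
    XSD + "boolean": "boolean",
    XSD + "string": "string",
}


def solr_type_for(datatype_iri, default="string"):
    """Pick a Solr field type for an RDF datatype, falling back to a default."""
    return XSD_TO_SOLR_TYPE.get(datatype_iri, default)


def fields_to_add(known_fields, doc_field_types):
    """Build add-field definitions for doc fields the schema does not know yet."""
    return [
        {"name": name, "type": field_type, "stored": True}
        for name, field_type in sorted(doc_field_types.items())
        if name not in known_fields
    ]


def fetch_known_fields(base_url):
    """Step 1: ask Solr which fields are defined (GET <collection>/schema/fields)."""
    with urllib.request.urlopen(base_url + "/schema/fields?wt=json") as resp:
        payload = json.load(resp)
    return {field["name"] for field in payload["fields"]}


def add_fields(base_url, definitions):
    """Step 2: add missing fields in one POST to <collection>/schema."""
    if not definitions:
        return
    body = json.dumps({"add-field": definitions}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/schema",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        resp.read()
```

In the Kafka consumer you would call something like 
fetch_known_fields("http://localhost:8983/solr/mycollection") (a made-up 
URL), pass the result and the field types of the incoming doc to 
fields_to_add(), send the outcome to add_fields(), and only then push the 
document. Caching the set of known fields between messages avoids one schema 
lookup per document.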


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 10. jul. 2017 kl. 02.08 skrev Rick Leir <rl...@leirtech.com>:
> 
> Jan
> 
> I hope this is not off-topic, but I am curious: if you do not use the three 
> fields, subject, predicate, and object for indexing RDF
> then what is your algorithm? Maybe document nesting is appropriate for this? 
> cheers -- Rick
> 
> 
> On 2017-07-09 05:52 PM, Jan Høydahl wrote:
>> Hi,
>> 
>> I have personally written a Python script to parse RDF files into an 
>> in-memory graph structure and then pull data from that structure to index to 
>> Solr.
>> I.e. you may perfectly well have RDF (nt, turtle, whatever) as source but 
>> index substructures in very specific ways.
>> Anyway, as Erick points out, that is probably the place in your code where 
>> you should use the Managed Schema REST API in order to
>> 1. Query Solr for what fields are defined
>> 2. If you need to index a field that is not yet in Solr, add it, using the 
>> correct field type (your app should know)
>> 3. Push the data
>> 4. Repeat
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> 8. jul. 2017 kl. 02.36 skrev Rick Leir <rl...@leirtech.com>:
>>> 
>>> Thaer
>>> Whoa, hold everything! You said RDF, meaning resource description 
>>> framework? If so, you have exactly three fields: subject, predicate, and 
>>> object. Maybe they are text type, or for exact matches you might want 
>>> string fields. Add an ID field, which could be automatically generated by 
>>> Solr, so now you have four fields. Or am I on a tangent again? Cheers -- 
>>> Rick
>>> 
>>> On July 7, 2017 6:01:00 AM EDT, Thaer Sammar <t.sam...@geophy.com> wrote:
>>>> Hi Jan,
>>>> 
>>>> Thanks! I am exploring the schemaless option based on Furkan's
>>>> suggestion. I need the flexibility because not all fields are known. We
>>>> get the data from an RDF database (which changes continuously). To be more
>>>> specific, we have a database and all changes to it are sent to a Kafka
>>>> queue, and we have a consumer which listens to the queue and updates the
>>>> Solr index.
>>>> 
>>>> regards,
>>>> Thaer
>>>> 
>>>> On 7 July 2017 at 10:53, Jan Høydahl <jan....@cominvent.com> wrote:
>>>> 
>>>>> If you do not need the flexibility of dynamic fields, don’t use them.
>>>>> Sounds to me that you really want a field “price” to be float and a
>>>>> field “birthdate” to be of type date etc.
>>>>> If so, simply create your schema (either manually, through Schema API
>>>>> or using schemaless) up front and index each field as correct type
>>>>> without messing with field name prefixes.
>>>>> 
>>>>> --
>>>>> Jan Høydahl, search solution architect
>>>>> Cominvent AS - www.cominvent.com
>>>>> 
>>>>>> 5. jul. 2017 kl. 15.23 skrev Thaer Sammar <t.sam...@geophy.com>:
>>>>>> 
>>>>>> Hi,
>>>>>> We are trying to index documents of different types. Documents have
>>>>>> different fields. Fields are known at indexing time. We run a query on
>>>>>> a database and we index what comes back, using query variables as field
>>>>>> names in Solr. Our current solution: we use dynamic fields with a
>>>>>> prefix, for example feature_i_*. The issues with that:
>>>>>> 1) we need to define the type of the dynamic field, and to be able to
>>>>>> cover the types of discovered fields we define the following:
>>>>>> feature_i_* for integers, feature_t_* for strings, feature_d_* for
>>>>>> doubles, ....
>>>>>> 1.a) this means we need to check the type of the discovered field and
>>>>>> then put it in the corresponding dynamic field
>>>>>> 2) at search time, we need to know the right prefix
>>>>>> We are looking for help to find a way to ignore the prefix and the
>>>>>> check of the type
>>>>>> regards,
>>>>>> Thaer
>>>>> 
>>> -- 
>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>
