I think I'm just going to have to have my partner and me play with both cores
and dynamic fields.

If multiple cores are queried, and the schemas match up in order and position
for the base fields, do the 'extra' fields in the different cores just show up
in the result set with their field names? The query against different cores,
with 'base attributes' and 'extended attributes', has to be tailored for each
core, right? I.e., not querying for fields that don't exist?

(That could be handled by making the query a server-side language object with
inheritance for the extended fields.)
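A rough sketch of that idea in Python (class names, field names, and the core name are invented for illustration, not taken from any real schema): a base query object knows only the shared fields, and a per-core subclass adds that core's extended fields, so no core is ever asked for fields it doesn't have.

```python
# Hypothetical sketch of 'query as server-side object with inheritance'.
# All names here are made up; nothing is a real Solr API.

class BaseQuery:
    # fields every core's schema is assumed to share
    base_fields = ["id", "name", "created"]

    def __init__(self, core):
        self.core = core

    def fields(self):
        return list(self.base_fields)

    def to_params(self, q="*:*"):
        # build the query parameters, requesting only fields known
        # to exist on this core
        return {"q": q, "fl": ",".join(self.fields())}

class DivisionQuery(BaseQuery):
    # extended fields unique to one derived object type / core
    extended_fields = ["division_code", "region"]

    def fields(self):
        return super().fields() + self.extended_fields

params = DivisionQuery("division_core").to_params()
# params["fl"] == "id,name,created,division_code,region"
```

Each subclass then tailors its own field list, which is exactly the per-core tailoring asked about above.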

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so you do not have to make them
yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message ----
From: Lance Norskog <goks...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Wed, December 22, 2010 1:45:04 PM
Subject: Re: Recap on derived objects in Solr Index, 'schema in a can'

A dynamic field just means that the schema allows any field with a
name matching the wildcard. That's all.

There is no support for referring to all of the existing fields in the
wildcard. That is, there is no support for "*_en:word" as a field
search. Nor is there any kind of grouping for facets. The feature for
addressing a particular field in some of the parameters does not
support wildcards. If you add wildcard fields, you have to remember
what they are.
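That "you have to remember what they are" part could be sketched application-side like this (a hypothetical registry kept by your indexing code, not a Solr feature; field names are invented): record each concrete dynamic-field name as you index it, then expand the unsupported "*_en:word" yourself into an explicit OR over the names you know exist.

```python
# Application-side workaround sketch: Solr accepts any field matching the
# wildcard (say "*_en") but won't search "*_en" itself, so the app keeps
# its own list of the concrete names it has used.

from fnmatch import fnmatch

class DynamicFieldRegistry:
    def __init__(self, pattern):
        self.pattern = pattern          # e.g. "*_en"
        self.seen = set()               # concrete field names actually indexed

    def record(self, field_name):
        # remember a field name at indexing time, if it matches the wildcard
        if fnmatch(field_name, self.pattern):
            self.seen.add(field_name)
            return True
        return False

    def expand_query(self, word):
        # turn the unsupported "*_en:word" into an explicit OR over the
        # field names we know exist
        return " OR ".join(f"{name}:{word}" for name in sorted(self.seen))

reg = DynamicFieldRegistry("*_en")
reg.record("title_en")
reg.record("body_en")
reg.record("price")                     # ignored: doesn't match the wildcard
# reg.expand_query("word") -> "body_en:word OR title_en:word"
```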

On Wed, Dec 22, 2010 at 11:04 AM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> I'm open to cores, if it's the faster (indexing/querying/keeping mentally
> straight) way to do things.
>
> But from what you say below, the eventual goal of the site would mean either
> 100 extra 'generic' fields, or 1,000-100,000s of cores.
> Probably cores are easier to administer for security and give more accurate
> querying?
>
> What is the relationship between dynamic fields and the schema?
>
>  Dennis Gearon
>
>
>
>
>
> ----- Original Message ----
> From: Erick Erickson <erickerick...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Wed, December 22, 2010 10:44:27 AM
> Subject: Re: Recap on derived objects in Solr Index, 'schema in a can'
>
> No, one cannot ignore the schema. If you try to add a field not in the
> schema you get
> an error. One could, however, use any arbitrary subset
> of the fields defined in the schema for any particular #document# in the
> index. Say
> your schema had fields f1, f2, f3...f10. You could have fields f1-f5 in one
> doc, and
> fields f6-f10 in another and f1, f4, f9 in another and.....
>
> The only field(s) that #must# be in a document are the required="true"
> fields.
>
> There's no real penalty for omitting fields from particular documents. This
> allows
> you to store "special" documents that aren't part of normal searches.
>
> You could, for instance, use a document to store meta-information about your
> index that had whatever meaning you wanted in a field(s) that *no* other
> document
> had. Your app could then read that "special" document and make use of that
> info.
> Searches on "normal" documents wouldn't return that doc, etc.
>
> You could effectively have N indexes contained in one index where a document
> in each logical sub-index had fields disjoint from the other logical
> sub-indexes.
> Why you'd do something like that rather than use cores is a very good
> question,
> but you #could# do it that way...
>
> All this is much different from a database where there are penalties for
> defining
> a large number of unused fields.
>
> Whether doing this is wise or not given the particular problem you're trying
> to
> solve is another discussion <G>..
>
> Best
> Erick
>
> On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
>
>> Based on more searches and manual consolidation, I've put together some of
>> the ideas for this already suggested in a summary below. The last item in
>> the summary seems to be an interesting, low-technical-cost way of doing it.
>>
>> Basically, it treats the index like a 'BigTable', a la "No SQL".
>>
>> Erick Erickson pointed out:
>> "...but there's absolutely no requirement
>> that all documents in SOLR have the same fields..."
>>
>> I guess I don't have the right understanding of what goes into a Document
>> in Solr. Is it just a set of fields, each with its own independent field
>> type declaration/id, its name, and its content?
>>
>> So even though there's a schema for an index, one could ignore it and
>> just throw any other named fields, types, and content at document addition
>> time?
>>
>> So if I wanted to search on a base set, all documents having it, I could
>> then additionally filter based on the (might be wrong use of this) dynamic
>> fields?
>>
>>
>>
>>
>>
>>
>> Original Thread that I started:
>> ----------------------------------------
>>
>>http://lucene.472066.n3.nabble.com/A-schema-inside-a-Solr-Schema-Schema-in-a-can-tt2103260.html
>>
>>------------------------------------------------------------------------------
>>
>> Repeat of the problem (not actual ratios/numbers, i.e. it could be WORSE!):
>>
>>------------------------------------------------------------------------------
>>
>>
>> 1/ Base object of some kind, x number of fields
>> 2/ Derived objects representing Divisions in a company, different customer
>>       bases, etc., each having 2 additional, unique fields.
>> 3/ Assume 1000 such derived object types
>> 4/ A 'flattened' Index would have the x base object fields,
>>    ****and 2000**** additional fields
>>
>>
>> ================================================
>> Solutions Posited
>> -----------------------
>>
>> A/ First thought, multi-value columns as key pairs.
>>      1/ Difficult to access individual items of more than one 'word' length
>>             for querying in multivalued fields.
>>      2/ All sorts of statistical stuff probably wouldn't apply?
>>      3/ (James Dayer said:) There's also one "gotcha" we've experienced when
>>             searching across multi-valued fields: SOLR will match across
>>             field occurrences. In the example below, if you were to search
>>             q=contrib_name:(james AND smith), you will get this record back.
>>             It matches one name from one contributor and another name from a
>>             different contributor. This is not what our users want.
>>
>>             As a work-around, I am converting these to phrase queries with
>>             slop: "james smith"~50 ... Just use a slop # smaller than your
>>             positionIncrementGap and bigger than the # of terms entered.
>>             This will prevent the cross-field matches yet allow the words to
>>             occur in any order.
>>
>>             The problem with this approach is that Lucene doesn't support
>>             wildcards in phrases.
>> B/ Dynamic fields were suggested, but I am not sure exactly how they work,
>>        and the person who suggested it was not sure it would work, either.
>> C/ Different field naming conventions were suggested where field types were
>>        similar. I can't predict that.
>> D/ Found this old thread, and it had other suggestions:
>>       1/ Use multiple cores, one for each record type/schema, and aggregate
>>             them during the query.
>>       2/ Use a fixed number of additional fields x 2. Each additional field
>>             is actually a pair of fields: the first of the pair gives the
>>             column name, the second gives the data.
>>            a) Although I like this, I wonder how many extra fields to use,
>>            b) it was pointed out that relevancy and other statistical
>>                  criteria for queries might suffer.
>>       3/ Index the different objects exactly as they are, i.e. as Erick
>>             Erickson said:
>>           "I'm not entirely sure this is germane, but there's absolutely no
>>           requirement that all documents in SOLR have the same fields. So
>>           it's possible for you to index the "wildly different content" in
>>           "wildly different fields" <G>. Then searching for screen:LCD
>>           would be straightforward."...
>> Dennis Gearon
>>
>>
>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com
