Maybe if I were to say that the column "user_id" will become "user_ids" that would clarify things?

user_id:2002+AND+created:[${**from}+TO+${until}]+data:"more"

becomes

user_id*s*:2002+AND+created:[${**from}+TO+${until}]+data:"more"

where I want 2002 to be an exact positive match on one of the user_ids embedded in the TEXT ... not string :)

If I am totally off or making no sense, feedback is very welcome. I am just seeing lots of similar data going into my db and it feels like Solr should be able to handle this. I just want to know if transforming the data like that will still allow exact searches against a user_id. My language from a Solr guru's point of view is probably *very* poorly phrased ... "exact" and TEXT might not go hand in hand.

Is the TEXT "20 1442 35" parsed as "20" "1442" "35" so that a search against it for "1442" will yield "exact" results? A search against "442" won't match, right?

1. "20 1442 35"
2. "20 442 35"
3. "20 1442"

user_ids:1442 -> yields #1 & #3 always?
user_ids:442 -> yields only #2 always?

My lack of understanding about what Solr does when it indexes is shining through :)

On Fri, Jun 7, 2013 at 1:43 PM, z z <zenlok.testi...@gmail.com> wrote:

> My language might be a bit off (I am saying "string" when I probably mean
> "text" in the context of solr), but I'm pretty sure that my story is
> unwavering ;)
>
> `id` int(11) NOT NULL AUTO_INCREMENT
> `created` int(10)
> `data` varbinary(255)
> `user_id` int(11)
>
> So, imagine that we have 1000 entries come in where "data" above is
> exactly the same for all 1000 entries, but user_id is different (id and
> created being different is irrelevant). I am thinking that prior to
> inserting into mysql, I should be able to concatenate the user_ids together
> with whitespace and then insert them into something like:
>
> `id` int(11) NOT NULL AUTO_INCREMENT
> `created` int(10)
> `data` varbinary(255)
> `user_id` blob
>
> Then on solr's end it will treat the user_id as Text and parse it (I want
> to say tokenize, but maybe my language is incorrect here?).
>
> Then when I search
>
> user_id:2002+AND+created:[${**from}+TO+${until}]+data:"more"
>
> I want to be sure that if I look for user_id "2002", I will get data that
> only has a value "2002" in the user_id column and that a separate user with
> id "20" cannot accidentally pull data for user_id "2002" as a result of a
> fuzzy (my language ok?) match of 20 against (20)02.
>
> Current schema definition:
>
> <field name="user_id" type="int" indexed="true" stored="true"/>
>
> New schema definition:
>
> <field name="user_id" type="user_id_string" indexed="true"
> stored="true"/>
> ...
> <fieldType name="user_id_string" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer>
> <tokenizer class="solr.WhitespaceTokenizerFactory"
> maxTokenLength="120"/>
> </analyzer>
> </fieldType>
>
>
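
To make the tokenization question concrete, here is a minimal sketch in plain Python of how I picture a whitespace-tokenized field behaving. It is a toy inverted index, not Solr's actual implementation. The function names (whitespace_tokenize, build_index, search) are made up for illustration, and it assumes the field is split on whitespace only, with no further filters applied to the tokens.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace (assumed stand-in for a whitespace tokenizer)."""
    return text.split()


def build_index(docs):
    """Map each whole token to the set of document numbers that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in whitespace_tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, term):
    """Exact term lookup: only whole tokens match, never substrings."""
    return sorted(index.get(term, set()))


# The three example values from the question, keyed by their list number.
docs = {1: "20 1442 35", 2: "20 442 35", 3: "20 1442"}
index = build_index(docs)

print(search(index, "1442"))  # [1, 3]
print(search(index, "442"))   # [2]
print(search(index, "44"))    # []
```

In this toy model "442" never matches inside "1442", because the lookup compares whole tokens only. Whether a real Solr field behaves the same way depends on the analyzer chain actually configured for it.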