I would focus on this:

> 5> now kick off the DIH job and look again.
>
> Now it shows a histogram, but most of the "terms" are long -- the full
> texts of (the table.column) eventlogtext.logtext, including the whitespace
> (with %0A used for newline characters)... So, it appears it is not being
> tokenized properly, correct?

Can you open the schema.xml from your Solr UI and show us the snippet for
the field that does not seem to be tokenised?
Can you also show us the related schema browser page (even a screenshot is
fine)?
Could it be an encoding problem?
Following Erick's suggestion about the analysis page, what are your results?

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:

> typically, the index dir is inside the data dir. Delete the index dir
> and you should be good. If there is a tlog next to it, you might want to
> delete that also.
>
> If you dont have a data dir, i wonder whether you set the data dir when
> creating your core or collection. Typically the instance dir and data
> dir aren't needed.
>
> Upayavira
>
> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > OK, this is bizarre. You'd have had to set up SolrCloud by specifying the
> > -zkRun command when you start Solr or the -zkHost; highly unlikely. On the
> > admin page there would be a "cloud" link on the left side, I really doubt
> > one's there.
> >
> > You should have a data directory, it should be the parent of the index and
> > tlog directories. As of sanity check try looking at the analysis page. Type
> > a bunch of words in the left hand side indexing box and uncheck the verbose
> > box. As you can tell I'm grasping at straws. I'm still puzzled why you
> > don't have a "data" directory here, but that shouldn't really matter. How
> > did you create this index? I don't mean data import handler more how did
> > you create the core that you're indexing to?
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers <mark.fenb...@noaa.gov>
> > wrote:
> >
> > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > >
> > >> Then my next guess is you're not pointing at the index you think you are
> > >> when you 'rm -rf data'
> > >>
> > >> Just ignore the Elall field for now I should think, although get rid of it
> > >> if you don't think you need it.
> > >>
> > >> DIH should be irrelevant here.
> > >>
> > >> So let's back up.
> > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > >>
> > > I have no "data" dir. Did you mean "index" dir? I removed 3 index
> > > directories (2 for spelling):
> > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > >
> > >> 2> start Solr
> > >> 3> do NOT re-index.
> > >> 4> look at your index via the schema-browser. Of course there should be
> > >> nothing there!
> > >>
> > > Correct! It said "there is no term info :("
> > >
> > >> 5> now kick off the DIH job and look again.
> > >>
> > > Now it shows a histogram, but most of the "terms" are long -- the full
> > > texts of (the table.column) eventlogtext.logtext, including the whitespace
> > > (with %0A used for newline characters)... So, it appears it is not being
> > > tokenized properly, correct?
> > >
> > >> Your logtext field should have only single tokens. The fact that you have
> > >> some very long tokens presumably with whitespace) indicates that you
> > >> aren't really blowing the index away between indexing.
> > >>
> > > Well, I did this time for sure. I verified that initially, because it
> > > showed there was no term info until I DIH'd again.
> > >
> > >> Are you perhaps in Solr Cloud with more than one replica?
> > >>
> > > Not that I know of, but being new to Solr, there could be things going on
> > > that I'm not aware of. How can I tell? I certainly didn't set anything up
> > > for solrCloud deliberately.
> > >
> > >> In that case you
> > >> might be getting the index replicated on startup assuming you didn't
> > >> blow away all replicas. If you are in SolrCloud, I'd just delete the
> > >> collection and start over, after insuring that you'd pushed the
> > >> configset up to Zookeeper.
> > >>
> > >> BTW, I always look at the schema.xml file from the Solr admin window
> > >> just as a sanity check in these situations.
> > >>
> > > Good idea! But the one shown in the browser is identical to the one I've
> > > been editing! So that's not an issue.
> > >
> >
>

--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England