I would focus on this:

"

> 5> now kick off the DIH job and look again.
>
Now it shows a histogram, but most of the "terms" are long -- the full
texts of (the table.column) eventlogtext.logtext, including the whitespace
(with %0A used for newline characters)...  So, it appears it is not being
tokenized properly, correct?"
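To make that symptom concrete, here is a toy model in Python. It is my own illustration, not Solr's analysis code. An untokenized field (for example one of class solr.StrField) indexes the whole value as a single term, while even a plain whitespace tokenizer splits it into words. The sample text is invented. The last line only shows that %0A is the percent-encoding of a newline, which is presumably how the UI rendered the newlines inside those long terms.

```python
from urllib.parse import quote

# Toy model only (not Solr code): how the same text turns into index terms.

def terms_untokenized(value):
    # A string-type field (e.g. class solr.StrField) keeps the whole value as ONE term.
    return [value]

def terms_whitespace(value):
    # Rough stand-in for a whitespace tokenizer followed by a lowercase filter.
    return [token.lower() for token in value.split()]

# Invented sample text containing a newline, like a multi-line log entry.
sample = "Pump failure at station 3\nOperator notified"

print(terms_untokenized(sample))  # one long term, newline included
print(terms_whitespace(sample))   # seven short terms
print(quote("\n"))                # %0A, the percent-encoded newline
```

If the schema browser looks like the first list rather than the second, the field type is the first thing I would check.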
Can you open the schema.xml from your Solr UI and show us the snippet
for the field that does not seem to be tokenized?
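If it is easier than copying by hand, here is a rough Python sketch that pulls the definition out of schema.xml: it finds the field, follows its type attribute to the matching fieldType, and lists the class and any tokenizers. The function name and the sample schema are hypothetical. I made logtext a string field in the sample only to show what an untokenized definition looks like, not because I know that is your setup.

```python
import xml.etree.ElementTree as ET

def describe_field(schema_xml, field_name):
    """Return the type name, type class and tokenizer classes for one field."""
    root = ET.fromstring(schema_xml)
    # <field> may sit directly under <schema> or inside <fields>; iter() finds both.
    field = next((f for f in root.iter("field") if f.get("name") == field_name), None)
    if field is None:
        raise KeyError("no <field> named %r" % field_name)
    type_name = field.get("type")
    # Older schemas also spell the element <fieldtype>, so look for both.
    ftype = next((t for tag in ("fieldType", "fieldtype")
                  for t in root.iter(tag) if t.get("name") == type_name), None)
    if ftype is None:
        raise KeyError("no fieldType named %r" % type_name)
    # A TextField normally has an <analyzer> with a <tokenizer>; a StrField has none.
    tokenizers = [tok.get("class") for tok in ftype.iter("tokenizer")]
    return {"type": type_name, "class": ftype.get("class"), "tokenizers": tokenizers}

# Hypothetical schema, trimmed to the relevant bits.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <field name="logtext" type="string" indexed="true" stored="true"/>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="text_general" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</schema>
"""

print(describe_field(SAMPLE_SCHEMA, "logtext"))
```

Point it at your real schema.xml contents instead of SAMPLE_SCHEMA; an empty tokenizers list for logtext would explain the whole-text terms.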
Can you show us (even a screenshot is fine) the related schema browser
page?
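As far as I know, the schema browser builds its term histogram from the Luke request handler (/admin/luke), so you can also query it directly. A minimal sketch, assuming the default http://localhost:8983/solr base URL and a hypothetical core name eventLog. suspicious_terms assumes topTerms comes back as a flat [term, count, ...] list, which is what I would expect with the default json.nl=flat; the 40-character threshold is arbitrary.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def luke_url(base, core, field, num_terms=20):
    # Ask the Luke handler for detailed stats (including topTerms) on one field.
    query = urlencode({"fl": field, "numTerms": num_terms, "wt": "json"})
    return "%s/%s/admin/luke?%s" % (base, core, query)

def fetch_luke(base, core, field):
    # Needs a running Solr; not exercised by the offline example below.
    with urlopen(luke_url(base, core, field)) as resp:
        return json.load(resp)

def suspicious_terms(luke_response, field, max_len=40):
    # Assumes topTerms is a flat [term, count, term, count, ...] list.
    flat = luke_response["fields"][field].get("topTerms", [])
    pairs = zip(flat[0::2], flat[1::2])
    # Terms from a tokenized text field should be short and whitespace-free.
    return [(term, count) for term, count in pairs
            if len(term) > max_len or any(ch.isspace() for ch in term)]

# Offline example with a hand-written response (invented data).
# Against a live core you would call:
#   suspicious_terms(fetch_luke("http://localhost:8983/solr", "eventLog", "logtext"), "logtext")
fake = {"fields": {"logtext": {"topTerms": [
    "pump", 12, "Pump failure at station 3\nOperator notified", 1]}}}
print(luke_url("http://localhost:8983/solr", "eventLog", "logtext"))
print(suspicious_terms(fake, "logtext"))
```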
Could it be an encoding problem?
Following Erick's suggestion about the analysis page, what are your results?
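For the analysis check, the same thing can be requested outside the UI. This sketch only builds the URL. It assumes your solrconfig.xml registers solr.FieldAnalysisRequestHandler at /analysis/field (the stock example configs do, and I believe it is what the Analysis page calls); the base URL and core name are placeholders.

```python
from urllib.parse import urlencode

def field_analysis_url(base, core, field, text):
    # Assumes solr.FieldAnalysisRequestHandler is registered at /analysis/field.
    params = {
        "analysis.fieldname": field,   # analyze using this field's configured type
        "analysis.fieldvalue": text,   # sample text to run through the index analyzer
        "wt": "json",
    }
    return "%s/%s/analysis/field?%s" % (base, core, urlencode(params))

# Placeholder base URL and core name; paste the printed URL into a browser.
print(field_analysis_url("http://localhost:8983/solr", "eventLog", "logtext",
                         "Pump failure at station 3"))
```

If logtext is tokenized, the response should list several separate tokens for that sentence rather than a single one.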

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:

> Typically, the index dir is inside the data dir. Delete the index dir
> and you should be good. If there is a tlog dir next to it, you might
> want to delete that also.
>
> If you don't have a data dir, I wonder whether you set the data dir when
> creating your core or collection. Typically, you don't need to set the
> instance dir and data dir.
>
> Upayavira
>
> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > OK, this is bizarre. You'd have had to set up SolrCloud by specifying
> > -zkRun or -zkHost when you start Solr; highly unlikely. On the admin
> > page there would be a "cloud" link on the left side; I really doubt
> > one's there.
> >
> > You should have a data directory; it should be the parent of the index
> > and tlog directories. As a sanity check, try looking at the analysis
> > page. Type a bunch of words in the left-hand indexing box and uncheck
> > the verbose box. As you can tell, I'm grasping at straws. I'm still
> > puzzled why you don't have a "data" directory here, but that shouldn't
> > really matter. How did you create this index? I don't mean the data
> > import handler; I mean, how did you create the core that you're
> > indexing to?
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:
> >
> > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > >
> > >> Then my next guess is you're not pointing at the index you think you
> > >> are when you 'rm -rf data'.
> > >>
> > >> Just ignore the Elall field for now, I should think, although get rid
> > >> of it if you don't think you need it.
> > >>
> > >> DIH should be irrelevant here.
> > >>
> > >> So let's back up.
> > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > >>
> > > I have no "data" dir.  Did you mean "index" dir?  I removed 3 index
> > > directories (2 for spelling):
> > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > >
> > >> 2> start Solr
> > >> 3> do NOT re-index.
> > >> 4> look at your index via the schema-browser. Of course there should
> > >> be nothing there!
> > >>
> > > Correct!  It said "there is no term info :("
> > >
> > >> 5> now kick off the DIH job and look again.
> > >>
> > > Now it shows a histogram, but most of the "terms" are long -- the full
> > > texts of (the table.column) eventlogtext.logtext, including the
> > > whitespace (with %0A used for newline characters)...  So, it appears it
> > > is not being tokenized properly, correct?
> > >
> > >> Your logtext field should have only single tokens. The fact that you
> > >> have some very long tokens (presumably with whitespace) indicates that
> > >> you aren't really blowing the index away between indexing runs.
> > >>
> > > Well, I did this time for sure.  I verified that initially, because it
> > > showed there was no term info until I DIH'd again.
> > >
> > >> Are you perhaps in Solr Cloud with more than one replica?
> > >>
> > > Not that I know of, but being new to Solr, there could be things going
> > > on that I'm not aware of. How can I tell? I certainly didn't set
> > > anything up for SolrCloud deliberately.
> > >
> > >> In that case you might be getting the index replicated on startup,
> > >> assuming you didn't blow away all replicas. If you are in SolrCloud,
> > >> I'd just delete the collection and start over, after ensuring that
> > >> you'd pushed the configset up to ZooKeeper.
> > >>
> > >> BTW, I always look at the schema.xml file from the Solr admin window
> > >> just as a sanity check in these situations.
> > >>
> > > Good idea!  But the one shown in the browser is identical to the one
> > > I've been editing!  So that's not an issue.
> > >
> > >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
