This is exactly what I did. Look:

>> >> 3) I replace lucene-core-2.9.3.jar in solr/lib/ with my
>> >>    lucene-core-2.9.3-dev.jar that I'd just compiled
>> >> 4) then I do "ant compile" and "ant dist" in the solr folder
>> >> 5) after that I rebuild solr/example/webapps/solr.war
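The symptom reported further down in this thread (indexing appears to succeed, but the query `body:big*` returns nothing) is consistent with the tokenizer *skipping* over-long tokens rather than truncating them. The following is a toy illustration of that failure mode, not Lucene code: a stand-in tokenizer with a hypothetical `tokenize` helper that enforces a maximum token length the way the old StandardTokenizer appears to, by dropping any over-long word, so no term for it ever reaches the index.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration (NOT Lucene source) of why a document whose key word
 * exceeds the maximum token length can become unsearchable: if the
 * tokenizer skips over-long tokens instead of truncating them, no term
 * for that word is ever indexed.
 */
public class MaxTokenLengthDemo {
    // Mirrors the spirit of DEFAULT_MAX_TOKEN_LENGTH = 255 in
    // Lucene 2.9's StandardAnalyzer (the real tokenizer is JFlex-based).
    static final int MAX_TOKEN_LENGTH = 255;

    static List<String> tokenize(String text, int maxTokenLength) {
        List<String> tokens = new ArrayList<>();
        for (String candidate : text.split("\\s+")) {
            if (candidate.isEmpty()) continue;
            // Over-long candidates are dropped, not truncated.
            if (candidate.length() <= maxTokenLength) {
                tokens.add(candidate.toLowerCase());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String longWord = "big" + "a".repeat(300);  // like bigaaaa...aaaa
        String doc = "hello " + longWord + " world";

        List<String> indexed = tokenize(doc, MAX_TOKEN_LENGTH);
        System.out.println(indexed);  // [hello, world] - the long word is gone

        // A prefix query like body:big* can only match indexed terms:
        boolean matches = indexed.stream().anyMatch(t -> t.startsWith("big"));
        System.out.println("body:big* matches? " + matches);  // false
    }
}
```

If a stale lucene-core jar (with the original 255-character limit) is still on the classpath, the same silent drop would happen even after the patched jar is built, which fits Steve's classpath question below.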
On 23 October 2010 18:53, Ahmet Arslan <iori...@yahoo.com> wrote:
> I think you should replace your new lucene-core-2.9.3-dev.jar in
> \apache-solr-1.4.1\lib and then create a new solr.war under
> \apache-solr-1.4.1\dist. And copy this new solr.war to
> solr/example/webapps/solr.war
>
> --- On Sat, 10/23/10, Sergey Bartunov <sbos....@gmail.com> wrote:
>
>> From: Sergey Bartunov <sbos....@gmail.com>
>> Subject: Re: How to index long words with StandardTokenizerFactory?
>> To: solr-user@lucene.apache.org
>> Date: Saturday, October 23, 2010, 5:45 PM
>>
>> Yes, I did. It doesn't help.
>>
>> On 23 October 2010 17:45, Ahmet Arslan <iori...@yahoo.com> wrote:
>> > Did you delete the folder Jetty_0_0_0_0_8983_solr.war_** under
>> > apache-solr-1.4.1\example\work?
>> >
>> > --- On Sat, 10/23/10, Sergey Bartunov <sbos....@gmail.com> wrote:
>> >
>> >> From: Sergey Bartunov <sbos....@gmail.com>
>> >> Subject: Re: How to index long words with StandardTokenizerFactory?
>> >> To: solr-user@lucene.apache.org
>> >> Date: Saturday, October 23, 2010, 3:56 PM
>> >>
>> >> Here are all the files: http://rghost.net/3016862
>> >>
>> >> 1) StandardAnalyzer.java, StandardTokenizer.java - patched files
>> >>    from lucene-2.9.3
>> >> 2) I patch these files and build lucene by typing "ant"
>> >> 3) I replace lucene-core-2.9.3.jar in solr/lib/ with my
>> >>    lucene-core-2.9.3-dev.jar that I'd just compiled
>> >> 4) then I do "ant compile" and "ant dist" in the solr folder
>> >> 5) after that I rebuild solr/example/webapps/solr.war with my new
>> >>    solr and lucene-core jars
>> >> 6) I put my schema.xml in solr/example/solr/conf/
>> >> 7) then I do "java -jar start.jar" in solr/example
>> >> 8) I index big_post.xml
>> >> 9) I try to find the document with
>> >>    "curl http://localhost:8983/solr/select?q=body:big*"
>> >>    (big_post.xml contains a long word bigaaaaa...aaaa)
>> >> 10) Solr returns nothing
>> >>
>> >> On 23 October 2010 02:43, Steven A Rowe <sar...@syr.edu>
>> >> wrote:
>> >> > Hi Sergey,
>> >> >
>> >> > What does your ~34 KB field value look like? Does
>> >> > StandardTokenizer think it's just one token?
>> >> >
>> >> > What doesn't work? What happens?
>> >> >
>> >> > Steve
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Sergey Bartunov [mailto:sbos....@gmail.com]
>> >> >> Sent: Friday, October 22, 2010 3:18 PM
>> >> >> To: solr-user@lucene.apache.org
>> >> >> Subject: Re: How to index long words with StandardTokenizerFactory?
>> >> >>
>> >> >> I'm using Solr 1.4.1. I've now succeeded in replacing the
>> >> >> lucene-core jar, but the maximum token length seems to be used
>> >> >> in a very strange way. Currently it's set to 1024*1024 for me,
>> >> >> but I couldn't index a field of just ~34 KB. I understand that
>> >> >> it's a little weird to index such big data, but I just want to
>> >> >> know why it doesn't work.
>> >> >>
>> >> >> On 22 October 2010 20:36, Steven A Rowe <sar...@syr.edu> wrote:
>> >> >> > Hi Sergey,
>> >> >> >
>> >> >> > I've opened an issue to add a maxTokenLength param to the
>> >> >> > StandardTokenizerFactory configuration:
>> >> >> >
>> >> >> > https://issues.apache.org/jira/browse/SOLR-2188
>> >> >> >
>> >> >> > I'll work on it this weekend.
>> >> >> >
>> >> >> > Are you using Solr 1.4.1? I ask because of your mention of
>> >> >> > Lucene 2.9.3. I'm not sure there will ever be a Solr 1.4.2
>> >> >> > release. I plan on targeting Solr 3.1 and 4.0 for the
>> >> >> > SOLR-2188 fix.
>> >> >> >
>> >> >> > I'm not sure why you didn't get the results you wanted with
>> >> >> > your Lucene hack - is it possible you have other Lucene jars
>> >> >> > in your Solr classpath?
>> >> >> >
>> >> >> > Steve
>> >> >> >
>> >> >> >> -----Original Message-----
>> >> >> >> From: Sergey Bartunov [mailto:sbos....@gmail.com]
>> >> >> >> Sent: Friday, October 22, 2010 12:08 PM
>> >> >> >> To: solr-user@lucene.apache.org
>> >> >> >> Subject: How to index long words with StandardTokenizerFactory?
>> >> >> >>
>> >> >> >> I'm trying to force Solr to index words whose length is more
>> >> >> >> than 255 characters (this constant is DEFAULT_MAX_TOKEN_LENGTH
>> >> >> >> in Lucene's StandardAnalyzer.java) using
>> >> >> >> StandardTokenizerFactory as a 'filter' tag in the schema
>> >> >> >> configuration XML. Specifying the maxTokenLength attribute
>> >> >> >> doesn't work.
>> >> >> >>
>> >> >> >> I tried a dirty hack: I downloaded the lucene-core-2.9.3
>> >> >> >> source, changed DEFAULT_MAX_TOKEN_LENGTH to 1000000, built it
>> >> >> >> into a jar and replaced the original lucene-core jar in
>> >> >> >> solr/lib. But it seems to have had no effect.
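For readers landing on this thread later: the maxTokenLength attribute that the original question tries to set is exactly what SOLR-2188 added, targeted at Solr 3.1 and 4.0 per Steve's message above; it is not honored in Solr 1.4.1. A sketch of the kind of schema.xml fieldType that fix enables (field type name and limit value are made up for illustration; note the tokenizer belongs in a <tokenizer> element, not a <filter> element):

```xml
<fieldType name="text_longwords" class="solr.TextField">
  <analyzer>
    <!-- maxTokenLength support per SOLR-2188 (Solr 3.1+) -->
    <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="10000"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

On 1.4.1 the only routes remain the jar-patching hack discussed above or a different tokenizer (e.g. solr.WhitespaceTokenizerFactory, which has no such length limit on matching).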