Tokenizers are not thread safe (I made a mistake yesterday saying they are - I 
don't know what I was thinking).
This is why:

public abstract class Tokenizer extends TokenStream {
  /** The text source for this Tokenizer. */
  protected Reader input;  // <---- oops :( (mutable per-instance state)
  ...

public abstract class CharTokenizer extends Tokenizer {
  public CharTokenizer(Reader input) {
    super(input);
  }
  ...
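
Since the Reader lives in an instance field, the safe pattern is one Tokenizer instance per input, used by one thread. Here is a minimal sketch of that pattern. SketchTokenizer and PerThreadTokenizing are hypothetical stand-ins I made up for illustration, not Lucene classes; only the "Reader held in a field" shape is taken from the code above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a Tokenizer: it keeps its Reader in an instance
// field, so one instance can only serve one input stream at a time.
class SketchTokenizer {
  private final Reader input; // per-instance state: do not share across threads

  SketchTokenizer(Reader input) {
    this.input = input;
  }

  // Returns the next whitespace-delimited token, or null at end of input.
  String next() throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = input.read()) != -1) {
      if (Character.isWhitespace(c)) {
        if (sb.length() > 0) break; // token finished
      } else {
        sb.append((char) c);
      }
    }
    return sb.length() > 0 ? sb.toString() : null;
  }
}

class PerThreadTokenizing {
  // Tokenizes each text on a pool thread, creating a fresh tokenizer per task
  // so that no tokenizer instance is ever visible to two threads.
  static List<List<String>> tokenizeAll(List<String> texts) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (String text : texts) {
        Callable<List<String>> task = () -> {
          SketchTokenizer t = new SketchTokenizer(new StringReader(text));
          List<String> tokens = new ArrayList<>();
          for (String tok = t.next(); tok != null; tok = t.next()) {
            tokens.add(tok);
          }
          return tokens;
        };
        futures.add(pool.submit(task));
      }
      List<List<String>> results = new ArrayList<>();
      for (Future<List<String>> f : futures) {
        results.add(f.get());
      }
      return results;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // prints [[add, CJKTokenizer], [to, solr]]
    System.out.println(tokenizeAll(Arrays.asList("add CJKTokenizer", "to solr")));
  }
}
```

The point of the design is that the only shared objects are the immutable input strings; each task builds and discards its own tokenizer.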

Otis
 
--
Lucene Consulting -- http://lucene-consulting.com/


----- Original Message ----
From: Daniel Alheiros <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, June 22, 2007 12:43:50 PM
Subject: Re: add CJKTokenizer to solr

Sorry, I've confused things a bit... Thread safety only has to be
considered for the Tokenizers, not for the factories. So are the
Tokenizers thread safe?

Regards,
Daniel


On 22/6/07 11:36, "Daniel Alheiros" <[EMAIL PROTECTED]> wrote:

> Hi Hoss.
> 
> I've done a few tests using reflection to instantiate a simple object and
> the results will vary a lot depending on the JVM. As the JVM optimizes code
> as it is executed it will vary depending on the usage, but I think we have
> something to consider:
> 
> I ran 1,000 samples (5 clean runs x a loop of 200), each sample creating
> 100,000 objects, and the results were:
> 
> With reflection:
>     - Average                      : 0.0005418
>     - Worst (first clean execution): 0.0007760
> 
> Without reflection:
>     - Average                      : 0.0000469
>     - Worst (first clean execution): 0.0002140
> 
> So comparing these numbers, I can see that in the average case using
> reflection costs about 11.5 times as much as creating the object without
> reflection (0.0005418 vs. 0.0000469).
> 
> But my question is: do we need to create factories so frequently, or are
> they just created once and re-used (are they thread safe)? The term Factory
> made me think of a class that is responsible for building other instances,
> so usually they can be singletons... If they don't need to be created all
> the time, the reflection cost will not really matter, and it will give extra
> flexibility in terms of incorporating new Tokenizers (it would make it
> easier to keep Solr/Lucene versions less coupled).
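
To make the "created once and re-used" idea concrete, here is a minimal sketch of a reflection-based factory that does the class and constructor lookup once and only pays for newInstance on each call. Both class names (DemoTokenizer, ReflectiveTokenizerFactory) are hypothetical and not part of Solr or Lucene; the only thing assumed from the thread is a tokenizer with a (Reader) constructor.

```java
import java.io.Reader;
import java.io.StringReader;
import java.lang.reflect.Constructor;

// Hypothetical tokenizer with a (Reader) constructor, for illustration only.
class DemoTokenizer {
  final Reader input;

  public DemoTokenizer(Reader input) {
    this.input = input;
  }
}

// Hypothetical factory: the reflective lookup happens once, in the
// constructor. The factory holds no mutable state afterwards, so a single
// instance can be shared, while every create() call returns a new tokenizer.
class ReflectiveTokenizerFactory<T> {
  private final Constructor<? extends T> ctor;

  ReflectiveTokenizerFactory(String className, Class<T> baseType) {
    try {
      Class<? extends T> cls = Class.forName(className).asSubclass(baseType);
      this.ctor = cls.getConstructor(Reader.class);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load " + className, e);
    }
  }

  // Builds a fresh tokenizer for the given input.
  T create(Reader input) {
    try {
      return ctor.newInstance(input);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}

class FactoryDemo {
  public static void main(String[] args) {
    ReflectiveTokenizerFactory<DemoTokenizer> factory =
        new ReflectiveTokenizerFactory<>(DemoTokenizer.class.getName(), DemoTokenizer.class);
    DemoTokenizer a = factory.create(new StringReader("first"));
    DemoTokenizer b = factory.create(new StringReader("second"));
    System.out.println(a != b); // prints true: one factory, two tokenizers
  }
}
```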
> 
> Environment:
> java version "1.5.0_07"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
> Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
> Heap size: 256M
> Running on a PowerPC - Mac OS/X 10.4.9 with 1.5Gb RAM
> 
> Regards,
> Daniel
> 
> 
> On 21/6/07 20:39, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:
> 
>> 
>> : Why don't we instead create an UberFactory that takes the Tokenizer
>> : class as a parameter and instantiates the proper Tokenizer?
>> 
>> The idea has come up before ... and there's really no reason why it
>> wouldn't be okay to include a reflection-based factory like this in Solr
>> -- it just hasn't been done yet.
>> 
>> One of the reasons is that there are some performance costs associated
>> with the reflection, so we wouldn't want to completely replace the existing
>> "configuration via factory name" model with a "configure via class name
>> and an uber factory does the reflection quietly in the background" model
>> because it's the kind of approach that would really only make sense for
>> simple prototypes -- in any system where you are really concerned about
>> performance, reflection on every analyzer call would probably be pretty
>> expensive.  (although I'd love to see benchmarks prove me wrong)
>> 
>> Another question in my mind is "why doesn't solr provide an optional jar
>> with factories for every tokenizer/tokenfilter in the lucene contribs?"
>> ... the only answer to that is that no one has bothered to crank out a
>> patch that does it.
>> 
>> http://www.nabble.com/Re%3A-making-schema.xml-nicer-to-read-use-p5939980.html
>> http://www.nabble.com/foo-tf1737025.html#a4720545
>> 
>> 
>> -Hoss
>> 
> 
> 
> http://www.bbc.co.uk/
> This e-mail (and any attachments) is confidential and may contain personal
> views which are not the views of the BBC unless specifically stated.
> If you have received it in error, please delete it from your system.
> Do not use, copy or disclose the information in any way nor act in reliance on
> it and notify the sender immediately.
> Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this.
> 

