April Seattle Hadoop/Scalability/NoSQL Meetup: Cassandra, Science, More!

2010-04-21 Thread Bradford Stephens
Hey there! Wanted to let you all know about our next meetup, April 28th. We've got a killer new venue thanks to Amazon. Check out the details at the link: http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/calendar/13072272/ Our Speakers this month: 1. Nick Dimiduk, Drawn to Scale: Intro to

[WEBINAR] Practical Search with Solr: Beyond just looking it up

2010-04-21 Thread Erik Hatcher
Below is the official announcement for our exciting upcoming webinar. This one is near and dear to my heart, so I'll be eagerly listening too, and participating with my experiences as it fits with the flow of the webinar. I'm a card-carrying library geek, and I've had the pleasure of worki

Can you help us - Filling out our Survey

2010-04-21 Thread Mário andré
Dear Developers, can you help us by answering this survey: What approach do you use to comprehend software? It is fast and easy. Fill out our survey at: www.neurominer.com/survey Thank you very much. - Mário André Federal Institute of Sergipe, Professor Student

Re: are long words split into up to 256 long tokens?

2010-04-21 Thread jm
ok https://issues.apache.org/jira/browse/LUCENE-2407 On Wed, Apr 21, 2010 at 4:18 PM, Uwe Schindler wrote: > Can you open a bug report to make this configurable, so we don't forget this? E.g. StandardTokenizer is able to change this. > Thanks, > Uwe > - > Uwe Schindler > H.-H.-Meier-A

RE: are long words split into up to 256 long tokens?

2010-04-21 Thread Uwe Schindler
Can you open a bug report to make this configurable, so we don't forget this? E.g. StandardTokenizer is able to change this. Thanks, Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: jm [mailto:jmug
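For reference, the knob Uwe is pointing at already exists on StandardAnalyzer (and StandardTokenizer has the same setter). A minimal sketch against the 3.0-era API; the 1024 limit, the "body" field, and the input string are just placeholder choices:

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.util.Version;

    String text = "someveryveryverylongtoken ...";               // placeholder input
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    // Default max token length is 255; raising it lets longer terms come
    // through tokenization as single tokens.
    analyzer.setMaxTokenLength(1024);
    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));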

Re: are long words split into up to 256 long tokens?

2010-04-21 Thread jm
oh, yes it does extend CharTokenizer... thanks Ahmet. I had searched the Lucene source code for 256 and found nothing suspicious, which was itself suspicious because it clearly looked like an internal limit. Of course I should have searched for 255... I'll see how I proceed because I don't want to use a cus
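For anyone following along, the 255-character chunking can be seen directly with the 3.0-era TokenStream API. A sketch (not the poster's analyzer) using WhitespaceAnalyzer, which is also built on CharTokenizer, with an arbitrary field name:

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;

    String longWord = new String(new char[600]).replace('\0', 'a');  // one 600-char "word"
    TokenStream ts = new WhitespaceAnalyzer().tokenStream("f", new StringReader(longWord));
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
        // With CharTokenizer's 255-char internal buffer this should print 255, 255, 90
        System.out.println(term.term().length());
    }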

Re: are long words split into up to 256 long tokens?

2010-04-21 Thread Ahmet Arslan
> Is 256 some inner maximum too in some Lucene internal that causes this? What is happening is that the long word is split into smaller words up to 256 and then the min and max limits applied. Is that correct? I have removed LengthFilter and still see the splitting at 256 happen. I w

Re: Short circuit in query ...

2010-04-21 Thread Michael McCandless
Lucene attempts to drive the query by the clause that's least frequent, so the order of your clauses will not matter. But, it uses a simplistic heuristic to do so: it looks at the first docID for each sub-clause and then reorders them in decreasing docID order. This isn't a perfect optimization since
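To illustrate Mike's point, the two queries below should return the same documents; only Lucene's internal scorer ordering differs. A sketch that assumes an already open IndexSearcher named searcher and reuses the field names from the original question:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.*;

    BooleanQuery q1 = new BooleanQuery();
    q1.add(new TermQuery(new Term("A", "10")), BooleanClause.Occur.MUST);
    q1.add(new TermQuery(new Term("b", "20")), BooleanClause.Occur.MUST);
    q1.add(new TermQuery(new Term("c", "30")), BooleanClause.Occur.MUST);

    BooleanQuery q2 = new BooleanQuery();            // same clauses, reversed order
    q2.add(new TermQuery(new Term("c", "30")), BooleanClause.Occur.MUST);
    q2.add(new TermQuery(new Term("b", "20")), BooleanClause.Occur.MUST);
    q2.add(new TermQuery(new Term("A", "10")), BooleanClause.Occur.MUST);

    // Both return the same hits; Lucene reorders the sub-scorers itself
    TopDocs hits1 = searcher.search(q1, 10);   // searcher: assumed, already open
    TopDocs hits2 = searcher.search(q2, 10);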

are long words split into up to 256 long tokens?

2010-04-21 Thread jm
I am analyzing this with my custom analyzer: String s = "mail77 mail8 tc ro45mine durante jjkk

Re: Reaching the posting lists

2010-04-21 Thread Michael McCandless
They are indeed abstract in the IndexReader base class, but the concrete implementation you get from IndexReader.open or IndexWriter.getReader implements the methods to return TermDocs/TermPositions, and they return concrete implementations of these abstract classes. Mike 2010/4/21 Yağız Kargın : >
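Concretely, the iteration Mike describes looks roughly like this against the 3.0-era API; the Directory variable and the term are placeholders, and this is a sketch rather than a full program:

    import org.apache.lucene.index.*;

    IndexReader reader = IndexReader.open(directory, true);   // directory: assumed, already open
    TermDocs td = reader.termDocs(new Term("contents", "lucene"));
    while (td.next()) {
        int docId = td.doc();    // document containing the term
        int freq  = td.freq();   // term frequency in that document
    }
    td.close();

    TermPositions tp = reader.termPositions(new Term("contents", "lucene"));
    while (tp.next()) {
        for (int i = 0; i < tp.freq(); i++) {
            int position = tp.nextPosition();   // within-document position
        }
    }
    tp.close();
    reader.close();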

Re: Range Query Assistance

2010-04-21 Thread Otis Gospodnetic
Joseph, If you can, get the latest Lucene and use NumericField to index your dates with appropriate precision and then use NumericRangeQueries when searching. This will be faster than searching for string dates in a given range. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nut
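A sketch of Otis's suggestion against the 3.0-era API; the "date" field, the yyyyMMdd long encoding, and the writer/searcher variables are illustrative assumptions:

    import org.apache.lucene.document.*;
    import org.apache.lucene.search.*;

    // Indexing: store the date as a number rather than a string
    Document doc = new Document();
    doc.add(new NumericField("date", Field.Store.YES, true).setLongValue(20100421L));
    writer.addDocument(doc);                       // writer: assumed, already open IndexWriter

    // Searching: a numeric range instead of a string range
    NumericRangeQuery<Long> q =
        NumericRangeQuery.newLongRange("date", 20100101L, 20101231L, true, true);
    TopDocs hits = searcher.search(q, 10);         // searcher: assumed, already open IndexSearcher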

Re: analyzer not working properly when indexing

2010-04-21 Thread jm
ok, got this. I upgraded my analyzer to the new API but it was not correct... thanks On Wed, Apr 21, 2010 at 11:45 AM, Ian Lea wrote: > OK, so it does indeed look like a problem with your analyzer, as you suspected. > You could confirm that by using e.g. WhitespaceAnalyzer instead. Then mayb

Re: Reaching the posting lists

2010-04-21 Thread Yağız Kargın
Thanks for the answer. However, those classes and methods are abstract. Should I write my own implementation? Since Lucene is able to do indexing and searching, I think there should already be implementations of these things. Sorry if Mike's answer already covers this obviously, but I couldn't get i

Re: Short circuit in query ...

2010-04-21 Thread Ian Lea
The order does not matter, but searches on terms with few matches are likely to be quicker than searches on terms with many matches. -- Ian. On Tue, Apr 20, 2010 at 11:12 PM, John4982 wrote: > Hi, does Lucene search use short-circuiting when I execute a query like: > A:10 AND b:20 AND c:30 >

Short circuit in query ...

2010-04-21 Thread John4982
Hi, does Lucene search use short-circuiting when I execute a query like: A:10 AND b:20 AND c:30? In general, can the position of field names impact search performance? E.g. if field A with value 10 is more frequent, does this mean the query will be slower than if value 10 were less frequent? best John --

Re: Set Analyzer without QueryParser

2010-04-21 Thread Ian Lea
If parts of your BooleanQuery need an analyzer, specify that when you create that bit of the overall query.

    BooleanQuery bq = new BooleanQuery();
    NumericRangeQuery nrq = ...;
    QueryParser qp = new QueryParser(..., analyzer);
    Query q = qp.parse("whatever");
    ...
    bq.add(nrq, ...);
    bq.add(q, ...);
    ...
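Filling in Ian's placeholders purely for illustration (the field names, the numeric range, the analyzer choice, and the searcher variable are all assumptions), a compilable 3.0-era version might look like:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.util.Version;

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    BooleanQuery bq = new BooleanQuery();

    // Numeric clause: built directly, no analyzer involved
    NumericRangeQuery<Integer> nrq =
        NumericRangeQuery.newIntRange("price", 10, 100, true, true);
    bq.add(nrq, BooleanClause.Occur.MUST);

    // Text clause: parsed, so the analyzer applies only here
    QueryParser qp = new QueryParser(Version.LUCENE_30, "contents", analyzer);
    Query q = qp.parse("whatever");
    bq.add(q, BooleanClause.Occur.MUST);

    TopDocs hits = searcher.search(bq, 10);   // searcher: assumed, already open IndexSearcher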

Re: analyzer not working properly when indexing

2010-04-21 Thread Ian Lea
OK, so it does indeed look like a problem with your analyzer, as you suspected. You could confirm that by using e.g. WhitespaceAnalyzer instead. Then maybe post the code for your custom analyzer, or step through in a debugger or however you prefer to debug code. -- Ian. On Wed, Apr 21, 2010 a

Re: analyzer not working properly when indexing

2010-04-21 Thread jm
I am using a TermQuery so no analyzer used...

    protected static int getHitCount(Directory directory, String fieldName,
            String searchString) throws IOException {
        IndexSearcher searcher = new IndexSearcher(directory, true); //5
        Term t = new Term(fieldName, searchString);
        Query
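The archived message is cut off after "Query"; a hedged guess at the rest of the helper, following the standard TermQuery hit-count pattern rather than the poster's actual (unseen) code:

    import java.io.IOException;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.Directory;

    // Hypothetical completion of the truncated method above.
    protected static int getHitCount(Directory directory, String fieldName,
            String searchString) throws IOException {
        IndexSearcher searcher = new IndexSearcher(directory, true);   // read-only
        try {
            Term t = new Term(fieldName, searchString);
            Query query = new TermQuery(t);
            TopDocs hits = searcher.search(query, 1);
            return hits.totalHits;     // number of documents matching the term
        } finally {
            searcher.close();
        }
    }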