Re: How to return results with null values?

2007-02-08 Thread Chris Hostetter
: > take a look at RangeFilter and set both the upper and lower terms to be : > null -- : > or if you need a Query and can't use a filter, do the same thing with : > ConstantScoreRangeQuery. : This is like, it scores something for each document that has a field, no matter : the content? it scores the
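
A rough sketch of the open-ended range idea discussed here (Lucene 2.x API; the field name "price" and the searcher/userQuery variables are placeholders, and an empty-string lower bound is used because some RangeFilter versions insist on at least one non-null term):

    // Filter that keeps only documents having *some* indexed term in the "price" field:
    // every term sorts >= "" and the open upper end leaves the range unbounded.
    Filter hasValue = new RangeFilter("price", "", null, true, false);
    Hits hits = searcher.search(userQuery, hasValue);

    // Or, when a Query is needed instead of a Filter (each match gets a constant score):
    Query anyValue = new ConstantScoreRangeQuery("price", "", null, true, false);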

Re: Re : Re: Re : Re: Re : Re: Question concerning Analyzers

2007-02-08 Thread Erick Erickson
See below On 2/8/07, Xavier To <[EMAIL PROTECTED]> wrote: Thanks for helping me. I don't really understand what you mean by my Tokenizer "corrects" what the indexing analyzer did. You shouldn't have to change the tokens in the usual case to get the search to work right. You mentioned

Re : Re: Re : Re: Re : Re: Question concerning Analyzers

2007-02-08 Thread Xavier To
Thanks for helping me. I don't really understand what you mean by my Tokenizer "corrects" what the indexing analyzer did. By the way, the tokenizer we use is one provided in Lucene. My guess is that the problem was that the analyzer was thought to be the same by the guy who made the search engi

Re: How to not tokenize HTML tag from input string

2007-02-08 Thread Yonik Seeley
On 2/8/07, Peter W. <[EMAIL PROTECTED]> wrote: Using a parser to get text out of HTML, XML (including RSS, ATOM) is only easy if you control the source documents. HTML pages in the wild are much different, generating exceptions you must catch and deal with. Yes, that's why the Solr version isn

Re: How to not tokenize HTML tag from input string

2007-02-08 Thread Peter W.
Hello, Using a parser to get text out of HTML, XML (including RSS, ATOM) is only easy if you control the source documents. HTML pages in the wild are much different, generating exceptions you must catch and deal with. For most projects you can probably use java.util.regex to obtain keywo
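
A crude sketch of the java.util.regex idea for the simple cases Peter mentions (tag-stripping with a regex is only workable on markup you control; the sample HTML is made up):

    String html = "<p>Kodak <b>Easyshare</b> review</p>";
    // Drop anything that looks like a tag, then collapse the leftover whitespace.
    String text = html.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    // text == "Kodak Easyshare review"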

Re: 'a', 's' and 't' don't index properly

2007-02-08 Thread Erik Hatcher
On Feb 8, 2007, at 2:14 PM, Mike Klaas wrote: On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: Is there a .NET version of Solr? Nope. But, here's the beauty of Solr... if you're not afraid of a JVM running Jetty, Tomcat, Resin, or many others then fire up (Java) Solr and use .NET

Re: Indexing a RDF document using lucene

2007-02-08 Thread phani kumar
Hi Erik, When I search for a string, for example "jan" in the above example, it should return me the class and property it is associated with as URIs, plus the full text of the object and the RDF document in which it is contained. How can I achieve this? Thanks, phani. On 2/8/07, Erik Hatcher <[EMAIL PROTECTED]> wro

Re: 'a', 's' and 't' don't index properly

2007-02-08 Thread Mike Klaas
On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: Is there a .NET version of Solr? Nope. -Mike

Re: Slow performance (Fetching Hits)

2007-02-08 Thread Daniel Naber
On Thursday 08 February 2007 13:54, Laxmilal Menaria wrote: > This will take more than 30 secs for 150,000 docs (40 MB index). What exactly takes this much time? You're not iterating over all hits, are you? Also see http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-1b15abeee21b0a72492b1
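
For reference, the usual fix is to read only the first page of results from Hits instead of looping over every match (illustrative Lucene 2.x snippet; searcher, query and the "title" field are placeholders):

    Hits hits = searcher.search(query);
    int pageSize = Math.min(10, hits.length());
    for (int i = 0; i < pageSize; i++) {
        // hits.doc(i) loads the stored fields, which is the expensive part --
        // doing it for all 150,000 matches is what makes a loop like this slow.
        Document doc = hits.doc(i);
        System.out.println(doc.get("title"));
    }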

Re: Indexing a RDF document using lucene

2007-02-08 Thread Erik Hatcher
Elaborate on your querying needs :) Erik On Feb 8, 2007, at 1:18 PM, phani kumar wrote: Hi, I want to index an RDF document using Lucene. RDF consists of a subject, predicate, object (as triples). Suppose, given a set of keywords defining a search, we want to match the URIrefs containing t

Re: categorisation

2007-02-08 Thread Erik Hatcher
On Feb 8, 2007, at 12:36 PM, Kainth, Sachin wrote: Chris has given an example of how to perform categorisation of lucene searches: String[] mfgs = ...; String query = "+category:cameras +price:[0 to 10]"; Query q = QueryParser.parse(query); Hits results = searcher.search(q, mySort) B

Re: Analyzers

2007-02-08 Thread Chris Lu
This is an example for PerFieldAnalyzerWrapper, public Analyzer getAnalyzer() throws ClassNotFoundException, InstantiationException, IllegalAccessException{ _analyzer = new PerFieldAnalyzerWrapper((Analyzer) Class.forName(this.getAnalyzerName()).newInstance()); ArrayList columns
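
Chris's snippet is cut off above; a minimal, generic PerFieldAnalyzerWrapper example (Lucene 2.x, with made-up field names and index path) looks roughly like this:

    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    // Default analyzer for any field that has no explicit mapping.
    PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    // Per-field override: "partNumber" is only split on whitespace, no stop words, no lowercasing.
    analyzer.addAnalyzer("partNumber", new WhitespaceAnalyzer());

    // Use the same wrapper at index time and at query time.
    IndexWriter writer = new IndexWriter("/path/to/index", analyzer, true);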

Re: Empty search

2007-02-08 Thread Ronnie Kolehmainen
If you are referring to QueryParser, and if you mean that you want Lucene to *find everything* when you actually say *search for nothing*, you could easily extend the current QueryParser to suit your needs: public class MyQueryParser extends QueryParser { public MyQueryParser(String f, Analy
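
Ronnie's class is truncated above; one way the body could continue (assuming Lucene 1.9+ where MatchAllDocsQuery is available) is roughly:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Query;

    public class MyQueryParser extends QueryParser {
        public MyQueryParser(String f, Analyzer a) {
            super(f, a);
        }

        // Treat an empty (or all-whitespace) query string as "match everything"
        // instead of letting the grammar throw a ParseException.
        public Query parse(String query) throws ParseException {
            if (query == null || query.trim().length() == 0) {
                return new MatchAllDocsQuery();
            }
            return super.parse(query);
        }
    }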

Indexing a RDF document using lucene

2007-02-08 Thread phani kumar
Hi, I want to index an RDF document using Lucene. RDF consists of a subject, predicate, object (as triples). Suppose, given a set of keywords defining a search, we want to match the URIrefs containing those keywords. So how can I index the triples using Lucene? Please provide some help on how to hash each
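
One simplified way to index triples, which the follow-ups in this thread build on, is one Lucene Document per statement, with the URIs kept untokenized and the object literal tokenized for full-text search (a sketch only; the field names and variables are placeholders):

    // One document per (subject, predicate, object) statement.
    Document doc = new Document();
    doc.add(new Field("subject",   subjectUri,   Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("predicate", predicateUri, Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("object",    objectText,   Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("source",    rdfFileName,  Field.Store.YES, Field.Index.UN_TOKENIZED));
    writer.addDocument(doc);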

Re: How to return results with null values?

2007-02-08 Thread poeta simbolista
Chris Hostetter wrote: > > I'm not sure whether this question is about docs that have no value for a > field, or docs where the value of the field is null - > The former. Chris Hostetter wrote: > > adding a filter on that Field that requires *some* value might help. > Yep, that is what I

Re: Empty search

2007-02-08 Thread karl wettin
On 8 Feb 2007, at 18.46, Kainth, Sachin wrote: Is it my imagination or does Lucene produce an error if you present it with an empty string to search for? I presume you are referring to the QueryParser? It sounds about right that it would throw an exception at some point if you supplied it an

Re: Re : Re: Re : Re: Question concerning Analyzers

2007-02-08 Thread Erick Erickson
Well, you've proved that your problem is that the analyzer you're using when querying isn't matching what you use during indexing. I think that what you've done will lead you into significant problems down the road as your tokenizer then has to "correct" for what the index analyzer did though. Wh

Re: Analyzers

2007-02-08 Thread karl wettin
On 8 Feb 2007, at 18.36, Kainth, Sachin wrote: Can you give me an example of how this might be done? The javadocs are generally a good place to start: http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html -- karl --

RE: Analyzers

2007-02-08 Thread Kainth, Sachin
Can you provide an example? -Original Message- From: Chris Lu [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 17:35 To: java-user@lucene.apache.org Subject: Re: Analyzers This is totally possible. -- Chris Lu - Instant Full-Text Search On Any Database/Applicati

RE: Analyzers

2007-02-08 Thread Kainth, Sachin
Can you give me an example of how this might be done? -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 17:34 To: java-user@lucene.apache.org Subject: Re: Analyzers Use PerFieldAnalyzerWrapper. On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:

categorisation

2007-02-08 Thread Kainth, Sachin
Chris has given an example of how to perform categorisation of lucene searches: String[] mfgs = ...; String query = "+category:cameras +price:[0 to 10]"; Query q = QueryParser.parse(query); Hits results = searcher.search(q, mySort) BitSet all = (new QueryFilter(q)).bits(reader) int[
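
The quoted example is cut off; the counting step it leads up to looks roughly like this (the manufacturer values are invented, and the calls throw IOException):

    // Bits for the user's main query.
    BitSet all = new QueryFilter(q).bits(reader);

    // For each category value, intersect its bits with the main query's bits;
    // the cardinality of the intersection is that category's count.
    String[] mfgs = { "Kodak", "Canon", "Nikon" };
    for (int i = 0; i < mfgs.length; i++) {
        BitSet catBits = new QueryFilter(new TermQuery(new Term("mfg", mfgs[i]))).bits(reader);
        catBits.and(all);
        System.out.println(mfgs[i] + ": " + catBits.cardinality());
    }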

Re: Analyzers

2007-02-08 Thread Chris Lu
This is totally possible. -- Chris Lu - Instant Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: Hi all, I wanted to know if it is possible to store some fields

Re: Analyzers

2007-02-08 Thread Erick Erickson
Use PerFieldAnalyzerWrapper. On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: Hi all, I wanted to know if it is possible to store some fields in an index with one analyzer and other fields with another analyzer? Cheers Sachin

Re : Re: Re : Re: Question concerning Analyzers

2007-02-08 Thread Xavier To
Hey! I tried using WhitespaceAnalyzer during the search and it works. I refactored the tokenizing process so it uses TokenStream instead of StringTokenizer, and it works fine except for one thing: the query "this is a test" becomes "thisisatest". I fixed it by adding a space after each token except f
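
For reference, the TokenStream loop Xavier describes amounts to something like this in the Lucene 2.x API (joining tokens with a single space so "this is a test" does not collapse into "thisisatest"):

    StringBuffer sb = new StringBuffer();
    TokenStream ts = new WhitespaceAnalyzer().tokenStream("field", new StringReader("this is a test"));
    Token token;
    while ((token = ts.next()) != null) {
        if (sb.length() > 0) {
            sb.append(' ');              // keep the tokens separated
        }
        sb.append(token.termText());
    }
    ts.close();
    // sb.toString() is "this is a test" rather than "thisisatest"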

Re: Increase performance using Pool of IndexSearchers?

2007-02-08 Thread Phillip Rhodes
Mohammad, According to the responses on this thread, there appears to be no performance benefit to using multiple instances of IndexSearcher. Unless I hear otherwise, there is no point in creating such a pool. Phillip Mohammad Norouzi wrote: Hi would you tell how we can create a searcher pool
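
In other words, instead of a pool, the usual pattern is a single shared IndexSearcher, reopened only when the index changes (a trivial sketch with an invented holder class):

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;

    public class SearcherHolder {
        private static IndexSearcher searcher;

        // An IndexSearcher is thread-safe, so one instance can serve all request threads.
        public static synchronized IndexSearcher get(String indexDir) throws IOException {
            if (searcher == null) {
                searcher = new IndexSearcher(indexDir);
            }
            return searcher;
        }
    }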

RE: 'a', 's' and 't' don't index properly

2007-02-08 Thread Kainth, Sachin
Thanks Erik, Is there a .NET version of Solr? Cheers Sachin -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 15:26 To: java-user@lucene.apache.org Subject: Re: 'a', 's' and 't' don't index properly From the javadoc... public final class *Simp

Re: 'a', 's' and 't' don't index properly

2007-02-08 Thread Erick Erickson
From the javadoc... public final class *SimpleAnalyzer* extends Analyzer An Analyzer that filters LetterTokenizer with LowerCaseFilter. On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: Thanks Erik, Do you know of an analyzer which doesn't remove the characters 'a', 's' and 't'. Sachin

Re: Counting and Categorisation

2007-02-08 Thread Erik Hatcher
On Feb 8, 2007, at 8:55 AM, Kainth, Sachin wrote: Thanks for the reply. Since writing this I have in fact now implemented the BitSet version and it works quite successfully. However, I have now found out that we will be dealing with millions of records and that for this reason we can not

RE: 'a', 's' and 't' don't index properly

2007-02-08 Thread Kainth, Sachin
Thanks Erik, Do you know of an analyzer which doesn't remove the characters 'a', 's' and 't'. Sachin -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 08 February 2007 13:54 To: java-user@lucene.apache.org Subject: Re: 'a', 's' and 't' don't index properly This r

RE: Counting and Categorisation

2007-02-08 Thread Kainth, Sachin
Hi Erik, Thanks for the reply. Since writing this I have in fact now implemented the BitSet version and it works quite successfully. However, I have now found out that we will be dealing with millions of records and that for this reason we can not use such a solution. Can you tell me what solr

Re: 'a', 's' and 't' don't index properly

2007-02-08 Thread Erick Erickson
This really should be posted on the dotlucene list, but... Your indexing analyzer is probably removing them. For instance, StandardAnalyzer uses a default set of stop words, and a, s, and t are definitely among them. You need to use a different analyzer than you are using. These will also be re
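
If StandardAnalyzer's tokenizing and lowercasing are otherwise wanted, one option is to construct it with an empty stop-word list; the Java constructor is shown below, and dotlucene's API generally mirrors it (a sketch):

    // No stop words: 'a', 's' and 't' are indexed like any other term.
    StandardAnalyzer analyzer = new StandardAnalyzer(new String[0]);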

Re: Strange Behaviour of BooleanQuery?

2007-02-08 Thread Oliver Hummel
Eric, thanks for your reply. > I assume it's a typo, but your for loop wouldn't produce your example as > they'd all be the same field Actually there are three loops that add it for different fields... :-) > So, here's what I'd do. Use Query.toString() for both your BooleanQuery and > the qu

Re: Counting and Categorisation

2007-02-08 Thread Erik Hatcher
On Feb 8, 2007, at 8:28 AM, Kainth, Sachin wrote: This email is meant for Chris Hostetter and of course anyone else who may know about this, I wonder if I can ask you a question. I have been reading of how you at CNET have implemented categorisation and counting so that if i type "Kodak Ea

'a', 's' and 't' don't index properly

2007-02-08 Thread Kainth, Sachin
> Hello, > > I have a database of tracks, artists and albums and I'm indexing these > 3 attributes plus also the first letter of the track thus (incidentally > I'm using dotlucene but the implementation of dotlucene is similar to > the Java one): > >Document Doc = new Document(); >String Al

Counting and Categorisation

2007-02-08 Thread Kainth, Sachin
This email is meant for Chris Hostetter and of course anyone else who may know about this, I wonder if I can ask you a question. I have been reading of how you at CNET have implemented categorisation and counting so that if I type "Kodak Easyshare" in the reviews section you not only get a big li

Re: Strange Behaviour of BooleanQuery?

2007-02-08 Thread Erick Erickson
I assume it's a typo, but your for loop wouldn't produce your example as they'd all be the same field But that said, I suspect that your QueryParser is analyzing the tokens you feed it differently than how they're included when you make your own BooleanQuery. So, here's what I'd do. Use Quer
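
The debugging step Erick suggests is just printing both queries and comparing the output (a small illustration with made-up fields and terms, assuming an existing analyzer):

    Query parsed = new QueryParser("field1", analyzer).parse("term1 AND field2:term2");

    BooleanQuery handBuilt = new BooleanQuery();
    handBuilt.add(new TermQuery(new Term("field1", "term1")), BooleanClause.Occur.MUST);
    handBuilt.add(new TermQuery(new Term("field2", "term2")), BooleanClause.Occur.MUST);

    System.out.println("parsed:     " + parsed.toString());
    System.out.println("hand-built: " + handBuilt.toString());
    // If the analyzer lowercased or split the terms, the two lines will differ.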

Slow performance (Fetching Hits)

2007-02-08 Thread Laxmilal Menaria
Hello, We are using wildcards in our search; for that we are passing the field name and term/query to another function (wildcardSearch()) in another class (QueryParserClass) which extends MultiFieldQueryParser. In wildcardSearch() we are calling super.getWildcardQuery(String field, String query
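
A stripped-down sketch of the subclass being described (the class and method names follow the post; everything else is filled in as an assumption):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.Query;

    public class QueryParserClass extends MultiFieldQueryParser {
        public QueryParserClass(String[] fields, Analyzer analyzer) {
            super(fields, analyzer);
        }

        // Expose QueryParser's protected wildcard hook, e.g. to build field:abc* queries.
        public Query wildcardSearch(String field, String query) throws ParseException {
            return super.getWildcardQuery(field, query);
        }
    }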

Re: Is there any way to optimize existing unoptimized index?

2007-02-08 Thread Michael McCandless
maureen tanuwidjaja wrote: May I also ask whether there is a way to use writer.optimize() without indexing the files from the beginning? It took me about 17 hrs to finish building an unoptimized index (finished when I call IndexWriter.close()). I just wonder whether this existing index coul
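
For reference, optimize() can be run against an existing index without re-adding any documents, by opening an IndexWriter with create set to false (a sketch; it still needs plenty of free disk space and time to merge the segments):

    // Open the existing index (create == false so nothing is wiped) and optimize in place.
    IndexWriter writer = new IndexWriter("/path/to/existing/index", new StandardAnalyzer(), false);
    writer.optimize();   // merges segments; no documents are re-indexed
    writer.close();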

Re: exception is hit while optimizing index

2007-02-08 Thread Michael McCandless
maureen tanuwidjaja wrote: I would like to know about optimizing index... The exception is hit due to disk full while optimizing the index and hence, the index has not been closed yet. Is the unclosed index dangerous? Can I perform searching on such an index correctly? Is the index built r

Strange Behaviour of BooleanQuery?

2007-02-08 Thread Oliver Hummel
Hi @all, I'm a little confused about the behaviour of BooleanQuery. I have a custom parser that analyzes some text and constructs an "ANDed" BooleanQuery. toString delivers something like this: (+field1:term1 +field2:term2) Looks pretty normal to me, but the problem is it delivers no results (

Re: How to not tokenize HTML tag from input string

2007-02-08 Thread Chris Hostetter
Solr has an HTMLStripReader used by two different tokenizers for doing the basics of ignoring tags when reading text ... it has one known bug when dealing with highlighting... http://lucene.apache.org/solr/api/org/apache/solr/analysis/HTMLStripReader.html http://lucene.apache.org/solr/api/org/
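
Rough usage of the HTMLStripReader Chris points to, assuming the Solr jar is on the classpath and that the class simply wraps another Reader as its javadoc describes (a sketch, not tested):

    // Wrap the raw HTML so the tokenizer only ever sees the text content.
    Reader html = new StringReader("<p>Kodak <b>Easyshare</b></p>");
    TokenStream tokens = new WhitespaceTokenizer(new org.apache.solr.analysis.HTMLStripReader(html));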