Re: Searching doubt

2009-08-04 Thread Shai Erera
I can think of another approach - during indexing, capture the word "aboutus" and index it as both "about us" and "aboutus" in the same position. That way both queries will work. You'd need to write your own TokenFilter, maybe a SynonymTokenFilter (since this reminds me of synonyms usage) that accepts a list
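
The same-position idea above can be sketched without any Lucene dependency. This is only a plain-Java model of token positions, not the Lucene TokenFilter API: the class name CompoundSplitSketch, the method expand, and the "term@position" output format are all made up for illustration, and the map of known compounds is assumed to be supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of "stacking" tokens at one position; not Lucene code.
final class CompoundSplitSketch {

    /**
     * Returns "term@position" entries. A term with known parts keeps its own
     * position; its first part shares that position and later parts follow it.
     */
    static List<String> expand(List<String> terms, Map<String, List<String>> parts) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        for (String term : terms) {
            out.add(term + "@" + pos);
            List<String> split = parts.get(term);
            if (split == null) {
                pos++;                      // ordinary term: advance by one
                continue;
            }
            for (int i = 0; i < split.size(); i++) {
                out.add(split.get(i) + "@" + (pos + i));
            }
            // The parts may span several positions, so later terms shift right.
            pos += Math.max(1, split.size());
        }
        return out;
    }
}
```

With the terms contact, aboutus, page and the mapping aboutus -> [about, us], the sketch places aboutus and about at position 1 and us at position 2. One consequence of this model is that page moves to position 3, so an exact phrase match on "aboutus page" would no longer line up; whether that matters depends on the queries you expect.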

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks. This is my code snippet: IndexSearcher searcher = new IndexSearcher(indexDir); Analyzer analyzer = new StopAnalyzer(); WildcardQuery query = new WildcardQuery(new Term(DEFAULT_FIELD));

Re: Searching doubt

2009-08-04 Thread Shai Erera
I don't see that you use the Analyzer anywhere (i.e. it's created but not used?). Also, the wildcard query you create may be very inefficient, as it will expand all the terms under the DEFAULT_FIELD. If the DEFAULT_FIELD is the field where all your default searchable terms are indexed, there could

Re: How to improve search time?

2009-08-04 Thread prashant ullegaddi
I did that as well. Actually, we had 32 indexes initially. We searched them. It was even horrible. After that I merged them into 4 indexes. And did the same. No gain! Then, I had to merge 32 indexes into one. On Tue, Aug 4, 2009 at 10:48 AM, Anshum ansh...@gmail.com wrote: Hi Prashant, 8

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks for your reply, my original code snippet is IndexSearcher searcher = new IndexSearcher(indexDir); Analyzer analyzer = new StopAnalyzer(); BooleanClause.Occur[] flags = { BooleanClause.Occur.SHOULD,

Re: How to improve search time?

2009-08-04 Thread Shashi Kant
Prashant, I have had better luck with even larger sized indices on similar platforms. Could you elaborate what types of queries you are running, Multifield? Boolean? combinations? etc. Also you might want to remove unnecessary stored fields from the index and move them to a relational db to

Indexed Field impact on Memory

2009-08-04 Thread Ganesh
Hello all, I have an indexed field. If I am not using this field in any search query, does it still consume memory? If this field is part of a filter query, would there be any impact on memory consumption? I am going to break / shorten the Date Time field and one field might be

ParallelMultiSearcher and idf

2009-08-04 Thread Christian Reuschling
Hello, when searching over multiple indices, we create one IndexReader for each index, and wrap them into a MultiReader, that we use for IndexSearcher creation. This is fine for searching multiple indices on one machine, but in the case the indices are distributed over the (intra)net, this

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks, I've noticed that, but the code is for known tokens. How do I do it for dynamic tokens? That is, I don't know the URLs; someone picks up the URLs and I index them. Is there any technique to use while indexing? I am using Lucene 2.4.0. Please advise.

Re: Searching doubt

2009-08-04 Thread Shai Erera
If you don't know which tokens you'll face, then it's really a much harder problem. If you know where the token is, e.g. it's always in http://some.example.site/a/b/here will be the token to break/index.html, then it eases the task a bit. Otherwise you'll need to search every single token

Re: How to improve search time?

2009-08-04 Thread prashant ullegaddi
Shashi, Our queries are free text queries. But they will be expanded into: Multifield, Boolean. We are also expanding the original query using SynExpand of Lucene. A simple query gets expanded to say a query of page size. And we are not storing any other fields except key (document IDs), target

Re: Searching doubt

2009-08-04 Thread darren
Ah, ok. Interesting problem there as well. I'll think on that one some too! cheers. Hi Darren, The question was, how given a string aboutus in a document, you can return that document as a result to the query about us (note the space). So we're mostly discussing how to detect and then

Re: Searching doubt

2009-08-04 Thread Phil Whelan
On Tue, Aug 4, 2009 at 8:31 AM, Shai Erera ser...@gmail.com wrote: Hi Darren, The question was, how given a string aboutus in a document, you can return that document as a result to the query about us (note the space). So we're mostly discussing how to detect and then break the word aboutus to

Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi, I have an app to initially create a Lucene index, and to populate it with documents. I'm now working on that app to insert new documents into that Lucene index. In general, this new app, which is based loosely on the demo apps (e.g., IndexFiles.java), is working, i.e., I can run it with

Re: Searching doubt

2009-08-04 Thread Shai Erera
Interesting ... I don't have access to a Japanese dictionary, so I just extract bi-grams. But I guess that in this case, if one can access an English dictionary (are you aware of an open-source one, or free one BTW?), one can use the method you mention. But still, doing this for every Token you
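
For readers unfamiliar with the bi-gram fallback mentioned above, here is a minimal sketch of extracting overlapping character bi-grams in plain Java. The class and method names are hypothetical, and this is not the Lucene analyzer the poster may have used; it only illustrates the technique.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: overlapping character bi-grams of a string.
final class BigramSketch {

    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        // Work on code points so characters outside the BMP are not split.
        int[] cps = text.codePoints().toArray();
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }
}
```

For the input aboutus this yields ab, bo, ou, ut, tu, us. Bi-grams need no dictionary, at the cost of a larger index and matches that do not always correspond to real words.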

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread Ian Lea
A few suggestions: . Queue the docs once they are complete using something like JMS. . Get the document producers to write to e.g. xxx.tmp and rename to e.g. xxx.txt at the end . Get the document producers to write to a tmp folder and move to e.g. input/ when done . Find a file, store size,
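
The write-to-tmp-then-rename suggestion above could look roughly like this on the producer side. It is a sketch under assumptions: the class name PublishByRename and its publish method are invented for illustration, and it assumes the temporary file and the final file live in the same directory so the rename can be atomic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical producer-side helper: readers only ever see complete files.
final class PublishByRename {

    static Path publish(Path dir, String baseName, String content) throws IOException {
        Path tmp = dir.resolve(baseName + ".tmp");
        Path done = dir.resolve(baseName + ".txt");
        // Write the full content under the temporary name first.
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // Then rename in one step; the indexer should ignore *.tmp files.
        return Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The indexer then only picks up files ending in .txt, so it never sees a partially written document. This only helps when you control the code that produces the files.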

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi Ian, Thanks for the quick response. I forgot to mention, but in our case, the producers are part of a commercial package, so we don't have a way to get them to change anything, so I think the first three suggestions are not feasible for us. I have considered something like the 4th suggestion

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Ian, One question about the 4th alternative: I was wondering how you implemented the sleep() in Java, esp. in such a way as not to mess up any of the Lucene stuff (in case there's threading)? Right now, my indexer/inserter app doesn't explicitly do any threading stuff. Thanks, Jim

Re: Searching doubt

2009-08-04 Thread Matthew Hall
Well.. search on both anyhow. about us OR aboutus should hit the spot I think. Matt Ian Lea wrote: The question was, how given a string aboutus in a document, you can return that document as a result to the query about us (note the space). So we're mostly discussing how to detect and then

Re: Searching doubt

2009-08-04 Thread N Hira
Good summary, Shai. I've missed some of this thread as well, but does anyone know what happened to the suggestion about query manipulation? e.g., query(about us) = query(about us, aboutus); query(credit card) = query(credit card, creditcard). Regards, -h
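
The query manipulation described above can be sketched as a plain string rewrite done before the query reaches the query parser. The class name QueryExpansionSketch and the method expand are hypothetical; the output uses ordinary Lucene query syntax (a quoted phrase and OR), and the sketch assumes the input contains no query-syntax special characters that would need escaping.

```java
// Hypothetical pre-processing step: "about us" -> "about us" OR aboutus
final class QueryExpansionSketch {

    static String expand(String query) {
        String trimmed = query.trim();
        String[] words = trimmed.split("\\s+");
        if (trimmed.isEmpty() || words.length < 2) {
            return trimmed;                 // nothing to join
        }
        String phrase = "\"" + String.join(" ", words) + "\"";
        String joined = String.join("", words);
        return phrase + " OR " + joined;
    }
}
```

This joins every word of the query into a single term, which is reasonable for two-word queries such as credit card but becomes questionable for longer ones such as united states of america, where it is unclear which words should be grouped.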

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread Ian Lea
Jim The sleep is simply try { Thread.sleep(millis); } catch (InterruptedException ie) { } No threading issues that I'm aware of, despite the method living in the Thread class. But you're right about it possibly impacting performance, if you've got to sleep for a

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi Ian, Ok, thanks for the additional info. I've implemented a check for both file.lastModified() and file.length(), and it seems to work in my dev environment (Windows), so I'll have to test on a real system. Thanks again, Jim Ian Lea ian@gmail.com wrote: Jim The sleep is
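
A check of this kind might look like the following. This is a guess at the shape of such a check, not Jim's actual code: the class name StableFileCheck, the method isStable, and the wait interval are all assumptions.

```java
import java.io.File;

// Hypothetical check: treat a file as complete if its size and
// modification time do not change across a short wait.
final class StableFileCheck {

    static boolean isStable(File file, long waitMillis) throws InterruptedException {
        long size = file.length();
        long modified = file.lastModified();
        Thread.sleep(waitMillis);
        return file.exists()
                && file.length() == size
                && file.lastModified() == modified;
    }
}
```

A file that passes is only probably complete: a producer that pauses for longer than waitMillis would still fool the check, so the interval has to be chosen with the producer's behaviour in mind. Timestamp granularity also differs between file systems, which is one reason to test on the real target system.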

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread Amin Mohammed-Coleman
I've been working on an indexing solution using Spring Integration and Lucene. The example project uses JMS to create work items (index add or update) and then a service that polls for work to do. I should have this complete soon and will be putting it on Google Code. Not much of help right now

Re: How do you Parse a query to convert numbers to strings

2009-08-04 Thread Luis Alves
Hi Paul, In 2.9, you can use the new query parser in contrib. You should look at: original.config.FieldBoostMapAttribute original.config.FieldBoostMapFCListener original.processors.BoostQueryNodeProcessor original.builders.BoostQueryNodeBuilder this code implements boost

Re: Searching doubt

2009-08-04 Thread Shai Erera
I had suggested that in my first response, but I think Harig's problem is that those words are not known in advance. Therefore, facing the query about us and converting it to aboutus is simple, but what about queries like united states, or united states of america? Should they be 'grouped'

Nightly build link is broken

2009-08-04 Thread Adriano Crestani
Hi, I was trying to download a nightly build jar, so I went to the Lucene website and clicked on the link that redirected to: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/ and I got the error: Firefox can't establish a connection to the server at lucene.zones.apache.org:8080. Is the link

Re: Nightly build link is broken

2009-08-04 Thread Michael McCandless
Hmmm... that link is old. The right one is: http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/ Which page did you find that link on? Mike On Tue, Aug 4, 2009 at 5:40 PM, Adriano Crestani adrianocrest...@apache.org wrote: Hi, I was trying to download a nightly build jar,

Re: Searching doubt

2009-08-04 Thread Phil Whelan
(sorry, tangent. I'll be quick) On Tue, Aug 4, 2009 at 8:42 AM, Shai Erera ser...@gmail.com wrote: Interesting ... I don't have access to a Japanese dictionary, so I just extract bi-grams. Shai - if you're interested in parsing Japanese, check out Kakasi. It can split into words and convert

A Presentation on Building a Hadoop + Lucene System Architecture

2009-08-04 Thread Bradford Stephens
Hey all, I just wanted to send a link to a presentation I made on how my company is building its entire core BI infrastructure around Hadoop, HBase, Lucene, and more. It features a decent amount of practical advice: from rules for approaching scalability problems, to why we chose certain aspects

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks all, but how does Nutch handle this problem? I am aware of Nutch, but not in depth. If I search for the keyword about us, Nutch gives me exactly what I want. Are there any scoring techniques? Please let me know.

Re: A Presentation on Building a Hadoop + Lucene System Architecture

2009-08-04 Thread m.harig
Hello, do you have any idea about the integration of Lucene with Hadoop? BrickMcLargeHuge wrote: Hey all, I just wanted to send a link to a presentation I made on how my company is building its entire core BI infrastructure around Hadoop, HBase, Lucene, and more. It features