I can think of another approach: during indexing, capture the word
aboutus and index it as both about us and aboutus in the same position.
That way both queries will work. You'd need to write your own TokenFilter,
maybe a SynonymTokenFilter (since this reminds me of synonym usage), that
accepts a list
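A minimal sketch of the emission idea, without the Lucene API (and assuming the compound words are known up front, which is exactly the hard part discussed later in this thread): each split part is emitted as a normal token, and the original compound is emitted with a position increment of 0 so it occupies the same position as the first part.

```java
import java.util.*;

public class CompoundExpander {
    // A token is a term plus its position increment (0 = same position as previous).
    static final class Token {
        final String term;
        final int posIncr;
        Token(String term, int posIncr) { this.term = term; this.posIncr = posIncr; }
        public String toString() { return term + "/" + posIncr; }
    }

    // Hypothetical compound list; building this list is the crux of the problem.
    static final Map<String, String[]> COMPOUNDS = Map.of(
        "aboutus", new String[] {"about", "us"},
        "creditcard", new String[] {"credit", "card"});

    static List<Token> expand(List<String> input) {
        List<Token> out = new ArrayList<>();
        for (String term : input) {
            String[] parts = COMPOUNDS.get(term);
            if (parts == null) {
                out.add(new Token(term, 1));         // ordinary token, advances position
            } else {
                out.add(new Token(parts[0], 1));     // first split part advances position
                out.add(new Token(term, 0));         // original compound, same position
                for (int i = 1; i < parts.length; i++) {
                    out.add(new Token(parts[i], 1)); // remaining parts in following positions
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [visit/1, about/1, aboutus/0, us/1, page/1]
        System.out.println(expand(List.of("visit", "aboutus", "page")));
    }
}
```

With that token stream, the phrase query about us matches via about/us in consecutive positions, and aboutus matches the compound directly.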
Thanks.
This is my code snippet:
IndexSearcher searcher = new IndexSearcher(indexDir);
Analyzer analyzer = new StopAnalyzer();
WildcardQuery query = new WildcardQuery(new Term(DEFAULT_FIELD));
I don't see that you use the Analyzer anywhere (i.e. it's created but not
used?).
Also, the wildcard query you create may be very inefficient, as it will
expand all the terms under the DEFAULT_FIELD. If the DEFAULT_FIELD is the
field where all your default searchable terms are indexed, there could
I did that as well. Actually, we had 32 indexes initially and searched across
them. It was horrible.
After that I merged them into 4 indexes and did the same. No gain!
Then I had to merge the 32 indexes into one.
On Tue, Aug 4, 2009 at 10:48 AM, Anshum ansh...@gmail.com wrote:
Hi Prashant,
Thanks for your reply.
My original code snippet is:
IndexSearcher searcher = new IndexSearcher(indexDir);
Analyzer analyzer = new StopAnalyzer();
BooleanClause.Occur[] flags = { BooleanClause.Occur.SHOULD,
Prashant, I have had better luck with even larger indices on
similar platforms. Could you elaborate on what types of queries you are
running: multifield? Boolean? combinations? etc. Also, you might want
to remove unnecessary stored fields from the index and move them to a
relational db to
Hello all,
I have an indexed field. If I am not using this field in any search query,
does this field consume memory?
If this field is part of a filter query, would there be any impact on memory
consumption?
I am going to break/shorten the Date Time field, and one field might be
Hello,
when searching over multiple indices, we create one IndexReader for each index
and wrap them in a MultiReader, which we use to create the IndexSearcher.
This is fine for searching multiple indices on one machine, but in the case
where the indices are distributed over the (intra)net, this
Thanks,
I've noticed that, but the code is for known tokens. How do I do it for
dynamic tokens? Meaning, I don't know the URLs; someone picks up the URLs and
I'll index them. Is there any technique to use while indexing? I am using
Lucene version 2.4.0. Please advise.
--
If you don't know which tokens you'll face, then it's really a much harder
problem. If you know where the token is, e.g. it's always in a fixed place
like http://some.example.site/a/b/here-will-be-the-token-to-break/index.html,
then it eases the task a bit. Otherwise you'll need to search every single
token
Shahi,
Our queries are free text queries. But they will be expanded into:
Multifield, Boolean.
We are also expanding the original query using Lucene's SynExpand. A simple
query gets expanded to, say, a query of page size.
And we are not storing any other fields except key (document IDs), target
Ah, ok. Interesting problem there as well.
I'll think on that one some too!
cheers.
Hi Darren,
The question was, how given a string aboutus in a document, you can return
that document as a result to the query about us (note the space). So we're
mostly discussing how to detect and then
On Tue, Aug 4, 2009 at 8:31 AM, Shai Erera ser...@gmail.com wrote:
Hi Darren,
The question was, how given a string aboutus in a document, you can return
that document as a result to the query about us (note the space). So we're
mostly discussing how to detect and then break the word aboutus to
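Breaking a compound like aboutus against a word list can be sketched with a small dynamic-programming split (a sketch, assuming an English word list is available; the tiny inline dictionary here is a stand-in):

```java
import java.util.*;

public class WordSplitter {
    // Tiny stand-in dictionary; a real one would be loaded from a word list file.
    static final Set<String> DICT = new HashSet<>(Arrays.asList(
        "about", "us", "credit", "card"));

    // Dynamic programming: return one split of s into dictionary words,
    // or null if no such split exists.
    static List<String> split(String s) {
        // best[i] = a valid split of s.substring(0, i), or null
        List<List<String>> best =
            new ArrayList<>(Collections.nCopies(s.length() + 1, null));
        best.set(0, new ArrayList<>());
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 0; j < i; j++) {
                String word = s.substring(j, i);
                if (best.get(j) != null && DICT.contains(word)) {
                    List<String> parts = new ArrayList<>(best.get(j));
                    parts.add(word);
                    best.set(i, parts);
                    break;
                }
            }
        }
        return best.get(s.length());
    }

    public static void main(String[] args) {
        System.out.println(split("aboutus"));    // [about, us]
        System.out.println(split("creditcard")); // [credit, card]
    }
}
```

A split could be run either at index time (to emit the parts as extra tokens) or at query time (to rewrite the user's query); a null result simply means the term is left alone.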
Hi,
I have an app to initially create a Lucene index, and to populate it with
documents. I'm now working on that app to insert new documents into that
Lucene index.
In general, this new app, which is based loosely on the demo apps (e.g.,
IndexFiles.java), is working, i.e., I can run it with
Interesting ... I don't have access to a Japanese dictionary, so I just
extract bi-grams. But I guess that in this case, if one can access an
English dictionary (are you aware of an open-source one, or a free one
BTW?), one can use the method you mention.
But still, doing this for every Token you
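The bi-gram fallback mentioned above (used when no dictionary is available) can be sketched as plain overlapping character pairs:

```java
import java.util.*;

public class Bigrams {
    // Extract overlapping character bi-grams; input shorter than 2 chars is kept whole.
    static List<String> bigrams(String s) {
        if (s.length() < 2) return List.of(s);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("tokyo")); // [to, ok, ky, yo]
    }
}
```

Indexing every term's bi-grams inflates the index, but it lets a query match sub-word units without any language knowledge, which is why it is the usual fallback for CJK text.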
A few suggestions:
- Queue the docs once they are complete, using something like JMS.
- Get the document producers to write to e.g. xxx.tmp and rename to
  e.g. xxx.txt at the end.
- Get the document producers to write to a tmp folder and move to e.g.
  input/ when done.
- Find a file, store size,
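The last suggestion can be sketched as follows (a sketch, assuming a polling interval is acceptable; the method name is my own): record the file's size and modification time, sleep, and only index it once both have stopped changing.

```java
import java.io.File;

public class StableFileCheck {
    // Returns true once the file's size and mtime have been unchanged for one
    // whole poll interval, i.e. the producer has most likely finished writing.
    static boolean waitUntilStable(File f, long pollMillis, int maxPolls)
            throws InterruptedException {
        long lastLen = -1, lastMod = -1;
        for (int i = 0; i < maxPolls; i++) {
            long len = f.length();
            long mod = f.lastModified();
            if (len == lastLen && mod == lastMod) {
                return true; // no change across one interval: treat as complete
            }
            lastLen = len;
            lastMod = mod;
            Thread.sleep(pollMillis);
        }
        return false; // still changing after maxPolls intervals
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("doc", ".txt");
        f.deleteOnExit();
        System.out.println(waitUntilStable(f, 100, 5)); // prints "true"
    }
}
```

This is heuristic: a producer that pauses longer than one poll interval mid-write would fool it, so the interval has to be chosen with the producer's behavior in mind.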
Hi Ian,
Thanks for the quick response.
I forgot to mention, but in our case the producer is part of a commercial
package, so we don't have a way to get them to change anything; I think the
first 3 suggestions are therefore not feasible for us.
I have considered something like the 4th suggestion
Ian,
One question about the 4th alternative: I was wondering how you implemented
the sleep() in Java, esp. in such a way as not to mess up any of the Lucene
stuff (in case there's threading)?
Right now, my indexer/inserter app doesn't explicitly do any threading stuff.
Thanks,
Jim
Well... search on both anyhow.
about us OR aboutus should hit the spot, I think.
Matt
Ian Lea wrote:
The question was, how given a string aboutus in a document, you can return
that document as a result to the query about us (note the space). So we're
mostly discussing how to detect and then
Good summary, Shai.
I've missed some of this thread as well, but does anyone know what happened to
the suggestion about query manipulation?
e.g., query(about us) = query(about us, aboutus)
query(credit card) = query(credit card, creditcard)
Regards,
-h
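That query-manipulation suggestion can be sketched as pure string rewriting before the query ever reaches the parser (a sketch; real code would also need to escape query syntax and handle phrases):

```java
import java.util.*;

public class QueryExpander {
    // Rewrite a multi-word query so it also matches the concatenated form:
    // "about us" -> "(about us) OR aboutus"
    static String expand(String query) {
        String[] words = query.trim().split("\\s+");
        if (words.length < 2) return query;      // nothing to concatenate
        String joined = String.join("", words);  // e.g. "aboutus"
        return "(" + query.trim() + ") OR " + joined;
    }

    public static void main(String[] args) {
        System.out.println(expand("about us"));    // (about us) OR aboutus
        System.out.println(expand("credit card")); // (credit card) OR creditcard
    }
}
```

As noted elsewhere in the thread, this only covers the two-way concatenation; multi-word cases like united states of america would need all groupings generated, which grows quickly.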
Jim
The sleep is simply
try { Thread.sleep(millis); }
catch (InterruptedException ie) { }
No threading issues that I'm aware of, despite the method living in
the Thread class.
But you're right about it possibly impacting performance, if you've
got to sleep for a
Hi Ian,
Ok, thanks for the additional info.
I've implemented a check for both file.lastModified() and file.length(), and it
seems to work in my dev environment (Windows), so I'll have to test it on a
real system.
Thanks again,
Jim
Ian Lea ian@gmail.com wrote:
Jim
The sleep is
I've been working on an indexing solution using Spring Integration and
Lucene. The example project uses JMS to create work items (index adds or
updates) and then a service that polls for work to do. I should have this
complete soon and will be putting it on Google Code. Not much help right
now
Hi Paul,
In 2.9, you can use the new query parser in contrib.
You should look at:
original.config.FieldBoostMapAttribute
original.config.FieldBoostMapFCListener
original.processors.BoostQueryNodeProcessor
original.builders.BoostQueryNodeBuilder
this code implements boost
I had suggested that in my first response, but I think Harig's problem is
that those words are not known in advance. Therefore, facing the query
about us and converting it to aboutus is simple, but what about queries
like united states, or united states of america? Should they be
'grouped'
Hi,
I was trying to download a nightly build jar, so I went to Lucene website
and clicked on the link that redirected to:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/
and I got "Firefox can't establish a connection to the server at
lucene.zones.apache.org:8080".
Is the link
Hmmm... that link is old. The right one is:
http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/
Which page did you find that link on?
Mike
On Tue, Aug 4, 2009 at 5:40 PM, Adriano Crestani
adrianocrest...@apache.org wrote:
Hi,
I was trying to download a nightly build jar,
(sorry, tangent. I'll be quick)
On Tue, Aug 4, 2009 at 8:42 AM, Shai Erera ser...@gmail.com wrote:
Interesting ... I don't have access to a Japanese dictionary, so I just
extract bi-grams.
Shai - if you're interested in parsing Japanese, check out Kakasi. It
can split into words and convert
Hey all,
I just wanted to send a link to a presentation I made on how my
company is building its entire core BI infrastructure around Hadoop,
HBase, Lucene, and more. It features a decent amount of practical
advice: from rules for approaching scalability problems, to why we
chose certain aspects
Thanks all,
but how does Nutch handle this problem? I am aware of Nutch, but not in
depth. If I search the keyword about us, Nutch gives me exactly what I
want. Are there any scoring techniques? Please let me know.
Hello,
Do you have any idea about the integration of Lucene with Hadoop?
BrickMcLargeHuge wrote:
Hey all,
I just wanted to send a link to a presentation I made on how my
company is building its entire core BI infrastructure around Hadoop,
HBase, Lucene, and more. It features