Hi Guys,
I am using Lucene with Neo4j. Currently I have queries working well with a
combination of Exact and Fuzzy matches in one query.
However, we want a report that first takes the ranking and boosting as the
highest priority, but then we want to sort by first name and last name, and
Lucene won't be aware that you've got duplicate documents, but scoring
does take account of the number of documents in which search terms
appear. See http://lucene.apache.org/java/3_5_0/scoring.html and the
javadocs for org.apache.lucene.search.Similarity.
Only you can say whether or not you need to worry
Just use one of the search() methods that does sorting and specify an
array of sort fields with SortField.SCORE first, then your name
fields. But be aware that complex real world textual queries and docs
rarely produce identical scores.
You could post-process the results and group them into
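The ordering described above (score first, then the name fields as tie-breakers) can be illustrated in plain Java without Lucene. This is only a sketch: the Hit class, its fields, and the choice of first name before last name are assumptions made for the example, not Lucene types or the poster's schema.

```java
import java.util.ArrayList;
import java.util.List;

final class ScoreThenNameSort {
    /** Hypothetical stand-in for one search hit; this is not a Lucene class. */
    static final class Hit {
        final float score;
        final String firstName;
        final String lastName;

        Hit(float score, String firstName, String lastName) {
            this.score = score;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** Higher score first; equal scores fall back to first name, then last name. */
    static int compare(Hit a, Hit b) {
        int byScore = Float.compare(b.score, a.score);
        if (byScore != 0) {
            return byScore;
        }
        int byFirst = a.firstName.compareTo(b.firstName);
        if (byFirst != 0) {
            return byFirst;
        }
        return a.lastName.compareTo(b.lastName);
    }

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(ScoreThenNameSort::compare);
        return copy;
    }
}
```

The name fields only matter when two scores are exactly equal, which is why identical scores being rare limits how much the secondary sort does in practice.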
As far as I'm aware, recent versions of Lucene, including the
highlighter, should work out of the box.
I'd guess that highlighting would be the most resource-intensive and
therefore troublesome bit.
I'm not aware of any sample code showing Lucene working on Android,
but from my very limited
What is LUCENE_INDEX_DIRECTORY? Some static string in your app?
Lucene knows nothing about your app, JSP, or what app server you are
using. It requires a file system path and it is up to you to provide
that. I always use a full path since I prefer to store indexes
outside the app and it avoids
All packages used: core3.4, queries3.4, facet3.5.
Once every 3 minutes I *refreshTax* and once per day I *reopenEveryting*.
*InitWriters()*
writer = new ThreadedIndexWriter
taxWriter = new LuceneTaxonomyWriter
// because the reader can't start if it doesn't have a valid taxIndex directory
The sequence of operations seems logical; I don't immediately see why this
does not work.
Could you minimize this to a small stand-alone program that does not work
as expected? This will allow us to recreate the problem here and debug it.
It is interesting that facet 3.5 is used with core 3.4 and queries
Could you minimize this to a small stand-alone program that does not work
as expected?
This will be hard, because the bug only appears after a couple of days
or more, and I'm starting to think that it is triggered by high data
volumes. I'll try to minimize the code and serve more data to
Even though the NumericRangeQuery.new* methods do not support
BigInteger, the underlying recursive algorithm supports any sized
number.
Has this been explored?
Using a static string is fine - it just wasn't clear from your
original post what it was.
I usually use a full path read from a properties file so that I can
change it without a recompile, have different settings on
test/live/whatever systems, etc. Works for me, but isn't the only way
to do it.
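A minimal sketch of the properties-file approach described above, using only the JDK. The property key lucene.index.dir and the class name are hypothetical; the resulting path would then be handed to whatever opens the index.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class IndexPathConfig {
    /** Hypothetical property key; use whatever name suits your deployment. */
    static final String KEY = "lucene.index.dir";

    /** Reads the index directory from a properties source and returns it as an absolute path. */
    static Path indexDir(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + KEY);
        }
        // Resolve relative values against the working directory so callers
        // always receive a full path.
        return Paths.get(value.trim()).toAbsolutePath().normalize();
    }
}
```

In a real application the Reader would typically come from a file kept outside the web app, so that test and live systems can point at different index locations without a recompile.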
You can store the index in the WEB-INF directory, just use something like:
ServletContext.getRealPath("/WEB-INF/data/myIndexName");
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
List,
I am trying to incorporate the Latent Dirichlet Allocation (LDA) topic
model into Lucene. Briefly, the LDA model extracts topics
(distribution over words) from a set of documents, and then represents
each document with topic vectors. For example, documents could be
represented as:
d1 = (0,
Hi Stephen,
We are doing something similar, and we store as a multifield with each
document as (d,z) pairs where we store the z's (scores) as payloads for
each d (topic). We have had to build a custom similarity which
implements the scorePayload function. So to find docs for a given d
(topic), we
Hi Uwe ,
I need to do something similar. Can you please tell me how I can pass an
integer in my fuzzy search query?
For example, say I am searching with q=major~0.6 and
I want to match only terms that start with the prefix maj. How can I pass an
integer to do that?
Thanks.
Uwe Schindler wrote
Hi,
You can pass an
Awesome. Thanks guys!
On Mon, Nov 28, 2011 at 12:19 PM, Uwe Schindler u...@thetaphi.de wrote:
You can store the index in the WEB-INF directory, just use something like:
ServletContext.getRealPath("/WEB-INF/data/myIndexName");
Hi folks,
I'm researching the best options to use for analysing/storing newspaper
pages in our online archive, and wondered if anyone has any good hints
or tips on good practice for this type of media?
I'm currently thinking along the lines of using a customised
StandardAnalyzer (no stop
Hi Meghana,
You can only do that by directly instantiating the FuzzyQuery, not via
parsed queries.
Uwe
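For context, Lucene 3.x has a FuzzyQuery constructor that takes a term, a minimum similarity, and an integer prefix length, which is the integer the question refers to. The plain-Java sketch below only illustrates what such a prefix constraint means: a candidate must share the first prefixLength characters before any edit-distance check applies. The similarity formula here is a simplified assumption and is not Lucene's own scoring; all class and method names are hypothetical.

```java
final class PrefixFuzzyMatch {
    /** Classic Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * True if candidate shares the first prefixLength characters with query and
     * the simplified similarity (1 - distance / longer length) reaches minSimilarity.
     */
    static boolean matches(String query, String candidate, int prefixLength, float minSimilarity) {
        if (query.length() < prefixLength || candidate.length() < prefixLength) {
            return false;
        }
        if (!candidate.regionMatches(0, query, 0, prefixLength)) {
            return false;
        }
        int longest = Math.max(query.length(), candidate.length());
        if (longest == 0) {
            return true;
        }
        float similarity = 1.0f - (float) editDistance(query, candidate) / longest;
        return similarity >= minSimilarity;
    }
}
```

With a prefix length of 3 and the query major, a term such as mayor is rejected on the prefix alone even though it is only one edit away.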
-Original Message-
From: meghana [mailto:meghana.rav...@amultek.com]
Sent:
Hi Dawn,
I assume that when you refer to the impact of stop words, you're concerned
about query-time performance? You should consider the possibility that
performance without removing stop words is good enough that you won't have to
take any steps to address the issue.
That said, there are
Hi Steve,
On 28/11/2011 19:43, Steven A Rowe wrote:
I assume that when you refer to the impact of stop words, you're concerned
about query-time performance? You should consider the possibility that performance
without removing stop words is good enough that you won't have to take any steps
You can easily use just the CommonGrams stuff from Solr in your pure
lucene project.
There are a couple of useful docs on stop words and common grams et al at
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
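To illustrate the common-grams idea mentioned above: common words are kept, but each one is also glued to its neighbour as an extra bigram token, so phrase queries can hit the much rarer bigram instead of the very frequent single word. The sketch below is plain Java and is not the Solr filter; the underscore separator, the token order, and the class name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CommonGramsSketch {
    /**
     * Emits every word, plus a "left_right" bigram after a word whenever that
     * word or its right-hand neighbour is in the set of common words.
     */
    static List<String> tokens(List<String> words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            out.add(word);
            if (i + 1 < words.size()) {
                String next = words.get(i + 1);
                if (common.contains(word) || common.contains(next)) {
                    out.add(word + "_" + next);
                }
            }
        }
        return out;
    }
}
```

For the words the, quick, fox with the as the only common word, this yields the, the_quick, quick, fox.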
I am applying the PorterStemFilter at both indexing and search time.
As for schema, I have 3 fields: title, subtitle and notes. When the user
enters a query string of "a*itis", my software turns this into an actual
Lucene query of "title: a*itis OR subtitle: a*itis OR notes: a*itis" and I