What are you getting for the scores? If it's NaN I think you'll need
to use a TopFieldCollector. See for example
http://www.gossamer-threads.com/lists/lucene/java-user/86309
--
Ian.
On Tue, Nov 27, 2012 at 3:51 AM, Andy Yu ukour...@gmail.com wrote:
Hi All,
Now I want to sort by a field
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/package-summary.html#package_description
might help. Or Google something like how does lucene work.
The question on cores might be better asked on the solr list, assuming
you are talking about Solr cores. But I bet the answer
As you can tell from the title, Lucene In Action is more about using
lucene than how it works internally, but yes, it is good and is worth
buying. If you're worried about how up to date it is, keep a copy of
the release notes and migration guides for later versions to hand.
--
Ian.
On Tue,
Well, according to the javadoc, PayloadTermQuery factors in the value
of the payload located at each of the positions where the Term
occurs.
Have you read some of the info available from Google by searching for
lucene payloads?
--
Ian.
On Fri, Nov 23, 2012 at 8:32 AM, wgggfiy
I'd use StandardAnalyzer, or ClassicAnalyzer. Also depends on how you
want to search. You probably want a query for John Smith to match
John Smith and Smith, John but maybe not John Brown and Sam
Smith. The latter is a problem. You can partially work round it by
using a BooleanQuery made up
You mean the time that a doc, any doc, was last added to an index?
I'm not aware of a way to do that directly.
You can store arbitrary data when you commit changes and get it back
again somehow. See IndexCommit.getUserData().
Or look at the lastmod timestamps of the files on disk.
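A minimal sketch of the commit user-data approach (Lucene 3.x API; the "lastAdd" key and RAMDirectory are just placeholders for illustration):

```java
import java.util.Collections;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CommitUserDataDemo {
    // Write a doc, attach a "lastAdd" timestamp to the commit, read it back.
    public static Map<String, String> writeAndRead() throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_36,
                        new StandardAnalyzer(Version.LUCENE_36)));
        writer.addDocument(new Document());
        // Arbitrary string key/values travel with this commit point.
        writer.commit(Collections.singletonMap("lastAdd",
                String.valueOf(System.currentTimeMillis())));
        writer.close();
        // Static helper: no need to hold an IndexReader open.
        return IndexReader.getCommitUserData(dir);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAndRead().get("lastAdd"));
    }
}
```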
--
Ian.
1. Does memory usage go up with multiple simultaneous searches - does
it need to load the data structures multiple times?
Lucene loads some stuff into RAM, but just once rather than for each
search. But there will of course be memory used for each search, more
concurrent searches will use
everything still works as before.
On Tue, Nov 20, 2012 at 12:20 PM, Ian Lea ian@gmail.com wrote:
You can upgrade the indexes with org.apache.lucene.index.IndexUpgrader.
You'll need to do it in steps, from 2.x to 3.x to 4.x, but should work
fine as far as I know.
--
Ian
Are you getting the same, improved or worse performance/throughput?
Has the bottleneck switched from IO to CPU?
--
Ian.
On Thu, Nov 8, 2012 at 12:40 PM, kiwi clive kiwi_cl...@yahoo.com wrote:
Having played with merge parameters and various index parameters, it seems
possible to change the
By far the most likely cause is that something somewhere in your code
is closing the searcher or the reader.
--
Ian.
On Thu, Nov 8, 2012 at 2:39 PM, Bin Lan b...@perimeterusa.com wrote:
We recently upgraded our lucene library from 1.9.1 to 3.6.1 and ran into
multiple AlreadyClosedException
Feels a bit of a hack, but you might be able to make it work by
storing the field name when MyPerFieldxxx.get(name) is called and
using that in MyPerFieldxxx.queryNorm() and coord() calls to do the
right thing, either inline or via the relevant Similarity subclass,
identified by the name.
--
From a glance the code looks OK, but there's lots you're not showing
that could cause it not to work - whatever you mean by that. Fails to
get hits on docs you think are in the index?
Look at the index with Luke to see what actually has been indexed.
Look at Query.toString() to see how the query
A couple of weeks ago Rafał Kuć told you how to store fields, and
Document.get(name) is very straightforward. What's the problem?
http://lucene.472066.n3.nabble.com/Storing-html-files-in-lucene-index-and-get-back-them-td4012877.html
--
Ian.
On Thu, Oct 25, 2012 at 1:08 PM, rajputadesh
Did you also find the response to that question?
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200801.mbox/%3c81162.81463...@web50303.mail.re2.yahoo.com%3E
Hard to think of any other ways than those mentioned there.
--
Ian.
On Thu, Oct 25, 2012 at 2:26 PM, Willi Haase
From http://lucene.apache.org/core/4_0_0/MIGRATE.html
TermPositions is renamed to DocsAndPositionsEnum, and no longer
extends the docs only enumerator (DocsEnum).
And the link is probably the answer to your second question.
--
Ian.
On Thu, Oct 25, 2012 at 2:50 PM, Ivan Vasilev
If you want email addresses, UAX29URLEmailAnalyzer is another alternative.
--
Ian.
On Wed, Oct 24, 2012 at 3:56 PM, Jack Krupansky j...@basetechnology.com wrote:
Yes, by design. StandardAnalyzer implements simple word boundaries (the
technical term is Unicode text segmentation), period. As
SortField.Type.STRING maybe?
Can't help with the other question. It's generally best to send one
question per message. Looking at the source code might help.
--
Ian.
On Wed, Oct 24, 2012 at 6:55 PM, Carlos de Luna Saenz
cdelunasa...@yahoo.com.mx wrote:
I am migrating code from Lucene 3 to
As Aditya said, you'll need to recreate that document.
http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F
The fact that you only want to remove one value is irrelevant.
--
Ian.
On Mon, Oct 22, 2012 at 12:56 PM,
Exactly what method in which class of which version of lucene are you
trying to override? There is no Bits method. There is no
indexReader.Documents method.
I said this earlier in this thread: Presumably you're aware of the
transient nature of lucene internal docids and the per-segment
Yes, IndexWriter.updateDocument() deletes and then adds. See the
javadocs. So your index will have deleted docs. Why do you care?
They'll go away eventually as segments get merged.
If you really do care, see IndexWriter.forceMergeDeletes(). See also
the javadoc for that: This is often a
these segments get merged, I will have my
document count going down, right?
On Wed, Oct 17, 2012 at 6:33 PM, Ian Lea ian@gmail.com wrote:
Yes, IndexWriter.updateDocument() deletes and then adds. See the
javadocs. So your index will have deleted docs. Why do you care?
They'll go away eventually
I would expect a filter to be quicker than adding thousands of clauses
because Filters are just bit sets and operations are extremely fast.
But never take performance predictions, particularly from me, on trust
- test it in your app with your index on your hardware.
To use a filter here I think
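For illustration, the filter alternative to thousands of clauses might be sketched like this (TermsFilter lives in the queries contrib module; field name and ids are made up):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.TermsFilter;

public class IdFilterDemo {
    // Build one filter from many ids instead of a huge BooleanQuery.
    public static Filter build(String field, String... ids) {
        TermsFilter f = new TermsFilter();
        for (String id : ids) {
            f.addTerm(new Term(field, id));
        }
        return f;
    }

    public static void main(String[] args) {
        // Pass the result as the filter argument of IndexSearcher.search().
        System.out.println(build("id", "1", "2", "3"));
    }
}
```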
You'll certainly need to factor in the performance of NFS versus local disks.
My experience is that smallish low activity indexes work just fine on
NFS, but large high activity indexes are not so good, particularly if
you have a lot of modifications to the index.
You may want to install a custom
as possible!
(rsync is way more your friend for transporting and replication à la solr
should also be considered)
paul
On 2 Oct 2012 at 11:10, Ian Lea wrote:
You'll certainly need to factor in the performance of NFS versus local
disks.
My experience is that smallish low activity indexes
Are you loading it from disk, adding loads of docs then writing it
back to disk? That would do it.
How many docs in the memory index? How many on disk? What version of lucene?
--
Ian.
On Fri, Sep 28, 2012 at 1:56 AM, Cheng zhoucheng2...@gmail.com wrote:
Hi,
I have a ram based index which
So you've got a MultiReader over some number of indexes, and in order
to delete stuff matched with that MultiReader you need an IndexWriter
for the specific index that holds the selected term. In Lucene 4.0.
Is that right?
There are getSequentialSubReaders() and readerIndex(int docID) methods
in
Most programs in all languages like plenty of memory. If you Google
lucene memory usage you'll get hits on articles by Lucene developers
and plenty more. Some bits may be more or less relevant to specific
versions of lucene,
As for the minimum memory I must give to Lucene for its optimal
The most likely explanation is simply that your filter doesn't match
any docs that do match your query.
See also
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
--
Ian.
On Thu, Sep 13, 2012 at 8:19 AM, sdr...@sina.com wrote:
Hi, problems with
You can do stuff with scopes and contexts and web.xml and whatever
(google something like tomcat application scope). Or use some static
classes or singletons to look after the single index.
--
Ian.
On Fri, Sep 7, 2012 at 6:10 AM, Kasun Perera kas...@opensource.lk wrote:
I have a web java/jsp
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
--
Ian.
On Wed, Sep 5, 2012 at 4:24 PM, Ramprakash Ramamoorthy
youngestachie...@gmail.com wrote:
Take a look at this query :
-HOSTNAME:ram AND SEVERITY:information
The above query isn't giving
tg2exe as in code.google.com/p/tg2exe/ Make TurboGears project to the
Stand Alone Windows ...? Are you sure you're posting this question
to the correct list?
--
Ian.
On Wed, Sep 5, 2012 at 3:56 PM, Antony Joseph antonyjosep...@gmail.com wrote:
Hello,
I have upgraded my lucene from 2.4.0 to
https://issues.apache.org/jira/browse/LUCENE-2348 suggests there are
long-standing and probably still current issues with DuplicateFilter
and multiple segments. I'm not sure if this could explain what you
are seeing. You could try calling optimize(1) on your index writer
and see if that makes a
Using a FieldSelector is likely to speed up the doc.get() calls, but
it is still liable to be slow. Can you use the lucene FieldCache?
Some other memory cache? Payloads?
--
Ian.
On Wed, Aug 22, 2012 at 4:39 PM, Sebastian R. egnu...@web.de wrote:
Dear all,
I am currently trying to
org.apache.lucene.index.PKIndexSplitter in contrib-misc sounds promising.
www.slideshare.net/abial/eurocon2010 Munching crunching - Lucene
index post-processing sounds well worth a look too.
Or just build new indexes from scratch routing docs to the correct
index however you choose.
--
Ian.
This won't work with TermRangeQuery because neither "test 1" nor "test
3" is a term. "test" will be a term, output by the analyzer. You'll
be able to see the indexed terms in Luke.
Sounds very flaky anyway - you'd get "term 10 xxx" and "term 100 xxx"
as well as "term 1" and "term 2". If your TEST values are
?
Kind regards,
Jochen
2012/8/20 Ian Lea ian@gmail.com
This won't work with TermRangeQuery because neither "test 1" nor "test
3" is a term. "test" will be a term, output by the analyzer. You'll
be able to see the indexed terms in Luke.
Sounds very flaky anyway - you'd get "term 10 xxx" and "term
No. See the FAQ.
http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F
There are a couple of ideas floating around e.g.
http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/
or
Can't see how you could do it with standard queries, but you could
reverse the process and use a MemoryIndex.
Add the single target phrase to the memory index then loop round all
docs executing a search for each one. Maybe use PrefixQuery although
I'd worry about performance. Try it and see.
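A rough sketch of the MemoryIndex mechanics (contrib memory module; the field name, analyzer and prefix query are placeholders - the target text goes into the one-document index and each candidate becomes a query against it):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.util.Version;

public class MemoryIndexDemo {
    // Index one piece of text; a score > 0 means the query matched it.
    public static float score(String text, String prefix) {
        MemoryIndex mi = new MemoryIndex();
        mi.addField("content", text, new StandardAnalyzer(Version.LUCENE_36));
        return mi.search(new PrefixQuery(new Term("content", prefix)));
    }

    public static void main(String[] args) {
        // Loop over your docs, building one query per doc.
        System.out.println(score("the target phrase", "targ"));
    }
}
```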
Loads of stuff will have changed between those 2 versions - since you
can, I'd just reindex.
--
Ian.
On Tue, Aug 14, 2012 at 10:59 PM, sunil Kumar Verma
sunilkv.ve...@gmail.com wrote:
We have recently moved to 3.6 from lucene 2.2 and have seen that the way
tokens get indexed are not the
Is this a lucene question or a mysql question or what? Since this is
the lucene list let's assume you're asking about how to get multiple
values for a field from an index. Document.getValues(keyword) looks
promising: Returns an array of values of the field specified.
--
Ian.
On Wed, Aug 15,
Sounds extremely unlikely. What is the query? What analyzer? What
version of lucene? What about other strings containing $$?
--
Ian.
On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008 zhoucheng2...@gmail.com wrote:
Hi,
I have a big index, and when I searched it with a title string "Cla$$War",
);
sb.add(new CharsRef(base2), new CharsRef(syn2), true);
SynonymMap smap = sb.build();
Hope that helps. There may be an easier way. Have you tried looking
at the source code/test cases?
--
Ian.
On Fri, Aug 10, 2012 at 6:24 PM, Ricardo r...@rand.org wrote:
Ian Lea ian.lea at gmail.com writes
You can add parsed queries to a BooleanQuery. Would that help in this case?
SnowballAnalyzer sba = whatever();
QueryParser qp = new QueryParser(..., sba);
Query q1 = qp.parse(some snowball string);
Query q2 = qp.parse(some other snowball string);
BooleanQuery bq = new BooleanQuery();
bq.add(q1, BooleanClause.Occur.MUST);
bq.add(q2, BooleanClause.Occur.MUST);
.
thanks for the help,
Bill
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
Sent: Friday, August 03, 2012 9:32 AM
To: java-user@lucene.apache.org
Subject: Re: Analyzer on query question
You can add parsed queries to a BooleanQuery. Would that help in this case
it this way over the original
method. I just don't know if the original way I described is wrong or
will give me bad results.
thanks for the help,
Bill
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
Sent: Friday, August 03, 2012 9:32 AM
To: java-user@lucene.apache.org
Lucene 4.0 allows you to use custom codecs and there may be one that
would be better for this sort of data, or you could write one.
In your tests is it the searching that is slow or are you reading lots
of data for lots of docs? The latter is always likely to be slow.
General performance advice
If you are using QueryParser use "fear dark"~2 "tight free"~3.
See also PhraseQuery.setSlop(n). You could also look at the Span
queries e.g. SpanNearQuery.
--
Ian.
On Wed, Jul 25, 2012 at 6:13 AM, neerajshah84 neerajsha...@gmail.com wrote:
how can I put multiple proximity searches in lucene?
Look into spans and line, or sentence, delimiters and tokens, and
position increment gaps. Google will help you. You can do a whole
lot of stuff with spans - see
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ for a
good intro.
Lucene 2.9 is ancient. You should upgrade.
--
Ian.
I bet it's expected. From http://en.wikipedia.org/wiki/Elision_(French)
In written French, elision (both phonetic and orthographic) is
obligatory for the following words:
...
the preposition de
...
Le père d'Albert vient d'arriver.
So surely the removal of d' is correct.
--
Ian.
On
is that the filter doesn't remove d' (and c' too).
Shall I open an issue in Jira?
On 07/25/2012 04:36 PM, Ian Lea wrote:
I bet it's expected. From http://en.wikipedia.org/wiki/Elision_(French)
In written French, elision (both phonetic and orthographic) is
obligatory for the following words
QueryParser returns a query. Just add that to the BooleanQuery.
QueryParser qp = ...;
BooleanQuery bq = new BooleanQuery();
Query parsedq = qp.parse(...);
bq.add(parsedq, ...);
--
Ian.
On Mon, Jul 23, 2012 at 1:16 PM, Deepak Shakya just...@gmail.com wrote:
Hey Jack,
Can you let me know
I can't answer your questions, but use of lucene's document ids as
persistent ids is strongly discouraged, particularly in version 4.x
where I think it just won't work at all. There was a related thread a
couple of weeks ago. See Uwe's message at
Just add the different subjects to the document e.g.
Document doc = new Document();
for (String subject : subjects) {
Field f = new Field("subject", subject, ...);
doc.add(f);
}
Or concatenate the subjects and store the one long string.
If you don't want a search to potentially match terms from
Any thoughts on this?
Patience ...
Is it good to use multiple sort fields?
Absolutely, if that's what you need. On the other hand, if you don't
need it then it's a bad idea.
Using sort on docid will consume any memory?
Don't know. Certainly won't use less than not sorting this way.
Is
The release notice for 4.0-alpha sent to this list says file format
backwards compatibility is provided for indexes from the 3.0 series
so you won't be able to go straight from 2.x to 4.0. I'm sure that
will remain true for all 4.x releases. The comments about waiting for
a stable release of 4.0
I'd forgotten about IndexUpgrader, but I'd still go for 3.6. I
wouldn't want the complexity of shipping two versions of lucene and
having to get customers to run an upgrade script. And probably
wouldn't want to ship the first stable version of 4.0, even though
lucene is very stable and reliable.
That is one option. See recent thread (yesterday?) about possible
problems with that approach, and an alternative or two.
I've no idea how Google do it.
And I've no idea what you mean by problem with different subjects.
--
Ian.
On Wed, Jul 18, 2012 at 4:27 PM, 许超前 chaora...@gmail.com wrote:
So content is a String variable in your program holding a multi-line
value, is it? I'd double check exactly what that is holding before
you store it in the index.
--
Ian.
On Mon, Jul 16, 2012 at 4:56 AM, sam hairen...@yahoo.com.cn wrote:
I had done that, I used the document.add(new
OOV or OOM? Always best to post a full stack trace, and version of
lucene, and OS.
Anyway - give your app more memory? Close searchers after use or some
period of inactivity?
Best long term solution is probably to merge the many small indexes
into one, or a few, larger indexes and restrict
I think you'll have to build the query up in code. RegexQuery in the
contrib queries package should be able to take care of #[0-9].
BooleanQuery bq = new BooleanQuery();
PrefixQuery pq = new PrefixQuery(...); // "#"
RegexQuery rq = new RegexQuery(...); // "#[0-9]"
bq.add(pq, ...);
bq.add(rq, ...);
data loss if it makes it more stable
and
performant.
thanks
On Mon, Jul 9, 2012 at 2:28 AM, Ian Lea ian@gmail.com wrote:
Is this on a local or remote file system? Is the file system itself
OK? Is something else messing with your lucene index at the same
time?
--
Ian
You don't know how to split the string containing the data you want to index??
String s = "2012-07-06 11:11:43some message";
String timestamp = s.substring(0, 19);
String content = s.substring(19).trim();
is one way.
--
Ian.
On Mon, Jul 9, 2012 at 3:55 AM, sam
Is this on a local or remote file system? Is the file system itself
OK? Is something else messing with your lucene index at the same
time?
--
Ian.
On Sun, Jul 8, 2012 at 8:58 PM, T Vinod Gupta tvi...@readypulse.com wrote:
Hi,
My log files are showing the below exceptions almost at twice a
Split the data into 2 fields, timestamp and content. Store one lucene
document per line with the 2 fields, timestamp stored and not indexed
(unless you want to search on it), content stored and analyzed. Use
StandardAnalyzer unless you have special requirements.
Then close the IndexWriter, open
Where exactly are you using these double quoted strings? QueryParser?
It would help if you showed a code snippet.
Assuming your real data is more complex and the strings you are
searching for aren't necessarily at the start of the text, you'll need
some mix of wildcard and proximity searching.
ComplexPhraseQueryParser which looks interesting.
--
Ian.
On Wed, Jul 4, 2012 at 9:51 AM, Ian Lea ian@gmail.com wrote:
Where exactly are you using these double quoted strings? QueryParser?
It would help if you showed a code snippet.
Assuming your real data is more complex and the strings you
You can use the QueryParser proximity feature e.g. "foo test"~n where
n is the max distance you want them to be apart. Or look at the
SpanQuery stuff e.g. SpanNearQuery.
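The SpanNearQuery version might look like this sketch (field name assumed; the boolean flag controls whether order matters):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanNearDemo {
    // Matches a and b within slop positions of each other, in any order.
    public static SpanNearQuery build(String field, String a, String b, int slop) {
        return new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term(field, a)),
                new SpanTermQuery(new Term(field, b)) },
                slop, false);
    }

    public static void main(String[] args) {
        System.out.println(build("body", "foo", "test", 3));
    }
}
```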
--
Ian.
On Tue, Jul 3, 2012 at 4:59 PM, Jochen Hebbrecht
jochenhebbre...@gmail.com wrote:
Hi all,
Imagine you have the
All words are important if they help people find what they want.
Maybe you want high frequency terms. See contrib class
org.apache.lucene.misc.HighFreqTerms.
--
Ian.
On Wed, Jun 27, 2012 at 3:04 AM, 齐保元 qibaoy...@126.com wrote:
meaningful just means the word is more important than others, like
suppose document number..i have 2-3 GB index and every day
, it goes higher, so I can't use searcher.maxDoc(). So I need this solution.
Can you please help me out?
On Tue, Jun 26, 2012 at 10:42 PM, Ian Lea ian@gmail.com wrote:
Do you mean you want all hits that match B:abc, sorted by field
Add imageid as a stored field, no need to index it unless you want to
be able to search by it.
Add the tags as an analyzed indexed field. no need to store unless you
want to read/display the values. StandardAnalyzer will work fine.
Then use QueryParser to build a query like "tags:car", execute
Do you mean you want all hits that match B:abc, sorted by field A? As
opposed to the top 100 hits sorted by field A? Just pass a higher
value in the search(query, ... 100, ...) call. It will be slower and
potentially use more memory but with only 10K docs you probably won't
notice.
--
Ian.
Please define meaningful.
--
Ian.
On Tue, Jun 26, 2012 at 10:39 AM, 齐保元 qibaoy...@126.com wrote:
hi, does anyone knows how to extract meaningful words from Lucene index?
-
To unsubscribe, e-mail:
It's probably an issue with analysis and colons and hyphens and dots,
maybe lower/upper case as well.
Are you using an analyzer? Which? If not, which might be consistent
with your usage of TermQuery, how are you storing the multiple values
for alt_id?
See also the FAQ entry Why am I getting no
I'm positive that StandardAnalyzer won't change "drinks - water" to
"drinks -water". So it must be something in your code. Which you
don't show us. Best guess is that the changes you've made to the Flex
file have caused the problem. If you created your tokenizer by
copying and modifying
(Query: + query.toString(contents));
TopDocs results = searcher.search(query, 10);
Thanks
xpete
A Segunda, 25 de Junho de 2012 14:37:37 Ian Lea escreveu:
I'm positive that StandardAnalyzer won't change "drinks - water" to
"drinks -water". So it must be something in your code. Which you
don't
The key thing is to be consistent. You can either replace your
TermQuery code with the output from QueryParser.parse, with QP created
with StandardAnalyzer, or index alt_id as Index.NOT_ANALYZED and stick
with TermQuery. I think the latter will work even with multiple
terms/tokens stored for
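A minimal sketch of the NOT_ANALYZED + TermQuery option (Lucene 3.x; the alt_id values are made up):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

public class ExactMatchDemo {
    public static Document docWithIds(String... ids) {
        Document doc = new Document();
        for (String id : ids) {
            // NOT_ANALYZED: each value is indexed verbatim as a single term,
            // so a TermQuery with the identical string will match it.
            doc.add(new Field("alt_id", id, Field.Store.YES,
                    Field.Index.NOT_ANALYZED));
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = docWithIds("AB-123:X", "CD-456:Y");
        TermQuery q = new TermQuery(new Term("alt_id", "AB-123:X"));
        System.out.println(q + " / " + doc.getValues("alt_id").length + " values");
    }
}
```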
Do you mean NumericRangeQuery or a textual range query that happens to
be searching on numbers?
What exactly is wrong?
The rewrite method (are you calling this yourself? why?) does indeed
mess around with queries and some may end up wrapped with
ConstantScoreQuery. I can't remember what happens
Did you get an answer to this? Looking at the lucene test cases can
be a good way of finding out things like this. Reading Lucene In
Action is also highly recommended. May not have the exact answer to
this question but will teach you how to find out.
--
Ian.
On Mon, May 28, 2012 at 7:35 AM,
KeywordAnalyzer is the normal thing to use if you want exact matches.
--
Ian.
On Sat, May 26, 2012 at 11:37 AM, Yogesh patel
yogeshpateldai...@gmail.com wrote:
Hi
I would like to search on any analyzed field of lucene index with Exact
Match.
Is it possible to search with exact match
It's hard to believe that an upgrade from 3.0.3 to 3.4.0 would make
that much difference to CPU usage. Are you sure nothing else has
changed? Has the crawling/indexing elapsed time gone up in the same
proportion? Have you verified that the increased usage is actually
in lucene rather than
I've never come across this GroupingCollector stuff before so know
nothing about it apart from looking at the javadocs and may be talking
nonsense, but here goes anyway.
group by time span/web site: it appears that it will group by single
values, not ranges, So should work fine by website. Just
Lots of good tips in
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from
the FAQ.
--
Ian.
On Tue, May 22, 2012 at 2:08 AM, Li Li fancye...@gmail.com wrote:
something wrong when writing in my android client.
if RAMDirectory does not help, I think the bottleneck is CPU. You may
important than sorting.
if we don't sort, how can we implement this request? I'm stuck here.
And the discount has been converted to a number already, thanks for your
information.
Thanks,
CQ
2012/5/21 Ian Lea ian@gmail.com
I'm not clear what you are asking.
Are you saying that you want keyword
Certainly lots of questions, and I can't answer most of them, but a
couple of comments/opinions.
Collecting all docs will potentially use a lot of memory but isn't
necessarily excessively slow. It's generally only doing something
like reading field values for all docs that can be prohibitively
I'm not clear what you are asking.
Are you saying that you want keyword matching to be more important
than sorting? If that's the case, don't sort.
Or are you saying that sorting of null values isn't doing what you
want? Use an actual value instead of null, whatever makes sense in
your
You may need to cut it down to something simpler, but I can't see any
reader.close() calls.
--
Ian.
On Fri, May 18, 2012 at 5:47 PM, Michel Blase mblas...@gmail.com wrote:
This is the code in charge of managing the Lucene index. Thanks for your
help!
package luz.aurora.lucene;
import
Document doc3 = new Document();
doc2.add(new Field("searchText", "LMN Takeaway", Field.Store.YES,
doc2 != doc3.
Boosting by number of occurrences tends to happen automatically. See
IndexSearcher.explain() as I think someone already suggested. See
also javadocs for
No and no. MultiFieldQueryParser is the only thing that comes to mind
as being remotely close but you have to tell it the field names. I
guess you could use IndexReader.getFieldNames(...) to find indexed
fields and pass the output from that through a wildcard regexp and
feed the output from that
In versions from 3.3 onwards MMapDirectory is the default on 64-bit
linux. Not sure exactly what that means wrt your questions, but may
well be relevant.
--
Ian.
On Tue, May 15, 2012 at 3:51 PM, Lutz Fechner lfech...@hubwoo.com wrote:
Hi,
By design memory outside the JVM heap space should
I don't think there is an out of the box analyzer to do this but you
can easily build your own, incorporating
org.apache.lucene.analysis.ASCIIFoldingFilter into the chain.
--
Ian.
On Fri, May 11, 2012 at 11:01 AM, Li Li fancye...@gmail.com wrote:
I have some french hotels such as Elysée
Can't spot anything obviously wrong in your code and what you are
trying to do should work. Are you positive that what you think is the
second doc is really being added second? You only show one doc being
added. Are there already 7 docs in the index before you start?
--
Ian.
On Fri, May 11,
, Ian Lea ian@gmail.com wrote:
Can't spot anything obviously wrong in your code and what you are
trying to do should work. Are you positive that what you think is the
second doc is really being added second? You only show one doc being
added. Are there already 7 docs in the index before
You can't selectively update fields in docs read from an index, in old
or current versions of lucene. I think there are some ideas floating
around but nothing usable today as far as I know. You'll need to
rebuild the whole doc before passing it to writer.updateDocument().
--
Ian.
On Wed, May
Impossible to say - how big is big? How fast is fast? I'd start with
the simplest option and if it's fast enough, stop.
--
Ian.
On Sat, May 5, 2012 at 12:47 AM, Yang tedd...@gmail.com wrote:
I have an index containing all students, now I want to do an index
search inside an Apache
Similarity.setDefault(new MySimilarity()) is certainly better than the
2 calls I recommended. Thanks.
I find it hard to see why one might not want to do this in normal
usage but have a vague recollection of someone once outlining some
obscure scenarios where different similarities at index and
to rebuild my doc from
whole cloth and I'm reasonably sure it is working for me :-)
Thanks!
-Original Message-
From: Ian Lea [mailto:ian@gmail.com]
Sent: Thursday, May 10, 2012 1:20 AM
To: java-user@lucene.apache.org
Subject: Re: update/re-add an existing document with numeric fields
You
You can override org.apache.lucene.search.Similarity/DefaultSimilarity
to tweak quite a lot of stuff.
computeNorm() may be the method you are interested in. Called at
indexing time so be sure to use the same implementation at index and
query time, using IndexWriterConfig.setSimilarity() and
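One possible shape for such a subclass (Lucene 3.2+ computeNorm signature; dropping the length norm is only an example of a tweak, not a recommendation):

```java
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.DefaultSimilarity;

public class NoLengthNormSimilarity extends DefaultSimilarity {
    // Drop the usual 1/sqrt(numTerms) length factor; keep only the field boost.
    @Override
    public float computeNorm(String field, FieldInvertState state) {
        return state.getBoost();
    }
}
```

Pass an instance to IndexWriterConfig.setSimilarity() before indexing and to IndexSearcher.setSimilarity() before searching, so both sides agree.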
that all queries and terms don't contain white
spaces.
Thanks again.
-Elmer
On 04/25/2012 02:53 PM, Ian Lea wrote:
You seem to be quietly going round in circles, by yourself! I suggest
a small self-contained program/test case with a RAM index created from
scratch. You can
If you really mean must and always, you'll probably have to
execute 2 searches. First on title alone then on description, or
title and description, merging the hit lists as appropriate.
--
Ian.
On Thu, Apr 26, 2012 at 8:30 PM, Akos Tajti akos.ta...@gmail.com wrote:
Jake,
we're already
You seem to be quietly going round in circles, by yourself! I suggest
a small self-contained program/test case with a RAM index created from
scratch. You can then experiment with inject on or off and if you
still can't figure it out, post the code and hopefully someone will be
able to help you