David and Adrien, thanks for your responses. Bringing up an old thread
here. Revisiting this question ...
> (so deleted docs == max docs) and call commit. Will/Can this segment still
> exist after commit?
Since I am using Solr (8.11.1), the default deletion policy is
SolrDeletionPolicy which retai
On the 2nd question, we do not plan on leveraging this information to
figure out the codec: the codec that should be used to read a segment is
stored separately (also in segment infos).
It is mostly useful for diagnostic purposes, e.g. if we see an interesting
corruption case where checksums matc
> (so deleted docs == max docs) and call commit. Will/Can this segment still
> exist after commit?
>
Depends on your merge policy and index deletion policy. You can configure
Lucene to keep older commits (and then you'll preserve all historical
segments).
I don't know the answer to your second quest
Following up on my questions since they didn't get much love the first
time. Any inputs are greatly appreciated!
Thanks,
Rahul
On Wed, Sep 14, 2022 at 3:58 PM Rahul Goswami wrote:
> Hello,
>
> I was going through some parts of the Lucene source and had some questions:
> 1) Can lucene have 0 doc
Hello,
I was going through some parts of the Lucene source and had some questions:
1) Can lucene have 0 document segments? Or will they always be purged
(either by TMP or otherwise) on a commit?
Eg: A segment has 4 docs, and I make a /update call to overwrite all 4 docs
(so deleted docs == max doc
Hi John,
I heard of many users who used Lucene for this use-case, it's
definitely a valid one. Indexes are stored mostly on disk, with a tiny
part of them being held in memory to guarantee good access speed.
Lucene supports both inverted indexes and KD trees up to 8 dimensions.
Lookup, sorting an
Greetings;
I'd like to play around with Lucene to offload some of my database lookups.
Is this a valid use of Lucene in your opinion(s)?
Indexes - they are stored on the file system as some kind of tree (I'm
guessing)?
Lookups and sorting - Can I lookup by date and sort asc/desc and paginate?
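Taking the sub-questions in order: on disk each segment keeps an inverted term index (with an FST-based prefix index over the sorted terms) plus a KD-tree-like points structure for numeric fields, and date lookup, sorting, and pagination are all supported directly. A minimal sketch, assuming a recent Lucene (9.x); all field names and values are made up for illustration:

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DateLookupSketch {
  /** Indexes 25 "days", range-queries days 5..20 newest-first, returns the two page sizes. */
  public static int[] pageSizes() throws IOException {
    Directory dir = new ByteBuffersDirectory(); // in-memory here; FSDirectory.open(path) on disk
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (long day = 1; day <= 25; day++) {
        Document doc = new Document();
        doc.add(new LongPoint("date", day));             // enables range queries
        doc.add(new NumericDocValuesField("date", day)); // enables sorting
        writer.addDocument(doc);
      }
      writer.commit();
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      Query byDate = LongPoint.newRangeQuery("date", 5, 20);  // 16 matching docs
      Sort newestFirst = new Sort(new SortField("date", SortField.Type.LONG, true));
      TopDocs page1 = searcher.search(byDate, 10, newestFirst);
      // Pagination: continue after the last hit of the previous page.
      FieldDoc last = (FieldDoc) page1.scoreDocs[page1.scoreDocs.length - 1];
      TopDocs page2 = searcher.searchAfter(last, byDate, 10, newestFirst);
      return new int[] {page1.scoreDocs.length, page2.scoreDocs.length};
    }
  }

  public static void main(String[] args) throws IOException {
    int[] sizes = pageSizes();
    System.out.println("page1=" + sizes[0] + " page2=" + sizes[1]);
  }
}
```

For deep paging, searchAfter is preferable to a large "from" offset, since it avoids re-collecting all earlier hits.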
Can you try increasing your IndexWriter.setRAMBufferSizeMB? That flush
control logic will block incoming threads if the number of bytes trying to
flush to disk is too large relative to your RAM buffer.
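In recent Lucene versions (8.x/9.x) that setting lives on IndexWriterConfig rather than on IndexWriter itself. A sketch of raising it from the 16 MB default; 256 MB is an arbitrary illustrative value, not a recommendation:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;

public class RamBufferConfig {
  public static IndexWriterConfig biggerRamBuffer() {
    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
    // Default is 16 MB; a larger buffer gives the flush-control logic more
    // headroom before it starts stalling incoming indexing threads.
    cfg.setRAMBufferSizeMB(256.0);
    return cfg;
  }

  public static void main(String[] args) throws Exception {
    try (IndexWriter writer = new IndexWriter(new ByteBuffersDirectory(), biggerRamBuffer())) {
      System.out.println("RAM buffer MB: " + writer.getConfig().getRAMBufferSizeMB());
    }
  }
}
```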
Mike McCandless
http://blog.mikemccandless.com
On Mon, Mar 18, 2019 at 2:30 PM yuncheng lu
When I checked the lucene/core DocumentsWriter.preUpdate code, I saw that
flushControl is used when a thread is stalled.
We write a lot of documents to disk, which is an SSD. We monitored
that all the threads are in flush, while requests continuously call addDocument,
which can go into preUpdate c
Hi Alexandre,
I don't have time for a call, but to give you some pointers, Lucene does
the following that may be related to natural language processing:
- Word segmentation via the `Tokenizer` class. It is rather simple for
western languages (including French, see StandardTokenizer), but less for
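A minimal sketch of the Tokenizer API referred to above, feeding text through StandardTokenizer directly (recent Lucene; the reset / incrementToken / end / close sequence is required by the TokenStream contract):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizeSketch {
  public static List<String> tokens(String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (StandardTokenizer tok = new StandardTokenizer()) {
      tok.setReader(new StringReader(text));
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      tok.reset();                      // mandatory before incrementToken()
      while (tok.incrementToken()) {
        out.add(term.toString());
      }
      tok.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    // French text passes through fine; StandardTokenizer follows UAX#29 word breaks.
    System.out.println(tokens("L'analyse du langage naturel"));
  }
}
```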
Good afternoon everyone,
I am working for a French company and in the scope of my work I am collecting
information on open source NLP tools available on the "market" worldwide.
I was looking for such intel on the internet and by reading some users'
comments but I figured, why not contact the per
Hi,
Here is my code to back up index files with Lucene Replicator, but it
doesn't work well; no files were backed up.
Could you check my code and give me your advice?
I bolded the key code.
public class IndexFiles {
private static Directory dir;
private static Path bakPath;
private sta
Hi,
Here is my code to back up index files with Lucene Replicator, but it
doesn't work well; no files were backed up.
Could you check my code and give me your advice?
package com.wilddog.lucene;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.c
Hi,
Here is my code to back up index files with Lucene Replicator, but it
doesn't work well; no files were backed up.
Could you check my code and give me your advice?
public class IndexFiles {
private static Directory dir;
private static Path bakPath;
private static LocalReplicator replicator;
Hi:
Would you mind doing a web search and cataloging the relevant pages into a
primer?
Thx,
Will
-Original Message-
From: 王建军 [mailto:jianjun200...@163.com]
Sent: Tuesday, September 22, 2015 4:02 AM
To: java-user@lucene.apache.org
Subject: hello,I have a problem about lucene,please help me
There is a class org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter which
has two parameters: one is DEFAULT_MIN_BLOCK_SIZE, the other is
DEFAULT_MAX_BLOCK_SIZE; their default values are 25 and 48. When I make their
values bigger, for example 200 and 398, and then build the index, the result is
Reply: RE: RE: About lucene memory consumption
Hi Wang, would it be possible to open a JIRA issue so we can track this?
In any case, I would recommend disabling compound files if you use
NRTCachingDirectory (as a workaround).
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.
> -Original Message-
> From: wangzhijiang999 [mailto:wangzhijiang...@aliyun.com]
> Sent: Tuesday, July 01, 2014 9:17 AM
> To: java-user
> Subject: Reply: RE: RE: About lucene memory consumption
>
> My application also met this problem last year and I researched the code
&
My application also met this problem last year, and I researched the code
and found the reason.
The whole process is as follow:
1. When using NRTCachingDirectory, it will use RAMDirectory as cache and
MMapDirectory as delegate. The new segment will be created in the process of
flush or merge
"Uwe Schindler";;
Date: Sat, Jun 28, 2014 05:41 PM
To: "java-user";
Subject: RE: RE: About lucene memory consumption
Hi,
What does your configuration for NRTCachingDirectory look like? There are 2
constructor params: one is maxMergeSizeMB, the other one is maxCac
Hi,
What does your configuration for NRTCachingDirectory look like? There are 2
constructor params: one is maxMergeSizeMB, the other one is maxCachedMB. If
you correctly close (or release, in the case of ReaderManager/SearcherManager) all
indexes, this should limit the memory use.
There is no
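For reference, constructing it with those two parameters might look like this (the 5.0/60.0 values mirror the example in the class javadoc; treat them as a starting point, not a recommendation):

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

public class NrtCachingSetup {
  public static NRTCachingDirectory open(Path indexPath) throws Exception {
    // maxMergeSizeMB: merged segments up to this size are cached in RAM;
    // maxCachedMB: total size the RAM cache may reach before files go to disk.
    double maxMergeSizeMB = 5.0;
    double maxCachedMB = 60.0;
    return new NRTCachingDirectory(FSDirectory.open(indexPath), maxMergeSizeMB, maxCachedMB);
  }

  public static void main(String[] args) throws Exception {
    Path tmp = Files.createTempDirectory("nrt-demo");
    try (NRTCachingDirectory dir = open(tmp)) {
      System.out.println(dir);
    }
  }
}
```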
use MMapDirectory instead
of NRTCachingDirectory?
Thanks & Best Regards!
-- Original --
From: "lubin";<308181...@qq.com>;
Date: Sat, Jun 28, 2014 02:03 PM
To: "java-user";
Subject: Re:RE: About lucene memory consum
From: "lubin" <308181...@qq.com>;
Subject: Re:RE: About lucene memory consumption
Could it be that you forgot to close older IndexReaders after getting a new NRT
one? This would be a huge memory leak.
I recommend using SearcherManager to handle real-time reopen correctly.
Uwe
Am 27. Juni 2014
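Uwe's suggestion, sketched against a recent Lucene (9.x, where TopDocs.totalHits.value is a field): one SearcherManager per writer, acquire/release around every search, and maybeRefresh instead of opening readers by hand.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;

public class NrtSearchSketch {
  public static long liveSearch() throws IOException {
    try (IndexWriter writer =
             new IndexWriter(new ByteBuffersDirectory(), new IndexWriterConfig(new StandardAnalyzer()))) {
      // One SearcherManager for the life of the writer; it reopens readers and
      // closes old ones once all searches against them have been released.
      SearcherManager manager = new SearcherManager(writer, new SearcherFactory());

      Document doc = new Document();
      doc.add(new TextField("body", "hello nrt world", Field.Store.NO));
      writer.addDocument(doc);

      manager.maybeRefresh();          // pick up the uncommitted change (NRT)
      IndexSearcher searcher = manager.acquire();
      long hits;
      try {
        hits = searcher.search(new TermQuery(new Term("body", "nrt")), 1).totalHits.value;
      } finally {
        manager.release(searcher);     // never close the searcher yourself
      }
      manager.close();
      return hits;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("hits: " + liveSearch());
  }
}
```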
the
>way, we commit the index after every 1000 email documents.
>
> Could you kindly give me some tips to solve this problem?
>
>Thanks & Best Regards!
>
>-- Original --
you kindly give me some tips to solve this problem?
Thanks & Best Regards!
-- Original --
From: "Uwe Schindler";;
Date: Fri, Jun 27, 2014 08:36 PM
To: "java-user";
Subject: RE: About lucene memory consumption
-Original Message-
> From: 308181687 [mailto:308181...@qq.com]
> Sent: Friday, June 27, 2014 10:42 AM
> To: java-user
> Subject: About lucene memory consumption
>
> Hi, all
>
>
>I found that the memory consumption of my lucene server is abnormal, and
> “jmap -histo ${pid}” show
Hi, all
I found that the memory consumption of my Lucene server is abnormal, and
“jmap -histo ${pid}” shows that the byte[] class consumes almost all of the
memory. Is there a memory leak in my app? Why so many byte[] instances?
The following is the top output of jmap:
num
thanks, Uwe. I missed it.
On Sun, Nov 4, 2012 at 3:04 PM, Uwe Schindler wrote:
> As explained in my first eMail, the class of the implementation is cached,
> not the instance. The factory returns a new instance of the cached class.
>
> Uwe
>
>
>
> lukai schrieb:
>
> >Hi, thanks for the reply. C
As explained in my first eMail, the class of the implementation is cached, not
the instance. The factory returns a new instance of the cached class.
Uwe
lukai schrieb:
>Hi, thanks for the reply. Could you elaborate "The AttributeFactory
>creates
>a new one for every new TokenStream instance.
Hi, thanks for the reply. Could you elaborate on "The AttributeFactory creates
a new one for every new TokenStream instance."? Because I only find the
implementation like this:
private static Class<? extends AttributeImpl> getClassForInterface(Class<? extends Attribute> attClass) {
  final WeakReference<Class<? extends AttributeImpl>> ref =
      attClassImplMap.get(attCla
Hi,
> Hmmm, the reason i asked this question is regarding to implementation of :
>
> CharTermAttribute.
>
>
> It seems the tokenizer will set the token read from the reader into it, and the following
> token stream can also get this instance. My concern is: in a multi-threaded
> environment, another thread can
Hmmm, the reason i asked this question is regarding to implementation of :
CharTermAttribute.
It seems the tokenizer will set the token read from the reader into it, and the
following token stream can also get this instance. My concern is: in a
multi-threaded environment, another thread can also change the conte
Hi,
> I have two confused questions regarding Lucene implementation, hope
> someone can give me some clues.
>
> 1. It's about the AttributeSource/AttributeSourceImpl implementation.
> Seems like the default instance was kept as "static"
> in DefaultAttributeFactory. But we get these instances i
Hi Yogesh,
I bet you are indexing A as an analyzed field and its values are getting
tokenized at each capital letter it finds. Try to index field A using
Field.Index.NOT_ANALYZED.
On Sun, Apr 15, 2012 at 3:44 AM, Yogesh patel
wrote:
> Hi,
>
> I have read apache lucene tutorial and implemented in
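In current Lucene versions the same advice is expressed with field types rather than Field.Index flags: StringField indexes the value as a single token (the NOT_ANALYZED equivalent), TextField runs the analyzer. A sketch using the thread's field name A (the values are made up):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class ExactMatchField {
  public static Document makeDoc(String code, String description) {
    Document doc = new Document();
    // StringField: the whole value is one token, so "ProductCodeABC" is never
    // split by the analyzer (the modern Field.Index.NOT_ANALYZED).
    doc.add(new StringField("A", code, Field.Store.YES));
    // TextField: analyzed free text, for fields you actually want tokenized.
    doc.add(new TextField("B", description, Field.Store.YES));
    return doc;
  }

  public static void main(String[] args) {
    System.out.println(makeDoc("ProductCodeABC", "a product description").getField("A").stringValue());
  }
}
```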
You can provide your own Similarity implementation, overriding
whichever of the methods you need in order to achieve your aims. Use
it via the setxxx methods mentioned in the javadocs and unless you
deliberately sort by some other field everything should fall into
place.
--
Ian.
2011/11/9 强继朋
lucene,
I have a problem I don't know how to solve; it's about the score formula of
Lucene. The Lucene package provides a method in the Similarity class. My
question: what if I want to use only some factors of the formula, such as TF and IDF,
and then add some additional factors, in aim to
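Ian's suggestion could be sketched like this against a recent Lucene, where the classic TF/IDF formula lives in ClassicSimilarity: keep tf and idf, neutralize the length normalization, and override whichever factors you want to change (the choices below are illustrative only):

```java
import org.apache.lucene.search.similarities.ClassicSimilarity;

public class TfIdfOnlySimilarity extends ClassicSimilarity {
  @Override
  public float lengthNorm(int numTerms) {
    return 1f;                       // drop document-length normalization
  }

  @Override
  public float tf(float freq) {
    return (float) Math.sqrt(freq);  // keep the classic sqrt(tf); change at will
  }

  @Override
  public float idf(long docFreq, long docCount) {
    // Classic smoothed idf; replace with your own additional factors here.
    return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
  }

  public static void main(String[] args) {
    TfIdfOnlySimilarity sim = new TfIdfOnlySimilarity();
    System.out.println("tf(4)=" + sim.tf(4f) + " lengthNorm(100)=" + sim.lengthNorm(100));
  }
}
```

Install it on both sides: IndexWriterConfig.setSimilarity(...) at index time (it affects the stored norms) and IndexSearcher.setSimilarity(...) at search time.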
Thanks Steve. This slide is helpful. Greetings.
- Original Message -
From: "Steven A Rowe"
To: java-user@lucene.apache.org
Sent: Sunday, 10 July 2011 21:45:48 (GMT-0500) Auto-Detected
Subject: RE: Some question about Lucene
This slide show is a few years old, but
[mailto:yhdelg...@uci.cu]
Sent: Sunday, July 10, 2011 9:30 PM
To: java-user@lucene.apache.org
Subject: Some question about Lucene
Hello
I'm a new Lucene user. I have the following question: is it possible to build a
crawler/spider with the Lucene library, or is Lucene only for the index/search phases?
Hello
I'm a new Lucene user. I have the following question: is it possible to build a
crawler/spider with the Lucene library, or is Lucene only for the index/search phases? I
am studying three projects: Nutch, Lucene and Solr, but I don't see what the
main difference between them is.
Greetings .
--
Genève 2
> Tél. direct : +41 (0)22 388 00 95
> michel.paw...@etat.ge.ch
>
>
> -Original Message-
> From: Danil ŢORIN [mailto:torin...@gmail.com]
> Sent: Tuesday, 28 September 2010 07:57
> To: java-user@lucene.apache.org
> Subject: Re: Questions about Lucene usage
-Original Message-
From: Danil ŢORIN [mailto:torin...@gmail.com]
Sent: Tuesday, 28 September 2010 07:57
To: java-user@lucene.apache.org
Subject: Re: Questions about Lucene usage recommendations
You said you have 1000 fields...when performing search do you search
in all 1000 fie
seems required :-/
> 12) ok
>
> Regards,
>
> Michel
>
> -Original Message-
> From: Danil ŢORIN [mailto:torin...@gmail.com]
> Sent: Monday, 27 September 2010 14:53
> To: java-user@lucene.apache.org
> Subject: Re: Questions about Lucene usage recommendations
>
optimized the index, the average search
time dropped from 10s to below 2s; now (after 2.5 weeks) the average search
time is 7s. Optimization seems required :-/
12) ok
Regards,
Michel
-Original Message-
From: Danil ŢORIN [mailto:torin...@gmail.com]
Sent: Monday, 27 September 2010 14
Lucene 2.1 is really old...you should be able to migrate to lucene 2.9
without changing your code (almost jar drop-in, but be careful on
analyzers), and there could be huge improvements if you use lucene
properly.
Few questions:
- what does "all data to be indexed is stored in DB fields" mean? you
Hello,
We have an application which is using lucene and we have strong
performance issues (on bad days, some searches take more than 2
minutes). I'm new to the Lucene component, thus I'm not sure Lucene is
correctly used and thus would like to have some information on lucene
usage recommendations.
Subject: about lucene doc id recycle
And if you do not optimize: as soon as two segments are merged, the docids are
also reassigned. It just takes some time. Normally the maximum docid is
somewhere between the current doc count and about 3 times the doc count.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D
://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, March 22, 2010 2:05 PM
> To: java-user@lucene.apache.org
> Subject: Re: about lucene doc id recycle
>
> Yes, when you call optimize, one si
Yes, when you call optimize, one side effect is that all the doc IDs are
reassigned
so they're contiguous.
HTH
Erick
On Mon, Mar 22, 2010 at 8:22 AM, luocanrao wrote:
> The total document number is not very big, but updates are very frequent.
>
> So I wonder whether the doc id is growing bigger
The total document number is not very big, but updates are very frequent.
So I wonder whether the doc ids keep growing bigger and bigger and never
get smaller.
Does Lucene have some technique for recycling doc ids?
PS: I never call the optimize method.
sync (fsync to the OS) tells the OS to make sure everything associated
with that file is moved to stable storage in the IO system. (It
doesn't read anything back).
On flush we write the files to disk, which is usually very fast since
it writes into the OS's RAM write cache, but we do not sync.
s
he new
created disk file ("sync").
I guess a commit only costs about twice the time of a flush?
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: March 14, 2010 18:14
To: java-user@lucene.apache.org
Subject: Re: about lucene in action 2
Flushing means stuff (added docs, delet
Flushing means stuff (added docs, deletions) buffered in RAM are moved
to disk, ie, written as new segment files.
But the new segments_N file, referencing these new segments, is not written.
Nor are the files "sync"'d.
This means a newly opened or reopened reader will not see the changes.
In or
I am reading Lucene in Action, 2nd edition, and have some questions about it.
When a flush occurs, the writer creates new segment and deletion files in
the Directory. However,
these files are neither visible nor usable to a newly opened IndexReader
until the writer commits the
changes. It's important to unders
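The book's point can be observed directly: after a flush the segment files exist, but a plain DirectoryReader (which reads the last segments_N commit) still sees nothing, while an NRT reader opened from the writer does. A sketch, assuming a recent Lucene (9.x):

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class CommitVisibility {
  /** Returns {docs visible to a committed-view reader, docs visible to an NRT reader}. */
  public static int[] docCounts() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      writer.commit();                     // create an initial (empty) commit point

      Document doc = new Document();
      doc.add(new TextField("body", "flushed but not committed", Field.Store.NO));
      writer.addDocument(doc);
      writer.flush();                      // segment files written, no new segments_N

      int committedView, nrtView;
      try (DirectoryReader committed = DirectoryReader.open(dir)) {
        committedView = committed.numDocs();   // sees only the last commit
      }
      try (DirectoryReader nrt = DirectoryReader.open(writer)) {
        nrtView = nrt.numDocs();               // NRT reader sees the flushed doc
      }
      writer.commit();                     // now a plain reader would see it too
      return new int[] {committedView, nrtView};
    }
  }

  public static void main(String[] args) throws IOException {
    int[] c = docCounts();
    System.out.println("committed view: " + c[0] + ", NRT view: " + c[1]);
  }
}
```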
This might be OT but did you consider Google Desktop Search?
Seems that somebody reported success with hacking it to allow network file
system index/search: http://www.geekzone.co.nz/content.asp?contentid=3939
Regards,
Lukas
http://blog.lukas-vlcek.com/
2009/12/3 杨建华
> Maybe you can try Omni
Maybe you can try Omnifind Yahoo Edition.
2009/12/3 Weiwei Wang
> You can do everything related to search (full text or just paths) with
> Lucene:-)
>
> On Wed, Dec 2, 2009 at 11:26 PM, Stefan Trcek wrote:
>
> > On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote:
> > > On Wednesday 02 De
You can do everything related to search (full text or just paths) with
Lucene:-)
On Wed, Dec 2, 2009 at 11:26 PM, Stefan Trcek wrote:
> On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote:
> > On Wednesday 02 December 2009 15:50:45 archibal wrote:
> > > -optionnally i want to have a central
On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote:
> On Wednesday 02 December 2009 15:50:45 archibal wrote:
> > -optionnally i want to have a central server which index all data
> > (name of files, folders and file content) on network and i would
> > like to connect via a browser on the cen
On Wednesday 02 December 2009 15:50:45 archibal wrote:
>
> -optionnally i want to have a central server which index all data
> (name of files, folders and file content) on network and i would like
> to connect via a browser on the central server ? are there project
> who does this or something like
I'm actually looking for software that can search on a computer (and on
> Windows network drives) all files and the contents of files, based on an
> indexing
> method.
>
> I have a few questions about Lucene:
>
> - Does the Lucene engine index only the contents? Or is it possible to inde
Erick
On Wed, Dec 2, 2009 at 9:50 AM, archibal wrote:
>
> Hello all,
>
> I'm actually looking for software that can search on a computer (and on
> Windows network drives) all files and the contents of files, based on an
> indexing
> method.
>
> I have few questio
Hello all,
I'm actually looking for software that can search on a computer (and on
Windows network drives) all files and the contents of files, based on an indexing
method.
I have a few questions about Lucene:
- Does the Lucene engine index only the contents? Or is it possible to index
the na
http://nlp.stanford.edu/IR-book/information-retrieval-book.html gives
a good introduction what happens under the hood of a search engine and
you can download it for free. It does not explain Lucene directly, but
a lot of IR algorithms that are used in Lucene (and any other search
engine) are explai
Mehdi,
your requirements sound to be fulfilled mostly by Apache Solr which is
a web-based packaging of Lucene.
paul.
On 8 Oct 2009 at 10:11, Mehdi Ben Hamida wrote:
Hello,
I'm reviewing and doing some research on Lucene Java 2.9.0, to
check if it
meets our needs.
Unfortunat
Hello,
I'm reviewing and doing some research on Lucene Java 2.9.0, to check if it
meets our needs.
Unfortunately I don't find answers to some of my questions, and I hope you
can answer them, and provide any references that prove your answer.
- Do you confirm that Lucene enables load t
Here is the english version of the article for those who are interested.
Lucene version 2.9 released
Content-management systems like the ones powering the channels at AOL,
social networks like LinkedIn, the Nebula cloud computing
platform at NASA: nearly no application that does not need to
Hey Lucene Users,
Heise.de (
http://www.heise.de/open/artikel/Such-Engine-Lucene-in-Version-2-9-erschienen-810377.html)
has just published an article about the new 2.9 release.
Unfortunately they only published the German version while we tried to get
the English one too. Thanks to Isabel (http://
Simon, no problem. I am looking at it now. I will just post my
approach and let people tear it apart / get things moving :)
On Fri, Jul 31, 2009 at 2:45 PM, Simon
Willnauer wrote:
> @Michael: add yourself as a Watcher for the issue.
> @Robert: I can start working on this within the next weeks - ca
@Michael: add yourself as a Watcher for the issue.
@Robert: I can start working on this within the next weeks - can you help too?
simon
On Fri, Jul 31, 2009 at 7:49 PM, Robert Muir wrote:
> Michael, makes sense. most of the issues probably have some
> workaround, so reply back if you need.
>
> Th
Michael, makes sense. most of the issues probably have some
workaround, so reply back if you need.
Thanks for your feedback though, it is helpful to know that its important!
On Fri, Jul 31, 2009 at 1:36 PM, Michael Thomsen wrote:
> Not really. At this point, I just needed to know where the UCS4
>
Not really. At this point, I just needed to know where the UCS4
support stands. I'm reasonably familiar with the various analyzers and
what they can do. It's just the state of UCS4 support that might be an
issue for us.
Thanks,
Mike
On Fri, Jul 31, 2009 at 12:25 PM, Robert Muir wrote:
> Michael
Michael, just out of curiosity, did you have a particular Analyzer in
mind you were planning on using, or rather certain features in Lucene
you were concerned would work with these codepoints?
On Fri, Jul 31, 2009 at 12:19 PM, Simon
Willnauer wrote:
> Hey Robert, good to see that you found the lin
Hey Robert, good to see that you found the link :)
On Fri, Jul 31, 2009 at 6:06 PM, Robert Muir wrote:
> Michael, as Simon mentioned I created an issue describing where you
> might run into trouble, at least in lucene core.
>
> The low-level lucene stuff, it treats these just fine (as surrogate pa
Michael, as Simon mentioned I created an issue describing where you
might run into trouble, at least in lucene core.
The low-level lucene stuff, it treats these just fine (as surrogate pairs).
But most analyzers run into some trouble. (things like
WhitespaceAnalyzer are ok)
Also wildcard queries
Thanks for your quick response!
Mike
On Fri, Jul 31, 2009 at 10:25 AM, Simon
Willnauer wrote:
> If I understand you correctly you are asking if lucene can deal with
> encodings that use more than 16 bits. Well, yes and no, but mainly no.
> The support for unicode 4.0 was introduced in Java 1.5 and l
If I understand you correctly you are asking if lucene can deal with
encodings that use more than 16 bits. Well, yes and no, but mainly no.
Support for Unicode 4.0 was introduced in Java 1.5, and Lucene core
still has back-compat requirements for Java 1.4. Lucene's analyzers
make use of char[] all
Is Lucene capable of handling UCS4 data natively?
Thanks,
Mike
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
I think I'll buy Lucene in Action 2nd edition.
Till now I haven't got many specific questions. I'm only searching for info
material I can use for my dissertation.
--
View this message in context:
http://www.nabble.com/Detailed-Information-about-Lucene-2.4-%28english-german%
Hi,
I'm searching for detailed information and needed stuff about lucene
(version 2.4.0 !!!) to write a dissertation.
Does anyone know good english or german resources?
Thanks!
Matthias
--
View this message in context:
http://www.nabble.com/Detailed-Information-about-Lucene-2.4-%28en
Hi all,
Firstly, I know that there is an FsDirectory class in Nutch-0.9, so
we can access the index on HDFS. But after I tested it, I found that we can
only read the index but cannot append or modify it. I think the reason is
the one mentioned in the HDFS file-append issues; am I right?
Thanks Michael for your answer :)
Actually the writer.addIndexesNoOptimize method cannot help us because our
aim is to split indexes rather than to merge them. But your information
about setting autoCommit=true is very helpful for us, because we will
avoid sharing of stored fields and will be ab
Ivan Vasilev wrote:
Hi Lucene Guys,
As I see on the Lucene web site, on the file formats page, version
2.3 will have some changes in file formats that are very important
for us. First I will say what we do and then I will ask my questions.
We distribute the index on some machines. The impleme
Hi Lucene Guys,
As I see on the Lucene web site, on the file formats page, version 2.3
will have some changes in file formats that are very important for us.
First I will say what we do and then I will ask my questions.
We distribute the index on some machines. The implementation is made so
that
Can you provide a self-contained test or at least some code for this?
On Jul 19, 2007, at 5:32 PM, Mark Miller wrote:
Hopefully someone will be able to give you some further insight
into this. To me, it looks like a corrupted index. If TermVectors
were not stored, at worst you should be see
Hopefully someone will be able to give you some further insight into
this. To me, it looks like a corrupted index. If TermVectors were not
stored, at worst you should be seeing a NullPointerException. Has this
index had anything interesting happen to it? Made with an older version
of Lucene, u
Hi all, I use the query (+body:12) (+title:12), but I got the error
message below:
java.io.IOException: read past EOF
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:137)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
Hi Tanya,
I think one option is to index each log file with 2 fields, the name of
the log file and a line of your log. This way you can do a query like
this: +log_file_name:"log1" +line:"word1" -(+line:"word1" +line:"word2")
Hope it helps,
Rossini
On 6/13/07, Tanya Levshina <[EMAIL PROT
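Rossini's two-field scheme and a query like the one above might be sketched like this with the current Lucene API (field and term names follow the email; StringField keeps the file name as one exact token, TextField analyzes the line):

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LogLineIndex {
  /** Indexes one document per log line, then counts lines matching word1 but not word2. */
  public static long matchingLines() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      String[] lines = {"word1 alone", "word1 word2 together", "word3 only"};
      for (String line : lines) {
        Document doc = new Document();
        doc.add(new StringField("log_file_name", "log1", Field.Store.YES));
        doc.add(new TextField("line", line, Field.Store.YES));
        writer.addDocument(doc);
      }
      writer.commit();
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // +log_file_name:log1 +line:word1 -line:word2
      BooleanQuery q = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("log_file_name", "log1")), BooleanClause.Occur.MUST)
          .add(new TermQuery(new Term("line", "word1")), BooleanClause.Occur.MUST)
          .add(new TermQuery(new Term("line", "word2")), BooleanClause.Occur.MUST_NOT)
          .build();
      return searcher.search(q, 10).totalHits.value;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(matchingLines());
  }
}
```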
Hi,
1. I am dealing with the logs files and have to index the whole file (the
attempt to increase setMaxFielldLength eventually causes out of memory
error). I am sure that I am not a first person that encounters this problem.
What is the most efficient way to handle this situation?
2. I am index
Thanks. Do you know about any existing application that is
built on top of lucene that provides this functionality?
Tanya
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Friday, June 01, 2007 7:18 AM
To: java-user@lucene.apache.org
Subject: Re: question about luce
From: Will Johnson <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, 1 June, 2007 2:02:17 PM
Subject: RE: question about lucene
Solr, which is built on top of lucene and adds highlighting among other
features, gets close to what you want. Check out:
http://wiki.apache.
To: java-user@lucene.apache.org
Subject: Re: question about lucene
Nope. But here's what I think you can do (although I haven't
tried this exactly, so caveat emptor).
Document doc = new Document();
doc.add(new Field("text", line1, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("text", line2, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("text
On 6/1/07, Tanya Levshina <[EMAIL PROTECTED]> wrote:
Wow, it was fast! Thanks. Do you know about any existing application that
is
built on top of lucene that provides this functionality?
Tanya
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Friday, June 01, 2007 7
Subject: Re: question about lucene
No. Lucene is an *engine*, not an app that has a lot of stuff built on top
of it out of the box.
You have to index enough information to figure this out somehow.
Best
Erick
On 6/1/07, Tanya Levshina <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
>
>
> I
No. Lucene is an *engine*, not an app that has a lot of stuff built on top
of it out of the box.
You have to index enough information to figure this out somehow.
Best
Erick
On 6/1/07, Tanya Levshina <[EMAIL PROTECTED]> wrote:
Hi,
I've just downloaded Lucene, tried the demo and looked at the do
Hi,
I've just downloaded Lucene, tried the demo and looked at the documentation. The
Indexing and Searching work great and fast but I also need to display all
the actual "hits": the lines from the files that match a particular query.
Does Lucene provide means to do it?
Thanks a lot,
Tanya
-
taking this discussion back to the user list
-
"Huajing Li" wrote on 29/05/2007:
> Hi Doron,
>
> Days ago I published a post on the Lucene user mailing list asking
> about merging database data with Lucene que
"Karl Koch" <[EMAIL PROTECTED]> wrote:
> For the documents Lucene employs
> its norm_d_t which is explained as:
>
> norm_d_t : square root of number of tokens in d in the same field as t
Actually (by default) it is:
1 / sqrt(#tokens in d with same field as t)
> basically just the square root
Hello Karl,
I’m very interested in the details of Lucene’s scoring as well.
Karl Koch wrote:
For this reason, I do not understand why Lucene (in version 1.2) normalises the query(!) with
norm_q : sqrt(sum_t((tf_q*idf_t)^2))
which is also called cosine normalisation. This is a technique that
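For readers following along, the two normalization factors discussed in this thread, transcribed into conventional notation (my reading of the plain-text formulas; norm_d_t is the corrected 1/sqrt form given earlier in the thread):

```latex
\mathrm{norm}_q \;=\; \sqrt{\sum_{t \in q} \bigl(\, tf_{t,q} \cdot idf_t \,\bigr)^{2}}
\qquad\qquad
\mathrm{norm}_{d,t} \;=\; \frac{1}{\sqrt{\#\{\text{tokens in } d \text{ in the same field as } t\}}}
```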