Re: Questions about Lucene source

2022-12-14 Thread Rahul Goswami
Dawid and Adrien, thanks for your responses. Bringing up an old thread here. Revisiting this question ... > (so deleted docs == max docs) and call commit. Will/Can this segment still > exist after commit? Since I am using Solr (8.11.1), the default deletion policy is SolrDeletionPolicy which retai

Re: Questions about Lucene source

2022-09-23 Thread Adrien Grand
On the 2nd question, we do not plan on leveraging this information to figure out the codec: the codec that should be used to read a segment is stored separately (also in segment infos). It is mostly useful for diagnostics purposes. E.g. if we see an interesting corruption case where checksums matc

Re: Questions about Lucene source

2022-09-17 Thread Dawid Weiss
> (so deleted docs == max docs) and call commit. Will/Can this segment still > exist after commit? > Depends on your merge policy index deletion policy. You can configure Lucene to keep older commits (and then you'll preserve all historical segments). I don't know the answer to your second quest

Re: Questions about Lucene source

2022-09-16 Thread Rahul Goswami
Following up on my questions since they didn't get much love the first time. Any inputs are greatly appreciated! Thanks, Rahul On Wed, Sep 14, 2022 at 3:58 PM Rahul Goswami wrote: > Hello, > > I was going through some parts of the Lucene source and had some questions: > 1) Can lucene have 0 doc

Questions about Lucene source

2022-09-14 Thread Rahul Goswami
Hello, I was going through some parts of the Lucene source and had some questions: 1) Can lucene have 0 document segments? Or will they always be purged (either by TMP or otherwise) on a commit? Eg: A segment has 4 docs, and I make a /update call to overwrite all 4 docs (so deleted docs == max doc

Re: Question about Lucene in my project ..

2019-05-28 Thread Adrien Grand
Hi John, I heard of many users who used Lucene for this use-case, it's definitely a valid one. Indexes are stored mostly on disk, with a tiny part of them being held in memory to guarantee good access speed. Lucene supports both inverted indexes and KD trees up to 8 dimensions. Lookup, sorting an

Question about Lucene in my project ..

2019-05-27 Thread John Dale
Greetings; I'd like to play around with Lucene to offload some of my database lookups. Is this a valid use of Lucene in your opinion(s)? Indexes - they are stored on the file system as some kind of tree (I'm guessing)? Lookups and sorting - Can I lookup by date and sort asc/desc and paginate?

Re: Ask about Lucene/Core/Index DocumentsWriter

2019-03-19 Thread Michael McCandless
Can you try increasing your IndexWriter.setRAMBufferSizeMB? That flush control logic will block incoming threads if the number of bytes trying to flush to disk is too large relative to your RAM buffer. Mike McCandless http://blog.mikemccandless.com On Mon, Mar 18, 2019 at 2:30 PM yuncheng lu

Ask about Lucene/Core/Index DocumentsWriter

2019-03-18 Thread yuncheng lu
When i check the code lucene/core DocumentsWriter.preUpdate code. I see the flushControl is used when thread is Stalled. When we have a lot of documents write into disk which is SSD. We monitored that the thread is all in flush, and request continuously use addDocument which can go into preUpdate c

Re: Looking for more information about Lucene

2018-05-22 Thread Adrien Grand
Hi Alexandre, I don't have time for a call, but to give you some pointers, Lucene does the following that may be related to natural language processing: - Word segmentation via the `Tokenizer` class. It is rather simple for western languages (including French, see StandardTokenizer), but less for

Looking for more information about Lucene

2018-05-22 Thread BABAUD Alexandre
Good afternoon everyone, I am working for a French company and in the scope of my work I am collecting information on open source NLP tools available on the "market" worldwide. I was looking for such intel on the internet and by reading some users' comments but I figured, why not contact the per

about Lucene Replicator

2016-01-25 Thread dancer朕
Hi, Here is my code to backup index files with Lucene Replicator, but It doesn't work well, No files were backuped. Could you check my code and give me your advice? I bolded the key code. public class IndexFiles { private static Directory dir; private static Path bakPath; private sta

about Lucene replicator

2016-01-25 Thread Dancer
Hi, Here is my code to backup index files with Lucene Replicator, but It doesn't work well, No files were backuped. Could you check my code and give me your advice? I bolded the key code. public class IndexFiles { private static Directory dir; private static P

ask for help about lucene replicator

2016-01-24 Thread juzhen
Hi, Here is my code to backup index files with Lucene Replicator, but It doesn't work well, No files were backuped. Could you check my code and give me your advice? package com.wilddog.lucene; import java.io.IOException; import java.nio.file.Path; import java.nio.file.Paths; import java.util.c

ask for help about Lucene Replicator

2016-01-23 Thread juzhen
Hi, Here is my code to backup index files with Lucene Replicator, but It doesn't work well, No files were backuped. Could you check my code and give me your advice? public class IndexFiles { private static Directory dir; private static Path bakPath; private static LocalReplicator replicator;

RE: hello,I have a problem about lucene,please help me to explain ,thank you

2015-09-22 Thread will martin
Hi: Would you mind doing websearch and cataloging the relevant pages into a primer? Thx, Will -Original Message- From: 王建军 [mailto:jianjun200...@163.com] Sent: Tuesday, September 22, 2015 4:02 AM To: java-user@lucene.apache.org Subject: hello,I have a problem about lucene,please help me

hello,I have a problem about lucene,please help me to explain ,thank you

2015-09-22 Thread 王建军
There is a Class org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter which have two parameters,one is DEFAULT_MIN_BLOCK_SIZE,the other is DEFAULT_MAX_BLOCK_SIZE;their default values is 25 and 48;when I make their values to bigger,for example,200 and 398;And then to make index,the result is

hello,I have a problem about lucene,please help me to explain ,thank you

2015-09-14 Thread 王建军
There is a Class org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter which have two parameters,one is DEFAULT_MIN_BLOCK_SIZE,the other is DEFAULT_MAX_BLOCK_SIZE;their default values is 25 and 48;when I make their values to bigger,for example,200 and 398;And then to make index,the result

Reply: Reply: RE: RE: About lucene memory consumption

2014-07-01 Thread wangzhijiang999
Reply: RE: RE: About lucene memory consumption Hi Wang, would it be possible to open a JIRA issue so we can track this? In any case, I would recommend to disable compound files if you use NRTCachingDirectory (as a workaround). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.

RE: Reply: RE: RE: About lucene memory consumption

2014-07-01 Thread Uwe Schindler
> -Original Message- > From: wangzhijiang999 [mailto:wangzhijiang...@aliyun.com] > Sent: Tuesday, July 01, 2014 9:17 AM > To: java-user > Subject: Reply: RE: RE: About lucene memory consumption > > My application also meet this problem last year and I researched on the code &

Reply: RE: RE: About lucene memory consumption

2014-07-01 Thread wangzhijiang999
My application also meet this problem last year and I researched on the code and found the reason. The whole process is as follow: 1. When using NRTCachingDirectory, it will use RAMDirectory as cache and MMapDirectory as delegate. The new segment will be created in the process of  flush or merge

Re:RE: RE: About lucene memory consumption

2014-06-28 Thread 308181687
"Uwe Schindler";; Date: Sat, Jun 28, 2014 05:41 PM To: "java-user"; Subject: RE: RE: About lucene memory consumption Hi, how does your configuration for NRTCaching directory looks like. There are 2 constructor params, one of the maxMergeSizeMB the other one is maxCac

RE: RE: About lucene memory consumption

2014-06-28 Thread Uwe Schindler
Hi, how does your configuration for NRTCaching directory looks like. There are 2 constructor params, one of the maxMergeSizeMB the other one is maxCachedMB. If you correctly close (or release in case of ReaderManager/SearcherManager) all indexes, this should limit the memory use. There is no

Re:RE: About lucene memory consumption

2014-06-28 Thread 308181687
use MMapDirectory instead of NRTCachingDirectory? Thanks & Best Regards! -- Original -- From: "lubin";<308181...@qq.com>; Date: Sat, Jun 28, 2014 02:03 PM To: "java-user"; Subject: Re:RE: About lucene memory consum

Re:RE: About lucene memory consumption

2014-06-27 Thread 308181687
"<308181...@qq.com>; Subject: Re:RE: About lucene memory consumption Could it be that you forgot to close older IndexReaders after getting a new NRT one? This would be a huge memory leak. I recommend to use SearcherManager to handle real time reopen correctly. Uwe On June 27, 2014

Re:RE: About lucene memory consumption

2014-06-27 Thread Uwe Schindler
the >way, we commit the index for every 1000 email document. > > > Could you kindly give me some tips to solve this problem? > > > > >Thanks & Best Regards! > > > > > > > > > > >-- Original --

Re:RE: About lucene memory consumption

2014-06-27 Thread 308181687
you give me kindly give me some tips to solve this problem? Thanks & Best Regards! -- Original -- From: "Uwe Schindler";; Date: Fri, Jun 27, 2014 08:36 PM To: "java-user"; Subject: RE: About lucene memory consumption

RE: About lucene memory consumption

2014-06-27 Thread Uwe Schindler
age- > From: 308181687 [mailto:308181...@qq.com] > Sent: Friday, June 27, 2014 10:42 AM > To: java-user > Subject: About lucene memory consumption > > Hi, all > > >I found that the memory consumption of my lucene server is abnormal, and > “jmap -histo ${pid}” show

About lucene memory consumption

2014-06-27 Thread 308181687
Hi, all I found that the memory consumption of my lucene server is abnormal, and “jmap -histo ${pid}” shows that byte[] instances consume almost all of the memory. Is there a memory leak in my app? Why so many byte[] instances? The following is the top output of jmap: num

Re: Questions about lucene TokenStream

2012-11-04 Thread lukai
thanks, Uwe. I missed it. On Sun, Nov 4, 2012 at 3:04 PM, Uwe Schindler wrote: > As explained in my first eMail, the class of the implementation is cached, > not the instance. The factory returns a new instance of the cached class. > > Uwe > > > > lukai wrote: > > >Hi, thanks for the reply. C

Re: Questions about lucene TokenStream

2012-11-04 Thread Uwe Schindler
As explained in my first eMail, the class of the implementation is cached, not the instance. The factory returns a new instance of the cached class. Uwe lukai wrote: >Hi, thanks for the reply. Could you elaborate "The AttributeFactory >creates >a new one for every new TokenStream instance.
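
The distinction drawn here (the implementation class is cached, and a fresh instance is created on every request) can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene's actual AttributeFactory code: ClassCachingFactory, TermAttr and TermAttrImpl are hypothetical names, and the "interface name + Impl" lookup is an assumption made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ClassCachingFactory {

    /** Hypothetical attribute interface, standing in for something like CharTermAttribute. */
    public interface TermAttr {
        void set(String term);
        String get();
    }

    /** Hypothetical implementation, located by appending "Impl" to the interface name. */
    public static class TermAttrImpl implements TermAttr {
        private String term;
        @Override public void set(String term) { this.term = term; }
        @Override public String get() { return term; }
    }

    // The cache maps interface -> implementation CLASS. No instances are stored here.
    private static final Map<Class<?>, Class<?>> IMPL_CLASSES = new ConcurrentHashMap<>();

    /** Returns a NEW instance on every call; only the class lookup is cached. */
    static <T> T create(Class<T> iface) {
        Class<?> impl = IMPL_CLASSES.computeIfAbsent(iface, ClassCachingFactory::lookup);
        try {
            return iface.cast(impl.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + impl.getName(), e);
        }
    }

    // Resolves the implementation class by naming convention (assumption for this sketch).
    private static Class<?> lookup(Class<?> iface) {
        try {
            return Class.forName(iface.getName() + "Impl");
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No implementation for " + iface.getName(), e);
        }
    }

    public static void main(String[] args) {
        TermAttr a = create(TermAttr.class);
        TermAttr b = create(TermAttr.class);
        a.set("lucene");
        // a and b share a class but not state, so b is unaffected by a.set(...)
        System.out.println((a != b) + " " + (a.getClass() == b.getClass()) + " " + b.get());
    }
}
```

Because each caller receives its own instance, two token streams built this way would not share attribute state through the cache itself.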

Re: Questions about lucene TokenStream

2012-11-04 Thread lukai
Hi, thanks for the reply. Could you elaborate "The AttributeFactory creates a new one for every new TokenStream instance." ? because i only find the implementation like this: private static Class getClassForInterface(Class attClass) { final WeakReference> ref = attClassImplMap.get(attCla

RE: Questions about lucene TokenStream

2012-11-04 Thread Uwe Schindler
Hi, > Hmmm, the reason i asked this question is regarding to implementation of : > > CharTermAttribute. > > > It seems tokenizer will set token read from reader into it, and the following > tokenstream can also get this instance. My concern is in a multi-thread > envioment. another thread can

Re: Questions about lucene TokenStream

2012-11-04 Thread lukai
Hmmm, the reason i asked this question is regarding to implementation of : CharTermAttribute. It seems tokenizer will set token read from reader into it, and the following tokenstream can also get this instance. My concern is in a multi-thread environment. another thread can also change the conte

RE: Questions about lucene TokenStream

2012-11-04 Thread Uwe Schindler
Hi, > I have two confused questions regarding Lucene implementation, hope > someone can give me some clues. > > 1. It's about the AttributeSource/AttributeSourceImpl implemenation. > Seems like the default instance was kept as "static" > in DefaultAttributeFactory. But we get these instances i

Re: Need help About Lucene Query

2012-04-15 Thread Adriano Crestani
Hi Yogesh, I bet you are indexing A as an analyzed field and its values are getting tokenized at each capital letter it finds. Try to index field A using Field.Index.NOT_ANALYZED. On Sun, Apr 15, 2012 at 3:44 AM, Yogesh patel wrote: > Hi, > > I have read apache lucene tutorial and implemented in

Re: my question about lucene

2011-11-10 Thread Ian Lea
You can provide your own Similarity implementation, overriding whichever of the methods you need in order to achieve your aims. Use it via the setxxx methods mentioned in the javadocs and unless you deliberately sort by some other field everything should fall into place. -- Ian. 2011/11/9 强继朋

my question about lucene

2011-11-09 Thread 强继朋
lucene, I hava a problem i don't know how to do, it's about Score Formula of lucene. In the package of lucene, it provide a method in Class Similarity. My question : if i want to only use some factors of Formula, such as TF and IDF. And then i add some additional factors, in aim to

Re: Some question about Lucene

2011-07-10 Thread Yusniel Hidalgo Delgado
Thanks Steve. Helpful this slide. Greetings. - Original Message - From: "Steven A Rowe" To: java-user@lucene.apache.org Sent: Sunday, July 10, 2011 21:45:48 (GMT-0500) Auto-Detected Subject: RE: Some question about Lucene This slide show is a few years old, but

RE: Some question about Lucene

2011-07-10 Thread Steven A Rowe
[mailto:yhdelg...@uci.cu] Sent: Sunday, July 10, 2011 9:30 PM To: java-user@lucene.apache.org Subject: Some question about Lucene Hello I'm a new Lucene user. I have the following question: is posible to build a crawler/spider with Lucene library or Lucene is only for index/search phases.

Some question about Lucene

2011-07-10 Thread Ing. Yusniel Hidalgo Delgado
Hello, I'm a new Lucene user. I have the following question: is it possible to build a crawler/spider with the Lucene library, or is Lucene only for the index/search phases? I am studying three projects: Nutch, Lucene and Solr, but I don't see what the main difference between them is. Greetings . --

Re: Questions about Lucene usage recommendations

2010-10-13 Thread Umesh Prasad
Genève 2 > Direct tel.: +41 (0)22 388 00 95 > michel.paw...@etat.ge.ch > > > -Original Message- > From: Danil ŢORIN [mailto:torin...@gmail.com] > Sent: Tuesday, September 28, 2010 07:57 > To: java-user@lucene.apache.org > Subject: Re: Questions about Lucene usage

RE: Questions about Lucene usage recommendations

2010-09-28 Thread Pawlak Michel (DCTI)
---Original Message- From: Danil ŢORIN [mailto:torin...@gmail.com] Sent: Tuesday, September 28, 2010 07:57 To: java-user@lucene.apache.org Subject: Re: Questions about Lucene usage recommendations You said you have 1000 fields...when performing search do you search in all 1000 fie

Re: Questions about Lucene usage recommendations

2010-09-27 Thread Danil ŢORIN
seems required :-/ > 12) ok > > Regards, > > Michel > > -Original Message- > From: Danil ŢORIN [mailto:torin...@gmail.com] > Sent: Monday, September 27, 2010 14:53 > To: java-user@lucene.apache.org > Subject: Re: Questions about Lucene usage recommendations >

RE: Questions about Lucene usage recommendations

2010-09-27 Thread Pawlak Michel (DCTI)
optimized the index the average search time dropped from 10s to below 2s, now (after 2.5 weeks) the average search time is 7s. Optimization seems required :-/ 12) ok Regards, Michel -Original Message- From: Danil ŢORIN [mailto:torin...@gmail.com] Sent: Monday, September 27, 2010 14

Re: Questions about Lucene usage recommendations

2010-09-27 Thread Danil ŢORIN
Lucene 2.1 is really old...you should be able to migrate to lucene 2.9 without changing your code (almost jar drop-in, but be careful on analyzers), and there could be huge improvements if you use lucene properly. Few questions: - what does "all data to be indexed is stored in DB fields" mean? you

Questions about Lucene usage recommendations

2010-09-27 Thread Pawlak Michel (DCTI)
Hello, We have an application which is using lucene and we have strong performance issues (on bad days, some searches take more than 2 minutes). I'm new to the Lucene component, thus I'm not sure Lucene is correctly used and thus would like to have some information on lucene usage recommendations.

Reply: about lucene doc id recycle

2010-03-22 Thread luocanrao
: about lucene doc id recycle Andi s you not optimize, as soon as two segments are merged, the docids are also reassigned. It just takes some time. Normally the docids maximum number maybe somewhere between current doc count and about 3 times doc count. - Uwe Schindler H.-H.-Meier-Allee 63, D

RE: about lucene doc id recycle

2010-03-22 Thread Uwe Schindler
://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Monday, March 22, 2010 2:05 PM > To: java-user@lucene.apache.org > Subject: Re: about lucene doc id recycle > > Yes, when you call optimize, one si

Re: about lucene doc id recycle

2010-03-22 Thread Erick Erickson
Yes, when you call optimize, one side effect is that all the doc IDs are reassigned so they're contiguous.. HTH Erick On Mon, Mar 22, 2010 at 8:22 AM, luocanrao wrote: > Total document number is not very big, but update is very frequency. > > So I wonder whether the doc id is growing bigger

about lucene doc id recycle

2010-03-22 Thread luocanrao
The total number of documents is not very big, but updates are very frequent. So I wonder whether the doc id keeps growing bigger and bigger and never gets smaller. Does Lucene have some technique for recycling doc ids? PS: I never call the optimize method.

Re: Reply: about lucene in action 2

2010-03-14 Thread Michael McCandless
sync (fsync to the OS) tells the OS to make sure everything associated with that file is moved to stable storage in the IO system. (It doesn't read anything back). On flush we write the files to disk, which is usually very fast since it writes into the OS's RAM write cache, but we do not sync. s
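
The difference between writing a file and syncing it can be shown with the JDK alone. The sketch below is an illustration, not Lucene code: WriteThenSync and its methods are hypothetical names, and it only assumes that FileChannel.force(true) is the JDK call that asks the OS to push a file's content and metadata to the storage device.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WriteThenSync {

    /** Writes data to file; if sync is true, also asks the OS to move it to stable storage. */
    static void write(Path file, String data, boolean sync) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);      // usually lands in the OS write cache first
            }
            if (sync) {
                ch.force(true);     // fsync-style call: content and metadata go to the device
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes to a temporary file, reads the content back, and cleans up. */
    static String roundTrip(String data, boolean sync) {
        try {
            Path tmp = Files.createTempFile("segment", ".tmp");
            try {
                write(tmp, data, sync);
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("flushed only", false));
        System.out.println(roundTrip("flushed and synced", true));
    }
}
```

Both variants read back the same bytes while the machine stays up; the synced variant differs only in what is expected to survive a crash or power loss, which is why the cheap write and the expensive sync can be kept as separate steps.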

Reply: about lucene in action 2

2010-03-14 Thread luocan19826164
he new created disk file("sync "). I guess commit only cost twice time than flush? -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: March 14, 2010 18:14 To: java-user@lucene.apache.org Subject: Re: about lucene in action 2 Flushing means stuff (added docs, delet

Re: about lucene in action 2

2010-03-14 Thread Michael McCandless
Flushing means stuff (added docs, deletions) buffered in RAM are moved to disk, ie, written as new segment files. But the new segments_N file, referencing these new segments, is not written. Nor are the files "sync"'d. This means a newly opened or reopened reader will not see the changes. In or

about lucene in action 2

2010-03-13 Thread luocanrao
I am reading lucene in action 2,there is some question about it. When a flush occurs, the writer creates new segment and deletion files in the Directory. However, these files are neither visible nor usable to a newly opened IndexReader until the writer commits the changes. It's important to unders

Re: About Lucene ...

2009-12-03 Thread Lukáš Vlček
This might be OT but did you consider Google Desktop Search? Seems that somebody reported success with hacking it to allow network file system index/search: http://www.geekzone.co.nz/content.asp?contentid=3939 Regards, Lukas http://blog.lukas-vlcek.com/ 2009/12/3 杨建华 > May be you can try Omni

Re: About Lucene ...

2009-12-03 Thread 杨建华
May be you can try Omnifind Yahoo Edition. 2009/12/3 Weiwei Wang > You can do everything related to search(full text or just paths) with > Lucene:-) > > On Wed, Dec 2, 2009 at 11:26 PM, Stefan Trcek wrote: > > > On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote: > > > On Wednesday 02 De

Re: About Lucene ...

2009-12-03 Thread Weiwei Wang
You can do everything related to search(full text or just paths) with Lucene:-) On Wed, Dec 2, 2009 at 11:26 PM, Stefan Trcek wrote: > On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote: > > On Wednesday 02 December 2009 15:50:45 archibal wrote: > > > -optionnally i want to have a central

Re: About Lucene ...

2009-12-02 Thread Stefan Trcek
On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote: > On Wednesday 02 December 2009 15:50:45 archibal wrote: > > -optionnally i want to have a central server which index all data > > (name of files, folders and file content) on network and i would > > like to connect via a browser on the cen

Re: About Lucene ...

2009-12-02 Thread Stefan Trcek
On Wednesday 02 December 2009 15:50:45 archibal wrote: > > -optionnally i want to have a central server which index all data > (name of files, folders and file content) on network and i would like > to connect via a browser on the central server ? are there project > who does this or something like

Re: About Lucene ...

2009-12-02 Thread Shashi Kant
I'm actually looking for a software who can search in a computer (and on > windows network drive) all files and the contents of files based on > indexing > method. > > I have few questions about lucene : > > - Lucene engine does index only the contents ? or is it possible to inde

Re: About Lucene ...

2009-12-02 Thread Erick Erickson
rick On Wed, Dec 2, 2009 at 9:50 AM, archibal wrote: > > Hello all, > > I'm actually looking for a software who can search in a computer (and on > windows network drive) all files and the contents of files based on > indexing > method. > > I have few questio

About Lucene ...

2009-12-02 Thread archibal
Hello all, I'm actually looking for a software who can search in a computer (and on windows network drive) all files and the contents of files based on indexing method. I have few questions about lucene : - Lucene engine does index only the contents ? or is it possible to index the na

Re: Books about lucene

2009-11-27 Thread Martijn v Groningen
http://nlp.stanford.edu/IR-book/information-retrieval-book.html gives a good introduction what happens under the hood of a search engine and you can download it for free. It does not explain Lucene directly, but a lot of IR algorithms that are used in Lucene (and any other search engine) are explai

Re: Review and questions about Lucene Java 2.9.0

2009-10-08 Thread Paul Libbrecht
Mehdi, your requirements sound like they are mostly fulfilled by Apache Solr, which is a web-based packaging of Lucene. paul. On 08-Oct-09 at 10:11, Mehdi Ben Hamida wrote: Hello, I'm reviewing and doing some researches on Lucene Java 2.9.0, to check if it meets our needs. Unfortunat

Review and questions about Lucene Java 2.9.0

2009-10-08 Thread Mehdi Ben Hamida
Hello, I'm reviewing and doing some researches on Lucene Java 2.9.0, to check if it meets our needs. Unfortunately I don't find answers to some of my questions, and I hope you can answer them, and provide any references that prove your answer. - Do you confirm that Lucene enables load t

Re: German article about Lucene 2.9

2009-10-05 Thread Simon Willnauer
Here is the english version of the article for those who are interested. Lucene version 2.9 released Content-Management systems like the ones powering the channels at AOL, social networks like LinkedIn, the cloud nebula cloud computing platform at NASA: Nearly no application that does not need to

German article about Lucene 2.9

2009-10-05 Thread Simon Willnauer
Hey Lucene Users, Heise.de ( http://www.heise.de/open/artikel/Such-Engine-Lucene-in-Version-2-9-erschienen-810377.html) has just published an article about the new 2.9 release. Unfortunately they only published the german version while we tried to get the english one too. Thanks to Isabel (http://

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Robert Muir
Simon, no problem. I am looking at it now. I will just post my approach and let people tear it apart / get things moving :) On Fri, Jul 31, 2009 at 2:45 PM, Simon Willnauer wrote: > @Michael: add yourself as a Watcher for the issue. > @Robert: I can start working on this within the next weeks - ca

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Simon Willnauer
@Michael: add yourself as a Watcher for the issue. @Robert: I can start working on this within the next weeks - can you help too? simon On Fri, Jul 31, 2009 at 7:49 PM, Robert Muir wrote: > Michael, makes sense. most of the issues probably have some > workaround, so reply back if you need. > > Th

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Robert Muir
Michael, makes sense. most of the issues probably have some workaround, so reply back if you need. Thanks for your feedback though, it is helpful to know that its important! On Fri, Jul 31, 2009 at 1:36 PM, Michael Thomsen wrote: > Not really. At this point, I just needed to know where the UCS4 >

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Michael Thomsen
Not really. At this point, I just needed to know where the UCS4 support stands. I'm reasonably familiar with the various analyzers and what they can do. It's just the state of UCS4 support that might be an issue for us. Thanks, Mike On Fri, Jul 31, 2009 at 12:25 PM, Robert Muir wrote: > Michael

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Robert Muir
Michael, just out of curiosity, did you have a particular Analyzer in mind you were planning on using, or rather certain features in Lucene you were concerned would work with these codepoints? On Fri, Jul 31, 2009 at 12:19 PM, Simon Willnauer wrote: > Hey Robert, good to see that you found the lin

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Simon Willnauer
Hey Robert, good to see that you found the link :) On Fri, Jul 31, 2009 at 6:06 PM, Robert Muir wrote: > Michael, as Simon mentioned I created an issue describing where you > might run into trouble, at least in lucene core. > > The low-level lucene stuff, it treats these just fine (as surrogate pa

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Robert Muir
Michael, as Simon mentioned I created an issue describing where you might run into trouble, at least in lucene core. The low-level lucene stuff, it treats these just fine (as surrogate pairs). But most analyzers run into some trouble. (things like WhitespaceAnalyzer are ok) Also wildcard queries

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Michael Thomsen
Thanks for your quick response! Mike On Fri, Jul 31, 2009 at 10:25 AM, Simon Willnauer wrote: > If I understand you correctly you are asking if lucene can deal with > encodings that use more than 16 bit. Well yes and no but mainly no. > The support for unicode 4.0 was introduced in Java 1.5 and l

Re: Quick question about Lucene and UCS4

2009-07-31 Thread Simon Willnauer
If I understand you correctly you are asking if lucene can deal with encodings that use more than 16 bit. Well yes and no but mainly no. The support for unicode 4.0 was introduced in Java 1.5 and lucene core has still back-compat requirements for java 1.4. Lucene's analyzers make use of char[] all

Quick question about Lucene and UCS4

2009-07-31 Thread Michael Thomsen
Is Lucene capable of handling UCS4 data natively? Thanks, Mike

Re: Detailed Information about Lucene 2.4 (english/german)

2009-01-20 Thread Matthias W.
! I think I'll buy Lucene in Action 2nd edition. Till now I haven't got much specific questions. I'm only searching for info material I can use for my dissertation. -- View this message in context: http://www.nabble.com/Detailed-Information-about-Lucene-2.4-%28english-german%

Re: Detailed Information about Lucene 2.4 (english/german)

2009-01-19 Thread Grant Ingersoll
rching for detailed information and needed stuff about lucene (version 2.4.0 !!!) to write a dissertation. Does anyone know good english or german resources? Thanks! Matthias -- View this message in context: http://www.nabble.com/Detailed-Information-about-Lucene-2.4-%28english

Detailed Information about Lucene 2.4 (english/german)

2009-01-19 Thread Matthias W.
Hi, I'm searching for detailed information and needed stuff about lucene (version 2.4.0 !!!) to write a dissertation. Does anyone know good english or german resources? Thanks! Matthias -- View this message in context: http://www.nabble.com/Detailed-Information-about-Lucene-2.4-%28en

Questions about lucene index on HDFS

2008-08-21 Thread Jarvis . Guo
Hi all, Firstly I have known that there is a FsDirectory class in Nutch-0.9 so we can access the index on HDFS. But after I tested it, i found that we can only read the index but can not to append or modify, I think the reason is the one mentioned in the HDFS-file append issues, am I right?

Re: Question about Lucene 2.3. file formats?

2008-01-23 Thread Ivan Vasilev
Thanks Michael for your answer :) Actually writer.addIndexesNoOptimize method can not help us because our aim is to split indexes rather than to merge them. But you information about setting autoCommit=true is very helpful for us because so we will avoid sharing of stored fields and will be ab

Re: Question about Lucene 2.3. file formats?

2008-01-22 Thread Michael McCandless
Ivan Vasilev wrote: Hi Lucene Guys, As I see in the Lucene web site in file formats page the version 2.3 will have some changes in file formats that are very important for us. First I will say what we do and then will ask my questions. We distribute the index on some machines. The impleme

Question about Lucene 2.3. file formats?

2008-01-22 Thread Ivan Vasilev
Hi Lucene Guys, As I see in the Lucene web site in file formats page the version 2.3 will have some changes in file formats that are very important for us. First I will say what we do and then will ask my questions. We distribute the index on some machines. The implementation is made so that

Re: Question about lucene query (+body:12) (+title:12) ?

2007-07-19 Thread Grant Ingersoll
Can you provide a self contained test or at least some code for this? On Jul 19, 2007, at 5:32 PM, Mark Miller wrote: Hopefully someone will be able to give you some further insight into this. To me, it looks like a corrupted index. If TermVectors where not stored, at worst you should be see

Re: Question about lucene query (+body:12) (+title:12) ?

2007-07-19 Thread Mark Miller
Hopefully someone will be able to give you some further insight into this. To me, it looks like a corrupted index. If TermVectors where not stored, at worst you should be seeing a NullPointerException. Has this index had anything interesting happen to it? Made with an older version of Lucene, u

Question about lucene query (+body:12) (+title:12) ?

2007-07-19 Thread li hao cho
Hi all, I use query (+body:12) (+title:12) , but I got some wrong message bellow: java.io.IOException: read past EOF at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:137) at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)

Re: multiple questions about lucene

2007-06-13 Thread Rafael Rossini
Hi Tanya, I think one option is to index each log file with 2 fields, the name of the log file and a line of your log. This way you can do a query like this: +log_file_name:"log1" +line:"word1" -(+line:"word1" +line:"word2") Hope it helps, Rossini On 6/13/07, Tanya Levshina <[EMAIL PROT

multiple questions about lucene

2007-06-13 Thread Tanya Levshina
Hi, 1. I am dealing with log files and have to index the whole file (the attempt to increase setMaxFieldLength eventually causes an out of memory error). I am sure that I am not the first person to encounter this problem. What is the most efficient way to handle this situation? 2. I am index

Re: question about lucene

2007-06-01 Thread Chris Lu
Thanks. Do you know about any existing application that is built on top of lucene that provides this functionality? Tanya -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Friday, June 01, 2007 7:18 AM To: java-user@lucene.apache.org Subject: Re: question about luce

Re: question about lucene

2007-06-01 Thread mark harwood
e From: Will Johnson <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, 1 June, 2007 2:02:17 PM Subject: RE: question about lucene Solr, which is built on top of lucene and adds highlighting among other features, gets close to what you want. Check out: http://wiki.apache.

RE: question about lucene

2007-06-01 Thread Will Johnson
: java-user@lucene.apache.org Subject: Re: question about lucene Nope. But here's what I think you can do (although I haven't tried this exactly, so caveat emptor). Document doc = new Document(); doc.add("text", line1); doc.add("text", line2); doc.add("text"

Re: question about lucene

2007-06-01 Thread Erick Erickson
07, Tanya Levshina <[EMAIL PROTECTED]> wrote: Wow, it was fast! Thanks. Do you know about any existing application that is built on top of lucene that provides this functionality? Tanya -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Friday, June 01, 2007 7

RE: question about lucene

2007-06-01 Thread Tanya Levshina
: question about lucene No. Lucene is an *engine*, not an app that has a lot of stuff built on top of it out of the box. You have to index enough information to figure this out somehow. Best Erick On 6/1/07, Tanya Levshina <[EMAIL PROTECTED]> wrote: > > Hi, > > > > I've

Re: question about lucene

2007-06-01 Thread Erick Erickson
No. Lucene is an *engine*, not an app that has a lot of stuff built on top of it out of the box. You have to index enough information to figure this out somehow. Best Erick On 6/1/07, Tanya Levshina <[EMAIL PROTECTED]> wrote: Hi, I've just downloaded Lucene, tried demo and looked at the do

question about lucene

2007-06-01 Thread Tanya Levshina
Hi, I've just downloaded Lucene, tried demo and looked at the documentation. The Indexing and Searching work great and fast but I also need to display all the actual "hits": the lines from the files that match a particular query. Does Lucene provide means to do it? Thanks a lot, Tanya

Fw: About Lucene-patch-446

2007-05-29 Thread Doron Cohen
- taking this discussion back to the user list - "Huajing Li" wrote on 29/05/2007: > Hi Doron, > > Days ago I published a post in the Lucene user maillist asking > about merging database data with Lucene que

Re: Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Doron Cohen
"Karl Koch" <[EMAIL PROTECTED]> wrote: > For the documents Lucene employs > its norm_d_t which is explained as: > > norm_d_t : square root of number of tokens in d in the same field as t Actually (by default) it is: 1 / sqrt(#tokens in d with same field as t) > basically just the square root

Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Soeren Pekrul
Hello Karl, I’m very interested in the details of Lucene’s scoring as well. Karl Koch wrote: For this reason, I do not understand why Lucene (in version 1.2) normalises the query(!) with norm_q : sqrt(sum_t((tf_q*idf_t)^2)) which is also called cosine normalisation. This is a technique that
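
The quantity quoted in this message, norm_q = sqrt(sum_t((tf_q*idf_t)^2)), is the Euclidean length of the query's tf*idf weight vector. A minimal plain-Java sketch of just that expression follows; QueryNorm and queryNorm are hypothetical names, the input values are made up for illustration, and how Lucene 1.2 applies the result is not shown in the excerpt and is not assumed here.

```java
class QueryNorm {

    /**
     * Computes sqrt(sum over terms of (tf * idf)^2) for a query.
     * tf[i] and idf[i] refer to the same query term i.
     */
    static double queryNorm(double[] tf, double[] idf) {
        if (tf.length != idf.length) {
            throw new IllegalArgumentException("tf and idf must have the same length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < tf.length; i++) {
            double weight = tf[i] * idf[i];
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Two query terms, each occurring once, with made-up idf values 3 and 4.
        System.out.println(queryNorm(new double[] {1, 1}, new double[] {3, 4}));
    }
}
```

Dividing every term weight of one query by this value rescales all of its scores by the same constant, so it should not change the ranking within a single query; that is consistent with the question raised in the message about why the query is normalised at all.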
