Chinese Segmentation with Phrase Query

2007-11-08 Thread Cedric Ho
Hi, We are having an issue while indexing Chinese documents in Lucene. Some background first: since CJK languages don't put spaces between words, we first have to determine the words from sentences. E.g. a sentence containing the characters ABC may be segmented into AB, C or A, BC. the proble
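The ambiguity described above can be made concrete with a small sketch. It is plain Java, not Lucene code and not the poster's segmenter: the class name, the method names and the four-word dictionary are all made up for illustration. It simply enumerates every way of splitting a string into dictionary words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; nothing here is part of Lucene.
class SegmentationDemo {

    // Returns every way to split `text` into consecutive dictionary words.
    static List<List<String>> segmentations(String text, Set<String> dictionary) {
        List<List<String>> results = new ArrayList<List<String>>();
        collect(text, 0, dictionary, new ArrayList<String>(), results);
        return results;
    }

    // Depth-first search: try each dictionary word that starts at `start`.
    private static void collect(String text, int start, Set<String> dictionary,
                                List<String> current, List<List<String>> results) {
        if (start == text.length()) {
            results.add(new ArrayList<String>(current)); // one complete segmentation
            return;
        }
        for (int end = start + 1; end <= text.length(); end++) {
            String candidate = text.substring(start, end);
            if (dictionary.contains(candidate)) {
                current.add(candidate);
                collect(text, end, dictionary, current, results);
                current.remove(current.size() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        // Assumed toy dictionary in which both readings of "ABC" are valid.
        Set<String> dictionary = new HashSet<String>(Arrays.asList("AB", "C", "A", "BC"));
        System.out.println(segmentations("ABC", dictionary)); // prints [[A, BC], [AB, C]]
    }
}
```

With both readings available, index-time and query-time segmentation can disagree on the same text, which is presumably where the difficulty with phrase matching comes from.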

Re: How to build your custom termfreq vector and add it to the field ?

2007-11-08 Thread Ariel
The link you suggested is very interesting, Mr Grant Ingersoll. Let's see if I understand how the ranking issue in Lucene could be implemented: 1. First I must create my own query class extending the abstract Query class. The only method I must implement from this class is toString. Is this right?

Re: OutOfMemory-problems with SortComparatorSource / ScoreDocComparator

2007-11-08 Thread Tobias Hill
Hi Tobias, I had a similar problem with Lucene custom sorting about two years ago. Please take a look at these two email threads: http://www.gossamer-threads.com/lists/lucene/java-dev/39100 http://www.gossamer-threads.com/lists/lucene/java-user/38016 It seems like they did not make any patch f

OutOfMemory-problems with SortComparatorSource / ScoreDocComparator

2007-11-08 Thread Tobias Hill
Hi, We have implemented a custom sort following the pattern in Lucene in Action. Unfortunately this has led to quite serious memory problems. When analyzing those (with a profiler) it seems that there are as many remaining instances of our SortComparatorSource as there have been queries against th

Re: Office 2007

2007-11-08 Thread Michael Prichard
What about http://www.openxml4j.org/? Any experience there? On Nov 8, 2007, at 10:04 AM, jm wrote: I haven't upgraded yet but I think I read in the aperture list that they already had some extractors for some office 2007 stuff in trunk some time ago. On Nov 8, 2007 3:09 PM, Grant Ingersoll <[EMAIL

Re: Office 2007

2007-11-08 Thread jm
I haven't upgraded yet but I think I read in the aperture list that they already had some extractors for some office 2007 stuff in trunk some time ago. On Nov 8, 2007 3:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > You might also consider asking on the Tika (a Lucene subproject > currently in Incu

RE: Sorting with MultiSearcher

2007-11-08 Thread WATHELET Thomas
I'm using Lucene core 2.2.0. The field I'm trying to sort on is not present in every document in the index; I add the field only if I have metadata for it. Maybe that's the problem? My field is untokenized, stored and indexed. -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] S

Re: Office 2007

2007-11-08 Thread Grant Ingersoll
You might also consider asking on the Tika (a Lucene subproject currently in Incubation) and Aperture project sites (http://aperture.sourceforge.net). Not sure if you will have any luck, but they are also focused on the extraction problem and may have thought more about it. -Grant On Nov

Office 2007

2007-11-08 Thread Michael Prichard
Hello, I know this has gone around a bit, but has anyone had any success with pulling text from Office 2007 files? Any recommendations? Thanks, Michael

Re: Sorting with MultiSearcher

2007-11-08 Thread Mark Miller
Any other info or code snippets? I sort on multisearchers all the time and have never seen that behavior. - Mark (sorting on multisearchers since Lucene 1.4) WATHELET Thomas wrote: Hi, I have a few indexes with the same structure. I'm using MultiSearcher to search those indexes and when I

Sorting with MultiSearcher

2007-11-08 Thread WATHELET Thomas
Hi, I have a few indexes with the same structure. I'm using MultiSearcher to search those indexes, and when I try to sort the results by a field, the results are sorted by field within each index (we get all results from index1 and then index2, ...), but I would like to have the result sorted on the all result
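To illustrate the ordering being asked for, here is a minimal sketch in plain Java. It does not use Lucene and makes no claim about how MultiSearcher works internally; the class name, method name and sample values are invented. It merges per-index result lists, each already sorted on the field, into one globally sorted list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class; nothing here is part of Lucene.
class GlobalMergeDemo {

    // Merges lists that are each sorted ascending into one sorted list.
    static List<String> mergeSorted(final List<List<String>> perIndexResults) {
        // Queue entries are {listIndex, positionInList}, ordered by the value they point at.
        PriorityQueue<int[]> heads = new PriorityQueue<int[]>(
            (a, b) -> perIndexResults.get(a[0]).get(a[1])
                          .compareTo(perIndexResults.get(b[0]).get(b[1])));
        for (int i = 0; i < perIndexResults.size(); i++) {
            if (!perIndexResults.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        List<String> merged = new ArrayList<String>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<String> source = perIndexResults.get(head[0]);
            merged.add(source.get(head[1]));
            if (head[1] + 1 < source.size()) {
                heads.add(new int[] {head[0], head[1] + 1}); // advance within that index
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed sample field values from two indexes, each sorted on its own.
        List<String> index1 = Arrays.asList("apple", "cherry");
        List<String> index2 = Arrays.asList("banana", "date");
        // Concatenating would give [apple, cherry, banana, date], the behavior reported above.
        System.out.println(mergeSorted(Arrays.asList(index1, index2)));
        // prints [apple, banana, cherry, date]
    }
}
```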

Re: - possible bug in lock timeout

2007-11-08 Thread Michael McCandless
OK, I've opened https://issues.apache.org/jira/browse/LUCENE-1048 for this issue. Thanks Nikolay! Mike "Nikolay Diakov" <[EMAIL PROTECTED]> wrote: > In Lucene 2.x, in method Lock#obtain(long lockWaitTimeout) I see the > following line: > > int maxSleepCount = (int)(lockWaitTimeout / LOCK_POLL_
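The quoted line divides a long and casts the result to int. The excerpt is cut off before the bug itself is described, so the following is only a sketch of what a narrowing cast of that shape does in Java for very large timeouts, not a statement of what the issue reports. The class, the method names and the POLL_INTERVAL_MS constant with its value are assumptions; the real constant name is truncated in the excerpt.

```java
// Hypothetical demo class; it does not reproduce Lucene's Lock implementation.
class LockTimeoutCastDemo {

    // Assumed poll interval in milliseconds (the real constant is cut off in the excerpt).
    static final long POLL_INTERVAL_MS = 1000L;

    // Same shape as the quoted line: long division followed by a narrowing cast.
    static int narrowedSleepCount(long lockWaitTimeoutMs) {
        return (int) (lockWaitTimeoutMs / POLL_INTERVAL_MS);
    }

    // One possible alternative: clamp to the int range before casting.
    static int clampedSleepCount(long lockWaitTimeoutMs) {
        return (int) Math.min(lockWaitTimeoutMs / POLL_INTERVAL_MS, (long) Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long timeout = Long.MAX_VALUE;
        System.out.println("wide:     " + (timeout / POLL_INTERVAL_MS));
        // The cast keeps only the low 32 bits, so this no longer equals the wide value.
        System.out.println("narrowed: " + narrowedSleepCount(timeout));
        System.out.println("clamped:  " + clampedSleepCount(timeout));
    }
}
```

For small timeouts the two methods agree; they diverge only once the quotient exceeds Integer.MAX_VALUE.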

Re: why Term variable text can not be interned?

2007-11-08 Thread Yonik Seeley
On Nov 8, 2007 12:44 AM, Chris Lu <[EMAIL PROTECTED]> wrote: > In Term object, there are variables "field" and "text". > My question is, why variable "text" can not be intern() ? > > Wouldn't it save some memory, especially in the FieldCache? The FieldCache already stores only one string per term,

RE: how can i store lucene results from a webpage to a oracle database

2007-11-08 Thread Ard Schrijvers
I suppose you have about 5 minutes to display a single search? :-) Perhaps before pointing out your possible solutions, you might better start by describing your functional requirements, because your suggested solution is headed for problems. So you need custom ordering, check out lucene scoring