Re: Best way to purposely corrupt an index?

2005-04-21 Thread Andy Roberts
On Wednesday 20 Apr 2005 12:52, Kevin L. Cobb wrote: > My policy on this type of exception handling is to only bite off what you can chew. If you catch an IOException, then you simply report to the user that an unexpected error has occurred and the search engine is unobtainable at the moment.

Re: fields that are indexed as UnStored

2005-04-21 Thread Andrzej Bialecki
Chuck Williams wrote: Omar Didi writes (4/20/2005 5:05 PM): Hi guys, If a field is indexed as UnStored, how can I get its value? I tried document.get("UnStored_field"), but it returns null. You didn't store it, so it's not there. If the field happens to be a single Term, you might be able to find it

Re: Lucene bulk indexing

2005-04-21 Thread Peter A. Daly
On some systems I have seen big speed increases by indexing to a RAMDirectory and periodically "merging" into an on-disk directory every X number of docs. May or may not help in this case. In the first case I used this, it took indexing down from a few hours to 30 minutes for a few million docume

extract data from mpg/avi etc

2005-04-21 Thread Peter Veentjer - Anchor Men
Does anyone know of a library that can extract metadata from movie formats? Kind regards, Peter Veentjer Anchor Men Interactive Solutions - clear in business internet solutions Praediniussingel 41 9711 AE Groningen T: 050-3115222 F: 050-5891696 E: [EMAIL PROTECTED] I : www.anch

Can not create searcher: java.io.IOException: Invalid argument

2005-04-21 Thread Mariella Digiacomo
Hi ALL, We have built Lucene indexes on a Solaris box. We have tested them and they can be accessed OK when residing on a native Linux filesystem. What we would like to do is export the Lucene indexes through NFS from the Solaris box to the Linux box (mainly for development and testing purposes). When

Lucene and J2EE transactions

2005-04-21 Thread Peter Gelderbloem
Hi, I am looking to get Lucene to participate in a JTA transaction. What would be the best way to do this? I am thinking maybe use a message queue that feeds an indexing thread/message-driven bean with add, update, and delete information. Or maybe using a subclass of Directory that uses a relational

Re: Lucene and J2EE transactions

2005-04-21 Thread Erik Hatcher
On Apr 21, 2005, at 9:43 AM, Peter Gelderbloem wrote: Hi, I am looking to get Lucene to participate in a JTA transaction. What would be the best way to do this? Have a look at LuceneRAR: https://lucenerar.dev.java.net/ I have no experience with it, but it fits what you're looking for. I am thinkin

Re: Lucene and J2EE transactions

2005-04-21 Thread Joseph B. Ottinger
Well, LuceneRAR isn't transactional - yet. As soon as I figure out how to queue deletes, though... :) On Thu, 21 Apr 2005, Erik Hatcher wrote: On Apr 21, 2005, at 9:43 AM, Peter Gelderbloem wrote: Hi, I am looking to get Lucene to participate in a JTA transaction. What would be the best way to do

Re: extract data from mpg/avi etc

2005-04-21 Thread Hasan Diwan
On 21/04/05, Peter Veentjer - Anchor Men <[EMAIL PROTECTED]> wrote: > Does anyone know of a library that can extract metadata from movie formats? http://computing.ee.ethz.ch/sepp/jmf-1.0-to.html That's advertised to be able to. -- Cheers, Hasan Diwan <[EMAIL PROTECTED]> -

Re: WildCard search replacement

2005-04-21 Thread Aalap Parikh
Hi, Thanks for your reply. One more question. You mentioned that your technique can be used for wildcard search like ex. *123* . But say I only need something like 123* i.e. wildcard only at the end and NOT on both sides, then how can one use your technique to avoid TooManyClauseException? Thanks

Re: Lucene bulk indexing

2005-04-21 Thread Aalap Parikh
My machine is pretty good and fairly new. The disk for sure is not slow and also I am not indexing large Documents; 27 fields, with each field value being a string no more than 15-20 characters long. I tried setting the maxFieldLength value of the IndexWriter to a low value but that didn't hel

Re: Lucene bulk indexing

2005-04-21 Thread Aalap Parikh
Hi, Thanks for your suggestion. I haven't yet tried your technique but I did try something similar by tweaking some IndexWriter properties like mergeFactor and minMergeDocs and it did certainly speed up the process a lot. I am sure the same can be achieved with what you suggest because it is essen

help with date sort, please

2005-04-21 Thread James
Apologies if the post is a duplicate, but my original post didn't come back over the mailing list... I have an index of around 3 million records, and typical queries can result in result sets of between 1 and 400,000 results. We have indexed "dateTime" fields in the form 20050415142, that is,

sorting on "dates" a little fuzzy...

2005-04-21 Thread James Levine
I have an index of around 3 million records, and typical queries can result in result sets of between 1 and 400,000 results. We have indexed "dateTime" fields in the form 20050415142, that is, to 10-minute precision. When I try to sort queries I get something back that is roughly sorted on index
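The post does not say how the 11-digit "dateTime" value is produced. As a minimal sketch, assuming the key is simply a yyyyMMddHHmm timestamp with the last minute digit dropped, it could be built in plain Java like this (the class and method names are hypothetical and this is not part of the Lucene API):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TenMinuteKey {
    // Full minute precision: 12 digits, zero-padded, e.g. 200504151427.
    private static final DateTimeFormatter MINUTES =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /**
     * Hypothetical helper: truncates a timestamp to 10-minute precision
     * by dropping the final minute digit, giving an 11-digit key.
     */
    public static String toKey(LocalDateTime t) {
        String full = t.format(MINUTES);
        return full.substring(0, 11);
    }

    public static void main(String[] args) {
        // 14:20 through 14:29 all map to the same key.
        System.out.println(toKey(LocalDateTime.of(2005, 4, 15, 14, 27)));
    }
}
```

Because every key has the same length and is zero-padded, comparing the keys as strings gives the same order as comparing the underlying timestamps at 10-minute granularity.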

Re: sorting on "dates" a little fuzzy...

2005-04-21 Thread Erik Hatcher
On Apr 21, 2005, at 5:22 PM, James Levine wrote: I have an index of around 3 million records, and typical queries can result in result sets of between 1 and 400,000 results. We have indexed "dateTime" fields in the form 20050415142, that is, to 10-minute precision. When I try to sort queries I get

Re: Lucene bulk indexing

2005-04-21 Thread Chris Hostetter
: the app using JProfiler and found out that 90% of time : is spent in the IndexWriter.addDocument call. As what analyzer are you using? : My machine: Pentium 4 CPU 2.40 GHz : RAM 1 GB what JVM args are you using? (in particular: how much ram are you telling the JVM to use) ... what

Fwd: [jira] Closed: (INFRA-272) 3 new Lucene mailing lists

2005-04-21 Thread Erik Hatcher
Sorry for the delay in sending this out. There are now new lists for Lucene commit messages, one for the Ruby port work that is beginning, and also a general one set up to span all of the Lucene community for use for general discussion across all subprojects. Here are quick links for subscrib

Re: token type question

2005-04-21 Thread ethandev
Thanks Pierrick. Are you saying that I should construct the Token in the analyzer like new Token ("chem_H2O", 100, 103, "chem"); note that chem_ is a prefix added to H2O, and 100 to 103 is the length of H2O rather than chem_H2O? I also have some further problems and am not sure if they can be solved by this approach. I

Increase IndexWriter.mergeFactor if you have enough memory Re: Lucene bulk indexing

2005-04-21 Thread Che Dong
Hi all: did you try increasing IndexWriter.mergeFactor? I tried increasing it to 1000 and index speed is about 10 times faster than with the default of 10. Regards Che Dong http://www.chedong.com/ Aalap Parikh wrote: My machine is pretty good and fairly new. The disk for sure is not slow and also I am not

Re: sorting on "dates" a little fuzzy...

2005-04-21 Thread Che Dong
Just like Google said: full text search service is not a traditional database application. Lucene is not a database either: if you want to sort on some fields, you'd better pre-sort the data before it is indexed: like date. Then get results by doc id. For Lucene you can only sort results in top hits. if you so

Re: sorting on "dates" a little fuzzy...

2005-04-21 Thread James
Hi Erik, Thanks for the reply. All dateTime fields are zero-padded and the same length, and each indexed document has a valid dateTime value. Regarding the sort type, INT generates a ParseException, I assume because the string has too many digits to fit in an int. I looked for a LONG type but
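The guess in the post above, that an 11-digit value such as 20050415142 has too many digits for an int, can be checked with plain Java. This is only a sketch using the standard library, not the Lucene sort API; the class and method names are hypothetical:

```java
public class DateKeySortDemo {
    /** Returns true if the decimal string parses as a 32-bit int. */
    public static boolean fitsInInt(String key) {
        try {
            Integer.parseInt(key);
            return true;
        } catch (NumberFormatException e) {
            // Value is outside the int range (max 2147483647) or not numeric.
            return false;
        }
    }

    /**
     * Compares two keys as plain strings. This matches numeric order only
     * when both keys are zero-padded to the same length.
     */
    public static int compareAsStrings(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        String key = "20050415142";
        System.out.println("fits in int:  " + fitsInInt(key));
        System.out.println("as long:      " + Long.parseLong(key));
        System.out.println("string order: "
                + (compareAsStrings("20050415142", "20050415143") < 0));
    }
}
```

Since the values are described as zero-padded and of equal length, ordering them as strings is equivalent to ordering them as numbers, which sidesteps the int range limit.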

Re: sorting on "dates" a little fuzzy...

2005-04-21 Thread James
Hi Che- The presort method was our first approach but this doesn't work in practice because we update the index incrementally and insertion order doesn't match date ordering as we add updates. I don't think sorting top hits only will deliver what the user is expecting -- that is, results listed

Re: sorting on "dates" a little fuzzy...

2005-04-21 Thread Che Dong
James wrote: Hi Che- The presort method was our first approach but this doesn't work in practice because we update the index incrementally and insertion order doesn't match date ordering as we add updates. I don't think sorting top hits only will deliver what the user is expecting -- that is, results