What is the best file system for Lucene?

2004-11-30 Thread Sanyi
Hi! I'm testing Lucene 1.4.2 on two very different configs, but with the same index. I'm very surprised by the results: Both systems are searching at about the same speed, but I'd expect (and I really need) to run Lucene a lot faster on my stronger config. Config #1 (a notebook): WinXP Pro, NTFS

Re: What is the best file system for Lucene?

2004-11-30 Thread John Moylan
Interesting, what are your merge settings, which JDK are you using? (there are big differences between versions). Have you tried with hyperthreading turned off on #2? - if so did it fare any differently? Regards, John Sanyi wrote: | Hi! | | I'm testing L

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
> Interesting, what are your merge settings Sorry, I didn't mention that I was talking about search performance. I'm using the same, fully optimized index on both systems. (I've generated both indexes with the same code from the same database on the actual OS) > which JDK are you using? I'm usi

Re: What is the best file system for Lucene?

2004-11-30 Thread sg
> What file systems are you people using Lucene on? And what are your > experiences? http://www.apple.com/xsan/ Actually it is a beta version and has some small issues but it is very fast and easy to manage once you get it installed. The installation itself is tricky since it is very depen

SEARCH CRITERIA

2004-11-30 Thread Karthik N S
Hi Guys Apologies. On Yahoo and AltaVista, a search for a word like 'kid' returns suggestions for similar searches, as below. Also try: kid rock, kid games, star wars kid, karate kid More... How can I obtain similar search suggestions using Lucene? Thx in advance Warm regards Kart

Re: SEARCH CRITERIA

2004-11-30 Thread Nader Henein
they probably create a list of similar results by doing some sort of data mining on the search criteria that people use in succession, so for example someone, or they have a list of searches that are too general (a search for the word kid is at best stupid) but you can't call your users stupid

GETVALUES +SEARCH

2004-11-30 Thread Karthik N S
Hi Guys Apologies. On Search API the command [ package org.apache.lucene.document.Document ] Will this 'public final String[] getValues(String name)' return me all the docs without looping through? Please explain with an example. Thx in advance WITH WARM REGA

Re: What is the best file system for Lucene?

2004-11-30 Thread Pete Lewis
Hi Sanyi Could you try XP on your desktop - that would take some variables out. The problem is that you are comparing OS, as well as filesystems, as well as different hardware configs. Also, unless you take your hyperthreading off, with just one index you are searching with just one half of the

AW: What is the best file system for Lucene?

2004-11-30 Thread Wolf-Dietrich.Materna
Hello, Sanyi [mailto:[EMAIL PROTECTED] wrote: > I'm testing Lucene 1.4.2 on two very different configs, but > with the same index. > I'm very surprised by the results: Both systems are searching > at about the same speed, but I'd expect (and I really need) > to run Lucene a lot faster on my stro

Re: What is the best file system for Lucene?

2004-11-30 Thread John Haxby
Sanyi wrote: I'm testing Lucene 1.4.2 on two very different configs, but with the same index. I'm very surprised by the results: Both systems are searching at about the same speed, but I'd expect (and I really need) to run Lucene a lot faster on my stronger config. Config #1 (a notebook): WinXP P

Re: What is the best file system for Lucene?

2004-11-30 Thread Justin Swanhart
On Tue, 30 Nov 2004 12:07:46 -, Pete Lewis <[EMAIL PROTECTED]> wrote: > Also, unless you take your hyperthreading off, with just one index you are > searching with just one half of the CPU - so your desktop is actually using > a 1.5GHz CPU for the search. So, taking account of this it's not too

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
> Could you try XP on your desktop Sure, but I'll only do that if I run out of ideas. > so your desktop is actually using > a 1.5GHz CPU for the search. No, this is not true. It uses a 3.0GHz P4 then. (HT means that you have two 3.0GHz P4s) So, it is still surprising to me. Regards, Sanyi

Re: AW: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
> The notebook is quite good, e.g. the Pentium-M might be faster than > your Pentium 4. At least it has a similar speed, because of its better > internal design. Never compare CPUs of different types by their > frequency. Ok, this might be true, but: All of my other tests where the CPU is involve

Re: What is the best file system for Lucene?

2004-11-30 Thread Justin Swanhart
As a generalisation, SuSE itself is not a lot slower than Windows XP. I also very much doubt that filesystem is a factor. If you want to test w/out filesystem involvement, simply load your index into a RAMDirectory instead of using FSDirectory. That precludes filesystem overhead in searches. Th

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
> How large is the index? If it's less than a couple of GByte then it > will be entirely in memory It is 3GBytes big and it will grow a lot. I have to search from the HDD which is very fast compared to the notebook's HDD. Average seek time: Notebook: 8-9ms Desktop: 3.9ms Data read: Notebook:

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
> simply load your index into a > RAMDirectory instead of using FSDirectory. I have 3GByte RAM and my index is 3GByte big currently. (it'll be soon about 4GByte) So, I have to find out this another way. > First off, 1.8GHz Pentium-M machines are supposed to run at about the > speed of a 2.4GHz

similarity matrix - more clear

2004-11-30 Thread Roxana Angheluta
Dear all, Yesterday I asked a question about getting the similarity matrix of a collection of documents from an index, but I got only one answer, so perhaps my question was not very clear. I will try to reformulate: I want to use Lucene to have efficient access to an index of a collection of

Re: GETVALUES +SEARCH

2004-11-30 Thread Erik Hatcher
On Nov 30, 2004, at 7:10 AM, Karthik N S wrote: On Search API the command [ package org.apache.lucene.document.Document ] Will this 'public final String[] getValues(String name)' return me all the docs without looping through? getValues(fieldName) returns a String[] of the values of the

Re: similarity matrix - more clear

2004-11-30 Thread Xiangyu Jin
I also have the same task as you do. According to my understanding, suppose there are N documents, your approach will take N^2 similarity calculations. Although there are N(N-1)/2 distinct document pairs, the similarity calculation (according to my understanding) in Lucene is asymmetric, so this

RE: What is the best file system for Lucene?

2004-11-30 Thread Armbrust, Daniel C.
You may want to give the IBM JVM a try - I've found it faster in some cases... http://www-106.ibm.com/developerworks/java/jdk/linux140/ Dan

RE: What is the best file system for Lucene?

2004-11-30 Thread Armbrust, Daniel C.
As I understand hyperthreading, this is not true: >Also, unless you take your hyperthreading off, with just one index you are >searching with just one half of the CPU - so your desktop is actually using >a 1.5GHz CPU for the search. You still have the full speed of the processor available - the

Lucene's ranking function VS Standard VSM model

2004-11-30 Thread Xiangyu Jin
I have seen different versions of Lucene's ranking function from the similarity document and Lucene user list. Since I need to get document-document similarities, what I do is issue the document as a query directly. I found it is different if we issue "computer computer" to Lucene versus we issu

Re: What is the best file system for Lucene?

2004-11-30 Thread Otis Gospodnetic
Hello, > Lucene indexing completes in 13-15 hours on the desktop system while > it completes in about 29-33 > hours on the notebook. > > Now, combine it with the DROP INDEX tests completing in the same > amount of time on both and find > out why is the search only slightly faster :) > > > Until

Re: similarity matrix - more clear

2004-11-30 Thread Otis Gospodnetic
Hello, I don't think Lucene can spit out the similarity matrix for you, but perhaps you can use Lucene's Term Vector support to help you build the matrix yourself: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html The other relevant sections of the Lucene API t
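The idea of building the matrix yourself from per-document term frequencies can be sketched roughly as below. This is a minimal illustration in plain Java, not Lucene code: it assumes each document has already been reduced to a term-to-frequency map (for example from the terms and frequencies a TermFreqVector exposes), and it uses ordinary cosine similarity over raw term frequencies, which is not Lucene's scoring formula. Because cosine similarity is symmetric, only the N(N-1)/2 distinct pairs are computed and each result is mirrored. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimilarityMatrixSketch {

    /** Euclidean length of a sparse term-frequency vector. */
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int freq : vector.values()) {
            sumOfSquares += (double) freq * freq;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine similarity of two sparse term-frequency vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    /** Fills an N x N matrix, computing each unordered pair only once. */
    static double[][] matrix(List<Map<String, Integer>> docs) {
        int n = docs.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            // A non-empty document is identical to itself.
            sim[i][i] = docs.get(i).isEmpty() ? 0.0 : 1.0;
            for (int j = i + 1; j < n; j++) {
                double s = cosine(docs.get(i), docs.get(j));
                sim[i][j] = s;
                sim[j][i] = s; // mirror: cosine is symmetric
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Hypothetical toy documents, already reduced to term frequencies.
        Map<String, Integer> d0 = new HashMap<>();
        d0.put("lucene", 2);
        d0.put("index", 1);
        Map<String, Integer> d1 = new HashMap<>();
        d1.put("lucene", 1);
        d1.put("search", 1);
        for (double[] row : matrix(Arrays.asList(d0, d1))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

If the scoring you need really is asymmetric, as suggested elsewhere in this thread for Lucene's own ranking, the mirroring step does not apply and both orderings of each pair have to be computed.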

Does Lucene perform ranking in the retrieved set?

2004-11-30 Thread Xiangyu Jin
This might be a stupid question. When performing retrieval for a query, does Lucene first get a subset of candidate matches and then perform the ranking on the set? That is, similarity calculation is performed only on a subset of the documents for the query. If so, from which module could I get thos

Re: Does QueryParser uses Analyzer ?

2004-11-30 Thread Otis Gospodnetic
QueryParser does use Analyzer, see this: static public Query parse(String query, String field, Analyzer analyzer) throws ParseException { QueryParser parser = new QueryParser(field, analyzer); <<< return parser.parse(query); } Otis P.S. Use lucene-user list, please. --- R

Re: Does Lucene perform ranking in the retrieved set?

2004-11-30 Thread Paul Elschot
On Tuesday 30 November 2004 18:46, Xiangyu Jin wrote: > > This might be a stupid question. > > When performing retrieval for a query, does Lucene first get > a subset of candidate matches and then perform the ranking > on the set? That is, similarity calculation is performed only > on a subset of th

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
Thanks to you all for the replies. I was looking for someone with the same experiences as mine, but it seems that I'll have to test this myself. I'll try out my ideas and the most interesting ideas from you guys. Regards, Sanyi

literal search in quotes on non-tokenized field

2004-11-30 Thread Allen Atamer
Here is a problem I am experiencing with Lucene searches on non-tokenized fields: A search in quotes on a field named Build with the query "\"orig\"" does not work but the query "origi" yields 62 hits I have run indexing on the field with the following method doc.add(Field

Re: literal search in quotes on non-tokenized field

2004-11-30 Thread Erik Hatcher
On Nov 30, 2004, at 4:42 PM, Allen Atamer wrote: A search in quotes on a field named Build with the query "\"orig\"" does not work but the query "origi" yields 62 hits I have run indexing on the field with the following method doc.add(Field.Keyword(data.getColumnName(j),

Re: similarity matrix - more clear

2004-11-30 Thread Chris Hostetter
: A possible solution would be to initialize in turn each document as a : query, do a search using an IndexSearcher and to take from the search : result the similarity between the query (which is in fact a document) : and all the other documents. This is highly redundant, because the : similarity b

RE: literal search in quotes on non-tokenized field

2004-11-30 Thread Allen Atamer
Erik, > -Original Message- > > Here's a log of the parsed query before going to the searcher: > > > > Parsed query: (Build:"origi") for the first search > > Parsed query: (Build:origi) for the second search > > What do you mean by "parsed", since below you say you're not using > QueryPar

Re: literal search in quotes on non-tokenized field

2004-11-30 Thread Erik Hatcher
On Nov 30, 2004, at 6:01 PM, Allen Atamer wrote: It doesn't work that way. A TermQuery must match *exactly* what was indexed (either directly as a Keyword, or as tokens emitted from the analyzer). Since you're building the query up yourself from, I'm assuming, user input, you may need to pre-proc
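The exact-match behaviour described above can be illustrated with a toy index in plain Java. This is only a sketch under stated assumptions and does not use the Lucene API: the class, its methods, and the sample values are hypothetical, and stripping surrounding quotes is just one guess at the kind of pre-processing of user input that might be needed. The point it shows is that a term lookup on an un-tokenized value succeeds only when the query text is character-for-character what was indexed, so literal quote characters in the query text cause a miss.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeywordMatchSketch {

    // term -> ids of documents whose un-tokenized field value is exactly that term
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Indexes the whole value as a single term, with no analysis applied. */
    void addKeyword(int docId, String value) {
        postings.computeIfAbsent(value, k -> new ArrayList<>()).add(docId);
    }

    /** Exact lookup: matches only if the text equals an indexed term. */
    List<Integer> termQuery(String text) {
        return postings.getOrDefault(text, Collections.<Integer>emptyList());
    }

    /** Prefix lookup: matches every indexed term starting with the prefix. */
    List<Integer> prefixQuery(String prefix) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }

    /** Assumed pre-processing step: trim and drop one pair of surrounding double quotes. */
    static String stripQuotes(String userInput) {
        String t = userInput.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            t = t.substring(1, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        KeywordMatchSketch index = new KeywordMatchSketch();
        index.addKeyword(1, "original"); // hypothetical field values
        index.addKeyword(2, "orig");

        System.out.println(index.termQuery("\"orig\""));              // quotes are part of the text: no match
        System.out.println(index.termQuery(stripQuotes("\"orig\""))); // exact term after pre-processing
        System.out.println(index.prefixQuery("orig"));                // every term starting with "orig"
    }
}
```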

RE: GETVALUES +SEARCH

2004-11-30 Thread Karthik N S
Hi Guys Apologies... Is there any API in Lucene which can retrieve all the searched values in a single fetch into some sort of 'Array' WITHOUT using this [ below ] looping process [ this would make the search and display faster ]. for (int i = 0; i < hits.lengt