Re: Changing the Score of a Document.

2008-01-16 Thread Chris Hostetter
: In-Reply-To: <[EMAIL PROTECTED]> : Subject: Changing the Score of a Document. http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if y

Re: Nutch - Microsoft Search Server integration

2008-01-16 Thread Chris Hostetter
: Is it possible to integrate Nutch into MS Search Server via OpenSearch API? you'll probably find someone who can answer this question on the [EMAIL PROTECTED] mailing list. -Hoss

SV: Integrating dynamic data into Lucene search/ranking

2008-01-16 Thread Marcus Falk
We did this in our system, indexing a constant flow of news articles, by doing as Otis described (reopening the IndexSearcher). Every third minute we create a new IndexSearcher in the background; after this searcher has been created we fire some warm-up queries against it, and after t
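The reopen-and-warm approach in this excerpt can be sketched roughly as follows. This is a minimal illustration in plain Java, not the poster's code: `Searcher` is a hypothetical stand-in interface rather than Lucene's IndexSearcher, and `SearcherSwap`, `refresh`, and the warm-up query list are names assumed for the example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class SearcherSwap {
    /** Hypothetical stand-in for an index searcher; NOT a Lucene type. */
    interface Searcher {
        int search(String query); // returns a hit count in this sketch
    }

    private final AtomicReference<Searcher> current;
    private final List<String> warmUpQueries;

    SearcherSwap(Searcher initial, List<String> warmUpQueries) {
        this.current = new AtomicReference<>(initial);
        this.warmUpQueries = warmUpQueries;
    }

    /** Live queries always go to the most recently published searcher. */
    int search(String query) {
        return current.get().search(query);
    }

    /**
     * Opens a new searcher, runs the warm-up queries against it, and only
     * then publishes it. Returns the previous searcher so the caller can
     * release it once no in-flight query is still using it.
     */
    Searcher refresh(Supplier<Searcher> opener) {
        Searcher fresh = opener.get();
        for (String q : warmUpQueries) {
            fresh.search(q); // results are discarded; the point is to warm caches
        }
        return current.getAndSet(fresh);
    }
}
```

A background task (for example a scheduled executor firing every few minutes) could call `refresh`, so that user queries never hit a cold searcher. When the retired searcher can safely be closed depends on how in-flight queries are tracked, which this sketch leaves out.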

Re: Open source Arabic stemmer

2008-01-16 Thread Grant Ingersoll
Try searching this list for Arabic Stemmer. I seem to recall one under a GPL license. Also try Googling "arabic Lucene analyzer" -Grant On Jan 16, 2008, at 1:21 PM, Liaqat Ali wrote: Hi Kindly tell me about some open source Arabic Stemmer which can be used with Lucene. Regards, Liaqat

Re: How?

2008-01-16 Thread coolgeng coolgeng
A non-clustered and a clustered index have resolved the problem, but can Lucene not do the same thing? On Jan 16, 2008 11:44 PM, <[EMAIL PROTECTED]> wrote: > > I can use the cluster index on the table. But you can create only one > > cluster index in a table. In this table , lots of data ne

Re: Integrating dynamic data into Lucene search/ranking

2008-01-16 Thread Tobias Lohr
The index contains several tens of thousands of documents, with a field count of about fifty. The index is rebuilt approximately every day, though this varies, since the searchable content doesn't change very often. Now I face the challenge of working more dynamic data into the index, and even ma

Re: Inverted search / Search on profilenet

2008-01-16 Thread Mark Miller
Couple ideas I guess... Rather than use queries (being so much more difficult) just make an index that contains documents that are just a list of keywords (representing a profilenet 'query'). Use the MoreLikeThis class from contrib to search that index using your source document. The hits you get

Re: Why there is no IndexWriter.deleteDocument(int docNum) method?

2008-01-16 Thread Yonik Seeley
On Jan 16, 2008 2:13 PM, Alexei Dets <[EMAIL PROTECTED]> wrote: > Hi! > Yonik Seeley wrote: > > On Jan 15, 2008 7:15 PM, Alexei Dets <[EMAIL PROTECTED]> wrote: > >> I'm curious, is there any particular reason why Lucene offers > >> IndexReader.deleteDocument(int docNum) but not > >> IndexWriter.del

Re: Why there is no IndexWriter.deleteDocument(int docNum) method?

2008-01-16 Thread Alexei Dets
Hi! Yonik Seeley wrote: > On Jan 15, 2008 7:15 PM, Alexei Dets <[EMAIL PROTECTED]> wrote: >> I'm curious, is there any particular reason why Lucene offers >> IndexReader.deleteDocument(int docNum) but not >> IndexWriter.deleteDocument(int docNum)? > > Document ids are transient and can change. I

SV: Inverted search / Search on profilenet

2008-01-16 Thread Marcus Falk
The norms are modified so that each norm value is stored as 4 bytes instead of 1 byte; this modification uses more memory. Anyway, the hardware we are running on is 2x 8-CPU HP servers with 16 GB of RAM in each of them. We are scaling the index on date range (and the ranking is modified to sort by date)

Open source Arabic stemmer

2008-01-16 Thread Liaqat Ali
Hi Kindly tell me about some open source Arabic Stemmer which can be used with Lucene. Regards, Liaqat Ali

Re: Inverted search / Search on profilenet

2008-01-16 Thread Mark Miller
Don't have any info to add, but out of curiosity, what kind of setup are you using to host the 300 mil archive? Is the index distributed? Single machine? Solr? Thanks, Mark On Jan 16, 2008 12:27 PM, Marcus Falk <[EMAIL PROTECTED]> wrote: > Hi again, > > > > Today we are hosting a 300 million la

Re: NumberTools

2008-01-16 Thread mark harwood
Interesting question. Does zero-padding make primary key lookups faster or slower in Lucene? From my tests it would seem that non-padded keys are quicker to look up than zero-padded ones (tested doing random access on indexes of varying sizes up to 5m unique keys). However I imagine there could

Inverted search / Search on profilenet

2008-01-16 Thread Marcus Falk
Hi again, Today we are hosting a 300 million large search index without any problems in a lucene environment, with just some customization in the lucene api for ranking etc... So we are really satisfied with lucene. We also have the demands to search with documents on profiles we are

Re: Lucene + Hadoop

2008-01-16 Thread Andrzej Bialecki
David Vazquez Landa wrote: Uhmm... A simple question: I have a Lucene index (the directory with the segment* files) in HDFS. This index is created by Nutch (which accesses files in HDFS seamlessly). My question is if there is a way of reading this Lucene index without having to copy it to the local

Lucene + Hadoop

2008-01-16 Thread David Vazquez Landa
Uhmm... A simple question: I have a Lucene index (the directory with the segment* files) in HDFS. This index is created by Nutch (which accesses files in HDFS seamlessly). My question is if there is a way of reading this Lucene index without having to copy it to the local filesystem first... Thanks

RE: How?

2008-01-16 Thread spring
> I can use the cluster index on the table. But you can create only one > cluster index in a table. In this table , lots of data need > to search, so I > choose the Lucene to do that. Why do you need a clustered index in the database? A non-clustered would do the job as well. --

NumberTools

2008-01-16 Thread Cam Bazz
Hello, When storing fields to serve as IDs, is it better to use NumberTools.longToString(id) or just store the id as a field? I have noticed that using NumberTools to store a number as a string makes range queries easier; however, you end up storing a long string. Considering millions of id
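The trade-off raised here (fixed-width string keys sort correctly for range queries but take more space) can be illustrated with a small sketch. This is not Lucene's NumberTools, which uses its own encoding; `PaddedIds`, `encode`, and `decode` are hypothetical names, and the sketch assumes non-negative ids padded with zeros to 19 decimal digits.

```java
import java.util.Locale;

final class PaddedIds {
    // Long.MAX_VALUE (9223372036854775807) has 19 decimal digits.
    private static final int WIDTH = 19;

    /** Encodes a non-negative id as a fixed-width, zero-padded decimal string. */
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("negative ids are not handled in this sketch");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", id);
    }

    /** Decodes a string produced by encode back into the original id. */
    static long decode(String encoded) {
        return Long.parseLong(encoded);
    }
}
```

With unpadded strings, "9" sorts after "10" because comparison is character by character; after padding, string order matches numeric order, which is what a term-based range query relies on. The cost is that every key occupies the full width, which is the storage concern the question raises.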

Re: How?

2008-01-16 Thread coolgeng coolgeng
I can use the clustered index on the table. But you can create only one clustered index per table. In this table, a lot of data needs to be searched, so I chose Lucene to do that. On Jan 16, 2008 6:57 PM, <[EMAIL PROTECTED]> wrote: > > firstly, I submit the query like "select * from [tablename]". >

Re: How?

2008-01-16 Thread Erick Erickson
As I read your latest post, it's not *searching* that's taking too long, but *indexing*. Well, 100,000,000 rows is a lot. It'll never be just a few minutes. But I also have to ask whether the most time is being spent actually indexing or fetching from the database? You could time this easily by ju

Re: constructing query from string

2008-01-16 Thread Erick Erickson
As I remember from various threads, toString is more of a debugging aid, and you cannot rely on the round trip parsed query -> toString -> parsed query to reproduce the original query. But this is "something I remember", so take it with a grain of salt (you might want to search the mail archive

IndexWriter#addIndexes

2008-01-16 Thread spring
Hi, looking into the code of IndexMergeTool I saw this: IndexWriter writer = new IndexWriter(mergedIndex, new SimpleAnalyzer(), true); Then the indexes are added to this new index. My question is: How does the Analyzer of this IndexWriter instance affect the merge process? It seems that is do

RE: How?

2008-01-16 Thread spring
> firstly, I submit the query like "select * from [tablename]". > And in this > table, there are around 30 columns and 40,000 rows data. > And I use the > StandardAnalyzer to generate the index. Why don't you use a database index? -

Changing the Score of a Document.

2008-01-16 Thread Benjamin Sznajder
Hi, I am sure that this topic has been discussed on this forum before, so sorry to ask again! Let's suppose a Document d1 containing the five terms: a b C D and a query (a AND b). The document d1 is relevant and will be retrieved, and typically its score will be a function tf*idf relative to the

constructing query from string

2008-01-16 Thread prabin meitei
Hi, I want to construct a query from a string. How can I do it? Actually I saved a query (a boolean query) as a string (using query.toString()). Is there a way to reconstruct the query from the string I saved? How can I add more clauses to the reconstructed query? Thanks in advance. Prabin

Re: How?

2008-01-16 Thread coolgeng coolgeng
First, I submit a query like "select * from [tablename]". In this table there are around 30 columns and 40,000 rows of data, and I use the StandardAnalyzer to generate the index. In my experience, it takes 200M of disk to store the index. For example, I will search the "Name" field in t

Re: lucene as a graph store

2008-01-16 Thread Cam Bazz
Hello, Not exactly, a document represents an edge, having src and dst node ids. Nodes can be kept in another index or the same one. I can find the number of edges by running a boolean term query. Currently I am looking for a way to distribute indexes, but in such a way that when querying you know whic

Re: IndexWriter.deleteDocument()

2008-01-16 Thread Karl Wettin
On 15 Jan 2008 at 20:31, Michael Prichard wrote: When I run through and delete a few documents from my index, is it wise to call .flush() afterwards? Or is it better to close the index? Close means flush, but also releasing the write lock. What to use really depends on how your service is im