RE: How to query for similar documents before indexing

2010-05-11 Thread Matthieu Labour
than an error or the usual status code and QTime.   Perhaps it would be a nice feature. On the other hand, you can also have a manual process that finds duplicates based on that signature and gather that information yourself as long as such a feature isn't there.     Cheers,   ---

RE: How to query for similar documents before indexing

2010-05-10 Thread Matthieu Labour
those documents on which i also need to check with the community tomorrow back at work ;-)     [1]: http://wiki.apache.org/solr/Deduplication   Cheers,   -Original message- From: Matthieu Labour Sent: Mon 10-05-2010 22:41 To: solr-user@lucene.apache.org; Subject: How to query for si

How to query for similar documents before indexing

2010-05-10 Thread Matthieu Labour
Hi I want to implement the following logic: Before I index a new document into the index, I want to check if there are already documents in the index with similar content to the content of the document about to be inserted. If the request returns 1 or more documents, then I don't want to inser
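One way to approximate the check described in this message, in the spirit of the signature-based approach mentioned in the replies, is to compute a content signature on the client and look it up before indexing. The sketch below is illustrative only: the normalization rules, the `content_signature` and `should_index` helpers, and the in-memory `seen` set are assumptions, not Solr's Deduplication implementation. An exact hash only catches documents that are identical after normalization, not merely similar ones.

```python
import hashlib
import re


def content_signature(text: str) -> str:
    """Return an MD5 hex digest of the text after simple normalization.

    Normalization (an assumption for this sketch): lowercase the text and
    collapse runs of whitespace, so trivially different copies hash the same.
    """
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def should_index(text: str, seen: set) -> bool:
    """Return True and record the signature if no equal document was seen."""
    sig = content_signature(text)
    if sig in seen:
        return False
    seen.add(sig)
    return True
```

In a real setup the `seen` set would be replaced by a lookup against a signature field in the index.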

Re: replication issue

2010-03-04 Thread Matthieu Labour
the slave without any deletion happening on the master. Therefore I didn't see the SolrException in the slave log files and the replication worked Thank you --- On Tue, 3/2/10, Matthieu Labour wrote: From: Matthieu Labour Subject: Re: replication issue To: solr-user@lucene.apache.org

Re: replication issue

2010-03-02 Thread Matthieu Labour
Is there anything unusual about _7h0y.fdx? Does _7h0y.fdx still exist on the master when the replication fails? ... Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message ---- > From: Matthieu La

Re: replication issue

2010-03-02 Thread Matthieu Labour
One more piece of information: I deleted the index on the master, restarted the master and the slave, and now the replication works. Is it possible that the replication doesn't work well when started against an already existing big index? Thank you --- On Tue, 3/2/10, Matthieu Labour

Re: replication issue

2010-03-02 Thread Matthieu Labour
The replication does not work for me I have a big master solr and I want to start replicating it. I can see that the slave is downloading data from the master... I see a directory index.20100302093000 gets created in data/ next to index... I can see its size growing but then the directory gets

Re: replication issue

2010-03-02 Thread Matthieu Labour
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)     at java.lang.Thread.run(Thread.java:595) --- On Tue, 3/2/10, Matthieu Labour wrote: From: Matthieu Labour Subject: Re: replication issue To: solr

Re: replication issue

2010-03-02 Thread Matthieu Labour
ir is located. I'm wondering if the symlink is causing the problem. Why don't you set the data dir as /raid/data instead of /solr/data On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour wrote: > Hi > > I am still having issues with the replication and wonder if things are > wor

Re: replication issue

2010-03-01 Thread Matthieu Labour
: replication issue To: solr-user@lucene.apache.org Date: Friday, February 26, 2010, 2:06 PM On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour wrote: > Hi > > I am still having issues with the replication and wonder if things are > working properly > > So I have 1 master and

Re: replication issue

2010-02-26 Thread Matthieu Labour
matt --- On Fri, 2/26/10, Shalin Shekhar Mangar wrote: From: Shalin Shekhar Mangar Subject: Re: replication issue To: solr-user@lucene.apache.org Date: Friday, February 26, 2010, 2:06 PM On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour wrote: > Hi > > I am still having issues

replication issue

2010-02-26 Thread Matthieu Labour
Hi I am still having issues with the replication and wonder if things are working properly So I have 1 master and 1 slave On the slave, I deleted the data/index directory and data/replication.properties file and restarted solr. When slave is pulling data from master, I can see that the size o

replication. when the slave goes down...

2010-02-26 Thread Matthieu Labour
Hi I have 2 solr machines: 1 master, 1 slave replicating the index from the master. The machine on which the slave is running went down while the replication was running. I suppose the index must be corrupted. Can I safely remove the index on the slave and restart the slave and the slave will start

expire/delete documents

2010-02-12 Thread Matthieu Labour
Hi Is there a way for solr or lucene to expire documents based on a field in a document? Let's say that I have a createTime field whose type is date; can I set a policy in schema.xml for solr to delete the documents older than X days? Thank you
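Independently of whether a schema-level policy exists, one possible workaround is a periodic delete-by-query that uses Solr date math on the `createTime` field. The sketch below only builds the XML update message. The helper name `build_expiry_delete` is hypothetical, and posting the message to the update handler and committing are left to the caller.

```python
from xml.sax.saxutils import escape


def build_expiry_delete(field: str, max_age_days: int) -> str:
    """Build a Solr delete-by-query XML body for documents older than N days."""
    if max_age_days < 0:
        raise ValueError("max_age_days must be non-negative")
    # Range query with date math: everything up to (now minus N days).
    query = f"{field}:[* TO NOW-{max_age_days}DAYS]"
    return f"<delete><query>{escape(query)}</query></delete>"
```

For example, `build_expiry_delete("createTime", 30)` yields `<delete><query>createTime:[* TO NOW-30DAYS]</query></delete>`, which could be sent from a scheduled job.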

Re: query on not stored field

2010-02-01 Thread Matthieu Labour
's stored, not store. But, to answer your question, the stored nature of the field has nothing whatsoever to do with its searchability. Stored only affects whether you can get that value back in the documents returned from a search, or not. Erik On Feb 1, 2010, at 7:12

query on not stored field

2010-02-01 Thread Matthieu Labour
Hi on the following field [...] [...] the following query works {!lucene q.op=AND} [...] AND (status.message&STRING_ANALYZED_NO_US:(some keywords) AND [...] I was wondering If the query syntax above works as well if the store property of the field is set to NO. [...] [...] I ha

Re: Multiple Cores Vs. Single Core for the following use case

2010-01-28 Thread Matthieu Labour
er filter caching. > > Just my 2 cents > Amit > > On Wed, Jan 27, 2010 at 8:58 AM, Matthieu Labour > wrote: > > > Thanks Didier for your response > > And in your opinion, this should be as fast as if I would getCore(userId) > > -- provided that the cor

Re: Multiple Cores Vs. Single Core for the following use case

2010-01-27 Thread Matthieu Labour
Vs. Single Core for the following use case To: solr-user@lucene.apache.org Date: Wednesday, January 27, 2010, 10:52 AM On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour wrote: > What I am trying to understand is the search/filter algorithm. If I have 1 > core with all documents and I  search

Re: Multiple Cores Vs. Single Core for the following use case

2010-01-27 Thread Matthieu Labour
ts > before you start worrying about it not being efficient. > > That being said, I really don't have any idea what your data looks like. > How many users do you have?  How many documents per user?  Are any > documents > shared by multiple users? > > -Trey > > >

Multiple Cores Vs. Single Core for the following use case

2010-01-26 Thread Matthieu Labour
Hi Shall I set up Multiple Core or Single core for the following use case: I have X number of users. When I do a search, I always know for which user I am doing a search Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add a userId field to each document? If I
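For the single-core option, the per-user restriction is typically expressed as a filter query on the `userId` field, which is what the replies' mention of filter caching refers to. A minimal sketch of building such a request's query string follows. The helper name `build_user_query` is hypothetical, and only the standard `q` and `fq` parameters are assumed.

```python
from urllib.parse import urlencode


def build_user_query(keywords: str, user_id: str) -> str:
    """Return a query string that searches keywords restricted to one user."""
    params = [
        ("q", keywords),
        ("fq", f"userId:{user_id}"),  # filter query: restrict to this user's documents
    ]
    return urlencode(params)
```

Keeping the user restriction in `fq` rather than in `q` lets Solr cache the per-user document set separately from the keyword query.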

replication setup

2010-01-26 Thread Matthieu Labour
Hi I have set up replication following the wiki. I downloaded the latest apache-solr-1.4 release and exploded it in 2 different directories. I modified both solrconfig.xml for the master & the slave as described on the wiki page. In both directories, I started solr from the example directory" exa

solr1.5

2010-01-26 Thread Matthieu Labour
Hi quick question: Is there any release date scheduled for solr 1.5 with all the wonderful patches (StreamingUpdateSolrServer etc ...)? Thank you !

CoreContainer / getCore and create ?

2010-01-22 Thread Matthieu Labour
Hi Would it make sense to modify/ add a method to CoreContainer that creates a core if the core doesn't exist ? something like public SolrCore getCore(String name) { synchronized(cores) { SolrCore core = cores.get(name); if (core != null) core.open(); // increment the re

Re: performance issue

2010-01-22 Thread Matthieu Labour
platform I/O Performance: High API name: c1.xlarge Thank you for your help matt On Thu, Jan 21, 2010 at 11:57 PM, Lance Norskog wrote: > Which version of Solr? Java? What garbage collection parameters? > > On Thu, Jan 21, 2010 at 1:03 PM, Matthieu Labour > wrote: > > Hi > >

Fwd: performance issue

2010-01-21 Thread Matthieu Labour
Hi I have been requested to look at a solr instance that has been patched with our own home grown patch to be able to handle 1000 cores on a solr instance The solr instance doesn't perform well. Within 12 hours, I can see the garbage collection taking a lot of time and query & update requests are

performance issue

2010-01-21 Thread Matthieu Labour
Hi I have been requested to look at a solr instance that has been patched with our own home grown patch to be able to handle 1000 cores on a solr instance The solr instance doesn't perform well. Within 12 hours, I can see the garbage collection taking a lot of time and query & update requests are

solr perf

2009-12-20 Thread Matthieu Labour
Hi I have a solr instance in which I created 700 cores, 1 core per user of my application. The total size of the data indexed on disk is 35GB, with solr cores going from 100KB and few documents to 1.2GB and 50 000 documents. Searching seems very slow, and indexing as well. This is running on an EC2 xtra

Re: solr core size on disk

2009-12-17 Thread Matthieu Labour
is typically > in $SOLR_HOME/data/index > > On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour > wrote: > > Hi > > I am new to solr. Here is my question: > > How to find out the size of a solr core

solr core size on disk

2009-12-16 Thread Matthieu Labour
Hi I am new to solr. Here is my question: How to find out the size of a solr core on disk ? Thank you matt
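Since the reply points at the index directory (typically `$SOLR_HOME/data/index`), one simple way to answer this is to sum the sizes of the files under that directory. The sketch below assumes the caller passes the right path for the core in question; the function name `directory_size_bytes` is hypothetical.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Return the total size in bytes of all files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```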