updating solr server

2010-01-12 Thread Smith G
Hello All, I am trying to find a better approach (performance-wise) to index documents. Document count is approximately a million+. First, I thought of writing multiple threads using CommonsHttpSolrServer to submit documents. But later I found out about StreamingUpdateSolrServer, which

Re: updating solr server

2010-01-12 Thread Chantal Ackermann
2) Also, is CommonsHttpSolrServer thread safe? It is, but only if you initialize it with the MultiThreadedHttpConnectionManager: http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html Cheers, Chantal

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-12 Thread Markus Jelsma
Hello, I now believe that I really did misunderstand the problem and, unfortunately, I don't believe I can be of much assistance as I did not have to solve a similar problem. Cheers, - Markus Jelsma Buyways B.V. Technisch Architect Friesestraatweg 215c

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-12 Thread Chantal Ackermann
Hi Kelly, ...the criteria for this hypothetical search involves multi-valued fields, where the index of one matching criteria needs to correspond to the same value in another multi-valued field in the same index. You can't do that... Just my two cents: By storing values in two different

Data Full Import Error

2010-01-12 Thread Lee Smith
Hi All I am trying to do a data import but I am getting the following error. INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=405 2010-01-12 03:08:08.576::WARN: Error for /solr/dataimport java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:05 AM

complex query

2010-01-12 Thread Wangsheng Mei
Hi, ALL! I have two tables in database. t_article { title, content, author } t_friend { person_A, person_B } Note that t_friend is a many-to-many relation. When a logged-in user searches articles with a query word, 3 factors should be considered. factor 1. relevance score

Re: complex query

2010-01-12 Thread Wangsheng Mei
I have considered building a Lucene index like: Document: { title, content, author, friends } Thus, author and friends are two separate fields, so I can boost them separately. The problem is, if a document's author is the logged-in user, it's unnecessary to search the friends field, because it would

Re: complex query

2010-01-12 Thread Wangsheng Mei
2010/1/12 Wangsheng Mei hairr...@gmail.com I have considered building a Lucene index like: Document: { title, content, author, friends } Thus, author and friends are two separate fields, so I can boost them separately. The problem is, if a document's author is the logged-in user, it's

Re: Data Full Import Error

2010-01-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
You need more memory to run dataimport. On Tue, Jan 12, 2010 at 4:46 PM, Lee Smith l...@weblee.co.uk wrote: Hi All I am trying to do a data import but I am getting the following error. INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=405 2010-01-12

Re: Data Full Import Error

2010-01-12 Thread Lee Smith
Thank you for your response. Will I just need to adjust the allowed memory in a config file, or is this a server issue? Sorry, I know nothing about Java. Hope you can advise! On 12 Jan 2010, at 12:26, Noble Paul നോബിള്‍ नोब्ळ् wrote: You need more memory to run dataimport. On Tue, Jan

Re: Data Full Import Error

2010-01-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is the way you start your Solr server (the -Xmx option). On Tue, Jan 12, 2010 at 6:00 PM, Lee Smith l...@weblee.co.uk wrote: Thank you for your response. Will I just need to adjust the allowed memory in a config file or is this a server issue. ? Sorry I know nothing about Java. Hope you

Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Andrew Clegg
Hi, I'm interested in near-dupe removal as mentioned (briefly) here: http://wiki.apache.org/solr/Deduplication However the link for TextProfileSignature hasn't been filled in yet. Does anyone have an example of using TextProfileSignature that demonstrates the tunable parameters mentioned in

Problem comitting on 40GB index

2010-01-12 Thread Frederico Azeiteiro
Hi all, I started working with Solr about 1 month ago, and everything was running well, both indexing and searching documents. I have a 40GB index with about 10 000 000 documents available. I index 3k docs for each 10m and commit after each insert. Since yesterday, I can't commit any articles to

Re: What is this error means?

2010-01-12 Thread Grant Ingersoll
Do you have a stack trace? On Jan 12, 2010, at 2:54 AM, Ellery Leung wrote: When I am building the index for around 2 ~ 25000 records, sometimes I came across with this error: Uncaught exception Exception with message '0' Status: Communication Error I search Google Yahoo

Re: Problem comitting on 40GB index

2010-01-12 Thread Erick Erickson
There are several possibilities: 1 you have some process holding open your indexes, probably other searchers. You *probably* are OK just committing new changes if there is exactly *one* searcher keeping your index open. If you have some process whereby you periodically open a

RE: update solr index

2010-01-12 Thread Marc Des Garets
I have 2 ways to update the index, either I use solrj using SolrEmbeddedServer or I do it with an http query. If I do it with an http query I indeed don't stop tomcat but I have to do some operations (mainly taking instance out of the cluster) and I can't automate this process when I can automate

Re: Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Erik Hatcher
On Jan 12, 2010, at 7:56 AM, Andrew Clegg wrote: I'm interested in near-dupe removal as mentioned (briefly) here: http://wiki.apache.org/solr/Deduplication However the link for TextProfileSignature hasn't been filled in yet. Does anyone have an example of using TextProfileSignature that

Re: Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Andrew Clegg
Thanks Erik, but I'm still a little confused as to exactly where in the Solr config I set these parameters. The example on the wiki page uses Lookup3Signature which (presumably) takes no parameters, so there's no indication in the XML examples of where you would set them. Unless I'm missing

RE: Problem comitting on 40GB index

2010-01-12 Thread Frederico Azeiteiro
Hi Erik, I'm a newbie to Solr... By IR, you mean searcher? Is there a place where I can check the open searchers? And shouldn't rebooting the machine have closed those searchers? Thanks, -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: terça-feira, 12 de

Re: Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Erik Hatcher
On Jan 12, 2010, at 9:15 AM, Andrew Clegg wrote: Thanks Erik, but I'm still a little confused as to exactly where in the Solr config I set these parameters. You'd configure them within the processor element, something like this: <str name="minTokenLen">5</str> The example on the wiki

Deleting * and Re-index after schema change

2010-01-12 Thread Lee Smith
Am I doing this right? I have made changes to my schema, so as per the guide I did the following: stopped the application, updated the schema, restarted, deleted the index folder, then ran a full import with the optimize option, i.e. /dataimport?command=full-import&optimize=true In the status it shows
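For reference, the full-import request mentioned above carries two separate parameters, `command` and `optimize`, joined by `&`. A minimal sketch of building that URL follows; the host, port and handler path in `SOLR_BASE` are assumptions (a local example install), not details taken from the thread.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with the DataImportHandler registered
# at /dataimport. Adjust the base URL to match the actual deployment.
SOLR_BASE = "http://localhost:8983/solr"


def full_import_url(base=SOLR_BASE, optimize=True):
    """Build the DataImportHandler URL for a full import."""
    params = {
        "command": "full-import",
        # Solr expects lowercase "true"/"false" here.
        "optimize": str(optimize).lower(),
    }
    return f"{base}/dataimport?{urlencode(params)}"


print(full_import_url())
```

Building the query string with `urlencode` avoids dropping the separator between parameters when the URL is assembled by hand.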

Re: Deleting * and Re-index after schema change

2010-01-12 Thread Erik Hatcher
What does a search of *:* give you? As far as your steps, delete the index folder *before* restarting Solr, not after. That might be the issue. Erik On Jan 12, 2010, at 9:23 AM, Lee Smith wrote: Am I doing this right. I have made changes to my schema so as per guide I done the

complex multi valued fields

2010-01-12 Thread Adamsky, Robert
I have a document that has a multi-valued field where each value in the field is itself comprised of two values. Think of an invoice doc with multi-value line items - each line item having quantity and product name. One option I see is to have a line item multi value field and when

Re: Deleting * and Re-index after schema change

2010-01-12 Thread Lee Smith
Hi Erik, Done as suggested and it is still only showing 1 document. Doing a *:* gives me 1 document. Can't understand why? On 12 Jan 2010, at 14:25, Erik Hatcher wrote: What does a search of *:* give you? As far as your steps, delete the index folder *before* restarting Solr, not after. That

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-12 Thread Smiley, David W.
Kelly, This is a good question you have posed and illustrates a challenge with Solr's limited schema. I don't see how the dedup will help. I would continue with the SKU based approach and use this patch: https://issues.apache.org/jira/browse/SOLR-236 You'll collapse on the product id. My

Re: Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Andrew Clegg
Erik Hatcher-4 wrote: On Jan 12, 2010, at 9:15 AM, Andrew Clegg wrote: Thanks Erik, but I'm still a little confused as to exactly where in the Solr config I set these parameters. You'd configure them within the processor element, something like this: str

Re: Deleting * and Re-index after schema change

2010-01-12 Thread Lee Smith
Don't worry, my bad. I made a mistake in my dataimport so that all documents had the same ID! All working now, thank you. On 12 Jan 2010, at 14:33, Lee Smith wrote: Hi Erik Done as suggested and still only showing 1 Document Doing a *:* give me 1 document Cant understand why ? On 12 Jan 2010, at
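The symptom in this thread (a full import that leaves exactly one document) follows from how a unique key behaves: adding a document whose key already exists replaces the earlier one. The sketch below models the index as a plain dictionary to show the effect; it is an illustration only and does not use Solr, and the field name `id` is an assumption.

```python
def index_documents(docs, unique_key="id"):
    """Model an index keyed on a unique field: later docs overwrite earlier ones."""
    index = {}
    for doc in docs:
        index[doc[unique_key]] = doc
    return index


# Three imported rows that (by mistake) all carry the same ID.
rows = [
    {"id": "1", "title": "first"},
    {"id": "1", "title": "second"},
    {"id": "1", "title": "third"},
]

index = index_documents(rows)
print(len(index))            # only one document survives
print(index["1"]["title"])   # and it is the last one added
```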

Re: Problem comitting on 40GB index

2010-01-12 Thread Erick Erickson
Rebooting the machine certainly closes the searchers, but depending upon how you shut it down there may be stale files. After reboot (but before you start SOLR), how much space is on your disk? If it's 40G, you have no stale files. Yes, IR is IndexReader, which is a searcher. I'll have to

Re: updating solr server

2010-01-12 Thread Yonik Seeley
On Tue, Jan 12, 2010 at 3:48 AM, Smith G gudumba.sm...@gmail.com wrote: Hello All, I am trying to find a better approach (performance-wise) to index documents. Document count is approximately a million+. First, I thought of writing multiple threads using CommonsHttpSolrServer

DataImportHandler - synchronous execution

2010-01-12 Thread Alexey Serba
Hi, I found that there's no explicit option to run DataImportHandler in a synchronous mode. I need that option to run DIH from SolrJ ( EmbeddedSolrServer ) in the same thread. Currently I pass dummy stream to DIH as a workaround for this, but I think it makes sense to add specific option for

RE: Problem comitting on 40GB index

2010-01-12 Thread Frederico Azeiteiro
I restarted the solr and stopped all searches. After that, the commit() was normal (2 secs) and it's been working for 3h without problems (indexing and a few searches too)... I haven't done any optimize yet, mainly because I had no deletes on the index and the performance is ok, so no need to

Re: updating solr server

2010-01-12 Thread Smith G
Hello, I am using the add() method which receives a Collection of SolrInputDocuments instead of the add() which receives a single document. Is sending a group of documents what is called batching in Solr terminology? If yes, then I am doing it (by including additional logic in

Re: updating solr server

2010-01-12 Thread Smiley, David W.
The beauty of StreamingUpdateSolrServer is that you don't have to worry about batch sizes; it streams them all. Just keep calling add() with one document and it'll get enqueued. You can pass a collection but there's no performance benefit. StreamingUpdateSolrServer can be configured to use
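The behaviour described here (callers add one document at a time, a bounded queue buffers them, background threads drain the queue) can be sketched with the standard library. This is a model of the idea under stated assumptions, not SolrJ code: the class name `StreamingIndexer`, the `send` callback and the parameter names `queue_size` and `thread_count` are all hypothetical.

```python
import queue
import threading


class StreamingIndexer:
    """Hypothetical model: add() enqueues, worker threads send in the background."""

    def __init__(self, send, queue_size=20, thread_count=2):
        self._queue = queue.Queue(maxsize=queue_size)  # bounded buffer
        self._send = send
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(thread_count)
        ]
        for thread in self._threads:
            thread.start()

    def add(self, doc):
        # Blocks when the queue is full, which throttles a fast producer.
        self._queue.put(doc)

    def _run(self):
        while True:
            doc = self._queue.get()
            try:
                if doc is None:  # sentinel: stop this worker
                    return
                self._send(doc)
            finally:
                self._queue.task_done()

    def close(self):
        self._queue.join()            # wait until every queued doc was sent
        for _ in self._threads:
            self._queue.put(None)     # one sentinel per worker
        for thread in self._threads:
            thread.join()


# Usage: a stand-in sender that just records what it was given.
sent = []
sent_lock = threading.Lock()


def fake_send(doc):
    with sent_lock:
        sent.append(doc["id"])


indexer = StreamingIndexer(fake_send, queue_size=20, thread_count=3)
for i in range(1000):
    indexer.add({"id": i})
indexer.close()
print(len(sent))
```

In this model the queue size only bounds how far the producer can run ahead of the senders; it does not limit the total number of documents that can be indexed.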

Re: updating solr server

2010-01-12 Thread Yonik Seeley
On Tue, Jan 12, 2010 at 1:09 PM, Smiley, David W. dsmi...@mitre.org wrote: The beauty of StreamingUpdateSolrServer is that you don't have to worry about batch sizes; it streams them all. Just keep calling add() with one document and it'll get enqueued. You can pass a collection but there's

What is the proper way to deploy Solr with a custom schema.xml that requires extra JARs?

2010-01-12 Thread Teruhiko Kurosaka
I have schema.xml that uses a Tokenizer that I wrote. I understand the standard way of deploying Solr is to place solr.war in webapps directory, have a separate directory that has conf files under its conf subdirectory, and specify that directory as Solr home dir via either JVM property or JNDI.

Re: help implementing a couple of business rules

2010-01-12 Thread Aleksander Stensby
For your first question, wouldn't it be possible to achieve that with some simple boolean logic? I mean, if you have a requirement to match any of the other fields AND description2, but not if it ONLY matches description 2: say matching x against field A, B, and description 2: ((A:x OR B:x) AND

Re: Yankee's Solr integration

2010-01-12 Thread Aleksander Stensby
They have probably added the logic for that server-side. Solr does not support these types of features, but they are easy to implement. Saving a search could be as easy as storing the selected query parameters. Then creating an alert (or RSS feed) for that would be a process on the server that
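A rough sketch of the idea in this reply: a saved search is just the stored query parameters, and an alert is a periodic job that re-runs them and reports results it has not seen before. Everything here is hypothetical application code, not a Solr feature; `run_query` stands in for whatever function actually queries the server and returns document ids.

```python
def save_search(params):
    """A saved search is the query parameters plus the ids already reported."""
    return {"params": dict(params), "seen": set()}


def check_alert(saved, run_query):
    """Re-run a saved search and return the ids that are new since last time."""
    current = set(run_query(saved["params"]))
    new_ids = current - saved["seen"]
    saved["seen"] |= current
    return sorted(new_ids)


# Usage with a fake query function whose result set grows between runs.
results = ["doc1", "doc2"]


def fake_run_query(params):
    return list(results)


saved = save_search({"q": "solr", "fq": "type:article"})
first = check_alert(saved, fake_run_query)
results.append("doc3")
second = check_alert(saved, fake_run_query)
print(first, second)
```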

SF Bay Area Lucene Meetup Jan. 21st

2010-01-12 Thread Grant Ingersoll
There will be a San Francisco/Bay Area meetup on Jan. 21st at 7:15 PM at the Hacker Dojo (don't ask me...) location. RSVP and all the details are at http://www.meetup.com/SFBay-Lucene-Solr-Meetup/ Hope to see you there, Grant

Re: Problem comitting on 40GB index

2010-01-12 Thread Erick Erickson
You'll be able to get some valuable info by monitoring your free space on disk. If this occurs again, it'd help if you posted your SOLR configuration and told us about any warmups you're doing... Of course, there are always gremlins... On Tue, Jan 12, 2010 at 12:36 PM, Frederico Azeiteiro

Re: Replication problem

2010-01-12 Thread Jason Rutherglen
There's a connect exception on the client, however I'd expect this to show up in the slave replication console (it's not). Is this correct behavior (i.e. not showing replication errors)? On Mon, Jan 11, 2010 at 9:50 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Yonik, I added startup

Re: Problem comitting on 40GB index

2010-01-12 Thread Chris Hostetter
: Subject: Problem comitting on 40GB index : In-Reply-To: 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98a...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message,

Re: Replication problem

2010-01-12 Thread Jason Rutherglen
Hmm...Even with the IP address in the master URL on the slave, the indexversion command to the master mysteriously doesn't show the latest commit... Totally freakin' bizarre! On Tue, Jan 12, 2010 at 10:53 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: There's a connect exception on the

NYC Search in the Cloud meetup: Jan 20

2010-01-12 Thread Otis Gospodnetic
Hello, If Search Engine Integration, Deployment and Scaling in the Cloud sounds interesting to you, and you are going to be in or near New York next Wednesday (Jan 20) evening: http://www.meetup.com/NYC-Search-and-Discovery/calendar/12238220/ Sorry for dupes to those of you subscribed to

Re: Problem comitting on 40GB index

2010-01-12 Thread Erick Erickson
Huh? On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Subject: Problem comitting on 40GB index : In-Reply-To: 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98a...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists

Re: Replication problem

2010-01-12 Thread Jason Rutherglen
It was having multiple replicateAfter values... Perhaps a bug, though I probably won't spend time investigating the why right now, nor reproducing in the test cases. On Tue, Jan 12, 2010 at 11:10 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hmm...Even with the IP address in the master

NullPointerException in ReplicationHandler.postCommit + question about compression

2010-01-12 Thread Stephen Weiss
Hi Solr List, We're trying to set up java-based replication with Solr 1.4 (dist tarball). We are running this to start with on a pair of test servers just to see how things go. There's one major problem we can't seem to get past. When we replicate manually (via the admin page) things

Re: Replication problem

2010-01-12 Thread Yonik Seeley
On Tue, Jan 12, 2010 at 2:17 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: It was having multiple replicateAfter values... Perhaps a bug, though I probably won't spend time investigating the why right now, nor reproducing in the test cases. Do you mean that you changed the config and

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-12 Thread Kelly Taylor
David, Thanks, and yes, I decided to travel that path last night (applying SOLR-236 patch) and plan to have some results by the end of the day; I'll post a summary. I read about field collapsing in your book last night. The book is an excellent resource by the way (shameless commendation

Re: Meaning of this error: Failure to meet condition(s) of required/prohibited clause(s)???

2010-01-12 Thread Chris Hostetter
: Subject: Meaning of this error: Failure to meet condition(s) of : required/prohibited clause(s)??? First of all: it's not an error -- it's a debugging statement generated when you asked for an explanation of a document's score... : 0.0 = (NON-MATCH) Failure to meet condition(s) of

Re: Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?

2010-01-12 Thread Martijn v Groningen
I wouldn't use the patches of the sub-issues right now, as they are still under development (they are currently a POC). I also think that the latest patch in SOLR-236 is currently the best option. There are some memory related problems with the patch that have to do with caching. The

Re: What is the proper way to deploy Solr with a custom schema.xml that requires extra JARs?

2010-01-12 Thread Otis Gospodnetic
I can't put the extra JARs in the Solr home dir's lib subdir, can I? Why, this is indeed what you should do, Kuro. Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Teruhiko Kurosaka k...@basistech.com To: solr-user@lucene.apache.org

Re: updating solr server

2010-01-12 Thread Smith G
Hello, Yeah, to be brief.. I wanted to read documents and update them simultaneously with different threads. The main issue I considered is after how many documents to call add / commit, because I cannot keep adding millions of documents one after another to StreamingUpdateSolrServer by just

Localsolr wt=json and fl compatible?

2010-01-12 Thread Brian Westphal
We've got Localsolr (2.9.1 lucene-spatial library) running on Solr 1.4 with Tomcat 1.6. Everything's looking good, except for a couple little issues. If we specify fl=id (or fl= anything) and wt=json it seems that the fl parameter is ignored (thus we get a lot more detail in our results than

Re: updating solr server

2010-01-12 Thread Yonik Seeley
On Tue, Jan 12, 2010 at 2:53 PM, Smith G gudumba.sm...@gmail.com wrote: 4) queuesize parameter of the Streaming constructor: What could be a rough value when it comes to a real-time application having a million+ documents to be indexed? .. So what exactly is queuesize for? If we

LongField not stripping leading zeros

2010-01-12 Thread Kevin Osborn
This is in Solr 1.3. I have some text in our database in the form 0088698183939. The leading zeros are useless, but I want to be able to search it with no leading zeros or several leading zeros. So, I decided to index this as a long, expecting it to just store it as a number. But, instead, I see

Re: LongField not stripping leading zeros

2010-01-12 Thread Chris Hostetter
: I have some text in our database in the form 0088698183939. The leading : zeros are useless, but I want to able to search it with no leading zeros : or several leading zeros. So, I decided to index this as a long, : expecting it to just store it as a number. But, instead, I see this in : the

Re: LongField not stripping leading zeros

2010-01-12 Thread Kevin Osborn
Thanks. Is there any performance penalty vs. LongField? I don't need to do any range queries on these values. I am basically treating them as numerical strings. I thought it would just be a shortcut to strip leading zeros, which I can easily do on my own.
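Stripping the zeros on the client side, as suggested above, could look like the following minimal sketch. The function name is hypothetical, and it assumes the same normalization is applied both to values at index time and to user input at query time so that the two match.

```python
def normalize_numeric_string(value):
    """Strip leading zeros from a digit string, keeping a single "0" for all-zero input."""
    stripped = value.strip().lstrip("0")
    return stripped or "0"


print(normalize_numeric_string("0088698183939"))
print(normalize_numeric_string("88698183939"))
```

Treating the value as a string rather than converting it to an integer keeps arbitrarily long identifiers intact and avoids any numeric range limits.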

Re: LongField not stripping leading zeros

2010-01-12 Thread Chris Hostetter
: Thanks. Is there any performance penalty vs. LongField? I don't need to The other ones do normalization by converting to a Long internally -- i have no idea if you would see some micro performance benefit in doing the 0 stripping yourself. Sorting a LongField should take less RAM than a

Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-12 Thread Lance Norskog
You can do this stripping in the DataImportHandler. You would have to write your own stripping code using regular expressions. Also, the ExtractingRequestHandler strips out the html markup when you use it to index an html file: http://wiki.apache.org/solr/ExtractingRequestHandler On Mon, Jan

Re: Multi language support

2010-01-12 Thread Lance Norskog
There are a lot of projects that don't use stopwords any more. You might consider dropping them altogether. On Mon, Jan 11, 2010 at 2:25 PM, Don Werve d...@madwombat.com wrote: This is the way I've implemented multilingual search as well. 2010/1/11 Markus Jelsma mar...@buyways.nl Hello,

Re: EOF IOException Query

2010-01-12 Thread Lance Norskog
The index files are corrupted. You have to create the index again from scratch. This should have reported CorruptIndexException. The code handling index files does not catch all exceptions and wrap them as it should. On Mon, Jan 11, 2010 at 3:10 PM, Osborn Chan oc...@shutterfly.com wrote: Hi

Re: What is this error means?

2010-01-12 Thread Ellery Leung
Hi, here is the stack trace: Fatal error: Uncaught exception 'Exception' with message '"0" Status: Communication Error' in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php:385 Stack trace: #0 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(652): Apache_Solr_Ser

Re: Multi language support

2010-01-12 Thread Robert Muir
I don't think this is something to consider across the board for all languages. The same grammatical units that are part of a word in one language (and removed by stemmers) are independent morphemes in others (and should be stopwords) so please take this advice on a case-by-case basis for each

Re: Multi language support

2010-01-12 Thread Robert Muir
sorry, i forgot to include this 2009 paper comparing what stopwords do across 3 languages: http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf in my opinion, if stopwords annoy your users for very special cases like

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-12 Thread Lance Norskog
Field Collapsing is what you want - this is a classic problem with retail store product indexing and everyone uses field collapsing. (That is, everyone who is willing to apply the patch on their own code.) Dedupe is completely the wrong word. Deduping is something else entirely - it is about

question about date boosting

2010-01-12 Thread Daniel Higginbotham
Hello, I'm trying to boost results based on date using the first example here: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents However, I'm getting an error that reads, "Can't use ms() function on non-numeric legacy date field". The date field uses

Re: What is this error means?

2010-01-12 Thread Israel Ekpo
Ellery, A preliminary look at the source code indicates that the error is happening because the solr server is taking longer than expected to respond to the client http://code.google.com/p/solr-php-client/source/browse/trunk/Apache/Solr/Service.php The default time out handed down to

Need help Migrating to Solr

2010-01-12 Thread Abin Mathew
Hi, I am new to the Solr technology. We have been using Lucene for handling searching in our web application www.toostep.com, which is a knowledge sharing platform developed in Java using Spring MVC architecture and iBatis as the persistence framework. Now that the application is getting very

Re: question about date boosting

2010-01-12 Thread Joe Calderon
I think you need to use the new TrieDateField. On 01/12/2010 07:06 PM, Daniel Higginbotham wrote: Hello, I'm trying to boost results based on date using the first example here: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents However, I'm getting an
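To make the date boost concrete: Solr's `recip(x,m,a,b)` function evaluates to `a / (m*x + b)`, and `ms(NOW,field)` gives the document's age in milliseconds. The sketch below assumes the FAQ example in question is `recip(ms(NOW,mydatefield),3.16e-11,1,1)`; that exact formula is an assumption, since it is not quoted in the thread. The code only reproduces the arithmetic and does not talk to Solr.

```python
# Milliseconds in an average year (365.25 days).
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Solr-style reciprocal function: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Boost for a document of the given age, using the assumed FAQ constants."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2, 10):
    print(years, round(recency_boost(years * MS_PER_YEAR), 3))
```

With these constants, `m` is roughly the reciprocal of one year in milliseconds, so a brand-new document gets a multiplier of 1 and a one-year-old document roughly half that.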

Re: Multi language support

2010-01-12 Thread Walter Underwood
There is a band named The The. And a producer named Don Was. For a list of all-stopword movie titles at Netflix, see this post: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html My favorite is To Be and To Have (Être et Avoir), which is all stopwords in two languages.

Re: DataImportHandler - synchronous execution

2010-01-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
it can be added On Tue, Jan 12, 2010 at 10:18 PM, Alexey Serba ase...@gmail.com wrote: Hi, I found that there's no explicit option to run DataImportHandler in a synchronous mode. I need that option to run DIH from SolrJ ( EmbeddedSolrServer ) in the same thread. Currently I pass dummy stream