Re: Strategy for handling large (and growing) index: horizontal partitioning?

2008-03-03 Thread Kevin Lewandowski
How many documents are in the index? If you haven't already done this I'd take a really close look at your schema and make sure you're only storing the things that should really be stored, same with the indexed fields. I drastically reduced my index size just by changing some indexed/stored option

Re: acts_as_solr

2006-08-30 Thread Kevin Lewandowski
You might want to look at acts_as_searchable for Ruby: http://rubyforge.org/projects/ar-searchable That's a similar plugin for the Hyperestraier search engine using its REST interface. On 8/28/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: I've spent a few hours tinkering with a Ruby ActiveRecord

Solr now used on Discogs.com

2006-09-06 Thread Kevin Lewandowski
I just wanted to say thanks to the Solr developers. I'm now using Solr for the main search engine on Discogs.com. I've been through five revisions of the search engine and this was definitely the least painful. Solr gives me the power of Lucene without having to deal with the guts. It made for a

Re: Solr now used on Discogs.com

2006-09-06 Thread Kevin Lewandowski
"Main search engine" would be the search feature, but not browsing/category listing? That's correct, just the search function, though I'm looking into using Solr for other types of browsing. Are you using Solr for all data storage and search? Or a RDBMS? If so, what is the split? All data

Re: Solr now used on Discogs.com

2006-09-06 Thread Kevin Lewandowski
if i may ask: did you customize the Solr code at all (ie: are you using any custom request handlers, field types or your own Similarity class) ? ... if not, which request handler are you using (Standard or DisMax) ? I'm using the Solr from the nightly build, with Standard request handler, and ha

Re: Solr now used on Discogs.com

2006-09-06 Thread Kevin Lewandowski
this was all just config file changes though right, you didn't need to write any new javacode to load into solr to make those work did you? That's right. It was all config changes and no new java code, which is a plus since I've never coded in java :) Kevin

Re: Simple Faceted Searching out of the box

2006-09-12 Thread Kevin Lewandowski
Is it possible that the facets can be based on the contents of an entire field instead of the terms? For example say I have a document with this field: Hip Hop A facet query on the genre field returns a count of 1 for each term (Hip: 1, Hop: 1) but I'd like it to return a single count for the whole value (Hip Hop: 1). thanks, Kevin
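A small sketch of the difference being asked about, written in plain Python rather than against Solr itself: counting facet values per term versus per whole field value. The whitespace split and the sample data are assumptions made for illustration only; in Solr the second behaviour is usually obtained by faceting on an untokenized (string) field.

```python
from collections import Counter


def facet_by_term(values):
    """Count documents per whitespace-separated term, as a tokenized field would."""
    counts = Counter()
    for value in values:
        # A set ensures a term repeated inside one value counts once per document.
        counts.update(set(value.split()))
    return counts


def facet_by_value(values):
    """Count documents per complete field value, as an untokenized field would."""
    return Counter(values)


# Hypothetical sample: one document whose genre field is "Hip Hop".
genres = ["Hip Hop"]
term_counts = facet_by_term(genres)
value_counts = facet_by_value(genres)
print(dict(term_counts))
print(dict(value_counts))
```

The first counter has one entry per term, the second a single entry for the full genre name.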

Can't get q.op working

2006-09-26 Thread Kevin Lewandowski
I'm running the latest nightly build (2006-09-27) and cannot seem to get the q.op parameter working. I have the default operator set to AND and am testing with a two word query that returns no results. If I add "OR" to the query I get results. But if I remove the OR and add "q.op=OR" to the Solr q
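For reference, a minimal sketch of how the request described above might be assembled, with q.op passed as a separate URL parameter instead of an OR inside the query string. The host, port, and path are assumptions, as is the sample query; only the q and q.op parameter names come from the message.

```python
from urllib.parse import urlencode


def select_url(base, query, default_operator=None):
    """Build a select URL, optionally overriding the default operator via q.op."""
    params = [("q", query)]
    if default_operator is not None:
        params.append(("q.op", default_operator))
    return base + "?" + urlencode(params)


# Assumed endpoint; substitute the host, port, and path of your own install.
BASE = "http://localhost:8983/solr/select"
url = select_url(BASE, "hip hop", "OR")
print(url)
```

Comparing the results of this URL with the same URL minus q.op is a quick way to see whether the parameter is being honoured.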

Re: Can't get q.op working

2006-09-27 Thread Kevin Lewandowski
with AND and three results with OR. I recommend you try this same scenario out with the tutorial example data and ensure things work as I've stated here. Let us know more details if the problem persists. Erik On Sep 26, 2006, at 11:02 PM, Kevin Lewandowski wrote: > I'm

How much ram can Solr use?

2006-09-27 Thread Kevin Lewandowski
On the performance wiki page it mentions a test box with 16GB ram. Did anything special need to be done to use that much ram (with the OS or java)? Would Solr on a system with Linux x86_64 and Tomcat be able to use that much ram? (sorry, I don't know Java so I don't know if there are any limitation

Re: Solr use case

2006-10-11 Thread Kevin Lewandowski
No, after you add new documents you simply issue a command and the new docs are searchable. On Discogs.com we have just over 1 million docs in the index and do about 20,000 updates per day. Every 15 minutes we read a queue and add new documents, then commit. And we optimize once per day. I've ha
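The add-then-commit cycle described above can be sketched roughly as follows. This only builds the XML update messages; it does not send them. The field names and the queued document are hypothetical, and posting the messages to the update handler of your install is left out on purpose.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a batch of documents into one Solr XML <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


# Sent after a batch of adds so the new documents become searchable.
COMMIT_MESSAGE = "<commit/>"
# Sent far less often (once per day in the message above).
OPTIMIZE_MESSAGE = "<optimize/>"

# Hypothetical queue contents read during one 15-minute cycle.
queue = [{"id": "1", "title": "Example & Co"}]
payload = build_add_message(queue)
print(payload)
```

Batching all queued documents into one add message followed by a single commit keeps the number of commits low, which matches the schedule described in the message.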

Re: Couple of problems

2006-10-11 Thread Kevin Lewandowski
I've had a problem similar to this and it was because of the schema.xml. It was valid XML but there were some incorrect field definitions and/or the default field listed was not a defined field. I'd suggest you start with the default schema and build on it piece by piece, each time testing for th

QTime field in response XML

2006-10-11 Thread Kevin Lewandowski
I've searched the docs but could not find an answer. Is this field microseconds or milliseconds? thanks, Kevin

Can't add new data or optimize

2006-10-30 Thread Kevin Lewandowski
I'm no longer able to add new data or optimize my index. There are currently 1600 files in the index directory and it's about 1.1gb. I've tried changing solrconfig.xml to use the compound file format and that didn't make a difference. My ulimit is "unlimited" but I've tried setting it at 100 a

Re: Spellchecker in Solr?

2006-10-30 Thread Kevin Lewandowski
I have not done one but have been planning to do it based on this article: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html With Solr it would be much simpler than the java examples they give. On 10/30/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Hello everyone, Has anybody succ

Re: Can't add new data or optimize

2006-10-30 Thread Kevin Lewandowski
Thanks for the help! The problem was I was not using "ulimit -n". It's back to normal now. thanks, Kevin On 10/30/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 10/30/06, Kevin Lewandowski <[EMAIL PROTECTED]> wrote: > I'm no longer able to add new d
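A rough way to check for this situation from Python on a Unix system: compare the per-process open-file limit (the value "ulimit -n" reports) with the number of files in the index directory. The directory path is an assumption, and the comparison is only a heuristic, since a process also opens sockets, logs, and jars.

```python
import os
import resource


def open_file_headroom(index_dir):
    """Return (soft_limit, file_count) for the current process and index_dir.

    soft_limit is None when the limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        soft = None
    file_count = sum(1 for entry in os.scandir(index_dir) if entry.is_file())
    return soft, file_count


# Assumed location; point this at your own index directory.
soft_limit, file_count = open_file_headroom(".")
if soft_limit is not None and file_count >= soft_limit:
    print("index has more files than the open-file limit allows")
else:
    print("open-file limit:", soft_limit, "index files:", file_count)
```

Note that the plain "ulimit" command reports a different limit (file size), which is why it can read "unlimited" while the open-file limit is still low.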

Re: Spellchecker in Solr?

2006-10-30 Thread Kevin Lewandowski
I had the very same article in mind - how would it be simpler in Solr than in Lucene? A spellchecker is pretty much standard in every major I meant it would be a simpler implementation in Solr because you don't have to deal with java or any Lucene API's. You just create a document for each "corr

Re: Solr Benchmarks

2006-11-06 Thread Kevin Lewandowski
I've been using Solr for keyword search on Discogs.com for a few months with great results. As of today Solr is running under Tomcat on a single dedicated box. It's a 2.66GHz P4, with 1 gig ram. The index has about 1.2 million documents and is 1.2 gigs in size. This machine handles 250,000 querie

Minimum time between distributions

2006-11-21 Thread Kevin Lewandowski
On Discogs I'm running Solr with two slaves and one master, using the distribution scripts. The slaves pull and install a new snapshot every five minutes and this is working very well so far. Are there any risks with reducing this window to every one or two minutes? With large caches could the au

Re: Cache stats

2006-11-29 Thread Kevin Lewandowski
In the admin interface, if you click statistics, there's a cache section. On 11/29/06, Tom <[EMAIL PROTECTED]> wrote: Hi - I'm starting to try to tune my installation a bit, and I'm looking for cache statistics. Is there a way to peek into a running installation, and see what my cache stats are

solr/tomcat stops responding

2006-11-30 Thread Kevin Lewandowski
My solr installation has been running fine for a few weeks but now after a server reboot it starts and runs for a few seconds, then stops responding. I don't see any errors in the logfiles, apart from snapinstaller not being able to issue a commit. Also, the process is using 100% cpu and stops res

Re: solr/tomcat stops responding

2006-12-01 Thread Kevin Lewandowski
> My solr installation has been running fine for a few weeks but now > after a server reboot it starts and runs for a few seconds, then stops > responding. I don't see any errors in the logfiles, apart from > snapinstaller not being able to issue a commit. Also, the process is > using 100% cpu and

Re: solr/tomcat stops responding

2006-12-02 Thread Kevin Lewandowski
accept connections for 3 or 4 hours ... did you try taking some thread dumps like yonik suggested to see what all the threads were doing? A kill -3 will not kill the process. It does nothing and there's no thread dump on the console. kill -9 does kill it though. btw, this has been a bigger prob

Re: solr/tomcat stops responding

2006-12-03 Thread Kevin Lewandowski
Hmmm, on most Linux/UNIX systems, sending the QUIT signal does nothing else but generate a stack trace to the console or a log file. If you don't start tomcat by hand, the stack trace may go somewhere else I suppose. This would be useful to learn how to do on your particular system (and we shoul

Re: solr/tomcat stops responding

2006-12-04 Thread Kevin Lewandowski
OK, this may fix it: https://issues.apache.org/jira/browse/SOLR-77 A war with this patch included is here: http://people.apache.org/~yonik/solr/current/solr.war You also need to configure some queries to be done on the firstSearcher event in solrconfig.xml. Uncomment and customize the example o

Re: Happy Solstice

2006-12-22 Thread Kevin Lewandowski
Yes! There's no shortage of puns when using solr. We're always talking about "creating a solr system" or "one of the solr systems is down" :) On 12/21/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: It's all about "sol(a)r", ya know? More day light, please!

Re: replication

2007-01-23 Thread Kevin Lewandowski
This should explain most everything: http://wiki.apache.org/solr/CollectionDistribution I've been running solr replication on discogs.com for a few months and it works great! Kevin On 1/23/07, S Edirisinghe <[EMAIL PROTECTED]> wrote: Hi, I just started looking into solr. I like the features t

Re: Incremental replication...

2007-02-14 Thread Kevin Lewandowski
snapshooter copies all files but most files in the snapshot directories are hard links pointing to segments in the main index directory. So only new segments end up getting copied. We've been running replication on discogs.com for several months and it works great. On 2/13/07, escher2k <[EMAIL P
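To make the hard-link point concrete, here is a small Python illustration of the idea. It is not the snapshooter script itself, and the function names are made up for this sketch. It assumes a flat directory of segment files and that the snapshot lives on the same filesystem as the index, which hard links require.

```python
import os


def snapshot_with_hard_links(index_dir, snapshot_dir):
    """Create snapshot_dir containing hard links to every file in index_dir."""
    os.makedirs(snapshot_dir)
    for name in sorted(os.listdir(index_dir)):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            # A hard link adds a second directory entry; no file data is copied.
            os.link(src, os.path.join(snapshot_dir, name))


def shares_storage(path_a, path_b):
    """True when both paths refer to the same file on disk (same device and inode)."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Because each snapshot entry is the same file as the corresponding segment in the index, a tool that transfers only files missing on the destination ends up sending just the new segments.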

Re: Thank you...

2007-03-22 Thread Kevin Lewandowski
Thanks for sharing the info, Cass. Is eBay still using Texis? (this used to be obvious from eBay's URLs a few years ago). I used Texis with their Vortex script before Lucene was born. I'd guess no. I read a PDF about ebay's architecture a few months ago and it said all of the search stuff wa

Re: Facet Browsing

2007-04-19 Thread Kevin Lewandowski
I recommend you build your query with facet options in raw format and make sure you're getting back the data you want. Then build it into your app. On 4/18/07, Jennifer Seaman <[EMAIL PROTECTED]> wrote: Does anyone have any sample code (php, perl, etc) how to setup facet browsing with paging? I

Re: Snapshooting or replicating recently indexed data

2007-04-21 Thread Kevin Lewandowski
snapshooter does create incremental builds of the index. It doesn't appear so if you look at the contents because the existing files are hard links. But it is incremental. On 4/20/07, Doss <[EMAIL PROTECTED]> wrote: Hi Yonik, Thanks for your quick response, my question is this, can we take incr

snappuller copying to wrong directory?

2007-07-12 Thread Kevin Lewandowski
I've been running solr replication for several months with no issues but recently had an instance where snappuller was running for about 1.5 hours. rsync was still active, so it was still copying data. I also noticed that there was a snapshot.200707 directory inside of the main index directory

Re: snappuller copying to wrong directory?

2007-07-13 Thread Kevin Lewandowski
data_dir set up correctly in conf/scripts.conf? That's where snappuller puts the snapshots. Bill On 7/12/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote: > > I've been running solr replication for several months with no issues > but recently had an instance where snappuller was

index size

2007-08-17 Thread Kevin Lewandowski
Are there any tips on reducing the index size or what factors most impact index size? My index has 2.7 million documents and is 200 gigabytes and growing. Most documents are around 2-3kb and there are about 30 indexed fields. thanks, Kevin
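A quick back-of-the-envelope check of the figures above. The only assumptions are that "gigabytes" means 10^9 bytes and that a typical document is 3 KB (the upper end of the stated range).

```python
# Figures quoted in the message.
index_bytes = 200 * 10**9      # 200 gigabytes, taken as decimal GB (assumption)
doc_count = 2_700_000          # 2.7 million documents

# Assumed typical raw document size: upper end of the stated 2-3 KB range.
raw_doc_bytes = 3_000

bytes_per_doc = index_bytes / doc_count
expansion = bytes_per_doc / raw_doc_bytes

print(f"index bytes per document: {bytes_per_doc:,.0f}")
print(f"index size relative to raw document size: {expansion:.1f}x")
```

That works out to roughly 74 KB of index per document, about 25 times the raw document size, which suggests the schema (what is stored and how each of the 30 fields is indexed) is the first place to look.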

strange sorting problem

2007-10-05 Thread Kevin Lewandowski
I'm having a problem with sorting on a certain field. In my schema.xml it's defined as a string (not analyzed, indexed/stored verbatim). But when I look at my results (sorted on that field ascending) I get things like the following: Yr City's A Sucker Movement b/w Yr City's A Sucker X, Y & Sometim

Re: strange sorting problem

2007-10-05 Thread Kevin Lewandowski
on and > score if the XML response is really big) > > > : Date: Fri, 5 Oct 2007 11:21:48 -0700 > : From: Kevin Lewandowski <[EMAIL PROTECTED]> > : Reply-To: solr-user@lucene.apache.org > : To: solr-user@lucene.apache.org > : Subject: strange sorting problem > :

Re: index size

2007-10-09 Thread Kevin Lewandowski
ize now. Kevin On 8/20/07, Mike Klaas <[EMAIL PROTECTED]> wrote: > > On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote: > > > Are there any tips on reducing the index size or what factors most > > impact index size? > > > > My index has 2.7 million documents an

Re: index size

2007-10-11 Thread Kevin Lewandowski
index created by nutch so small in comparison (about 27 mb > approx) but it still returns snippets! Are you storing the complete html? If so I think you should strip out the html then index the document. > > On 10/9/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote: > >

solr not finding all results

2007-10-12 Thread Kevin Lewandowski
I've found an odd situation where solr is not returning all of the documents that I think it should. A search for "Geckoplp4-M" returns 3 documents but I know that there are at least 100 documents with that string. Here is an example query for that phrase and the result set: http://localhost:9020/

Re: solr not finding all results

2007-10-12 Thread Kevin Lewandowski
Sorry, I've figured out my own problem. There is a problem with the way I create the xml document for indexing that was causing some of the "comments" fields to not be listed correctly in the default search field, "content". On 10/12/07, Kevin Lewandowski <[EMAIL PRO