Re: Improving Solr performance

2011-01-10 Thread mike anderson
Not sure if this was mentioned yet, but if you are doing slave/master replication you'll need 2x the RAM at replication time. Just something to keep in mind. -mike On Mon, Jan 10, 2011 at 5:01 PM, Toke Eskildsen wrote: > On Mon, 2011-01-10 at 21:43 +0100, Paul wrote: > > > I see from your other

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread mike anderson
[x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) On

Re: Multicore boosting to only 1 core

2011-02-15 Thread mike anderson
Could you make an additional date field, call it date_boost, that gets populated in all of the cores EXCEPT the one with the newest articles, and then boost on this field? Then when you move articles from the 'newest' core to the rest of the cores you copy over the date to the date_boost field. (I
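A rough sketch of the date_boost idea, not anything taken from the thread: when an article leaves the 'newest' core, its date is copied into date_boost before it is indexed into one of the other cores. The source field name `date`, the helper name, and the use of plain dicts for documents are all assumptions made for illustration.

```python
def prepare_for_older_core(doc, date_field="date", boost_field="date_boost"):
    """Return a copy of doc with its date copied into the boost field.

    Hypothetical helper: documents in the 'newest' core never get a
    date_boost value, so a boost on that field only touches the other cores.
    """
    moved = dict(doc)  # shallow copy, the original document is left alone
    if date_field in moved:
        moved[boost_field] = moved[date_field]
    return moved


# Example document as it might sit in the 'newest' core (made-up values).
article = {"id": "a1", "date": "2011-02-15T00:00:00Z"}
moved_article = prepare_for_older_core(article)
```

The copy is deliberate: the original document stays free of date_boost for as long as it lives in the 'newest' core.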

spellcheck component in 1.4 distributed

2009-08-07 Thread mike anderson
I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and for 1.5. Any help would be much appreciated. Thanks in advance, Mike

Re: How to use key with facet.prefix?

2009-08-08 Thread mike anderson
Hi all, I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be for 1.5. Any help would be much appreciated. Thanks in advance, Mike (sorry if this sent twice)

Re: How to use key with facet.prefix?

2009-08-08 Thread mike anderson
whoops, sorry guys On Sat, Aug 8, 2009 at 12:37 PM, mike anderson wrote: > Hi all, > > I am e-mailing to inquire about the status of the spellchecking component > in 1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be > for 1.5. Any help would be much apprec

spellcheck component in 1.4 distributed

2009-08-08 Thread mike anderson
Hi all, I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be for 1.5. Any help would be much appreciated. Thanks in advance, Mike

ruby client and building spell check dictionary

2009-08-14 Thread Mike Anderson
I set up the spell check component with this code in the config file: > textSpell titleCheck solr.IndexBasedSpellChecker dictionary 0.7 which works great. I can build the dictionary from my browser "q="foo"&spellcheck.build=true&spellcheck.name=titleCheck" and I can also r

MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
I'm trying to get MLT working in 1.4 distributed mode. I was hoping the patch SOLR-788 would do the trick, but after applying the patch by hand to revision 737810 (it kept choking on component/MoreLikeThisComponent.java) I still get nothing. The URL I am using is this: http://localhost:8983/solr

Re: MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Aug 18, 2009 12:12:08 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/so

Re: MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
: Any advice to this end? My query is still the same: http://localhost:8983/solr/select?q=graph%20theory&mlt=true&mlt.fl=abstract&mlt.mindf=1&mlt.mintf=1&shards=localhost:8983/solr thanks in advance, Mike On Tue, Aug 18, 2009 at 12:18 PM, mike anderson wrote: > There d
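For anyone who wants to script the same request, here is a minimal sketch that rebuilds the query URL quoted above from its parameters. The parameter names and values are the ones in the message; building it with Python's urllib is just one convenient option, and the variable names are made up.

```python
from urllib.parse import quote, urlencode

# Parameters exactly as they appear in the query from the message.
params = [
    ("q", "graph theory"),
    ("mlt", "true"),
    ("mlt.fl", "abstract"),
    ("mlt.mindf", "1"),
    ("mlt.mintf", "1"),
    ("shards", "localhost:8983/solr"),
]

# quote_via=quote encodes the space as %20; safe=":/" keeps the shard
# address readable instead of percent-encoding ':' and '/'.
query_url = "http://localhost:8983/solr/select?" + urlencode(
    params, quote_via=quote, safe=":/"
)
```

Dropping the `shards` entry from the list gives the non-distributed form of the same query, which is handy for comparing the two responses.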

stopfilterFactory isn't removing field name

2009-09-13 Thread mike anderson
I'm kind of stumped by this one.. is it something obvious? I'm running the latest trunk. In some cases the stopFilterFactory isn't removing the field name. Thanks in advance, -mike From debugQuery (both words are in the stopwords file): http://localhost:8983/solr/select?q=citations:for&debugQu

Re: stopfilterFactory isn't removing field name

2009-09-13 Thread mike anderson
ike to know what the problem is. -mike On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley wrote: > That's pretty strange... perhaps something to do with your synonyms > file mapping "for" to a zero length token? > > -Yonik > http://www.lucidimagination.com > >

Re: stopfilterFactory isn't removing field name

2009-09-15 Thread mike anderson
Could this be related to SOLR-1423? On Mon, Sep 14, 2009 at 8:51 AM, Yonik Seeley wrote: > Thanks, I'll see if I can reproduce... > > -Yonik > http://www.lucidimagination.com > > On Mon, Sep 14, 2009 at 2:10 AM, mike anderson > wrote: > > Yeah.. that was weird

benchmarking tools

2009-10-27 Thread Mike Anderson
configurations. If anybody has some insight into this kind of project I'd love to get some feedback. Thanks in advance, Mike Anderson

Re: benchmarking tools

2009-10-28 Thread mike anderson
would also look at java.net's Faban benchmarking > framework. We use it extensively for our acceptance tests and tuning > exercises. > > Joshua > > On Oct 27, 2009, at 1:59 PM, Mike Anderson wrote: > > > I've been making modifications here and there to the

field queries seem slow

2009-11-02 Thread mike anderson
I took a look through my Solr logs this weekend and noticed that the longest queries were on particular fields, like "author:albert einstein". Is this a result consistent with other setups out there? If not, is there a trick to make these go faster? I've read up on filter queries and use those when

Re: apply a patch on solr

2009-11-02 Thread mike anderson
You can see what revision the patch was written for at the top of the patch, it will look like this: Index: org/apache/solr/handler/MoreLikeThisHandler.java === --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437) ++
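If there are a lot of patches to check, the revision can be pulled out of the header with a few lines of script. This is only a sketch: it assumes the `--- path (revision N)` header shape shown above, the function name is made up, and the sample text is limited to the header lines quoted in the message.

```python
import re

# Matches header lines such as:
# --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
REVISION_RE = re.compile(
    r"^--- (?P<path>\S+)\s+\(revision (?P<rev>\d+)\)", re.MULTILINE
)


def patch_revisions(patch_text):
    """Map each file path in a patch to the revision it was written against."""
    return {
        match.group("path"): int(match.group("rev"))
        for match in REVISION_RE.finditer(patch_text)
    }


sample = """Index: org/apache/solr/handler/MoreLikeThisHandler.java
===
--- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
"""
revisions = patch_revisions(sample)
```

Checking out that revision before applying the patch avoids most of the hunk failures you get from applying it to a newer tree.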

Re: field queries seem slow

2009-11-04 Thread mike anderson
rick Erickson > wrote: > > H, are you sorting? And has your readers been reopened? Is the > > second query of that sort also slow? If the answer to this last question > is > > "no", > > have you tried some autowarming queries? > > > > Best > >

atypical MLT use-case

2009-12-09 Thread Mike Anderson
This is somewhat of an odd use-case for MLT. Basically I'm using it for near-duplicate detection (I'm not using the built in dup detection for a variety of reasons). While this might sound like an okay idea, the problem lies in the order in which things happen. Ideally, duplicate detection would pr

content stream/MLT

2009-12-09 Thread Mike Anderson
I'm trying to understand how content stream works with respect to MLT. I did a regular MLT query using a document ID and specifying two fields to do MLT on and got back a set of results. I then copied the xml for the document with the aforementioned ID and pasted it to a text file. Then I made the

Re: SolrCloud in production?

2010-08-01 Thread mike anderson
I'd second the request for more information on the current state of SolrCloud. I have a 16 shard Solr setup in production running 1.3, and a lot of the features of SolrCloud would make my life a lot easier. Cheers, Mike On Sat, Jul 24, 2010 at 12:52 PM, Dennis Gearon wrote: > Boy, if it does wha

Re: How to retrieve the full corpus

2010-09-06 Thread mike anderson
You might check out Luke, the Lucene Index Toolbox. http://www.getopt.org/luke/ I know you can browse the index and get frequency counts, though I'm not sure if you can export the entire index as a list like what you're looking for. Hope this helps, Mike On Mon, Sep 6, 2010 at 10:52 AM, Roland

upgrade index from 2.9 to 3.x

2010-09-24 Thread mike anderson
What is the right way to upgrade a solr index from Lucene 2.9.1 to 3.x? I'm getting the exception: SEVERE: java.lang.RuntimeException: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported in file '_aw5w.fdx': 1 (needs to be between 2 and 2). This version of Lucene on

Re: upgrade index from 2.9 to 3.x

2010-09-24 Thread mike anderson
p 24, 2010 at 10:33 AM, Markus Jelsma wrote: > There is a recent thread on this one > http://www.mail-archive.com/solr-user@lucene.apache.org/msg40491.html > > > On Friday 24 September 2010 16:30:36 mike anderson wrote: > > What is the right way to upgrade a solr index

phrase query with autosuggest (SOLR-1316)

2010-10-06 Thread mike anderson
It seemed like SOLR-1316 was a little too long to continue the conversation. Is there support for quotes indicating a phrase query? For example, my autosuggest query for "mike sha" ought to return "mike shaffer", "mike sharp", etc. Instead I get suggestions for "mike" and for "sha", resulting in a

how well does multicore scale?

2010-10-21 Thread mike anderson
I'm exploring the possibility of using cores as a solution to "bookmark folders" in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from

Re: how well does multicore scale?

2010-10-22 Thread mike anderson
en fetching data. > > Many times this is probably the case - pro's and con's to each depending > on what you are up to. > > - Mark > lucidimagination.com > > > > > On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind > wrote: > >> No, it does n

Re: how well does multicore scale?

2010-10-26 Thread mike anderson
e Norskog wrote: > http://wiki.apache.org/solr/CoreAdmin > > Since Solr 1.3 > > On Fri, Oct 22, 2010 at 1:40 PM, mike anderson > wrote: > > Thanks for the advice, everyone. I'll take a look at the API mentioned > and > > do some benchmarking over the weekend. > >

Re: how well does multicore scale?

2010-10-27 Thread mike anderson
wer. -mike On Tue, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind wrote: > mike anderson wrote: > >> I'm really curious if there is a clever solution to the obvious problem >> with: "So your better off using a single index and with a user id and use >> a query filter

Re: how well does multicore scale?

2010-10-27 Thread mike anderson
Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote: > > [...] By my simple math, this would mean that if we want each shard's > > index to be able to fit in memory, [...] > > Might I ask why you're planning on using memory-based sharding? The > performance gap betwee

Re: Improving Solr performance

2011-01-07 Thread mike anderson
Making sure the index can fit in memory (you don't have to allocate that much to Solr, just make sure it's available to the OS so it can cache it -- otherwise you are paging the hard drive, which is why you are probably IO bound) has been the key to our performance. We recently opted to use less RA
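As a back-of-the-envelope check of the point above, the arithmetic is simply: RAM left for the OS page cache is total RAM minus the JVM heap minus whatever else runs on the box, and the index should fit in what remains. The sketch below encodes only that subtraction; the function name and every number in it are invented for illustration, not measurements from our setup.

```python
GIB = 1024 ** 3


def index_fits_in_page_cache(index_bytes, total_ram_bytes, jvm_heap_bytes,
                             other_bytes=0):
    """Return True if the index fits in the RAM left over for the OS cache.

    Hypothetical helper: it ignores everything except the simple subtraction
    described in the text, so treat the answer as a first estimate.
    """
    available = total_ram_bytes - jvm_heap_bytes - other_bytes
    return index_bytes <= available


# Made-up machine: 32 GiB of RAM, 8 GiB given to the Solr JVM, 20 GiB index.
fits = index_fits_in_page_cache(20 * GIB, 32 * GIB, 8 * GIB)
# Same machine with another 6 GiB taken by other processes.
tight = index_fits_in_page_cache(20 * GIB, 32 * GIB, 8 * GIB,
                                 other_bytes=6 * GIB)
```

In the first case 24 GiB remain for the cache, so the 20 GiB index fits; in the second only 18 GiB remain, so it does not.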

MLT calculation

2009-12-16 Thread Mike Anderson
How exactly is MLT calculated? I'm trying to gain an intuition for it by tweaking the parameters MLT.qf, MLT.mintf, and MLT.mindf (mostly the former, changing boosts), but so far it's a bit counter intuitive. How does MLT.boost play in? If anybody could point me to a technical description (equatio

Re: Lock problems: Lock obtain timed out

2010-01-25 Thread mike anderson
I am getting this exception as well, but disk space is not my problem. What else can I do to debug this? The solr log doesn't appear to lend any other clues.. Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990 Jan 25, 20

Re: solr application for website crawling and indexing html, pdf, word, ... files

2010-01-25 Thread mike anderson
I think you might be looking for Apache Tika. On Mon, Jan 25, 2010 at 3:55 PM, Frank van Lingen wrote: > I recently started working with solr and find it easy to setup and tinker > with. > > I now want to scale up my setup and was wondering if there is an > application/component that can do the

Re: Best OCR API for solr

2010-02-04 Thread mike anderson
There might be an OCR plugin for Apache Tika (which does exactly this out of the box except for OCR capability, I believe). http://lucene.apache.org/tika/ -mike 2010/2/4 Kranti™ K K Parisa > Hi, > > Can anyone list the best OCR APIs available to use in combination with > SOLR. > > The idea is

solr-ruby with clustering

2010-03-22 Thread mike anderson
Has anybody got solr-ruby to return a clustering result? (using the clustering component) I'm almost certain the query is correct (I check the solr logs for the query and run it in my browser, get back the cluster output as expected). But when I dump the response from my solr-ruby query the cluste

Re: solr-ruby with clustering

2010-03-22 Thread mike anderson
false alarm, on the client side I was specifically setting a shard, and this was causing my query/solr-ruby/solr to think it was a distributed request, which isn't supported by the clustering component. cheers, mike On Mon, Mar 22, 2010 at 8:53 PM, mike anderson wrote: > Has anybody

proximity question

2010-07-06 Thread mike anderson
does anybody know how to accomplish this? Thanks, Mike Anderson

Re: Index Solr Logs

2011-06-26 Thread mike anderson
Check out Logg.ly. http://www.loggly.com/. They use SOLR to index all kinds of logs, SOLR included. This is a paid service, so maybe not what you're looking for. I've used it though, works great. -Mike On Sun, Jun 26, 2011 at 5:49 AM, Mr Havercamp wrote: > I'm interested to know if there is a w