deadlock in solrj?

2010-09-29 Thread Michal Stefanczak
Hello! I'm using SolrJ 1.4.0 with Java 1.6. On two occasions when indexing ~18000 documents we got the following problem (trace from jconsole): Name: pool-1-thread-1 State: WAITING on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@11 e464a Total blocked: 25

Re: Best way to check Solr index for completeness

2010-09-29 Thread Dennis Gearon
How soon do you need to know? Couldn't you just regenerate the index using some kind of 'nice' factor to not use too much processor/disk/etc.?

Re: Best way to check Solr index for completeness

2010-09-29 Thread Peter Karich
How long does it take to get 1000 docs? Why not ensure this while indexing? I think besides your suggestion or the suggestion of Luke there is no other way... Regards, Peter. Hello, What would be the best way to check Solr index against original system (Database) to make sure index is up to

Re: deadlock in solrj?

2010-09-29 Thread Avi Rosenschein
This sounds like https://issues.apache.org/jira/browse/SOLR-1711. It is a known issue in Solr 1.4.0, which is apparently fixed in Solr 1.4.1. We also encountered it when indexing large numbers of documents with SolrJ, and are therefore in the process of upgrading to 1.4.1. -- Avi On Wed, Sep 29,

Missing facet values for zero counts

2010-09-29 Thread Allistair Crossley
Hello list, I am implementing a directory using Solr. The user is able to search with a free-text query or 2 filters (provided as pick-lists) for country. A directory entry only has one country. I am using Solr facets for country and I use the facet counts generated initially by a *:* search

Re: Missing facet values for zero counts

2010-09-29 Thread Chantal Ackermann
Hi Allistair, On Wed, 2010-09-29 at 15:37 +0200, Allistair Crossley wrote: Hello list, I am implementing a directory using Solr. The user is able to search with a free-text query or 2 filters (provided as pick-lists) for country. A directory entry only has one country. I am using Solr

Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy
Using TermsComponent is an interesting suggestion. However, my understanding is that it will work only for unique terms. For example, compare the database primary key with the Solr id field. A variation of that is to calculate some kind of unique record hash and store it in the index. Then retrieve id and hash via
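A minimal sketch of the id-plus-hash comparison described above, not the poster's actual code. It assumes the database rows and the (id, hash) pairs retrieved from the index have already been loaded into dictionaries; the names record_hash and compare, and the choice of SHA-1, are illustrative only.

```python
import hashlib


def record_hash(fields):
    """Return a stable hash of a record's field values, ordered by field name."""
    canonical = "\x1f".join(f"{name}={fields[name]}" for name in sorted(fields))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def compare(db_rows, index_rows):
    """Compare source records with what the index holds.

    db_rows:    {id: {field: value}} read from the source system (assumed shape)
    index_rows: {id: stored_hash} retrieved from the index (assumed shape)
    Returns three sorted id lists: (missing, stale, orphaned).
    """
    missing = sorted(i for i in db_rows if i not in index_rows)
    orphaned = sorted(i for i in index_rows if i not in db_rows)
    stale = sorted(
        i for i in db_rows
        if i in index_rows and record_hash(db_rows[i]) != index_rows[i]
    )
    return missing, stale, orphaned
```

Ids in "missing" were never indexed, ids in "stale" were indexed from an older version of the record, and ids in "orphaned" exist only in the index.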

How to set up multiple indexes?

2010-09-29 Thread Andy
I installed Solr according to the tutorial. My schema.xml and solrconfig.xml are in ~/apache-solr-1.4.1/example/solr/conf. Everything so far is just like that in the tutorial. But I want to set up a 2nd index (separate from the main index) just for the purpose of auto-complete. I understand that I

Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy
Regenerating the index is a slow operation due to limitations of the source systems. We run several complex SQL statements to generate 1 Solr document. A full reindex takes about 24 hours.

Re: How to set up multiple indexes?

2010-09-29 Thread Christopher Gross
Hi Andy! I configured this a few days ago, and found a good resource -- http://wiki.apache.org/solr/MultipleIndexes That page has links that will give you the instructions for setting up Tomcat, Jetty and Resin. I used the Tomcat ones the other day, and it gave me everything that I needed to

Re: How to set up multiple indexes?

2010-09-29 Thread Luke Crouch
Check http://doc.ez.no/Extensions/eZ-Find/2.2/Advanced-Configuration/Using-multi-core-features It's for eZ-Find, but it's the basic setup for multiple cores in any environment. We have cores designed like so: solr/sfx/ solr/forum/ solr/mail/ solr/news/ solr/tracker/ each of those core

Re: Queries, Functions, and Params

2010-09-29 Thread Yonik Seeley
On Tue, Sep 28, 2010 at 6:08 PM, Robert Thayer robert.tha...@bankserv.com wrote: On the http://wiki.apache.org/solr/FunctionQuery page, the following query function is listed: q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0 When run against the default Solr instance, the server returns the
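The wiki example passes q, v1 and v2 as three separate request parameters. A small sketch of building such a request URL with proper encoding follows. The base URL http://localhost:8983/solr/select and the helper name func_query_url are assumptions for illustration, and whether the server accepts the $v1/$v2 parameter substitution depends on the Solr version.

```python
from urllib.parse import urlencode, parse_qs


def func_query_url(base, v1, v2):
    """Build a function-query URL; q, v1 and v2 are separate request parameters."""
    params = [("q", "{!func}add($v1,$v2)"), ("v1", v1), ("v2", v2)]
    return base + "?" + urlencode(params)


# Assumed endpoint: the default example instance's select handler.
url = func_query_url("http://localhost:8983/solr/select", "sqrt(popularity)", "100.0")

# Decoding the query string gives back the three original parameters.
decoded = parse_qs(url.split("?", 1)[1])
```

Letting urlencode handle the escaping avoids hand-writing the braces, dollar signs and parentheses in the URL.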

Swap on large memory multi-core multi-cpu NUMA

2010-09-29 Thread Glen Newton
In a recent blog entry (The MySQL “swap insanity” problem and the effects of the NUMA architecture http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/), Jeremy Cole describes a particular but common problem with large memory installations of MySQL on multi-core

Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy
Actually, retrieving 1000 docs via search isn't that bad. It turned out to take under 1 sec. I still like the idea of using TermsComponent and will use it in the future if the number of docs in the index grows. Thanks for all the suggestions. Dmitriy

Re: Best way to check Solr index for completeness

2010-09-29 Thread Walter Underwood
Think about what fields you need to return. For this, you probably only need the id. That could be a lot faster than the default set of fields. wunder On Sep 29, 2010, at 9:04 AM, dshvadskiy wrote: Actually retrieving 1000 docs via search isn't that bad. Turned out it takes under 1 sec. I
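A rough sketch of the suggestion to return only the id: generate query strings that page through the index with fl=id. It assumes the unique key field is named id and that paging uses the start and rows parameters; the function name id_page_queries is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def id_page_queries(total, rows=1000):
    """Yield query strings that page through `total` docs, returning only the id field."""
    for start in range(0, total, rows):
        yield urlencode({"q": "*:*", "fl": "id", "start": start, "rows": rows})


# Example: 2500 documents fetched in pages of 1000 gives three requests.
pages = list(id_page_queries(2500))
second_page = parse_qs(pages[1])
```

Each string can be appended to the select handler URL; only the fl value needs to change if a stored hash field should come back as well.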

RE: Is Solr right for my business situation ?

2010-09-29 Thread Sharma, Raghvendra
Some questions. 1. I have about 3-5 tables. Now designing schema.xml for a single table looks ok, but what's the direction for handling multiple table structures is something I am not sure about. Would it be like a big huge xml, wherein those three tables (assuming it's three) would show up as

Re: Best way to check Solr index for completeness

2010-09-29 Thread Erick Erickson
Yep, I was thinking of this on a uniqueKey field. I was assuming that there was a PK in the database that you were mapping to the uniqueKey field, but if that's not so then it's more of a problem. But you'd have problems anyway if you *don't* have a uniqueKey when it comes time to update any

Re: Is Solr right for my business situation ?

2010-09-29 Thread Erick Erickson
If at all possible, denormalize the data. Anytime you find yourself trying to make Solr behave like a database, the probability is high that you're mis-using Solr or the DB. Best Erick On Wed, Sep 29, 2010 at 12:40 PM, Sharma, Raghvendra sraghven...@corelogic.com wrote: Some questions. 1. I

Re: Missing facet values for zero counts

2010-09-29 Thread kenf_nc
I don't understand why you would want to show Sweden if it isn't in the index, what will your UI do if the user selects Sweden? However, one way to handle this would be to make a second document type. Have a field called type or some such, and make the new document type be 'dummy' or 'system' or

RE: Queries, Functions, and Params

2010-09-29 Thread Robert Thayer
Yes, just after sending the email I reread the wiki and noticed the 4.0 requirement. I will try that, thanks. From: ysee...@gmail.com on behalf of Yonik Seeley Sent: Wed 9/29/2010 8:12 AM To: solr-user@lucene.apache.org Subject: Re: Queries, Functions, and

Re: Missing facet values for zero counts

2010-09-29 Thread Allistair Crossley
Hi, For us this is a usability concern. You either don't show Sweden in a pick-list called Country and some users go away thinking you don't *ever* support Sweden (not true). OR you allow a user to execute an empty result search - but at least they know you do support Sweden. It is we believe
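One way to keep zero-count countries in the pick-list is to merge a master list of countries (for example, taken from the initial *:* facet request) with the facet counts of the current search on the client side. The sketch below assumes the facet field arrives in the flat [name, count, name, count, ...] layout of Solr's JSON response; the function name pick_list is illustrative.

```python
def pick_list(all_countries, facet_field):
    """Merge a master country list with facet counts, defaulting to zero.

    all_countries: iterable of every country the directory supports
    facet_field:   flat list [name, count, name, count, ...] (assumed layout)
    Returns a list of (country, count) tuples sorted by country name.
    """
    counts = dict(zip(facet_field[0::2], facet_field[1::2]))
    return [(country, counts.get(country, 0)) for country in sorted(all_countries)]


# Sweden is absent from the filtered facet counts but still appears with 0.
entries = pick_list({"Sweden", "Denmark", "Norway"}, ["Denmark", 3, "Norway", 1])
```

The UI can then render zero-count entries differently, for example greyed out, while still showing that the country is supported.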

Issues with SolrJ and IndexReader reopening (again)

2010-09-29 Thread Antoniya Statelova
I saw there had been a previous discussion on commit failing for EmbeddedSolrServer here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg28236.html But it was never resolved. I have an embedded solr server and it does not seem to pick up changes in the index after a commit through

Solr rate limiting / DoS attacks

2010-09-29 Thread Ian Upright
Hi, I'm curious as to what approaches one would take to defend against users attacking a Solr service, especially if exposed to the internet as opposed to an intranet. I'm fairly new to Solr, is there anything built in? Is there anything in place to prevent the search engine from getting

Re: Solr rate limiting / DoS attacks

2010-09-29 Thread Allistair Crossley
This kind of thing is not limited to Solr and you normally wouldn't solve it in software - it's more a network concern. I'd be looking at a web server solution such as Apache mod_evasive combined with a good firewall for more conventional DOS attacks. Just hide your Solr install behind the

How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
Hi, I am using xpath to index different parts of the HTML pages into different fields. Now, I have some pure text documents that have no HTML, so I can't use xpath. How do I index this pure text into different fields of the index? How do I make nutch/solr understand these different parts

Re: Data Import Handler Rich Format Documents

2010-09-29 Thread Chris Hostetter
: What's a GA release? http://en.wikipedia.org/wiki/Software_release_life_cycle#General_availability -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!

Re: Dismax Request handler and Solrconfig.xml

2010-09-29 Thread Chris Hostetter
: In Solrconfig.xml, default request handler is set to standard. I am : planning to change that to use dismax as the request handler but when I : set default=true for dismax - Solr does not return any results - I get : results only when I comment out <str name="defType">dismax</str>. you need to

terms / stemming?

2010-09-29 Thread Peter A. Kirk
Hi, I issue a request like the following, in order to get a list of search-terms in a particular field: http://localhost:8983/solr/terms?terms.limit=-1&terms.fl=bodytext But some of the terms which are returned are not quite the same as those which were indexed (or which are returned in a

Re: terms / stemming?

2010-09-29 Thread Luke Crouch
Make sure your index and query analyzers are identical, and pay special attention if you're using any of the http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming analyzers - many of them have a number of configurable attributes that could cause differences. -L On Wed, Sep 29, 2010

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Erick Erickson
Can you provide a few more details? You mention xpath, which leads me to believe that you are using DIH, is that true? How are you getting your documents to index? Parts of a filesystem? Because it's possible to do many things. If you're using DIH against a filesystem, you could use two

Re: terms / stemming?

2010-09-29 Thread Erick Erickson
Yes, this is almost certainly stemming. Take a look at solr/admin, [schema browser], then click on Home > fields > your field here. Then the index and query details link shows you exactly what's happening. You can also get some joy from the admin [analysis] page. That takes input and shows you exactly

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
No, I am using xpath for HTML; that is not the question. I am indexing pure text in addition to the HTML that I was indexing. Pure text like a TXT file or a Microsoft Word doc. So, with no xpath for TXT, how do I index a TXT file into different fields in my index like the way I use xpath to index HTML into

Memory usage

2010-09-29 Thread Jeff Moss
My server has 128GB of ram, the index is 22GB large. It seems the memory consumption goes up on every query and the garbage collector will never free up as much memory as I expect it to. The memory consumption looks like a curve, it eventually levels off but the old gen is always 60 or 70GB. I

DataImportHandler dynamic fields clarification

2010-09-29 Thread harrysmith
Looking for some clarification on DIH to make sure I am interpreting this correctly. I have a wide DB table, 100 columns. I'd rather not have to add 100 values in schema.xml and data-config.xml. I was under the impression that if the column name matched a dynamic Field name, it would be added. I

Re: Solr with example Jetty and score problem

2010-09-29 Thread Floyd Wu
Can anybody help with this? Many thanks. 2010/9/29 Floyd Wu floyd...@gmail.com Hi there, I have a problem: when I issue a query to a single instance, Solr responds with XML like the following. As you can see, the score is normal (float name=score value=...) ===

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Lance Norskog
Simple text .txt files and MS office .doc files are very very different beasts. You can do simple .txt files with some more lines in your DataImportHandler script. With DOC files it is easiest to use the extracting request handler */extract. This is on the wiki. If you want to do this inside the

Re: Swap on large memory multi-core multi-cpu NUMA

2010-09-29 Thread Lance Norskog
This would be a Java VM option, not something Solr or other apps can know about. Using this or procset seems like a great way to handle it. On Wed, Sep 29, 2010 at 8:46 AM, Glen Newton glen.new...@gmail.com wrote: In a recent blog entry (The MySQL “swap insanity” problem and the effects of the

Re: Memory usage

2010-09-29 Thread Lance Norskog
How many documents are there? How many unique words are in a text field? Both of these numbers can have a non-linear effect on the amount of space used. But, usually a 22Gb index (on disk) might need 6-12G of ram total. There is something odd going on here. Lance On Wed, Sep 29, 2010 at 4:34

Re: Is Solr right for my business situation ?

2010-09-29 Thread Lance Norskog
Some of these are big questions- try them in different emails. On Wed, Sep 29, 2010 at 9:40 AM, Sharma, Raghvendra sraghven...@corelogic.com wrote: Some questions. 1. I have about 3-5 tables. Now designing schema.xml for a single table looks ok, but whats the direction for handling multiple

Re: Why the query performance is so different for queries?

2010-09-29 Thread Lance Norskog
How much RAM does the JVM have? Wildcard queries are slow. Queries starting with '*' are even slower. If you want all values, try field:[* TO *]. This is a range query and lets you pick a range of values; this one picks everything. The *:* is not a wildcard. It is a magic syntax for all documents and does

Re: Why the query performance is so different for queries?

2010-09-29 Thread newsam
Thanks for your reply. Our box is win server 2003 (32bits) and 6G RAM totally. Large heap (2G) may not be helpful for JVM in 32bits box. Therefore we set JAVA_OPTIONS to -Xms521m -Xmx1400m. Is my understanding right? Thanks. From: Lance Norskog goks...@gmail.com Reply-To:

Re: Why the query performance is so different for queries?

2010-09-29 Thread Walter Underwood
Stop running 32-bit operating systems. You'll never get good performance with a toy like that. --wunder On Sep 29, 2010, at 8:18 PM, newsam wrote: Thanks for your reply. Our box is win server 2003 (32bits) and 6G RAM totally. Large heap (2G) may not be helpful for JVM in 32bits box.

Where is the lock file?

2010-09-29 Thread Steve Cohen
Hello, We were testing nutch configurations and apparently we got heavy handed with our approach to stopping things. Now when nutch starts indexing solr, we are seeing these messages: org.apache.solr.common.SolrException: Lock obtain timed out: SingleInstanceLock: write.lock