Hello!
I'm using SolrJ 1.4.0 with Java 1.6. On two occasions when indexing
~18000 documents we got the following problem:
(trace from jconsole)
Name: pool-1-thread-1
State: WAITING on
java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@11e464a
Total blocked: 25
How soon do you need to know? Couldn't you just regenerate the index using some
kind of 'nice' factor to not use too much processor/disk/etc?
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at
How long does it take to get 1000 docs?
Why not ensure this while indexing?
I think besides your suggestion or the suggestion of Luke there is no
other way...
Regards,
Peter.
Hello,
What would be the best way to check Solr index against original system
(Database) to make sure index is up to
This sounds like https://issues.apache.org/jira/browse/SOLR-1711. It is a
known issue in Solr 1.4.0, which is apparently fixed in Solr 1.4.1. We also
encountered it when indexing large numbers of documents with SolrJ, and are
therefore in the process of upgrading to 1.4.1.
-- Avi
On Wed, Sep 29,
Hello list,
I am implementing a directory using Solr. The user is able to search with a
free-text query or 2 filters (provided as pick-lists) for country. A directory
entry only has one country.
I am using Solr facets for country and I use the facet counts generated
initially by a *:* search
Hi Allistair,
On Wed, 2010-09-29 at 15:37 +0200, Allistair Crossley wrote:
> Hello list,
> I am implementing a directory using Solr. The user is able to search with a
> free-text query or 2 filters (provided as pick-lists) for country. A
> directory entry only has one country.
> I am using Solr
Using TermsComponent is an interesting suggestion. However, my understanding is
that it will work only for unique terms. For example, compare the database
primary key with the Solr id field. A variation of that is to calculate some
kind of unique record hash and store it in the index. Then retrieve id and hash via
I installed Solr according to the tutorial. My schema.xml and solrconfig.xml are
in
~/apache-solr-1.4.1/example/solr/conf
Everything so far is just like that in the tutorial. But I want to set up a 2nd
index (separate from the main index) just for the purpose of auto-complete.
I understand that I
Regenerating the index is a slow operation due to limitations of the source
systems. We run several complex SQL statements to generate 1 Solr document.
A full reindex takes about 24 hours.
Hi Andy!
I configured this a few days ago, and found a good resource --
http://wiki.apache.org/solr/MultipleIndexes
That page has links that will give you the instructions for setting up
Tomcat, Jetty and Resin. I used the Tomcat ones the other day, and it gave
me everything that I needed to
Check
http://doc.ez.no/Extensions/eZ-Find/2.2/Advanced-Configuration/Using-multi-core-features
It's for eZ-Find, but it's the basic setup for multiple cores in any
environment.
We have cores designed like so:
solr/sfx/
solr/forum/
solr/mail/
solr/news/
solr/tracker/
each of those core
On Tue, Sep 28, 2010 at 6:08 PM, Robert Thayer
robert.tha...@bankserv.com wrote:
> On the http://wiki.apache.org/solr/FunctionQuery page, the following query
> function is listed:
> q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0
> When run against the default solr instance, server returns the
In a recent blog entry (The MySQL “swap insanity” problem and the
effects of the NUMA architecture
http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/),
Jeremy Cole describes a particular but common problem with large memory
installations of MySQL on multi-core
Actually retrieving 1000 docs via search isn't that bad. Turned out it takes
under 1 sec. I still like the idea of using TermsComponent and will use it
in the future if the number of docs in the index grows. Thanks for all the
suggestions.
Dmitriy
Think about what fields you need to return. For this, you probably only need
the id. That could be a lot faster than the default set of fields.
wunder
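
A minimal sketch of the comparison discussed in this thread, assuming the ids have already been fetched from both sides (on the Solr side, for example, by requesting only the id field with fl=id). The function name and the sample ids are hypothetical, not taken from anyone's actual setup.

```python
def diff_ids(db_ids, index_ids):
    """Compare two collections of ids.

    Returns a tuple (missing_from_index, stale_in_index), each a sorted list.
    """
    db_set = set(db_ids)
    index_set = set(index_ids)
    # Ids present in the database but not yet indexed.
    missing_from_index = sorted(db_set - index_set)
    # Ids still in the index although the database row is gone.
    stale_in_index = sorted(index_set - db_set)
    return missing_from_index, stale_in_index


# Hypothetical sample data standing in for real query results.
missing, stale = diff_ids(["1", "2", "3"], ["2", "3", "4"])
print("missing from index:", missing)
print("no longer in database:", stale)
```

Working on ids alone keeps both result sets small, which is the point of returning only the id field.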
On Sep 29, 2010, at 9:04 AM, dshvadskiy wrote:
> Actually retrieving 1000 docs via search isn't that bad. Turned out it takes
> under 1 sec. I
Some questions.
1. I have about 3-5 tables. Designing schema.xml for a single table looks
OK, but I am not sure about the direction for handling multiple table
structures. Would it be one big XML file, wherein those three
tables (assuming it's three) would show up as
Yep, I was thinking of this on a uniqueKey field. I was assuming that
there was a PK in the database that you were mapping to the uniqueKey
field, but if that's not so then it's more of a problem.
But you'd have problems anyway if you *don't* have a uniqueKey when it comes
time to update any
If at all possible, denormalize the data. Anytime you find yourself trying
to make Solr behave like a database, the probability is high that you're
mis-using Solr or the DB.
Best
Erick
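
As a rough illustration of what denormalizing can look like, here is a small sketch that folds rows from a child table into their parent rows so that each parent becomes one flat document with a multi-valued field. The table and column names (entries, phones, entry_id, number) are hypothetical and only stand in for whatever the real schema contains.

```python
from collections import defaultdict


def denormalize(entries, phones):
    """Fold child rows into their parent so each parent is one flat document."""
    numbers_by_entry = defaultdict(list)
    for row in phones:
        numbers_by_entry[row["entry_id"]].append(row["number"])

    docs = []
    for entry in entries:
        doc = dict(entry)  # copy so the input rows are left untouched
        # Child values become a multi-valued field on the parent document.
        doc["phone"] = numbers_by_entry.get(entry["id"], [])
        docs.append(doc)
    return docs


# Hypothetical rows, as they might come back from two SQL queries.
entries = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
phones = [
    {"entry_id": 1, "number": "555-0100"},
    {"entry_id": 1, "number": "555-0101"},
]
for doc in denormalize(entries, phones):
    print(doc)
```

Each resulting dictionary maps to a single document, so no join is needed at query time.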
On Wed, Sep 29, 2010 at 12:40 PM, Sharma, Raghvendra
sraghven...@corelogic.com wrote:
> Some questions.
> 1. I
I don't understand why you would want to show Sweden if it isn't in the
index, what will your UI do if the user selects Sweden?
However, one way to handle this would be to make a second document type.
Have a field called type or some such, and make the new document type be
'dummy' or 'system' or
Yes, just after sending the email I reread the wiki and noticed the 4.0
requirement. I will try that, thanks.
From: ysee...@gmail.com on behalf of Yonik Seeley
Sent: Wed 9/29/2010 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Queries, Functions, and
Hi,
For us this is a usability concern. Either you don't show Sweden in a pick-list
called Country, and some users go away thinking you don't *ever* support Sweden
(not true), or you allow a user to execute an empty result search, but at
least they know you do support Sweden.
It is we believe
I saw there had been a previous discussion on commit failing for
EmbeddedSolrServer here:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg28236.html
But it was never resolved. I have an embedded solr server and it does not
seem to pick up changes in the index after a commit through
Hi, I'm curious as to what approaches one would take to defend against users
attacking a Solr service, especially if exposed to the internet as opposed
to an intranet. I'm fairly new to Solr, is there anything built in?
Is there anything in place to prevent the search engine from getting
This kind of thing is not limited to Solr and you normally wouldn't solve it in
software - it's more a network concern. I'd be looking at a web server solution
such as Apache mod_evasive combined with a good firewall for more conventional
DoS attacks. Just hide your Solr install behind the
Hi,
I am using xpath to index different parts of the html pages into different
fields. Now, I have some pure text documents that have no html, so I can't use
xpath. How do I index this pure text into different fields of the index? How
do I make nutch/solr understand these different parts
: What's a GA release?
http://en.wikipedia.org/wiki/Software_release_life_cycle#General_availability
-Hoss
--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
: In Solrconfig.xml, default request handler is set to standard. I am
: planning to change that to use dismax as the request handler but when I
: set default=true for dismax - Solr does not return any results - I get
: results only when I comment out <str name="defType">dismax</str>.
you need to
Hi
I issue a request like the following, in order to get a list of search-terms in
a particular field:
http://localhost:8983/solr/terms?terms.limit=-1&terms.fl=bodytext
But some of the terms which are returned are not quite the same as those which
were indexed (or which are returned in a
Make sure your index and query analyzers are identical, and pay special
attention if you're using any of the stemming analyzers
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming)
- many of them have a number of configurable attributes that could
cause differences.
-L
On Wed, Sep 29, 2010
Can you provide a few more details? You mention xpath, which leads me
to believe that you are using DIH, is that true? How are you getting
your documents to index? Parts of a filesystem?
Because it's possible to do many things. If you're using DIH against a
filesystem,
you could use two
Yes, this is almost certainly stemming. Take a look at solr/admin, [schema
browser], then click on Home > fields > (your field here). Then the index
and query details link shows you exactly what's happening.
You can also get some joy from the admin [analysis] page. That takes input
and shows you exactly
No, I am using xpath for html, this is not the question. I am indexing pure
text in addition to html that I was indexing. Pure text like TXT file or
Microsoft Word doc. So, no xpath for TXT, how do I index TXT file into
different fields in my index like the way I use xpath to index html into
My server has 128GB of ram, the index is 22GB large. It seems the memory
consumption goes up on every query and the garbage collector will never free
up as much memory as I expect it to. The memory consumption looks like a
curve, it eventually levels off but the old gen is always 60 or 70GB. I
Looking for some clarification on DIH to make sure I am interpreting this
correctly.
I have a wide DB table, 100 columns. I'd rather not have to add 100 values
in schema.xml and data-config.xml. I was under the impression that if the
column name matched a dynamic Field name, it would be added. I
Can anybody help with this?
Many thanks
2010/9/29 Floyd Wu floyd...@gmail.com
> Hi there
> I have a problem: when I issue a query to a single instance,
> the Solr response XML looks like the following
> as you can see, the score is normal (float name=score value=...)
> ===
Simple text .txt files and MS office .doc files are very very different beasts.
You can do simple .txt files with some more lines in your
DataImportHandler script.
With DOC files it is easiest to use the extracting request handler
*/extract. This is on the wiki.
If you want to do this inside the
This would be a Java VM option, not something Solr or other apps can know about.
Using this or procset seems like a great way to handle it.
On Wed, Sep 29, 2010 at 8:46 AM, Glen Newton glen.new...@gmail.com wrote:
> In a recent blog entry (The MySQL “swap insanity” problem and the
> effects of the
How many documents are there? How many unique words are in a text
field? Both of these numbers can have a non-linear effect on the
amount of space used.
But usually a 22GB index (on disk) might need 6-12GB of RAM in total.
There is something odd going on here.
Lance
On Wed, Sep 29, 2010 at 4:34
Some of these are big questions- try them in different emails.
On Wed, Sep 29, 2010 at 9:40 AM, Sharma, Raghvendra
sraghven...@corelogic.com wrote:
> Some questions.
> 1. I have about 3-5 tables. Now designing schema.xml for a single table looks
> ok, but whats the direction for handling multiple
How much ram does the JVM have?
Wildcard queries are slow. Queries starting with '*' are even slower. If you
want all values try field:[* TO *]. This is a range query and lets
you pick a range of values- this picks everything.
The *:* is not a wildcard. It is a magic syntax for all documents
and does
Thanks for your reply.
Our box is Windows Server 2003 (32-bit) with 6 GB of RAM in total. A large
heap (2 GB) may not be helpful for a JVM on a 32-bit box, so we set
JAVA_OPTIONS to -Xms521m -Xmx1400m. Is my understanding right?
Thanks.
From: Lance Norskog goks...@gmail.com
Reply-To:
Stop running 32-bit operating systems. You'll never get good performance with a
toy like that. --wunder
On Sep 29, 2010, at 8:18 PM, newsam wrote:
> Thanks for your reply.
> Our box is win server 2003 (32bits) and 6G RAM totally. Large heap (2G) may
> not be helpful for JVM in 32bits box.
Hello,
We were testing nutch configurations and apparently we got heavy handed with
our approach to stopping things.
Now when nutch starts indexing solr, we are seeing these messages:
org.apache.solr.common.SolrException: Lock obtain timed out:
SingleInstanceLock: write.lock