leading wildcard search is called grep ;-)
Ditto on the indexing reversed words suggestion.
Can you create a second field in Solr that contains /only/ the words
from the fields you care to reverse? Once you do that, you could
pre-process the query, look for leading wildcards, and address those
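Not a recipe from the docs, just a rough sketch of that pre-processing step in plain Java (the text_rev reversed-copy field name is an assumption):

public class LeadingWildcardRewriter {
    // Rewrite a leading-wildcard term (e.g. *ing) into a trailing-wildcard
    // query (gni*) against a hypothetical field that stores reversed tokens.
    public static String rewrite(String field, String term) {
        if (term.startsWith("*") && !term.endsWith("*")) {
            String reversed = new StringBuilder(term.substring(1)).reverse().toString();
            return "text_rev:" + reversed + "*";   // assumed reversed-copy field
        }
        return field + ":" + term;
    }
}

A query like title:*ing then becomes text_rev:gni*, which Lucene can handle as an ordinary prefix query.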
From my past projects, our Lucene classification corpus looked like this:
0|document text...|categoryA
1|document text...|categoryB
2|document text...|categoryA
3|document text...|categoryA
...
800|document text...|categoryC
With the faceting capabilities of Solr it is now possible to design
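As a rough sketch, here is one way such a pipe-delimited corpus could be pushed into Solr with SolrJ so the category column becomes a facet field (field names, file name, and URL are assumptions):

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CorpusLoader {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        BufferedReader in = new BufferedReader(new FileReader("corpus.txt"));
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split("\\|", 3);   // id | document text | category
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", cols[0]);
            doc.addField("text", cols[1]);
            doc.addField("category", cols[2]);      // the facet field
            solr.add(doc);
        }
        in.close();
        solr.commit();
    }
}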
I should say that we also have this problem when we commit with waitFlush =
true and waitSearcher = true,
because it again closes the old searcher and opens a new one, so it goes
through the warming-up process with the queryResultCache.
Besides, I need to commit with waitFlush = false and waitSearcher = false to
On Wed, Jan 28, 2009 at 4:29 PM, Parisa paris...@gmail.com wrote:
I should say that we also have this problem when we commit with waitFlush =
true and waitSearcher = true,
because it again closes the old searcher and opens a new one, so it goes
through the warming-up process with the queryResultCache.
I know that I can see the search results after the commit and that is OK.
I could disable the queryResultCache and the problem would be fixed, but I
need the queryResultCache because my index size is big and I need good
performance.
So I am trying to find how to fix the bug, or maybe the Solr
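For reference, the two commit variants being discussed look like this through SolrJ (a sketch; the server URL is a placeholder):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CommitDemo {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // waitFlush=true, waitSearcher=true: blocks until the new searcher is
        // registered, i.e. after autowarming (queryResultCache included).
        server.commit(true, true);
        // waitFlush=false, waitSearcher=false: returns immediately; the old
        // searcher keeps serving queries while the new one warms up.
        server.commit(false, false);
    }
}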
We are moving from single core to multicore. We have a few servers that we
want to migrate one at a time to ensure that each one functions. This
process is proving difficult as there is no default core to allow the
application to talk to the Solr servers uniformly (i.e., without a core name
during
Hi,
Is there any way to join multiple indexes in Solr?
Thanks,
Jae
Hi,
Your problem seems to be lower level than the SOLR code. You are sending
an XML request that contains a character that is illegal per the XML spec. You
should strip these characters out of the data that you send, or turn off the
XML validation (not recommended, because of all kinds of risks).
See
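A minimal sketch of the stripping approach (the ranges are the legal character set from the XML 1.0 spec):

public class XmlSanitizer {
    // Drop characters that are illegal in XML 1.0 before sending docs to Solr.
    // Legal: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            boolean legal = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (legal) out.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}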
I would think that using a servlet filter to rewrite the URL should be
pretty straightforward. You could write your own or use a tool like http://tuckey.org/urlrewrite/
and just configure that.
Using something like this, I think the upgrade procedure could be:
- install rewrite filter to
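If you write your own instead, a bare-bones sketch might look like this (servlet API; the default core name "core0" and the handled paths are assumptions):

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

public class DefaultCoreFilter implements Filter {
    public void init(FilterConfig cfg) {}
    public void destroy() {}
    // Forward core-less request paths to a default core so old single-core
    // clients keep working during the migration.
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String path = ((HttpServletRequest) req).getServletPath();
        if (path.startsWith("/select") || path.startsWith("/update")) {
            req.getRequestDispatcher("/core0" + path).forward(req, res);
        } else {
            chain.doFilter(req, res);
        }
    }
}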
surfer10 wrote:
I'm a bit of a noob with the Java compiler, so could you please tell me what
tools are used to apply the SOLR-236 patch (field grouping)? Does it need to
be applied to the current solr-1.3 (or the nightly builds of 1.4), or is it
already in the box?
What batch file stands for Solr compilation in its
Tried that. Basically, solr really didn't want to do the internal rewrite.
So essentially we would have to rewrite with a full redirect and then change
the solrj source to allow it to follow the redirect. We are going with an
external rewriter. However, the seemingly easiest way would be to
I'm coming in late on this thread, but I want to recommend the YourKit
Profiler product. It helped me track a performance problem similar to what
you describe. I had been futzing with GC logging etc. for days before
YourKit pinpointed the issue within minutes.
http://www.yourkit.com/
(My problem
Hi Ryuuichi,
Thanks for your quick reply.
I checked the setting of useCompoundFile in solrconfig.xml, and the value
is 'false'. Here is what is in our solrconfig.xml:
===
<indexDefaults>
<!-- Values here affect all index writers
Does your index stay at triple size after optimization? It is normal for
Lucene to use 2x or up to 3x disk space during optimization, but it should
fall back to the normal numbers once optimization completes and unused
segments are cleaned up due to the index deletion policy.
If you search for threads
IndexMergeTool - http://wiki.apache.org/solr/MergingSolrIndexes
Sameer.
--
http://www.productification.com
On Wed, Jan 28, 2009 at 7:30 AM, Jae Joo jae...@gmail.com wrote:
Hi,
Is there any way to join multiple indexes in Solr?
Thanks,
Jae
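For anyone curious what IndexMergeTool does under the hood, here is the rough equivalent with the Lucene 2.4-era API (paths are placeholders; the indexes must share a compatible schema):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        // Create a fresh destination index and pull both sources into it.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/merged"),
                new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
        writer.addIndexesNoOptimize(new Directory[] {
                FSDirectory.getDirectory("/path/to/index1"),
                FSDirectory.getDirectory("/path/to/index2") });
        writer.optimize();
        writer.close();
    }
}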
Well, both pages I listed are in the search results :). But I agree
that they aren't obvious to find, and that this should be improved. (The
Wiki is a community-created site which anyone can contribute to,
incidentally.)
cheers,
-Mike
On 28-Jan-09, at 1:11 AM, Jarek Zgoda wrote:
I swear I
On Thu, Jan 29, 2009 at 12:39 AM, Gert Brinkmann g...@netcologne.de wrote:
Hello again,
is there nobody who could help me with this? Or is it an FAQ and my
questions are somehow dumb? Maybe I should try to shorten the questions ;)
Quite the opposite, you are actually working with some
Hi All,
Is anyone using Solr (and thus the Lucene index) as their database store?
Up to now, we have been using a database to build Solr from. However, given
that lucene already keeps the stored data intact, and that rebuilding from
solr to solr can be very fast, the need for the separate
One thing to keep in mind is that things like joins are impossible in
solr, but easy in a database. So if you ever need to do stuff like run
reports, you're probably better off with a database to query on -
unless you cover your bases very well in the solr index.
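To make "covering your bases" concrete, the usual workaround is to denormalize the join at index time; a sketch with invented field names:

import org.apache.solr.common.SolrInputDocument;

public class DenormalizedOrder {
    // Flatten an order/customer join into one document at index time,
    // since Solr cannot join across documents at query time.
    static SolrInputDocument build() {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "order-1001");
        doc.addField("order_total", 59.90);
        doc.addField("customer_name", "Jane Doe");   // copied from customers table
        doc.addField("customer_region", "EMEA");     // copied from customers table
        return doc;
    }
}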
Thanks for your time!
This is perfectly fine. Of course, you lose any relational model. If you
don't have or don't need one, why not.
It used to be the case that backups of live Lucene indices were hard, so people
preferred having an RDBMS be the primary data source, the one they know how to
back up and maintain
Yeah, I think the begin/end chars are very helpful here. But I like the
suggestion of figuring out which words really need to support leading
wildcards...although that's typically impossible to predict, since people are
free to enter whatever queries they feel like.
Otis
--
Mark,
I am not aware of anyone open-sourcing such tools. But note that changing the
files with a GUI is easy (editor + scp?). What makes things more complicated
is the need to make Solr reload those files and, in some cases, changes really
require a full index rebuilding.
Otis
--
Sematext
Alejandro,
What you really want to do is identify the language of the email, store that in
the index and apply the appropriate analyzer. At query time you really want to
know the language of the query (either by detecting it or asking the user or
...)
Otis
--
Sematext -- http://sematext.com/
Although it's unlikely that you will need to rebuild from scratch, you
might want to fully understand the cost of recovery if you *do* have to.
If it's incredibly expensive (time or money), you need to keep that in
mind.
-Todd
-Original Message-
From: Ian Connor
I am planning that, with backups, the recovery will only need to be incremental.
Is there an internal field to know when the last document hit the index, or
is it best to build your own created_at-type field to know when you need
to rebuild from?
After the backup is restored, this field could be read and
There is no existing internal field like that.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Ian Connor ian.con...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, January 28, 2009 4:59:28 PM
Subject: Re: solr as the data
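Since there is no built-in field, a sketch of rolling your own with SolrJ (the indexed_at field name is an assumption; the schema needs a matching date field):

import java.util.Date;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StampedAdd {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        // Stamp index time so a restored backup knows where to resume from:
        doc.addField("indexed_at", new Date());
        solr.add(doc);
        solr.commit();
    }
}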
Hi,
I currently have two indexes with Solr: one for the English version and one
for the German version. They use the English/German2 snowball factory,
respectively.
Right now, depending on which language the website is currently in, I query
the corresponding index.
There is a requirement, though, that stuff is found
Mark Miller markrmil...@gmail.com wrote on 01/26/2009 04:30:00 PM:
Just a point, unless I missed it: with such a large index (not doc-size large,
but content-wise), I imagine a lot of your 16GB of RAM is being used by
the system disk cache - which is good. Another reason you don't want to
give too
Hi, bear with me as I am new to Solr.
I have a requirement in an application where I need to show a list of
results by groups.
For instance, each document in my index corresponds to a person, and they have
a family name. I have hundreds of thousands of records (persons). What I
would like to do is
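Until SOLR-236-style collapsing lands, plain faceting gets you the per-group counts; a SolrJ sketch (the family_name field is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupByFamilyName {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("family_name");   // assumed field name
        q.setFacetMinCount(1);
        QueryResponse rsp = solr.query(q);
        // One count per family name; fetch each group's documents separately.
        for (FacetField.Count c : rsp.getFacetField("family_name").getValues()) {
            System.out.println(c.getName() + ": " + c.getCount());
        }
    }
}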
org/apache/catalina/connector/Connector      java/util/WeakHashMap$Entry     399,913,269 bytes
org/apache/catalina/connector/Connector      java/lang/Object[]              197,256,078 bytes
org/apache/lucene/search/ExtendedFieldCache  java/util/WeakHashMap$Entry[]   177,893,021 bytes
I am constructing documents from a JDBC datasource and an HTTP datasource
(see the data-config file below). My problem is that I cannot know whether a
particular HTTP URL is available at index time, so I need DIH to
continue processing even if the HTTP location returns a 404.
onError=continue does not
But do note that there's also no requirement that all documents
have the same fields. So you could consider storing a special
meta document that had *no* fields in common with any other
document that records whatever information you want about the
current state of the index.
Best
Erick
On Wed,
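A sketch of the meta document Erick describes (all field names invented; note that in practice the schema's uniqueKey field still has to be present):

import java.util.Date;
import org.apache.solr.common.SolrInputDocument;

public class IndexStateDoc {
    // A document sharing no "real" fields with regular documents, recording
    // whatever you want to know about the current state of the index.
    static SolrInputDocument build() {
        SolrInputDocument meta = new SolrInputDocument();
        meta.addField("id", "index-state");              // the required uniqueKey
        meta.addField("meta_last_rebuild", new Date());
        meta.addField("meta_doc_source", "db-snapshot"); // invented bookkeeping
        return meta;
    }
}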
I'm not entirely sure about the fine points, but consider the
filters that are available that fold all the diacritics into their
low-ascii equivalents. Perhaps using that filter at *both* index
and search time on the English index would do the trick.
In your example, both would be 'munchen'.
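Outside of the analyzer config, plain Java shows what the folding does (Java 6's Normalizer; the Solr filter achieves the same effect at analysis time):

import java.text.Normalizer;

public class FoldDemo {
    public static void main(String[] args) {
        // Decompose, strip combining marks, lowercase: "München" -> "munchen".
        String folded = Normalizer.normalize("München", Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toLowerCase();
        System.out.println(folded);   // prints "munchen"
    }
}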
Duh. Four cases. For extra credit, what language is wunder in?
wunder
On 1/28/09 5:12 PM, Walter Underwood wunderw...@netflix.com wrote:
I've done this. There are five cases for the tokens in the search
index:
1. Tokens that are unique after stemming (this is good).
2. Tokens that are
onError=continue should help.
Which version of DIH are you using? onError is a Solr 1.4 feature.
--Noble
On Thu, Jan 29, 2009 at 5:04 AM, Nathan Adams na...@umich.edu wrote:
I am constructing documents from a JDBC datasource and an HTTP datasource
(see the data-config file below). My problem is that
Hello,
Is it possible to define more than one schema? I'm reading the example
schema.xml, and it seems that we can define only one schema. What if I want
to define one schema for document type A and another for document type B?
Thanks a lot,
Kevin