Hi,
which version do you use? 1.4.1 is highly recommended since previous versions
contained some bugs related to memory usage that could lead to memory leaks.
I had this GC overhead limit in my setup as well. The only workaround that
helped was a daily restart of all instances.
With 1.4.1 this
Hi,
IMHO you can do this with date range queries and (date) facets.
The DateMathParser will allow you to normalize dates to minutes/hours/days.
If you hit a limit there, then just add a field with an integer for
either minute/hour/day. This way you'll lose the month information - which
is sometimes
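To make the date-facet idea concrete, here is a minimal sketch of the request parameters for Solr 1.4 date faceting (the field name `timestamp` and the ranges are assumptions; adjust to your schema):

```
facet=true
&facet.date=timestamp
&facet.date.start=NOW/DAY-7DAYS
&facet.date.end=NOW/DAY
&facet.date.gap=+1DAY
```

The `/DAY` date-math suffix does the normalization mentioned above: it rounds each boundary down to midnight.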
I would use the string version, as Drupal will probably populate it with a
URL-like thing that may not validate as type url.
On 27 Jul 2010, at 04:00, Savannah Beckett wrote:
I am trying to merge the schema.xml that is the solr/nutch setup with the one
from drupal apache solr
How to reduce the index file size, decrease the sync time between nodes, and
decrease the index create/update time?
Thanks.
Hello,
I'm using SnowballPorterFilterFactory with language=Russian.
The stemming works OK except for people's names and geographical places.
Here are some examples:
searching for Ковров should also find Коврова, Коврову, Ковровом, Коврове.
Are there other stemming plugins for the russian language that
All of your examples stem to ковров:
assertAnalyzesTo(a, "Коврова Коврову Ковровом Коврове",
    new String[] { "ковров", "ковров", "ковров", "ковров" });
}
Are you sure you enabled this at *both* index and query time?
2010/7/27 Oleg Burlaca o...@burlaca.com
Hello,
I'm using
Hi,
I've recently been looking into Spellchecking in solr, and was struck by how
limited the usefulness of the tool was.
Like most corpora, ours contains lots of different spelling mistakes for
the same word, so 'spellcheck.onlyMorePopular' is not really that useful
unless you click on it
Another look: your problem is ковров itself... it's mapped to ковр.
A workaround might be to use the protected words functionality to
keep ковров and any other problematic people/geo names as-is.
Separately, in trunk there is an alternative Russian stemmer
(RussianLightStemFilterFactory), which
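For reference, the protected-words workaround is one attribute on the stemmer factory plus a plain-text word list; a sketch (the file name protwords.txt is conventional, not required):

```xml
<!-- schema.xml, in the analyzer chain: words listed in protwords.txt
     are passed through the stemmer unchanged -->
<filter class="solr.SnowballPorterFilterFactory" language="Russian"
        protected="protwords.txt"/>

<!-- protwords.txt then holds one surface form per line, e.g. ковров
     (lower-case, since the list is checked after lowercasing) -->
```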
Yes, I'm sure I've enabled SnowballPorterFilterFactory at both index and
query time, because the search works OK,
except for names and geo locations.
I've noticed that searching by
Коврова
also shows documents that contain Коврову, Коврове
Search by Ковров, 7 results:
A similar word is Немцов.
The strange thing is that searching for Немцова will not find documents
containing Немцов
Немцова: 14 articles
http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0
Немцов: 74 articles
Actually the situation with Немцов is OK,
I've just checked how Yandex works with Немцов and Немцова:
http://nano.yandex.ru/project/inflect/
I think there are two solutions:
a) manually search for both Немцов and then Немцова
b) use wildcard query: Немцов*
Robert, thanks for the
2010/7/27 Oleg Burlaca o...@burlaca.com
Actually the situation with Немцов is OK,
I've just checked how Yandex works with Немцов and Немцова:
http://nano.yandex.ru/project/inflect/
I think there are two solutions:
a) manually search for both Немцов and then Немцова
b) use wildcard query:
Hi,
I'm attempting to get the carrot based clustering component (in trunk) to
work. I see that the clustering contrib has been disabled for the time
being. Does anyone know if this will be re-enabled soon, or even better,
know how I could get it working as it is?
Thanks,
Matt
Hi Matt,
I'm attempting to get the carrot based clustering component (in trunk) to
work. I see that the clustering contrib has been disabled for the time
being. Does anyone know if this will be re-enabled soon, or even better,
know how I could get it working as it is?
I've recently created a
We have three dedicated servers for solr, two for slaves and one for master,
all with linux/debian packages installed.
I understand that replication always copies over the index in exactly the
form it has in the master index directory (or it is supposed to do that at
least), and if the master index
Hi Mitch,
thanks for that suggestion. I wasn't aware of that. I've already added a
temporary field in my ScriptTransformer that does basically the same.
However, with this approach indexing time went up from 20min to more
than 5 hours.
The new approach is to query the solr index for that other
Good Morning, afternoon or evening...
If someone installed Solr using the LucidWorks.jar (1.4) installation, how
can one make a small change and recompile?
Is there a LucidWorks (tomcat) build somewhere?
Regards
ericz
Hi Jon,
During the last few days we faced the same problem.
Using classic Solr 1.4.1 (Tika 0.4), from some PDF files we can't extract
content, and from others Solr throws an exception during the indexing
process.
You must:
Update the Tika libraries (in /contrib/extraction/lib) with tika-core-0.8
Hi,
I have been using DIH to index documents from a database. I am hoping to
use DIH to delete documents from the index. I searched the wiki and found
the special commands in DIH to do so.
http://wiki.apache.org/solr/DataImportHandler#Special_Commands
But there is no example of how to use them. I
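For what it's worth, the wiki's idea is that any row containing the special key is treated as a delete rather than an add. An untested sketch (the table and column names are invented, and whether your JDBC driver preserves the quoted alias is worth verifying):

```xml
<!-- data-config.xml: each returned doc_id deletes the matching document -->
<entity name="deletes"
        query="SELECT doc_id AS '$deleteDocById' FROM deleted_docs"/>
```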
Ouch! Absolutely correct - quoting the URL fixed it. Thanks for saving me a
sleepless night!
cheers - rene
2010/7/26 Chris Hostetter hossman_luc...@fucit.org
: However, when I'm trying this very URL with curl within my (perl) script, I
: receive a NullPointerException:
: CURL-COMMAND: curl
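The root cause, for the record: an unquoted `&` ends the shell command, so everything after it never reaches curl. A tiny sketch (host, port, and parameters are made up) using `printf` to show what the command actually receives:

```shell
# Unquoted: the shell treats '&' as a command separator, so the command
# only receives the URL up to 'q=foo' ('rows=10' then runs as a
# separate, harmless variable assignment).
printf '%s\n' http://localhost:8983/solr/select?q=foo&rows=10

# Quoted: the whole query string is passed as a single argument.
printf '%s\n' "http://localhost:8983/solr/select?q=foo&rows=10"
```

The same applies to curl: quote the URL and the truncated request that triggered the NullPointerException goes away.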
Hi Chantal,
However, with this approach indexing time went up from 20min to more
than 5 hours.
This is 15x slower than the initial solution... wow.
From MySQL I know that IN ()-clauses are the embodiment of endlessness -
they perform very, very badly.
New idea:
Create a method which
I did not realize the LucidWorks.jar comes with an option to install the
sources :-)
On Tue, Jul 27, 2010 at 10:59 AM, Eric Grobler impalah...@googlemail.com wrote:
Good Morning, afternoon or evening...
If someone installed Solr using the LucidWorks.jar (1.4) installation how
can one make a
Hi Mitch,
New idea:
Create a method which returns the query-string:
String returnString(String theVIP)
{
    if (theVIP != null && !theVIP.equals(""))
    {
        // return a query-string to find the vip
    }
    else
    {
        return "SELECT 1"; // you need to modify this,
We have three dedicated servers for solr, two for slaves and one for master,
all with linux/debian packages installed.
I understand that replication always copies over the index in exactly the
form it has in the master index directory (or it is supposed to do that at
least), and if the master
Hi Chantal,
instead of:
<entity name="prog" ...>
  <field name="vip" ... />  <!-- multivalued, not required -->
  <entity name="ssc_entry" dataSource="ssc" onError="continue"
          query="select SSC_VALUE from SSC_VALUE
                 where SSC_ATTRIBUTE_ID=1
Hi
I am using solrCloud.
Suppose I have a total 4 machines dedicated for solr.
I want to have 2 machines as replicas (slaves) and 2 as masters,
but I want to work with 8 logical cores rather than 2.
i.e. each master (and each slave) will have 4 cores on it.
the reason is that I can optimize the cores
Hi Mitch,
thanks for the code. Currently, I've got a different solution running
but it's always good to have examples.
I've realized
that I have to throw an exception and add the onError attribute to the
entity to make that work.
I am curious:
Can you show how to make a method
Thanks for the input, i'll check it out!
Marc
Subject: RE: Spellcheck help
Date: Fri, 23 Jul 2010 13:12:04 -0500
From: james.d...@ingrambook.com
To: solr-user@lucene.apache.org
In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):
final static String PATTERN =
Thanks Robert for all your help,
The idea of [A-Z].* stopwords is ideal for the English language,
although in Russian nouns are inflected: Борис, Борису, Бориса, Борисом
I'll try the RussianLightStemFilterFactory (the article in the PDF mentioned
it's more accurate).
Once again thanks,
Oleg
right, but your problem is this is the current output:
Ковров -> Ковр
Коврову -> Ковров
Ковровом -> Ковров
Коврове -> Ковров
so, if Ковров was simply left alone, all your forms would match...
2010/7/27 Oleg Burlaca o...@burlaca.com
Thanks Robert for all your help,
The idea of [A-Z].* stopwords
The wiki entry for hl.highlightMultiTerm:
http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm
doesn't appear to be correct. It says:
If the SpanScorer is also being used, enables highlighting for
range/wildcard/fuzzy/prefix queries. Default is false.
But the code in
If you could, let me know how your testing goes with this change. I too am
interested in having the Collate work as well as it can. It looks like the
code would be better with this change, but then again I don't know what the
original author was thinking when this was put in.
James Dyer
Hi Yonik,
I am using the Solr 1.4 release dated Feb 9, 2010. There is no custom code. I am
using the regular out-of-the-box dismax request handler.
The query is a simple one with 4 filter queries (fq's) and one sort query.
During index generation, I delete a set of rows based on a date filter, then
add new
According to SO:
http://stackoverflow.com/questions/1557616/retrieving-per-keyword-field-match-position-in-lucene-solr-possible
It says it is not possible, but that was one year ago; is it still true now?
Thanks.
Look into -XX:-UseGCOverheadLimit
On 7/26/10, Jonathan Rochkind rochk...@jhu.edu wrote:
I am now occasionally getting a Java GC overhead limit exceeded error
in my Solr. This may or may not be related to recently adding much
better (and more) warming queries.
I can get it when trying a
Hi Jason,
Are you looking for the total number of unique terms or the total number of
term occurrences?
CheckIndex reports both, but does a bunch of other work, so it is probably not
the fastest.
If you are looking for the total number of term occurrences, you might look at
Hi,
I'm trying to sort by distance like this:
sort=dist(2,lat,lon,55.755786,37.617633) asc
In general results are sorted, but some documents are not in the right order.
I'm using DistanceUtils.getDistanceMi(...) from lucene spatial to calculate
real distance after reading documents from Solr.
Solr
I'm adding lots of small docs with several threads to solr and the adds
start fast but then slow down. I didn't do any explicit commits and
autocommit is turned off but the logs show lots of commit activity on
this core and restarting this solr core logged the below. Where did all
these commits
Hi,
I found the suggestions returned from the standard solr spellcheck not to be
that relevant. By contrast, aspell, given the same dictionary and misspelled
words, gives much more accurate suggestions.
I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
the java aspell
Alessandro, all,
I was having the same issue with Tika crashing on certain PDFs. I also noticed
the bug where no content was extracted after upgrading Tika.
When I went to the SOLR issue you link to below, I applied all the patches,
downloaded the Tika 0.8 jars, restarted tomcat, posted a
In trunk (flex) you can ask each segment for its unique term count.
But to compute the unique term count across all segments is
necessarily costly (requires merging them, to de-dup), as Hoss
described.
Mike
On Tue, Jul 27, 2010 at 12:27 PM, Burton-West, Tom tburt...@umich.edu wrote:
Hi Jason,
Mark,
I'd like to see your code if you open a JIRA for this. I recently
opened SOLR-2010 with a patch that does something similar to the second
part only of what you describe (find combinations that actually return a
match). But I'm not sure if my approach is the best one so I would like
to see
: Is there anyway to have time out support in distributed search. I
: searched https://issues.apache.org/jira/browse/SOLR-502 but looks it is
: not in main release of solr1.4
note that issue is marked Fix Version/s: 1.3 ... that means it
was fixed in Solr 1.3, well before 1.4 came out.
You
:
: I was wondering if anyone has found any resolution to this email thread?
As Grant asked in his reply when this thread was first started (December
2009)...
It sounds like you are either using embedded mode or you have some
custom code. Are you sure you are releasing your resources
: Thanks for your reply. I could not find in the log files any mention to
: that. By the way I only have _MM_DD.request.log files in my directory.
:
: Do I have to enable any specific log or level to catch those errors?
if you are using that java -jar start.jar command for the example
I'm a relative beginner at Solr, indexing and searching Unicode Tibetan
texts. I am trying to use the highlighter but it just returns empty
elements, such as:
<lst name="highlighting">
  <lst name="kt-d-0103-text-v4p262a"/>
</lst>
What am I doing wrong?
The query that generated that is:
On Jul 27, 2010, at 12:21pm, Chris Hostetter wrote:
:
: I was wondering if anyone has found any resolution to this email
thread?
As Grant asked in his reply when this thread was first started
(December 2009)...
It sounds like you are either using embedded mode or you have some
custom
Than -
Looks like maybe your text_bo field type isn't analyzing how you'd
like? Though that's just a hunch. I pasted the value of that field
returned in the link you provided into your analysis.jsp page and it
chunked tokens by whitespace. Though I could be experiencing a copy/
I am getting a similar error with today's nightly build:
HTTP Status 500 - Index: 54, Size: 24
java.lang.IndexOutOfBoundsException: Index: 54, Size: 24
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at
I thought I asked a variation of this before, but I don't see it on the
list, apologies if this is a duplicate, but I have new questions.
So I need to find the min and max value of a result set, which can be
several million documents. One way to do this is the StatsComponent.
One problem is
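For context, the StatsComponent route is just two request parameters (assuming a numeric field named `price`; min and max come back in the stats section of the response):

```
stats=true&stats.field=price
```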
Hi,
(The first version of this was rejected for spam).
I'm setting up a test instance of Solr, and keep running into the problem of
having Solr not work the way I think it should work. Specifically, the data I
want to go into the index isn't there after indexing. I'm extracting the data
from
Yonik,
One more update on this. I used the filter query that was throwing the
error to delete a subset of results.
After that the queries started working correctly.
This indicates that the particular docId was present in the index somewhere,
but Lucene was not able to find it.
For STRING_VALUE, I assume there is a property in the 'select *' results
called string_value? If so, I'm not sure why it wouldn't work. If not, then
that's why: it doesn't have anything to put there.
For ATTRIBUTE_NAME, is it possibly a case issue? You called it
'Attribute_Name' in your query,
(10/07/27 23:16), Stephen Green wrote:
The wiki entry for hl.highlightMultiTerm:
http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm
doesn't appear to be correct. It says:
If the SpanScorer is also being used, enables highlighting for
range/wildcard/fuzzy/prefix queries.
Is there a way to tell Solr to only return a specific set of facet values? I
feel like the facet query must be able to do this, but I'm not really
understanding the facet query. In my specific case, I'd like to only see facet
values for the same values I pass in as query filters, i.e. if I
I would start over from the Solr 1.4.1 binary distribution and follow
the instructions on the wiki:
http://wiki.apache.org/solr/ExtractingRequestHandler
(Java classpath stuff is notoriously difficult, especially when
dynamically configured and loaded. I often cannot tell if Java cannot
load the
Yonik's Law of Patches reads: A half-baked patch in Jira, with no
documentation, no tests and no backwards compatibility is better than no
patch at all.
It'd be perfectly appropriate, IMO, for you to post an outline of what your
enhancements do over on the SOLR dev list and get a reaction from the
There are two different datasets that Solr (Lucene really) saves from
a document: raw storage and the indexed terms. I don't think the
ExtractingRequestHandler ever automatically stored the raw data; in
fact Lucene works in Strings internally, not raw byte arrays (this is
changing).
It should be
Ah! You have junk files piling up in the slave index directory. When
this happens, you may have to remove data/index entirely. I'm not sure
if Solr replication will handle that, or if you have to copy the whole
index to reset it.
You said the slaves time out - maybe the files are so large that the
Solr respects case for field names. Database fields are supplied in
lower-case, so it should be 'attribute_name' and 'string_value'. Also
'product_id', etc.
It is easier if you carefully emulate every detail in the examples,
for example lower-case names.
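A minimal DIH sketch of that advice (entity and table names are made up); note every name is lower-case, matching what the driver reports:

```xml
<entity name="product" query="select * from products">
  <!-- refer to columns exactly as the JDBC driver returns them -->
  <field column="product_id" name="product_id"/>
  <field column="attribute_name" name="attribute_name"/>
  <field column="string_value" name="string_value"/>
</entity>
```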
On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc
Should this go into the trunk, or does it only solve problems unique
to your use case?
On Tue, Jul 27, 2010 at 5:49 AM, Chantal Ackermann
chantal.ackerm...@btelligent.de wrote:
Hi Mitch,
thanks for the code. Currently, I've got a different solution running
but it's always good to have
I have studied some Russian. I kind of got the picture from the texts that all
the exceptions had already been 'found', and were listed in the book.
I do know that languages are living, changing organisms, but Russian has got to
be more regular than English I would think, even WITH all six