Sorry, my bad... I replied to a current mailing list message only changing the
subject... Didn't know about this Hijacking problem. Will not happen again.
Just to close this issue, if I understand correctly, for an index of 40G, I
will need, to run an optimize:
- 40G if all activity on
On Wed, Jan 13, 2010 at 7:48 AM, Lance Norskog goks...@gmail.com wrote:
You can do this stripping in the DataImportHandler. You would have to
write your own stripping code using regular expressions.
Note that DIH has a HTMLStripTransformer which wraps Solr's HTMLStripReader.
--
Regards,
Hi,
I am using the More Like This feature. I have configured it in
solrconfig.xml as a dedicated request handler and I am using SolrJ.
It's working properly when the similarity fields are all text data types.
But when I add a field whose datatype is 'sint', it's throwing an exception.
try /solr/select?q.alt=*:*&qt=dismax
or /solr/select?q=some search term&qt=dismax
dismax should be configured in solrconfig.xml by default, but you have
to adapt it to list the fields from your schema.xml
and for anything with known field:
/solr/select?q=field:value&qt=standard
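For reference, a minimal dismax handler in solrconfig.xml might look like the sketch below (the qf field names name/description are placeholders; substitute fields from your schema.xml):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- query fields with per-field boosts; adapt to your schema -->
    <str name="qf">name^2.0 description^1.0</str>
    <!-- fallback query when q is absent -->
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
```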
Cheers,
Chantal
Hi all,
is it possible to restrict the returned facets to only those that apply
to the filter query but still use mincount=0? Keeping those that have a
count of 0 but apply to the filter, and at the same time leaving out
those that are not covered by the filter (and thus 0, as well).
Some
right, but we should not encourage users to significantly degrade
overall relevance for all movies due to a few movies and a band (very
special cases, as I said).
In English, by not using stopwords, it doesn't really degrade
relevance that much, so it's a reasonable decision to make. This is not
That's my understanding.. But fortunately disk space is cheap <G>
On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro
frederico.azeite...@cision.com wrote:
Sorry, my bad... I replied to a current mailing list message only changing
the subject... Didn't know about this Hijacking
Hi,
I created Jira issue SOLR-1721 and attached a simple patch (no
documentation) for this.
HTH,
Alex
2010/1/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
it can be added
On Tue, Jan 12, 2010 at 10:18 PM, Alexey Serba ase...@gmail.com wrote:
Hi,
I found that there's no explicit
Isn't the conclusion here that some stopword and stemming free
matching should be the best match if ever and to then gently degrade
to weaker forms of matching?
paul
On 13 Jan 2010, at 07:08, Walter Underwood wrote:
There is a band named The The. And a producer named Don Was. For
a
Hi all,
I'm trying to add multiple files to Solr 1.4 with SolrJ.
With this program, one doc is added to Solr:
SolrServer server = SolrHelper.getServer();
server.deleteByQuery("*:*"); // delete everything!
server.commit();
QueryResponse rsp =
I have tried several variations now, but have been unable to come up with a way
to boost fields in a localsolr query. What I need to do is do a localsolr
search and sort the result set so that a specific value is at the top. My idea
was to use a nested dismax query with a boost field like
On Jan 13, 2010, at 10:44 AM, Kevin Thorley wrote:
I have tried several variations now, but have been unable to come up with a
way to boost fields in a localsolr query. What I need to do is do a
localsolr search and sort the result set so that a specific value is at the
top. My idea was
I don't have experience with migrating, but you should consider using the
example schema.xml in the distro as a starting basis for creating your schema.
-Original Message-
From: Abin Mathew [mailto:abin.mat...@toostep.com]
Sent: Tuesday, January 12, 2010 8:42 PM
To:
Hi all,
I tried creating a case-insensitive string using the values provided to a
string, via copyField. This didn't work, since copyField does its job before
the analyzer on the case-insensitive string field is invoked.
Is there another way I might accomplish this field replication on the
Just curious, have you checked if the hanging you are experiencing is not
garbage collection related?
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 January 2010 13:33
To: solr-user@lucene.apache.org
Subject: Re: Problem committing on 40GB index
That's
On Wed, Jan 13, 2010 at 10:17 AM, Bill Bell bb...@kaango.com wrote:
I am using Solr 1.4, and have 3 cores defined in solr.xml. Question on
replication
1. How do I set up rsync replication from master to slaves? It was
easy to do with just one core and one script.conf, but with multiple
Thanks a lot. It works now. When I added the line
#set($hl = $response.highlighting)
I got the highlighting. But I wonder if there's any documentation that
describes the usage of that. I mean, I didn't know the names of those
methods. Actually, I just managed to guess them.
best regards,
Qiuyan
The hanging didn't happen again since yesterday. I never ran out of space
again. This is still a dev environment, so the number of searches is very low.
Maybe I'm just lucky...
Where can I see the garbage collection info?
-Original Message-
From: Marc Des Garets
Hi,
I have a bit of an interesting OutOfMemoryError that I'm trying to
figure out.
My client and Solr server are running in the same JVM (for deployment
simplicity). FWIW, I'm using Jetty to host Solr. I'm using the supplied
code for the http-based client interface. Solr 1.3.0.
My app is adding
Hi I have a field:
<field name="srcANYSTRStrCI" type="string_ci" indexed="true" stored="true"
multiValued="true" />
With type definition:
<!-- A Case insensitive version of string type -->
<fieldType name="string_ci" class="solr.StrField"
from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
On wildcard and fuzzy searches, no text analysis is performed on
the search word.
i'd just lowercase the wildcard-ed search term in your client code,
before you send it to solr.
hth,
rob
On Wed, Jan 13, 2010 at 2:18 PM,
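A quick sketch of that client-side normalization (class and method names here are made up for illustration, not from the thread):

```java
import java.util.Locale;

public class WildcardNormalizer {
    // Wildcard and fuzzy terms bypass analysis in Solr, so lowercase
    // them on the client to match terms lowercased at index time.
    public static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("miXCAse*")); // prints "mixcase*"
    }
}
```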
What do you get when you add debugQuery=on to your lower-case query?
And does Luke show you what you expect in the index?
On Wed, Jan 13, 2010 at 2:18 PM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
timothy.j.har...@nasa.gov wrote:
Hi I have a field:
<field name="srcANYSTRStrCI" type="string_ci"
I considered that, but I'm also having the issue that I can't get an exact
match as case insensitive either.
-Original Message-
From: Rob Casson [mailto:rob.cas...@gmail.com]
Sent: Wednesday, January 13, 2010 11:26 AM
To: solr-user@lucene.apache.org
Subject: Re: case-insensitive string
From the query
http://localhost:8080/solr/select?q=idxPartition%3ASOMEPART%20AND%20srcANYSTRStrCI:%22mixcase%20or%20lower%22&debugQuery=on
Debug info attached
-Original Message-
From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
[mailto:timothy.j.har...@nasa.gov]
Sent: Wednesday, January
The value in the srcANYSTRStrCI field is "miXCAse or LowER" according to Luke.
-Original Message-
From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
[mailto:timothy.j.har...@nasa.gov]
Sent: Wednesday, January 13, 2010 11:31 AM
To: solr-user@lucene.apache.org
Subject: RE: case-insensitive
I created a document that has a string field and a case-insensitive string
field using my string_ci type; both have the same value sent at document
creation time: "miXCAse or LowER".
I attach two debug query results. One against the string type and one against
mine. The query is only different
That seems to work.
But why? Does string type not support LowerCaseFilterFactory? Or
KeywordTokenizerFactory?
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Wednesday, January 13, 2010 11:51 AM
To: solr-user@lucene.apache.org
Subject: RE: case-insensitive
Thanks, I know I read that sometime back but I guess I thought that was because
there were no analyzer tags defined on the string field in the schema. I
guess cause I'm still kind of a noob - I didn't take that to mean that it
couldn't be made to have analyzers. A subtle but important
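Summing up this thread: solr.StrField never runs analyzers, so a case-insensitive string-like type has to be built on solr.TextField with a KeywordTokenizer. A minimal sketch (type name reused from the thread; the exact attributes are an assumption):

```xml
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the whole value as one token, then lowercase it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```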
Robert Muir: Thank you for the pointer to that paper!
On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht p...@activemath.org wrote:
Isn't the conclusion here that some stopword and stemming free matching
should be the best match if ever and to then gently degrade to weaker forms
of matching?
You can do this filtering in the DataImportHandler. The regular
expression tool is probably enough:
http://wiki.apache.org/solr/DataImportHandler#RegexTransformer
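A sketch of what that could look like in data-config.xml (entity, query, and column names are assumptions for illustration):

```xml
<entity name="docs" transformer="RegexTransformer"
        query="select id, raw_html from docs">
  <!-- strip anything that looks like a tag; crude, but often enough -->
  <field column="plainText" sourceColName="raw_html"
         regex="&lt;[^&gt;]+&gt;" replaceWith=" "/>
</entity>
```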
On Wed, Jan 13, 2010 at 8:57 AM, Harsch, Timothy J. (ARC-TI)[PEROT
SYSTEMS] timothy.j.har...@nasa.gov wrote:
Hi all,
I tried
The time in autocommit is in milliseconds. You are committing every
second while indexing. This then causes a build-up of successive index
readers that absorb each commit, which is probably the cause of the out-of-memory.
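For reference, the autocommit interval lives in solrconfig.xml; a sketch with a more conservative setting (the 60s value is an example, not from this thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- milliseconds: 60000 = commit at most once a minute -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```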
On Wed, Jan 13, 2010 at 10:36 AM, Minutello, Nick
nick.minute...@credit-suisse.com
Agreed, commit every second.
Assuming I understand what you're saying correctly:
There shouldn't be any index readers - at this point, we're just writing to the
index.
Did I understand correctly what you meant?
-Nick
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent:
Hi all,
The way the indexing works on our system is as follows:
We have a separate staging server with a copy of our web app. The
clients will index a number of documents in a batch on the staging
server (this happens about once a week), then they play with the results
on the staging server
Hi!
Garbage collection is an issue of the underlying JVM. You may use
-XX:+PrintGCDetails as an argument to your JVM in order to collect
details of the garbage collection. If you also use the parameter
-XX:+PrintGCTimeStamps you get the time stamps of the garbage
collection.
For further
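For example, launching the Jetty-based Solr example with both flags plus a log file (the start.jar invocation is an assumption about the setup):

```
java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -jar start.jar
```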
Bill,
If you are using Solr 1.4, don't bother with rsync, use the Java-based
replication - info on zee Wiki.
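On the master, that Java-based replication is enabled with something like the following in solrconfig.xml per core (slave config omitted; replicateAfter=commit is a common choice, not from this thread):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- publish a new index version after each commit -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```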
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
From: Bill Bell bb...@kaango.com
To: solr-user@lucene.apache.org
Sent: Wed, January
Hi,
Pointers:
* What happens when you don't use a field name?
* What are your logs showing?
* What is debugQuery=on showing?
* What is the Analysis page for some of the problematic queries showing?
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
On Jan 13, 2010, at 5:34 PM, Minutello, Nick wrote:
Agreed, commit every second.
Do you need the index to be updated this often? Are you reading from
it every second and need results that are that fresh?
If not, I imagine increasing the auto-commit time to 1 min or even 10
secs would
if you are using auto-commit, you should not call commit from the
client
Cheers, thanks.
Do you need the index to be updated this often?
Wouldn't increasing the autocommit time make it worse? (ie more
documents buffered)
I can extend it and see what effect it has
-Nick
-Original
Hi Qiuyan,
Thanks a lot. It works now. When I added the line
#set($hl = $response.highlighting)
I got the highlighting. But I wonder if there's any documentation that
describes the usage of that. I mean, I didn't know the names of those
methods. Actually, I just managed to guess them.
Solritas (aka
Hm, Ryan, you may have inadvertently solved the problem. :)
Going flat out in a loop, indexing 1 doc at a time, I can only index
about 17,000 per minute - roughly what I was seeing with my app
running... which makes me suspicious. The number is too close to be
coincidental.
It could very well
: Dedupe is completely the wrong word. Deduping is something else
: entirely - it is about trying not to index the same document twice.
Dedup can also certainly be used with field collapsing -- that was one of
the initial use cases identified for the SignatureUpdateProcessorFactory
... you can
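For context, the SignatureUpdateProcessorFactory is wired up as an update processor chain in solrconfig.xml; a sketch along the lines of the Solr wiki's deduplication example (the signature field and source fields are placeholders):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <!-- overwriteDupes=true collapses duplicates at index time -->
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```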
Hoss,
Would you suggest using dedup for my use case; and if so, do you know of a
working example I can reference?
I don't have an issue using the patched version of Solr, but I'd much rather
use the GA version.
-Kelly
hossman wrote:
: Dedupe is completely the wrong word. Deduping is
Hi Israel
Thank you for your response.
However, I used both ini_set and set the _defaultTimeout to 6000, but the
error still occurs with the same error message.
Now, when I start building the index, the error pops up much faster than
before the change.
So do you have any idea?
Thank you in advance
: i.e. just extend facet.sort to allow a 'count desc'. By convention, ok
: to use a space in the name? - or would count.desc (and count.asc as
: alias for count) be more compliant?
i would use space to remain consistent with the existing sort
param.
it might even make sense to refactor
Here is a workaround for this issue:
On line 382 of SolrPhpClient/Apache/Solr/Service.php, I changed it to:
while (true) {
    $str = file_get_contents($url, false, $this->_postContext);
    if (empty($str) == false) {
        break;
    }
}
$response = new
Hi,
Thanks for the responses.
q.alt did the job. Turns out that the dismax query parser was at fault, and
wasn't able to handle queries of the type *:*. Putting the query in q.alt,
or adding a defType=lucene (as pointed out to me on the irc channel) worked.
Thanks,
--
- Siddhant