The easiest way is to do that in the app. That is, return the top
10 to the app (by score) then re-order them there. There's nothing
in Solr that I know of that does what you want out of the box.
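A minimal sketch of that app-side re-ordering, assuming the top-10 response has already been parsed into a list of dicts; the "price" field used as the secondary sort key is a hypothetical example, not something from the original question.

```python
def reorder_top_docs(docs, key="price", reverse=True):
    """Re-order the top-N docs Solr returned (already ranked by score)
    using a different field, entirely on the client side."""
    return sorted(docs, key=lambda d: d[key], reverse=reverse)

# Pretend these are the top 3 docs by score from a Solr response:
top10 = [
    {"id": "a", "score": 3.2, "price": 10},
    {"id": "b", "score": 2.9, "price": 30},
    {"id": "c", "score": 2.1, "price": 20},
]
print([d["id"] for d in reorder_top_docs(top10)])  # highest price first: ['b', 'c', 'a']
```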
Best
Erick
On Mon, Apr 30, 2012 at 11:10 AM, Jacek pjac...@gmail.com wrote:
Hello all,
I'm facing
Works fine for me with address_xml as string type, indexed, stored
on 3.6. What version of Solr are you using?
Best
Erick
On Mon, Apr 30, 2012 at 4:18 PM, William Bell billnb...@gmail.com wrote:
I am getting a post.jar failure when trying to post the following
CDATA field... It used to work on
Well, that'll be kinda self-defeating. The whole point of auto-warming
is to fill up the caches, consuming memory. Without that, searches
will be slow. So the idea of using minimal resources is really
antithetical to having these in-memory structures filled up.
You can try configuring minimal
I've no experience in the language nuances. I've found that I had to
mix unigram phrase searches with free-text searches in bigram fields.
This is for Chinese language, not Japanese. The bigram idea comes
about apparently because Chinese characters tend to be clumped into
2-3 letter words, in a way
Hi David,
I think you should add this option: flatten=true
and then could you try to use this XPath:
/MedlineCitationSet/MedlineCitation/AuthorList/Author
see here for the description :
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
I don't think that the
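A hedged sketch of how flatten and that XPath might fit together in data-config.xml; the entity name, url, forEach, and column values here are illustrative assumptions, not taken from David's actual config:

```xml
<entity name="medline" processor="XPathEntityProcessor"
        url="medline.xml"
        forEach="/MedlineCitationSet/MedlineCitation">
  <!-- flatten="true" concatenates the text of the node and its children -->
  <field column="author" flatten="true"
         xpath="/MedlineCitationSet/MedlineCitation/AuthorList/Author"/>
</entity>
```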
Not sure if there is an automatic way but we do it via a delete query and
where possible we update doc under same id to avoid deletes.
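For reference, a delete-by-query posted to the /update handler might look like the following; the query itself is a placeholder, anything that matches your stale documents works:

```xml
<delete><query>source_s:old_crawl</query></delete>
```

followed by a `<commit/>` so the deletes become visible.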
On 01/05/2012 13:43, Bai Shen baishen.li...@gmail.com wrote:
What is the best method to remove old documents? Things that now generate
404 errors, etc.
Is
hello shawn,
thanks for the reply.
ok - i did some testing and yes you are correct.
autocommit is doing the commit work in chunks. yes - the slaves are also
going from having everything to nothing, then slowly building back up again,
lagging behind the master.
... and yes - this is probably
Please clarify the problem, because the error message you provide refers to
address data that is not in the input data that you provide. It doesn't
match!
The error refers to an edu element, but the input data uses a poff
element. Maybe you have multiple SP2514N documents; maybe somebody made
Hello all,
I tried to use grouping with 2 slices on an index of 35K documents. When I
ask for the top 10 rows, grouped by field A, it gave me about 16K groups. But, if I
ask for top 20K rows, the ngroups property is now at 30K.
Do you know why and of course how to fix it ?
Thanks.
Thank you Jack.
So, it's not doable/possible to search and highlight keywords within a
field that contains the raw formatted HTML, and strip out the HTML tags
during analysis... so that a user would get back nothing if they did a
search for a tag name (e.g. p)?
On Mon, Apr 30, 2012 at 5:17 PM, Jack
Hi Prabhu,
I don't think such a merge policy exists, but it would be nice to have this
option and I imagine it wouldn't be hard to write if you really just base the
merge or no merge decision on the time of day (and maybe day of the week).
Note that this should go into Lucene, not Solr, so if
Sorry for the confusion. It is doable. If you feed the raw HTML into a field
that has the HTMLStripCharFilter, the stored value will retain the HTML
tags, while the indexed text will be stripped of the tags during
analysis and be searchable just like a normal text field. Then, search
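A minimal field type along these lines; this is a sketch, and the name and the rest of the analysis chain are just one plausible choice:

```xml
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- strips tags before tokenization; the stored value keeps the raw HTML -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```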
I'm running Nutch, so it's updating the documents, but I'm wanting to
remove ones that are no longer available. So in that case, there's no
update possible.
On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk
mav.p...@holidaylettings.co.uk wrote:
Not sure if there is an automatic
Awesome, I'll give it try. Thanks Jack!
On Tue, May 1, 2012 at 10:23 AM, Jack Krupansky j...@basetechnology.comwrote:
Sorry for the confusion. It is doable. If you feed the raw HTML into a
field that has the HTMLStripCharFilter, the stored value will retain the
HTML tags, while the indexed
Nutch 1.4 has a separate tool to remove 404 and redirect documents from your
index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents
in one run based on segment data.
On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
I'm running Nutch, so it's updating the documents, but
I'm getting this error (below) when doing an import. I'd like to add a
Log line so I can see if the file path is messed up.
So my data-config.xml looks like below but I'm not getting any extra info
in the solr.log file under jetty. Is there a way to log to this log file
from data-import.xml?
Hi
What I do is I put the date created for when the doc was inserted or
updated and then I do a search/delete query based on that
Mav
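A sketch of that date-based cleanup, assuming a hypothetical last_indexed_dt field stamped on each doc at insert/update time; this removes anything not re-indexed in the last week:

```xml
<delete><query>last_indexed_dt:[* TO NOW-7DAYS]</query></delete>
```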
On 01/05/2012 15:31, Bai Shen baishen.li...@gmail.com wrote:
I'm running Nutch, so it's updating the documents, but I'm wanting to
remove ones that are no
fixed the error, stupid typo, but log msg didn't appear until typo was
fixed. I would have thought they would be unrelated.
On 5/1/12 10:42 AM, Twomey, David david.two...@novartis.com wrote:
I'm getting this error (below) when doing an import. I'd like to add a
Log line so I can see if
Hello,
A related question on this topic. How do I programmatically find the total
number of documents across many shards ? For EmbeddedSolrServer, I use the
following command to get the total count :
solrSearcher.getStatistics().get(numDocs)
With distributed search, how do i get the count of all
Hi,
The first thing that comes to mind is to not query with *:*, which I'm guessing
you are doing, but by running a query with a time range constraint that you
know will return you enough docs, but not so many that performance suffers.
And, of course, thinking beyond Solr, if you really know
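For example, such a constrained query might look like the following, where timestamp_dt is a hypothetical indexed date field:

```
q=*:*&fq=timestamp_dt:[NOW-1DAY TO NOW]&rows=100
```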
OK. I am using SOLR 3.6.
I restarted SOLR and it started working. No idea why. You were right, I
showed the error log from a different document.
We might want to add a test case for CDATA.
<add>
<doc>
<field name="id">SP2514N</field>
<field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250
GB
CoreContainer.java, in the method 'load', finds itself calling
loader.newInstance with an 'fname' of 'Log4j' when the slf4j backend is
Log4j.
e.g.:
2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable
to load LogWatcher
org.apache.solr.common.SolrException: Error loading class
PROBLEM RESOLVED.
Solr 3.6.0 changed where it looks for stopwords_en.txt (now in the sub-directory
/lang). The schema.xml generated by Haystack 2.0.0 beta needs to be edited.
Everything is working now.
-
BillB1951
--
View this message in context:
I have a field that is defined using what I believe is fairly standard text
fieldType. I have documents with the words 'evaluate', 'evaluating',
'evaluation' in them. When I search on the whole word, obviously it works,
if I search on 'eval' it finds nothing. However for some reason if I search
on
Sounds as if maybe it was some other kind of error having nothing to do with
the data itself. Were there any additional errors or exceptions shortly
before the failure? Maybe memory was low and some component wouldn't load,
or somebody caught an exception without reporting the actual cause.
This is a stemming artifact: all of the forms of evaluat* are being
stemmed to evalu. That may seem odd, but stemming/stemmers are odd to
begin with.
1. You could choose a different stemmer.
2. You could add synonyms to map various forms of the word to the desired
form, such as eval.
3.
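As a sketch of option 1, swapping the aggressive stemmer for a lighter one in the field's analyzer; this is one plausible chain, not the poster's actual config:

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- lighter than the Porter stemmer; leaves forms of "evaluate" closer to intact -->
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
</analyzer>
```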
There is a recent JIRA issue about keeping the last n logs to display in the
admin UI.
That introduced a problem - and then the fix introduced a problem - and then
the fix mitigated the problem but left that ugly logging as a by product.
Don't remember the issue # offhand. I think there was a
Hello,
just a short question:
Is it possible to use Solr/Lucene as an e-mail classifier? I mean, analyzing
an e-mail to add it automatically to a category (four are available)?
Thanks,
Ramo
On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:
There is a recent JIRA issue about keeping the last n logs to display in the
admin UI.
That introduced a problem - and then the fix introduced a problem - and then
the fix mitigated the problem but left that ugly
yup.
<fieldType name="cq_tag" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory"
        delimiter="$"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
Hello,
When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed
query?
If you're doing a distributed query, then for group.ngroups to work you need to
ensure that all documents for a group exist on a single shard.
However, what you're describing sounds an awful lot like
Hello Simon,
Let me reply to solr-user. We consider BJQ as a promising solution for the
parent/child use case; we have a facet component prototype for it, but it's
too raw and my team had to switch to other challenges temporarily.
I participated in SOLR-3076, but achievement is really modest. I've
I am not sure. It just started working.
On Tue, May 1, 2012 at 9:39 AM, Jack Krupansky j...@basetechnology.com wrote:
Sounds as if maybe it was some other kind of error having nothing to do with
the data itself. Were there any additional errors or exceptions shortly
before the failure? Maybe
Hi,
Can anyone help me on how to integrate sen and lucene-ja.jar in SOLR 3.4
or 3.5 or 3.6 version?
Thanks,
Shanmugavel
There are a number of different routes you can go, one of which is to use
SolrCell (Tika) to parse mbox files and then add your own update processor
that does whatever mail classification analysis you desire and then
generates additional field values for the classification.
A simpler approach
I know about one regression at least. Fix is already committed. see
https://issues.apache.org/jira/browse/SOLR-3360
On Tue, May 1, 2012 at 12:53 AM, Brent Mills bmi...@uship.com wrote:
I've read some things in jira on the new functionality that was put into
caching in the DIH but I wouldn't
no problem - you are welcome.
Nothing out-of-the-box yet. The only approach that is ready:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
https://issues.apache.org/jira/browse/SOLR-3076
Regards
On Mon, Apr 30, 2012 at 12:06 PM, G.Long jde...@gmail.com wrote:
Hi :)
Thank
Hi Jack,
thanks for the feedback. I'm really new to that stuff and not sure if I have
fully understood it.
Currently I've split emails into their properties and saved them into
relational tables, for example the body part. Most of my e-mails are HTML
emails. Now I have for example three
I have a similar issue using log4j for logging with a trunk build; the
CoreContainer class prints a big stack trace on our JBoss 4.2.2 startup. I am
using slf4j 1.5.2.
10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version
java.lang.NoSuchMethodError:
If you have the code that does all of that analysis, then you could
integrate it with Solr using one of the approaches I listed, but Solr itself
would not provide any of that analysis.
-- Jack Krupansky
-Original Message-
From: Ramo Karahasan
Sent: Tuesday, May 01, 2012 1:14 PM
To:
Hello all,
is there a notification / trigger / callback mechanism people use that
allows them to know when a dataimport process has finished?
we will be doing daily delta-imports and i need some way for an operations
group to know when the DIH has finished.
thank you,
I am indexing content from a RDBMS. I have a column in a table with pipe
separated values, and upon indexing I would like to transform these values
into multi-valued fields in SOLR's index. For example,
ColumnA (From RDBMS)
-
apple|orange|banana
I want to expand
Thanks for your response Cody,
First, I used distributed grouping on 2 shards and I'm sure that all
documents of each group are in the same shard.
I took a look at the JIRA issue and it seems really similar. There is the same
problem with group.ngroups. The count is calculated in the second pass
here you go
specify the RegexTransformer in the entity tag of the DIH config XML like below:
<entity transformer="RegexTransformer" ... />
and then
<field column="ColumnA" name="FruitField" splitBy="\|" />
That's it!
- Jeevanandam
On 02-05-2012 12:35 am, invisbl wrote:
I am indexing content from a RDBMS. I
Hello,
I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/
both without and with -noCommit, and restarted the Solr server.
Log shows that 5 documents were removed but they are still in the search
results.
Is this a bug, or is something missing?
I use nutch-1.4 and solr 3.5
Thanks.
Alex.
Ludovic,
Thanks for your help. I tried your suggestion but it didn't work for
Authors. Below are 3 snippets from data-config.xml, the XML file and the
XML response from the DB
Data-config:
<entity name="medlineFiles" processor="XPathEntityProcessor"
On 1 May 2012 23:12, geeky2 gee...@hotmail.com wrote:
Hello all,
is there a notification / trigger / callback mechanism people use that
allows them to know when a dataimport process has finished?
we will be doing daily delta-imports and i need some way for an operations
group to know when
Is there a way to boost documents based on the search term/phrase?
My random searches can be a bit slow on startup, so i still would like to
get that lazy load but have more cores available.
I'm actually trying now the LotsOfCores way of handling things.
Had to work a bit to get the patch suitable for 3.5 but it seems to be
doing what i need.
On Tue, May 1,
Do you mean besides query elevation?
http://wiki.apache.org/solr/QueryElevationComponent
And besides explicit boosting by the user (the ^ suffix operator after a
term/phrase)?
-- Jack Krupansky
-Original Message-
From: Donald Organ
Sent: Tuesday, May 01, 2012 3:59 PM
To: solr-user
Hi,
Is that an indexing setting or query setting that will tokenize 'evalu'
but not 'eval'?
Without seeing the tokenizers you're using for the field type it's hard to
say. You can use Solr's analysis page to see the tokens that are generated
by the tokenizers in your analysis chain at both query
Use synonyms at index time. Make eval and evaluate equivalent words.
wunder
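A sketch of that: a synonyms.txt line plus a SynonymFilterFactory in the index-time analyzer chain (its placement in the chain is one reasonable choice, not the only one):

```xml
<!-- synonyms.txt contains the line:
     eval, evaluate, evaluating, evaluation -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
```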
On May 1, 2012, at 1:31 PM, Dan Tuffery wrote:
Hi,
Is that an indexing setting or query setting that will tokenize 'evalu'
but not 'eval'?
Without seeing the tokenizers you're using for the field type it's hard
it may be related to this
http://stackoverflow.com/questions/10124055/solr-faceted-search-throws-nullpointerexception-with-http-500-status
we are doing deletes from our index as well so it is possible that
we're running into the same issue. I hope that sheds more light on
things.
On Tue, May
Darn... looks likely that it's another bug from when part of
UnInvertedField was refactored into Lucene.
We really need some random tests that can catch bugs like these though
- I'll see if I can reproduce.
Can you open a JIRA issue for this?
-Yonik
lucenerevolution.com - Lucene/Solr Open Source
query elevation was exactly what I was talking about.
Now is there a way to add this to the default query handler?
On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky j...@basetechnology.comwrote:
Do you mean besides query elevation?
Yes, you can add it in the last-components section of the default query handler:
<arr name="last-components">
  <str>elevator</str>
</arr>
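For completeness, the elevation mappings themselves live in elevate.xml; a sketch with placeholder query text, reusing the sample doc id from earlier in the thread:

```xml
<elevate>
  <query text="hard drive">
    <doc id="SP2514N"/>
  </query>
</elevate>
```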
- Jeevanandam
On 02-05-2012 3:53 am, Donald Organ wrote:
query elevation was exactly what I was talking about.
Now is there a way to add this to the default query
Here's some doc from Lucid:
http://lucidworks.lucidimagination.com/display/solr/The+Query+Elevation+Component
-- Jack Krupansky
-Original Message-
From: Donald Organ
Sent: Tuesday, May 01, 2012 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Boosting documents based on search
Maybe this is the HTTP caching feature? Solr comes with HTTP caching
turned on by default, so when you do queries and then change documents,
your browser does not fetch your changed documents.
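To rule that out server-side, HTTP caching can be switched off in solrconfig.xml, inside the requestDispatcher section:

```xml
<requestDispatcher>
  <httpCaching never304="true"/>
</requestDispatcher>
```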
On Tue, May 1, 2012 at 11:53 AM, alx...@aim.com wrote:
Hello,
I did bin/nutch solrclean crawl/crawldb
I've been surprised to see Firefox cache even after an empty-cache was ordered for
JSON results...
this is quite annoying but I have become accustomed to it by doing the following
when I need to debug: add an extra random parameter. But only when debugging!
Using wget or curl showed me that the
Hi list,
Does anybody know if the Suggester component is designed to work with shards?
I'm asking because the documentation implies that it should (since ...Suggester
reuses much of the SpellCheckComponent infrastructure…, and the
SpellCheckComponent is documented as supporting a distributed
should i be concerned with the http response codes from update requests?
i can't find documentation on what values come back from them anywhere
(although maybe i'm not looking hard enough.) are they just http standard
with 200 for success and 400/500 for failures?
thanks,
richard
I should have also included one more bit of information.
If I configure the top-level (sharding) request handler to use just the suggest
component as such:
<requestHandler name="/suggest"
    class="org.apache.solr.handler.component.SearchHandler">
  <!-- default values for query parameters -->
(12/05/02 1:47), Shanmugavel SRD wrote:
Hi,
Can anyone help me on how to integrate sen and lucene-ja.jar in SOLR 3.4
or 3.5 or 3.6 version?
I think lucene-ja.jar no longer exists on the Internet and doesn't work with
Lucene/Solr 3.x because the interface doesn't match (lucene-ja doesn't know
all caching is disabled and I restarted jetty. The same results.
Thanks.
Alex.
-Original Message-
From: Lance Norskog goks...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, May 1, 2012 2:57 pm
Subject: Re: Removing old documents
Maybe this is the HTTP caching
you should reverse your sort algorithm. maybe you can override the tf
method of Similarity and return -1.0f * tf(). (I don't know whether the
default collector allows scores smaller than zero.)
Or you can hack this by adding a large number, or writing your own
collector; in its collect(int doc) method, you can
Hi,
Can you please give an example of what you mean?
Otis
Performance Monitoring for Solr / ElasticSearch / HBase -
http://sematext.com/spm
From: Donald Organ dor...@donaldorgan.com
To: solr-user solr-user@lucene.apache.org
Sent: Tuesday, May 1, 2012
I don't have any more details than I provided here, but I created a
ticket with this information. Thanks again
https://issues.apache.org/jira/browse/SOLR-3427
On Tue, May 1, 2012 at 5:20 PM, Yonik Seeley yo...@lucidimagination.com wrote:
Darn... looks likely that it's another bug from when
Perfect, this is working well.
On Tue, May 1, 2012 at 5:33 PM, Jeevanandam je...@myjeeva.com wrote:
Yes, you can add it in the last-components section of the default query handler:
<arr name="last-components">
  <str>elevator</str>
</arr>
- Jeevanandam
On 02-05-2012 3:53 am, Donald Organ wrote:
query
check a release since r1332752
If things still look problematic, post a comment on:
https://issues.apache.org/jira/browse/SOLR-3426
this should now have a less verbose message with an older SLF4j and with Log4j
On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa gopalpa...@gmail.com wrote:
I have
If your JSON value is & the proper XML value is &amp;.
What is the value you are setting on the stored field? Is it & or &amp;?
On Mon, Apr 30, 2012 at 12:57 PM, William Bell billnb...@gmail.com wrote:
One idea was to wrap the field with CDATA. Or base64 encode it.
On Fri, Apr 27, 2012
Yes, I'm the author of that JIRA.
On Tue, May 1, 2012 at 8:45 PM, Ryan McKinley ryan...@gmail.com wrote:
check a release since r1332752
If things still look problematic, post a comment on:
https://issues.apache.org/jira/browse/SOLR-3426
this should now have a less verbose message with an
Hello everyone,
I have a working DIH setup with a couple of long and complicated MySQL queries
in data-config.xml. To make it easier/safer for myself and other developers in
my company to edit the MySQL query, I’d like to remove it from data-config.xml
and store it in a separate file, and then
On Tue, May 1, 2012 at 6:48 PM, Ken Krugler kkrugler_li...@transpac.com wrote:
Hi list,
Does anybody know if the Suggester component is designed to work with shards?
I'm not really sure it is? They would probably have to override the
default merge implementation specified by SpellChecker.