Hi all,
I am trying to create a distributed search component in Solr, which is quite
difficult (at least for me, because I am new to Solr and Java). Anyway, I
have looked into the Solr source (FacetComponent, TermsComponent...) and created
my own search component (it extends SearchComponent), but I
Are you implying changing the DB query of the nested entity which fetches
the comments (the query is in my post), or can something be done at index
time, like using Transformers etc.?
Thanks,
Kaushik
On Mon, Apr 4, 2011 at 8:07 AM, Erick Erickson <erickerick...@gmail.com> wrote:
Why not count
Hi,
I would like to hear your opinion about the MLT feature and if it's a
good solution to what I need to implement.
My index has fields like: headline, body and medianame.
What I need to do is, before adding a new doc, verify if a similar doc
exists for this media.
My idea is to use
Do you want to skip indexing if something similar exists, or only if an exact
duplicate exists? If you don't want to index exact duplicates, I would look into
a hash code of the document. Similarity, though, I think has to be based on a
document already in the index.
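To make the hash-code idea concrete, here is a minimal sketch (not from the thread; the class name and the MD5 choice are mine, and the field names are taken from Frederico's schema):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ExactDupSignature {
    // MD5 hex signature over the fields that define an "exact duplicate".
    // A NUL separator keeps ("ab","c") distinct from ("a","bc").
    public static String signature(String headline, String body, String medianame)
            throws Exception {
        String key = headline + "\u0000" + body + "\u0000" + medianame;
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Identical field values yield identical signatures; any change yields a new one.
        System.out.println(signature("Title", "Body", "MediaX")
                .equals(signature("Title", "Body", "MediaX"))); // true
        System.out.println(signature("Title", "Body", "MediaX")
                .equals(signature("Title", "Other", "MediaX"))); // false
    }
}
```

You would store this signature in a field and look it up before adding the doc; for "similar" (rather than exact) matching this won't work, which is why MLT against the index comes up.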
On Apr 4, 2011, at 5:16, Frederico Azeiteiro
Hi everyone,
I'm trying to make a simple data import from MongoDB into Solr using REST
interface.
As a test example I've created schema.xml like:
<?xml version="1.0" ?>
isbn
title
and data-import.xml as:
Hi,
The idea is to not index if something similar (headline+bodytext) exists for
the same exact medianame.
Do you mean I would need to index the doc first (maybe in a temp index)
and then use the MLT feature to find similar docs before adding to final
index?
Thanks,
Frederico
-Original
Thanks Chris,
The field used for indexing and spellcheck is the same and is configured
like this:
<fieldType name="title" stored="true" indexed="true" multiValued="false"
    class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter
Apologies for the duplicate post. I'm having Evolution problems
http://wiki.apache.org/solr/Deduplication
On Monday 04 April 2011 11:34:52 Frederico Azeiteiro wrote:
Hi,
The idea is to not index if something similar (headline+bodytext) exists for
the same exact medianame.
Do you mean I would need to index the doc first (maybe in a temp index)
and then use
Greetings all,
I am currently using solr as the backend behind a log aggregation and
search system my team is developing. All was well and good until I
noticed a test server crashing quite unexpectedly. We'd like to dig more
into the incident but none of us has much experience with Jetty
This is not Solr crashing, per se; it is your JVM. I personally haven't
had much success debugging these kinds of failures. See
whether it happens again, and if it does, try updating your
JVM, switching to another one, etc.
Anyone have better advice?
Upayavira
On Mon, 04 Apr 2011 11:59
Thank you Markus, it looks great.
But the wiki is not very detailed on this.
Do you mean if I:
1. Create:
<updateRequestProcessorChain name="dedupe">
  <processor
      class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool
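For the archives, a complete chain along the lines of the Deduplication wiki page might look like this (a sketch; the field list and signature class below are assumptions based on this thread, not Frederico's actual config):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <!-- false: just store the signature; true: overwrite docs with the same signature -->
    <bool name="overwriteDupes">false</bool>
    <str name="fields">headline,bodytext,medianame</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The signatureField would also need to be declared in schema.xml.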
I'm having trouble seeing your schema files, etc. I don't
know if gmail is stripping this on my end or whether
your e-mail is stripping it on upload, anyone else seeing this?
But to your question, what version are you using? From
Solr3.1 http://wiki.apache.org/solr/Solr3.1 is the first version
Is there a kind of function query that can count the number of values in a
multi-valued field on a given document? I do not know.
From: Erick Erickson [erickerick...@gmail.com]
Sent: Sunday, April 03, 2011 10:37 PM
To: solr-user@lucene.apache.org
Subject:
Hi All,
I just want to share some findings which clearly identified the reason
for our performance bottleneck. We had looked into several areas for
optimization, mostly directed at Solr configuration, stored fields,
highlighting, JVM, OS cache, etc. But it turned out that the main culprit
was
Hi again,
I guess I was wrong in my earlier post... There's no automated way to avoid
indexing the duplicate doc.
I guess I have 2 options:
1. Create a temp index with signatures and then have an app that, for each new
doc, verifies if the sig exists in my primary index.
If not, add the
Hi again,
I guess I was wrong in my earlier post... There's no automated way to avoid
indexing the duplicate doc.
Yes there is: try setting overwriteDupes to true and documents yielding the same
signature will be overwritten. If you need both fuzzy and exact matching
then add a
Dear Rahul,
Stefan has the right solution. The autosuggest must be checked both from
JavaScript and your backend. For JavaScript there are some really nice tools
to do that, such as jQuery, which implements an auto-suggest with a tunable
delay. It also has highlighting, and you can add additional
As I was reviewing the boosting capabilities of the dismax and edismax query
parsers, it's not clear to me that the boost query has much use. The value
of boost functions, particularly with the multiplied boost that edismax supports,
is very clear -- there are a variety of uses. But I can't think
Hey everybody,
I've been running into some issues indexing a very large set of documents.
There are about 4000 PDF files, ranging in size from 160MB down to 10KB. Obviously
this is a big task for Solr. I have a PHP script that iterates over the
directory and uses PHP cURL to query Solr to index
This is related to Apache TIKA. Which version are you using?
Please see this thread for more details-
http://lucene.472066.n3.nabble.com/PDF-parser-exception-td644885.html
Hope it helps.
Regards,
Anuj
On Mon, Apr 4, 2011 at
Looks like I'm using Tika 0.4:
apache-solr-1.4.1/contrib/extraction/lib/tika-core-0.4.jar
.../tika-parsers-0.4.jar
~Brandon Waterloo
From: Anuj Kumar [anujs...@gmail.com]
Sent: Monday, April 04, 2011 2:12 PM
To: solr-user@lucene.apache.org
Cc: Brandon
In the log messages, are you able to locate the file at which it fails? It
looks like Tika is unable to parse one of your PDF files. We need
to hunt that one out.
Regards,
Anuj
On Mon, Apr 4, 2011 at 11:57 PM, Brandon Waterloo
brandon.water...@matrix.msu.edu wrote:
Looks like I'm
Thank you both for your replies. It looks like EdgeNGramFilter will do the
job nicely. Time to reindex...again.
On Fri, Apr 1, 2011 at 8:31 AM, Jan Høydahl jan@cominvent.com wrote:
Check out
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
Don't know
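For anyone finding this thread later, a fieldType using that filter might look like this (a sketch; the type name and the gram-size values are just examples, not anything from this thread):

```xml
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "mus" matches "museum": index edge n-grams, query side left plain -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The asymmetric index/query analyzers are the usual pattern here, so that only indexed terms are expanded into prefixes.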
I just noticed Juan's response and I find that I am encountering that very
issue in a few cases. Boosting is a good way to put the more relevant
results at the top, but is it possible to have only the correct results
returned?
On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb
I have not found any solution to this. The only option is to denormalize your
multi-valued field into several docs with a single-valued field.
Try ComplexPhraseQueryParser (https://issues.apache.org/jira/browse/SOLR-1604)
if you are using Solr 1.4.
On 04/04/2011, at 21:21, Brian Lamb
I was able to resolve this issue by using a different jdbc driver:
http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC
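For the archives, the DIH dataSource entry for the Xerial driver would look something like this (a sketch; the database path is a placeholder):

```xml
<dataSource type="JdbcDataSource"
            driver="org.sqlite.JDBC"
            url="jdbc:sqlite:/path/to/your.db"/>
```

The driver jar also has to be on Solr's classpath (e.g. dropped into the lib directory).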
-Original Message-
From: Zac Smith [mailto:z...@trinkit.com]
Sent: Friday, April 01, 2011 5:56 PM
To: solr-user@lucene.apache.org
Subject: Using the Data Import Handler
I tried it with the example json documents, and even if I add overwrite=false
to the URL, it still overwrites.
Do this twice:
curl 'http://localhost:8983/solr/update/json?commit=true&overwrite=false'
--data-binary @books.json -H 'Content-type:application/json'
Then do this query:
curl
Thanks Hoss,
Externalizing this part is exactly the path we are exploring now, not
only for this reason.
We already started testing Hadoop SequenceFile as a write-ahead log for
updates/deletes.
SequenceFile supports append now (simply great!). It was a pain to
have to add Hadoop into the mix for
Sorry for the mistake with the Solr version ... I'm using Solr 3.1
--
View this message in context:
http://lucene.472066.n3.nabble.com/Mongo-REST-interface-and-full-data-import-tp2774479p2777319.html
Sent from the Solr - User mailing list archive at Nabble.com.
On 4/4/2011 3:21 PM, Brian Lamb wrote:
I just noticed Juan's response and I find that I am encountering that very
issue in a few cases. Boosting is a good way to put the more relevant
results at the top, but is it possible to have only the correct results
returned?
Only what's already been said
Hello Experts,
I am a Solr newbie but have read quite a lot of docs. I still do not understand
what would be the best way to set up very large scale deployments:
Goal (theoretical):
A) Index size: 1 petabyte (1 document is about 5 KB in size)
B) Queries: 10 queries per second
C)