Distributed search component.

2011-04-04 Thread Rok Rejc
Hi all, I am trying to create a distributed search component in Solr, which is quite difficult (at least for me, because I am new to Solr and Java). Anyway, I have looked into the Solr source (FacetComponent, TermsComponent...) and created my own search component (it extends SearchComponent) but I

Re: Faceting on multivalued field

2011-04-04 Thread Kaushik Chakraborty
Are you suggesting that I change the DB query of the nested entity which fetches the comments (the query is in my post), or can something be done during indexing, such as using Transformers? Thanks, Kaushik On Mon, Apr 4, 2011 at 8:07 AM, Erick Erickson erickerick...@gmail.com wrote: Why not count

Using MLT feature

2011-04-04 Thread Frederico Azeiteiro
Hi, I would like to hear your opinion about the MLT feature and if it's a good solution to what I need to implement. My index has fields like: headline, body and medianame. What I need to do is, before adding a new doc, verify if a similar doc exists for this media. My idea is to use

Re: Using MLT feature

2011-04-04 Thread Chris Fauerbach
Do you want to skip indexing when something similar exists, or only when an exact duplicate exists? I would look into a hash code of the document if you don't want to index exact duplicates. Similar, though, I think has to be based off a document in the index. On Apr 4, 2011, at 5:16, Frederico Azeiteiro
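The hash-code idea for exact duplicates can be sketched as follows. This is a minimal illustration, not how Solr does it: the function names, the choice of SHA-1, the in-memory `seen` set, and the lowercase/whitespace normalization are all assumptions made for the example; only the field names (headline, body, medianame) come from the thread.

```python
import hashlib

def document_signature(headline, body, medianame):
    """Return a hex digest for a (medianame, headline, body) triple.

    Normalization (lowercasing, collapsing whitespace) is an assumption;
    drop it if only byte-for-byte duplicates should match.
    """
    parts = [medianame, headline, body]
    normalized = "\x1f".join(" ".join(p.lower().split()) for p in parts)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# Hypothetical in-memory store of signatures already indexed.
seen = set()

def should_index(doc):
    """Return True the first time a document's signature is seen."""
    sig = document_signature(doc["headline"], doc["body"], doc["medianame"])
    if sig in seen:
        return False
    seen.add(sig)
    return True
```

A hash like this only catches exact (or normalized-exact) duplicates; "similar" documents still need a comparison against what is already in the index.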

Mongo REST interface and full data import

2011-04-04 Thread andrew_s
Hi everyone, I'm trying to make a simple data import from MongoDB into Solr using the REST interface. As a test example I've created schema.xml like: <?xml version="1.0" ?> isbn title and data-import.xml as:

RE: Using MLT feature

2011-04-04 Thread Frederico Azeiteiro
Hi, The idea is not to index a document if something similar (headline+bodytext) exists for the exact same medianame. Do you mean I would need to index the doc first (maybe in a temp index) and then use the MLT feature to find similar docs before adding to the final index? Thanks, Frederico -Original

Re: Spellchecking Escaped Queries

2011-04-04 Thread Colin Vipurs
Thanks Chris, The field used for indexing and spellcheck is the same and is configured like this: <fieldType name="title" stored="true" indexed="true" multiValued="false" class="solr.TextField"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter

Re: Spellchecking Escaped Queries

2011-04-04 Thread Colin Vipurs
Apologies for the duplicate post. I'm having Evolution problems. Thanks Chris, The field used for indexing and spellcheck is the same and is configured like this: <fieldType name="title" stored="true" indexed="true" multiValued="false" class="solr.TextField"> <analyzer> <tokenizer

Re: Using MLT feature

2011-04-04 Thread Markus Jelsma
http://wiki.apache.org/solr/Deduplication On Monday 04 April 2011 11:34:52 Frederico Azeiteiro wrote: Hi, The idea is not to index a document if something similar (headline+bodytext) exists for the exact same medianame. Do you mean I would need to index the doc first (maybe in a temp index) and then use

help with Jetty log message

2011-04-04 Thread Matthieu Huin
Greetings all, I am currently using solr as the backend behind a log aggregation and search system my team is developing. All was well and good until I noticed a test server crashing quite unexpectedly. We'd like to dig more into the incident but none of us has much experience with Jetty

Re: help with Jetty log message

2011-04-04 Thread Upayavira
This is not Solr crashing, per se; it is your JVM. I personally haven't had much success debugging these kinds of failures. See whether it happens again, and if it does, try updating your JVM, switching to another, etc. Anyone have better advice? Upayavira On Mon, 04 Apr 2011 11:59

RE: Using MLT feature

2011-04-04 Thread Frederico Azeiteiro
Thank you Markus, it looks great. But the wiki is not very detailed on this. Do you mean if I: 1. Create: <updateRequestProcessorChain name="dedupe"> <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> <bool name="enabled">true</bool> <bool

Re: Mongo REST interface and full data import

2011-04-04 Thread Erick Erickson
I'm having trouble seeing your schema files, etc. I don't know if gmail is stripping this on my end or whether your e-mail is stripping it on upload, anyone else seeing this? But to your question, what version are you using? From Solr3.1 http://wiki.apache.org/solr/Solr3.1 is the first version

RE: Faceting on multivalued field

2011-04-04 Thread Jonathan Rochkind
Is there a kind of function query that can count number of values in a multi-valued field on a given document? I do not know. From: Erick Erickson [erickerick...@gmail.com] Sent: Sunday, April 03, 2011 10:37 PM To: solr-user@lucene.apache.org Subject:

Re: Solrj performance bottleneck

2011-04-04 Thread rahul
Hi All, I just want to share some findings which clearly identified the reason for our performance bottleneck. We had looked into several areas for optimization, mostly directed at Solr configurations, stored fields, highlighting, JVM, OS cache etc. But it turned out that the main culprit was

RE: Using MLT feature

2011-04-04 Thread Frederico Azeiteiro
Hi again, I guess I was wrong in my earlier post... There's no automated way to avoid indexing the duplicate doc. I guess I have 2 options: 1. Create a temp index with signatures and then have an app that, for each new doc, verifies if the sig exists on my primary index. If not, add the

Re: Using MLT feature

2011-04-04 Thread Markus Jelsma
Hi again, I guess I was wrong in my earlier post... There's no automated way to avoid indexing the duplicate doc. Yes there is: try setting overwriteDupes to true, and documents yielding the same signature will be overwritten. If you need both fuzzy and exact matching then add a

Re: Solrj performance bottleneck

2011-04-04 Thread openvictor Open
Dear Rahul, Stefan has the right solution. The autosuggest must be checked both from JavaScript and your backend. For JavaScript there are some really nice tools to do that, such as jQuery, which implements an auto-suggest with a tunable delay. It also has highlighting, you can add additional

dismax boost query not useful?

2011-04-04 Thread Smiley, David W.
As I was reviewing the boosting capabilities of the dismax and edismax query parsers, it's not clear to me that the boost query has much use. The value of boost functions, particularly with a multiplied boost that edismax supports, is very clear -- there are a variety of uses. But I can't think

Problems indexing very large set of documents

2011-04-04 Thread Brandon Waterloo
Hey everybody, I've been running into some issues indexing a very large set of documents. There are about 4000 PDF files, ranging in size from 160MB to 10KB. Obviously this is a big task for Solr. I have a PHP script that iterates over the directory and uses PHP cURL to query Solr to index

Re: Problems indexing very large set of documents

2011-04-04 Thread Anuj Kumar
This is related to Apache TIKA. Which version are you using? Please see this thread for more details: http://lucene.472066.n3.nabble.com/PDF-parser-exception-td644885.html Hope it helps. Regards, Anuj On Mon, Apr 4, 2011 at

RE: Problems indexing very large set of documents

2011-04-04 Thread Brandon Waterloo
Looks like I'm using Tika 0.4: apache-solr-1.4.1/contrib/extraction/lib/tika-core-0.4.jar .../tika-parsers-0.4.jar ~Brandon Waterloo From: Anuj Kumar [anujs...@gmail.com] Sent: Monday, April 04, 2011 2:12 PM To: solr-user@lucene.apache.org Cc: Brandon

Re: Problems indexing very large set of documents

2011-04-04 Thread Anuj Kumar
In the log messages are you able to locate the file at which it fails? Looks like TIKA is unable to parse one of your PDF files for the details. We need to hunt that one out. Regards, Anuj On Mon, Apr 4, 2011 at 11:57 PM, Brandon Waterloo brandon.water...@matrix.msu.edu wrote: Looks like I'm

Re: Matching the beginning of a word within a term

2011-04-04 Thread Brian Lamb
Thank you both for your replies. It looks like EdgeNGramFilter will do the job nicely. Time to reindex...again. On Fri, Apr 1, 2011 at 8:31 AM, Jan Høydahl jan@cominvent.com wrote: Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory Don't know
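To illustrate why an edge n-gram filter helps with matching the beginning of a word, here is a small sketch of what such a filter emits for one token. It is a plain-Python illustration, not Solr's implementation; the function name and the `min_gram`/`max_gram` parameters are assumptions chosen for the example.

```python
def edge_ngrams(token, min_gram=1, max_gram=5):
    """Return the leading substrings of token with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

# Each prefix becomes its own indexed term, so a query for a prefix
# such as "sea" can match the original word exactly.
prefixes = edge_ngrams("search", min_gram=2, max_gram=4)
print(prefixes)
```

Because the prefixes are produced at index time, existing documents have to be reindexed before they become searchable this way.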

Re: Matching on a multi valued field

2011-04-04 Thread Brian Lamb
I just noticed Juan's response and I find that I am encountering that very issue in a few cases. Boosting is a good way to put the more relevant results at the top, but is it possible to have only the correct results returned? On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb

Re: Matching on a multi valued field

2011-04-04 Thread Juan Pablo Mora
I have not found any solution to this. The only option is to denormalize your multivalued field into several docs, each with a single-valued field. Try ComplexPhraseQueryParser (https://issues.apache.org/jira/browse/SOLR-1604) if you are using Solr 1.4. On 04/04/2011, at 21:21, Brian Lamb
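The denormalization suggested here can be sketched as a preprocessing step before indexing. This is only an illustration under assumed names: the `denormalize` function, the `parent_id` field, and the `<id>_<position>` id scheme are hypothetical and not part of Solr.

```python
def denormalize(doc, field, id_field="id"):
    """Split one document with a multivalued field into one document per value."""
    flattened = []
    for position, value in enumerate(doc.get(field, [])):
        # Copy every other field unchanged.
        child = {k: v for k, v in doc.items() if k != field}
        # Hypothetical id scheme: original id plus the value's position.
        child[id_field] = f"{doc[id_field]}_{position}"
        # Hypothetical back-reference so results can be grouped again.
        child["parent_id"] = doc[id_field]
        child[field] = value
        flattened.append(child)
    return flattened

docs = denormalize({"id": "42", "title": "x", "names": ["a b", "c d"]}, "names")
```

With one value per document, a phrase or multi-term match can no longer span two different values of the original field, at the cost of a larger index and duplicate parents in the results.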

RE: Using the Data Import Handler with SQLite

2011-04-04 Thread Zac Smith
I was able to resolve this issue by using a different jdbc driver: http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC -Original Message- From: Zac Smith [mailto:z...@trinkit.com] Sent: Friday, April 01, 2011 5:56 PM To: solr-user@lucene.apache.org Subject: Using the Data Import Handler

Re: does overwrite=false work with json

2011-04-04 Thread David Murphy
I tried it with the example JSON documents, and even if I add overwrite=false to the URL, it still overwrites. Do this twice: curl 'http://localhost:8983/solr/update/json?commit=true&overwrite=false' --data-binary @books.json -H 'Content-type:application/json' Then do this query: curl
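Query parameters on the update URL have to be separated by `&`. A small sketch of building such a URL programmatically, so the separator cannot be dropped by hand; the helper name `build_update_url` is an assumption for this example, and the base URL is the one from the message.

```python
from urllib.parse import urlencode

def build_update_url(base="http://localhost:8983/solr/update/json", **params):
    """Append query parameters to base, rendering booleans as true/false."""
    rendered = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    query = urlencode(rendered)
    return f"{base}?{query}" if query else base

url = build_update_url(commit=True, overwrite=False)
print(url)
```

This only constructs the request URL; whether the JSON update handler honors `overwrite=false` is the open question of the thread.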

Re: Question about http://wiki.apache.org/solr/Deduplication

2011-04-04 Thread eks dev
Thanks Hoss, Externalizing this part is exactly the path we are exploring now, not only for this reason. We already started testing Hadoop SequenceFile as a write-ahead log for updates/deletes. SequenceFile supports append now (simply great!). It was a pain to have to add Hadoop into the mix for

Re: Mongo REST interface and full data import

2011-04-04 Thread andrew_s
Sorry for the mistake with the Solr version ... I'm using Solr 3.1 -- View this message in context: http://lucene.472066.n3.nabble.com/Mongo-REST-interface-and-full-data-import-tp2774479p2777319.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Matching on a multi valued field

2011-04-04 Thread Jonathan Rochkind
On 4/4/2011 3:21 PM, Brian Lamb wrote: I just noticed Juan's response and I find that I am encountering that very issue in a few cases. Boosting is a good way to put the more relevant results at the top, but is it possible to have only the correct results returned? Only what's already been said

Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-04 Thread Jens Mueller
Hello Experts, I am a Solr newbie but have read quite a lot of docs. I still do not understand what would be the best way to set up very large scale deployments: Goal (theoretical): A.) Index-Size: 1 Petabyte (1 Document is about 5 KB in Size) B) Queries: 10 queries per second C)
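The stated goal implies a document count that can be worked out directly. The sketch below assumes decimal units (1 PB = 10^15 bytes, 5 KB = 5,000 bytes), which the message does not specify, and the per-shard capacity used at the end is a purely hypothetical placeholder, not a Solr recommendation.

```python
def document_count(index_bytes, doc_bytes):
    """Number of documents that fit in an index of the given size."""
    return index_bytes // doc_bytes

def shards_needed(total_docs, docs_per_shard):
    """Shards required when each shard holds at most docs_per_shard documents."""
    return -(-total_docs // docs_per_shard)  # ceiling division

PETABYTE = 10**15      # assumption: decimal petabyte
DOC_SIZE = 5 * 10**3   # assumption: decimal kilobytes

total_docs = document_count(PETABYTE, DOC_SIZE)
print(total_docs)  # 200000000000, i.e. 2 x 10^11 documents

# Hypothetical capacity of 100 million documents per shard.
print(shards_needed(total_docs, 100_000_000))
```

Under these assumptions the index holds about 200 billion documents, so the design question is dominated by how the index is partitioned rather than by the modest rate of 10 queries per second.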