Re: Customizing Solr to handle Leading Wildcard queries

2009-01-28 Thread Neal Richter
Leading wildcard search is called grep ;-) Ditto on the indexing-reversed-words suggestion. Can you create a second field in Solr that contains /only/ the words from the fields you care to reverse? Once you do that, you could pre-process the query, look for leading wildcards, and address those
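The pre-processing idea above can be sketched as plain string handling. This assumes a hypothetical field `text_rev` that holds each word reversed at index time (e.g. "creation" indexed as "noitaerc"):

```java
public class LeadingWildcardRewriter {
    // Rewrite a leading-wildcard term like "*tion" into a trailing-wildcard
    // query against the reversed field: "text_rev:noit*".
    public static String rewriteTerm(String term) {
        if (term.startsWith("*") && !term.endsWith("*")) {
            String body = term.substring(1);
            String reversed = new StringBuilder(body).reverse().toString();
            return "text_rev:" + reversed + "*";
        }
        return term; // no leading wildcard: leave the term untouched
    }

    public static void main(String[] args) {
        System.out.println(rewriteTerm("*tion")); // text_rev:noit*
        System.out.println(rewriteTerm("solr"));  // solr
    }
}
```

A real implementation would apply this per-term while parsing the user's query string before handing it to Solr.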

Re: Text classification with Solr

2009-01-28 Thread Hannes Carl Meyer
From my past projects, our Lucene classification corpus looked like this: 0|document text...|categoryA 1|document text...|categoryB 2|document text...|categoryA 3|document text...|categoryA ... 800|document text...|categoryC With the faceting capabilities of Solr it is now possible to design
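A minimal sketch of the facet-based idea: index each labelled training document with its category, then search with the text of a new document and facet on the category field; the dominant facet count suggests a label. The field name `category` is an assumption:

```
q=(text of the document to classify)&rows=0&facet=true&facet.field=category
```

This is a rough nearest-neighbour-style heuristic, not a trained classifier; the facet counts are taken over all matching training documents.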

Re: solrj delete by Id problem

2009-01-28 Thread Parisa
I should say that we also have this problem when we commit with waitFlush=true and waitSearcher=true, because it again closes the old searcher and opens a new one, so it has a warm-up process with the queryResultCache. Besides, I need to commit with waitFlush=false and waitSearcher=false to

Re: solrj delete by Id problem

2009-01-28 Thread Shalin Shekhar Mangar
On Wed, Jan 28, 2009 at 4:29 PM, Parisa paris...@gmail.com wrote: I should say that we also have this problem when we commit with waitFlush=true and waitSearcher=true, because it again closes the old searcher and opens a new one, so it has a warm-up process with the queryResultCache.

Re: solrj delete by Id problem

2009-01-28 Thread Parisa
I know that I can see the search result after the commit and it is OK. I can disable the queryResultCache and the problem will be fixed, but I need the queryResultCache because my index size is big and I need good performance. So I am trying to find how to fix the bug, or maybe the Solr
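The commit flags under discussion map directly onto the SolrJ API. A sketch, assuming a Solr instance at the URL below and the SolrJ 1.3 client jars on the classpath (this needs a running server, so it is illustrative only):

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteAndCommit {
    public static void main(String[] args) throws Exception {
        // URL and document id are assumptions for the sketch.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.deleteById("42");
        // waitFlush=false, waitSearcher=false: the call returns without
        // blocking until the new searcher (and its cache warm-up) is ready.
        server.commit(false, false);
    }
}
```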

Changing to multicore

2009-01-28 Thread Jeff Newburn
We are moving from a single core to multicore. We have a few servers that we want to migrate one at a time to ensure that each one functions. This process is proving difficult, as there is no default core to allow the application to talk to the Solr servers uniformly (i.e. without a core name during

Joining Solr Indexes

2009-01-28 Thread Jae Joo
Hi, Is there any way to join multiple indexes in Solr? Thanks, Jae

Re: multilanguage prototype

2009-01-28 Thread Jerven Bolleman
Hi, Your problem seems to be lower-level than the Solr code. You are sending an XML request that contains a character that is illegal per the XML spec. You should strip these characters out of the data that you send, or turn off the XML validation (not recommended, because of all kinds of risks). See
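Stripping the offending characters before building the request can be done with a small filter over the XML 1.0 legal character ranges (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). A minimal sketch:

```java
public class XmlCleaner {
    // Removes characters that are not legal in an XML 1.0 document.
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (valid) out.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Control characters U+0000 and U+0008 are dropped; tabs survive.
        System.out.println(stripInvalidXmlChars("ok\u0000\u0008text")); // oktext
    }
}
```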

Re: Changing to multicore

2009-01-28 Thread Bryan Talbot
I would think that using a servlet filter to rewrite the URL should be pretty straightforward. You could write your own or use a tool like http://tuckey.org/urlrewrite/ and just configure that. Using something like this, I think the upgrade procedure could be: - install rewrite filter to
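With the UrlRewriteFilter suggested above, mapping legacy core-less URLs onto a named core might look like the following urlrewrite.xml sketch; the core name `core0` and the path pattern are assumptions:

```xml
<!-- urlrewrite.xml sketch: route old single-core request paths
     to an explicit core so clients need not change. -->
<urlrewrite>
  <rule>
    <from>^/solr/(select|update)(.*)$</from>
    <to>/solr/core0/$1$2</to>
  </rule>
</urlrewrite>
```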

Re: [dummy question] applying patch

2009-01-28 Thread Mark Miller
surfer10 wrote: I'm a little bit of a noob with the Java compiler, so could you please tell me what tools are used to apply patch SOLR-236 (field grouping)? Does it need to be applied to current solr-1.3 (and nightly builds of 1.4), or is it already in the box? What batch file stands for Solr compilation in its

Re: Changing to multicore

2009-01-28 Thread Jeff Newburn
Tried that. Basically, Solr really didn't want to do the internal rewrite, so essentially we would have to rewrite with a full redirect and then change the SolrJ source to allow it to follow the redirect. We are going with an external rewriter. However, the seemingly easiest way would be to

RE: Performance dead-zone due to garbage collection

2009-01-28 Thread Renaud Waldura
I'm coming in late on this thread, but I want to recommend the YourKit Profiler product. It helped me track a performance problem similar to what you describe. I had been futzing with GC logging etc. for days before YourKit pinpointed the issue within minutes. http://www.yourkit.com/ (My problem

Re: index size tripled during optimization

2009-01-28 Thread Qingdi
Hi Ryuuichi, Thanks for your quick reply. I checked the setting of useCompoundFile in solrconfig.xml, and the value is 'false'. Here is what is in our solrconfig.xml: === <indexDefaults> <!-- Values here affect all index writers

Re: index size tripled during optimization

2009-01-28 Thread Shalin Shekhar Mangar
Does your index stay at triple size after optimization? It is normal for Lucene to use 2x or up to 3x disk space during optimization, but it should fall back to the normal numbers once optimization completes and unused segments are cleaned up by the index deletion policy. If you search for threads

Re: Joining Solr Indexes

2009-01-28 Thread Sameer Maggon
IndexMergeTool - http://wiki.apache.org/solr/MergingSolrIndexes Sameer. -- http://www.productification.com On Wed, Jan 28, 2009 at 7:30 AM, Jae Joo jae...@gmail.com wrote: Hi, Is there any way to join multiple indexes in Solr? Thanks, Jae
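For reference, IndexMergeTool is invoked from the command line roughly as below; the jar names and paths are assumptions and depend on your Lucene version, and the indexes being merged must share a compatible schema:

```shell
# Merge two existing Lucene indexes into a new one (first path is the target).
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.misc.IndexMergeTool \
  /path/to/merged-index /path/to/index1 /path/to/index2
```

Note this is a raw index merge, not a relational join: documents are combined side by side, and duplicate unique keys are not reconciled.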

Re: Highlighting does not work?

2009-01-28 Thread Mike Klaas
Well, both pages I listed are in the search results :). But I agree that it isn't obvious to find, and that it should be improved. (The Wiki is a community-created site which anyone can contribute to, incidentally.) cheers, -Mike On 28-Jan-09, at 1:11 AM, Jarek Zgoda wrote: I swear I

Re: query with stemming, prefix and fuzzy?

2009-01-28 Thread Shalin Shekhar Mangar
On Thu, Jan 29, 2009 at 12:39 AM, Gert Brinkmann g...@netcologne.de wrote: Hello again, is there nobody who could help me with this? Or is it an FAQ and my questions are dumb somehow? Maybe I should try to shorten the questions: ;) Quite the opposite, you are actually working with some

solr as the data store

2009-01-28 Thread Ian Connor
Hi All, Is anyone using Solr (and thus the Lucene index) as their database store? Up to now, we have been using a database to build Solr from. However, given that Lucene already keeps the stored data intact, and that rebuilding from Solr to Solr can be very fast, the need for a separate

Re: solr as the data store

2009-01-28 Thread Matthew Runo
One thing to keep in mind is that things like joins are impossible in Solr, but easy in a database. So if you ever need to do stuff like run reports, you're probably better off with a database to query on, unless you cover your bases very well in the Solr index. Thanks for your time!

Re: solr as the data store

2009-01-28 Thread Otis Gospodnetic
This is perfectly fine. Of course, you lose any relational model; if you don't have one or don't need one, why not? It used to be the case that backups of live Lucene indices were hard, so people preferred having an RDBMS be the primary data source, the one they know how to back up and maintain

Re: Customizing Solr to handle Leading Wildcard queries

2009-01-28 Thread Otis Gospodnetic
Yeah, I think the begin/end chars are very helpful here. But I like the suggestion of figuring out which words really need to support leading wildcards...although that's typically impossible to predict, since people are typically free to enter whatever queries they feel like. Otis --

Re: Tools for Managing Synonyms, Elevate, etc.

2009-01-28 Thread Otis Gospodnetic
Mark, I am not aware of anyone open-sourcing such tools. But note that changing the files with a GUI is easy (editor + scp?). What makes things more complicated is the need to make Solr reload those files and, in some cases, changes really require a full index rebuild. Otis -- Sematext

Re: Indexing documents in multiple languages

2009-01-28 Thread Otis Gospodnetic
Alejandro, What you really want to do is identify the language of the email, store that in the index and apply the appropriate analyzer. At query time you really want to know the language of the query (either by detecting it or asking the user or ...) Otis -- Sematext -- http://sematext.com/

RE: solr as the data store

2009-01-28 Thread Feak, Todd
Although it's unlikely that you will need to rebuild from scratch, you might want to fully understand the cost of recovery if you *do* have to. If it's incredibly expensive (time or money), you need to keep that in mind. -Todd -Original Message- From: Ian Connor

Re: solr as the data store

2009-01-28 Thread Ian Connor
I am planning that, with backups, the recovery will only be incremental. Is there an internal field to know when the last document hit the index, or is it best to build your own created_at-type field to know when you need to rebuild from? After the backup is restored, this field could be read and

Re: solr as the data store

2009-01-28 Thread Otis Gospodnetic
There is no existing internal field like that. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Ian Connor ian.con...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, January 28, 2009 4:59:28 PM Subject: Re: solr as the data
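Since there is no built-in field, a home-grown timestamp can be declared in schema.xml; `default="NOW"` fills the value at index time. The field name is an assumption:

```xml
<!-- schema.xml sketch: record when each document was indexed. -->
<field name="created_at" type="date" indexed="true" stored="true" default="NOW"/>
```

After restoring a backup, sorting on this field descending (rows=1) yields the last-indexed document, which tells you where incremental rebuilding should resume.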

multilanguage + howto search in all languages?

2009-01-28 Thread Julian Davchev
Hi, I currently have two indexes with Solr: one for the English version and one for the German version. They use the English/German2 snowball factories respectively. Right now, depending on which language the website is currently in, I query the corresponding index. There is a requirement, though, that stuff is found

Re: Help with Solr 1.3 lockups?

2009-01-28 Thread Jerome L Quinn
Mark Miller markrmil...@gmail.com wrote on 01/26/2009 04:30:00 PM: Just a point or two I missed: with such a large index (not doc-size large, but content-wise), I imagine a lot of your 16GB of RAM is being used by the system disk cache - which is good. Another reason you don't want to give too

Pagination by facet?

2009-01-28 Thread Bruno Aranda
Hi, bear with me as I am new to Solr. I have a requirement in an application where I need to show a list of results by groups. For instance, each document in my index corresponds to a person, and they have a family name. I have hundreds of thousands of records (persons). What I would like to do is
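One way to approximate paging over groups with plain faceting is to page through the facet values themselves using `facet.limit` and `facet.offset`, then fetch each group's members with a follow-up filter query. The field name `family_name` is an assumption:

```
q=*:*&rows=0&facet=true&facet.field=family_name&facet.limit=20&facet.offset=0
```

Each returned facet value (a family name plus its count) becomes one "group" on the page; a second query with `fq=family_name:Smith` retrieves that group's people.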

Re: Help with Solr 1.3 lockups?

2009-01-28 Thread Mark Miller
org/apache/catalina/connector/Connector java/util/WeakHashMap$Entry 399,913,269 bytes
org/apache/catalina/connector/Connector java/lang/Object[] 197,256,078 bytes
org/apache/lucene/search/ExtendedFieldCache java/util/WeakHashMap$Entry[] 177,893,021 bytes

DIH handling of missing files

2009-01-28 Thread Nathan Adams
I am constructing documents from a JDBC datasource and an HTTP datasource (see the data-config file below). My problem is that I cannot know whether a particular HTTP URL is available at index time, so I need DIH to continue processing even if the HTTP location returns a 404. onError=continue does not

Re: solr as the data store

2009-01-28 Thread Erick Erickson
But do note that there's also no requirement that all documents have the same fields. So you could consider storing a special meta document that had *no* fields in common with any other document that records whatever information you want about the current state of the index. Best Erick On Wed,

Re: multilanguage + howto search in all languages?

2009-01-28 Thread Erick Erickson
I'm not entirely sure about the fine points, but consider the filters that are available that fold all the diacritics into their low-ASCII equivalents. Perhaps using that filter at *both* index and search time on the English index would do the trick. In your example, both would be 'munchen'.
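The folding-filter suggestion corresponds to a field type along these lines in a Solr 1.3-era schema.xml; the field-type name is an assumption:

```xml
<!-- Fold diacritics at both index and query time so that
     "München" and "munchen" produce the same token. -->
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same analyzer chain runs on queries and on documents, either spelling matches either form in the index.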

Re: multilanguage + howto search in all languages?

2009-01-28 Thread Walter Underwood
Duh. Four cases. For extra credit, what language is wunder in? wunder On 1/28/09 5:12 PM, Walter Underwood wunderw...@netflix.com wrote: I've done this. There are five cases for the tokens in the search index: 1. Tokens that are unique after stemming (this is good). 2. Tokens that are

Re: DIH handling of missing files

2009-01-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
onError=continue must help. Which version of DIH are you using? onError is a Solr 1.4 feature. --Noble On Thu, Jan 29, 2009 at 5:04 AM, Nathan Adams na...@umich.edu wrote: I am constructing documents from a JDBC datasource and an HTTP datasource (see data-config file below). My problem is that
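In Solr 1.4, the attribute goes on the entity in data-config.xml; entity, datasource, and field names below are assumptions for illustration:

```xml
<!-- data-config.xml sketch: keep indexing even when the HTTP
     source for one record fails (e.g. returns a 404). -->
<entity name="page"
        processor="XPathEntityProcessor"
        dataSource="web"
        url="${record.url}"
        forEach="/html"
        onError="continue">
  <field column="body" xpath="/html/body"/>
</entity>
```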

newbie question --- multiple schemas

2009-01-28 Thread Cheng Zhang
Hello, Is it possible to define more than one schema? I'm reading the example schema.xml, and it seems that we can only define one schema. What if I want to define one schema for document type A and another schema for document type B? Thanks a lot, Kevin