Distributed field collapsing

2011-02-08 Thread V, Sriram
Hi, is there any patch available for distributed field collapsing? I need it in my app. Any ideas, please add... Regards, V. Sriram

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Anithya
Hi Steve, thanks for the reply. I didn't understand which file I need to rename. I'm working on Solr 1.4. The file in the examples/solr/conf directory is mapping-ISOLatin1Accent.txt. The schema.xml has the following commented-out entry: charFilter class=solr.MappingCharFilterFactory

does copyField recurse?

2011-02-08 Thread Paul Libbrecht
Hello list, if I have a field title that is copied to text, and a field text that is copied to text.stemmed, am I going to get the copy from the field title to the field text.stemmed, or should I include it explicitly? Thanks in advance, Paul

Re: does copyField recurse?

2011-02-08 Thread Markus Jelsma
Field values are copied before being analyzed. There is no cascading of analyzers.

Re: does copyField recurse?

2011-02-08 Thread Paul Libbrecht
And there is no cascading of copying either (as I found by experimenting). I just added this to the wiki at http://wiki.apache.org/solr/SchemaXml#Copy_Fields, now that it is confirmed. Paul
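To make the behaviour described above concrete, here is a minimal Java model of single-pass copyField semantics. It is not Solr's code; the class `CopyFieldModel` and its `Rule` record are hypothetical names. Each rule reads only the values submitted in the incoming document, never the output of another copy, so title reaches text but not text.stemmed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of copyField semantics; not Solr's DocumentBuilder. */
final class CopyFieldModel {
    /** One copyField declaration: source field name -> dest field name. */
    record Rule(String source, String dest) {}

    /**
     * Applies every rule in a single pass. Each rule reads only the values
     * submitted in the input document, so a destination field is never used
     * as the source of another copy (no cascading).
     */
    static Map<String, List<String>> apply(Map<String, List<String>> input, List<Rule> rules) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        input.forEach((field, values) -> out.put(field, new ArrayList<>(values)));
        for (Rule rule : rules) {
            List<String> values = input.getOrDefault(rule.source(), List.of());
            if (!values.isEmpty()) {
                out.computeIfAbsent(rule.dest(), k -> new ArrayList<>()).addAll(values);
            }
        }
        return out;
    }
}
```

In this model, getting the title into text.stemmed requires declaring a title -> text.stemmed rule explicitly, which matches what Paul observed.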

Re: Solr n00b question: writing a custom QueryComponent

2011-02-08 Thread Upayavira
I'm still not quite clear on what you are attempting to achieve, and even less on why you need to extend Solr rather than just wrap it. You have data with title, description and content fields. You make no mention of an ID field. Surely, if you want to store some in MySQL and some in Solr, you could

Re: q.alt=*:* for every request?

2011-02-08 Thread Markus Jelsma
I'm not sure what you mean, but you may be looking for debugQuery=true? On Tuesday 08 February 2011 08:28:12, Paul Libbrecht wrote: To be able to see this well, it would be lovely to have a switch that activates logging of the query expansion result. The Dismax QParserPlugin is

Re: Solr n00b question: writing a custom QueryComponent

2011-02-08 Thread Ishwar
Hi Upayavira, apologies for the lack of clarity in the mail. The feeds have the following fields: id, url, title, content, refererurl, createdDate, author, etc. We need search functionality on title and content. As mentioned earlier, storing title and content in Solr takes up a lot of space.

Re: Http Connection is hanging while deleteByQuery

2011-02-08 Thread shan2812
Hi, in the end, the migration to Solr 1.4.1 did solve this issue :-). Cheers -- View this message in context: http://lucene.472066.n3.nabble.com/Http-Connection-is-hanging-while-deleteByQuery-tp2367405p2451214.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr n00b question: writing a custom QueryComponent

2011-02-08 Thread Upayavira
The conventional way to do it would be to index your title and content fields in Solr, along with the ID that identifies the document. You could search against Solr and return just the ID field; your 'client code' would then match that up with the title/content data from your database. And yes,
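A minimal sketch of the client-side join described above, assuming the Solr request has already returned only IDs (for example via fl=id) in relevance order, and that the database is stood in for by a Map. The names `IdJoin`, `Row` and `hydrate` are hypothetical. The key point is that ordering comes from Solr while the stored content comes from the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IdJoin {
    /** Hypothetical row type standing in for a database record. */
    record Row(String id, String title, String content) {}

    /**
     * Walks the IDs in the order Solr ranked them and attaches the database row
     * for each one. IDs with no row (e.g. deleted since indexing) are skipped.
     */
    static List<Row> hydrate(List<String> rankedIds, Map<String, Row> db) {
        List<Row> results = new ArrayList<>();
        for (String id : rankedIds) {
            Row row = db.get(id);
            if (row != null) {
                results.add(row);
            }
        }
        return results;
    }
}
```

Against a real database you would likely fetch all IDs of a result page in one batched query (for example a SQL IN clause) and then reorder, rather than doing one lookup per hit.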

Re: Search for FirstName with first Char uppercase followed by * not giving result; getting result with all lowercase and *

2011-02-08 Thread Erick Erickson
What you are missing is that the analysis page shows what happens when the text is run through analysis. Wildcards ARE NOT ANALYZED, so you cannot assume that the analysis page shows you what the search terms are in that case. Regardless of whether george* is shown in the analysis page, the term
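Because wildcard terms bypass the analyzer in Solr versions of this era, a common workaround is to lowercase the prefix on the client before appending the wildcard. A minimal sketch, assuming the field was only lowercased at index time (no stemming or other changes); the class name `WildcardPrefix` and the field name `firstName` are hypothetical, and escaping of query-syntax characters is not handled here.

```java
import java.util.Locale;

final class WildcardPrefix {
    /**
     * Builds field:prefix* with the prefix lowercased by hand, since the
     * wildcard term will not be run through the field's analyzer.
     */
    static String build(String field, String userPrefix) {
        String prefix = userPrefix.trim().toLowerCase(Locale.ROOT);
        return field + ":" + prefix + "*";
    }
}
```

For example, `WildcardPrefix.build("firstName", "George")` yields `firstName:george*`, which can match terms indexed as lowercase.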

Re: Solr n00b question: writing a custom QueryComponent

2011-02-08 Thread Ishwar
Thanks for the detailed reply, Upayavira. To answer your question, our index is growing much faster than expected and our performance is grinding to a halt. Currently, it has over 150 million records. We're planning to split the index into multiple shards very soon and move the index creation

Re: Solr n00b question: writing a custom QueryComponent

2011-02-08 Thread Edoardo Tosca
Hi, I agree with Upayavira: it's probably better to create an external app that retrieves content from a DB. Anyway, if I am not wrong, finishStage is a method called by the coordinator if you have a distributed search. If your Solr is on a single machine, every component should implement only

RequestHandler code within 1.4.0 dist

2011-02-08 Thread McGibbney, Lewis John
Hello list, I have been searching through the 1.4.0 source for a standard requestHandler plug-in example. I understand that for my purposes, extending RequestHandlerBase is a starting point; however, I was wondering if there are any examples of plug-ins which I can view, such as those contained

difference between filter_queries and parsed_filter_queries

2011-02-08 Thread Bagesh Sharma
Hi everybody, could you please explain the difference between these two things? What processing is applied to filter_queries to generate parsed_filter_queries? Basically, when I search for a city with fq=city:'noida', filter_queries and parsed_filter_queries are both the same, 'noida'.

Re: difference between filter_queries and parsed_filter_queries

2011-02-08 Thread Markus Jelsma
Hi, parsed_filter_queries contains the value after it has passed through the analyzer. In this case it remains the same because it was already lowercased and no synonyms were used. You're also using single quotes; these have no special meaning, so you're searching for 'noida' in the first and
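Since single quotes carry no meaning in the Lucene query syntax, a quoted value needs double quotes. A minimal sketch of building such a filter query, with hypothetical names (`FilterQueries`, `phraseFilter`); it escapes backslashes and embedded double quotes for the query parser, but does not URL-encode the result for sending over HTTP.

```java
final class FilterQueries {
    /** Returns field:"value", escaping \ and " so the query parser sees one phrase. */
    static String phraseFilter(String field, String value) {
        StringBuilder sb = new StringBuilder(field).append(":\"");
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

With this, `phraseFilter("city", "noida")` produces `city:"noida"`, which the query parser treats as the term noida rather than 'noida'.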

General question about Solr Caches

2011-02-08 Thread Savvas-Andreas Moysidis
Hello, I am going through the wiki page on cache configuration (http://wiki.apache.org/solr/SolrCaching), and I have a question about the general cache architecture and implementation. In my understanding, the Current Index Searcher uses a cache instance, and when a New Index

Re: EdgeNgram Auto suggest - doubles ignore

2011-02-08 Thread johnnyisrael
Hi Erick, if you have time, can you please take a look and provide your comments or suggestions on this problem? Please let me know if you need any more information. Thanks, Johnny

Re: TermVector query using Solr Tutorial

2011-02-08 Thread Grant Ingersoll
Inline... On Feb 5, 2011, at 4:28 AM, Ryan Chan wrote: Hello all, I am following this tutorial: http://lucene.apache.org/solr/tutorial.html. I am playing with the TermVector; here are my steps: 1. Launch the example server: java -jar start.jar 2. Index monitor.xml: java -jar

Cache size

2011-02-08 Thread Mehdi Ben Haj Abbes
Hi folks, is there any way to know the size *in bytes* occupied by a cache (filter cache, doc cache ...)? I can't find such information on the stats page. Regards -- Mehdi BEN HAJ ABBES

Re: Cache size

2011-02-08 Thread Markus Jelsma
You can dump the heap and analyze it with a tool like jhat. IBM's heap analyzer is also a very good tool, and if I'm not mistaken, people also use one that comes with Eclipse.

Re: EdgeNgram Auto suggest - doubles ignore

2011-02-08 Thread Erick Erickson
I'm afraid I'll have to pass; I'm absolutely swamped at the moment. Perhaps someone else can pick it up. I will say that you should be getting terms back when you pre-lowercase them, so look in your index via the admin page or Luke to see if what's really in your index is what you think in the

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Steven A Rowe
Hi Anithya, Yes, that sounds right. You will want to edit mapping-FoldToASCII.txt, and my suggestion is that you rename mapping-FoldToASCII.txt to reflect your changes (for example, if your target language is German, you could rename it to mapping-German-FoldToASCII.txt); otherwise it would
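To illustrate what an edited, German-oriented mapping file would do, here is a simplified Java model of one-to-one character folding. It is not Solr's MappingCharFilter (which also supports multi-character sources and corrects token offsets), and the table is a hypothetical German subset, not the contents of mapping-FoldToASCII.txt; in the mapping file itself, each rule would be written in the `"source" => "target"` form. The class name `GermanFold` is illustrative only.

```java
import java.util.Map;

/** Simplified, hypothetical model of a German-specific character mapping. */
final class GermanFold {
    // One replacement per source character, like a "source" => "target" rule.
    private static final Map<Character, String> RULES = Map.of(
            '\u00E4', "ae", // a-umlaut
            '\u00F6', "oe", // o-umlaut
            '\u00FC', "ue", // u-umlaut
            '\u00C4', "Ae", // A-umlaut
            '\u00D6', "Oe", // O-umlaut
            '\u00DC', "Ue", // U-umlaut
            '\u00DF', "ss"); // sharp s

    /** Replaces each mapped character; all other characters pass through unchanged. */
    static String fold(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            sb.append(RULES.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }
}
```

Because each character has exactly one target, "mäcman" folds only to "maecman", which is the limitation raised later in this thread.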

Re: Separating Index Reader and Writer

2011-02-08 Thread Em
Just wanted to bump this topic. Regards. Em wrote: Hi Peter, I must jump into this discussion: from a logical point of view, what you are saying only makes sense if both instances do not run on the same machine, or at least not on the same drive. When both run on the same machine and the

Jira problem

2011-02-08 Thread Em
Hi list, I wanted to create a Jira issue for the CSVUpdateHandler topic I started a few days ago. However, I cannot create a Jira account: I do not receive any mail or anything like that. Are there any problems with Jira? Regards

Scoring: Precedent for a Rules-/Priority-based Approach?

2011-02-08 Thread Tavi Nathanson
Hey everyone, I have a question about Lucene/Solr scoring in general. There are many factors at play in the final score for each document, and very often one factor will completely dominate everything else when that may not be the intention. The question: might there be a way to enforce

Tokenization: How to Allow Multiple Strategies?

2011-02-08 Thread Tavi Nathanson
Hey everyone, tokenization seems inherently fuzzy and imprecise, yet Solr/Lucene does not appear to provide an easy mechanism to account for this fuzziness. Let's take an example where the document I'm indexing is "v1.1.0 mr. jones www.gmail.com". I may want to tokenize this as follows: [v1.1.0,

Re: Scoring: Precedent for a Rules-/Priority-based Approach?

2011-02-08 Thread Savvas-Andreas Moysidis
Hi Tavi, in my understanding, the scoring formula Lucene (and therefore Solr) uses is based on a mathematical model which is proven to work for general-purpose full-text searching. The real challenge, as you mention, comes when you need to achieve high-quality scoring based on the domain you are

Re: Scoring: Precedent for a Rules-/Priority-based Approach?

2011-02-08 Thread Em
Hi Tavi, could you please provide an example query for your problem and the debugQuery output? It confuses me that you write score(query apple) = max(score(field1:apple), score(field2:apple)). I think your problem could come from the norms of your request, but I am not sure. If you can, show
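For context, the max(...) form above is what Lucene's DisjunctionMaxQuery (the basis of the dismax parser) computes when its tie-breaker is 0. In general it scores a document as the best clause score plus tie times the sum of the other matching clause scores. A small Java sketch of that arithmetic only; the class name `DisMax` is hypothetical, and any numbers fed to it are arbitrary, not real Lucene scores.

```java
final class DisMax {
    /**
     * max(subScores) + tie * (sum(subScores) - max(subScores)).
     * subScores holds the scores of the clauses that matched this document.
     */
    static double combine(double[] subScores, double tie) {
        if (subScores.length == 0) {
            return 0.0; // no clause matched, so the document would not match at all
        }
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }
}
```

With tie = 0 only the best field counts; raising tie toward 1 moves the result toward a plain sum of the field scores.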

Re: HTTP ERROR 400 undefined field: *

2011-02-08 Thread Jed Glazner
So I re-indexed some of the content, but no dice. Per Hoss, I tried disabling the TVC and it worked great. We're not really using the TVC right now, since we decided to turn off highlighting for the moment, so this isn't a huge deal. I'll create a new Jira issue. FYI, here is my query

Re: Tokenization: How to Allow Multiple Strategies?

2011-02-08 Thread Em
Hi Tavi, if you want to use multiple tokenization strategies (different tokenizers, so to speak), you have to use different fieldTypes. Maybe you need to create your own tokenizer to do what you want, or a PatternTokenizer might help you. However, your examples for the different positions of

Re: HTTP ERROR 400 undefined field: *

2011-02-08 Thread Jed Glazner
Here is the ticket: https://issues.apache.org/jira/browse/SOLR-2352

relational db mapping for advanced search

2011-02-08 Thread Scott Yeadon
Hi, I was just after some advice on how to map some relational metadata to a Solr index. The web application I'm working on is based around people, and the searching is based around properties of these people. Several properties are more complex - for example, a person's occupations have place,

RE: relational db mapping for advanced search

2011-02-08 Thread Jonathan Rochkind
I have no great answer for you; to me this is a generally unanswered question, and this sort of thing is hard to do with Solr. I think you understand it properly. There ARE some interesting new features in trunk (not 1.4) that may be relevant, although from my perspective none of them provide

Re: relational db mapping for advanced search

2011-02-08 Thread Scott Yeadon
Yes, I saw something in the dev stream about compound types as well, which would also be useful (so in my example an occupation field could comprise multiple fields of different types), but these are up-and-coming features. I suspect using multiple document types is probably the best way for

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Anithya
Thanks for the help, Steve, it worked!!!

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Steven A Rowe
Hi Anithya, That's good to hear. Again, please consider donating your work: http://wiki.apache.org/solr/HowToContribute#Making_Changes. Steve

Re: How to search for special chars like ä from ae?

2011-02-08 Thread charan kumar
Hello, quick question on Solr replication: what effect does the index reload after a replication have on search requests? Can the server still respond to user queries with the old index, especially during the following phases of replication on slaves?

RE: How to search for special chars like ä from ae?

2011-02-08 Thread Robert Sandiford
So, how did you end up setting it up? In my reading of the thread, it seems you could have a search for 'mäcman' hit 'macman' or 'maecman', but not both, since it seems you could only map the ä to a single replacement. Or can it be mapped multiple times, generating multiple tokens?

Re: Solr n00b question: writing a custom QueryComponent

2011-02-08 Thread Upayavira
Your observation regarding optimisation is an interesting one; it does at least make sense that reducing the size of a segment will speed up optimisation and reduce the disk space needed. In a situation that had multiple shards, we had two 'rows', for redundancy purposes. In that situation, we

Re: How to search for special chars like ä from ae?

2011-02-08 Thread Erick Erickson
When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to, and your question is hidden in that thread and gets less

Re: Tokenization: How to Allow Multiple Strategies?

2011-02-08 Thread Tavi Nathanson
Thanks for the suggestions! Using a new field makes sense, except it would double the size of the index. I'd like to add additional terms, at my discretion, only when there's ambiguity. More specifically, do you know of any way to put multiple *token sets* at the same position of the same field?

Re: jndi datasource in dataimport

2011-02-08 Thread lee carroll
Hi, still no luck with this. Is the problem with the name attribute of the dataSource element in the data config? On 5 February 2011 10:48, lee carroll lee.a.carr...@googlemail.com wrote: Ah, should this work, or am I doing something obviously wrong in the config? dataSource

Re: Tokenization: How to Allow Multiple Strategies?

2011-02-08 Thread Erick Erickson
A couple of things... First, you haven't provided any evidence that increasing the index size is a concern. If your index isn't all that large, it really doesn't matter, and conserving index size may not be a concern. WordDelimiterFilterFactory (WDFF) will handle the use cases you outlined below, but

Does Distributed Search support {!boost }?

2011-02-08 Thread Andy
Is it possible to do a query like {!boost b=log(popularity)}foo over sharded indexes? I looked at the wiki on distributed search (http://wiki.apache.org/solr/DistributedSearch) and it has a list of components that are supported in distributed search. Just wondering what component does {!boost

Re: General question about Solr Caches

2011-02-08 Thread Chris Hostetter
: In my understanding, the Current Index Searcher uses a cache instance and when a New Index Searcher is registered a new cache instance is used which is also auto-warmed. However, what happens when the New Index Searcher is a view of an index which has been modified? If the entries

Re: jndi datasource in dataimport

2011-02-08 Thread Chris Hostetter
: It looks like you can use a JNDI datasource in the data import handler; however, I can't find any syntax for this. Where is the best place to look for this? (And can anyone confirm whether JNDI works in the DataImportHandler?) It's been a long time since I used JNDI on anything, and I've never tried it

[WKT] Spatial Searching

2011-02-08 Thread Adam Estrada
I just came across a ~nudge post over in the SIS list on what the status is for that project. This got me looking more into spatial mods with Solr 4.0. I found this enhancement in Jira: https://issues.apache.org/jira/browse/SOLR-2155. In this issue, David mentions that he's already integrated

Re: How to search for special chars like ä from ae?

2011-02-08 Thread charan kumar
Sorry for cross-posting, but that is the only way I could get my question posted. The Solr mailing server treats my question as spam. Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider

Re: [WKT] Spatial Searching

2011-02-08 Thread Mattmann, Chris A (388J)
+1 to David's patch from SOLR-2155. It would be great to implement. Great job using GDAL to convert the WKT, Adam! Cheers, Chris

Re: Solr n00b question: writing a custom QueryComponent

2011-02-08 Thread Ishwar
In the situation that you explained, I'm assuming one of the rows is the master and the other is the slave. How did you continue feeding documents while the master was down for optimisation? And thanks for the link to MultiPassIndexSplitter. I shall check it out. -- Thanks, Ishwar Just

Help migrating from Lucene

2011-02-08 Thread Todd Nine
Hey guys, we're migrating from Lucene to Solr. So far the migration has been smooth; however, there is one feature I'm having trouble adapting. Our calls to our indexing service are defined in a central interface. Here is an example of a query executed from a programmatically constructed

Re: Solr n00b question: writing a custom QueryComponent

2011-02-08 Thread Upayavira
Actually, in that situation, we indexed twice, to both, so there was no master and no slave. Our testing showed that search was not slowed down unduly by indexing. Upayavira