Re: Stored hierarchical data in Solr

2013-01-16 Thread Toke Eskildsen
On Tue, 2013-01-15 at 18:02 +0100, Nicholas Ding wrote: I'm thinking of storing a hierarchical data structure in Solr. I know I have to flatten the structure into a form like A_B_C, but is it possible to extend Solr to support hierarchical data? You need to be more specific here. What is it you're trying

Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-16 Thread Mikhail Khludnev
Mark, here is the issue: https://issues.apache.org/jira/browse/SOLR-3284 ConcurrentUpdateSolrServer queues updates on the SolrJ side, not on the server side. The Solr server processes a number of updates simultaneously, so if your servlet container's threads are unlimited, this can potentially lead to OOM. On

Re: Is *:* the only possible search with * on the left-hand-side?

2013-01-16 Thread Upayavira
And it would make for slow queries, as the more fields you query, the worse performance gets. Having said that, you can query multiple fields using the edismax query parser, with its qf param. Upayavira On Wed, Jan 16, 2013, at 12:23 AM, Jack Krupansky wrote: Semi-hard-coded. In
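As a rough illustration of the qf suggestion above, here is a minimal sketch that builds the query string for an edismax request. The field names `title` and `description` and the helper name `build_edismax_query` are assumptions made for the example; only `q`, `defType` and `qf` are actual Solr request parameters.

```python
from urllib.parse import urlencode


def build_edismax_query(user_query, fields):
    """Return a URL query string that searches several fields via edismax.

    `fields` is a list of (hypothetical) field names from your schema.
    """
    params = {
        "q": user_query,
        "defType": "edismax",    # select the extended dismax query parser
        "qf": " ".join(fields),  # space-separated list of fields to query
    }
    return urlencode(params)


# Append the result to your own select URL, e.g. .../select?<query string>
print(build_edismax_query("doll", ["title", "description"]))
```

Keeping the qf list short matters for the performance reason given in the message: each extra field adds work per query.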

Search strategy - improving search quality for short search terms such as doll

2013-01-16 Thread David Parks
I'm a beginner-to-intermediate Solr admin. I've set up the basics for our application and it runs well. Now it's time for me to dig in and start tuning and improving queries. My next target is searches on simple terms such as "doll" which, in Google, would return documents about, well, toy

Re: Disable term frequency for some fields in solr

2013-01-16 Thread Amit Jha
Hi, How can I do this in Solr 4? Amit On Thu, Dec 6, 2012 at 1:40 PM, Markus Jelsma markus.jel...@openindex.io wrote: custom similarity for that field that returns 1 for

Re: Disable term frequency for some fields in solr

2013-01-16 Thread Upayavira
This involves writing a subclass of the DefaultSimilarity class, in Java, and adding that to your Solr setup. For someone versed in Java, this is relatively straightforward. For others it is non-trivial. Upayavira On Wed, Jan 16, 2013, at 10:57 AM, Amit Jha wrote: Hi, How can I do this in

Re: Disable term frequency for some fields in solr

2013-01-16 Thread Amit Jha
I did the same thing in Solr 3.6 and it works, but field-level similarity is not available in Solr 3.6, and Solr 4 has similarity factories, so I could not work out how to do it in Solr 4. Which class do I need to extend to move ahead? On Wed, Jan 16, 2013 at 4:44 PM, Upayavira u...@odoko.co.uk wrote:

Priorities on fields

2013-01-16 Thread Dariusz Borowski
Hi, Is it possible to define priorities on fields? Let's say I have a product table which has the following fields: - id - title - description - code_name An entry could be like this: id: 42 title: shinny new shoes description: Shinny new shoes made in Italy code_name: shinny-new-shoes-42-2013

Re: Priorities on fields

2013-01-16 Thread Rafał Kuć
Hello! What do you mean by priority? You can define an index-time or query-time boost, which will allow you to specify the importance of such a field. A good page to look at is: http://wiki.apache.org/solr/SolrRelevancyCookbook -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr -

Re: Solr exception when parsing XML

2013-01-16 Thread Andre Bois-Crettez
It is worth noting that some characters are completely forbidden in XML, such as chr(0). When dealing with external text input, some cleanup might be necessary to avoid breaking indexing. For example you could replace each forbidden XML character with . André On 01/15/2013 09:55 PM, Alexandre
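The cleanup described above could look roughly like the following sketch. The character ranges follow the `Char` production of the XML 1.0 specification. The function name `clean_for_xml` and the choice of a single space as the replacement are assumptions, since the replacement character is not legible in the message.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(text, replacement=" "):
    """Replace characters that may not appear in an XML 1.0 document.

    The default replacement (a space) is an assumption; pass "" to drop
    the offending characters instead.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


print(repr(clean_for_xml("abc\x00xyz")))
```

Running this on the client before building the update XML keeps the check independent of whichever XML library produces the request.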

Re: Solr exception when parsing XML

2013-01-16 Thread Andre Bois-Crettez
Forgot the link: http://en.wikipedia.org/wiki/Valid_characters_in_XML André On 01/16/2013 02:24 PM, Andre Bois-Crettez wrote: It is worth noting that some characters are completely forbidden in XML, such as chr(0). When dealing with external text input, some cleanup might be necessary to avoid

Re: Solr exception when parsing XML

2013-01-16 Thread Yonik Seeley
On Tue, Jan 15, 2013 at 3:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Basically, the recommendation is to avoid CDATA and automatically encode characters such as yours, as well as less/more and ampersand. Unfortunately that doesn't even work. Just as a raw control character like a

Re: Solr exception when parsing XML

2013-01-16 Thread Alexandre Rafalovitch
Looking at this a second time, maybe we have an X/Y problem (sp?). Why was that symbol in there in the first place? Was it a field separator instead of using multiple fields? Was it a character in an encoding other than UTF-8? My guess is that the character will not make sense to Solr during

Way to lock solr for incoming writes

2013-01-16 Thread mizayah
Is there a way to lock Solr for writes? I don't want to use Solr's integrated backup because I'm using a Ceph cluster. What I need is consistent data for a few seconds to make a backup.

Re: Way to lock solr for incoming writes

2013-01-16 Thread Per Steffensen
Well, you can stop the Solr instances :-) If you are making a backup by copying the actual files stored by Solr, you probably want to stop them anyway to make sure everything is consistent and written to disk. If you don't stop the Solr instances, at least make sure that you do a commit (not soft) after all

Re: Search strategy - improving search quality for short search terms such as doll

2013-01-16 Thread Amit Jha
It's all about the data set, by which I mean the index. If you have documents containing toy and doll, it will return them in the result set. As I understand it, you are talking about the context of the query. For example if you search books on MK Gandhi and books by MK Gandhi both queries have

Re: Priorities on fields

2013-01-16 Thread Amit Jha
A boost query and a boost function will suffice for your purpose. Rgds AJ On 16-Jan-2013, at 17:20, Dariusz Borowski darius...@gmail.com wrote: Hi, Is it possible to define priorities on fields? Let's say I have a product table which has the following fields: - id - title - description -

Re: Search strategy - improving search quality for short search terms such as doll

2013-01-16 Thread Alexandre Rafalovitch
Sounds like 'Doll' could be a category for you, while Doll face is a title. Maybe the categories should get a higher boost in eDismax definition over the titles? Related, you may find the following book interesting: http://rosenfeldmedia.com/books/searchanalytics/ Regards, Alex. Personal

group.ngroups behavior in response

2013-01-16 Thread Amit Nithian
Hi all, I recently discovered the group.main=true/false parameter, which has really made life simple: the format coming out of Solr for my clients (a RoR app) stays backwards compatible with non-grouped results, so no special "handle grouped results" logic is needed. The

Re: retrieving latest document **only**

2013-01-16 Thread J Mohamed Zahoor
group field is timestamp… it is not multivalued. ./zahoor On 15-Jan-2013, at 7:14 PM, Upayavira u...@odoko.co.uk wrote: Is your group field multivalued? Could docs appear in more than one group? Upayavira On Tue, Jan 15, 2013, at 01:22 PM, J Mohamed Zahoor wrote: The sum of all the

Searching for field that contains multiple values

2013-01-16 Thread Nguyen, Vincent (CDC/OD/OADS) (CTR)
Hi, How do I find documents that have more than one value in a field? Example: <doc> <arr name="color"> <str>blue</str> <str>red</str> </arr> </doc> Vincent Vu Nguyen

400 error with boost and exists()

2013-01-16 Thread Walter Underwood
We're running Solr 3.3 and I have a function query for boosting that works with bq but not with boost (edismax). This is the same behavior described here: http://stackoverflow.com/questions/12128561/why-doesnt-solr-function-query-work-with-boost-parameter Here is the first part of the stack

Re: Searching for field that contains multiple values

2013-01-16 Thread Mikhail Khludnev
This has been discussed a few times: you need to implement your own Similarity, which writes the number of tokens as a norm during indexing, and then at query time you can check the norm value per document. You can also do it in a more straightforward way: preprocess docs to derive a number_of_colors
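The preprocessing route mentioned above can be sketched on the client side as follows. This is only an illustration under assumptions: documents are represented as plain dicts before being sent to Solr, the multi-valued field is called `color` as in the question, and the derived field is named `number_of_colors`; the helper name `add_value_count` is made up for the example.

```python
def add_value_count(doc, field="color", count_field="number_of_colors"):
    """Return a copy of `doc` with an extra field holding the number of
    values found in `field` (0 if the field is missing)."""
    values = doc.get(field, [])
    if not isinstance(values, (list, tuple)):
        values = [values]  # a single value counts as one
    enriched = dict(doc)
    enriched[count_field] = len(values)
    return enriched


# Once the count is indexed (as a numeric field), a standard range query
# finds documents with more than one value.
MULTI_VALUED_QUERY = "number_of_colors:[2 TO *]"

doc = add_value_count({"id": "1", "color": ["blue", "red"]})
print(doc["number_of_colors"], MULTI_VALUED_QUERY)
```

The trade-off against the custom Similarity is that the count has to be recomputed whenever the document is re-indexed, but no Java code is required.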

RE: Solr exception when parsing XML

2013-01-16 Thread Zhang, Lisheng
Hi Alex, Thanks very much for the help! I switched to (I am using PHP on the client side) createTextNode(urlencode($value)) so the CTRL character problem is avoided, but I noticed that Solr does not perform urldecode($value), so my initial value abc xyz becomes abc+xyz. I have not fully read
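The abc+xyz effect described above follows from URL-encoding and XML escaping being different transformations: an XML parser only undoes XML escaping. The sketch below shows the difference using Python's standard library as a stand-in for the PHP calls in the message; the sample value is made up.

```python
from urllib.parse import quote_plus, unquote_plus
from xml.sax.saxutils import escape

value = "abc xyz & <more>"

# URL-encoding: spaces become '+', reserved characters become %XX.
# Nothing on the XML side will reverse this automatically.
url_encoded = quote_plus(value)

# XML escaping: only &, < and > are rewritten; an XML parser restores them.
xml_escaped = escape(value)

print(url_encoded)
print(xml_escaped)
print(unquote_plus(url_encoded) == value)
```

Note that XML escaping does not remove control characters, so it would still need to be combined with some cleanup of the input.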

Re: Disable term frequency for some fields in solr

2013-01-16 Thread Upayavira
There's gonna be two ways to do this - for yourself or for everyone. For yourself, you'll want to subclass org.apache.lucene.search.similarities.DefaultSimilarity and org.apache.solr.search.similarities.DefaultSimilarityFactory. Alternatively, patch those two files to allow setting the TF or the

Re: Query parsing VS marshalling/unmarshalling

2013-01-16 Thread balaji.gandhi
Hi, I am trying to do something similar:- Eg. Input: (name:John AND name:Doe) Output: ((firstName:John OR lastName:John) AND (firstName:John OR lastName:John)) How can I extract the fields, change them and repackage the query? Thanks, Balaji
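For simple term clauses, the kind of rewrite asked about above can be approximated with a regular expression, as in this sketch. It is not a query parser: it assumes plain `field:term` clauses with single-word terms (no phrases, ranges or escapes), and it keeps each original term in its own clause. The function name `expand_field` is made up for the example.

```python
import re


def expand_field(query, source="name", targets=("firstName", "lastName")):
    """Rewrite every `source:term` clause into an OR over `targets`.

    Only handles single-word terms; anything more complex needs a real
    query parser.
    """
    pattern = re.compile(r"\b" + re.escape(source) + r":(\w+)")

    def replace_clause(match):
        term = match.group(1)
        return "(" + " OR ".join(f"{t}:{term}" for t in targets) + ")"

    return pattern.sub(replace_clause, query)


print(expand_field("(name:John AND name:Doe)"))
```

Operating on the parsed query object instead of the string would be more robust, at the cost of depending on the parser's classes.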

Re: Rename fields in a query

2013-01-16 Thread balaji.gandhi
Hi, I am trying to do something similar:- Eg. Input: (name:John AND name:Doe) Output: ((firstName:John OR lastName:John) AND (firstName:John OR lastName:John)) How can I extract the fields, change them and repackage the query? Thanks, Balaji

RE: Disable term frequency for some fields in solr

2013-01-16 Thread Markus Jelsma
I would prefer to use SchemaSimilarityFactory as a global similarity and configure a per-field similarity, of which some use a flat TF impl. Much simpler and no need to patch anything, just build a custom sim. -Original message- From:Upayavira u...@odoko.co.uk Sent: Wed 16-Jan-2013

RE: Solr exception when parsing XML

2013-01-16 Thread Markus Jelsma
In Apache Nutch we strip non-character code points with a simple method. Check the patch, the relevant part is easily ported to any language: https://issues.apache.org/jira/browse/NUTCH-1016 -Original message- From:Zhang, Lisheng lisheng.zh...@broadvision.com Sent: Wed 16-Jan-2013

Re: 400 error with boost and exists()

2013-01-16 Thread Jack Krupansky
Maybe it's the semicolons in the if, which should be commas. Also, you're using some odd syntax in the exists value data source which expects a field name or a function. -- Jack Krupansky -Original Message- From: Walter Underwood Sent: Wednesday, January 16, 2013 1:28 PM To:

Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
First, that works as bf. I got the syntax from: http://lucidworks.lucidimagination.com/display/solr/Function+Queries Various documentation has different syntax for exists(). wunder On Jan 16, 2013, at 3:00 PM, Jack Krupansky wrote: Maybe it's the semicolons in the if, which should be

RE: Solr exception when parsing XML

2013-01-16 Thread Zhang, Lisheng
Hi, Thanks very much for the help! I checked the Solr source code: for XML text inside an element, Solr does not call URLDecoder (but to pass a CTRL character, I have to call urlencode from PHP). So either I try to remove the CTRL character on the PHP side, or I change solr XMLReader

Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:11 PM, Walter Underwood wun...@wunderwood.org wrote: I got the syntax from: http://lucidworks.lucidimagination.com/display/solr/Function+Queries Oops, I've alerted our tech writers! It should be fixed now. exists(field|function) returns true if a value exists for a

Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
None of the variants worked. I started with that syntax for both exists() and if(). All gave the same stack trace. --wunder On Jan 16, 2013, at 3:32 PM, Yonik Seeley wrote: On Wed, Jan 16, 2013 at 6:11 PM, Walter Underwood wun...@wunderwood.org wrote: I got the syntax from:

Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:35 PM, Walter Underwood wun...@wunderwood.org wrote: None of the variants worked. I started with that syntax for both exists() and if(). All gave the same stack trace. --wunder These boolean functions are new for 4.0, but it looks like you're using 3.3? -Yonik

Re: 400 error with boost and exists()

2013-01-16 Thread Chris Hostetter
: None of the variants worked. I started with that syntax for both : exists() and if(). All gave the same stack trace. --wunder ... : We're running Solr 3.3 and I have a function query for boosting that : works with bq but not ...i'm very confused. All of the boolean functions (like

Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
Ah, that would be it. Does 4.0 also give a stack trace if you call a function that doesn't exist? I can achieve most of what I want with bq, though that has IDF, which I'd rather avoid here. wunder On Jan 16, 2013, at 3:38 PM, Yonik Seeley wrote: On Wed, Jan 16, 2013 at 6:35 PM, Walter

Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:42 PM, Walter Underwood wun...@wunderwood.org wrote: Ah, that would be it. Does 4.0 also give a stack trace if you call a function that doesn't exist? Stack trace still appears in the logs, but the error message returned seems OK:

Re: Disable term frequency for some fields in solr

2013-01-16 Thread Amit Jha
Please correct my understanding: use one of the factories as the global similarity, extend org.apache.lucene.search.similarities.DefaultSimilarity to create a custom sim, and add a similarity tag to the field type definition for the required fields. Or is there some other way to do that? Rgds AJ On

Re: Parsing a Lucene/Solr query and adding more clauses

2013-01-16 Thread Chris Hostetter
: I am trying to write a util which can parse a Lucene/Solr query and convert : into an object representation to add more clauses to the query. : : Eg. : Input: (name:John AND name:Doe) : Output: ((firstName:John OR lastName:John) AND (firstName:John OR : lastName:John)) edismax can support

Re: Disable term frequency for some fields in solr

2013-01-16 Thread Chris Hostetter
: Or there is some other way to do that? I'm late to this thread, but what was wrong with the simple suggestion of omitTermFreqAndPositions=true ? -Hoss

RE: SolrJ DirectXmlRequest

2013-01-16 Thread Chris Hostetter
: DirectXmlRequest is part of the SolrJ library, so I guess that means it : is not commonly used. My use case is that I'm applying an XSLT to the : raw XML on the client side, instead of leaving that up to the Solr : master (although even if I applied the XSLT on the Solr server, I'd I

RE: Search strategy - improving search quality for short search terms such as doll

2013-01-16 Thread David Parks
Thanks for the recommendation. I'll start this book today. In my example, doll is one example of a million I might only guess at, whereas the categories music and book tend to interfere in many places and seem to be a more limited set of categories to deal with. Dave -Original

RE: Search strategy - improving search quality for short search terms such as doll

2013-01-16 Thread David Parks
My issue is more that the search term doll shows up in both documents on CDs as well as documents about toys. But I have 10 CD documents for every toy document, so my searches for doll tend to show the CDs most prominently. But that's not the way a user thinks. If they want the CD documents

Re: SolrCloud-Master-Slave hybrid via additional replication handler on SolrCloud nodes?

2013-01-16 Thread Mark Miller
On Jan 15, 2013, at 10:59 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Question: Can one add the Solr master-like replication handler (but not call it /replication, yes) to SolrCloud nodes and point additional slave-like servers (i.e. servers that are not in the SolrCloud

Re: Disable term frequency for some fields in solr

2013-01-16 Thread Amit Jha
It will affect phrase queries. That is why I am not using the suggested configuration. On Thu, Jan 17, 2013 at 7:20 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Or there is some other way to do that? I'm late to this thread, but what was wrong with the simple suggestion of

Solr commit taking too long

2013-01-16 Thread Cool Techi
Hi, We have an index of approximately 400GB in size; indexing 5000 documents used to take 20 seconds. But lately indexing is taking very long: committing the same number of documents takes 5-20 minutes. On checking the logs I can see that there are frequent merges happening, which I am

Large data importing getting rollback with solr

2013-01-16 Thread ashimbose
I am trying to index a large data set (not rich documents) of about 5GB, but it is not getting indexed. Small data sets index perfectly. For the large data import, the XML response is: 00 data-config.xml full-import busy A command is still running... 0:9:12.738169