Re: How to group result when search on multiple fields

2011-01-27 Thread Stefan Matheis
On Thu, Jan 27, 2011 at 1:25 AM, cyang2010 ysxsu...@hotmail.com wrote: Is Field Collapsing a new feature for solr 4.0 (not yet released)? That's at least what the Wiki tells you, yes.

Question About Writing Custom Query Parser Plugin

2011-01-27 Thread Ahson Iqbal
Hi all, I want to integrate the Lucene Surround query parser with Solr 1.4.1, and for that I am writing a custom query parser plugin. To accomplish this I should write a subclass of org.apache.solr.search.QParserPlugin and implement its two methods: public void init(NamedList nl) public

Re: Does solr supports indexing of files other than UTF-8

2011-01-27 Thread Paul Libbrecht
Why is converting documents to UTF-8 not feasible? Nowadays any platform offers such services. Can you give a detailed failure description (maybe with the URL to a sample document you post)? paul On 27 Jan 2011, at 07:31, prasad deshpande wrote: I am able to successfully index/search

Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat

2011-01-27 Thread Paul Libbrecht
Simone, It's good that you did so! I had found this three days ago while googling. And I am starting to make sense of it. It works well. Two little comments: - you are saying that it packages a standalone multicore and a standalone app. But it actually also packs a webapp. At first, I had

Re: Does solr supports indexing of files other than UTF-8

2011-01-27 Thread prasad deshpande
The docs can be huge. Suppose there is an 800MB PDF file to index: I need to translate it to UTF-8 and then send the file for indexing. Now suppose any number of clients can upload files; at that point it will affect performance. And already our product support

DismaxParser Query

2011-01-27 Thread Isan Fulia
Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks

Re: Does solr supports indexing of files other than UTF-8

2011-01-27 Thread Paul Libbrecht
At least in Java, UTF-8 transcoding is done on a stream basis. No issue there. paul On 27 Jan 2011, at 09:51, prasad deshpande wrote: The size of docs can be huge, like suppose there are 800MB pdf file to index it I need to translate it in UTF-8 and then send this file to index. Now
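
A minimal sketch of what "on a stream basis" can look like with the standard java.io classes. It assumes the input is text in a known source encoding (binary formats such as PDF are not text streams and would need extraction first); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr.
class StreamTranscoder {

    /** Decodes `in` as `sourceCharset` and re-encodes it to `out` as UTF-8, one buffer at a time. */
    static void toUtf8(InputStream in, Charset sourceCharset, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, sourceCharset);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[8192]; // only this buffer is held in memory, not the whole file
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        writer.flush(); // push any bytes still held by the encoder
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "Gar\u00e5sen".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        toUtf8(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1, out);
        System.out.println(out.size() + " bytes after transcoding");
    }
}
```

Memory use stays bounded by the buffer size, so the size of the file should not matter much; the in-memory streams in main are only there to keep the example self-contained.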

Tika config in ExtractingRequestHandler

2011-01-27 Thread Erlend Garåsen
The wiki page for the ExtractingRequestHandler says that I can add the following configuration: <str name="tika.config">/my/path/to/tika.config</str> I have tried to google for an example of such a Tika config file, but haven't found anything. Erlend -- Erlend Garåsen Center for Information

Post PDF to solr with asp.net

2011-01-27 Thread Andrew McCombe
Hi We are trying to post some PDF documents to solr for indexing using ASP.net but cannot find any documentation or a library that will allow posting of binary data. Has anyone done this and if so, how? Regards Andrew McCombe iWeb Solutions Ltd.

query range in multivalued date field

2011-01-27 Thread ramzesua
Hi all. My range query on a multivalued date field works incorrectly. My schema: there is a field requestDate that has the multivalued attribute: <fields> <field name="id" type="string" indexed="true" stored="true" required="true" /> <field name="keyword" type="text" indexed="true" stored="true" /> <field name="count"

Re: DismaxParser Query

2011-01-27 Thread lee carroll
Use dismax q for the first three fields and a filter query for the 4th and 5th fields, so: q=keyword1 keyword2 qf=field1,field2,field3 pf=field1,field2,field3 mm=something sensible for you defType=dismax fq=field4:(keyword3 OR keyword4) AND field5:(keyword5) Take a look at the dismax docs for

Re: DismaxParser Query

2011-01-27 Thread Isan Fulia
But q=keyword1 keyword2 does an AND operation, not OR. On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for first three fields and a filter query for the 4th and 5th fields so q=keyword1 keyword2 qf = field1,field2,field3 pf = field1,field2,field3

DIH and duplicate content

2011-01-27 Thread Rosa (Anuncios)
Hi, Is there a way to avoid duplicate content in an index at the moment I'm uploading my XML feed via DIH? I would like to have only one entry for a given description. I mean, if the description of one product already exists in the index, do not import the new product. Is there a built-in function?

Re: DismaxParser Query

2011-01-27 Thread lee carroll
The default operator can be set to OR in your config, or on the query with something like q.op=OR. On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does AND operation not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote:

Re: DismaxParser Query

2011-01-27 Thread Bijeet Singh
The DisMax query parser internally hard-codes its operator to OR. This is quite unlike the Lucene query parser, for which the default operator can be configured using the solrQueryParser in schema.xml Regards, Bijeet Singh On Thu, Jan 27, 2011 at 4:56 PM, Isan Fulia

Re: DismaxParser Query

2011-01-27 Thread lee carroll
Sorry, ignore that - we are on dismax here. Look at the mm param in the docs; you can set this to achieve what you need. On 27 January 2011 11:34, lee carroll lee.a.carr...@googlemail.com wrote: the default operation can be set in your config to be OR, or on the query something like q.op=OR On 27
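
The parameters suggested in this thread can be assembled as in the sketch below. It is an illustration under assumptions, not a tested Solr setup: qf is written space-separated (the form used in the dismax documentation), the mm value is passed in by the caller, and the class name and the /select path are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only builds a URL query string; it does not talk to Solr.
class DismaxRequest {

    /** Parameters from the thread: dismax on field1..field3, a filter query for field4 and field5. */
    static Map<String, String> params(String mm) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("defType", "dismax");
        p.put("q", "keyword1 keyword2");
        p.put("qf", "field1 field2 field3"); // assumption: whitespace-separated field list
        p.put("mm", mm);                     // minimum number of query terms that must match
        p.put("fq", "field4:(keyword3 OR keyword4) AND field5:(keyword5)");
        return p;
    }

    /** URL-encodes each key and value and joins them with '&'. */
    static String queryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("/select?" + queryString(params("1")));
    }
}
```

With dismax, mm rather than q.op decides how many of the terms in q have to match, so a low mm value is what gives OR-like behaviour here.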

Re: DIH and duplicate content

2011-01-27 Thread Markus Jelsma
http://wiki.apache.org/solr/Deduplication On Thursday 27 January 2011 12:32:29 Rosa (Anuncios) wrote: Is there a way to avoid duplicate content in a index at the moment i'm uploading my xml feed via DIH? I would like to have only one entry for a given description. I mean if the
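
To illustrate the idea behind signature-based deduplication, here is a client-side sketch that keeps only the first product per description. It is not Solr's implementation and does not use any Solr API; the normalization (trim plus lower-casing), the MD5 choice, and all names are assumptions made for the example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical client-side filter, applied before documents are sent for indexing.
class DescriptionDedup {

    /** Hex MD5 of the normalized description; equal descriptions give equal signatures. */
    static String signature(String description) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            String normalized = description.trim().toLowerCase(Locale.ROOT);
            byte[] digest = md5.digest(normalized.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns the descriptions in input order, dropping any whose signature was already seen. */
    static List<String> keepFirst(List<String> descriptions) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String d : descriptions) {
            if (seen.add(signature(d))) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keepFirst(Arrays.asList("Red shoe", "red shoe ", "Blue shoe")));
    }
}
```

The wiki page linked above describes the server-side feature; a filter like this one only helps within a single feed, since it knows nothing about what is already in the index.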

Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat

2011-01-27 Thread Simone Tripodi
Hi Paul, thanks a lot for your feedback, much more than appreciated! :) Going through your comments: * Yes, it also packs a Solr webapp; it is needed to embed it in Tomcat. Do you think it would be a useful feature to also have a webapp .war as output? If it helps, I'm open to adding it as well. *

Re: configure httpclient to access solr with user credential on third party host

2011-01-27 Thread Upayavira
Looks like you are connecting to Tomcat's AJP port, not the HTTP one. Connect to the Tomcat HTTP port and I suspect you'll have greater success. Upayavira On Wed, 26 Jan 2011 22:45 -0800, Darniz rnizamud...@edmunds.com wrote: Hello, i uploaded solr.war file on my hosting provider and added

Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat

2011-01-27 Thread Paul Libbrecht
On 27 Jan 2011, at 12:42, Simone Tripodi wrote: thanks a lot for your feedback, much more than appreciated! :) Good time sync. I need it right now. * Yes it also packs a Solr webapp, it is needed to embed it in Tomcat. Do you think it could be a useful feature having also webapp .war as

Re: query range in multivalued date field

2011-01-27 Thread Erick Erickson
Range queries work on multivalued fields. I suspect the date math conversion is fooling you. For instance, NOW/HOUR first rounds down to the current hour, *then* subtracts one hour. If you attach debugQuery=on (or check the debug checkbox in the admin full search page), you'll see the exact
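
The round-then-subtract order can be reproduced with java.time, as a sketch. It assumes the expression under discussion has the shape NOW/HOUR-1HOUR (the archived snippet does not show it in full) and that rounding happens in UTC; the class and method names are hypothetical and nothing here calls Solr.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of left-to-right date math evaluation.
class DateMathOrder {

    /** Mirrors NOW/HOUR-1HOUR: round down to the start of the hour, then subtract one hour. */
    static Instant roundThenSubtract(Instant now) {
        return now.truncatedTo(ChronoUnit.HOURS).minus(1, ChronoUnit.HOURS);
    }

    /** Mirrors NOW-1HOUR: subtract one hour without any rounding. */
    static Instant subtractOnly(Instant now) {
        return now.minus(1, ChronoUnit.HOURS);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2011-01-27T15:42:10Z");
        System.out.println(roundThenSubtract(now)); // 2011-01-27T14:00:00Z
        System.out.println(subtractOnly(now));      // 2011-01-27T14:42:10Z
    }
}
```

The two results differ by up to an hour, which is enough to include or exclude documents near the range boundary.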

Re: DismaxParser Query

2011-01-27 Thread Isan Fulia
It worked by making mm=0 (it acted as OR operator) but how to handle this field1:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field2:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field3:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) On 27 January 2011 17:06, lee

Re: How to find Master Slave are in sync

2011-01-27 Thread Shanmugavel SRD
Markus, The problem here is if I call the below two URLs immediately after replication then I am getting both the index versions as same. In my python script I have added code to swap the online core on master with offline core on master and online core on slave with offline core on slave, if

Re: Post PDF to solr with asp.net

2011-01-27 Thread Gora Mohanty
On Thu, Jan 27, 2011 at 3:44 PM, Andrew McCombe eupe...@gmail.com wrote: Hi We are trying to post some PDF documents to solr for indexing using ASP.net but cannot find any documentation or a library that will allow posting of binary data. [...] Do not have much idea of ASP.net, but SolrNet (

Re: DismaxParser Query

2011-01-27 Thread lee carroll
With dismax you get to say things like: match all terms if fewer than 3 terms are entered, else match term-x. It produces highly flexible and relevant matches and works very well in lots of common search use cases. Field boosting allows further tuning. If you have rigid rules like the last one you quote

Import Handler for tokenizing facet string into multi-valued solr.StrField..

2011-01-27 Thread Dennis Schafroth
Hi, I'm pretty new to Solr coding, but looking for hints about how (if not already done) to implement a PatternTokenizer that would index this into multivalued fields of solr.StrField for faceting. Ex. Water -- Irrigation ; Water -- Sewage should be tokenized into Water Irrigation

Re: How to find Master Slave are in sync

2011-01-27 Thread Erick Erickson
Let's back up a moment and ask why you are doing this from scripts, because this feels like an XY problem, see: http://people.apache.org/~hossman/#xyproblem What are you trying to accomplish by swapping cores on the master and slave? Solr 1.4 has

Re: DismaxParser Query

2011-01-27 Thread Erick Erickson
What version of Solr are you using, and could you consider either 3x or applying a patch to 1.4.1? Because eDismax (extended dismax) handles the full Lucene query language and probably works here. See the Solr JIRA 1553 at https://issues.apache.org/jira/browse/SOLR-1553 Best Erick On Thu, Jan

AW: DismaxParser Query

2011-01-27 Thread Daniel Pötzinger
It may also be an option to mix the query parsers? Something like this (not tested): q={!lucene}field1:test OR field2:test2 _query_:"{!dismax qf=fields}+my dismax -bad" So you have the benefits of both the lucene and dismax parsers. -Original Message- From: Erick Erickson

Detect Out of Memory Errors

2011-01-27 Thread saureen
Hi, is there a way I could detect out-of-memory errors in Solr, so that I could implement some functionality such as restarting Tomcat or alerting me via email whenever such an error is detected?

Re: Question About Writing Custom Query Parser Plugin

2011-01-27 Thread Ahsan |qbal
Anyone? On Thu, Jan 27, 2011 at 1:27 PM, Ahson Iqbal mianah...@yahoo.com wrote: Hi All I want to integrate lucene Surround Query Parser with solr 1.4.1, and for that I am writing Custom Query Parser Plugin, To accomplish this task I should write a sub class of

Re: Question About Writing Custom Query Parser Plugin

2011-01-27 Thread Erik Hatcher
Yes, you need to create both a QParserPlugin and a QParser implementation. Look at Solr's own source code for the LuceneQParserPlugin/LuceneQParser and build it like that. Baking the surround query parser into Solr out of the box would be a useful contribution, so if you care to give it a

Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat

2011-01-27 Thread Paul Libbrecht
On 27 Jan 2011, at 12:42, Simone Tripodi wrote: thanks a lot for your feedback, much more than appreciated! :) One more anomaly I find: the license is in the output of the pom.xml. I think this should not be the case. *my* license should be there, not the license of the archetype. Or? paul

Re: Tika config in ExtractingRequestHandler

2011-01-27 Thread Adam Estrada
I believe that as long as Tika is included in a folder that is referenced by solrconfig.xml you should be good. Solr will automatically throw mime types to Tika for parsing. Can anyone else add to this? Thanks, Adam On Thu, Jan 27, 2011 at 5:06 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat

2011-01-27 Thread Simone Tripodi
Hi Paul, sorry I'm late but I've been in the middle of a conf call :( Which IRC server is the #solr channel on? I'll reach you ASAP. Thanks a lot! Simo http://people.apache.org/~simonetripodi/ http://www.99soft.org/ On Thu, Jan 27, 2011 at 4:00 PM, Paul Libbrecht p...@hoplahup.net wrote: On

Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat

2011-01-27 Thread Stefan Matheis
Simo, it's freenode.net On Thu, Jan 27, 2011 at 4:16 PM, Simone Tripodi simonetrip...@apache.org wrote: Hi Paul, sorry I'm late but I've been in the middle of a conf call :( On which IRC server the #solr channel is? I'll reach you ASAP. Thanks a lot! Simo

RE: DismaxParser Query

2011-01-27 Thread Jonathan Rochkind
Yes, I think nested queries are the only way to do that, and yes, nested queries like Daniel's example work (I've done it myself). I haven't really tried to get into understanding/demonstrating _exactly_ how the relevance ends up working on the overall master query in such a situation, but it

Re: Tika config in ExtractingRequestHandler

2011-01-27 Thread Erlend Garåsen
If this configuration file is the same as the tika-mimetypes.xml file inside Nutch' conf file, I have an example. I was trying to implement language detection for Solr and thought I had to invoke some Tika functionality by this configuration file in order to do so, but found out that I

Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..

2011-01-27 Thread Erick Erickson
Tokenization is fine with facets, that caution is about, say, faceting on the tokenized body of a document where you have potentially a huge number of unique tokens. But if there is a controlled number of distinct values, you shouldn't have to do anything except index to a tokenized field. I'd

Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..

2011-01-27 Thread Erik Hatcher
Beyond what Erick said, I'll add that it is often better to do this from the outside and send in multiple actual end-user displayable facet values. When you send in a field like Water -- Irrigation ; Water -- Sewage, that is what will get stored (if you have it set to stored), but what you
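
Doing the split "from the outside" could look like the sketch below: the client breaks the raw string on ';' and sends each piece as its own value of a multi-valued field. The delimiter and the decision to keep "Water -- Irrigation" as one displayable value are assumptions (the archived snippets do not show the desired tokens in full), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper; no Solr or Lucene classes are involved.
class FacetValueSplitter {

    /** Splits on ';', trims each piece, and skips empty pieces. */
    static List<String> split(String raw) {
        List<String> values = new ArrayList<>();
        for (String part : raw.split(";")) {
            String value = part.trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Each element would be sent as a separate value of the facet field.
        for (String value : split("Water -- Irrigation ; Water -- Sewage")) {
            System.out.println(value);
        }
    }
}
```

Because each value arrives already separated, the field can stay an untokenized string type and the facet labels come back exactly as they were sent.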

Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..

2011-01-27 Thread Dennis Schafroth
Thanks for the hints! Sorry about stealing the thread "query range in multivalued date field"; I mistakenly responded to it. cheers, :-Dennis On 27/01/2011, at 16.48, Erik Hatcher wrote: Beyond what Erick said, I'll add that it is often better to do this from the outside and send in multiple

EmbeddedSolr issues

2011-01-27 Thread Karthik Manimaran
Hi, Am getting the following messages while using EmbeddedSolr to retrieve the Term Vectors. I also happened to go through https://issues.apache.org/jira/browse/SOLR-914 . Should I ignore these messages and proceed or should I make any changes?

Is relevance score related to position of the term?

2011-01-27 Thread cyang2010
Let me describe the question using an example: if I search for Lee on the name field as an exact term match, the returned results can be "Lee Jamie" and "Jamie Lee". Will Solr grant a higher score to "Lee Jamie" than to "Jamie Lee" based on the position of the term in the name field of each document? From what I know, the score

Re: Is relevance score related to position of the term?

2011-01-27 Thread Em
Hi Cyang, usually Solr isn't looking at the position of a term. However, there are solutions out there for considering the term's position when calculating a doc's score. Furthermore: If two docs got the same score, I think they are ordered the way they were found in the index. Does this

Re: SolrCloud Questions for MultiCore Setup

2011-01-27 Thread Em
Hi, excuse me for pushing this for a second time, but I can't figure it out by looking at the source code... Thanks! Hi Lance, thanks for your explanation. As far as I know in distributed search i have to tell Solr what other shards it has to query. So, if I want to query a

disappearing MBeans

2011-01-27 Thread matthew sporleder
I am using JMX to monitor my replication status and am finding that my MBeans are disappearing. I turned on debugging for JMX and found that solr seems to be deleting the mbeans. Is this a bug? Some trace info is below.. here's me reading the mbean successfully: Jan 27, 2011 5:00:02 PM

Re: configure httpclient to access solr with user credential on third party host

2011-01-27 Thread Darniz
Thanks, exactly. I asked my domain hosting provider and he provided me with some other port. I am wondering, can I specify credentials without the port? I mean, when I open the browser and type www.mydomainmame/solr I get the Tomcat auth login screen. In the same way, can I configure the http

Re: DismaxParser Query

2011-01-27 Thread Erick Erickson
In general, patches are applied to the source tree and it's re-compiled. See: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches This is pretty easy, and I do know that some people have applied the eDismax patch to the 1.4 code line, but I haven't done it myself. Best Erick On

Re: Is relevance score related to position of the term?

2011-01-27 Thread cyang2010
Hi Em, Thanks for the reply. Basically you are saying there is no built-in solution that takes the position of the term into account for the relevancy score. In my scenario, I will get those two documents with the same score. The order depends on the sequence of indexing. Thanks, Cyang

Re: Is relevance score related to position of the term?

2011-01-27 Thread cyang2010
Just a little clarification: when I say position of the term, I mean the position of the term within the field. For example, Jamie Lee -- Lee is in the second position of the name field. Lee Jamie -- Lee is in the first position of the name field in this case.

Searching for negative numbers very slow

2011-01-27 Thread Simon Wistow
If I do qt=dismax fq=uid:1 (or any other positive number) then queries are as quick as normal - in the 20ms range. However, any of fq=uid:\-1 or fq=uid:[* TO -1] or fq=uid:[-1 to -1] or fq=-uid:[0 TO *] then queries are incredibly slow - in the 9

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-27 Thread Simon Wistow
On Tue, Jan 25, 2011 at 01:28:16PM +0100, Markus Jelsma said: Are you sure you need CMS incremental mode? It's only advised when running on a machine with one or two processors. If you have more you should consider disabling the incremental flags. I'll test again but we added those to get

Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..

2011-01-27 Thread Chris Hostetter
: Subject: Import Handler for tokenizing facet string into multi-valued : solr.StrField.. : In-Reply-To: 1296123345064-2361292.p...@n3.nabble.com : References: 1296123345064-2361292.p...@n3.nabble.com -Hoss

Re: DIH clean=false

2011-01-27 Thread Chris Hostetter
: Then for clean=false, my understanding is that it won't blow off existing : index. For data that exist in index and db table (by the same uniqueKey) : it will update the index data regardless if there is actual field update. : For existing index data but not existing in table (by comparing

Solr for noSQL

2011-01-27 Thread Jianbin Dai
Hi, Is there a data import handler for quickly reading in data from a NoSQL database, specifically MongoDB, which I am thinking of using? Or, as a more general question, how does Solr work with NoSQL databases? Thanks. Jianbin

Re: Searching for negative numbers very slow

2011-01-27 Thread Simon Wistow
On Thu, Jan 27, 2011 at 11:32:26PM +, me said: If I do qt=dismax fq=uid:1 (or any other positive number) then queries are as quick as normal - in the 20ms range. For what it's worth uid is a TrieIntField with precisionStep=0, omitNorms=true, positionIncrementGap=0

Re: configure httpclient to access solr with user credential on third party host

2011-01-27 Thread Jayendra Patil
This should help:
    // Commons HttpClient 3.x: send credentials with the first request, for any host and port
    HttpClient client = new HttpClient();
    client.getParams().setAuthenticationPreemptive(true);
    AuthScope scope = new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT);
    client.getState().setCredentials(scope, new UsernamePasswordCredentials(user, password));
Regards, Jayendra
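
For comparison, preemptive Basic authentication can also be sketched with only the JDK, by setting the Authorization header directly. This is an alternative illustration rather than what the thread used; it assumes the server accepts HTTP Basic authentication, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper built on java.net only.
class PreemptiveBasicAuth {

    /** Builds the value of the Authorization header for HTTP Basic authentication. */
    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    /** Prepares (but does not yet send) a request that carries the credentials up front. */
    static HttpURLConnection open(URL url, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return conn;
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```

The header is attached regardless of host or port, which plays the same role as AuthScope.ANY_HOST and AuthScope.ANY_PORT in the snippet above.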

Re: Tika config in ExtractingRequestHandler

2011-01-27 Thread Lance Norskog
The tika.config file is obsolete. I don't know what replaces it. On 1/27/11, Erlend Garåsen e.f.gara...@usit.uio.no wrote: If this configuration file is the same as the tika-mimetypes.xml file inside Nutch' conf file, I have an example. I was trying to implement language detection for Solr

Re: Solr for noSQL

2011-01-27 Thread Lance Norskog
There are no special connectors available to read from the key-value stores like memcache/cassandra/mongodb. You would have to get a Java client library for the DB and code your own dataimporthandler datasource. I cannot recommend this; you should make your own program to read data and upload to Solr
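
As a rough sketch of the "own program" route, the code below turns one key-value record into a Solr XML add message that could then be POSTed to the update handler. Reading from the NoSQL store and the HTTP upload are left out; the field names id and title and the class name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only builds the XML text; it does not contact Solr.
class SolrAddXml {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Wraps the record's fields in <add><doc>...</doc></add>. */
    static String addDoc(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "1");
        record.put("title", "a & b");
        System.out.println(addDoc(record));
    }
}
```

A real loader would batch many doc elements into one add message and send a commit at the end, but those choices depend on the data volume.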

Re: SolrCloud Questions for MultiCore Setup

2011-01-27 Thread Lance Norskog
Hello- I have not used SolrCloud. On 1/27/11, Em mailformailingli...@yahoo.de wrote: Hi, excuse me for pushing this for a second time, but I can't figure it out by looking at the source code... Thanks! Hi Lance, thanks for your explanation. As far as I know in distributed search i

Re: DismaxParser Query

2011-01-27 Thread Isan Fulia
Hi all, I am currently using Solr 1.4.1. Do I need to apply a patch for the extended dismax parser? On 28 January 2011 03:42, Erick Erickson erickerick...@gmail.com wrote: In general, patches are applied to the source tree and it's re-compiled. See:

Re: Solr for noSQL

2011-01-27 Thread Dennis Gearon
Why not make one's own DIH handler, Lance? Dennis Gearon

Re: Solr for noSQL

2011-01-27 Thread Dai Jianbin 00901725
Do we have any performance measurements? Would it be much slower compared to other DIH data sources? There are no special connectors available to read from the key-value stores like memcache/cassandra/mongodb. You would have to get a Java client library for the DB and code your own dataimporthandler datasource.

NOT operator not working

2011-01-27 Thread abhayd
I have a field in an XML file: <DeviceType>Accessory Data / Memory</DeviceType> The Solr schema field is declared as <field name="deviceType" type="text" indexed="true" stored="true" /> I am trying to eliminate results by using NOT. For example I want all devices for a term except where DeviceType is not Accessory*

Re: NOT operator not working

2011-01-27 Thread Ahmet Arslan
--- On Fri, 1/28/11, abhayd ajdabhol...@hotmail.com wrote: From: abhayd ajdabhol...@hotmail.com Subject: NOT operator not working To: solr-user@lucene.apache.org Date: Friday, January 28, 2011, 8:45 AM i have a field in xml file <DeviceType>Accessory Data / Memory</DeviceType> solr schema

Re: Solr for noSQL

2011-01-27 Thread Gora Mohanty
On Fri, Jan 28, 2011 at 6:00 AM, Jianbin Dai j...@huawei.com wrote: [...] Do we have data import handler to fast read in data from noSQL database, specifically, MongoDB I am thinking to use? [...] Have you tried the links that a Google search turns up? Some of them look like pretty good

Re: Is relevance score related to position of the term?

2011-01-27 Thread Em
Hi, no, you misunderstood me, I only said that Solr does not care about the positions *usually*. Lucene has SpanNearQuery, which considers the position of the query's terms relative to each other. Furthermore there exists a SpanFirstQuery which boosts occurrences of a Term at the beginning of a