RE: Using curl comparing with using WebService::Solr

2009-07-10 Thread Francis Yakin
Yes, the XML files are in complete add format. This is my code: #!/usr/bin/perl if (($#ARGV + 1) == 0) { print "Usage: perl prod.pl dir\n\n"; exit(1); } ## -- CHANGE accordingly $timeout = 300; $topdir = "/opt/Test/xml-file/"; #$topdir = "/opt/Test/"; $dir =

RE: Using curl comparing with using WebService::Solr

2009-07-10 Thread Francis Yakin
I also commit too many times, I guess; since we have 1000 folders, each loop executes the load and the commit. So 1000 loops with 1000 commits. I think it would help if I only committed once after the 1000 loops complete. Any inputs? Thanks Francis -Original Message- From: Francis

Re: Solr's MLT query call doesn't work

2009-07-10 Thread SergeyG
Done. Unfortunately with the same result. :confused: Thanks, Jun. Isn't it really strange? Again, I'm not the first person using Solr. I wonder if the matter might be just local, due to some not-so-obvious reason manifesting itself only on my machine (which is, of course, very unlikely, but still

Re: Metada document for faceted search

2009-07-10 Thread Osman İZBAT
Thank you Chris. I've found out how to implement my faceted search. I don't index any metadata document; instead I build my in-memory faceting data structure from the database in my request handler's init method, compute facet counts on each request, and write them to the response as a NamedList of NamedLists.

Re: Lock timed out 2 worker running

2009-07-10 Thread Renz Daluz
Is it the best way to implement my own locking mechanism here? Thanks /Renz 2009/7/10 Renz Daluz renz052...@gmail.com Hi all, I have 2 workers running (the app that builds the index) and both are pointing to the same Solr (1.3.0) master instance when updating/committing documents. I'm using SolrJ

Re: DisMax query parser syntax for the fq parameter

2009-07-10 Thread gistolero
Yes, it works :-) Thanks Erik! I am using the dismax query parser syntax for the fq param: .../select?qt=dismax&rows=30&q.alt=*:*&qf=content&fq={!dismax qf=contentKeyword^1.0 mm=0%}Foo&fq=+date:[2009-03-11T00:00:00Z TO 2009-07-09T16:41:50Z]&fl=id,date,content Now, I want to add one
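
For reference, the same request can be expressed with SolrJ rather than a raw URL; the sketch below simply mirrors the parameters above (the field names and dates come from the example, and SolrQuery's generic set() calls are used to avoid version-specific helpers):

    import org.apache.solr.client.solrj.SolrQuery;

    public class DismaxFilterQueryExample {
        public static SolrQuery build() {
            SolrQuery q = new SolrQuery();
            q.set("qt", "dismax");          // qt=dismax
            q.setRows(30);
            q.set("q.alt", "*:*");
            q.set("qf", "content");
            // A nested dismax query inside a filter query, via local params:
            q.addFilterQuery("{!dismax qf=contentKeyword^1.0 mm=0%}Foo");
            q.addFilterQuery("+date:[2009-03-11T00:00:00Z TO 2009-07-09T16:41:50Z]");
            q.setFields("id", "date", "content");
            return q;
        }
    }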

Re: Using curl comparing with using WebService::Solr

2009-07-10 Thread Shalin Shekhar Mangar
On Fri, Jul 10, 2009 at 11:50 AM, Francis Yakin fya...@liquid.com wrote: I also commit too many times, I guess; since we have 1000 folders, each loop executes the load and the commit. So 1000 loops with 1000 commits. I think it would help if I only committed once after the 1000 loops complete.

RE: Using curl comparing with using WebService::Solr

2009-07-10 Thread Francis Yakin
How are you batching all the documents in one curl call? Do you have a sample, so I can modify my script and try it again? Right now I run curl on each document (I have 1000 docs in each folder and I have 1000 folders) using: curl http://localhost:7001/solr/update --data-binary @abc.xml -H

Re: posting binary file and metadata in two separate documents

2009-07-10 Thread rossputin
Hi. Apologies for bumping this one, but another question occurred to me... is there a limit to the number of ext.literal components I can put in my curl command? If so, I will definitely need to find another way to get this data in, as I am building up relationships between documents, and

Re: Problem using ExtractingRequestHandler with tomcat

2009-07-10 Thread solenweg
I'm in the same situation, but I'm not getting what this ant example is about. I can't find anything in Solr about it. Could anyone write up a little more specifically what one has to do to get rid of the Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler' exception?

Re: Using curl comparing with using WebService::Solr

2009-07-10 Thread Shalin Shekhar Mangar
On Fri, Jul 10, 2009 at 1:17 PM, Francis Yakin fya...@liquid.com wrote: How are you batching all the documents in one curl call? Do you have a sample, so I can modify my script and try it again? Right now I run curl on each document (I have 1000 docs in each folder and I have 1000 folders) using:
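
A minimal SolrJ sketch of the batch-and-commit-once approach being discussed, as an alternative to one curl call per document; the URL, field names, chunk size, and loop are placeholders, and CommonsHttpSolrServer is the client class of the SolrJ 1.3/1.4 era (later renamed):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchLoader {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:7001/solr");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 1000; i++) {      // placeholder loop standing in for the folder/file walk
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);   // placeholder fields
                doc.addField("name", "document " + i);
                batch.add(doc);
                if (batch.size() == 100) {        // send in chunks of 100 docs
                    solr.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                solr.add(batch);
            }
            solr.commit();                        // a single commit at the very end
        }
    }

The same idea works with plain curl by putting many doc elements inside a single add file and posting one commit at the very end.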

Re: Create incremental snapshot

2009-07-10 Thread Asif Rahman
Tushar: Is it necessary to do the optimize on each iteration? When you run an optimize, the entire index is rewritten. Thus each index file can have at most one hard link, and each snapshot will consume the full amount of space on your disk. Asif On Thu, Jul 9, 2009 at 3:26 AM, tushar kapoor

Modifying a stored field after analyzing it?

2009-07-10 Thread Michael _
Hello, I've got a stored, indexed field that contains some actual text, and some metainfo, like this: one two three four [METAINFO] oneprime twoprime threeprime fourprime I have written a Tokenizer that skips past the [METAINFO] marker and uses the last four words as the tokens for the field,

Re: Retrieve docs with > 1 multivalue field hits

2009-07-10 Thread Erik Hatcher
On Jul 9, 2009, at 5:37 PM, A. Steven Anderson wrote: A simple example would be if a schema included a phoneNum multiValued field and I wanted to return all docs that contained more than 1 phoneNum field value. all docs that contain more than one phone number - regardless of matching a

Re: Retrieve docs with > 1 multivalue field hits

2009-07-10 Thread A. Steven Anderson
all docs that contain more than one phone number - regardless of matching a particular query? Exactly. Knowing that was a useful query, I'd change my indexer to also provide either a field with the count of phone number values, or a boolean field saying whether there are more than one or
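
A small sketch of that count-field idea with SolrJ; phoneNumCount is a hypothetical field name, and the query phoneNumCount:[2 TO *] is what the extra field buys you:

    import java.util.List;

    import org.apache.solr.common.SolrInputDocument;

    public class PhoneCountIndexer {
        // Carry both the multivalued phoneNum field and a precomputed count, so that
        // "docs with more than one phone number" becomes phoneNumCount:[2 TO *].
        public static SolrInputDocument toDoc(String id, List<String> phoneNumbers) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            for (String phone : phoneNumbers) {
                doc.addField("phoneNum", phone);
            }
            doc.addField("phoneNumCount", phoneNumbers.size());
            return doc;
        }
    }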

Re: Modifying a stored field after analyzing it?

2009-07-10 Thread Shalin Shekhar Mangar
On Fri, Jul 10, 2009 at 5:56 PM, Michael _ solrco...@gmail.com wrote: Hello, I've got a stored, indexed field that contains some actual text, and some metainfo, like this: one two three four [METAINFO] oneprime twoprime threeprime fourprime I have written a Tokenizer that skips past the

Re: Modifying a stored field after analyzing it?

2009-07-10 Thread solrcoder
Shalin Shekhar Mangar wrote: Can't you have two fields like this? f1 (indexed, not stored) - one two three four [METAINFO] oneprime twoprime threeprime fourprime f2 (not indexed, stored) - one two three four Perhaps I don't understand highlighting, but won't that prevent snippets

Re: Modifying a stored field after analyzing it?

2009-07-10 Thread solrcoder
markrmiller wrote: Coming soon. First step was here: http://issues.apache.org/jira/browse/LUCENE-1699 Trunk doesn't have that version of Lucene yet though (I believe that's still the case). Replacing the RunUpdateProcessor gives you full control of the Lucene document creation. Is

Re: Modifying a stored field after analyzing it?

2009-07-10 Thread Mark Miller
On Fri, Jul 10, 2009 at 2:02 PM, solrcoder solrco...@gmail.com wrote: markrmiller wrote: Coming soon. First step was here: http://issues.apache.org/jira/browse/LUCENE-1699 Trunk doesn't have that version of Lucene yet though (I believe that's still the case). Replacing the

printing scores

2009-07-10 Thread Marc Sturlese
I have noticed some weird behaviour doing score testing. I do a search using the dismax request handler with no extra boosting, in an index of a million docs, searching in five fields. Printing the scores of the 3rd, 4th, 5th, and 6th docs, I can see that they are the same. If I build the index with my own Lucene indexer

Re: printing scores

2009-07-10 Thread Erick Erickson
Why do you care? I'm not trying to be a jerk here; scores between separate queries are irrelevant. See: http://wiki.apache.org/lucene-java/ScoresAsPercentages So, the scores aren't important; the important thing is whether the

Index-time boost propagated to copyField?

2009-07-10 Thread Mat Brown
Hi all, If I have two fields that are copied into a copyField, and I index data in these fields using different index-time boosts, are those boosts propagated into the copyField? Thanks! Mat

Re: Distributed Search in Solr

2009-07-10 Thread Grant Ingersoll
On Jul 9, 2009, at 11:58 PM, Sumit Aggarwal wrote: Hi, 1. Are calls to multiple shards made concurrently or serially? Concurrent. 2. Any idea of the algorithm followed for merging data? I mean, how efficient is it? Not sure, but given that Yonik implemented it, I
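
For reference, a distributed request in this era is driven by the shards parameter; a minimal SolrJ sketch (the host names are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;

    public class ShardedQuery {
        public static SolrQuery build(String userQuery) {
            SolrQuery q = new SolrQuery(userQuery);
            // The receiving Solr instance queries each listed shard concurrently
            // and merges the partial results before responding.
            q.set("shards", "shard1.example.com:8983/solr,shard2.example.com:8983/solr");
            return q;
        }
    }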

Re: printing scores

2009-07-10 Thread Marc Sturlese
Well, I was asking because I have a custom FieldComparatorSource that uses the Lucene score, among other params, to calculate the sort order. The thing is that with my own Lucene servlet I am getting different results than with Solr now (because the score values are different and Solr is giving me back the

Re: Modifying a stored field after analyzing it?

2009-07-10 Thread solrcoder
markrmiller wrote: When you specify a custom UpdateProcessor chain, you will normally make the RunUpdateProcessor the last processor in the chain, as it will add the doc to Solr. Rather than using the built in RunUpdateProcessor though, you could simply specify your own UpdateProcessor
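
A hedged sketch of the custom-UpdateProcessor route for the stored-field question earlier in this digest: it automates Shalin's two-field suggestion inside the chain, copying the full text into an indexed-only field and stripping the [METAINFO] section from the stored one. Class and field names (body, body_indexed) are hypothetical, and package names follow the Solr 1.3/1.4 API:

    import java.io.IOException;

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class StripMetaInfoProcessorFactory extends UpdateRequestProcessorFactory {
        @Override
        public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                                  SolrQueryResponse rsp,
                                                  UpdateRequestProcessor next) {
            return new UpdateRequestProcessor(next) {
                @Override
                public void processAdd(AddUpdateCommand cmd) throws IOException {
                    SolrInputDocument doc = cmd.solrDoc;
                    Object value = doc.getFieldValue("body");
                    if (value instanceof String) {
                        String text = (String) value;
                        // Index the full text (including [METAINFO]) in a separate
                        // indexed-only field...
                        doc.setField("body_indexed", text);
                        // ...and keep only the human-readable part in the stored field.
                        int marker = text.indexOf("[METAINFO]");
                        if (marker >= 0) {
                            doc.setField("body", text.substring(0, marker).trim());
                        }
                    }
                    super.processAdd(cmd); // RunUpdateProcessor, last in the chain, adds the doc
                }
            };
        }
    }

The factory is then registered in an updateRequestProcessorChain in solrconfig.xml, ahead of RunUpdateProcessorFactory.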

Re: Boosting for most recent documents

2009-07-10 Thread vivek sar
Thanks Bill. A couple of questions: 1) Would the function query load all unique terms (for that field) into memory the way sort (field cache) does? If so, that wouldn't work for us, as we can have over 5 billion records spread across multiple shards (up to 10 indexer instances); that would surely kill

Question About Solr Cores

2009-07-10 Thread danben
Hi, I'm building an application that dynamically instantiates a large number of solr cores on a single machine (large would ideally be as high as I can get it, in the millions, if it is possible to do so without significant performance degradation and/or system failure). I already tried this

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-10 Thread Bradford Stephens
Does the facet aggregation take place on the Solr search server, or the Solr client? It's pretty slow for me -- on a machine with 8 cores / 8 GB RAM and a 50-million-document index (about 36M unique values in the author field), a query that returns 131,000 hits takes about 20 seconds to calculate the

Re: Custom sort

2009-07-10 Thread dontthinktwice
Marc Sturlese wrote: I have been able to create my custom field. The problem is that I have loaded into the Solr core a couple of HashMap<id_doc, value_influence_sort> maps from a DB, with values that will influence the sort. My problem is that I don't know how to let my custom sort have access

Solr 1.2 and 1.3 - different Stemming

2009-07-10 Thread Jae Joo
I have found that the stemming in Solr 1.2 and 1.3 is different for the word communication. We have an index built with Solr 1.2, and the index is being queried by 1.3. Is there any way to adjust it? Jae Joo

Update Preprocessing

2009-07-10 Thread jonarino
I am investigating the possibilities of preprocessing my data before it is indexed. Specifically, I would like to add fields or modify field values based on other fields in the XML I am injecting. I am a little confused on where this is supposed to happen; whether as part of the

Re: Update Preprocessing

2009-07-10 Thread Mark Miller
On Fri, Jul 10, 2009 at 6:40 PM, jonarino jonathan.h...@verizonwireless.com wrote: I am investigating the possibilities of preprocessing my data before it is indexed. Specifically, I would like to add fields or modify field values based on other fields in the XML I am injecting. I am a

Re: Solr 1.2 and 1.3 - different Stemming

2009-07-10 Thread Mark Miller
Sorry. From the CHANGES for 1.3: {quote} The Porter snowball based stemmers in Lucene were updated (LUCENE-1142), and are not guaranteed to be backward compatible at the index level (the stem of certain words may have changed). Re-indexing is recommended. {/quote} Would have been nice to leave a

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-10 Thread Avlesh Singh
Does the facet aggregation take place on the Solr search server, or the Solr client? Solr server. Faceting is an expensive operation by nature, especially when the hits are large in number. Solr caches these values once computed. You might want to tweak cache-related parameters in your solr
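
For context, a facet request like the one being discussed looks like this in SolrJ; the field name and limits are placeholders, and the tuning Avlesh refers to (filter/field caches) lives on the server in solrconfig.xml, not in this client code:

    import org.apache.solr.client.solrj.SolrQuery;

    public class AuthorFacetQuery {
        public static SolrQuery build(String userQuery) {
            SolrQuery q = new SolrQuery(userQuery);
            q.setFacet(true);
            q.addFacetField("author");   // ~36M unique values in the index from the thread
            q.setFacetMinCount(1);       // skip zero-count buckets
            q.setFacetLimit(20);         // only return the top 20 authors
            q.setRows(0);                // facet counts only; don't fetch documents
            return q;
        }
    }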

Re: Modifying a stored field after analyzing it?

2009-07-10 Thread Mark Miller
On Fri, Jul 10, 2009 at 3:42 PM, solrcoder solrco...@gmail.com wrote: markrmiller wrote: When you specify a custom UpdateProcessor chain, you will normally make the RunUpdateProcessor the last processor in the chain, as it will add the doc to Solr. Rather than using the built in

How to do a reverse distance search?

2009-07-10 Thread Development Team
Hi everybody, Let's say we have 10,000 traveling sales-people spread throughout the country. Each of them has their own territory, and most of the territories overlap (e.g. 100 sales-people in a particular city alone). Each of them also has a maximum distance they can travel. Some can

Re: Search results depending on search word length?

2009-07-10 Thread Jeff Newburn
I am guessing that the field is actually just a string or a really long word. Solr looks for occurrences of the term/token. It does not, however, search within a given token without the *. So in your example the system will not match thisisavery with thisisaverylongtesttitle even though they have
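
In other words, only an explicit wildcard turns the query into a prefix match against the long token; a minimal sketch (the title field name is assumed from the example):

    import org.apache.solr.client.solrj.SolrQuery;

    public class PrefixQueryExample {
        public static SolrQuery build() {
            // "title:thisisavery" alone will not match the token "thisisaverylongtesttitle";
            // the trailing * turns it into a prefix query that will.
            return new SolrQuery("title:thisisavery*");
        }
    }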

Using DirectConnection or EmbeddedSolrServer, within a component

2009-07-10 Thread Matt Mitchell
Hi, I'm experimenting with Solr components. I'd like to be able to use a nice-high-level querying interface like the DirectSolrConnection or EmbeddedSolrServer provides. Would it be considered absolutely insane to use one of those *within a component* (using the same core instance)? Matt

Re: Custom sort

2009-07-10 Thread Ben
It could be that you should be providing an implementation of SortComparatorSource. I have missed the earlier part of this thread; I assume you're trying to implement some form of custom search? B dontthinktwice wrote: Marc Sturlese wrote: I have been able to create my custom field. The

RE: How to do a reverse distance search?

2009-07-10 Thread Stuart Yeates
The easiest modification is to use: calc_square_of_distance(CLIENT_LAT, CLIENT_LONG, lat, long) <= maxSquareOfTravelDist This has the same ordering as before, but is much cheaper to calculate. You can then calculate the actual distance in the GUI, where you're only showing a handful of values.
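
A sketch of that comparison in plain Java: compare squared distances and only compute a real distance for the handful of results that get displayed. The flat-earth approximation and the km-per-degree constant are assumptions for illustration only:

    public final class GeoFilter {
        private static final double KM_PER_DEGREE = 111.2; // rough km per degree of latitude

        // Squared distance in km^2 between two lat/long points (small-distance approximation).
        public static double squaredDistanceKm(double lat1, double lon1,
                                               double lat2, double lon2) {
            double dLat = (lat2 - lat1) * KM_PER_DEGREE;
            double dLon = (lon2 - lon1) * KM_PER_DEGREE
                          * Math.cos(Math.toRadians((lat1 + lat2) / 2));
            return dLat * dLat + dLon * dLon;
        }

        // True if the client falls within the salesperson's maximum travel distance;
        // maxSquareOfTravelDist (the travel distance squared) can be precomputed per salesperson.
        public static boolean withinRange(double clientLat, double clientLon,
                                          double lat, double lon,
                                          double maxSquareOfTravelDist) {
            return squaredDistanceKm(clientLat, clientLon, lat, lon) <= maxSquareOfTravelDist;
        }
    }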

Re: Custom sort

2009-07-10 Thread dontthinktwice
okobloko wrote: It could be that you should be providing an implementation of SortComparatorSource. I have missed the earlier part of this thread; I assume you're trying to implement some form of custom search? B Yes, exactly. What I'm trying to do is sort the results of an
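
A hedged sketch of the SortComparatorSource route Ben mentions, wiring in an externally loaded map of per-document influence values. The id field name, the descending order, and the class names are assumptions, and this is the pre-2.9 Lucene API (newer Lucene replaces it with FieldComparatorSource):

    import java.io.IOException;
    import java.util.Map;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.ScoreDocComparator;
    import org.apache.lucene.search.SortComparatorSource;
    import org.apache.lucene.search.SortField;

    public class InfluenceComparatorSource implements SortComparatorSource {
        private final Map<String, Float> influenceById; // id_doc -> value_influence_sort

        public InfluenceComparatorSource(Map<String, Float> influenceById) {
            this.influenceById = influenceById;
        }

        public ScoreDocComparator newComparator(IndexReader reader, String fieldname)
                throws IOException {
            // Map Lucene doc numbers back to the application id via the FieldCache.
            final String[] ids = FieldCache.DEFAULT.getStrings(reader, "id");
            return new ScoreDocComparator() {
                private float influence(ScoreDoc d) {
                    Float v = influenceById.get(ids[d.doc]);
                    return v == null ? 0f : v.floatValue();
                }
                public int compare(ScoreDoc i, ScoreDoc j) {
                    return Float.compare(influence(j), influence(i)); // higher influence first
                }
                public Comparable sortValue(ScoreDoc i) {
                    return new Float(influence(i));
                }
                public int sortType() {
                    return SortField.CUSTOM;
                }
            };
        }
    }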

Re: SEVERE: java.lang.ArrayIndexOutOfBoundsException

2009-07-10 Thread Chris Hostetter
: :( : : that is all we have in there!!! : : Is there any way I can raise the logging level for it? It's not an issue of logging level -- that just affects which types of messages get logged; this message is getting logged, so the level is fine. The problem is the log formatting. If this is

Re: I am getting HTTP Version Not Supported (505)Error

2009-07-10 Thread Chris Hostetter
: data). I gave the prepared URL to the URL class and I got the HTTP Version Not : Supported error and the error code is 505. Solr never generates that error code. What servlet container are you using? : String urlStr = solrUrl + "/update?stream.body=" + strToAdd; :

Re: Index-time boost propagated to copyField?

2009-07-10 Thread Koji Sekiguchi
Mat Brown wrote: Hi all, If I have two fields that are copied into a copyField, and I index data in these fields using different index-time boosts, are those boosts propagated into the copyField? Thanks! Mat No, but the norms of source fields of copyField are propagated into the

solr jmx connection

2009-07-10 Thread J G
Hello, I have a Solr JMX connection issue. I am running my JMX MBeanServer through Tomcat, meaning I am using Tomcat's MBeanServer rather than any other MBeanServer implementation. I am having a hard time trying to figure out the correct JMX service URL on my localhost for accessing the
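
A hedged sketch of connecting to those MBeans once a JMX remote port is configured for Tomcat (the service URL format and port 9999 are assumptions that depend on the -Dcom.sun.management.jmxremote.* settings; Solr registers its MBeans when jmx is enabled in solrconfig.xml):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SolrJmxClient {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbeans = connector.getMBeanServerConnection();
                // List every registered MBean whose domain looks Solr-related.
                for (ObjectName name : mbeans.queryNames(null, null)) {
                    if (name.getDomain().startsWith("solr")) {
                        System.out.println(name);
                    }
                }
            } finally {
                connector.close();
            }
        }
    }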