RE: Ensuring stable timestamp ordering

2010-11-01 Thread Toke Eskildsen
Dennis Gearon [gear...@sbcglobal.net] wrote: > how about a timestamp with a GUID appended on the end of it? Since long (8 bytes) is the largest atomic type supported by Java, this would have to be represented as a String (or rather BytesRef) and would take up 4 + 32 bytes + 2 * 4 bytes f

Default file locking on trunk

2010-11-01 Thread Lance Norskog
Scenario: Git update to current trunk (Nov 1, 2010). Build all. Run Solr in trunk/solr/example with 'java -jar start.jar'. Hit ^C. Jetty reports running the shutdown hook. There is now a data/index with a write lock file in it. I have not attempted to read the index, let alone add something to it. I start s

RE: Ensuring stable timestamp ordering

2010-11-01 Thread Dennis Gearon
how about a timestamp with a GUID appended on the end of it?
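Dennis's suggestion can be sketched as a single string sort key: a fixed-width, zero-padded timestamp followed by a 32-character hex GUID, so that plain lexicographic sorting follows timestamp order and equal timestamps are tie-broken by the GUID. This is a minimal sketch assuming epoch-millisecond timestamps and random (version 4) GUIDs; the `sort_key` name and the 13-digit width are illustrative choices, not anything provided by Solr. The tie-break among equal timestamps is deterministic but arbitrary, not insertion order.

```python
import uuid

# Fixed width for epoch milliseconds: 13 digits covers dates up to ~2286.
# Zero-padding makes lexicographic string order match numeric order.
TS_WIDTH = 13


def sort_key(epoch_millis: int, guid: uuid.UUID) -> str:
    """Hypothetical composite key: padded timestamp + 32-char hex GUID."""
    if epoch_millis < 0 or len(str(epoch_millis)) > TS_WIDTH:
        raise ValueError("timestamp does not fit the fixed-width encoding")
    return f"{epoch_millis:0{TS_WIDTH}d}{guid.hex}"


# 1288569600000 ms is 2010-11-01T00:00:00Z; uuid4() supplies the tie-breaker.
key = sort_key(1288569600000, uuid.uuid4())
```

Each key is 13 + 32 = 45 characters, which is the kind of per-document cost Toke weighs against a single 8-byte long.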

Re: Phrase Query Problem?

2010-11-01 Thread Ken Stanley
On Mon, Nov 1, 2010 at 10:26 PM, Tod wrote: > I have a number of fields I need to do an exact match on. I've defined > them as 'string' in my schema.xml. I've noticed that I get back query > results that don't have all of the words I'm using to search with. > > For example: > > > q=(((mykeyword

Phrase Query Problem?

2010-11-01 Thread Tod
I have a number of fields I need to do an exact match on. I've defined them as 'string' in my schema.xml. I've noticed that I get back query results that don't have all of the words I'm using to search with. For example: q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(
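In the example query, each `+` in the URL decodes to a space, so `mykeywords:Compliance+With+Conduct+Standards` reaches the query parser as `mykeywords:Compliance With Conduct Standards`; the field prefix binds only to `Compliance`, and the remaining words are searched in the default field. Wrapping the value in double quotes should keep it together as one query on `mykeywords`, which for a `string` field amounts to an exact match on the whole value. A minimal Python sketch of building such a query (the helper names are hypothetical):

```python
from urllib.parse import urlencode


def quote_value(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def exact_match_query(field: str, values: list) -> str:
    """OR together one quoted (exact-value) clause per value on a single field."""
    return " OR ".join(f"{field}:{quote_value(v)}" for v in values)


q = exact_match_query("mykeywords", ["Compliance With Conduct Standards", "All"])
# urlencode percent-encodes quotes and colons and encodes spaces as '+';
# Solr decodes them back, so the parser sees the quoted phrases intact.
query_string = urlencode({"q": q})
```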

Re: Possible memory leaks with frequent replication

2010-11-01 Thread Lance Norskog
You should query against the indexer. I'm impressed that you got 5s replication to work reliably. On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow wrote: > We've been trying to get a setup in which a slave replicates from a > master every few seconds (ideally every second but currently we have it > s

Re: Re:Re: problem of solr replcation's speed

2010-11-01 Thread Lance Norskog
This is the time to replicate and open the new index, right? Opening a new index can take a lot of time. How many autowarmers and queries are there in the caches? Opening a new index re-runs all of the queries in all of the caches. 2010/11/1 kafka0102 : > I suspected my app has some sleeping op ev

Possible memory leaks with frequent replication

2010-11-01 Thread Simon Wistow
We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). Everything seems to work fine until, periodically, the slave just stops responding, apparently because it is running out of memory: org.ap

Field boosting in DataImportHandler transformer

2010-11-01 Thread Brad Kellett
It's not looking very promising, but is there something I'm missing to be able to apply a field boost from within a transformer in the DataImportHandler? Not a boost defined within the schema, but a boost applied to the field from the transformer itself. I know you can do a document boost, but

Re: Which is faster -- delete or update?

2010-11-01 Thread Jonathan Rochkind
The actual time it takes to delete or update the document is unlikely to make a difference to you. What might make a difference to you is the time it takes to actually finalize the commit, and the time it takes to re-warm your indexes after a commit, and especially the time it takes to run any

Re: Which is faster -- delete or update?

2010-11-01 Thread Erick Erickson
Just deleting a document is faster because all that really happens is the document is marked as deleted. An update is really a delete followed by an add of the same document, so by definition an update will be slower... But... does it really make a difference? How often do you expect this to happe

Re: Which is faster -- delete or update?

2010-11-01 Thread Peter Karich
From the user perspective I wouldn't delete it, because the down-vote could be a mistake or spam or something, and up-voting could resurrect it. It could also be wise to keep the docs to see which content (from which users?) is down-voted, to detect spam accounts. From the dev perspective yo

Which is faster -- delete or update?

2010-11-01 Thread Andy
My documents have a "down_vote" field. Every time a user votes down a document, I increment the "down_vote" field in my database and also re-index the document to Solr to reflect the new down_vote value. During searches, I want to restrict the results to only documents with, say fewer than 3 dow
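For the search-time restriction Andy describes, a filter query on the counter is a common approach. This is a minimal sketch that assumes `down_vote` is indexed as an integer field; because Solr range brackets are inclusive, "fewer than 3" is written as `[* TO 2]`. The helper name is hypothetical, and `*:*` stands in for the real user query.

```python
from urllib.parse import urlencode


def below_threshold_filter(field: str, threshold: int) -> str:
    """Filter clause for documents whose integer field is strictly below threshold.

    Square brackets are inclusive in Solr range syntax, so "< N" becomes
    [* TO N-1]; this only holds for integer-valued fields."""
    return f"{field}:[* TO {threshold - 1}]"


params = {
    "q": "*:*",  # placeholder for the real user query
    "fq": below_threshold_filter("down_vote", 3),
}
query_string = urlencode(params)
```

Keeping the restriction in `fq` rather than `q` means it does not affect scoring and can be cached in the filterCache separately from the main query.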

Re: is my search fast ?! date search i need some feedback :D

2010-11-01 Thread Erick Erickson
Careful here. First searches are known to be slow; various caches are filled up the first time they are used, etc. So even though you're measuring the second query, it may still be filling caches. And what are you measuring? The raw search time or the entire response time? These can be quite dif

Re: Design and Usage Questions

2010-11-01 Thread Xin Li
If you just want a quick way to query a Solr server, the Perl module WebService::Solr is pretty good. On Mon, Nov 1, 2010 at 4:56 PM, Lance Norskog wrote: > Yes, you can write your own app to read the file with SVNKit and post > it to the ExtractingRequestHandler. This would be easiest. > > On Mon, N

Re: Design and Usage Questions

2010-11-01 Thread Lance Norskog
Yes, you can write your own app to read the file with SVNKit and post it to the ExtractingRequestHandler. This would be easiest. On Mon, Nov 1, 2010 at 5:49 AM, getagrip wrote: > Ok, so if I did NOT use Solr_J I could PUSH a Stream to Solr somehow? > I do not depend on Solr_J, any connection-meth
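A minimal Python sketch of the "own app" approach, posting raw bytes (for example, file contents already read through SVNKit) to the ExtractingRequestHandler. It assumes the handler is registered at `/update/extract`, as in Solr's example `solrconfig.xml`, and that Solr runs at the example's default `localhost:8983`; the document id `doc-1` is a placeholder. The request is only built here; passing it to `urllib.request.urlopen` would send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the Solr example's default base URL.
SOLR_BASE = "http://localhost:8983/solr"


def extract_request(doc_id: str, payload: bytes,
                    content_type: str = "application/octet-stream") -> Request:
    """Build (but do not send) a POST of raw bytes to the extracting handler.

    literal.id sets the unique key of the resulting document; commit=true asks
    Solr to commit right after indexing it."""
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        f"{SOLR_BASE}/update/extract?{params}",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


req = extract_request("doc-1", b"file contents read from SVN")
```

Because the bytes travel in the request body, the client pushes the content rather than handing Solr a reader to pull from.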

Re: solr stuck in writing to inexisting sockets

2010-11-01 Thread Lance Norskog
> Besides, I don't know how you'd stop Solr processing a query mid-way > through, > I don't know of any way to make that happen. The timeAllowed parameter causes a timeout in the Solr server to kill the searching thread. They use that now. But, yes, Erick is right- there is a fundamental problem

Re: How does DIH multithreading work?

2010-11-01 Thread Lance Norskog
It is useful for parsing PDFs on a multi-processor machine. It also helps if a sub-entity makes an outbound I/O call to a database, a file, or another Solr instance (SOLR-1499): anything where the pipeline time outweighs disk I/O time. Threading happens on a per-document level- there is no concurrent access inside

Re: Use SolrCloud (SOLR-1873) on trunk, or with 1.4.1?

2010-11-01 Thread Jeremy Hinegardner
I took a swag at applying SOLR-1873 to branch_3x. It mostly applied; most of the remaining issues were ZooKeeper integrations, and those applied cleanly by hand. There were also a few constants and such that needed to be pulled in from trunk. At the moment, it passes all the tests. I have no

is my search fast ?! date search i need some feedback :D

2010-11-01 Thread stockiii
My index holds 13M documents, and I have not indexed all of my documents yet; the index in the production system should be about 30M documents. So with my 13M test index I tried a search over all documents, with the first query: q:[2008-10-27 12:23:00:00 TO 2009-04-29 23:59:00:00]. Then I ran the next query, for st
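Solr's date fields expect ISO 8601 timestamps in UTC with a trailing `Z` (for example `2008-10-27T12:23:00Z`), and a range query normally names the field in front of the brackets; the query as written uses a different timestamp format and no field name. A minimal Python sketch of building such a range, assuming a date field hypothetically named `date`:

```python
from datetime import datetime, timezone


def solr_date(dt: datetime) -> str:
    """Format a timezone-aware datetime in Solr's UTC form, e.g. 2008-10-27T12:23:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range(field: str, start: datetime, end: datetime) -> str:
    """Inclusive range clause on a date field."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


q = date_range(
    "date",  # hypothetical field name; the original query does not show one
    datetime(2008, 10, 27, 12, 23, tzinfo=timezone.utc),
    datetime(2009, 4, 29, 23, 59, tzinfo=timezone.utc),
)
```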

RE: Solr in virtual host as opposed to /lib

2010-11-01 Thread Chris Hostetter
: I don't think you read the entire thread. I'm assuming you made a mistake. No mistake. When you sent your first message with the subject "Solr in virtual host as opposed to /lib" you did so in response to a completely unrelated thread ("Searching with wrong keyboard layout or using translit

Re: Solr in virtual host as opposed to /lib

2010-11-01 Thread Markus Jelsma
No, he didn't make a mistake, but you did. Next time, please start a new thread instead of conveniently replying to an existing thread and just changing the subject. Now we have two threads in one. :) > I don't think you read the entire thread. I'm assuming you made a mistake. > > -Original M

RE: Solr in virtual host as opposed to /lib

2010-11-01 Thread Eric Martin
I don't think you read the entire thread. I'm assuming you made a mistake. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Monday, November 01, 2010 11:49 AM To: solr-user@lucene.apache.org Subject: Re: Solr in virtual host as opposed to /lib : Reference

Re: Reverse range search

2010-11-01 Thread Jan Høydahl / Cominvent
Hi, I think I have seen a comment on the list from someone with the same need a few months ago. He planned to make a new fieldType to support this, e.g. MinMaxRangeFieldType which would be a polyField type holding both a min and max value, and then you could query it q=myminmaxfield:123 I did
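Until such a field type exists, a common workaround is to index the interval's endpoints in two separate numeric fields and require `min <= value <= max` with two open-ended ranges. A minimal sketch; `range_min` and `range_max` are hypothetical field names, and the Python predicate only mirrors the condition the query is meant to express.

```python
def interval_contains_query(min_field: str, max_field: str, value: int) -> str:
    """Match documents whose [min, max] interval contains value (both ends inclusive)."""
    return f"{min_field}:[* TO {value}] AND {max_field}:[{value} TO *]"


def interval_contains(lo: int, hi: int, value: int) -> bool:
    """Local reference for the same condition, handy for checking indexed data."""
    return lo <= value <= hi


q = interval_contains_query("range_min", "range_max", 123)
```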

Re: Solr in virtual host as opposed to /lib

2010-11-01 Thread Chris Hostetter
: References: : : : : : In-Reply-To: : Subject: Solr in virtual host as opposed to /lib http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead s

Re: Facet count of zero

2010-11-01 Thread Tod
On 11/1/2010 1:03 PM, Yonik Seeley wrote: On Mon, Nov 1, 2010 at 12:55 PM, Tod wrote: I'm trying to exclude certain facet results from a facet query. It seems to work but rather than being excluded from the facet list it's returned with a count of zero. If you don't want to see 0 counts, use

Testing/packaging question

2010-11-01 Thread Bernhard Reiter
Hi, I'm pretty much of a Solr newbie currently packaging solrpy for Debian; see http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ In order to run solrpy's supplied tests at build time, I'd need Solr to know about the schema.xml that comes with the tests. Can anyone tell

Re: Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Robert Muir
On Mon, Nov 1, 2010 at 1:34 PM, Burton-West, Tom wrote: > Thanks Robert, > > I'll use the workaround for now (using StandardTokenizerFactory and > specifying version 3.1), but I suspect that I don't want the added URL/IP > address recognition due to my use case.  I've also talked to a couple peo

RE: indexing '-

2010-11-01 Thread PeterKerk
Guys, the "string" type did the trick :) Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-tp1816969p1823199.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: How does DIH multithreading work?

2010-11-01 Thread Dyer, James
Mark, I have the same question so I did a little research on this. Not a complete answer but here is what I've found: - "threads" was added with SOLR-1352 (https://issues.apache.org/jira/browse/SOLR-1352). - Also see http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_rega

RE: Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Burton-West, Tom
Thanks Robert, I'll use the workaround for now (using StandardTokenizerFactory and specifying version 3.1), but I suspect that I don't want the added URL/IP address recognition due to my use case. I've also talked to a couple people who recommended using the ICUTokenFilter with some rule modif

Re: Problem with phrase matches in Solr

2010-11-01 Thread darren
Take a look at term proximity and phrase query. http://wiki.apache.org/solr/SolrRelevancyCookbook > Hey guys, > > I have a solr index where i store information about experts from > various fields. The thing is when I search for "channel marketing" i > get people that have the word channel or mark

RE: Solr in virtual host as opposed to /lib

2010-11-01 Thread Eric Martin
I was speaking about Apache virtual hosts. I was concerned that there was an increase in processing time due to the Solr and Nutch instances being housed inside a virtual host as opposed to being dropped in the root of my distro. Thank you for the astute clarification. -Original Message- From:

Re: Facet count of zero

2010-11-01 Thread Yonik Seeley
On Mon, Nov 1, 2010 at 12:55 PM, Tod wrote: > I'm trying to exclude certain facet results from a facet query. It seems to > work but rather than being excluded from the facet list it's returned with a > count of zero. If you don't want to see 0 counts, use facet.mincount=1 http://wiki.apache.org
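Yonik's suggestion, plus an equivalent client-side filter, as a minimal Python sketch. The query is written as `*:* -foo:bar`, a conservative way to express an exclusion-only query. The helper assumes Solr's default JSON layout for facets (`json.nl=flat`), where each entry under `facet_fields` is a flat list alternating term and count.

```python
from urllib.parse import urlencode

params = [
    ("q", "*:* -foo:bar"),
    ("facet", "true"),
    ("facet.field", "foo"),
    ("facet.mincount", "1"),  # omit terms whose count is zero
    ("wt", "json"),
]
query_string = urlencode(params)


def nonzero_facets(flat: list) -> dict:
    """Turn Solr's flat [term, count, term, count, ...] list into a dict, dropping zeros."""
    return {term: count for term, count in zip(flat[::2], flat[1::2]) if count > 0}
```

Setting `facet.mincount` on the server is usually preferable, since zero-count terms are then never serialized at all.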

Problem with phrase matches in Solr

2010-11-01 Thread Moazzam Khan
Hey guys, I have a Solr index where I store information about experts from various fields. The thing is, when I search for "channel marketing" I get people that have the word channel or marketing in their data. I only want people who have that entire phrase in their bio. I copy the contents of bio
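Darren's pointer, sketched: a quoted phrase on the bio field matches only documents where the words appear adjacent and in order, and adding a slop (`~N`) relaxes that so the terms may be up to N position moves apart. This assumes `bio` is a tokenized text field; the helper name is hypothetical.

```python
def phrase_query(field: str, text: str, slop: int = 0) -> str:
    """Build field:"text" (exact phrase) or field:"text"~slop (proximity)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    clause = f'{field}:"{escaped}"'
    return f"{clause}~{slop}" if slop > 0 else clause


exact = phrase_query("bio", "channel marketing")
near = phrase_query("bio", "channel marketing", slop=2)
```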

Facet count of zero

2010-11-01 Thread Tod
I'm trying to exclude certain facet results from a facet query. It seems to work but rather than being excluded from the facet list it's returned with a count of zero. Ex: q=(-foo:bar)&facet=true&facet.field=foo&facet.sort=idx&wt=json&indent=true This returns bar with a count of zero. All t

Re: Solr in virtual host as opposed to /lib

2010-11-01 Thread Jonathan Rochkind
I think you guys are talking about two different kinds of 'virtual hosts'. Lance is talking about CPU virtualization. Eric appears to be talking about apache virtual web hosts, although Eric hasn't told us how apache is involved in his setup in the first place, so it's unclear. Assuming you a

Re: Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Robert Muir
On Mon, Nov 1, 2010 at 12:24 PM, Burton-West, Tom wrote: > We are trying to solve some multilingual issues with our Solr analysis filter > chain and would like to use the new Lucene 3.x filters that are Unicode > compliant. > > Is it possible to use the Lucene ICUTokenizerFilter or StandardAnaly

Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Burton-West, Tom
We are trying to solve some multilingual issues with our Solr analysis filter chain and would like to use the new Lucene 3.x filters that are Unicode compliant. Is it possible to use the Lucene ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr? Is it just a matter of writing

Re: solr stuck in writing to inexisting sockets

2010-11-01 Thread Erick Erickson
I'm going to nudge you in the direction of understanding why the queries take so long in the first place rather than going toward the blunt approach of cutting them off after some time. The fact that you don't control the queries submitted doesn't prevent you from trying to understand what is takin

Re: Multiple Keyword Search

2010-11-01 Thread Erick Erickson
I'm not sure this exactly fits your use-case, but it may come "close enough". Have you looked at disMax and the mm parameter (minimum should match)? Best Erick On Mon, Nov 1, 2010 at 5:00 AM, Pawan Darira wrote: > Hi > > There is a situation where i search for more than 1 keyword & my main 2 >
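A minimal sketch of Erick's suggestion using the two fields from Pawan's message. With `defType=dismax`, `qf` lists the fields to search and `mm` sets how many of the query's optional terms must match; a low `mm` returns partial matches, and documents matching more terms will generally score higher, which approximates (but does not strictly guarantee) the "all keywords first" ordering. The `min_should_match` helper is a hypothetical, simplified model of how a plain integer or percentage `mm` value resolves; it does not handle the conditional `N<...` forms, and the sample query text is a placeholder.

```python
def dismax_params(user_query: str, mm: str) -> dict:
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "ad_title ad_description",
        "mm": mm,
    }


def min_should_match(n_clauses: int, spec: str) -> int:
    """Simplified mm resolution for '2', '-1', '75%', '-25%' (no conditional specs).

    Percentages are rounded down; negative values say how many may be missing."""
    if spec.endswith("%"):
        pct = int(spec[:-1])
        amount = n_clauses * abs(pct) // 100
        required = amount if pct >= 0 else n_clauses - amount
    else:
        k = int(spec)
        required = k if k >= 0 else n_clauses + k
    return max(0, required)


params = dismax_params("cheap red bicycle", mm="1")
```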

Re: Boosting the score based on certain field

2010-11-01 Thread Erick Erickson
Would simple boosting work? As in category:electronics^2? If not, perhaps you can explain a bit more about what you're trying to accomplish... Best Erick On Sun, Oct 31, 2010 at 10:55 PM, sivaprasad wrote: > > Hi, > > In my document i have a filed called category.This contains > "electronics,ga
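One way to express per-category boosts with the standard query parser, as a minimal sketch: the user's query is made required with `+(...)`, and optional `category:value^weight` clauses add to the score of documents in those categories without restricting the result set. It assumes `category` is indexed so that `category:electronics` matches, as in Erick's example, and that the default operator is OR; the weights and helper name are illustrative.

```python
def boosted_query(user_query: str, boosts: dict) -> str:
    """Require the user's query; add optional, boosted category clauses for scoring.

    Assumes the query parser's default operator is OR, so the boost clauses
    are optional and only affect ranking."""
    boost_clauses = " ".join(f"category:{cat}^{weight}" for cat, weight in boosts.items())
    return f"+({user_query}) {boost_clauses}"


q = boosted_query("laptop", {"electronics": 2, "games": 1.5})
```

With the dismax parser, the same boost clauses can go in the `bq` parameter instead.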

Re: Solr Relevency Calculation

2010-11-01 Thread Erick Erickson
Here's a good place to start: http://lucene.apache.org/java/2_4_0/scoring.html But what do you mean "this is going to search on five fields"? This s

Re: big terms in UnInvertedField

2010-11-01 Thread Koji Sekiguchi
Yonik, Thank you for your reply. I just wanted to share my surprise. :) Koji -- http://www.rondhuit.com/en/ (10/11/01 23:17), Yonik Seeley wrote: 2010/11/1 Koji Sekiguchi: With solr example, using facet.field=text creates UnInvertedField for the text field in fieldValueCache. After that, I sa

Re: big terms in UnInvertedField

2010-11-01 Thread Yonik Seeley
2010/11/1 Koji Sekiguchi : > With solr example, using facet.field=text creates UnInvertedField > for the text field in fieldValueCache. After that, I looked at the stats page > and was surprised that the counters in *filterCache* were up: > Are they caused by big words in UnInvertedField? Yes. "big" terms (defi

big terms in UnInvertedField

2010-11-01 Thread Koji Sekiguchi
Hello, With the Solr example, using facet.field=text creates UnInvertedField for the text field in fieldValueCache. After that, I looked at the stats page and was surprised that the counters in *filterCache* were up: lookups : 213 hits : 106 hitratio : 0.49 inserts : 107 evictions : 0 size : 107 warmupTime : 0 cum

Re: solr stuck in writing to inexisting sockets

2010-11-01 Thread Roxana Angheluta
Hi, Yes, sometimes it takes >5 minutes for a query. I agree this is not desirable. However, if the application has no control over the input queries other than closing the socket after a while, Solr should not continue writing the response, but terminate the thread. In general, is there a way

Re: Custom Sorting in Solr

2010-11-01 Thread Ezequiel Calderara
OK, I imagined that the doubly linked list would be far too complicated for Solr. Now, how can I get Solr to connect to a web service and do the import? I'm sorry if I'm not clear; sometimes my English gets fuzzy :P On Fri, Oct 29, 2010 at 4:51 PM, Yonik Seeley wrote: > On Fri, Oct 29, 201

Re: Design and Usage Questions

2010-11-01 Thread getagrip
Ok, so if I did NOT use Solr_J I could PUSH a Stream to Solr somehow? I do not depend on Solr_J, any connection-method would suffice. On 11/01/2010 03:23 AM, Lance Norskog wrote: 2. The SolrJ library handling of content streams is "pull", not "push". That is, you give it a reader and it pulls co

Re: Design and Usage Questions

2010-11-01 Thread torin farmer
Hm, I do not have a webserver set up for security reasons. I use SVNKit to connect to SVN via the "file://" protocol, and what I get then is the ByteArrayOutputStream. What would the buffer solution or the DualThread Writer/Reader pair look like? -Original Message- From: "Lance Norskog"

Re:Re:Re: problem of solr replcation's speed

2010-11-01 Thread kafka0102
I suspected my app has some sleeping operation every 1s, so I changed ReplicationHandler.PACKET_SZ to 1024 * 1024*10; // 10MB and the log result looks like this: [2010-11-01 17:49:29][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3184 [2010-11-01 17:49:32][INFO][pool-6-thread-1][SnapPu
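To compare packet sizes, the `readFully<bytes> cost <n>` lines can be turned into throughput. A minimal Python sketch, assuming `cost` is in milliseconds (the hacked log line does not state its unit); the sample lines are copied from kafka0102's logs.

```python
import re
from typing import Optional

# Matches the hacked SnapPuller log format: "...readFully<bytes> cost <cost>".
LOG_PATTERN = re.compile(r"readFully(\d+) cost (\d+)")


def throughput_mib_per_s(line: str) -> Optional[float]:
    """MiB/s for one readFully log line, assuming cost is in milliseconds."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    n_bytes, cost_ms = int(m.group(1)), int(m.group(2))
    if cost_ms == 0:
        return None  # too fast to measure at millisecond resolution
    return n_bytes * 1000 / (cost_ms * 1024 * 1024)


big = "[2010-11-01 17:49:29][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3184"
small = "[2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 979"
```

Under that assumption, the 10 MB read works out to about 3.1 MiB/s and the 1 MB read costing 979 to about 1.0 MiB/s, while the following 1 MB read costing 4 is roughly 250 MiB/s; a gap that large between consecutive reads is consistent with the periodic wait kafka0102 suspects.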

Re:Re: problem of solr replcation's speed

2010-11-01 Thread kafka0102
I hacked SnapPuller to log the cost, and the log looks like this: [2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 979 [2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 4 [2010-11-01 17:21:19][INFO][pool-6-thread-1][Snap

Multiple Keyword Search

2010-11-01 Thread Pawan Darira
Hi, There is a situation where I search for more than one keyword, and my two main fields are ad_title and ad_description. I want results that match all of the keywords in both fields to come on top. Then, in further results, keywords can be dropped one by one. E.g. In a search of 3 k

Re: Filtering results based on score

2010-11-01 Thread Ahmet Arslan
> As part of solr results i am able to get the max score.If i > want to filter > the results based on the max score, let say the max > score  is 10 And i need > only the results between max score  to 50 % of max > score.This max score is > going to change dynamically.How can we implement this?Do we

Boosting the score based on certain field

2010-11-01 Thread sivaprasad
Hi, In my document I have a field called category. This contains "electronics,games ,..etc". For some of the category values I need to boost the document score. Let us say, for the "electronics" category, I will set the boosting parameter greater than for the "games" category. Does anybody have the ide

Solr Relevency Calculation

2010-11-01 Thread sivaprasad
Hi, I have 25 indexed fields in my document. But by default, if I give "q=laptops" this is going to search on five fields, and I am getting the score as part of the search results. How will Solr calculate the score? Is it going to calculate it only on the five fields or on the 25 fields which are indexed? What is

Filtering results based on score

2010-11-01 Thread sivaprasad
Hi, As part of the Solr results I am able to get the max score. I want to filter the results based on the max score: let's say the max score is 10 and I need only the results between the max score and 50% of the max score. This max score is going to change dynamically. How can we implement this? Do we need to
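Since Solr returns hits sorted by descending score by default and reports `maxScore` when scores are requested (for example with `fl=*,score`), one client-side way to do this is to keep leading hits while their score is at least half of `maxScore`. A minimal Python sketch; the document dicts are hypothetical stand-ins for parsed results. It only sees the rows actually fetched, so `rows` must be large enough to reach the cutoff.

```python
from itertools import takewhile


def within_fraction_of_max(docs: list, max_score: float, fraction: float = 0.5) -> list:
    """Keep leading hits whose score is >= fraction * max_score.

    Relies on docs being sorted by descending score, so iteration can stop
    at the first hit below the cutoff."""
    cutoff = max_score * fraction
    return list(takewhile(lambda d: d["score"] >= cutoff, docs))


hits = [
    {"id": "a", "score": 10.0},
    {"id": "b", "score": 6.0},
    {"id": "c", "score": 5.0},
    {"id": "d", "score": 4.9},
]
kept = within_fraction_of_max(hits, max_score=10.0)
```

A relative cutoff like this is the meaningful form here, because raw Lucene scores are not normalized across queries.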