Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
->string conversions. -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 > Regards > Eric > > On Wed, Aug 25, 2010 at 3:28 PM, Yonik Seeley > wrote: >> >> On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler >> wrote: >> > There is

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam wrote: > So, I went through all the effort to break my documents into max 1 MB chunks, > and searching for hello still takes over 40 seconds (searching across 7433 > documents): > >        8 results (41980 ms) > > What is going on???  (scroll down for

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler wrote: > Thanks for the technical explanation. > I will in general try to use lex and sort by count in the client if there > are not too many rows. I just developed a patch that may help this scenario: https://issues.apache.org/jira/browse/SOLR-2089
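The client-side approach Eric mentions (request facets in lex order, then sort by count yourself) can be sketched as below. This is only an illustration: the `(term, count)` pair layout and the function name are assumptions, not something taken from the thread or from a Solr client library.

```python
def sort_facets_by_count(facets):
    """Sort (term, count) pairs by descending count, breaking ties by term."""
    # Negating the count gives descending order while keeping terms ascending.
    return sorted(facets, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical facet values as they might arrive in lexicographic order.
lex_order = [("apple", 3), ("banana", 7), ("cherry", 3), ("date", 1)]
by_count = sort_facets_by_count(lex_order)
print(by_count)
```

This only makes sense when the number of facet rows is small enough to fetch in full, which matches the caveat in the quoted message.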

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 2:50 PM, Yonik Seeley wrote: > On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler > wrote: >> Thanks for the technical explanation. >> I will in general try to use lex and sort by count in the client if there >> are not too many rows. > > I j

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler wrote: > Hi Solr experts, > > There is a huge difference doing facet sorting on lex vs count > The strange thing is that count sorting is fast when setting a small limit. > I realize I can do sorting in the client, but I am just curious why this is. >

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 2:34 PM, Peter Spam wrote: > This is a very small number of documents (7000), so I am surprised Solr is > having such a hard time with it!! > > I do facet on 3 terms. > > Subsequent "hello" searches are faster, but still well over a second.  This > is a very fast Mac Pro,

Re: false matches with ReversedWildcardFilterFactory

2010-09-03 Thread Yonik Seeley
On Thu, Sep 2, 2010 at 1:10 PM, Landon Kuhn wrote: > Hello, I am using the ReversedWildcardFilterFactory, and I am > wondering if there is a way to prevent false matches when a query > token matches the reversed indexed token. For instance, the query > *zemog* matches documents that contain Gomez.

Re: anyone use hadoop+solr?

2010-09-06 Thread Yonik Seeley
On Mon, Sep 6, 2010 at 8:37 AM, MitchK wrote: > 10 % numShards(10) ->  1 -> doc 10 will be indexed at shard 1... and what > about the older version at shard 2? I am no expert when it comes to > cloudComputing and the other stuff. > If you can point me to one or another reference where I can read a

Re: anyone use hadoop+solr?

2010-09-06 Thread Yonik Seeley
On Mon, Sep 6, 2010 at 9:47 AM, MitchK wrote: > are there any discussions about SolrCloud-indexing? Not recently - personally I've been sidetracked by other stuff. Mapping docs to shards is the easy part... take a hash of the id, and then I imagine the shard id (the label for the index) can just
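The "hash of the id" idea can be sketched roughly as follows. The snippet above is cut off before it says how the hash is turned into a shard label, so the modulo step, the choice of MD5, and the function name are all assumptions made for illustration; this is not how SolrCloud is stated to work.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index (hypothetical scheme: hash mod N)."""
    # Use a stable hash so every node computes the same shard for the same id;
    # Python's built-in hash() is randomized per process and would not do.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("doc-10", 10))
```

With a plain modulo, changing the number of shards moves most documents to a different shard, which is the concern MitchK raises about older versions of a document ending up elsewhere.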

Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-06 Thread Yonik Seeley
On Mon, Sep 6, 2010 at 10:18 AM, MitchK wrote: [...consistent hashing...] > But it doesn't solve the problem at all, correct me if I am wrong, but: If > you add a new server, let's call him IP3-1, and IP3-1 is nearer to the > current resource X, then doc x will be indexed at IP3-1 - even if IP2-1

Re: How to enable Unicode Support in Solr

2010-09-06 Thread Yonik Seeley
On Mon, Sep 6, 2010 at 10:30 AM, Walter Underwood wrote: > On Sep 6, 2010, at 1:49 AM, Lance Norskog wrote: > >> 1) The XML file must include the UTF-8 encoding metadata in the first line. > > If it requires that, it isn't a legal XML parser. The encoding declaration is > optional and it defaults

Re: How to retrieve the full corpus

2010-09-06 Thread Yonik Seeley
On Mon, Sep 6, 2010 at 10:52 AM, Roland Villemoes wrote: > How can I retrieve all words from a Solr core? > I need a list of all the words and how often they occur in the index. http://wiki.apache.org/solr/TermsComponent It doesn't currently stream though, so requesting *all* at once might take

Re: Null Pointer Exception with shards&facets where some shards have no values for some facets.

2010-09-07 Thread Yonik Seeley
Thanks for the report Ron, can you open a JIRA issue? What version of Solr is this? -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer wrote: > Short summary: >  * Mixing Facets and Shards give me a NullPointerException >    when

Re: Null Pointer Exception with shards&facets where some shards have no values for some facets.

2010-09-08 Thread Yonik Seeley
On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer wrote: > Short summary: >  * Mixing Facets and Shards give me a NullPointerException >    when not all docs have all facets. https://issues.apache.org/jira/browse/SOLR-2110 I believe the underlying real issue stemmed from your use of a complex key "invol

Re: Re: Invariants on a specific fq value

2010-09-08 Thread Yonik Seeley
2010 at 1:32 PM, Markus Jelsma wrote: > Interesting! I haven't met the appends method before and i'll be sure to give > it a try tomorrow. Try, the wiki [1] is not very clear on what it really does. Here's a comment from the example solrconfig.xml: -Yonik http://lucenerevolution.org Luce

Re: Null Pointer Exception with shards&facets where some shards have no values for some facets.

2010-09-08 Thread Yonik Seeley
? Can you try trunk again now? -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 On Wed, Sep 8, 2010 at 6:28 PM, Ron Mayer wrote: > Yonik Seeley wrote: >> On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer wrote: >>> Short summary: >>>  * Mixing

[ANN] Webinar, Sep 15: Mastering the Power of Faceted Search

2010-09-08 Thread Yonik Seeley
Folks, here's an upcoming Solr webinar sponsored by my employer. It's Hoss on faceting, so it should be good! -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 --- Webinar Details Join us for a free webc

Re: Solr, c/s type ?

2010-09-09 Thread Yonik Seeley
On Thu, Sep 9, 2010 at 1:20 AM, Jonathan Rochkind wrote: > You _could_ use SolrJ with EmbeddedSolrServer.  But personally I wouldn't > unless there's a reason to.  There's no automatic reason not to use the > ordinary Solr HTTP api, even for an in-house application which is not a web > applicat

Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-09 Thread Yonik Seeley
On Thu, Sep 9, 2010 at 11:51 AM, Grant Ingersoll wrote: > On Sep 6, 2010, at 10:41 AM, Yonik Seeley wrote: >> For SolrCloud, I don't think we'll end up using consistent hashing - >> we don't need it (although some of the concepts may still be useful). > > Can y

Re: solr / lucene engineering positions in Boston, MA USA @ the Echo Nest

2010-09-10 Thread Yonik Seeley
On Fri, Sep 10, 2010 at 9:18 AM, Brian Whitman wrote: > Hi all, brief message to let you know that we're in heavy hire mode at the > Echo Nest. As many of you know we are very heavy solr/lucene users (~1bn > documents across many many servers) and a lot of our staff have been working > with and co

Re: SEVERE: java.io.IOException: The specified network name is no longer available

2010-09-10 Thread Yonik Seeley
On Fri, Sep 10, 2010 at 2:12 PM, brian519 wrote: > Once we see the error, it is persistent.  Restarting Tomcat makes the error > stop.  This is happening across a variety of deployments and networks, so I > don't think there is an actual network problem.  Many other apps operate > fine on the same

Re: Null Pointer Exception with shards&facets where some shards have no values for some facets.

2010-09-10 Thread Yonik Seeley
On Fri, Sep 10, 2010 at 7:21 PM, Ron Mayer wrote: > Ron Mayer wrote: > Yes, looks good now. > Thanks! Great, thanks for the report! -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8

Re: Facet Field Value truncation

2010-09-14 Thread Yonik Seeley
On Tue, Sep 14, 2010 at 3:35 PM, Niall O'Connor wrote: > Has anyone come across a situation where they have seen their facet field > values wrap into a new facet entry when the value exceeds 256 characters? Yes, for indexed string fields, there currently is a limit of 256 chars per token. It's b

Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Yonik Seeley
On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com wrote: > SEVERE: org.apache.solr.common.SolrException: Error while creating field > 'metadata_last_modified{type=date,properties=indexed,stored,omitNorms}' from > value '2010-09-14T22:29:24+0200' Different timezones are currently not allowed -
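One way to avoid the error is to normalize the timestamp to UTC on the client before sending it, since Solr's date field expects the trailing `Z` form. A minimal sketch, assuming the input always carries a numeric offset like the `+0200` in the log line above (the function name is made up for this example):

```python
from datetime import datetime, timezone


def to_solr_utc(value: str) -> str:
    """Convert an ISO-like timestamp with a numeric offset to UTC 'Z' form."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_solr_utc("2010-09-14T22:29:24+0200"))  # 2010-09-14T20:29:24Z
```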

Re: Solr returning irrelevant results

2010-09-15 Thread Yonik Seeley
On Wed, Sep 15, 2010 at 11:29 AM, Nguyen, Vincent (CDC/OSELS/PHITPO) (CTR) wrote: > I was running a query on the word "mining" and got results from > documents that have nothing to do with mining.  I got results with a > score of 0.2997284 and less.  It looks like Solr was querying the > dsm.fullt

Re: Null Pointer Exception while indexing

2010-09-15 Thread Yonik Seeley
On Wed, Sep 15, 2010 at 1:12 PM, andrewdps wrote: > > What could be possible error for > > 14-Sep-10 4:28:47 PM org.apache.solr.common.SolrException log > SEVERE: java.util.concurrent.ExecutionException: > java.lang.NullPointerException >   at java.util.concurrent.FutureTask$Sync.innerGet(libgcj.s

Re: Null Pointer Exception while indexing

2010-09-16 Thread Yonik Seeley
On Wed, Sep 15, 2010 at 2:01 PM, andrewdps wrote: > I still get the same error when I try to index the mrc file... If you get the exact same error, then you are still using GCJ. When you type "java" it's probably going to GCJ because of your path (i.e. change it or directly specify the path to th

Re: SOLR interface with PHP using javabin?

2010-09-16 Thread Yonik Seeley
On Thu, Sep 16, 2010 at 2:30 PM, onlinespend...@gmail.com wrote: >  I am planning on creating a website that has some SOLR search capabilities > for the users, and was also planning on using PHP for the server-side > scripting. > > My goal is to find the most efficient way to submit search queries

Re: Version stability [was: svn branch issues]

2010-09-17 Thread Yonik Seeley
I think we aim for a "stable" trunk (4.0-dev) too, as we always have (in the functional sense... i.e. operate correctly, don't crash, etc). The stability is more a reference to API stability - the Java APIs are much more likely to change on trunk. Solr's *external* APIs are much less likely to ch

Re: Version stability [was: svn branch issues]

2010-09-17 Thread Yonik Seeley
On Fri, Sep 17, 2010 at 10:46 AM, Mark Miller wrote: > I agree it's mainly API wise, but there are other issues - largely due > to Lucene right now - consider the bugs that have been dug up this year > on the 4.x line because flex has been such a large rewrite deep in > Lucene. We wouldn't do flex

Re: into

2010-09-17 Thread Yonik Seeley
On Fri, Sep 17, 2010 at 4:12 PM, facholi wrote: > > Hi, > > I would like a json result like that: > > { >   id:2342, >   name:"Abracadabra", >   metadatas: [ >      {type:"tag", name:"tutorial"}, >      {type:"value", name:"2323.434/434"}, >   ] > } Do you mean JSON with the tags not quoted (that

Re: multiple spatial values

2010-09-21 Thread Yonik Seeley
On Tue, Sep 21, 2010 at 12:12 PM, dan sutton wrote: > I was looking at the LatLonType and how it might represent multiple lon/lat > values ... it looks to me like the lat would go in {latlongfield}_0_LatLon > and the long in {latlongfield}_1_LatLon ... how then if we have multiple > lat/long point

Re: matches in result grouping

2010-09-23 Thread Yonik Seeley
2010/9/23 Koji Sekiguchi : >  (10/09/23 18:14), Koji Sekiguchi wrote: >>  I'm using recent committed field collapsing / result grouping >> feature in trunk. >> >> I'm confusing matches parameter in the result at the second >> sample output of Wiki: >> >> http://wiki.apache.org/solr/FieldCollapsing#

Re: Searches with a period (.) in the query

2010-09-23 Thread Yonik Seeley
On Wed, Sep 22, 2010 at 8:13 PM, Siddharth Powar wrote: > I am getting some weird output upon searching in solr. For certain searches > that have a period in the search term (e.g: q=ab.xyz) solr returns the > results perfectly, but for some other searches (e.g: q=ab.pqr) solr would > return 0 resu

Re: Range query not working

2010-09-23 Thread Yonik Seeley
On Thu, Sep 23, 2010 at 4:30 PM, PeterKerk wrote: > I have this in my query: > &q=*:*&facet.query=location_rating_total:[3 TO 100] > > And this document: > > - > > 1.0 > 1 > 2 > > > But still my total results equals 6 (total population) and not 0 as I would > expect > > Why? facet.query will

Re: Range query not working

2010-09-23 Thread Yonik Seeley
On Thu, Sep 23, 2010 at 5:44 PM, Jonathan Rochkind wrote: > The field type in a standard schema.xml that's defined as "integer" is NOT > sortable. Right - before 1.4. There is no "integer" field type in 1.4 and beyond in the example schema. > You can not sort on this and get what you want. (Wha

Re: AbstractMethodError with lucid KStem

2010-09-24 Thread Yonik Seeley
On Fri, Sep 24, 2010 at 9:39 AM, Bernd Fehling wrote: > I tried using lucid KStem with solr trunk version but get AbstractMethodError. That hasn't been ported to trunk yet. -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8

Re: matches in result grouping

2010-09-25 Thread Yonik Seeley
On Sat, Sep 25, 2010 at 2:07 AM, Koji Sekiguchi wrote: > Thanks Yonik for the explanation. > One more question. I think SearchGroupDocs.matches is unused > (I think TopDocsCollector.totalHits is used for displaying numFound > in each group). > Will it be used in the future for some reasons (if so

Re: bi-grams for common terms - any analyzers do that?

2010-09-25 Thread Yonik Seeley
On Sat, Sep 25, 2010 at 8:21 PM, Jonathan Rochkind wrote: > Huh, okay, I didn't know that #2 happened at all. Can you explain or point me > to documentation to explain when it happens?  I'm afraid I'm having trouble > understanding <<  if the analyzer returns more than one position back from a

Re: urgent SOLR query server request hangs

2010-09-27 Thread Yonik Seeley
On Mon, Sep 27, 2010 at 11:09 AM, Bharat Jain wrote: >   We are running into issues with SOLR queries. Our solr queries just hang. Are you perhaps using distributed search and accidentally set up an infinite loop? Do *not* configure a default "shards" param on your /select handler. Other than th

Re: Conditional Function Queries

2010-09-28 Thread Yonik Seeley
On Tue, Sep 28, 2010 at 11:33 AM, Jan Høydahl / Cominvent wrote: > Have anyone written any conditional functions yet for use in Function Queries? Nope - but it makes sense and has been on my list of things to do for a long time. -Y http://lucenerevolution.org Lucene/Solr Conference, Boston Oct

Re: Queries, Functions, and Params

2010-09-29 Thread Yonik Seeley
On Tue, Sep 28, 2010 at 6:08 PM, Robert Thayer wrote: > On the http://wiki.apache.org/solr/FunctionQuery page, the following query > function is listed: > > q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0 > > When run against the default solr instance, server returns the error(400): > "undefi

Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:09 PM, webdev1977 wrote: > 1.  I noticed that it said that the type of LatLongType can not be > multivalued. Does that mean that I can not have multiple lat/lon values for > one document. That means that if you want to have multiple points per document, each point must b

Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:40 PM, webdev1977 wrote: > Or.. do you mean each field must have a unique name, but both be of type > latLon(solr.LatLonType). > x,y > x,y Yes. > If the statement directly above is true (I hope that it is not), how does > one dynamically create fields when adding geota

Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:48 PM, Yonik Seeley wrote: > Dynamic field types.  You can configure it such that anything ending > with _latlon is of type LatLonType. > Perhaps we should do this in the example schema. Looks like we already have it: So you should be able to add stuff li

Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 10:41 AM, Renee Sun wrote: > > Hi - > I posted this problem but no response, I guess I need to post this in the > Solr-User forum. Hopefully you will help me on this. > > We were running Solr 1.3 for long time, with 130 cores. Just upgrade to Solr > 1.4, then when we start

Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-01 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 4:52 PM, Renee Sun wrote: >  - do you have any warming queries configured? > > no, all autowarmingcount are set to 0 for all caches Any static warming requests though (newSearcher / firstSearcher hooks in solrconfig.xml)? Is anything at all querying these cores while y

Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-02 Thread Yonik Seeley
On Fri, Oct 1, 2010 at 5:42 PM, Renee Sun wrote: > Hi Yonik, > > I attached the solrconfig.xml to you in previous post, and we do have > firstSearch and newSearch hook ups. > > I commented them out, all 130 cores loaded up in 1 minute, same as in solr > 1.3.  total memory took about 1GB. Whereas i

Re: LuceneRevolution - NoSQL: A comparison

2010-10-11 Thread Yonik Seeley
On Mon, Oct 11, 2010 at 8:32 PM, Peter Keegan wrote: > I listened with great interest to Grant's presentation of the NoSQL > comparisons/alternatives to Solr/Lucene. It sounds like the jury is still > out on much of this. Here's a use case that might favor using a NoSQL > alternative for storing '

Re: Spatial search in Solr 1.5

2010-10-12 Thread Yonik Seeley
You may want to check the docs, which were recently updated to reflect the state of trunk: http://wiki.apache.org/solr/SpatialSearch -Yonik http://www.lucidimagination.com On Tue, Oct 12, 2010 at 7:49 PM, PeterKerk wrote: > > Hey Grant, > > Just came across this post of yours. > > Run a query

Re: Spatial search in Solr 1.5

2010-10-12 Thread Yonik Seeley
On Tue, Oct 12, 2010 at 8:07 PM, PeterKerk wrote: > > Ok, so does this actually say: > for now you have to do calculations based on bounding box instead of great > circle? I tried to make the documentation a little simpler... there's - geofilt... filters within a radius of "d" km (i.e. "great c

Re: LuceneRevolution - NoSQL: A comparison

2010-10-13 Thread Yonik Seeley
On Tue, Oct 12, 2010 at 12:11 PM, Jan Høydahl / Cominvent wrote: > I'm pretty sure the 2nd phase to fetch doc-summaries goes directly to same > server as first phase. But what if you stick a LB in between? A related point - the load balancing implementation that's part of SolrCloud (and looks li

Re: Spatial search in Solr 1.5

2010-10-13 Thread Yonik Seeley
On Wed, Oct 13, 2010 at 7:28 AM, PeterKerk wrote: > Hi, > > Thanks for the quick reply :) > > I downloaded the latest version from the trunk. Got it up and running, and > got the error below: Hopefully the QuickStart on the wiki all worked for you, but you only got the error when customizing your

Re: Spatial search in Solr 1.5

2010-10-13 Thread Yonik Seeley
On Wed, Oct 13, 2010 at 9:42 AM, PeterKerk wrote: > Im now thinking I downloaded the wrong solr zip, I tried this one: > https://hudson.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/solr/dist/apache-solr-4.0-2010-10-12_08-05-48.zip > > In that example scheme > (\apache-solr-4

Re: Spatial search in Solr 1.5

2010-10-13 Thread Yonik Seeley
On Wed, Oct 13, 2010 at 10:06 AM, PeterKerk wrote: > > haha ;) > > But so I DO have the right solr version? > > Anyways...I have added the lines you mentioned, what else can I do? The fact that the geolocation field does not show up in the results means that it's not getting added (i.e. something

Re: Which version of Solr to use?

2010-10-14 Thread Yonik Seeley
On Thu, Oct 14, 2010 at 1:58 PM, Lukas Kahwe Smith wrote: > the current confusing list of branches is a result of the merge of the lucene > and solr svn repositories. what baffles me is that so far the countless > pleas for at least a rough roadmap or even just explanation for why so many > br

Re: Which version of Solr to use?

2010-10-14 Thread Yonik Seeley
On Thu, Oct 14, 2010 at 1:50 PM, Jonathan Rochkind wrote: > I'm kind of confused about Solr development plans in general, highlighted by > this thread. > > I think 1.4.1 is the latest officially stable release, yes? > > Why is there both a 1.5 and a 3.x, anyway?  Not to mention a 4.x?  Which of >

Re: Which version of Solr to use?

2010-10-14 Thread Yonik Seeley
On Thu, Oct 14, 2010 at 2:39 PM, Jonathan Rochkind wrote: > Thanks Yonik!  So I gather that the 1.5 branch has essentially been > abandoned, we can pretend it doesn't exist at all, it's been entirely > superseded by the 3.x branch, with the changes made just for the purposes of > synchronizing vers

Re: Which version of Solr to use?

2010-10-14 Thread Yonik Seeley
On Thu, Oct 14, 2010 at 2:55 PM, Mike Squire wrote: > As pointed out before it would be useful to have some kind of > documented road map for development, and some kind of indication of > how close certain versions are to release. Such things have proven to be very unreliable in the past, due to

Re: Faceting and first letter of fields

2010-10-14 Thread Yonik Seeley
On Thu, Oct 14, 2010 at 3:42 PM, Jonathan Rochkind wrote: > I believe that should work fine in Solr 1.4.1.  Creating a field with just > first letter of author is definitely the right (possibly only) way to allow > facetting on first letter of author's name. > > I have very voluminous facets (few

Re: filter query from external list of Solr unique IDs

2010-10-15 Thread Yonik Seeley
On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom wrote: > At the Lucene Revolution conference I asked about efficiently building a > filter query from an external list of Solr unique ids. Yeah, I've thought about a special query parser and query to deal with this (relatively) efficiently, both

Re: facet.field :java.lang.NullPointerException

2010-10-15 Thread Yonik Seeley
This is https://issues.apache.org/jira/browse/SOLR-2142 I'll look into it soon. -Yonik http://www.lucidimagination.com On Fri, Oct 15, 2010 at 3:12 PM, Pradeep Singh wrote: > Faceting blows up when the field has no data. And this seems to be random. > Sometimes it will work even with no data, o

Re: why solr search is slower than lucene so much?

2010-10-20 Thread Yonik Seeley
Careful comparing apples to oranges ;-) For one, your lucene code doesn't retrieve stored fields. Did you try the solr request more than once (with a different q, but the same filters?) Also, by default, Solr independently caches the filters. This can be higher up-front cost, but a win when filte

Re: why sorl is slower than lucene so much?

2010-10-21 Thread Yonik Seeley
2010/10/21 kafka0102 : > I found the problem's cause. It's the DocSetCollector. my filter query > result's size is about 300,so the DocSetCollector.getDocSet() is > OpenBitSet. And 300 OpenBitSet.fastSet(doc) op is too slow. As I said in my other response to you, that's a perfect reason

Re: Date faceting +1MONTH problem

2010-10-22 Thread Yonik Seeley
On Fri, Sep 17, 2010 at 9:51 PM, Chris Hostetter wrote: > the default query parser > doesn't support range queries with mixed upper/lower bound inclusion. This has just been added to trunk. Things like [0 TO 100} now work. -Yonik http://www.lucidimagination.com

Re: Date faceting +1MONTH problem

2010-10-22 Thread Yonik Seeley
On Fri, Oct 22, 2010 at 6:02 PM, Shawn Heisey wrote: > On 10/22/2010 3:01 PM, Yonik Seeley wrote: >> >> On Fri, Sep 17, 2010 at 9:51 PM, Chris Hostetter >>  wrote: >>> >>>  the default query parser >>> doesn't support range queries with mixed

Re: How to index long words with StandardTokenizerFactory?

2010-10-23 Thread Yonik Seeley
On Fri, Oct 22, 2010 at 12:07 PM, Sergey Bartunov wrote: > I'm trying to force solr to index words which length is more than 255 If the field is not a text field, then Solr's default analyzer is used, which currently limits the token to 256 bytes. Out of curiosity, what's your use case that you rea

Re: How to index long words with StandardTokenizerFactory?

2010-10-24 Thread Yonik Seeley
On Sun, Oct 24, 2010 at 10:47 AM, Sergey Bartunov wrote: > I did it just as you recommended. Solr indexes files around 15kb, but > no more. The same effect was with patched constants Lucene also has max token sizes it can index. IIRC, lengths used to be stored inline with the char data, and a sin

Re: How to index long words with StandardTokenizerFactory?

2010-10-24 Thread Yonik Seeley
On Sun, Oct 24, 2010 at 11:29 AM, Sergey Bartunov wrote: > It's a kind of research. There is no particular practical use case as > far as I know. > Do you know how to set all these max token lengths? It's a practical limit given how things are coded, not an arbitrary one. Given the lack of use c

Re: eDismax result differs from Dismax

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 9:30 AM, Ryan Walker wrote: > > We are launching a new version of our job board helping returning veterans > find a civilian job, and we chose Solr and Sunspot[1] to power our search. We > really didn't consider the power users in the HR world who are trained to use > bo

Re: documentCache clarification

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 2:31 PM, Jay Luker wrote: > This makes sense but still doesn't explain what I'm seeing in my cache > stats. When I issue a request with rows=10 the stats show an insert > into the queryResultCache. If I send the same query, this time with > rows=1000, I would not expect to

Re: Custom Sorting in Solr

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 3:39 PM, Ezequiel Calderara wrote: > Hi all guys! > I'm in a weird situation here. > We have index a set of documents which are ordered using a linked list (each > documents has the reference of the previous and the next). > > Is there a way when sorting in the solr search,

Re: documentCache clarification

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 3:49 PM, Chris Hostetter wrote: > > : This is a limitation in the SolrCache API. > : The key into the cache does not contain rows, so the cache returns the > : first 10 docs and increments it's hit count.  Then the cache user > : (SolrIndexSearcher) looks at the entry and d

Re: documentCache clarification

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 4:21 PM, Chris Hostetter wrote: > > : > Why don't we just include the start & rows (modulo the window size) in > : > the cache key? > : > : The implementation of equals() would be rather difficult... actually > : impossible w/o abusing the semantics. > : It would also be im

Re: SolrCore.getSearcher() and postCommit()

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 5:36 PM, Grant Ingersoll wrote: > Is it OK to call and increment a Searcher ref (i.e. SolrCore.getSearcher()) > in a SolrEventListener.postCommit() hook as long as I decrement it when I am > done?  I need to get a handle on an IndexReader so I can dump out a portion > of

Re: solr 4.0 - pagination

2010-10-30 Thread Yonik Seeley
On Sat, Oct 30, 2010 at 12:22 PM, Papp Richard wrote: >  I'm using Solr 4.0 with grouping (field collapsing), but unfortunately I > can't solve the pagination. It's not implemented yet, but I'm working on that right now. -Yonik http://www.lucidimagination.com

Re: big terms in UnInvertedField

2010-11-01 Thread Yonik Seeley
2010/11/1 Koji Sekiguchi : > With solr example, using facet.field=text creates UnInvertedField > for the text field in fieldValueCache. After that, I saw stats page > and I was surprised at counters in *filterCache* were up: > Do they cause of big words in UnInvertedField? Yes. "big" terms (defi

Re: Facet count of zero

2010-11-01 Thread Yonik Seeley
On Mon, Nov 1, 2010 at 12:55 PM, Tod wrote: > I'm trying to exclude certain facet results from a facet query.  It seems to > work but rather than being excluded from the facet list its returned with a > count of zero. If you don't want to see 0 counts, use facet.mincount=1 http://wiki.apache.org
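As a rough illustration of the suggestion, the request parameters could be assembled like this. `facet.mincount` comes from the reply above; the field name `category` and the match-all query are placeholders invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical faceting request; only facet.mincount=1 is taken from the thread.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "category"),  # placeholder field name
    ("facet.mincount", "1"),      # drop facet values whose count is 0
]
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the select handler URL of whatever Solr instance is being queried.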

Re: Possible memory leaks with frequent replication

2010-11-02 Thread Yonik Seeley
On Tue, Nov 2, 2010 at 12:32 PM, Simon Wistow wrote: > On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said: >> You should query against the indexer. I'm impressed that you got 5s >> replication to work reliably. > > That's our current solution - I was just wondering if there was anything

Re: Negative or zero value for fieldNorm

2010-11-03 Thread Yonik Seeley
Regarding "Negative or zero value for fieldNorm", I don't see any negative fieldNorms here... just very small positive ones? Anyway the fieldNorm is the product of the lengthNorm and the index-time boost of the field (which is itself the product of the index time boost on the document and the inde
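The product described above can be sketched numerically. This assumes the length norm of Lucene's default similarity, 1/sqrt(number of terms), treats the index-time boost as a single factor, and ignores the lossy one-byte encoding that Lucene applies when storing norms; the function name is made up for the example.

```python
import math


def field_norm(num_terms: int, index_time_boost: float = 1.0) -> float:
    """Unencoded fieldNorm sketch: lengthNorm * index-time boost."""
    length_norm = 1.0 / math.sqrt(num_terms)  # assumed default-similarity formula
    return length_norm * index_time_boost


print(field_norm(4))        # a short field with no boost
print(field_norm(10000))    # a long field: small but still positive
print(field_norm(4, 0.0))   # a zero boost forces the norm to zero
```

A very long field gives a small positive norm, while an exact zero can only come from a zero boost in this simplified model.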

Re: blacklist docs by uniqueKey

2010-11-03 Thread Yonik Seeley
On Wed, Nov 3, 2010 at 3:05 PM, Erick Erickson wrote: > How dynamic is this list? Is it feasable to add a field to your docs like > blacklisteddocs, and at editorial's discretion add values to that field > like "app1", "app2"? > > At that point you can just filter them out via a filter query... R

Re: Negative or zero value for fieldNorm

2010-11-04 Thread Yonik Seeley
On Thu, Nov 4, 2010 at 8:04 AM, Markus Jelsma wrote: > The question remains, why does the title field return a fieldNorm=0 for many > queries? Because the index-time boost was set to 0 when the doc was indexed. I can't say how that happened... look to your indexing code. > And a subquestion, do

Re: Negative or zero value for fieldNorm

2010-11-04 Thread Yonik Seeley
On Thu, Nov 4, 2010 at 9:51 AM, Markus Jelsma wrote: > I've done some testing with the example docs and it behaves similar when there > is a zero doc boost. Luke, however, does not show me the index-time boosts. Remember that the norm is a product of the length norm and the index time boost... it

Re: solr 4.0 - pagination

2010-11-07 Thread Yonik Seeley
On Sun, Nov 7, 2010 at 10:55 AM, Papp Richard wrote: >  this is fantastic, but can you tell any time it will be ready ? It already is ;-) Grab the latest trunk or the latest nightly build. -Yonik http://www.lucidimagination.com

Re: solr 4.0 - pagination

2010-11-07 Thread Yonik Seeley
On Sun, Nov 7, 2010 at 2:45 PM, Papp Richard wrote: > Hi Yonik, > >  I've just tried the latest stable version from nightly build: > apache-solr-4.0-2010-11-05_08-06-28.war > >  I have some concerns however: I have 3 documents; 2 in the first group, 1 > in the 2nd group. > >  1. I got for matches

FAST ESP -> Solr migration webinar

2010-11-11 Thread Yonik Seeley
We're holding a free webinar on migration from FAST to Solr. Details below. -Yonik http://www.lucidimagination.com = Solr To The Rescue: Successful Migration From FAST ESP to Open Source Search Based on Apache Solr Thur

Re: facetting when using field collapsing

2010-11-13 Thread Yonik Seeley
On Wed, Nov 10, 2010 at 9:12 AM, Lukas Kahwe Smith wrote: > The above wiki page seems to be out of date. Reading the comments in > https://issues.apache.org/jira/browse/SOLR-236 it seems like "group" should > be replaced with "collapse". The Wiki page is not expansive, but I've tried to make it

Re: facetting when using field collapsing

2010-11-13 Thread Yonik Seeley
On Sat, Nov 13, 2010 at 10:46 AM, Lukas Kahwe Smith wrote: > > On 13.11.2010, at 10:30, Yonik Seeley wrote: > >> On Wed, Nov 10, 2010 at 9:12 AM, Lukas Kahwe Smith >> wrote: >>> The above wiki page seems to be out of date. Reading the comments in >>> htt

Re: IndexableBinaryStringTools (was FieldCache)

2010-11-13 Thread Yonik Seeley
On Sat, Nov 13, 2010 at 1:50 PM, Steven A Rowe wrote: > Looks to me like the returned value is in a Solr-internal form of XML > character escaping: \u0000 is represented as "#0;" and \u0008 is represented > as "#8;".  (The escaping code is in > solr/src/java/org/apache/common/util/XML.java.) Y

Re: Solr Negative query

2010-11-14 Thread Yonik Seeley
On Sun, Nov 14, 2010 at 4:17 AM, Leonardo Menezes wrote: > try > Field1:Val1 AND (*:* NOT Field2:Val2), that should work ok That should be equivalent to Field1:Val1 -Field2:Val2 You only need the *:* trick if all of the clauses of a boolean query are negative. -Yonik http://www.lucidimagination.c
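The equivalence described above can be modelled with plain sets. This is only a toy sketch of boolean-query semantics, not Solr or Lucene code: `boolean_query` and the document ids are made up for illustration, and the model assumes that a boolean query whose clauses are all negative has nothing to subtract from and so matches nothing.

```python
def boolean_query(positive, negative):
    """Toy model: intersect the positive clauses, then subtract the negative ones.

    Assumption: with no positive clause there is no starting set, so a purely
    negative query matches no documents.
    """
    if not positive:
        return set()
    result = set.intersection(*positive)
    for neg in negative:
        result -= neg
    return result


all_docs = {1, 2, 3, 4}      # what *:* matches
field1_val1 = {1, 2, 3}      # docs matching Field1:Val1
field2_val2 = {2, 4}         # docs matching Field2:Val2

# Field1:Val1 -Field2:Val2
simple = boolean_query([field1_val1], [field2_val2])

# Field1:Val1 AND (*:* NOT Field2:Val2)
nested = boolean_query([all_docs], [field2_val2])
with_trick = boolean_query([field1_val1, nested], [])

# Field1:Val1 AND (NOT Field2:Val2): the inner query is purely negative
pure_negative = boolean_query([], [field2_val2])
without_trick = boolean_query([field1_val1, pure_negative], [])

print(simple, with_trick, without_trick)
```

In this model the first two forms select the same documents, while the nested purely negative clause comes back empty and drags the whole conjunction down with it.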

Re: Solr Negative query

2010-11-15 Thread Yonik Seeley
On Mon, Nov 15, 2010 at 12:42 AM, Viswa S wrote: > > Apologies for starting a new thread again, my mailing list subscription > didn't finalize till later than Yonik's response. > > Using "Field1:Val1 AND (*:* NOT Field2:Val2)" works, thanks. > > Does my original query "Field1:Value1 AND (NOT Fiel

result grouping / field collapsing changes

2010-11-16 Thread Yonik Seeley
We've recently added randomized testing for result grouping that resulted in finding + fixing a number of bugs. If you've been using this feature, you should move to the latest trunk version. I've also added a section at the bottom of the wiki page to list current limitations. http://wiki.apache

Re: hash uniqueKey generation?

2010-11-16 Thread Yonik Seeley
On Tue, Nov 16, 2010 at 5:31 AM, Dennis Gearon wrote: > hashing is not 100% guaranteed to produce unique values. But if you go to enough bits with a good hash function, you can get the odds lower than the odds of something else changing the value like cosmic rays flipping a bit on you. -Yonik ht

Re: hash uniqueKey generation?

2010-11-16 Thread Yonik Seeley
On Tue, Nov 16, 2010 at 9:05 PM, Dennis Gearon wrote: > Read up on WikiPedia, but I believe that no Hash Function is much good above > 50% > of the address space it generates. 50% is way too high - collisions will happen before that. But given that something like MD5 has 128 bits, that's 3.4e38,
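A rough way to put numbers on this is the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), for n keys hashed uniformly into N possible values. The sketch below assumes an ideal, uniformly distributed 128-bit hash; `collision_probability` is a hypothetical helper written for this illustration, and the one-billion-document index is an example figure, not one taken from the thread.

```python
import math


def collision_probability(num_keys: int, hash_bits: int = 128) -> float:
    """Birthday-bound approximation of the chance of at least one collision.

    Assumption: the hash behaves like a uniform random function over
    2**hash_bits values. expm1 keeps precision when the result is tiny.
    """
    space = 2 ** hash_bits
    exponent = num_keys * (num_keys - 1) / (2 * space)
    return -math.expm1(-exponent)


# Size of a 128-bit hash space, roughly 3.4e38
print(float(2 ** 128))
# One billion documents: the collision odds are vanishingly small
print(collision_probability(10 ** 9))
# Around 2**64 keys (the square root of the space) collisions become likely
print(collision_probability(2 ** 64))
```

The last line shows why filling 50% of the address space is far too generous a limit: under this approximation collisions become likely once the number of keys approaches the square root of the space, long before half of it is used.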

Re: Must require quote with single word token query?

2010-11-19 Thread Yonik Seeley
On Tue, Nov 16, 2010 at 10:28 PM, Chamnap Chhorn wrote: > I have one question related to single word token with dismax query. In order > to be found I need to add the quote around the search query all the time. > This is quite hard for me to do since it is part of full text search. > > Here is my

Re: Must require quote with single word token query?

2010-11-19 Thread Yonik Seeley
lucidimagination.com > On 11/19/10, Yonik Seeley wrote: >> On Tue, Nov 16, 2010 at 10:28 PM, Chamnap Chhorn >> wrote: >>> I have one question related to single word token with dismax query. In >>> order >>> to be found I need to add the quote around the s

Re: Problem with synonyms

2010-11-22 Thread Yonik Seeley
On Sat, Nov 20, 2010 at 5:59 AM, sivaprasad wrote: > Even after expanding the synonyms also i am unable to get same results. What you are trying to do should work with index-time synonym expansion. Just make sure to remove the synonym filter at query time (or use a synonym filter w/o multi-word s

Re: Problem with synonyms

2010-11-22 Thread Yonik Seeley
On Mon, Nov 22, 2010 at 10:29 AM, Yonik Seeley wrote: > On Sat, Nov 20, 2010 at 5:59 AM, sivaprasad > wrote: >> Even after expanding the synonyms also i am unable to get same results. > > What you are trying to do should work with index-time synonym expansion. > Just ma

Re: geospatial

2010-11-24 Thread Yonik Seeley
On Wed, Nov 24, 2010 at 2:41 PM, Dennis Gearon wrote: > What is the recommended Solr version and/or plugin combination to get > geospatial > search up and running the quickest and easiest? It depends on what capabilities you need. The current state of what is committed to trunk is reflected here

Re: Preventing index segment corruption when windows crashes

2010-11-29 Thread Yonik Seeley
On Mon, Nov 29, 2010 at 10:46 AM, Peter Sturge wrote: > If a Solr index is running at the time of a system halt, this can > often corrupt a segments file, requiring the index to be -fix'ed by > rewriting the offending file. Really? That shouldn't be possible (if you mean the index is truly corru
