Re: Best strategy to commit often under load.

2009-09-15 Thread Jason Rutherglen
Hi Jerome, 5 seconds is too little using Solr 1.3 or 1.4 because of caching and segment warming. If you turn off caching and segment warming, then you may be able to do 5s latency using either a RAMDirectory or an SSD. In the future these issues will be fixed and less than 1s will be possible. -J

Multiple parsedquery in the result set when debugQuery=true

2009-09-15 Thread Jason Rutherglen
Are there supposed to be multiple parsedquery entries for a distributed query when debugQuery=true?

Re: about SOLR-1395 integration with katta

2009-09-10 Thread Jason Rutherglen
Hi Zhong, For #2 the existing patch SOLR-1395 is a good start. It should be fairly simple to deploy indexes and distribute them to Solr Katta nodes/servers. -J On Wed, Sep 9, 2009 at 11:41 PM, Zhenyu Zhong zhongresea...@gmail.com wrote: Jason, Thanks for the reply. In general, I would

Re: about SOLR-1395 integration with katta

2009-09-09 Thread Jason Rutherglen
Hi Zhong, It's a very new patch. I'll update the issue as we start the wiki page. I've been working on indexing in Hadoop in conjunction with Katta, which is different (it sounds) than your use case where you have prebuilt indexes you simply want to distribute using Katta? -J On Wed, Sep 9,

Quickly view index files?

2009-09-02 Thread Jason Rutherglen
Is there a quick way to view index files?

Re: Quickly view index files?

2009-09-02 Thread Jason Rutherglen
, 2009 at 2:02 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Is there a quick way to view index files?

Re: questions about solr

2009-09-02 Thread Jason Rutherglen
For HDFS, failover, sharding you may want to use Solr with Katta. There's an issue open at: http://issues.apache.org/jira/browse/SOLR-1301 Near realtime search needs to be added incrementally to Solr. Today I wouldn't recommend it. On Wed, Sep 2, 2009 at 10:14 AM, Zhenyu

Re: Lucene Search Performance Analysis Workshop

2009-08-27 Thread Jason Rutherglen
Agreed, Solr uses random access bitsets everywhere so I'm thinking this could be an improvement or at least a great option to enable and try out. I'll update LUCENE-1536 so we can benchmark. On Thu, Aug 27, 2009 at 4:06 AM, Michael McCandlessluc...@mikemccandless.com wrote: On Thu, Aug 27, 2009

Re: Optimal Cache Settings, complicated by regular commits

2009-08-27 Thread Jason Rutherglen
Andrew, Which version of Solr are you using? There's an open issue to cache filters at the segment level, so that the caches are not cleared on each commit. You can vote on it to indicate your interest: http://issues.apache.org/jira/browse/SOLR-1308 -J On Thu, Aug 27, 2009 at 7:06 AM, Andrew

Re: Incremental Deletes to Index

2009-08-25 Thread Jason Rutherglen
This will be implemented as you're stating when IndexWriter.getReader is incorporated. This will carry over deletes in RAM until IW.commit is called (i.e. Solr commit). It's a fairly simple change though perhaps too late for 1.4 release? On Tue, Aug 25, 2009 at 3:10 PM,

Re: Incremental Deletes to Index

2009-08-25 Thread Jason Rutherglen
a new searcher from IW.getReader without calling IW.commit. On Tue, Aug 25, 2009 at 4:37 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Jason, sounds like a very promising change to me - so much that I would gladly work toward creating a patch myself. Are there any specific points in the code u

Re: Return all docs

2009-08-21 Thread Jason Rutherglen
I guess you'd set the rows parameter in the HTTP request to a high number? Check out /solr/admin/form.jsp which has the rows text field visible. On Fri, Aug 21, 2009 at 9:30 AM, Elaine Lielaine.bing...@gmail.com wrote: I want the query to return all the found docs, not just 10 of them by
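
As a rough sketch of the suggestion above, the request URL can carry an explicit rows value. The host, port, and the SelectUrl/build names are assumptions made for illustration; only the q, start and rows request parameters are taken from Solr's select handler.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of Solr or SolrJ): builds a /select URL
// with an explicit rows value instead of relying on the default of 10.
final class SelectUrl {
    static String build(String baseUrl, String query, int rows) {
        try {
            // URL-encode the query so characters such as ':' are safe in the URL.
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&start=0&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

A call such as SelectUrl.build("http://localhost:8983/solr", "*:*", 1000) would then ask for up to 1000 documents in one response. Very large rows values are likely to cost memory on both server and client, so paging with start may be the safer choice for big result sets.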

Re: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise Search Server

2009-08-21 Thread Jason Rutherglen
It seems possible to cache the results of facet queries on a per segment basis, providing the caching you're describing. On Fri, Aug 21, 2009 at 8:42 AM, Fuad Efendif...@efendi.ca wrote: actually a hybrid that goes back to DocSet intersections when it's more efficient I noticed that too when I

Re: Implementing customized Scorer with solr API 1.4

2009-08-20 Thread Jason Rutherglen
We should probably move to using Lucene's Filters/DocIdSets instead of DocSets and merge the two. Then we will not need to maintain two separate but similar and confusing functionality classes. This will make seamlessly integrating searching with Solr's Filters/DocSets into Lucene's new per

Re: Faceting Performance Factors

2009-08-18 Thread Jason Rutherglen
Hi Cameron, You'll need to upgrade to Solr 1.4 as the 1.3 method of faceting is quite slow (i.e. intersecting bitsets). 1.4 uses UnInvertedField which caches the terms per doc and iterates/counts them. The 1.3 method is slow because for every term (i.e. unique field value) there needs to be a

Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-17 Thread Jason Rutherglen
Fuad, I'd recommend indexing in Hadoop, then copying the new indexes to Solr slaves. This removes the need for Solr master servers. Of course you'd need a Hadoop cluster larger than the number of master servers you have now. The merge indexes command (which can be taxing on the servers because

Re: Maximum number of values in a multi-valued field.

2009-08-17 Thread Jason Rutherglen
Your term dictionary will grow somewhat, which means the term index could consume more memory. Because the term dictionary has grown, term lookups could be somewhat slower, but that is unlikely to affect your application. How many unique terms will there be? On Mon, Aug 17, 2009 at

Re: Solr 1.4 Replication scheme

2009-08-14 Thread Jason Rutherglen
This would be good! Especially for NRT where this problem is somewhat harder. I think we may need to look at caching readers per corresponding http session. The pitfall is expiring them before running out of RAM. On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeleyyo...@lucidimagination.com wrote:

Re: Solr conditional adds/updates?

2009-08-14 Thread Jason Rutherglen
You could implement optimistic concurrency? Where a version is stored in the document? Or using the date system you described. Override DirectUpdateHandler2.addDoc with the custom logic. It seems like we should have a way of performing this without custom code and/or an easier way to plug logic
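
The version check mentioned above might look roughly like the following. This is a minimal in-memory sketch, not Solr code: VersionedStore and its methods are hypothetical names, and a real implementation would put the same comparison inside the overridden update handler rather than in a map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index; it only illustrates the
// "accept the add if its version is newer" rule, not Solr's update path.
final class VersionedStore {
    private final Map<String, Long> versions = new HashMap<String, Long>();

    // Returns true and records the version if the incoming document is
    // newer than what is stored; returns false for stale or repeated adds.
    synchronized boolean addIfNewer(String id, long version) {
        Long current = versions.get(id);
        if (current != null && version <= current) {
            return false; // stale or duplicate update: reject
        }
        versions.put(id, version);
        return true;
    }

    // Returns the stored version for the id, or null if it was never added.
    synchronized Long currentVersion(String id) {
        return versions.get(id);
    }
}
```

The same shape works for the date-based variant from the thread if the version is a timestamp in milliseconds.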

Re: Solr conditional adds/updates?

2009-08-14 Thread Jason Rutherglen
not a Java developer but I am in charge of implementing Solr. Any more detailed/straightforward instructions are very much appreciated. Thank you. Jason Rutherglen-2 wrote: You could implement optimistic concurrency? Where a version is stored in the document? Or using the date system you

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
be interesting to compare performance with SOLR... Distributed? -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: August-12-09 6:12 PM To: solr-user@lucene.apache.org Subject: Re: facet performance tips For your fields with many terms you may want

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
at 10:51 AM, Fuad Efendif...@efendi.ca wrote: SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to be); check this http://issues.apache.org/jira/browse/SOLR-475 (and probably http://issues.apache.org/jira/browse/SOLR-711) -Original Message- From: Jason Rutherglen

Query with no cache without editing solrconfig?

2009-08-12 Thread Jason Rutherglen
Is there a way to do this via a URL?

Re: Solr support for Lucene Near realtime search

2009-08-12 Thread Jason Rutherglen
Hi Alan, Solr 1.4 does not contain near realtime search capabilities and it could be variously detrimental to call commit too often as indexing and searches could precipitously degrade. That being said, most of the NRT functionality is not too difficult to add, except for per segment caching

Re: facet performance tips

2009-08-12 Thread Jason Rutherglen
For your fields with many terms you may want to try Bobo http://code.google.com/p/bobo-browse/ which could work well with your case. On Wed, Aug 12, 2009 at 12:02 PM, Fuad Efendif...@efendi.ca wrote: I am currently faceting on tokenized multi-valued field at http://www.tokenizer.org (25 mlns

Distributed query returns time consumed by each Solr shard?

2009-08-12 Thread Jason Rutherglen
Is there a way to do this currently? If a shard takes an inordinate amount of time compared to the other shards, it's useful to see the various qtimes per shard, with the aggregated results.

Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

2009-08-11 Thread Jason Rutherglen
Fuad, The lock indicates to external processes that the index is in use, meaning it does not cause ConcurrentMergeScheduler to block. ConcurrentMergeScheduler does merge in its own thread; however, if the merges are large then they can spike IO and CPU, and cause the machine to be somewhat unresponsive.

Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

2009-08-11 Thread Jason Rutherglen
wrote: Hi Jason, I am using Master/Slave (two servers); I monitored few hours today - 1 minute of document updates (about 100,000 documents) and then SOLR stops for at least 5 minutes to do background jobs like RAM flush, segment merge... Documents are small; about 10Gb of total index size

Re: search suggest

2009-07-29 Thread Jason Rutherglen
Autosuggest is something that would be very useful to build into Solr as many search projects require it. I'd recommend indexing relevant terms/phrases into a Ternary Search Tree which is compact and performant. Using a wildcard query will likely not be as fast as a Ternary Tree, and I'm not sure
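
To make the ternary search tree idea concrete, here is a small hand-rolled sketch of prefix lookup. It is an assumption-laden illustration: the class and method names are invented here, it is not Lucene's TernaryTree class, and it ignores ranking, case folding and memory tuning.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal ternary search tree: each node holds one character plus
// lower / equal / higher child links.
final class TernarySearchTree {
    private static final class Node {
        final char c;
        Node lo, eq, hi;
        boolean endOfKey; // true if a stored key ends at this node
        Node(char c) { this.c = c; }
    }

    private Node root;

    void insert(String key) {
        if (key == null || key.isEmpty()) return;
        root = insert(root, key, 0);
    }

    private Node insert(Node n, String key, int i) {
        char c = key.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.c) n.lo = insert(n.lo, key, i);
        else if (c > n.c) n.hi = insert(n.hi, key, i);
        else if (i < key.length() - 1) n.eq = insert(n.eq, key, i + 1);
        else n.endOfKey = true;
        return n;
    }

    // Returns all stored keys that start with the prefix, in sorted order.
    List<String> suggest(String prefix) {
        List<String> out = new ArrayList<String>();
        if (prefix == null || prefix.isEmpty()) return out;
        Node n = root;
        int i = 0;
        while (n != null) {
            char c = prefix.charAt(i);
            if (c < n.c) n = n.lo;
            else if (c > n.c) n = n.hi;
            else if (i == prefix.length() - 1) break; // matched whole prefix
            else { n = n.eq; i++; }
        }
        if (n == null) return out;
        if (n.endOfKey) out.add(prefix);
        collect(n.eq, new StringBuilder(prefix), out);
        return out;
    }

    // In-order walk of the subtree below the matched prefix.
    private void collect(Node n, StringBuilder sb, List<String> out) {
        if (n == null) return;
        collect(n.lo, sb, out);
        sb.append(n.c);
        if (n.endOfKey) out.add(sb.toString());
        collect(n.eq, sb, out);
        sb.setLength(sb.length() - 1);
        collect(n.hi, sb, out);
    }
}
```

Lookup cost depends on the prefix length and the branching at each character rather than on the number of indexed terms, which is the main reason such a tree is expected to beat a wildcard query for suggestions.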

Re: search suggest

2009-07-29 Thread Jason Rutherglen
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/hyphenation/TernaryTree.html On Wed, Jul 29, 2009 at 12:08 PM, Jason Rutherglenjason.rutherg...@gmail.com wrote: Autosuggest is something that would be very useful to build into Solr as many search projects require

Re: search suggest

2009-07-29 Thread Jason Rutherglen
using a different search engine.  It'd sure be kewl if this became a core feature of solr! I like the idea of the tree approach, sounds much faster.  The root is the least letters to start suggestions and the leaves are the full phrases? -Original Message- From: Jason Rutherglen

Solr index as multiple separate index directories

2009-07-21 Thread Jason Rutherglen
I'd like to be able to define within a single Solr core, a set of indexes in multiple directories. This is really useful for indexing in Hadoop or integrating with Katta where an EmbeddedSolrServer is distributed to the Hadoop cluster and indexes are generated in parallel and returned to Solr

Re: Merge Policy

2009-07-21 Thread Jason Rutherglen
I am referring to setting properties on the *existing* policy available in Lucene such as LogByteSizeMergePolicy.setMaxMergeMB On Tue, Jul 21, 2009 at 5:11 PM, Chris Hostetterhossman_luc...@fucit.org wrote: : SolrIndexConfig accepts a mergePolicy class name, however how does one : inject

Setting termInfosIndexDivisor and Interval?

2009-07-19 Thread Jason Rutherglen
Are we currently supporting this or in 1.4? (i.e. IndexReader.open and IndexWriter.setTermIndexInterval) It's useful for trie range, shingles, etc, where many terms are potentially created.

Re: Wikipedia or reuters like index for testing facets?

2009-07-17 Thread Jason Rutherglen
, 2009 at 2:21 PM, Jason Rutherglenjason.rutherg...@gmail.com wrote: Yeah that's what I was thinking of as an alternative, use enwiki and randomly generate facet data along with it. However for consistent benchmarking the random data would need to stay the same so that people could execute

Re: Wikipedia or reuters like index for testing facets?

2009-07-17 Thread Jason Rutherglen
the WikipediaTokenizer and then Tee the Categories, etc. off to separate fields automatically for faceting, etc. -Grant On Jul 17, 2009, at 10:48 AM, Jason Rutherglen wrote: The question that comes to mind is how it's different than http://people.apache.org/~gsingers/wikipedia/enwiki-20070527-pages

Re: Highlight arbitrary text

2009-07-16 Thread Jason Rutherglen
Interesting, many sites don't store text in Lucene/Solr and so need a way to highlight text stored in a database (or some equivalent), they have two options, re-analyze the doc for the term positions or access the term vectors from Solr and hand them to the client who then performs the

Chrome Web Browser doesn't render properly

2009-07-15 Thread Jason Rutherglen
From the Solr admin page, solr/admin/file/?file=schema.xml and /solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on renders improperly (meaning the XML isn't formatted). Maybe Chrome doesn't support XML?

Re: Wikipedia or reuters like index for testing facets?

2009-07-15 Thread Jason Rutherglen
On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote:  You think enwiki has enough data for faceting? On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersollgsing...@apache.org wrote: At a min, it is trivial to use the EnWikiDocMaker and then send the doc over SolrJ... On Jul 14, 2009, at 4:07 PM

Re: Availability during merge

2009-07-14 Thread Jason Rutherglen
Kind of regrettable, I think we can look at changing this in Lucene. On Tue, Jul 14, 2009 at 12:08 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Jul 14, 2009 at 2:30 AM, Charlie Jackson charlie.jack...@cision.com wrote: The wiki page for merging solr cores

Wikipedia or reuters like index for testing facets?

2009-07-14 Thread Jason Rutherglen
Is there a standard index like what Lucene uses for contrib/benchmark for executing faceted queries over? Or maybe we can randomly generate one that works in conjunction with wikipedia? That way we can execute real world queries against faceted data. Or we could use the Lucene/Solr mailing lists

Re: Wikipedia or reuters like index for testing facets?

2009-07-14 Thread Jason Rutherglen
, Jason Rutherglen jason.rutherg...@gmail.com wrote: Is there a standard index like what Lucene uses for contrib/benchmark for executing faceted queries over? Or maybe we can randomly generate one that works in conjunction with wikipedia? That way we can execute real world queries against

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-13 Thread Jason Rutherglen
SOLR 1.4 has a new feature https://issues.apache.org/jira/browse/SOLR-475 that speeds up faceting on fields with many terms by adding an UnInvertedField. Bobo uses a custom field cache as well. It may be useful to benchmark the 3 different approaches (bitsets, SOLR-475, Bobo). This could be a good

Merge Policy

2009-07-13 Thread Jason Rutherglen
SolrIndexConfig accepts a mergePolicy class name, however how does one inject properties into it?

Re: Caching per segmentReader?

2009-07-13 Thread Jason Rutherglen
Shall we create an issue for this so we can list out desirable features? On Sun, Jul 12, 2009 at 7:01 AM, Yonik Seeley ysee...@gmail.com wrote: On Sat, Jul 11, 2009 at 7:38 PM, Jason Rutherglenjason.rutherg...@gmail.com wrote: Are we planning on implementing caching (docsets, documents

allowDocsOutOfOrder support?

2009-07-13 Thread Jason Rutherglen
Is there a way to set this in SOLR 1.3 using solrconfig? Otherwise one needs to instantiate a class that statically calls BooleanQuery.setAllowDocsOutOfOrder?

Caching per segmentReader?

2009-07-11 Thread Jason Rutherglen
Are we planning on implementing caching (docsets, documents, results) per segment reader or is this something that's going to be in 1.4?

Re: Ocean realtime search + Solr

2008-10-22 Thread Jason Rutherglen
Not quite yet, there is the IndexReader.clone patch that needs to be completed that Ocean depends on https://issues.apache.org/jira/browse/LUCENE-1314. I had it completed but then things changed in IndexReader so now it doesn't work and I have not had time to complete it again. Otherwise the

Re: spellcheck: issues

2008-10-10 Thread Jason Rennie
the trade-off between frequency and edit distance. I'll file a jira and look at submitting a patch. Cheers, Jason On Thu, Oct 9, 2008 at 9:22 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: Sorting in the SpellChecker is handled by the SuggestWord.compareTo() method in Lucene. It looks like: public

Re: spellcheck: issues

2008-10-08 Thread Jason Rennie
Hi Grant, Here are solr config files (attached) and java code (included below) to recreate the test case. Jason List<Pair<String, Integer>> terms = new ArrayList<Pair<String, Integer>>(); terms.add(new Pair<String, Integer>("chanel", 834)); terms.add(new Pair<String, Integer>("chant

Re: spellcheck: issues

2008-10-08 Thread Jason Rennie
do better than the two-step sort. Cheers, Jason

Re: spellcheck: issues

2008-10-08 Thread Jason Rennie
, as it is a bit more sophisticated than Levenshtein when it comes to scoring. I just tried J-W and *yes* it seems to do a much better job! I'd certainly vote for that becoming the default :) Thanks for all the help! Much appreciated. Jason -- Jason Rennie Head of Machine Learning Technologies

Re: spellcheck: issues

2008-10-08 Thread Jason Rennie
On Wed, Oct 8, 2008 at 3:31 PM, Jason Rennie [EMAIL PROTECTED] wrote: I just tried J-W and *yes* it seems to do a much better job! I'd certainly vote for that becoming the default :) Ack! I did some more testing and J-W results started to get weird (including suggesting courses for coursets

Re: spellcheck: issues

2008-10-07 Thread Jason Rennie
? Jason

Re: spellcheck: issues

2008-10-07 Thread Jason Rennie
Sure. I just sent the relevant files/code directly to you. Let me know if you don't get them or have any trouble with them. Jason On Tue, Oct 7, 2008 at 3:27 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: Can you share your spellchecker setup and the code for the test case? I would like

Re: Transitioning from Solr 1.2 to Solr 1.3

2008-10-06 Thread Jason Rennie
you, I'd assume that some of the documents will be lost, so I'd reindex any recently added documents. Jason On Wed, Oct 1, 2008 at 12:58 PM, Michael Tedesco [EMAIL PROTECTED]wrote: Hey, I just want to copy over my indexes from my production server running 1.2 to another test server running 1.3

Re: required keyword in all a document

2008-10-06 Thread Jason Rennie
Sounds like the DisMax handler would work well for you. I'm no expert, but I'm fairly certain that if you created a solr.DisMaxRequestHandler handler with qf containing those three fields, you could issue the query +france +flag +french and get the desired results. Jason On Mon, Oct 6, 2008

spellcheck: issues

2008-10-06 Thread Jason Rennie
chanel always top chant and chang since they all have the same edit distance yet chanel is two orders of magnitude more popular? Is there anything I could be doing wrong to create these problems? If not, are these known issues? If not, should I create jira's for them? Thanks, Jason
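
One way to get the ordering asked about here is to sort candidates by edit distance first and break ties by term frequency. The sketch below is only an illustration of that rule: SuggestionRanker is a made-up class, the query "chane" and the frequencies for chant and chang are invented for the example, and this is not a description of how Lucene's SuggestWord.compareTo actually sorts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical ranker: smaller edit distance wins, then higher frequency.
final class SuggestionRanker {
    static final class Candidate {
        final String word;
        final int freq;
        Candidate(String word, int freq) { this.word = word; this.freq = freq; }
    }

    // Plain Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Returns candidate words ordered by (distance asc, frequency desc, word asc).
    static List<String> rank(final String query, List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<Candidate>(candidates);
        Collections.sort(sorted, new Comparator<Candidate>() {
            public int compare(Candidate x, Candidate y) {
                int dx = editDistance(query, x.word);
                int dy = editDistance(query, y.word);
                if (dx != dy) return dx < dy ? -1 : 1;
                if (x.freq != y.freq) return x.freq > y.freq ? -1 : 1;
                return x.word.compareTo(y.word);
            }
        });
        List<String> words = new ArrayList<String>();
        for (Candidate c : sorted) words.add(c.word);
        return words;
    }
}
```

With the assumed query "chane", all three candidates are one edit away, so frequency alone decides the order and the far more popular chanel comes first.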

Re: spellcheck: issues

2008-10-06 Thread Jason Rennie
due to the higher frequency? - query is yello. 53 document hits. No suggestions. yellow yields 36560 document. Does the spellchecker only run when there are no document hits? Btw, is there a better place to be posting comments/questions like this? Jason On Mon, Oct 6, 2008 at 4:08 PM

Re: using spellcheckcomponent via solrj

2008-10-03 Thread Jason Rennie
be going on here? Thanks, Jason On Wed, Sep 24, 2008 at 4:22 PM, Jason Rennie [EMAIL PROTECTED] wrote: On Wed, Sep 24, 2008 at 4:07 PM, Grant Ingersoll [EMAIL PROTECTED]wrote: Just mimic the configuration for the spellCheckCompRH in the handler that you use for querying. Sounds even better

Re: How to tokenize/analyze docs for the spellchecker - at indexing and query time

2008-10-03 Thread Jason Rennie
weirdness earlier (no inserts/deletes), but that disappeared now that I'm using the StandardTokenizer. Cheers, Jason

Re: spellcheck: buildOnOptimize?

2008-09-30 Thread Jason Rennie
On Fri, Sep 26, 2008 at 9:33 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Jason, can you please open a jira issue to add this feature? Done. https://issues.apache.org/jira/browse/SOLR-795 Jason

Re: Anyproblem in running two solr instances on the same machine using the same directory ?

2008-09-27 Thread Jason Rutherglen
The question I have is what is the optimal approach for integrating realtime into SOLR? What classes should be extended or created? On Sat, Sep 27, 2008 at 9:40 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Solr today is not suited for real-time search (seeing newly added docs in search

spellcheck: buildOnOptimize?

2008-09-25 Thread Jason Rennie
like a more appropriate schedule for rebuilding the spelling index. Is there or could there be an option to rebuild the spelling index on optimize? Thanks, Jason

using spellcheckcomponent via solrj

2008-09-24 Thread Jason Rennie
/spellCheckCompRH. I found a post from a month ago indicating that it can be done. Can someone fill me in on what I'm missing? I see that solrj has support for the spellcheck response, so it looks like dealing with the response will be easy once I get the query right. Jason P.S. Thanks to everyone who had

Re: using spellcheckcomponent via solrj

2008-09-24 Thread Jason Rennie
</lst> <arr name="last-components"> <str>spellcheck</str> </arr> Thanks, Jason

Re: Some new SOLR features

2008-09-19 Thread Jason Rutherglen
PROTECTED] wrote: why to restart solr ? reloading a core may be sufficient. SOLR-561 already supports this - On Thu, Sep 18, 2008 at 5:17 PM, Jason Rutherglen [EMAIL PROTECTED] wrote: Servlets is one thing. For SOLR the situation is different. There are always small changes people want

Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
, and so either a dynamic schema, or no schema at all is best. Otherwise the documents would need to have a schemaVersion field, this gets messy I looked at this. Jason On Wed, Sep 17, 2008 at 5:10 PM, Yonik Seeley [EMAIL PROTECTED] wrote: On Wed, Sep 17, 2008 at 4:50 PM, Henrib [EMAIL PROTECTED] wrote

Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
at 1:27 PM, Jason Rutherglen [EMAIL PROTECTED] wrote: If the configuration code is going to be rewritten then I would like to see the ability to dynamically update the configuration and schema without needing to reboot the server. Exactly. Actually, multi-core allows you to instantiate

Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
with the rsync based batch replication. On Wed, Sep 17, 2008 at 2:21 PM, Yonik Seeley [EMAIL PROTECTED] wrote: On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen [EMAIL PROTECTED] wrote: If the configuration code is going to be rewritten then I would like to see the ability to dynamically update

Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
off in production in servlet containers imo as well. This can really be such a pain in the ass on a live site...someone touches web.xml and the app server reboots*shudder*. Seen it, don't dig it. Jason Rutherglen wrote: This should be done. Great idea. On Wed, Sep 17, 2008 at 3:41 PM

Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
:56 AM, Mark Miller [EMAIL PROTECTED] wrote: Dynamic changes are not what I'm against...I'm against dynamic changes that are triggered by the app noticing that the config have changed. Jason Rutherglen wrote: Servlets is one thing. For SOLR the situation is different. There are always small

Re: Some new SOLR features

2008-09-17 Thread Jason Rutherglen
If the configuration code is going to be rewritten then I would like to see the ability to dynamically update the configuration and schema without needing to reboot the server. Also I would like the configuration classes to just contain data and not have so many methods that operate on the

Re: Some new SOLR features

2008-09-17 Thread Jason Rutherglen
16, 2008 at 10:12 AM, Jason Rutherglen [EMAIL PROTECTED] wrote: SQL database such as H2 Mainly to offer joins and be able to perform hierarchical queries. Can you define or give an example of what you mean by hierarchical queries? A downside of any type of cross-document queries (like joins

Re: Some new SOLR features

2008-09-16 Thread Jason Rutherglen
integrate it, it should be the easiest on the list. Jason On Mon, Sep 15, 2008 at 11:44 AM, Ryan McKinley [EMAIL PROTECTED] wrote: Here are my gut reactions to this list... in general, most of this comes down to sounds great, if someone did the work I'm all for it! Also, no need to post

Some new SOLR features

2008-09-15 Thread Jason Rutherglen
and a new SOLR war file to individual servers. Cheers, Jason

Re: What's the bottleneck?

2008-09-12 Thread Jason Rennie
parsing. Not sure what you mean here... See also https://issues.apache.org/jira/browse/LUCENE-494 for related stuff. Thanks for the pointer. Jason

Re: Question on how index works - runs out of disk space!

2008-09-11 Thread Jason Rennie
are easy to make via the solrj client we use. Though, for one of our indexes, we perform all of the updates offline and run an optimize before putting the index into production. Hope this helps. Cheers, Jason -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http

What's the bottleneck?

2008-09-11 Thread Jason Rennie
, is there anything we could do to easily trim-down computation time (besides removing common words from the query)? Jason -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http://www.stylefeeder.com/ Samantha's blog pictures: http://samanthalyrarennie.blogspot.com/

Re: What's the bottleneck?

2008-09-11 Thread Jason Rennie
On Thu, Sep 11, 2008 at 11:54 AM, Mark Miller [EMAIL PROTECTED] wrote: What kind of traffic are you getting when it takes seconds? 1 request? 12? I'd estimate concurrency around 3, though the speed doesn't change much when we run the same query on a server with zero traffic. Jason

Re: What's the bottleneck?

2008-09-11 Thread Jason Rennie
? Nope. Are you using multi-value field for filter ... No, it does not have the multiValue attribute turned on. The qf field is just an integer. Any thoughts/comments are appreciated. Thanks, Jason

Re: Index partioning

2008-09-10 Thread Jason Rennie
minimizes maintenance. #1 or #2 seems like a better choice if it is likely you will eventually need to physically separate the indexes. Jason On Mon, Sep 8, 2008 at 6:15 PM, Chris Hostetter [EMAIL PROTECTED]wrote: : I found this thread in the archive... : : I'm responsible for a number of ruby

Re: Question on how index works - runs out of disk space!

2008-09-10 Thread Jason Rennie
Have you tried performing an optimize? Solr doesn't seem to fully integrate all updates into a single index until an optimize is performed. Jason On Wed, Sep 10, 2008 at 1:05 PM, sundar shankar [EMAIL PROTECTED]wrote: Hi All, We have a cluster of 4 servers for the application

Re: Realtime Search for Social Networks Collaboration

2008-09-03 Thread Jason Rutherglen
interested in realtime search to get involved as it may be something that is difficult for one company to have enough resources to implement to a production level. I think this is where open source collaboration is particularly useful. Cheers, Jason Rutherglen [EMAIL PROTECTED] On Wed, Sep 3, 2008 at 4

NumberUtils double encoding question

2008-08-28 Thread Jason Rutherglen
to convert a double into a long first and I am not an expert at bit level math. Cheers, Jason
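
Since the full question is cut off in this preview, the following is only a guess at what was being asked: one common bit-level trick for turning a double into a long whose signed ordering matches numeric ordering. SortableDouble and its method names are hypothetical, and this sketch should not be read as the exact encoding used by Solr's NumberUtils.

```java
// Hypothetical helper: maps doubles to longs that sort the same way.
final class SortableDouble {
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Positive doubles already compare correctly as signed longs.
        // Negative doubles compare in reverse, so flip every bit except
        // the sign bit to reverse their order while keeping them negative.
        if (bits < 0) bits ^= 0x7fffffffffffffffL;
        return bits;
    }

    static double fromSortableLong(long sortable) {
        // The same XOR undoes the transformation for negative values.
        if (sortable < 0) sortable ^= 0x7fffffffffffffffL;
        return Double.longBitsToDouble(sortable);
    }
}
```

The resulting long can then be passed to whatever long-to-sortable-string step is already in place, and the mapping round-trips exactly for ordinary finite values.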

Re: Less aggressive stemmer?

2008-08-22 Thread Jason Rennie
Kevin and Guillaume, Many thanks for the pointers. It sounds like one of these two solutions will fit our needs. Cheers, Jason On Thu, Aug 21, 2008 at 5:33 PM, Guillaume Smet [EMAIL PROTECTED]wrote: On Thu, Aug 21, 2008 at 11:23 PM, Jason Rennie [EMAIL PROTECTED] wrote: Is there an option

Re: How to boost the score higher in case user query matches entire field value than just some words within a field

2008-08-21 Thread Jason Rennie
Count me as interested. Our documents are product descriptions, many fields of which are very short. Not sure if it would make large enough of an impact to warrant us rolling our own solr build, but I'm definitely interested to see the custom Similarity class. Thanks, Jason On Thu, Aug 21

Less aggressive stemmer?

2008-08-21 Thread Jason Rennie
basic, possibly as simple as removing plural endings. Our index is over product descriptions, so it's important that we stem normal variations in nouns, but adverbs, verbs and possibly adjective variations are not so important and sometimes cause problems for us. Jason

Re: Administrative questions

2008-08-14 Thread Jason Rennie
. For anyone unfamiliar w/ daemontools, here's DJB's explanation of why they rock compared to inittab, ttys, init.d, and rc.local: http://cr.yp.to/daemontools/faq/create.html#why Jason

Re: Administrative questions

2008-08-13 Thread Jason Rennie
for a production environment. A bit tricky to set, but solid once you have it in place. http://cr.yp.to/daemontools.html Jason -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http://www.stylefeeder.com/ Samantha's blog pictures: http://samanthalyrarennie.blogspot.com/

Re: concurrent optimize and update

2008-08-12 Thread Jason Rennie
cause a large pile of threads to accumulate if we're not careful... Jason

dismax bq

2008-08-05 Thread Jason Rennie
=bags^-1.0 ? Thanks, Jason

Re: diversity in results

2008-08-04 Thread Jason Rennie
Thanks for the pointers. Looks interesting, at least as a starting point for something more sophisticated. Cheers, Jason On Mon, Aug 4, 2008 at 4:38 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: See https://issues.apache.org/jira/browse/SOLR-236 and http://wiki.apache.org/solr

Re: diversity in results

2008-08-04 Thread Jason Rennie
lucene not easily return term counts for a document with the standard indexing b/c it's term-based (i.e. inverted). Does TermVectors=true cause solr/lucene to store an additional doc-based index? Thanks, Jason On Mon, Aug 4, 2008 at 5:06 PM, Brian Whitman [EMAIL PROTECTED]wrote: not out of the box

pf nixes fl

2008-07-22 Thread Jason Rennie
Just tried adding a pf field to my request handler. When I did this, solr returned all document fields for each doc (no score) instead of returning the fields specified in fl. Bug? Feature? Anyone know what the reason for this behavior is? I'm using solr 1.2. Thanks, Jason

Re: pf nixes fl

2008-07-22 Thread Jason Rennie
returns all document fields, no score field. Jason On Tue, Jul 22, 2008 at 2:55 PM, Mike Klaas [EMAIL PROTECTED] wrote: On 22-Jul-08, at 11:53 AM, Jason Rennie wrote: Just tried adding a pf field to my request handler. When I did this, solr returned all document fields for each doc

Re: pf nixes fl

2008-07-22 Thread Jason Rennie
Doh! I mistakenly changed the request handler from dismax to standard. Ignore me... Jason On Tue, Jul 22, 2008 at 2:59 PM, Jason Rennie [EMAIL PROTECTED] wrote: I'm using solrj and all I did was add a pf entry to solrconfig.xml. I don't think it could be an ampersand issue... Here's

Re: Internal Server Error and waitSearcher=false for commit/optimize

2007-10-11 Thread Jason Rennie
thread, so this option would not affect operations. In case you're curious, we use solr as the search engine for www.stylefeeder.com. It has served us very well so far, handling over 3000 queries/day. Thanks, Jason -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http

Internal Server Error and waitSearcher=false for commit/optimize

2007-10-10 Thread Jason Rennie
is separate from thread(s) making queries. Is waitSearcher=false designed to allow queries to be processed while a commit/optimize is being run? Are there any negative side effects to this setting (other than a query being slightly out-of-date :)? Thanks, Jason

RE: solr and Oracle 10g App Server

2007-07-30 Thread Jason P. Weiss
I had some trouble getting the current production build (1.2.0) working on 10gR3 (10.1.3.0.0). I had to remove 3 bad characters off of the front of the web.xml file and re-jar the WAR file. It worked perfectly after that minor modification. Jason -Original Message- From: Chris
