Re: GIT does not support empty directories

2010-04-16 Thread Walter Underwood
>>> >>>> >>> >> https://git.wiki.kernel.org/index.php/GitFaq#Can_I_add_empty_directories.3F >>>> >>>> And so, when you check out from the Apache GIT repository, these empty >>>> directories do not appear and 'ant example' and 'ant run-example' >>>> fail. There is no 'how to use the solr git stuff' wiki page; that >>>> seems like the right place to document this. I'm not git-smart enough >>>> to write that page. >>>> -- >>>> Lance Norskog >>>> goks...@gmail.com >>>> >>> >>> >>> >>> -- >>> Robert Muir >>> rcm...@gmail.com >>> >> >> >> >> -- >> Robert Muir >> rcm...@gmail.com >> -- Walter Underwood Venture ASM, Troop 14, Palo Alto

[jira] Commented: (SOLR-534) Return all query results with parameter rows=-1

2010-02-10 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832351#action_12832351 ] Walter Underwood commented on SOLR-534: --- -1 This adds a denial of ser

Re: Namespaces in response (SOLR-1586)

2009-12-09 Thread Walter Underwood
On Dec 9, 2009, at 11:11 AM, Mattmann, Chris A (388J) wrote: >> >> Any parser that does that is so broken that you should stop using it >> immediately. --wunder > > Walter, totally agree here. To elaborate my position: 1. Validation is a user option. The XML spec makes that very clear. We've h

Re: Namespaces in response (SOLR-1586)

2009-12-09 Thread Walter Underwood
Any parser that does that is so broken that you should stop using it immediately. --wunder On Dec 9, 2009, at 8:33 AM, Yonik Seeley wrote: > My gut feeling is that we should not be introducing namespaces by default. > It introduces a new requirement of XML parsers in clients, and some > parsers

Re: Functions, floats and doubles

2009-11-13 Thread Walter Underwood
Float is often OK until you try and use it for further calculation. Maybe it is good enough for printing out distance, but maybe not for further use. wunder On Nov 13, 2009, at 10:32 AM, Yonik Seeley wrote: > On Fri, Nov 13, 2009 at 1:01 PM, Walter Underwood > wrote: >> Float is

Re: Functions, floats and doubles

2009-11-13 Thread Walter Underwood
Float is almost never good enough. The loss of precision is horrific. wunder On Nov 13, 2009, at 9:58 AM, Yonik Seeley wrote: > On Fri, Nov 13, 2009 at 12:52 PM, Grant Ingersoll wrote: >> Implementing my first function (distance stuff) and noticed that functions >> seem to have a float bent to
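To make the precision concern concrete, here is a minimal sketch that rounds a 64-bit value to 32 bits and back. The helper name `to_float32` and the sample latitude are assumptions for illustration, not anything from the thread.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


latitude = 37.4418834               # hypothetical coordinate, in degrees
latitude32 = to_float32(latitude)

# The difference is what a float field would silently drop before any
# further calculation (distance, sorting, and so on) is done with it.
error_degrees = abs(latitude - latitude32)
print(f"stored as float32: {latitude32!r}")
print(f"absolute error:    {error_degrees:.3e} degrees")
```

A single rounding like this is small, but a 32-bit float carries only about seven significant decimal digits, so the error grows once the value feeds into subtraction or trigonometry.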

Re: Another RC

2009-10-19 Thread Walter Underwood
Please wait for an official release of Lucene. It makes things SO much easier when you need to dig into the Lucene code. It is well worth a week's delay. wunder On Oct 19, 2009, at 10:27 AM, Yonik Seeley wrote: On Mon, Oct 19, 2009 at 10:59 AM, Grant Ingersoll wrote: Are we ready for a rele

Re: 8 for 1.4

2009-09-29 Thread Walter Underwood
It might not be proper to use the name "Solr", because it is really "Apache Solr". At a minimum, it is misleading to use an Apache project name on GPL'ed code. I agree that changing to GPL is a bad idea. I've worked at eight or nine companies since the GPL was created, and GPL'ed code was

Re: [PMX:FAKE_SENDER] Re: large OR-boolean query

2009-09-25 Thread Walter Underwood
This would work a lot better if you did the join at index time. For each paper, add a field with all the related drug names (or whatever you want to search for), then search on that field. With the current design, it will never be fast and never scale. Each lookup has a cost, so expanding a
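A rough sketch of the "join at index time" idea: before indexing, each paper gets one extra multi-valued field holding every related name, so a query hits a single field. The field name `drug_names` and the input shapes are hypothetical.

```python
def build_index_docs(papers, related_drugs):
    """Denormalize: copy each paper's related drug names into the paper itself.

    papers:        list of dicts, each with at least an "id" key (assumed shape)
    related_drugs: dict mapping paper id -> iterable of drug names (assumed shape)
    """
    docs = []
    for paper in papers:
        doc = dict(paper)  # copy so the caller's data is not modified
        doc["drug_names"] = sorted(related_drugs.get(paper["id"], []))
        docs.append(doc)
    return docs


papers = [{"id": "p1", "title": "Trial A"}, {"id": "p2", "title": "Trial B"}]
related = {"p1": ["ibuprofen", "aspirin"]}
for doc in build_index_docs(papers, related):
    print(doc)
```

The cost of the join is paid once per document at index time instead of once per lookup at query time.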

Re: Solr Slow in Unix

2009-07-16 Thread Walter Underwood
In particular, are you using local disc or network storage? --wunder On 7/16/09 8:24 AM, "Yonik Seeley" wrote: > On Thu, Jul 16, 2009 at 4:18 AM, Anand Kumar > Prabhakar wrote: >> I'm running a Solr instance in Apache Tomcat 6 in a Solaris Box. The QTimes >> are high when compared to the same co

Re: lucene releases vs trunk

2009-06-25 Thread Walter Underwood
This is an excellent idea. When I find a problem and want to research the Lucene bugs that might describe it, that is really hard with a trunk build. It's easy with a release build. wunder On 6/25/09 4:18 AM, "Yonik Seeley" wrote: > For the next release cycle (presumably 1.5?) I think we shoul

[jira] Commented: (SOLR-1216) disambiguate the replication command names

2009-06-15 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719625#action_12719625 ] Walter Underwood commented on SOLR-1216: If we choose a name for the thing we

[jira] Commented: (SOLR-1216) disambiguate the replication command names

2009-06-15 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719609#action_12719609 ] Walter Underwood commented on SOLR-1216: "sync" is a weak name, beca

Re: Streaming Docs, Terms, TermVectors

2009-05-30 Thread Walter Underwood
Don't stream, request chunks of 10 or 100 at a time. It works fine and you don't have to write or test any new code. In addition, it works well with HTTP caches, so if two clients want to get the same data, the second can get it from the cache. We do that at Netflix. Each front-end box does a seri
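A minimal sketch of chunked fetching, assuming the standard `start` and `rows` request parameters; the base URL and the helper name `chunk_urls` are made up for illustration.

```python
from urllib.parse import urlencode


def chunk_urls(base_url, query, total, rows=100):
    """Yield one request URL per chunk of `rows` results."""
    for start in range(0, total, rows):
        params = urlencode({"q": query, "start": start, "rows": rows})
        yield f"{base_url}?{params}"


# Hypothetical endpoint; every client asking for the same chunk
# produces the same URL.
for url in chunk_urls("http://localhost:8983/solr/select", "solr", 250):
    print(url)
```

Because the URLs are deterministic, an HTTP cache in front of the server can answer a second client's request for the same chunk.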

[jira] Commented: (SOLR-1073) StrField should allow locale sensitive sorting

2009-04-28 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703893#action_12703893 ] Walter Underwood commented on SOLR-1073: Using the locale of the JVM is very,

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-03 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678601#action_12678601 ] Walter Underwood commented on SOLR-1044: During the Oscars, the HTTP cache in f

Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

2009-02-26 Thread Walter Underwood
If you want a tag cloud based on query frequency, start with your HTTP log analysis tools. Most of those generate a list of top queries and top words in queries. wunder On 2/26/09 2:54 PM, "Chris Hostetter" wrote: > > : I may have not made myself clear. When I say keyword report, I mean a kind

Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

2009-02-26 Thread Walter Underwood
Oops, missed that you wanted it by facet. Never mind. --wunder On 2/26/09 9:57 AM, "Walter Underwood" wrote: > That info is already available via Luke, right? --wunder > > On 2/26/09 9:55 AM, "Robert Douglass" wrote: > >> A solution that I'd consi

Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

2009-02-26 Thread Walter Underwood
That info is already available via Luke, right? --wunder On 2/26/09 9:55 AM, "Robert Douglass" wrote: > A solution that I'd considering implementing for Drupal's ApacheSolr > module is to do a *:* search and then make tag clouds from all of the > facets. Pretty easy to sort all the facet terms i

Re: [jira] Issue Comment Edited: (SOLR-844) A SolrServer impl to front-end multiple urls

2009-01-22 Thread Walter Underwood
This would be useful if there was search-specific balancing, like always send the same query back to the same server. That can make your cache far more effective. wunder On 1/22/09 1:13 PM, "Otis Gospodnetic (JIRA)" wrote: > > [ > https://issues.apache.org/jira/browse/SOLR-844?page=com.at
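One way to get that kind of query affinity is to hash the query string and use the hash to choose a server. This is only a sketch under assumptions: the server list is hypothetical, and the normalization (trim and lowercase) is an illustrative choice.

```python
import hashlib


def pick_server(query, servers):
    """Route identical queries to the same server so its cache stays warm."""
    normalized = query.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["http://search1:8983", "http://search2:8983", "http://search3:8983"]
print(pick_server("being there", servers))
print(pick_server("Being There ", servers))  # same server as the line above
```

A plain modulo reshuffles most queries when the server list changes; consistent hashing would reduce that, at the price of more code.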

[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer

2008-10-23 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642188#action_12642188 ] Walter Underwood commented on SOLR-822: --- Yes, it should be in Lucene. Like this:

[jira] Commented: (SOLR-815) Add new Japanese half-width/full-width normalizaton Filter and Factory

2008-10-20 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641071#action_12641071 ] Walter Underwood commented on SOLR-815: --- I looked it up, and even found a reason t

[jira] Commented: (SOLR-815) Add new Japanese half-width/full-width normalizaton Filter and Factory

2008-10-17 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640609#action_12640609 ] Walter Underwood commented on SOLR-815: --- If I remember correctly, Latin charac

[jira] Commented: (SOLR-814) Add new Japanese Hiragana Filter and Factory

2008-10-17 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640605#action_12640605 ] Walter Underwood commented on SOLR-814: --- This seems like a bad idea. Hiragana

Re: Offer to submit some custom enhancements

2008-10-16 Thread Walter Underwood
Python marshal format supports everything we need and is easy to implement in Java. It is roughly equivalent to JSON, but binary. http://docs.python.org/library/marshal.html wunder On 10/16/08 8:16 AM, "Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote: > Hi Todd, > > AFAIK, protocol buffers ca

[jira] Commented: (SOLR-777) backword match search, for domain search etc.

2008-09-18 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632489#action_12632489 ] Walter Underwood commented on SOLR-777: --- You don't need backwards matching

Re: replace stax API with Geronimo-stax+Woodstox

2008-09-09 Thread Walter Underwood
We've been using woodstox in production for over a year. No problems. wunder On 9/9/08 8:07 AM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > FYI, I'm testing Solr with woodstox now and will probably do some ad > hoc stress testing too. > But woodstox is a quality parser. I expect less problems t

Re: Solr changes date format?

2008-08-12 Thread Walter Underwood
On 8/12/08 11:42 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > : by a point but, as you can see, the separator is converted to a comma when > : is accesed > : from Solr (i can see this too from Solr web admin) > > this boggles my mind ... i can't think of *anything* in Solr that would do > t

Re: [VOTE] Set Solr 1.3 freeze and release date

2008-08-06 Thread Walter Underwood
I would strongly prefer a released version of Lucene. We made some changes to Solr 1.1 that required tweaks inside of Lucene, and it was quite a treasure hunt to find a suitable set of Lucene source. It just seems wrong for Solr to release a version of Lucene. wunder On 8/6/08 8:53 AM, "Chris Hostet

Re: Solr Logo thought

2008-08-01 Thread Walter Underwood
I kind of like the flaming version at http://www.solrmarc.org/ Not very fired up about the other choices. wunder On 8/1/08 9:45 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: > Hola, > > Yes, logo, trivial issue (hi Lance). But logos are important, so: > > I've cast my vote, but I don't re

[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605751#action_12605751 ] Walter Underwood commented on SOLR-600: --- It could also be a concurrency bug in

Re: IDF in Distributed Search

2008-04-11 Thread Walter Underwood
Global IDF does not require another request/response. It is nearly free if you return the right info. Return the total number of docs and the df in the original response. Sum the doc counts and dfs, recompute the idf, and re-rank. See this post for an efficient way to do it: http://wunderwood
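The merge step described here can be sketched as follows. The shard response shape (`num_docs`, `df`, `hits`, `tf`) is a hypothetical stand-in, and the idf formula is the classic `1 + ln(N / (df + 1))` form, chosen for illustration; the linked post may use a different one.

```python
import math


def global_idf(shard_responses, term):
    """Sum doc counts and document frequencies across shards, then recompute idf."""
    total_docs = sum(r["num_docs"] for r in shard_responses)
    total_df = sum(r["df"].get(term, 0) for r in shard_responses)
    return 1.0 + math.log(total_docs / (total_df + 1))


def rerank(shard_responses, term):
    """Re-score every hit with the global idf and return ids, best first."""
    idf = global_idf(shard_responses, term)
    scored = [
        (hit["tf"] * idf, hit["id"])
        for response in shard_responses
        for hit in response["hits"]
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


shards = [
    {"num_docs": 100, "df": {"solr": 9}, "hits": [{"id": "a", "tf": 2.0}]},
    {"num_docs": 900, "df": {"solr": 90}, "hits": [{"id": "b", "tf": 3.0}]},
]
print(global_idf(shards, "solr"))
print(rerank(shards, "solr"))
```

Everything the merger needs travels in the original responses, so no second round trip to the shards is involved.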

[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567068#action_12567068 ] Walter Underwood commented on SOLR-127: --- Two reasons to do HTTP caching for

Re: remote solrj using xml versus json

2007-11-09 Thread Walter Underwood
If you want speed, you should use Python marshal format. It handles data structures equivalent to JSON, but in binary. Very easy to convert to Java data types. --wunder On 11/9/07 12:56 PM, "Erik Hatcher" <[EMAIL PROTECTED]> wrote: > anybody compared/contrasted the two? seems like yonik's noggi
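For reference, this is what the format looks like from the Python side, using the standard-library `marshal` module. The response dictionary is invented for illustration. The Python documentation notes that the format can change between Python versions and must not be loaded from untrusted sources, so treat this as a sketch of the idea rather than a wire-protocol recommendation.

```python
import marshal

# A JSON-like structure: dicts, lists, strings, ints, floats.
response = {
    "numFound": 2,
    "docs": [
        {"id": "1", "score": 1.5},
        {"id": "2", "score": 0.75},
    ],
}

payload = marshal.dumps(response)   # compact binary bytes
decoded = marshal.loads(payload)    # back to plain Python objects

print(len(payload), "bytes")
print(decoded == response)
```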

Re: default text type and stop words

2007-11-05 Thread Walter Underwood
I also said, "Stopword removal is a reasonable default because it works fairly well for a general text corpus." Ultraseek keeps stopwords but most engines don't. I think it is fine as a default. I also think you have to understand stopwords at some point. wunder On 11/5/07 9:59 PM, "Chris Hostett

Re: default text type and stop words

2007-11-05 Thread Walter Underwood
This isn't a problem in Lucene or Solr. It is a result of the analyzers you have chosen to use. If you choose to remove stopwords, you will not be able to match stopwords. Stopword removal has benefits (smaller index, faster searches) and drawbacks (missed matches, wrong matches). Solr and Lucene

Re: default text type and stop words

2007-11-02 Thread Walter Underwood
Stopwords are fairly common in movie titles. There are even titles made entirely of stopwords. The first one I noticed was "Being There". I posted more of them here: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html wunder == Search Guy Netflix On 11/2/07 3:53 PM, "Sundlin

Re: HTTP or RMI, Jini, JavaSpaces for distributed search

2007-09-21 Thread Walter Underwood
Please don't switch to RMI. We've spent the past year converting our entire middle tier from RMI to HTTP. We are so glad that we no longer have any RMI servers. The big advantage of HTTP is that there are hundreds, maybe thousands, of engineers working on making it fast, on tools for it, on caches

[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2007-09-14 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527694 ] Walter Underwood commented on SOLR-127: --- Last-modified does require monotonic time, but ETags are version stamps
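A small sketch of an ETag used as a version stamp, assuming the server can expose some integer that changes whenever the index changes. The names `make_etag` and `respond` are hypothetical, and the comparison is deliberately simplistic: it ignores weak validators and comma-separated `If-None-Match` lists.

```python
def make_etag(index_version):
    """Build a quoted entity tag from an index version number."""
    return f'"{index_version}"'


def respond(index_version, if_none_match=None):
    """Return (status, headers, body) for a conditional GET."""
    etag = make_etag(index_version)
    if if_none_match == etag:
        # The client's copy is still current; no body is sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, b"<response>...</response>"


status, headers, _ = respond(42)
print(status, headers["ETag"])
print(respond(42, if_none_match=headers["ETag"])[0])  # unchanged index
print(respond(43, if_none_match=headers["ETag"])[0])  # index has changed
```

No clock is involved: the tag only has to differ between versions, which is why it sidesteps the monotonic-time requirement.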

Re: what goes in CHANGES.txt

2007-07-09 Thread Walter Underwood
> : If we added a more obscure method that didn't exist before (like > : getFirstMatch()), that wouldn't need to be added (it's noise to most > : users, doesn't change existing functionality, not accessible w/o > : writing Java code, and an advanced user can pull up the javadoc). It sure is handy

[jira] Commented: (SOLR-277) Character Entity of XHTML is not supported with XmlUpdateRequestHandler .

2007-06-26 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508408 ] Walter Underwood commented on SOLR-277: --- This is not a bug. Solr accepts XML, not XHTML. It does not accept

[jira] Commented: (SOLR-216) Improvements to solr.py

2007-05-29 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499923 ] Walter Underwood commented on SOLR-216: --- GET is the right semantic for a query, since it doesn't chang

Re: svn commit: r541391 - in /lucene/solr/trunk: CHANGES.txt example/solr/conf/xslt/example_atom.xsl example/solr/conf/xslt/example_rss.xsl

2007-05-25 Thread Walter Underwood
On 5/25/07 10:45 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > > : I'd slap versions to those 2 XSL files to immediately answer "which > : version of Atom|RSS does this produce?" > > i'm comfortable calling the example_rss.xsl "RSS", since most RSS > readers will know what to do with it, but

[jira] Commented: (SOLR-208) RSS feed XSL example

2007-05-17 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496624 ] Walter Underwood commented on SOLR-208: --- I wasn't in the RSS wars, either, but I was on the Atom working

[jira] Commented: (SOLR-208) RSS feed XSL example

2007-05-17 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496608 ] Walter Underwood commented on SOLR-208: --- What kind of RSS? -1 unless it is Atom. The nine variants of RSS have

Re: dynamic copyFields

2007-05-02 Thread Walter Underwood
That syntax is from the "ed" editor. I learned it in 1975 on Unix v6/PWB, running on a PDP-11/70. --wunder On 5/2/07 5:04 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote: > On 5/2/07, Ryan McKinley <[EMAIL PROTECTED]> wrote: > >> How about Mike's other suggestion: >> >> >> this would keep the glo

Re: Progressive Query Relaxation

2007-04-10 Thread Walter Underwood
On 4/10/07 10:38 AM, "J. Delgado" <[EMAIL PROTECTED]> wrote: > I think you have something personal against Oracle... Hey I have no > interest in defending Oracle, but this is no hack. It's true, I don't have much respect for Oracle's text search. When I was working on enterprise search, we never rea

Re: Progressive Query Relaxation

2007-04-10 Thread Walter Underwood
On 4/10/07 10:06 AM, "J. Delgado" <[EMAIL PROTECTED]> wrote: > Progressive relaxation, at least as Oracle has defined it, is a > flexible, developer defined series of queries that are efficiently > executed in progression and in one trip to the engine, until minimum > of hits required is satisfied

Re: Progressive Query Relaxation

2007-04-10 Thread Walter Underwood
From the name, I thought this was an adaptive precision scheme, where the engine automatically tries broader matching if there are no matches or just a few. We talked about doing that with Ultraseek, but it is pretty tricky. Deciding when to adjust it is harder than making it variable. Instead, t

[jira] Commented: (SOLR-161) Dangling dash causes stack trace

2007-02-15 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473628 ] Walter Underwood commented on SOLR-161: --- It is really a Lucene query parser bug, but it wouldn't hurt to

[jira] Commented: (SOLR-161) Dangling dash causes stack trace

2007-02-15 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473625 ] Walter Underwood commented on SOLR-161: --- The parser can have a rule for this rather than exploding. A trailing

[jira] Created: (SOLR-161) Dangling dash causes stack trace

2007-02-15 Thread Walter Underwood (JIRA)
: Java 1.5, Tomcat 5.5.17, Fedora Core 4, Intel Reporter: Walter Underwood I'm running tests from our search logs, and we have a query that ends in a dash. That caused a stack trace. org.apache.lucene.queryParser.ParseException: Cannot parse 'digging for the truth -

Re: AutoCommitTest failing

2007-02-05 Thread Walter Underwood
On 2/5/07 11:18 AM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > > Yes, I think that's it. > SolrCore.close() shuts down the Executor. > From the trace, you can see SolrCore closing, then an attempt to open > up another searcher after that. > > The close of the update handler should probably shut

Re: resin and UTF-8 in URLs

2007-02-02 Thread Walter Underwood
On 2/1/07 6:00 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > That may be, but Solr was only publicly available for 9 months before we > had someone running into confusion because they were trying to post an XML > file that wasn't UTF-8 :) > > http://www.nabble.com/wana-use-CJKAnalyzer-tf

Re: resin and UTF-8 in URLs

2007-02-01 Thread Walter Underwood
On 2/1/07 3:18 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > > As for XML, or any other format a user might POST to solr (or ask solr > to fetch from a remote source) what possible reason would we have to only > supporting UTF-8? .. why do you suggest that the XML standard "specify > UTF-8, [s

Re: resin and UTF-8 in URLs

2007-02-01 Thread Walter Underwood
On 2/1/07 2:53 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > Solr, in my opinion, shouldn't have the string "UTF-8" hardcoded in it > anywhere -- not even in the example config: new users shouldn't need to > know about have any special solrconfig options that must be (un)set to get > Solr to

Re: resin and UTF-8 in URLs

2007-02-01 Thread Walter Underwood
Let's not make this complicated for situations that we've never seen in practice. Java is a Unicode language and always has been. Anyone running a Java system with a Shift-JIS default should already know the pitfalls, and know them much better than us (and I know a lot about Shift-JIS). The URI sp

Re: loading many documents by ID

2007-02-01 Thread Walter Underwood
On 2/1/07 10:55 AM, "Ryan McKinley" <[EMAIL PROTECTED]> wrote: > > Is there a better word than 'update'? It seems there is already enough > confusion between UpdateHandlers, "Update Plugins", > UpdateRequestHandler etc. Try "modify". Solr uses "update" to include "add". wunder

Re: loading many documents by ID

2007-01-31 Thread Walter Underwood
On 1/31/07 9:05 PM, "Ryan McKinley" <[EMAIL PROTECTED]> wrote: >>> >>> We'd have to make it very clear that this only works if all fields are >>> STORED. >> >> Isn't there some way to do this automatically instead of relying >> on documentation? We might need to add something, maybe a >> "require

Re: loading many documents by ID

2007-01-31 Thread Walter Underwood
On 1/31/07 3:39 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > > : Oh, and there have been numerous people interested in "updateable" > : documents, so it would be nice if that part was in the update handler. > > We'd have to make it very clear that this only works if all fields are > STORED.

[jira] Commented: (SOLR-129) Solrb - UTF 8 Support for add/delete

2007-01-31 Thread Walter Underwood (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469072 ] Walter Underwood commented on SOLR-129: --- This is not a bug, unless a bad error message is a bug. It looks like

Re: SOLR Improvement: Expiration

2007-01-30 Thread Walter Underwood
On 1/30/07 5:07 PM, "Fuad Efendi" <[EMAIL PROTECTED]> wrote: > If it is not implemented/reported yet... > I am having problems with deleting of old documents, would be nice to have > default expiration policy! Building in some specific policy would be hard and only useful for people with exactly

Re: Can this be achieved? (Was: document support for file system crawling)

2007-01-19 Thread Walter Underwood
nnected to Solr, but that depends on Autonomy. wunder -- Walter Underwood Search Guru, Netflix

Re: Java version for solr development (was Re: Update Plugins)

2007-01-16 Thread Walter Underwood
1.6 would be a serious problem for us. wunder -- Walter Underwood Search Guru, Netflix

Re: Handling disparate data sources in Solr

2007-01-08 Thread Walter Underwood
specific to each document type. Example: I have RFC-2822 mail messages with "Subject:" and HTML with "<title>". If I store those in Solr as subject and title fields, then each query needs to search both fields. If I put them both in a "document_title" field, then the query can

Re: [jira] Commented: (SOLR-85) [PATCH] Add update form to the admin screen

2006-12-18 Thread Walter Underwood
way > smaller and more focused on the update part. > > https://issues.apache.org/jira/browse/SOLR-86 > > It is a replacement of the post.sh not much more (yet). I'll take a look at this. I also wrote my own, because I had no idea that the Java client code existed. wunder -- Walter Underwood Search Guru, Netflix

Heavily-populated bit sets

2006-12-12 Thread Walter Underwood
-- Walter Underwood Search Guru, Netflix

Re: Finalizing SOLR-58

2006-12-06 Thread Walter Underwood
On 12/6/06 10:54 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > P.S. > Wunder - http://www.cafeconleche.org/books/bible3/chapters/ch15.html was > invaluable, thanks. No kidding. It is a complete Yoda session on XSLT. --wunder

[jira] Commented: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-12-05 Thread Walter Underwood (JIRA)
[ http://issues.apache.org/jira/browse/SOLR-73?page=comments#action_12455684 ] Walter Underwood commented on SOLR-73: -- Remember, this bug is only about removing aliased names from the sample files. Note that the users in favor of having a

[jira] Commented: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-11-28 Thread Walter Underwood (JIRA)
[ http://issues.apache.org/jira/browse/SOLR-73?page=comments#action_12454190 ] Walter Underwood commented on SOLR-73: -- The context required to resolve the ambiguity is a wiki page that I didn't know existed. Since I didn't know a

[jira] Commented: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-11-28 Thread Walter Underwood (JIRA)
[ http://issues.apache.org/jira/browse/SOLR-73?page=comments#action_12454159 ] Walter Underwood commented on SOLR-73: -- I think the aliases are harder to read. You need to go elsewhere to figure them out. I read documentation, but I didn&#

[jira] Commented: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-11-28 Thread Walter Underwood (JIRA)
[ http://issues.apache.org/jira/browse/SOLR-73?page=comments#action_12454066 ] Walter Underwood commented on SOLR-73: -- The aliasing requires documentation and using the full class names doesn't. It seems much simpler to me to use the

[jira] Created: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-11-28 Thread Walter Underwood (JIRA)
: search Reporter: Walter Underwood The configuration files in the example directory still use the old CNET-internal class names, like solr.LRUCache instead of org.apache.solr.search.LRUCache. This is confusing to new users and should be fixed before the first release. -- This message

Re: [jira] Commented: (SOLR-58) Change Admin components to return XML like the rest of the system

2006-11-27 Thread Walter Underwood
On 11/27/06 1:52 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > > : Hoss, regarding your point 7) about ping - makes sense. I think this is > : what Walter Underwood was talking about in a recent thread, too. So > : what should the ping response look like in

Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-22 Thread Walter Underwood
. Does Lucene access fetch information from disk while we iterate through the search results? If that happens a few times, then streaming might make a difference. If it is mostly CPU-bound, then streaming probably doesn't help. wunder -- Walter Underwood Search Guru, Netflix

Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Walter Underwood
nding the headers. This can be slower than writing the response out as it is computed, but the response codes can be accurate. Also, it allows optimal buffering, so it might scale better. If you really want to handle failure in an error response, write that to a string and if that fails, send a hard-coded string. wunder -- Walter Underwood Search Guru, Netflix

Phonetic Token Filter

2006-11-21 Thread Walter Underwood
ow do we add that to the distro? The code is very simple, but I need to learn the contribution process and build some tests, so this won't happen in one day. wunder -- Walter Underwood Search Guru, Netflix

Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Walter Underwood
On 11/20/06 5:51 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > Now that I think about it though, one nice change would be to get rid > of the long stack trace for 400 exceptions... it's not needed, right? That is correct. A client error (400) should not be reported with a server stack trace. --

Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-21 Thread Walter Underwood
iner as much as possible... We don't need to use HTTP response codes deep in Solr, but we do need to separate bad parameters, retryable errors, non-retryable errors, and so on. We can call them whatever we want internally, but we need to report them properly over HTTP. wunder -- Walter Underwood Search Guru, Netflix
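One possible shape for that separation, sketched with hypothetical exception classes and a hypothetical mapping (400 for bad parameters, 503 for retryable failures, 500 for the rest). The specific status codes are an illustrative choice, not something settled in the thread.

```python
class BadParameterError(Exception):
    """The client sent something wrong; retrying the same request will not help."""


class RetryableError(Exception):
    """A temporary server-side condition; the same request may succeed later."""


class FatalError(Exception):
    """A server-side failure that retrying will not fix."""


def to_http_response(exc):
    """Translate an internal error category into (status code, body)."""
    if isinstance(exc, BadParameterError):
        # A short explanation in the body, with no server stack trace.
        return 400, f"Bad request: {exc}"
    if isinstance(exc, RetryableError):
        return 503, f"Temporarily unavailable: {exc}"
    return 500, f"Internal error: {exc}"


print(to_http_response(BadParameterError("rows must be >= 0")))
print(to_http_response(RetryableError("searcher is warming")))
print(to_http_response(FatalError("index is corrupt")))
```

Internal code raises the category it knows about, and only the outermost HTTP layer decides on the status code.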

Re: Cocoon-2.1.9 vs. SOLR-20 & SOLR-30

2006-11-17 Thread Walter Underwood
mportant to keep the HTTP response codes accurate. Never return an error with a 200. If we want more info, return an entity (body) with the 400 response. wunder -- Walter Underwood Search Guru, Netflix

Re: SOLR-58

2006-11-09 Thread Walter Underwood
remely verbose (and ugly) subset of Prolog. wunder -- Walter Underwood Search Guru, Netflix

Re: Adding Phonetic Search to Solr

2006-11-08 Thread Walter Underwood
grokking DisMax. I need to customize Similarity anyway. wunder -- Walter Underwood Search Guru, Netflix

Re: Adding Phonetic Search to Solr

2006-11-08 Thread Walter Underwood
ne will almost always have a lower idf because multiple words are mapped to one metaphone, so the encoded term occurs in more documents than the surface terms. One neat trick -- if regular terms are lowercased, they will never collide with the metaphones, which are all upper case. wunder -- Walter Underwood Search Guru, Netflix

Re: Adding Phonetic Search to Solr

2006-11-07 Thread Walter Underwood
27;t have any experience with those. wunder -- Walter Underwood Search Guru, Netflix

Re: Adding Phonetic Search to Solr

2006-11-07 Thread Walter Underwood
On 11/7/06 2:30 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote: > On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote: >> >> 1. Adding fuzzy to the DisMax specs. > > What do you envisage the implementation looking like? Probably continue with the tem

Adding Phonetic Search to Solr

2006-11-07 Thread Walter Underwood
asier to implement. Does that seem right? How do I specify the new token filter factory in the schema file? I don't quite get the mapping from solr.FooFilterFactory to org.apache.solr.analysis.FooFilterFactory. wunder -- Walter Underwood Search Guru, Netflix

Re: [jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Walter Underwood
URL also has HTML versions). http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-11.txt http://tools.ietf.org/wg/atompub/draft-ietf-atompub-protocol/ wunder -- Walter Underwood Search Guru, Netflix

Re: [jira] Created: (SOLR-60) Remove overwritePending, overwriteCommitted flags?

2006-11-01 Thread Walter Underwood
+1 as well. --wunder On 11/1/06 11:17 AM, "Mike Klaas" <[EMAIL PROTECTED]> wrote: > +1 > > On 11/1/06, Yonik Seeley (JIRA) <[EMAIL PROTECTED]> wrote: >> Remove overwritePending, overwriteCommitted flags? >> -- >> >> Key: SOLR-60 >

Re: Copying the request parameters to Solr's response

2006-10-24 Thread Walter Underwood
Java client library for Ultraseek (XPA) does keep a local results cache and uses the query plus the query context as a key. wunder -- Walter Underwood Search Guru, Netflix Former Ultraseek Architect

Re: Copying the request parameters to Solr's response

2006-10-24 Thread Walter Underwood
Returning the query parameters is really useful. I'm not sure it needs to be optional; they are small, and options multiply the test cases. It can even be useful to return the values of the defaults. All those go into the key for any client side caching, for example. wunder On 10/24/06 1:55 AM,

Re: Solr NightlyBuild

2006-09-20 Thread Walter Underwood
OK with nightly builds means that you need to run your own QA on the whole build every time you change. Kinda expensive. wunder -- Walter Underwood Search Guru, Netflix

Re: double curl calls in post.sh?

2006-09-18 Thread Walter Underwood
On 9/18/06 10:10 AM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On 9/18/06, Walter Underwood <[EMAIL PROTECTED]> wrote: >> Instead, use a media type of application/xml, so that the server >> is allowed to sniff the content to discover the character encoding.

Re: double curl calls in post.sh?

2006-09-18 Thread Walter Underwood
application/xml, so that the server is allowed to sniff the content to discover the character encoding. For the gory details, see RFC 3023: http://www.ietf.org/rfc/rfc3023.txt wunder == Walter Underwood Search Guru, Netflix On 9/17/06 1:00 PM, "Chris Hostetter" <[EMAIL PROTECTED]> w
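A sketch of what that looks like from a client, using the standard-library `urllib.request`. The update URL is a hypothetical local endpoint, and the request is only built here, not sent.

```python
import urllib.request

# The XML declaration carries the encoding; with application/xml and no
# charset parameter, the receiver is allowed to read it from the content.
body = '<?xml version="1.0" encoding="UTF-8"?><add><doc/></add>'.encode("utf-8")

request = urllib.request.Request(
    "http://localhost:8983/solr/update",   # hypothetical endpoint
    data=body,
    headers={"Content-Type": "application/xml"},
    method="POST",
)

print(request.get_method(), request.full_url)
print(request.get_header("Content-type"))
```

Under RFC 3023, `text/xml` without a charset parameter is treated as US-ASCII, which is the trap this media type avoids.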