Re: loading many documents by ID

2007-02-01 Thread Chris Hostetter
: 1. Set the "updateable" fields explicitly in the schema. : : : * throw an exception at startup if an updateable field is not stored. : If somewhere down the road we figure out how to efficiently handle : unstored fields, we can remove this error. : * when 'updating', only copy the fields mark

Re: [jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-02-01 Thread Mike Klaas
Thanks! That's what I get for not running the test suite One Last Time before committing. -Mike On 2/1/07, Ryan McKinley (JIRA) <[EMAIL PROTECTED]> wrote: [ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469663

[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-02-01 Thread Ryan McKinley (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469663 ] Ryan McKinley commented on SOLR-126: It looks like the test solrconfig.xml got committed with some mangled XML comm

[jira] Resolved: (SOLR-134) Handle Time values in fields correctly (UTC + ISO 8601)

2007-02-01 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved SOLR-134. --- Resolution: Fixed Assignee: Erik Hatcher thank goodness not everyone lives on the west coast, it

[jira] Updated: (SOLR-134) Handle Time values in fields correctly (UTC + ISO 8601)

2007-02-01 Thread Coda Hale (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Coda Hale updated SOLR-134: --- Attachment: solrb_xml_time_with_better_tests.diff What, you mean not everyone is on the West Coast? > Handle T

[jira] Resolved: (SOLR-132) i18n solrb test patch

2007-02-01 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved SOLR-132. --- Resolution: Fixed Assignee: Erik Hatcher Thanks Antonio.. applied! Just a minor JIRA thing...

[jira] Commented: (SOLR-134) Handle Time values in fields correctly (UTC + ISO 8601)

2007-02-01 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469644 ] Erik Hatcher commented on SOLR-134: --- Do also note that Solr can tackle date math in its own configuration: http://w

[jira] Commented: (SOLR-134) Handle Time values in fields correctly (UTC + ISO 8601)

2007-02-01 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469643 ] Erik Hatcher commented on SOLR-134: --- 1) Failure: test_xml_date(FieldTest) [./test/unit/field_test.rb:27]: <"1999-12

Re: fieldtype -> fieldType

2007-02-01 Thread Erik Hatcher
On Feb 1, 2007, at 4:32 PM, Yonik Seeley wrote: On 2/1/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: A #code4libber made this comment a moment ago: "(nitpick) when you finalize the DTD for schema.xml, could you make fieldtype camel-case like the others. Or make the others lower-case." I n

Re: resin and UTF-8 in URLs

2007-02-01 Thread Chris Hostetter
: The XML spec says that XML parsers are only required to support : UTF-8, UTF-16, ISO 8859-1, and US-ASCII. If you use a different : encoding for XML, there is no guarantee that a conforming parser : will accept it. There may not be a guarantee -- but shouldn't we at least try to respect the clie

addConditionally / allowDups / overwriteBoth

2007-02-01 Thread Ryan McKinley
I'm having a bit of trouble deciphering the use case for these three 'add' options. Here is my best guess; can you correct me if I'm wrong? * addConditionally() Only add the document if we have not already tried to add it and have not committed yet * allowDups() add the document regardless of wha

Re: resin and UTF-8 in URLs

2007-02-01 Thread Walter Underwood
On 2/1/07 3:18 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > > As for XML, or any other format a user might POST to solr (or ask solr > to fetch from a remote source) what possible reason would we have to only > supporting UTF-8? .. why do you suggest that the XML standard "specify > UTF-8, [s

[jira] Updated: (SOLR-134) Handle Time values in fields correctly (UTC + ISO 8601)

2007-02-01 Thread Coda Hale (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Coda Hale updated SOLR-134: --- Attachment: solrb_xml_time.diff Includes unit test. > Handle Time values in fields correctly (UTC + ISO 8601)

[jira] Created: (SOLR-134) Handle Time values in fields correctly (UTC + ISO 8601)

2007-02-01 Thread Coda Hale (JIRA)
Handle Time values in fields correctly (UTC + ISO 8601) --- Key: SOLR-134 URL: https://issues.apache.org/jira/browse/SOLR-134 Project: Solr Issue Type: Improvement Components: cli

Re: resin and UTF-8 in URLs

2007-02-01 Thread Chris Hostetter
: > Solr, in my opinion, shouldn't have the string "UTF-8" hardcoded in it : > anywhere -- not even in the example config: new users shouldn't need to : > know about have any special solrconfig options that must be (un)set to get : > Solr to use their servlet container / system default charset. :

Re: resin and UTF-8 in URLs

2007-02-01 Thread Walter Underwood
On 2/1/07 2:53 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > Solr, in my opinion, shouldn't have the string "UTF-8" hardcoded in it > anywhere -- not even in the example config: new users shouldn't need to > know about have any special solrconfig options that must be (un)set to get > Solr to

Re: resin and UTF-8 in URLs

2007-02-01 Thread Chris Hostetter
: I am only suggesting it for GET requests where the params are pulled : off the query string. Apparently, UTF-8 is the *only* ok URL encoding : : http://www.w3.org/International/O-URL-code.html : : It is strange, that resin and tomcat don't observe this unless it is : specified as the default enc

Re: resin and UTF-8 in URLs

2007-02-01 Thread Chris Hostetter
: Let's not make this complicated for situations that we've never : seen in practice. Java is a Unicode language and always has been. : Anyone running a Java system with a Shift-JIS default should already : know the pitfalls, and know them much better than us (and I know a : lot about Shift-JIS).

[jira] Commented: (SOLR-133) change XmlUpdateRequestHandler to use StAX instead of XPP

2007-02-01 Thread J.J. Larrea (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469603 ] J.J. Larrea commented on SOLR-133: -- It would be useful if there first were some consensus as to what the goals are fo

[jira] Commented: (SOLR-61) move XML update parsing out of SolrCore

2007-02-01 Thread Thorsten Scherler (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469599 ] Thorsten Scherler commented on SOLR-61: --- agree. Thanks Hoss. > move XML update parsing out of SolrCore > ---

Re: resin and UTF-8 in URLs

2007-02-01 Thread Ryan McKinley
: If we can do something small that makes the most normal cases work : even if the container is not configured, that seems good. but how do we know the user wants what we consider a "normal cases" to work? ... if every servlet container lets you configure your default charset differently, we hav

Re: resin and UTF-8 in URLs

2007-02-01 Thread Walter Underwood
Let's not make this complicated for situations that we've never seen in practice. Java is a Unicode language and always has been. Anyone running a Java system with a Shift-JIS default should already know the pitfalls, and know them much better than us (and I know a lot about Shift-JIS). The URI sp

Re: resin and UTF-8 in URLs

2007-02-01 Thread Chris Hostetter
: If we can do something small that makes the most normal cases work : even if the container is not configured, that seems good. but how do we know the user wants what we consider a "normal cases" to work? ... if every servlet container lets you configure your default charset differently, we have

Re: resin and UTF-8 in URLs

2007-02-01 Thread Chris Hostetter
: > : > request.setCharacterEncoding ("utf-8") : > ...my reading of the servlet spec was that request.setCharacterEncoding : > only impacted request *body* data, not the URL. : > According to the javadocs for it, using it also means that if the client : > is well behaved and *does* set a charse

Re: fieldtype -> fieldType

2007-02-01 Thread Yonik Seeley
On 2/1/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: A #code4libber made this comment a moment ago: "(nitpick) when you finalize the DTD for schema.xml, could you make fieldtype camel-case like the others. Or make the others lower-case." I nixed the thought of a DTD, but it does look funny

Re: loading many documents by ID

2007-02-01 Thread Ryan McKinley
Not sure... depends on how update handlers will use it... by update handler, you mean UpdateRequestHandler(s)? or UpdateHandler? One thing we might not want to get rid of though is streaming (constructing and adding a document, then discarding it). People are starting to add a lot of documen

fieldtype -> fieldType

2007-02-01 Thread Erik Hatcher
A #code4libber made this comment a moment ago: "(nitpick) when you finalize the DTD for schema.xml, could you make fieldtype camel-case like the others. Or make the others lower-case." I nixed the thought of a DTD, but it does look funny now that I look at it. Perhaps we can modify i

Re: loading many documents by ID

2007-02-01 Thread Yonik Seeley
On 2/1/07, Ryan McKinley <[EMAIL PROTECTED]> wrote: I am (was?) using DISTINCT to say, only add the unique fields. As implemented, it keeps a Collection for each field name. If the 'mode' is 'DISTINCT' the collection is Set, otherwise List Ah, OK... that does seem useful. How would you feel
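The DISTINCT behaviour described in this message (one Collection per field name, a Set in DISTINCT mode and a List otherwise) can be sketched roughly as follows. This is only an illustration under assumptions: the class name FieldValueCollector and its methods are hypothetical and are not taken from the patch or from Solr's AddUpdateCommand.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Hypothetical illustration; not Solr's actual update code.
public class FieldValueCollector {
    private final boolean distinct;
    // One collection of values per field name, in insertion order.
    private final Map<String, Collection<String>> fields = new LinkedHashMap<>();

    public FieldValueCollector(boolean distinct) {
        this.distinct = distinct;
    }

    public void add(String name, String value) {
        Collection<String> values = fields.get(name);
        if (values == null) {
            if (distinct) {
                // DISTINCT mode: a Set silently drops repeated values.
                values = new LinkedHashSet<String>();
            } else {
                // Otherwise: a List keeps every value, duplicates included.
                values = new ArrayList<String>();
            }
            fields.put(name, values);
        }
        values.add(value);
    }

    // Returns the collected values, or null if the field was never added.
    public Collection<String> get(String name) {
        return fields.get(name);
    }
}
```

Adding "a", "a", "b" to one field leaves two values in DISTINCT mode and three otherwise.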

Re: loading many documents by ID

2007-02-01 Thread Ryan McKinley
> REPLACE_DOCUMENT > REPLACE_FIELDS > REPLACE_DISTINCT_FIELDS > ADD_FIELDS > ADD_DISTINCT_FIELDS What does "distinct" mean in this context? I am (was?) using DISTINCT to say, only add the unique fields. As implemented, it keeps a Collection for each field name. If the 'mode' is 'DISTINCT' t

Re: resin and UTF-8 in URLs

2007-02-01 Thread Ryan McKinley
it seems like every servlet container has some way of configuring the default, so we should just rely on that and not add our own default I agree, except that in the world of first time (and even seasoned) web-app/system developers/maintainers, we don't always set things up properly! or even kn

Re: resin and UTF-8 in URLs

2007-02-01 Thread Yonik Seeley
On 2/1/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : > should we add: : > request.setCharacterEncoding ("utf-8") : > to GET requests in StandardRequestParser? : : Perhaps. I wonder if there's any performance impact, and if it fixes : Tomcat's default of latin1 too. see my comments in the r

Re: resin and UTF-8 in URLs

2007-02-01 Thread Chris Hostetter
: > should we add: : > request.setCharacterEncoding ("utf-8") : > to GET requests in StandardRequestParser? : : Perhaps. I wonder if there's any performance impact, and if it fixes : Tomcat's default of latin1 too. see my comments in the related thread about POST... http://www.nabble.com/chars

[jira] Created: (SOLR-133) change XmlUpdateRequestHandler to use StAX instead of XPP

2007-02-01 Thread Hoss Man (JIRA)
change XmlUpdateRequestHandler to use StAX instead of XPP - Key: SOLR-133 URL: https://issues.apache.org/jira/browse/SOLR-133 Project: Solr Issue Type: Improvement Reporter:

[jira] Commented: (SOLR-61) move XML update parsing out of SolrCore

2007-02-01 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469566 ] Hoss Man commented on SOLR-61: -- I opened SOLR-133 to track any work on switching to StAX (that was really an orthogonal s

[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-02-01 Thread Mike Klaas (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469559 ] Mike Klaas commented on SOLR-126: - Committed in r502328. Thanks! Ryan, the last comment of mine was about the units o

Re: loading many documents by ID

2007-02-01 Thread Yonik Seeley
On 2/1/07, Ryan McKinley <[EMAIL PROTECTED]> wrote: I have something working that adds a 'mode' to AddUpdateCommand. The modes I need are: Feel free to suggest replacements for the UpdateCommand classes if things become cumbersome. REPLACE_DOCUMENT REPLACE_FIELDS REPLACE_DISTINCT_FIELDS ADD_

Re: [jira] Created: (SOLR-131) tutorial update: faceting, highlighting, etc

2007-02-01 Thread Mike Klaas
On 1/31/07, Yonik Seeley (JIRA) <[EMAIL PROTECTED]> wrote: tutorial update: faceting, highlighting, etc Key: SOLR-131 URL: https://issues.apache.org/jira/browse/SOLR-131 Project: Solr Issue Type

Re: loading many documents by ID

2007-02-01 Thread Walter Underwood
On 2/1/07 10:55 AM, "Ryan McKinley" <[EMAIL PROTECTED]> wrote: > > Is there a better word than 'update'? It seems there is already enough > confusion between UpdateHandlers, "Update Plugins", > UpdateRequestHandler etc. Try "modify". Solr uses "update" to include "add". wunder

Re: resin and UTF-8 in URLs

2007-02-01 Thread Yonik Seeley
On 2/1/07, Ryan McKinley <[EMAIL PROTECTED]> wrote: should we add: request.setCharacterEncoding ("utf-8") to GET requests in StandardRequestParser? Perhaps. I wonder if there's any performance impact, and if it fixes Tomcat's default of latin1 too. -Yonik

Re: loading many documents by ID

2007-02-01 Thread Ryan McKinley
What I think I'm seeing is two validation options: 1. Set the "updateable" fields explicitly in the schema. * throw an exception at startup if an updateable field is not stored. If somewhere down the road we figure out how to efficiently handle unstored fields, we can remove this error. * whe

Re: Searching with accents

2007-02-01 Thread Yonik Seeley
On 2/1/07, Manuel Albela Miranda <[EMAIL PROTECTED]> wrote: I've never indexed with solr, so the only way to get what i want is to re-index using Solr with the next lines: positionIncrementGap="100"> > > > > The key is to put the accent filter in your fieldtyp

Re: resin and UTF-8 in URLs

2007-02-01 Thread Ryan McKinley
should we add: request.setCharacterEncoding ("utf-8") to GET requests in StandardRequestParser?

Re: resin and UTF-8 in URLs

2007-02-01 Thread Yonik Seeley
FYI, I talked to Caucho, and for params in the query string of a URI they use the charset of the request (which defaults to latin1). It will likely be fixed in the 3.1 line. They indicated that setting the charset before asking for the parameters would also work: request.setCharacterEncoding ("u
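To see why the charset used for query-string parameters matters, here is a small standalone sketch. It does not use the servlet API or anything from Resin or Solr; it only uses java.net.URLDecoder from the JDK to decode the same percent-encoded bytes once as UTF-8 and once as latin1 (ISO-8859-1). The class name QueryCharsetDemo is made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Standalone illustration; not Resin, Tomcat, or Solr code.
public class QueryCharsetDemo {
    // Decode a percent-encoded query value using the given charset name.
    public static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "%C3%A9" is the UTF-8 byte sequence for U+00E9 (e with acute accent).
        String raw = "caf%C3%A9";
        // Decoded as UTF-8, the two bytes form one character.
        System.out.println(decode(raw, "UTF-8").length());
        // Decoded as latin1, each byte becomes its own character.
        System.out.println(decode(raw, "ISO-8859-1").length());
    }
}
```

A container that assumes latin1 for the query string would hand the application the second, longer string, which is the mangling discussed in this thread.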

Re: Searching with accents

2007-02-01 Thread Manuel Albela Miranda
Yonik Seeley wrote: On 2/1/07, Manuel Albela Miranda <[EMAIL PROTECTED]> wrote: Yes, I was considering that, but there is a problem. If I remove the accents in the index, when I get the results of a search they will not have those accents so results will not be good enough. Stored fields aren

RE: Searching with accents

2007-02-01 Thread Binkley, Peter
Within Lucene the solution is to index the accented and unaccented versions of the word at the same position (i.e. without incrementing the position counter). Perhaps this could be added as an option to the ISOLatin1AccentFilter? Or perhaps it's already there? Peter -Original Message- Fr
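The idea of indexing the accented and unaccented forms at the same position can be sketched as below. This is a simplified stand-in, not Lucene's TokenFilter API and not ISOLatin1AccentFilter: the Token class is hypothetical, and accent stripping is done with java.text.Normalizer, which handles combining marks but does not cover every mapping a dedicated Latin-1 filter might.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Simplified illustration; not Lucene's analysis API.
public class AccentFoldingSketch {
    // Hypothetical token: term text plus position increment relative to the previous token.
    public static final class Token {
        public final String text;
        public final int positionIncrement;

        public Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Decompose characters, then drop the combining marks (the accents).
    public static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Emit each word; if it has accents, also emit the folded form with
    // increment 0 so both forms share the same position.
    public static List<Token> expand(List<String> words) {
        List<Token> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Token(word, 1));
            String folded = stripAccents(word);
            if (!folded.equals(word)) {
                out.add(new Token(folded, 0));
            }
        }
        return out;
    }
}
```

Because the folded form does not advance the position, phrase queries should still line up whether the user types the accents or not, at the cost of extra terms in the index.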

Re: Searching with accents

2007-02-01 Thread Thorsten Scherler
On Thu, 2007-02-01 at 12:20 -0500, Yonik Seeley wrote: > On 2/1/07, Manuel Albela Miranda <[EMAIL PROTECTED]> wrote: > > Yes, i was considering that, but there is a problem. If i remove the > > accents into the index, when i get the results of a search they will not > > have those accents so result

Re: Searching with accents

2007-02-01 Thread Manuel Albela Miranda
Thorsten Scherler wrote: On Thu, 2007-02-01 at 16:35 +0100, Manuel Albela Miranda wrote: Thorsten Scherler wrote: On Thu, 2007-02-01 at 12:37 +0100, Manuel Albela Miranda wrote: Hello everybody, Do you know if there is a way to search with and without accents without dupl

Re: Searching with accents

2007-02-01 Thread Yonik Seeley
On 2/1/07, Manuel Albela Miranda <[EMAIL PROTECTED]> wrote: Yes, I was considering that, but there is a problem. If I remove the accents in the index, when I get the results of a search they will not have those accents so results will not be good enough. Stored fields aren't altered, so you wi

Re: Searching with accents

2007-02-01 Thread Thorsten Scherler
On Thu, 2007-02-01 at 16:35 +0100, Manuel Albela Miranda wrote: > Thorsten Scherler wrote: > > On Thu, 2007-02-01 at 12:37 +0100, Manuel Albela Miranda wrote: > > > >> Hello everybody, > >> > >> Do you know if there is a way to search with and without accents without > >> duplicating a field?

Re: svn commit: r501512 - in /lucene/solr/trunk: ./ src/java/org/apache/solr/core/ src/java/org/apache/solr/handler/ src/java/org/apache/solr/request/ src/java/org/apache/solr/search/ src/java/org/apa

2007-02-01 Thread Yonik Seeley
On 2/1/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: On Feb 1, 2007, at 1:21 AM, Yonik Seeley wrote: > Bear in mind that json params like json.nl apply to its subtypes, > ruby and python also. Oh! Now *that* I overlooked. Yes, it probably broke your current facet stuff :-) Interesting bit

[jira] Updated: (SOLR-132) i18n solrb test patch

2007-02-01 Thread Antonio Eggberg (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio Eggberg updated SOLR-132: - Attachment: i18n.patch2 Thanks for your comments and fixes :-). New patch attached. I didn't get '

Re: Searching with accents

2007-02-01 Thread Manuel Albela Miranda
Thorsten Scherler wrote: On Thu, 2007-02-01 at 12:37 +0100, Manuel Albela Miranda wrote: Hello everybody, Do you know if there is a way to search with and without accents without duplicating a field? I have a large index (60 GB) and don't want to have two fields with the same content one

[jira] Commented: (SOLR-61) move XML update parsing out of SolrCore

2007-02-01 Thread Thorsten Scherler (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469450 ] Thorsten Scherler commented on SOLR-61: --- Hi Hoss, I personally would not close this issue, since we have complet

[jira] Commented: (SOLR-132) i18n solrb test patch

2007-02-01 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469428 ] Erik Hatcher commented on SOLR-132: --- Thanks for that patch! Since this is your first patch, I'll comment on it: >

Re: svn commit: r501512 - in /lucene/solr/trunk: ./ src/java/org/apache/solr/core/ src/java/org/apache/solr/handler/ src/java/org/apache/solr/request/ src/java/org/apache/solr/search/ src/java/org/apa

2007-02-01 Thread Erik Hatcher
On Feb 1, 2007, at 1:21 AM, Yonik Seeley wrote: Bear in mind that json params like json.nl apply to its subtypes, ruby and python also. Oh! Now *that* I overlooked. Interesting bit of trivia. Maybe we should change the parameter to "format.nl" instead, or something besides "json"?

Re: Searching with accents

2007-02-01 Thread Thorsten Scherler
On Thu, 2007-02-01 at 12:37 +0100, Manuel Albela Miranda wrote: > Hello everybody, > > Do you know if there is a way to search with and without accents without > duplicating a field? > > I have a large index (60 GB) and don't want to have two fields with the > same content one with accents and

Searching with accents

2007-02-01 Thread Manuel Albela Miranda
Hello everybody, Do you know if there is a way to search with and without accents without duplicating a field? I have a large index (60 GB) and don't want to have two fields with the same content, one with accents and the other without, because this field is the biggest in the index.

[jira] Created: (SOLR-132) i18n solrb test patch

2007-02-01 Thread Antonio Eggberg (JIRA)
i18n solrb test patch - Key: SOLR-132 URL: https://issues.apache.org/jira/browse/SOLR-132 Project: Solr Issue Type: Test Components: clients - ruby - flare Reporter: Antonio Eggberg Prior

[jira] Updated: (SOLR-132) i18n solrb test patch

2007-02-01 Thread Antonio Eggberg (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio Eggberg updated SOLR-132: - Attachment: i18n_test.patch Attached i18n unit and functional test patch > i18n solrb test patch >

Re: charset in POST from browser

2007-02-01 Thread Chris Hostetter
: Other things might use POST for querying though. Perhaps they can all : set a charset while doing so. Well, I can think of a couple of scenarios... 1) POST multipart/* to either /select or the new style URLs ... the browsers should put a content-type with a charset on each part; the ContentS