[jira] Updated: (LUCENE-1216) CharDelimiterTokenizer

2008-03-12 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1216: -- Attachment: TestCharDelimiterTokenizer.java Add test file (TestCharDelimiterTokenizer.java) fo

[jira] Updated: (LUCENE-1216) CharDelimiterTokenizer

2008-03-12 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1216: -- Attachment: CharDelimiterTokenizer.java Update CharDelimiterTokenizer.java 1. replaced TAB ->

Build failed in Hudson: Lucene-trunk #399

2008-03-12 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/399/changes Changes: [mikemccand] LUCENE-1223: fix lazy field loading to not allow string field to be loaded as binary, nor vice versa [mikemccand] LUCENE-1214: preserve original exception in SegmentInfos write & commit [mikemccand] LU

[jira] Resolved: (LUCENE-1223) lazy fields don't enforce binary vs string value

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1223. Resolution: Fixed > lazy fields don't enforce binary vs string value > ---

Re: [jira] Updated: (LUCENE-1226) IndexWriter.addIndexes(IndexReader[]) fails to create compound files

2008-03-12 Thread Michael McCandless
Woops! Thanks Michael. Mike On Mar 12, 2008, at 5:46 PM, Michael Busch (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1226: -- At

Re: [jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread eks dev
Thanks for diff Hoss! I was staring 10min at it but was not able to see any difference. Well, that is the price to pay when you work with us, non-native English speakers :) - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Wednesday, 12 Ma

[jira] Created: (LUCENE-1226) IndexWriter.addIndexes(IndexReader[]) fails to create compound files

2008-03-12 Thread Michael Busch (JIRA)
IndexWriter.addIndexes(IndexReader[]) fails to create compound files Key: LUCENE-1226 URL: https://issues.apache.org/jira/browse/LUCENE-1226 Project: Lucene - Java Issue Ty

[jira] Updated: (LUCENE-1226) IndexWriter.addIndexes(IndexReader[]) fails to create compound files

2008-03-12 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1226: -- Attachment: lucene-1226.patch > IndexWriter.addIndexes(IndexReader[]) fails to create compound

Re: Going to Java 5. Was: Re: A bit of planning

2008-03-12 Thread Otis Gospodnetic
I agree with Grant and would prefer to see 3.0 to seeing 4.0 (down with inflation!) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Grant Ingersoll <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Monday, March 10, 2008 4:05:54 PM Sub

[jira] Issue Comment Edited: (LUCENE-1216) CharDelimiterTokenizer

2008-03-12 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578023#action_12578023 ] otis edited comment on LUCENE-1216 at 3/12/08 2:37 PM: ---

[jira] Commented: (LUCENE-1216) CharDelimiterTokenizer

2008-03-12 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578023#action_12578023 ] Otis Gospodnetic commented on LUCENE-1216: -- This looks useful. Could you please w

Re: an API for synonym in Lucene-core

2008-03-12 Thread Otis Gospodnetic
Grant, I think Mathieu is hinting at his JIRA contribution (I looked at it briefly the other day, but haven't had the chance to really understand it). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mathieu Lecarme <[EMAIL PROTECTED]> To: java-

RE: Looking to Index Various Document Types.

2008-03-12 Thread Steven A Rowe
'sup, DD: You should have posted your question, which is about *using* Lucene, to the java-user mailing list; the java-dev mailing list is instead intended for discussion of *development of* Lucene. Here's a Lius tutorial, in both French and English: http://www.doculibre.com/lius/ And here's

Re: [jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread Chris Hostetter
: >>fix typo that's been bugging me : : excuse my ignorance, but i do not understand this entry. Typo we need to fix, which one? If you view the change history for the issue, you'll see that comment was attached to change to the summary and description of the bug where "Filed" was fixed to b

Re: [jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread eks dev
>>fix typo that's been bugging me excuse my ignorance, but i do not understand this entry. Typo we need to fix, which one? __ Sent from Yahoo! Mail. The World's Favourite Email http://uk.docs.yahoo.com/nowyoucan.html --

Re: an API for synonym in Lucene-core

2008-03-12 Thread Grant Ingersoll
On Mar 12, 2008, at 5:47 AM, Mathieu Lecarme wrote: Why doesn't Lucene have a clean synonym API? Because no one has donated one WordNet contrib is not an answer, it provides an Interface for its own needs, and most of the world doesn't speak English. Compass provides a tool, just like

Looking to Index Various Document Types.

2008-03-12 Thread DURGA DEEP
Hi folks, I was looking at the Lucene FAQ and I found this very interesting. How can I index OpenOffice.org files? These files (.sxw, .sxc, etc) are ZIP archives that contain XML files. Uncompress the file using Java's ZIP support, then parse meta.xml to get title etc. and content.xml to get the
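The FAQ recipe quoted in this message (open the archive with Java's ZIP support, then parse the XML entries) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the FAQ or from Lucene: the class name OpenOfficeTextExtractor and the method extractEntryText are hypothetical, only JDK classes are used, and all text nodes of the chosen entry are simply concatenated. Real .sxw files declare an external DTD, so the sketch substitutes an empty DTD instead of trying to load it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of Lucene or of the FAQ.
public class OpenOfficeTextExtractor {

    /**
     * Scans the ZIP archive for the entry called entryName (for example
     * "meta.xml" or "content.xml"), parses it as XML and returns the
     * concatenated text of the whole document, or null if no such entry exists.
     */
    public static String extractEntryText(InputStream archive, String entryName)
            throws Exception {
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue;
                }
                // Buffer the entry first so the XML parser cannot close the ZIP stream.
                byte[] xml = readAll(zip);

                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                DocumentBuilder builder = factory.newDocumentBuilder();
                // Resolve every external entity (such as a referenced DTD) to an
                // empty document, so parsing does not depend on files we do not have.
                builder.setEntityResolver(
                        (publicId, systemId) -> new InputSource(new StringReader("")));

                Document doc = builder.parse(new ByteArrayInputStream(xml));
                return doc.getDocumentElement().getTextContent().trim();
            }
        }
        return null;
    }

    /** Reads the current stream (here: the current ZIP entry) to its end. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

The returned string could then be handed to whatever indexing code is already in place, for instance as the value of a tokenized field. Pulling out a single element such as the title would need an extra DOM lookup on the parsed meta.xml, which this sketch leaves out.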

[jira] Resolved: (LUCENE-1214) Possible hidden exception on SegmentInfos commit

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1214. Resolution: Fixed > Possible hidden exception on SegmentInfos commit > ---

[jira] Resolved: (LUCENE-1212) Basic refactoring of DocumentsWriter

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1212. Resolution: Fixed > Basic refactoring of DocumentsWriter > ---

unique-id to doc-num

2008-03-12 Thread Jae Kwon
I'd like to have an up-to-date map from unique-ids to lucene internal doc-nums. This will allow me to create a custom filter based on the result of an external process (like mysql). There doesn't seem to be a straightforward efficient way AFAIK. I'll be looking for a way but any help or guidance wo

[jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cutting updated LUCENE-1217: - Description: Field class can hold three types of values, See: AbstractField.java protected O

TokenFilter question

2008-03-12 Thread Hiroaki Kawai
I was trying to apply both org.apache.solr.analysis.WordDelimiterFilter and org.apache.lucene.analysis.ngram.NGramTokenFilter. Can I achieve this with Lucene's TokenStream? While thinking about TokenFilters, I came to the idea that the TokenStream should have a structured representation. It is m

[jira] Updated: (LUCENE-1225) NGramTokenizer creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1225: -- Attachment: NGramTokenizer.patch This patch will fix the issue. > NGramTokenizer creates bad

[jira] Created: (LUCENE-1225) NGramTokenizer creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)
NGramTokenizer creates bad TokenStream -- Key: LUCENE-1225 URL: https://issues.apache.org/jira/browse/LUCENE-1225 Project: Lucene - Java Issue Type: Bug Components: contrib/* Reporter

[jira] Resolved: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1221. Resolution: Invalid OK thanks Marcel. > DocumentsWriter truncates term text at \u

[jira] Resolved: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1217. Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Availa

[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-12 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1219: Attachment: LUCENE-1219.patch latest patch updated to the trunk (Lucene-1217 is there. Michael you did not

an API for synonym in Lucene-core

2008-03-12 Thread Mathieu Lecarme
Why doesn't Lucene have a clean synonym API? WordNet contrib is not an answer, it provides an Interface for its own needs, and most of the world doesn't speak English. Compass provides a tool, just like Solr. Lucene is the framework for applications like Solr, Nutch or Compass, why don't backport l

[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-12 Thread Marcel Reutegger (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1256#action_1256 ] Marcel Reutegger commented on LUCENE-1221: -- Indeed there are some characters that

[jira] Created: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)
NGramTokenFilter creates bad TokenStream Key: LUCENE-1224 URL: https://issues.apache.org/jira/browse/LUCENE-1224 Project: Lucene - Java Issue Type: Bug Components: contrib/* Repo

[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1224: -- Attachment: NGramTokenFilter.patch > NGramTokenFilter creates bad TokenStream > --