[jira] Updated: (LUCENE-849) contrib/benchmark: configurable HTML Parser, external classes to path, exhaustive doc maker

2007-03-25 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-849: --- Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Summary: contrib/benchm

[jira] Updated: (LUCENE-849) Configurable HTML Parser, external classes to path, exhaustive doc maker

2007-03-25 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-849: --- Attachment: 849-bench-parse-exhaust.patch > Configurable HTML Parser, external classes to path, exhau

[jira] Updated: (LUCENE-849) Configurable HTML Parser, external classes to path, exhaustive doc maker

2007-03-25 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-849: --- Description: "doc making" enhancements: 1. Allow configurable html parser, with a new html.parser

[jira] Created: (LUCENE-849) Configurable HTML Parser, external classes to path, exhaustive doc maker

2007-03-25 Thread Doron Cohen (JIRA)
Configurable HTML Parser, external classes to path, exhaustive doc maker Key: LUCENE-849 URL: https://issues.apache.org/jira/browse/LUCENE-849 Project: Lucene - Java Is

Re: whitespace

2007-03-25 Thread DM Smith
Just my 2 cents. I think it is good to use tooling to establish a whitespace standard for new files. For example, Eclipse has extensive formatting and one of the declared styles is Sun's. Checkstyle is also good at creating a report of deviations from a defined standard. And it can be readil

Re: whitespace

2007-03-25 Thread Chris Hostetter
: The Java coding standards do say : "Spaces after keywords but no spaces either before or after : parentheses in method calls" : : "if (a)" and "foo(a)" that's fairly readable in general, but I have found that when dealing with complex conditionals, a little extra white space can make some thing

[jira] Updated: (LUCENE-848) Add support for Wikipedia English as a corpus in the benchmarker stuff

2007-03-25 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wettin updated LUCENE-848: --- Attachment: WikipediaHarvester.java There is some code in LUCENE-826. Here is a newer version. > Add

RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Michael McCandless
"Steven Parkes" <[EMAIL PROTECTED]> wrote: > I've been wondering about taking minMergeDocs out of LMP > (LogarithmicMergePolicy): if IW is doing maxBufferedDocs, can we get by > with > ceil(log(docs)) > rather than > ceil(log(ceil(docs/minMergeDocs))) > (That's not exactly right, but it
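
To make the two expressions in the quoted message concrete, here is a rough Java sketch. It assumes the logarithm is taken in base mergeFactor, which the snippet does not state, and the class and method names are hypothetical rather than anything in Lucene or the LUCENE-847 patch. The quoted message itself says the formulas are "not exactly right", so this only illustrates how the two differ.

```java
public class MergeLevelSketch {
    // Hypothetical helper: logarithm of x in the given base.
    static double logBase(double x, double base) {
        return Math.log(x) / Math.log(base);
    }

    // ceil(log(docs)): level computed from the raw document count.
    // Assumption: the log base is mergeFactor.
    static int levelFromDocs(int docs, int mergeFactor) {
        return (int) Math.ceil(logBase(docs, mergeFactor));
    }

    // ceil(log(ceil(docs / minMergeDocs))): level computed after
    // first grouping documents into chunks of minMergeDocs.
    static int levelFromChunks(int docs, int minMergeDocs, int mergeFactor) {
        double chunks = Math.ceil((double) docs / minMergeDocs);
        return (int) Math.ceil(logBase(chunks, mergeFactor));
    }

    public static void main(String[] args) {
        int docs = 5000;
        int minMergeDocs = 10;
        int mergeFactor = 10;
        System.out.println("ceil(log(docs)) = "
                + levelFromDocs(docs, mergeFactor));
        System.out.println("ceil(log(ceil(docs/minMergeDocs))) = "
                + levelFromChunks(docs, minMergeDocs, mergeFactor));
    }
}
```

With these example values the second form comes out one level lower, because dividing by minMergeDocs first removes one factor of the base.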

RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Steven Parkes
Yes, I'll separate out issues related to the basic refactor before submitting a candidate patch. I actually thought it might be helpful to keep it in the rough version to see context. But I can do that at any time ... With the factored merge policy, it's easy enough to create a merge policy on siz

RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Michael McCandless
"Steven Parkes" <[EMAIL PROTECTED]> wrote: > Well, with all due respect, I don't find whitespace malignant ... Oh, sorry. I call it "cancerous" because it has a tendency to spread uncontrollably throughout the source code :) > That said, I don't get into this anymore. I make all the necessary >

RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Michael McCandless
"Steven Parkes" <[EMAIL PROTECTED]> wrote: > Yes, I'll separate out issues related to the basic refactor before > submitting a candidate patch. I actually thought it might be helpful to > keep it in the rough version to see context. But I can do that at any > time ... OK, that makes sense to leav

RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Steven Parkes
Well, with all due respect, I don't find whitespace malignant ... That said, I don't get into this anymore. I make all the necessary whitespace changes at the end. When making a candidate patch, I go through it line by line looking for whitespace/style changes that I've inadvertently added and tak

RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Steven Parkes
* I think maxBufferedDocs should not be exposed in any *MergePolicy classes or interfaces? I'm planning on deprecating this param with LUCENE-843 when we switch by default to "buffering by RAM usage" and it really relates to "how/when should writer flush its RAM buffer". * I

[jira] Updated: (LUCENE-848) Add support for Wikipedia English as a corpus in the benchmarker stuff

2007-03-25 Thread Steven Parkes (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Parkes updated LUCENE-848: - Component/s: contrib/benchmark Priority: Minor (was: Major) Fix Version/s: 2.2 So

[jira] Created: (LUCENE-848) Add support for Wikipedia English as a corpus in the benchmarker stuff

2007-03-25 Thread Steven Parkes (JIRA)
Add support for Wikipedia English as a corpus in the benchmarker stuff - Key: LUCENE-848 URL: https://issues.apache.org/jira/browse/LUCENE-848 Project: Lucene - Java

whitespace

2007-03-25 Thread Yonik Seeley
On 3/25/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: My first comment, which I fear will be the most controversial feedback here :), is a whitespace style question: I'm not really a fan of "cancerous whitespace" where every ( [ etc has its own whitespace around it. I generally prefer

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take2.patch New rev of the patch: * Fixed at least one data c

[jira] Resolved: (LUCENE-846) IOException can cause loss of data due to premature segment deletion

2007-03-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-846. --- Resolution: Fixed Fix Version/s: 2.2 > IOException can cause loss of data due

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483931 ] Michael McCandless commented on LUCENE-847: --- OK some specific comments, only on the refactoring (ie I haven

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483930 ] Michael McCandless commented on LUCENE-847: --- My first comment, which I fear will be the most controversial

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483929 ] Michael McCandless commented on LUCENE-847: --- Steven, I looked through the patch quickly. It looks great!

[jira] Commented: (LUCENE-846) IOException can cause loss of data due to premature segment deletion

2007-03-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483919 ] Michael McCandless commented on LUCENE-846: --- Excellent I can repro the problem as well, thanks Steven! OK