Re: EnwikiDocMaker

2009-06-04 Thread Grant Ingersoll
On Jun 4, 2009, at 2:49 PM, Grant Ingersoll wrote: Looking more, I think my problem resides around the notion that I'm using EnWikiDocMaker independently of the benchmarking tool. The weird thing is, it used to work, but I don't know when it broke. I suspect I'm not init

Re: EnwikiDocMaker

2009-06-04 Thread Grant Ingersoll
Looking more, I think my problem resides around the notion that I'm using EnWikiDocMaker independently of the benchmarking tool. The weird thing is, it used to work, but I don't know when it broke. I suspect I'm not initializing things right. Anyone else doing that? -

Re: EnwikiDocMaker

2009-06-03 Thread Grant Ingersoll
. So whatever the decision is following your question, I can do it as > part of this issue, since that code will no longer be in EnwikiDocMaker. > > Regarding to your question, I don't know why it should depend on Xerces > (rather than the default Java XML parser I assume?)

Re: EnwikiDocMaker

2009-06-03 Thread Jason Rutherglen
> >> Mike >> >> On Wed, Jun 3, 2009 at 4:26 AM, Shai Erera wrote: >> > Grant, note that I'm changing the DocMakers in LUCENE-1595 including >> this >> > one. So whatever the decision is following your question, I can do it as >> > part of this i

Re: EnwikiDocMaker

2009-06-03 Thread Grant Ingersoll
es from benchmark as part of LUCENE-1595 Shai On Wed, Jun 3, 2009 at 7:09 PM, Grant Ingersoll wrote: +1 Note, Xerces Jar is not in benchmark, AFAICT. It relies on the fact that Java uses it under the hood. I'm having this really weird situation where I'm using EnwikiDocMa

Re: EnwikiDocMaker

2009-06-03 Thread Michael McCandless
act that >> Java uses it under the hood. >> I'm having this really weird situation where I'm using EnwikiDocMaker >> outside the context of the benchmarker and I'm grasping at straws as to why >> it is not working.  It seems to be a classpath issue, but is not Luce

Re: EnwikiDocMaker

2009-06-03 Thread Shai Erera
wrote: > +1 > Note, Xerces Jar is not in benchmark, AFAICT. It relies on the fact that > Java uses it under the hood. > > I'm having this really weird situation where I'm using EnwikiDocMaker > outside the context of the benchmarker and I'm grasping at straws as t

Re: EnwikiDocMaker

2009-06-03 Thread Grant Ingersoll
+1 Note, Xerces Jar is not in benchmark, AFAICT. It relies on the fact that Java uses it under the hood. I'm having this really weird situation where I'm using EnwikiDocMaker outside the context of the benchmarker and I'm grasping at straws as to why it is not working. I

Re: EnwikiDocMaker

2009-06-03 Thread Shai Erera
can do it as > > part of this issue, since that code will no longer be in EnwikiDocMaker. > > > > Regarding to your question, I don't know why it should depend on Xerces > > (rather than the default Java XML parser I assume?) > > > > Shai > > >

Re: EnwikiDocMaker

2009-06-03 Thread Michael McCandless
f this issue, since that code will no longer be in EnwikiDocMaker. > > Regarding to your question, I don't know why it should depend on Xerces > (rather than the default Java XML parser I assume?) > > Shai > > On Wed, Jun 3, 2009 at 2:48 AM, Grant Ingersoll wrote: >>

Re: EnwikiDocMaker

2009-06-03 Thread Shai Erera
Grant, note that I'm changing the DocMakers in LUCENE-1595 including this one. So whatever the decision is following your question, I can do it as part of this issue, since that code will no longer be in EnwikiDocMaker. Regarding your question, I don't know why it should depend

EnwikiDocMaker

2009-06-02 Thread Grant Ingersoll
Is there a reason the EnwikiDocMaker assumes Xerces for the SAX parser? Line 96. Thanks, Grant
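
For readers following the Xerces question, here is a minimal sketch of how a SAX reader can be obtained through the standard JAXP factory instead of naming a parser implementation class. It is not the code from EnwikiDocMaker; the class name DefaultSaxReaderSketch, the method countPages, and the sample XML are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Lucene or contrib/benchmark.
public class DefaultSaxReaderSketch {

    /** Counts <page> elements using whichever SAX parser JAXP resolves. */
    public static int countPages(InputSource source) throws Exception {
        final int[] pages = {0};
        // Ask JAXP for the platform parser rather than hardcoding a Xerces class name.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The factory is not namespace-aware by default, so match on qName.
                if ("page".equals(qName)) {
                    pages[0]++;
                }
            }
        });
        reader.parse(source);
        return pages[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input standing in for a Wikipedia dump.
        String xml = "<mediawiki><page><title>A</title></page>"
                   + "<page><title>B</title></page></mediawiki>";
        System.out.println(countPages(new InputSource(new StringReader(xml))));
    }
}
```

With this approach the parser is whatever the JAXP lookup finds on the classpath or in the JDK, so no Xerces jar has to be shipped alongside the code.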

Re: Benchmark: EnwikiDocMaker does not use fileIn (BufferedReader)

2009-04-10 Thread Shai Erera
Thanks Uwe. Then I think we should at least wrap the IS with a Buffered IS in EnwikiDocMaker (that's what I wanted to achieve in the first place, reusing LDM's BufferedReader)? On Fri, Apr 10, 2009 at 10:22 AM, Uwe Schindler wrote: > Hi Shai, > > > > with XML parsers y
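
The idea of wrapping the input stream can be sketched as follows. This is only an illustration under stated assumptions, not the LUCENE-1591 patch: the class name BufferedXmlInputSketch, the method readTitles, and the 64 KB buffer size are all hypothetical. Handing the parser a byte stream (rather than a Reader) lets it take the character encoding from the XML declaration.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Lucene or contrib/benchmark.
public class BufferedXmlInputSketch {

    /** Parses the stream through a BufferedInputStream and collects <title> text. */
    public static List<String> readTitles(InputStream raw) throws Exception {
        final List<String> titles = new ArrayList<>();
        // Buffer the raw stream; 64 KB is an arbitrary size chosen for the example.
        try (InputStream buffered = new BufferedInputStream(raw, 1 << 16)) {
            DefaultHandler handler = new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();
                private boolean inTitle;

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if ("title".equals(qName)) {
                        inTitle = true;
                        text.setLength(0);
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // characters() may be called several times per element.
                    if (inTitle) {
                        text.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if ("title".equals(qName)) {
                        inTitle = false;
                        titles.add(text.toString());
                    }
                }
            };
            // InputSource built from bytes: the parser decodes them itself.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(buffered), handler);
        }
        return titles;
    }
}
```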

RE: Benchmark: EnwikiDocMaker does not use fileIn (BufferedReader)

2009-04-10 Thread Uwe Schindler
e eMail: u...@thetaphi.de _ From: Shai Erera [mailto:ser...@gmail.com] Sent: Friday, April 10, 2009 8:47 AM To: java-dev@lucene.apache.org Subject: Benchmark: EnwikiDocMaker does not use fileIn (BufferedReader) I started working on the patch for 1591, and noticed EnwikiDocMake

Benchmark: EnwikiDocMaker does not use fileIn (BufferedReader)

2009-04-09 Thread Shai Erera
I started working on the patch for 1591, and noticed EnwikiDocMaker uses the FileInputStream instance from LineDocMaker and not the BufferedReader. I don't see any reason for this, as InputSource accepts a Reader. I can change it as part of 1591, unless you think I'm missing something.

[jira] Commented: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-10 Thread Michael McCandless (JIRA)
x to NOT hang when the XML parsing thread hits an exception. > Intermittent thread safety issue with EnwikiDocMaker > > > Key: LUCENE-1117 > URL: https://issues.apache.org/jira/browse/LUCENE-1117 >

Re: EnwikiDocMaker ?

2008-01-09 Thread Grant Ingersoll
ot look at the code of EnwikiDocMaker, but I hope this helps nonetheless. Regards, Paul Elschot On Wednesday 09 January 2008 14:55:05 Grant Ingersoll wrote: As one can probably guess, I have been looking at the EnwikiDocMaker a bit and using it outside of the benchmark suite, as related to

Re: EnwikiDocMaker ?

2008-01-09 Thread Paul Elschot
en be read by a configurable number of threads, probably 2-6. With multiple disks, one could feed this queue using multiple threads, one per independent disk. For even more speed, one could also try and put the index on a different disk. I did not look at the code of EnwikiDocMaker, but I hope this
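
A queue read by a configurable number of threads, as described in this message, could look roughly like the sketch below. It is an illustration only and assumes nothing about how EnwikiDocMaker works: the class DocQueueSketch, the method process, the queue capacity, and the sentinel value are hypothetical, and the counter stands in for real indexing work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; not part of Lucene or contrib/benchmark.
public class DocQueueSketch {

    // Sentinel telling a consumer to stop; compared by identity, never by content.
    private static final String POISON = new String("<<end>>");

    /** Feeds docs through a bounded queue to the given number of consumer threads. */
    public static int process(List<String> docs, int consumers)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < consumers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String doc = queue.take();
                        if (doc == POISON) {
                            return;
                        }
                        processed.incrementAndGet(); // stand-in for indexing the doc
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        // Single producer here; several producers could call put() concurrently.
        for (String doc : docs) {
            queue.put(doc);
        }
        // One sentinel per consumer so every worker terminates.
        for (int i = 0; i < consumers; i++) {
            queue.put(POISON);
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return processed.get();
    }
}
```

The bounded queue makes the producer block when the consumers fall behind, which keeps memory use flat regardless of input size.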

Re: EnwikiDocMaker ?

2008-01-09 Thread Michael McCandless
As one can probably guess, I have been looking at the EnwikiDocMaker a bit and using it outside of the benchmark suite, as related to the new contrib/wikipedia stuff. Just wanted to make sure I have a good basic understanding of what it is doing, because I am looking for ways to speed it

EnwikiDocMaker ?

2008-01-09 Thread Grant Ingersoll
As one can probably guess, I have been looking at the EnwikiDocMaker a bit and using it outside of the benchmark suite, as related to the new contrib/wikipedia stuff. Just wanted to make sure I have a good basic understanding of what it is doing, because I am looking for ways to speed it

[jira] Updated: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-08 Thread Michael McCandless (JIRA)
issue (the exception in the o.p. of this issue also would just hang). OK I worked out a patch to fix this: attached excHang.patch. I'll in a day or two! > Intermittent thread safety issue with EnwikiDocMaker > > >
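
One general way to keep a consumer from hanging when a background parsing thread fails is to hand the failure over through the same channel as the data. The sketch below shows that pattern only; it is not excHang.patch, whose contents are not shown here, and the names ParserFailureSketch, Source, Sink, and consumeAll are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration class; not part of Lucene or contrib/benchmark.
public class ParserFailureSketch {

    /** Queue item: a document, or (doc == null) end of input with an optional error. */
    private static final class Item {
        final String doc;
        final Throwable error;

        Item(String doc, Throwable error) {
            this.doc = doc;
            this.error = error;
        }
    }

    /** Stand-in for the parsing code that pushes documents to a sink. */
    public interface Source {
        void produce(Sink sink) throws Exception;
    }

    public interface Sink {
        void accept(String doc) throws InterruptedException;
    }

    /** Consumes all documents; rethrows a parser-thread failure instead of blocking. */
    public static int consumeAll(Source source) throws Exception {
        BlockingQueue<Item> queue = new LinkedBlockingQueue<>();

        Thread parser = new Thread(() -> {
            Throwable failure = null;
            try {
                source.produce(doc -> queue.put(new Item(doc, null)));
            } catch (Throwable t) {
                failure = t;
            } finally {
                // Always enqueue a terminal item, on success and on failure alike.
                queue.add(new Item(null, failure));
            }
        });
        parser.setDaemon(true);
        parser.start();

        int count = 0;
        while (true) {
            Item item = queue.take();
            if (item.doc != null) {
                count++;
                continue;
            }
            if (item.error != null) {
                throw new RuntimeException("parser thread failed", item.error);
            }
            return count;
        }
    }
}
```

Because the terminal item is enqueued in a finally block, the consumer's take() always returns eventually, whether the parser finished or threw.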

[jira] Commented: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-08 Thread Grant Ingersoll (JIRA)
that the process doesn't die if there is an exception thrown (as in the one above) b/c I think the thread doesn't stop. > Intermittent thread safety issue with EnwikiDocMaker > > > Key: LUCENE-11

[jira] Commented: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-07 Thread Grant Ingersoll (JIRA)
Mike > Intermittent thread safety issue with EnwikiDocMaker > > > Key: LUCENE-1117 > URL: https://issues.apache.org/jira/browse/LUCENE-1117 > Project: Lucene - Java > Issue Type:

[jira] Commented: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-07 Thread Michael McCandless (JIRA)
call docMaker.resetInputs()? The contrib/benchmark framework calls that, on creating a docMaker. That method opens the line file. > Intermittent thread safety issue with EnwikiDocMaker > > > Key: LUCENE-1117 >

[jira] Commented: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-07 Thread Grant Ingersoll (JIRA)
safety issue with EnwikiDocMaker > > > Key: LUCENE-1117 > URL: https://issues.apache.org/jira/browse/LUCENE-1117 > Project: Lucene - Java > Issue Type: Bug > Comp

[jira] Commented: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-07 Thread Grant Ingersoll (JIRA)
oing: EnwikiDocMaker docMaker = new EnwikiDocMaker(); Properties properties = new Properties(); //fileName = config.get("docs.file", null); String filePath = wikipediaXML.getAbsolutePath(); properties.setProperty("docs.file", filePath); properties.setProperty("

[jira] Resolved: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-04 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1117. Resolution: Fixed > Intermittent thread safety issue with EnwikiDocMa

[jira] Updated: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-03 Thread Michael McCandless (JIRA)
a day or two. > Intermittent thread safety issue with EnwikiDocMaker > > > Key: LUCENE-1117 > URL: https://issues.apache.org/jira/browse/LUCENE-1117 > Project: Lucene - Java >

[jira] Created: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

2008-01-03 Thread Michael McCandless (JIRA)
Intermittent thread safety issue with EnwikiDocMaker Key: LUCENE-1117 URL: https://issues.apache.org/jira/browse/LUCENE-1117 Project: Lucene - Java Issue Type: Bug Components

[jira] Resolved: (LUCENE-1102) EnwikiDocMaker id field

2007-12-31 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-1102. - Resolution: Fixed Lucene Fields: (was: [New]) Committed > EnwikiDocMaker

[jira] Updated: (LUCENE-1102) EnwikiDocMaker id field

2007-12-28 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1102: Attachment: LUCENE-1102.patch Adds docid Field to the index for EnwikiDocMaker

EnwikiDocMaker

2007-12-28 Thread Grant Ingersoll
I am using EnwikiDocMaker with the following algorithm outlined at the bottom (against trunk). After the first round is complete, I am getting java.lang.RuntimeException: java.io.IOException: Bad file descriptor at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run

[jira] Created: (LUCENE-1102) EnwikiDocMaker id field

2007-12-28 Thread Grant Ingersoll (JIRA)
EnwikiDocMaker id field --- Key: LUCENE-1102 URL: https://issues.apache.org/jira/browse/LUCENE-1102 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Grant