On Jun 4, 2009, at 2:49 PM, Grant Ingersoll wrote:
Looking more, I think my problem resides around the notion that I'm
using EnWikiDocMaker independently of the benchmarking tool. The
weird thing is, it used to work, but I don't know when it broke. I
suspect I'm not initializing things right.
Anyone else doing that?
-
. So whatever the decision is following your question, I can do
it as
> part of this issue, since that code will no longer be in
EnwikiDocMaker.
>
> Regarding to your question, I don't know why it should depend on
Xerces
> (rather than the default Java XML parser I assume?)
>
>> Mike
>>
>> On Wed, Jun 3, 2009 at 4:26 AM, Shai Erera wrote:
>> > Grant, note that I'm changing the DocMakers in LUCENE-1595 including
>> this
>> > one. So whatever the decision is following your question, I can do it as
>> > part of this i
es from benchmark as part of LUCENE-1595
Shai
On Wed, Jun 3, 2009 at 7:09 PM, Grant Ingersoll
wrote:
+1
Note, Xerces Jar is not in benchmark, AFAICT. It relies on the fact
that Java uses it under the hood.
I'm having this really weird situation where I'm using
EnwikiDocMa
act that
>> Java uses it under the hood.
>> I'm having this really weird situation where I'm using EnwikiDocMaker
>> outside the context of the benchmarker and I'm grasping at straws as to why
>> it is not working. It seems to be a classpath issue, but is not Luce
wrote:
> +1
> Note, Xerces Jar is not in benchmark, AFAICT. It relies on the fact that
> Java uses it under the hood.
>
> I'm having this really weird situation where I'm using EnwikiDocMaker
> outside the context of the benchmarker and I'm grasping at straws as t
+1
Note, Xerces Jar is not in benchmark, AFAICT. It relies on the fact
that Java uses it under the hood.
I'm having this really weird situation where I'm using EnwikiDocMaker
outside the context of the benchmarker and I'm grasping at straws as
to why it is not working. I
can do it as
> > part of this issue, since that code will no longer be in EnwikiDocMaker.
> >
> > Regarding to your question, I don't know why it should depend on Xerces
> > (rather than the default Java XML parser I assume?)
> >
> > Shai
> >
>
f this issue, since that code will no longer be in EnwikiDocMaker.
>
> Regarding to your question, I don't know why it should depend on Xerces
> (rather than the default Java XML parser I assume?)
>
> Shai
>
> On Wed, Jun 3, 2009 at 2:48 AM, Grant Ingersoll wrote:
>>
Grant, note that I'm changing the DocMakers in LUCENE-1595 including this
one. So whatever the decision is following your question, I can do it as
part of this issue, since that code will no longer be in EnwikiDocMaker.
Regarding to your question, I don't know why it should depend
Is there a reason the EnwikiDocMaker assumes Xerces for the SAX
parser? Line 96.
Thanks,
Grant
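
For illustration, here is a minimal sketch of obtaining a SAX parser through JAXP, so that the code works with whichever parser the JVM ships rather than naming a Xerces class. This is not the code from EnwikiDocMaker; the class name DefaultSaxParserSketch and the helper countElements are hypothetical, and the tiny XML string only stands in for a Wikipedia dump.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of Lucene: uses the JVM's default SAX parser.
final class DefaultSaxParserSketch {

    /** Asks JAXP for whichever SAX parser the running JVM provides. */
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no usable SAX parser", e);
        }
    }

    /** Counts start tags with the given local name in an XML string. */
    static int countElements(String xml, final String elementName) {
        final int[] count = {0};
        XMLReader reader = newReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (elementName.equals(localName)) {
                    count[0]++;
                }
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<mediawiki><page/><page/></mediawiki>";
        System.out.println(countElements(xml, "page"));
    }
}
```

Going through SAXParserFactory keeps the parser choice a deployment concern: a Xerces jar on the classpath can still be picked up by the factory lookup, but nothing breaks if it is absent.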
Thanks Uwe. Then I think we should at least wrap the IS with a Buffered IS
in EnwikiDocMaker (that's what I wanted to achieve in the first place,
reusing LDM's BufferedReader)?
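
A minimal sketch of the wrapping suggested above, assuming the doc maker has a raw InputStream in hand. The class and method names and the 64 KB buffer size are hypothetical choices for illustration, not values taken from EnwikiDocMaker.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

// Hypothetical sketch: buffer the byte stream before handing it to SAX.
final class BufferedSourceSketch {

    /** Wraps the raw stream so the parser reads from a 64 KB buffer. */
    static InputSource bufferedSource(InputStream raw) {
        return new InputSource(new BufferedInputStream(raw, 64 * 1024));
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(
                "<page/>".getBytes(StandardCharsets.UTF_8));
        InputSource source = bufferedSource(raw);
        System.out.println(source.getByteStream() instanceof BufferedInputStream);
    }
}
```

Because the InputSource still carries a byte stream, the parser remains free to work out the character encoding from the document itself.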
On Fri, Apr 10, 2009 at 10:22 AM, Uwe Schindler wrote:
> Hi Shai,
>
>
>
> with XML parsers y
e
eMail: u...@thetaphi.de
_
From: Shai Erera [mailto:ser...@gmail.com]
Sent: Friday, April 10, 2009 8:47 AM
To: java-dev@lucene.apache.org
Subject: Benchmark: EnwikiDocMaker does not use fileIn (BufferedReader)
I started working on the patch for 1591, and noticed EnwikiDocMaker uses the
FileInputStream instance from LineDocMaker and not the BufferedReader. I
don't see any reason for this, as InputSource accepts a Reader. I can change
it as part of 1591, unless you think I'm missing something.
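
To make the Reader option concrete, here is a minimal sketch of building an InputSource from a BufferedReader. The class and method names are hypothetical. One caveat worth keeping in mind: when an InputSource carries a character stream, the SAX parser reads it directly and disregards any encoding declaration in the XML, so the caller has to pick the charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

// Hypothetical sketch: hand SAX a Reader instead of a byte stream.
final class ReaderSourceSketch {

    /** Decodes the bytes with the given charset and wraps the result. */
    static InputSource readerSource(InputStream raw, Charset charset) {
        return new InputSource(
                new BufferedReader(new InputStreamReader(raw, charset)));
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(
                "<page/>".getBytes(StandardCharsets.UTF_8));
        InputSource source = readerSource(raw, StandardCharsets.UTF_8);
        // A character stream is set; the byte stream stays unset.
        System.out.println(source.getCharacterStream() != null);
        System.out.println(source.getByteStream() == null);
    }
}
```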
x to NOT hang when the XML parsing thread hits an
exception.
> Intermittent thread safety issue with EnwikiDocMaker
>
>
> Key: LUCENE-1117
> URL: https://issues.apache.org/jira/browse/LUCENE-1117
>
ot look at the code of EnwikiDocMaker, but I hope this helps
nonetheless.
Regards,
Paul Elschot
On Wednesday 09 January 2008 14:55:05 Grant Ingersoll wrote:
As one can probably guess, I have been looking at the
EnwikiDocMaker a
bit and using it outside of the benchmark suite, as related to
en be read by a configurable number of threads,
probably 2-6.
With multiple disks, one could feed this queue using multiple threads,
one per independent disk.
For even more speed, one could also try and put the index on a different
disk.
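
The queue idea above could be sketched roughly as follows: one producer puts documents on a bounded queue and a configurable number of consumer threads take them off. Everything here (the class DocQueueSketch, the consumeAll method, the queue capacity of 64, the sentinel value) is a hypothetical illustration and not code from the benchmark package; a real consumer would index the document instead of counting it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a bounded queue drained by N consumer threads.
final class DocQueueSketch {

    // Unique sentinel object; compared by identity, never by content.
    private static final String POISON = new String("end-of-input");

    /** Feeds docs through a queue and returns how many were consumed. */
    static int consumeAll(List<String> docs, int consumers) {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        final AtomicInteger handled = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        try {
            for (int i = 0; i < consumers; i++) {
                pool.execute(() -> {
                    try {
                        // Each consumer stops when it takes one sentinel.
                        for (String doc = queue.take(); doc != POISON; doc = queue.take()) {
                            handled.incrementAndGet(); // stand-in for indexing
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            for (String doc : docs) {
                queue.put(doc); // blocks while the queue is full
            }
            for (int i = 0; i < consumers; i++) {
                queue.put(POISON); // one sentinel per consumer
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(Arrays.asList("a", "b", "c"), 2));
    }
}
```

The bounded queue gives back-pressure: if the consumers fall behind, the producer blocks in put() instead of reading the whole dump into memory.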
I did not look at the code of EnwikiDocMaker, but I hope this
As one can probably guess, I have been looking at the EnwikiDocMaker a
bit and using it outside of the benchmark suite, as related to the new
contrib/wikipedia stuff. Just wanted to make sure I have a good
basic understanding of what it is doing, because I am looking for ways
to speed it
issue (the exception in
the o.p. of this issue also would just hang).
OK I worked out a patch to fix this: attached excHang.patch. I'll
in a day or two!
> Intermittent thread safety issue with EnwikiDocMaker
>
>
>
that the process doesn't die if there is an
exception thrown (as in the one above) b/c I think the thread doesn't stop.
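
One general way to avoid this kind of hang is to have the parsing thread hand its exception to the consuming thread instead of dying silently. The sketch below shows that pattern only; it makes no claim about what the attached patch does, and the names ParserHandoffSketch, readAll and failAt are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a parser thread that reports failures to its consumer.
final class ParserHandoffSketch {

    private static final Object END = new Object();

    /**
     * Runs a fake "parser" thread over pages and collects its output.
     * If failAt is a valid index, the parser throws at that page.
     */
    static List<String> readAll(final List<String> pages, final int failAt) {
        final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        Thread parser = new Thread(() -> {
            try {
                for (int i = 0; i < pages.size(); i++) {
                    if (i == failAt) {
                        throw new IllegalStateException("simulated parse failure");
                    }
                    queue.put(pages.get(i));
                }
                queue.put(END);
            } catch (Throwable t) {
                // Pass the failure on so the consumer wakes up and sees it.
                queue.offer(t);
            }
        }, "parser");
        parser.setDaemon(true);
        parser.start();

        List<String> out = new ArrayList<>();
        try {
            while (true) {
                Object item = queue.take();
                if (item == END) {
                    return out;
                }
                if (item instanceof Throwable) {
                    throw new IllegalStateException("parser thread failed", (Throwable) item);
                }
                out.add((String) item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList("p1", "p2"), -1));
    }
}
```

With this shape, a failure in the parser surfaces as an exception in the caller, so the process can stop rather than wait forever on a queue that will never be fed again.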
> Intermittent thread safety issue with EnwikiDocMaker
>
>
> Key: LUCENE-11
Mike
> Intermittent thread safety issue with EnwikiDocMaker
>
>
> Key: LUCENE-1117
> URL: https://issues.apache.org/jira/browse/LUCENE-1117
> Project: Lucene - Java
> Issue Type:
call docMaker.resetInputs()?
The contrib/benchmark framework calls that, on creating a docMaker. That
method opens the line file.
> Intermittent thread safety issue with EnwikiDocMaker
>
>
> Key: LUCENE-1117
>
safety issue with EnwikiDocMaker
>
>
> Key: LUCENE-1117
> URL: https://issues.apache.org/jira/browse/LUCENE-1117
> Project: Lucene - Java
> Issue Type: Bug
> Comp
oing:
EnwikiDocMaker docMaker = new EnwikiDocMaker();
Properties properties = new Properties();
//fileName = config.get("docs.file", null);
String filePath = wikipediaXML.getAbsolutePath();
properties.setProperty("docs.file", filePath);
properties.setProperty("
[
https://issues.apache.org/jira/browse/LUCENE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1117.
Resolution: Fixed
> Intermittent thread safety issue with EnwikiDocMa
a day or two.
> Intermittent thread safety issue with EnwikiDocMaker
>
>
> Key: LUCENE-1117
> URL: https://issues.apache.org/jira/browse/LUCENE-1117
> Project: Lucene - Java
>
Intermittent thread safety issue with EnwikiDocMaker
Key: LUCENE-1117
URL: https://issues.apache.org/jira/browse/LUCENE-1117
Project: Lucene - Java
Issue Type: Bug
Components
[
https://issues.apache.org/jira/browse/LUCENE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll resolved LUCENE-1102.
-
Resolution: Fixed
Lucene Fields: (was: [New])
Committed
> EnwikiDocMaker
[
https://issues.apache.org/jira/browse/LUCENE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated LUCENE-1102:
Attachment: LUCENE-1102.patch
Adds docid Field to the index for EnwikiDocMaker
I am using EnwikiDocMaker with the following algorithm outlined at the
bottom (against trunk). After the first round is complete, I am getting
java.lang.RuntimeException: java.io.IOException: Bad file descriptor
at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
$Parser.run
EnwikiDocMaker id field
---
Key: LUCENE-1102
URL: https://issues.apache.org/jira/browse/LUCENE-1102
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/benchmark
Reporter: Grant