Client/runLuceneClient.bat script (on Windows) and it should just
work. If it doesn't, let me know.
Cheers,
Eliot
--
W. Eliot Kimber, [EMAIL PROTECTED]
Consultant, ISOGEN International
1016 La Posada Dr., Suite 240
Austin, TX 78752 Phone: 512.656.4139
--
To unsubscribe, e-mail: <mailto:[E
y experience using
JNI to expose C libraries).
Thanks for the tip.
Cheers,
Eliot
from various non-PDF inputs). Our main writing use case is the
rewriting of existing PDFs following some amount of manipulation through
our API.
A caution: I am still waiting to get approval from my employers to do
this work as open source--it may be a while before I can even start on
the coding.
Lucene integrators would want from
a PDF access library.
Thanks,
Eliot
have to implement Adobe's layout logic.
However, you need this functionality in order to correlate PDF
annotations (links, bookmarks, notes) to the page objects they relate
to--it's all done with bounding boxes.
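The bounding-box correlation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Rect` and `PageObject` types and the `objectsUnder` method are hypothetical names invented for the example, not part of any real PDF library, and it assumes the layout step has already produced a rectangle for each page object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; not taken from any real PDF library.
class BoundingBoxMatcher {

    /** Axis-aligned rectangle given by its lower-left and upper-right corners. */
    static final class Rect {
        final double llx, lly, urx, ury;

        Rect(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        /** True when the two rectangles overlap with non-zero area. */
        boolean intersects(Rect other) {
            return llx < other.urx && other.llx < urx
                && lly < other.ury && other.lly < ury;
        }
    }

    /** A piece of page content together with the box it occupies (assumed already computed). */
    static final class PageObject {
        final String text;
        final Rect box;

        PageObject(String text, Rect box) {
            this.text = text;
            this.box = box;
        }
    }

    /** Returns the page objects whose boxes overlap the annotation's rectangle. */
    static List<PageObject> objectsUnder(Rect annotation, List<PageObject> pageObjects) {
        List<PageObject> matches = new ArrayList<>();
        for (PageObject candidate : pageObjects) {
            if (annotation.intersects(candidate.box)) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}
```

A real implementation would probably also need a tolerance or a minimum-overlap rule, since annotation rectangles rarely line up exactly with the text they cover.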
Cheers,
Eliot
the hard work of integrating this
technique into one of our customers' systems will be presenting a paper
on his experience at the XML Europe conference in Barcelona, Spain in
May (http://www.idealliance.org/.
Cheers,
E.
be too hard
to write a PDF indexer for Lucene using this library. The main challenge
would be guessing word boundaries in strings where spaces have been
replaced with explicit shift values by the formatter.
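One possible heuristic for the word-boundary problem, as a sketch only. It assumes the shift values arrive the way PDF's TJ operator supplies them: an array mixing strings with numbers in thousandths of a text-space unit, where the number is subtracted from the current position, so a large negative value opens a gap. The class name, method name, and the threshold of 200 are assumptions made for the example; a workable threshold would depend on the font and the formatter.

```java
import java.util.List;

// Illustrative sketch; names and threshold are assumptions, not a library API.
class WordBoundaryGuesser {

    /**
     * Joins the string pieces of a TJ-style array, inserting a space wherever
     * a numeric adjustment opens a gap of at least gapThreshold (thousandths
     * of a text-space unit). Smaller adjustments are treated as kerning.
     */
    static String joinWithSpaces(List<Object> tjArray, double gapThreshold) {
        StringBuilder text = new StringBuilder();
        for (Object item : tjArray) {
            if (item instanceof String) {
                text.append((String) item);
            } else if (item instanceof Number) {
                // The adjustment is subtracted from the position, so a
                // negative value moves the next glyph further along the line.
                double gap = -((Number) item).doubleValue();
                boolean alreadySpaced =
                    text.length() == 0 || text.charAt(text.length() - 1) == ' ';
                if (gap >= gapThreshold && !alreadySpaced) {
                    text.append(' ');
                }
            }
        }
        return text.toString();
    }
}
```

For example, the array `["Hello", -300, "W", 40, "orld"]` with a threshold of 200 yields `Hello World`: the -300 is read as a word gap, the 40 as kerning inside a word.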
Cheers,
Eliot
You can now find our package for doing XML indexing with Lucene on the
ISOGEN web site:
http://www.isogen.com/papers/lucene_xml_indexing.html
The package (lucene_xml_indexing.zip) includes all the 3rd-party
libraries it depends on (Lucene, Xerces 1.4.4, junit).
This package is provided as-is an
"Ogren, Philip V." wrote:
> We are indexing a large corpus of XML documents (~10M). One thing that
> Verity does with XML notes is that it indexes each XML tag as a zone.*
> What's cool about it is that the zones are nested so that it mirrors the
> schema of your XML document. You can limit your
Winton Davies wrote:
>
> Hi Eliot,
>
> Not really, all documents have an accountID, but I need to search
> all the documents
> first, and each document that is returned has an accountID, but I
> just want one document
> per accountID.
I see the problem. Can't think of any other way to solve i
Winton Davies wrote:
>
> Hi all,
> In my application, I have to be able to return a list of documents,
> that have been uniquified according to an accountID. The most relevant
> document for an accountID is returned, and then subsequent hits that
> have the same accountID are dropped.
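The requirement quoted above could be met by post-processing the ranked hit list, roughly as in this sketch. It assumes the hits are already sorted by relevance (best first) and that each hit's accountID has been read from the stored document. `Hit`, `AccountDeduper`, and `firstPerAccount` are hypothetical names for the example and are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch; these are not Lucene types.
class AccountDeduper {

    /** One search hit: a document id, its accountID field, and its score. */
    static final class Hit {
        final String docId;
        final String accountId;
        final float score;

        Hit(String docId, String accountId, float score) {
            this.docId = docId;
            this.accountId = accountId;
            this.score = score;
        }
    }

    /**
     * Keeps the first (most relevant) hit for each accountID and drops any
     * later hit with an accountID that has already been seen. The input
     * must already be ordered by descending relevance.
     */
    static List<Hit> firstPerAccount(List<Hit> rankedHits) {
        Set<String> seenAccounts = new HashSet<>();
        List<Hit> unique = new ArrayList<>();
        for (Hit hit : rankedHits) {
            // Set.add returns false when the accountID was seen before.
            if (seenAccounts.add(hit.accountId)) {
                unique.add(hit);
            }
        }
        return unique;
    }
}
```

The cost is that every hit has to be visited, and its accountID fetched, before the unique list is known.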
Do you me
I have put together a hopefully useful package that demonstrates our
current experiments with using Lucene for XML indexing. You can get the
files by anonymous ftp from che.isogen.com, /outgoing/lucene. There are
two zip files:
- lucene_xml_indexing.zip
This is the core indexing code and a l
er 1, 2001.
Cool--thanks!
E.
--
. . . . . . . . . . . . . . . . . . . . . . . .
W. Eliot Kimber | Lead Brain
1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.656.4139 | F 512.419.1860 | [EMAIL PROTECTED]
w w w . d a t a c h a n n e l . c o m
hat Lucene supports date
matching, but I don't see how to specify this in a query.
Also, is there a description of the algorithm "~" uses?
Thanks,
E.
ome back (for example,
organizing the hits by XML document or doing additional context-based
filtering that can't be done at the Lucene level).
Cheers,
Eliot
hing I know how to do with Verity, Fulcrum, Excalibur, etc. and it
was freaky easy to do once we got the idea for the approach. I just hope
it performs adequately.
Cheers,
E.
the text (thus ignoring element-specific
searching) might incur a performance penalty.
In a related question, is there anything we can or need to do to
optimize Lucene to handle lots of little Lucene documents?
Thanks,
Eliot