Hi,
to write an extractor for MS Office documents, you could use Jakarta POI.
I can't say how much effort that would take.
What about the following interface?
import java.util.Map;

public interface Extractor
{
    /**
     * Gets a string representation of the content data.
     *
     * @return the content as a String
     *
     * @throws IndexException
     */
    String getContent () throws IndexException;

    /**
     * Gets properties from the resource, for example "author"
     * for a Word document.
     *
     * @return a Map (key: String, value: String)
     *
     * @throws IndexException
     */
    Map getProperties () throws IndexException;
}
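
To make the contract a bit more concrete, here is a minimal sketch of how an implementation might look. It is only an illustration under a few assumptions: PlainTextExtractor, its constructor arguments and the "length" property are hypothetical names, IndexException is a local stand-in for the real Slide exception so the example compiles on its own, and the generic Map<String, String> is my addition to the interface above. A Word or Excel extractor would replace the body of getContent() with calls into a library such as Jakarta POI.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in for the real exception type, so the sketch is self-contained.
class IndexException extends Exception {
    IndexException(String message) {
        super(message);
    }
}

// Same two methods as the proposed interface; generics are an assumption.
interface Extractor {
    String getContent() throws IndexException;

    Map<String, String> getProperties() throws IndexException;
}

// Hypothetical extractor for plain-text resources held in memory.
class PlainTextExtractor implements Extractor {
    private final byte[] data;
    private final String author;

    PlainTextExtractor(byte[] data, String author) {
        this.data = data;
        this.author = author;
    }

    @Override
    public String getContent() throws IndexException {
        if (data == null) {
            throw new IndexException("no content to extract");
        }
        // Plain text needs no parsing; a binary format would be decoded here.
        return new String(data, StandardCharsets.UTF_8);
    }

    @Override
    public Map<String, String> getProperties() throws IndexException {
        Map<String, String> props = new LinkedHashMap<>();
        if (author != null) {
            props.put("author", author);
        }
        // Values are kept as Strings, as the Javadoc of the interface suggests.
        props.put("length", String.valueOf(getContent().length()));
        return Collections.unmodifiableMap(props);
    }
}

class ExtractorDemo {
    public static void main(String[] args) throws IndexException {
        byte[] bytes = "Hello Slide".getBytes(StandardCharsets.UTF_8);
        Extractor extractor = new PlainTextExtractor(bytes, "Martin");
        System.out.println(extractor.getContent());
        System.out.println(extractor.getProperties());
    }
}
```

Keeping both methods free of format-specific types means the indexer only ever sees a String and a Map, so one implementation per document type could be plugged in without touching the search code.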
Regards,
Martin
> -----Original Message-----
> From: Daniel Florey [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 24 February 2004 11:44
> To: Slide Developers Mailing List
> Subject: Re: Full Text Search for MS Word and Excel files?
>
>
> Hi Ryan,
> I hope I can provide a proposal for the extractor in the next week or so.
> The idea was to extract metadata before storing content by using the
> event stuff.
> I have no idea how to get metadata (or whatever you are interested in)
> from Word or Excel documents, but there should be some kind of library
> available for doing this.
> I'll try to keep the extractor interface simple, so the main task will be
> to get the information out of proprietary documents.
> Regards,
> Daniel
>
> On Tuesday, 24 February 2004 02:19, ryan wrote:
> > Hi,
> >
> > I would like to use the DASL features of Slide to search for text inside
> > of MS Word and Excel files. A while back, I read a discussion on this
> > list about providing an extractor interface for this kind of feature
> > that could extract metadata and text and store it for later searches.
> > Can anyone say what the status of these features is?
> >
> > If you do support the extractor concept or are planning to add it in the
> > near future, can anyone say how difficult it would be to write an
> > Extractor for MS Word and Excel and what the overall approach for
> > implementing an extractor will be like?
> >
> > Thanks,
> >
> > Ryan Rhodes
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>