Yes, sorry about that surprise.  I tried to communicate it in the release notes, 
and you’re right, we could do a better job of documenting it. Please let us know 
specifically how we can improve our documentation!

> that all parsers should extract everything they can unless told otherwise, 
> but it is what it is I guess.
It is, but we can make modifications based on user feedback.  The reason I 
chose to turn it off was my opinion (and no fellow devs objecting to the 
proposal) that enterprise search users probably wouldn’t want to get false 
positives on a macro in an Excel sheet, and that folks who cared about those 
would figure it out and set Tika correctly.  That doesn’t mean my opinion is 
correct.

> So be it, but it seems like there is a case for a subset of options that may 
> apply to all such as "extract anything that qualifies as a 'macro'" that all 
> parsers would obey if they have not been told anything specifically.
If you feel strongly about this, please open an issue on our JIRA.  There may 
be an easy(ish) fix.  I can’t think of one at the moment, but we should look 
into it if there’s sufficient user need.

Cheers,

           Tim

From: Jim Idle [mailto:[email protected]]
Sent: Sunday, June 4, 2017 4:07 AM
To: [email protected]
Subject: RE: Extracting macros in 1.15


Direct Java calls and "I am using the AutoDetectParser at the moment."



I found an online example buried in a test for another package, so I have worked 
out how to do it now, but it seems that if I have many different document 
types to support I will have to configure each parser separately. So be it, but 
it seems like there is a case for a subset of options that may apply to all 
such as "extract anything that qualifies as a 'macro'" that all parsers would 
obey if they have not been told anything specifically.



It is my opinion (for what it's worth 😉) that all parsers should extract 
everything they can unless told otherwise, but it is what it is I guess and I 
am pleased to have TIKA as an aid in analyzing all the myriad document types.



Jim



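        // Turn macro extraction back on for the Office parsers (off by default in 1.15).
        ParseContext pc = new ParseContext();
        AutoDetectParser parser = new AutoDetectParser();
        OfficeParserConfig officeParserConfig = new OfficeParserConfig();
        officeParserConfig.setExtractMacros(true);
        // Register the config so the Office parsers pick it up at parse time.
        pc.set(OfficeParserConfig.class, officeParserConfig);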

> -----Original Message-----
> From: Nick Burch [mailto:[email protected]]
> Sent: Saturday, June 3, 2017 16:36
> To: [email protected]
> Subject: Re: Extracting macros in 1.15
>
> On Sat, 3 Jun 2017, Jim Idle wrote:
> > After being baffled why macros no longer show up in 1.15 I found:
> > https://issues.apache.org/jira/browse/TIKA-2302
> >
> > Can anyone point me to an example of doing this? I am finding bits and
> > pieces but no example of turning macros back on. I basically want all
> > macros in all documents, office, pdf, anything really.
>
> How do you call Apache Tika? Tika App? Tika Server? Tika java class facade?
> Direct Java calls to TikaConfig / AutoDetectParser etc?
>
> The solution will differ depending on which one you use
>
> Nick
