Jim,
Thank you, again, for reaching out to us. Now that we have a user who
actually cares about macros, I have some follow-up questions. We aren’t
treating JavaScript in HTML as a macro… should we try to do that? Are there
other macro-like bits of code that we should be extracting?
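To make the JS-in-HTML question concrete, here is a minimal, naive sketch of what "treating JS in HTML as a macro" might mean: pulling the bodies of inline <script> elements out of an HTML string. The class and method names (InlineScriptExtractor, extract) are hypothetical and are not Tika API, and the regular expression is only a stand-in for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: a naive illustration of treating
// inline <script> bodies in HTML as "macro-like" code to be extracted.
public class InlineScriptExtractor {

    // Matches <script ...> ... </script>, case-insensitively and across lines.
    // A regex is NOT a real HTML parser; it is only good enough for a sketch.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>(.*?)</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static List<String> extract(String html) {
        List<String> scripts = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            String body = m.group(1).trim();
            // Skip <script src="..."></script> tags that carry no inline code.
            if (!body.isEmpty()) {
                scripts.add(body);
            }
        }
        return scripts;
    }

    public static void main(String[] args) {
        String html = "<html><head><script src=\"lib.js\"></script>"
                + "<SCRIPT type=\"text/javascript\">\nalert('hi');\n</SCRIPT></head>"
                + "<body onload=\"init()\"><script>init = function() {};</script></body></html>";
        // prints [alert('hi');, init = function() {};]
        System.out.println(extract(html));
    }
}
```

Note that code in event-handler attributes such as onload="init()" is not captured by this sketch; that is one of the "other macro-like bits" a real implementation would have to decide about.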
Cheers,
Tim
From: Jim Idle [mailto:[email protected]]
Sent: Sunday, June 4, 2017 4:07 AM
To: [email protected]
Subject: RE: Extracting macros in 1.15
Direct Java calls, and I am using the AutoDetectParser at the moment.
I found an online example buried in a test for another package, so I have worked
out how to do it now, but it seems that if I have many different document
types to support, I will have to configure each parser separately. So be it, but
there seems to be a case for a set of options that apply to all parsers,
such as "extract anything that qualifies as a 'macro'", which every parser would
obey unless it has been told something specific.
It is my opinion (for what it's worth 😉) that all parsers should extract
everything they can unless told otherwise, but it is what it is, I guess, and I
am pleased to have Tika as an aid in analyzing all the myriad document types.
Jim
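import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.microsoft.OfficeParserConfig;
ParseContext pc = new ParseContext();
Parser parser = new AutoDetectParser();
// Macros are no longer extracted by default in 1.15 (TIKA-2302); turn extraction back on for the Office parsers
OfficeParserConfig officeParserConfig = new OfficeParserConfig();
officeParserConfig.setExtractMacros(true);
pc.set(OfficeParserConfig.class, officeParserConfig);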
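Jim's suggestion of a shared option that every parser obeys unless it has been configured specifically could look roughly like this. This is a hypothetical sketch only: MacroOptionsSketch, GlobalOptions, Options, and the parser-name keys ("office", "pdf") are invented for illustration and do not exist in Tika, where, as Jim found, each parser family is configured separately (for example via OfficeParserConfig).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposal above; none of these classes exist in Tika.
// A document-wide default ("extract macros") applies to every parser unless
// that parser has been configured explicitly.
public class MacroOptionsSketch {

    static final class GlobalOptions {
        final boolean extractMacros;

        GlobalOptions(boolean extractMacros) {
            this.extractMacros = extractMacros;
        }
    }

    static final class Options {
        private final GlobalOptions global;
        // Per-parser overrides, keyed by a (hypothetical) parser name.
        private final Map<String, Boolean> perParser = new HashMap<>();

        Options(GlobalOptions global) {
            this.global = global;
        }

        void override(String parserName, boolean extractMacros) {
            perParser.put(parserName, extractMacros);
        }

        // An explicit per-parser setting wins; otherwise fall back to the global default.
        boolean extractMacros(String parserName) {
            return perParser.getOrDefault(parserName, global.extractMacros);
        }
    }

    public static void main(String[] args) {
        Options opts = new Options(new GlobalOptions(true));
        opts.override("pdf", false);
        System.out.println(opts.extractMacros("office")); // true (global default)
        System.out.println(opts.extractMacros("pdf"));    // false (explicit override)
    }
}
```

The design choice is simply precedence: an explicit per-parser setting always wins, and the global default fills in only where nothing specific was set.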
> -----Original Message-----
> From: Nick Burch [mailto:[email protected]]
> Sent: Saturday, June 3, 2017 16:36
> To: [email protected]
> Subject: Re: Extracting macros in 1.15
>
> On Sat, 3 Jun 2017, Jim Idle wrote:
> > After being baffled why macros no longer show up in 1.15 I found:
> > https://issues.apache.org/jira/browse/TIKA-2302
> >
> > Can anyone point me to an example of doing this? I am finding bits and
> > pieces but no example of turning macros back on. I basically want all
> > macros in all documents: Office, PDF, anything really.
>
> How do you call Apache Tika? Tika App? Tika Server? Tika java class facade?
> Direct Java calls to TikaConfig / AutoDetectParser etc?
>
> The solution will differ depending on which one you use.
>
> Nick