> On Jun 5, 2017, at 10:43am, Allison, Timothy B. <talli...@mitre.org> wrote:
> 
> Jim,
>   Thank you, again, for reaching out to us.  Now that we have a user who 
> actually cares about macros, I have some follow-up questions. We aren’t 
> treating JS in HTML as a macro… should we try to do that?  Are there other 
> macro-like bits of code that we should be extracting?

Oddly enough, this just came up for me a few days ago.

I was going to use a custom mapper and content handler to extract the <script> 
data, but having built-in support that treats them as macros would be better.

So yes, please :)

How would you handle the src=xxx attribute? Ultimately I plan to treat these 
like an import statement in a regular source code file.
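
A minimal sketch of that idea, in plain Java with regular expressions rather than Tika's content-handler API: inline <script> bodies are collected as macro-like text, and each src value is recorded as an import-style reference. The class name ScriptExtractor and its fields are hypothetical, and the regex approach is an assumption made for brevity; it does not cope with ">" inside attribute values or with malformed markup, which a real HTML parser would handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Tika): pulls script bodies and src references out of HTML. */
final class ScriptExtractor {

    /** Inline script bodies and external src references, each in document order. */
    static final class Scripts {
        final List<String> inline = new ArrayList<>();
        final List<String> imports = new ArrayList<>();
    }

    // Group 1 = raw attribute string of the opening tag, group 2 = script body.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b([^>]*)>(.*?)</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches src="x", src='x' or src=x; the leading (?:^|\s) skips names like data-src.
    private static final Pattern SRC = Pattern.compile(
            "(?:^|\\s)src\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
            Pattern.CASE_INSENSITIVE);

    static Scripts extract(String html) {
        Scripts out = new Scripts();
        Matcher script = SCRIPT.matcher(html);
        while (script.find()) {
            Matcher src = SRC.matcher(script.group(1));
            if (src.find()) {
                // Exactly one of the three alternatives captured the value.
                String ref = src.group(1) != null ? src.group(1)
                        : src.group(2) != null ? src.group(2)
                        : src.group(3);
                out.imports.add(ref);
            }
            String body = script.group(2).trim();
            if (!body.isEmpty()) {
                out.inline.add(body);
            }
        }
        return out;
    }
}
```

Keeping the src references in a separate list leaves it to the caller to decide whether to fetch them, resolve them against a base URL, or just report them.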

Regards,

— Ken


> From: Jim Idle [mailto:ji...@proofpoint.com] 
> Sent: Sunday, June 4, 2017 4:07 AM
> To: user@tika.apache.org
> Subject: RE: Extracting macros in 1.15
>  
> Direct Java calls and "I am using the AutoDetectParser at the moment."
>  
> I found an online example buried in a test for another package, so I have worked 
> out how to do it now, but it seems that if I have many different document 
> types to support I will have to configure each parser separately. So be it, 
> but it seems like there is a case for a subset of options that may apply to 
> all such as "extract anything that qualifies as a 'macro'" that all parsers 
> would obey if they have not been told anything specifically.
>  
> It is my opinion (for what it's worth 😉) that all parsers should extract 
> everything they can unless told otherwise, but it is what it is I guess and I 
> am pleased to have TIKA as an aid in analyzing all the myriad document types.
>  
> Jim
>  
>         // Enable macro extraction for the Office parsers via the ParseContext.
>         ParseContext pc = new ParseContext();
>         AutoDetectParser parser = new AutoDetectParser();
>         OfficeParserConfig officeParserConfig = new OfficeParserConfig();
>         officeParserConfig.setExtractMacros(true);
>         pc.set(OfficeParserConfig.class, officeParserConfig);
>  
>  
>  
> > -----Original Message-----
> > From: Nick Burch [mailto:apa...@gagravarr.org]
> > Sent: Saturday, June 3, 2017 16:36
> > To: user@tika.apache.org
> > Subject: Re: Extracting macros in 1.15
> > 
> > On Sat, 3 Jun 2017, Jim Idle wrote:
> > > After being baffled why macros no longer show up in 1.15 I found:
> > > https://issues.apache.org/jira/browse/TIKA-2302
> > >
> > > Can anyone point me to an example of doing this? I am finding bits and
> > > pieces but no example of turning macros back on. I basically want all
> > > macros in all documents, office, pdf, anything really.
> > 
> > How do you call Apache Tika? Tika App? Tika Server? Tika java class facade?
> > Direct Java calls to TikaConfig / AutoDetectParser etc?
> > 
> > The solution will differ depending on which one you use
> > 
> > Nick

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr


