[ https://issues.apache.org/jira/browse/TIKA-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703529#comment-16703529 ]
Tim Allison edited comment on TIKA-2550 at 11/29/18 5:22 PM:
-------------------------------------------------------------

Would anyone object if I modified the {{ToTextContentHandler}} to ignore content within {{<script/>}} and {{<style/>}} elements?

was (Author: talli...@mitre.org):
Would anyone object if I modified the {{ToTextContentHandler}} to ignore content within {{<script/>}} elements?

> ToTextHandler includes <style/> element content
> -----------------------------------------------
>
>                 Key: TIKA-2550
>                 URL: https://issues.apache.org/jira/browse/TIKA-2550
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Trivial
>
> When using the ToTextHandler to process .java files, the <style/> element
> content is included, e.g.:
> {noformat}
> testFile
> code {
> color: rgb(0,0,0); font-family: monospace; font-size: 12px; white-space: nowrap;
> }
> .java_plain {
> color: rgb(0,0,0);
> }
> .java_keyword {
> color: rgb(0,0,0); font-weight: bold;
> }
> .java_javadoc_tag {
> color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic; font-weight: bold;
> }
> h1 {
> font-family: sans-serif; font-size: 16pt; font-weight: bold; color: rgb(0,0,0); background: rgb(210,210,210); border: solid 1px black; padding: 5px; text-align: center;
> }
> .java_type {
> color: rgb(0,44,221);
> }
> .java_literal {
> color: rgb(188,0,0);
> }
> .java_javadoc_comment {
> color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic;
> }
> .java_operator {
> color: rgb(0,124,31);
> }
> .java_separator {
> color: rgb(0,33,255);
> }
> .java_comment {
> color: rgb(147,147,147); background-color: rgb(247,247,247);
> }
> testFile/*************************************************************************
>  * Compilation: javac HelloWorld.java
>  * Execution: java HelloWorld
>  *
>  * Prints "Hello, World". By tradition, this is everyone's first program.
>  *
>  *************************************************************************/
> public class HelloWorld {
>     public static void main(String[] args) {
>         System.out.println("Hello, World");
>     }
> }
> {noformat}
> Is this what we want as the default behavior?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
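A minimal sketch of the proposed behavior follows. It is written against the plain JDK SAX API rather than Tika's actual {{ToTextContentHandler}}, whose internals may differ. The class name {{SkipScriptStyleTextHandler}} and the {{extractText}} helper are hypothetical. The handler counts how deeply the parser is nested inside {{<script>}} or {{<style>}} elements and drops character events while that depth is non-zero:

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX handler (NOT Tika's ToTextContentHandler) that collects
 * character content but drops anything inside <script> or <style> elements.
 */
final class SkipScriptStyleTextHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    // How many script/style elements we are currently nested inside.
    private int skipDepth = 0;

    private static boolean isSkipped(String localName, String qName) {
        // Non-namespace-aware parsers report an empty localName, so fall back to qName.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        name = name.toLowerCase(Locale.ROOT);
        return name.equals("script") || name.equals("style");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isSkipped(localName, qName)) {
            skipDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isSkipped(localName, qName) && skipDepth > 0) {
            skipDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only keep text that is outside every script/style element.
        if (skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        characters(ch, start, length);
    }

    @Override
    public String toString() {
        return text.toString();
    }

    /** Parses an XHTML string and returns its text with script/style content removed. */
    static String extractText(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SkipScriptStyleTextHandler handler = new SkipScriptStyleTextHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<title>testFile</title>"
                + "<style>.java_plain { color: rgb(0,0,0); }</style>"
                + "<script>var x = 1;</script>"
                + "</head><body><p>public class HelloWorld { }</p></body></html>";

        // Prints "testFilepublic class HelloWorld { }": the title and body text
        // survive, while the <style> and <script> content is dropped.
        System.out.println(extractText(xhtml));
    }
}
```

A depth counter rather than a boolean keeps the filter correct even if such elements were ever nested. Inside Tika, the same idea could be wrapped around an existing handler as a decorator (e.g. by extending {{org.apache.tika.sax.ContentHandlerDecorator}}), which would leave the default-behavior question raised in the issue open to configuration.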