Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-04 Thread via GitHub
Claudenw merged PR #240: URL: https://github.com/apache/creadur-rat/pull/240 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-04 Thread via GitHub
ottlinger commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2094202616 @Claudenw pls review my latest additions concerning RAT-301, after that go ahead with the merge. Thanks for your work and the cool addition of more functionality to RAT #kudos

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-04 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1589986059 ## src/changes/changes.xml: ## @@ -72,6 +72,22 @@ https://maven.apache.org/plugins/maven-changes-plugin/xsd/changes-1.0.0.xsd --> + +

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-04 Thread via GitHub
Claudenw commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2094116895 I updated the checklist.

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-03 Thread via GitHub
ottlinger commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2093681175 > @ottlinger If you approve I can merge this. If you want more eyes on it, let's invite a few reviewers. In the PR's main description you've created a checklist - is that

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-03 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1589675326 ## src/changes/changes.xml: ## @@ -72,6 +72,22 @@ https://maven.apache.org/plugins/maven-changes-plugin/xsd/changes-1.0.0.xsd --> + +

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-03 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1589673093 ## apache-rat-core/src/test/java/org/apache/rat/document/impl/guesser/BinaryGuesserTest.java: ## @@ -1,150 +1,150 @@ -/* - * Licensed to the Apache Software

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-03 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1589669854 ## apache-rat-core/pom.xml: ## @@ -126,5 +126,10 @@ assertj-core test + Review Comment: Thanks.

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-01 Thread via GitHub
Claudenw commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2089624723 @ottlinger If you approve I can merge this. If you want more eyes on it, let's invite a few reviewers.

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-01 Thread via GitHub
ottlinger commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2088992912 @Claudenw the extraction into the Tika-class looks very nice - thanks.

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-01 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1586731939 ## apache-rat-core/src/main/java/org/apache/rat/Defaults.java: ## @@ -152,27 +163,35 @@ public SortedSet getLicenseFamilies(LicenseFilter filter) { * @param

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-05-01 Thread via GitHub
Claudenw commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1585924926 ## apache-rat-core/src/main/java/org/apache/rat/Defaults.java: ## @@ -152,27 +163,35 @@ public SortedSet getLicenseFamilies(LicenseFilter filter) { * @param

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-30 Thread via GitHub
Claudenw commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1584567119 ## apache-rat-core/src/main/java/org/apache/rat/Defaults.java: ## @@ -152,27 +163,35 @@ public SortedSet getLicenseFamilies(LicenseFilter filter) { * @param

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-30 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1584556357 ## apache-rat-core/src/main/java/org/apache/rat/analysis/DefaultAnalyserFactory.java: ## @@ -63,8 +60,8 @@ private final static class DefaultAnalyser implements

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-30 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1584553794 ## apache-rat-core/src/main/java/org/apache/rat/Defaults.java: ## @@ -152,27 +163,35 @@ public SortedSet getLicenseFamilies(LicenseFilter filter) { * @param

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-29 Thread via GitHub
Claudenw commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2081968269 I extracted the Tika processing to its own class. I added the Tika `MediaType` to our metadata. The process now assumes that all media types matching "text/*" are `STANDARD`
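
The "text/*" rule described in the comment above might look roughly like the following. This is a minimal sketch using only the JDK, not the code from the PR: the class `MediaTypeSketch`, the enum `DocType`, its `OTHER` value, and the `classify` method are hypothetical names, and the media type is passed in as a plain string instead of a Tika `MediaType`.

```java
import java.util.Locale;

public class MediaTypeSketch {
    /** Hypothetical stand-in for the document type; only STANDARD is named in the thread. */
    enum DocType { STANDARD, OTHER }

    /**
     * Treats every media type whose top-level type is "text" as STANDARD.
     * Anything else, including a missing media type, falls through to OTHER.
     */
    static DocType classify(String mediaType) {
        if (mediaType == null) {
            return DocType.OTHER;
        }
        // Normalize case and surrounding whitespace before checking the prefix.
        String normalized = mediaType.trim().toLowerCase(Locale.ROOT);
        return normalized.startsWith("text/") ? DocType.STANDARD : DocType.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("text/plain"));
        System.out.println(classify("application/pdf"));
    }
}
```

How non-text media types are actually mapped is not visible in the truncated comment, so the sketch collapses them into a single placeholder value.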

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-28 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1582370881 ## src/changes/changes.xml: ## @@ -72,6 +72,22 @@ https://maven.apache.org/plugins/maven-changes-plugin/xsd/changes-1.0.0.xsd --> + +

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-28 Thread via GitHub
Claudenw commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1582029861 ## src/changes/changes.xml: ## @@ -72,6 +72,22 @@ https://maven.apache.org/plugins/maven-changes-plugin/xsd/changes-1.0.0.xsd --> + +

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-28 Thread via GitHub
Claudenw commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1582028644 ## apache-rat-core/src/main/java/org/apache/rat/walker/Walker.java: ## @@ -33,38 +34,32 @@ public abstract class Walker implements IReportable { protected

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-28 Thread via GitHub
Claudenw commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1582027702 ## apache-rat-core/src/main/java/org/apache/rat/report/claim/ClaimStatistic.java: ## @@ -57,45 +58,71 @@ public int getCounter(Counter counter) { return

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-28 Thread via GitHub
Claudenw commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1582027540 ## apache-rat-core/src/main/java/org/apache/rat/ReportConfiguration.java: ## @@ -179,31 +177,31 @@ public boolean isDryRun() { /** * @return The filename

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-28 Thread via GitHub
Claudenw commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1582027053 ## apache-rat-core/src/main/java/org/apache/rat/Report.java: ## @@ -452,11 +452,11 @@ private static IReportable getDirectory(String baseDirectory,

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-27 Thread via GitHub
Claudenw commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1582025702 ## apache-rat-core/src/main/java/org/apache/rat/Defaults.java: ## @@ -57,6 +62,10 @@ public class Defaults { public static final String

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-27 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1581906723 ## apache-rat-core/src/main/java/org/apache/rat/report/claim/ClaimStatistic.java: ## @@ -57,45 +58,71 @@ public int getCounter(Counter counter) { return

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-27 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1581906646 ## apache-rat-core/src/main/java/org/apache/rat/report/claim/ClaimStatistic.java: ## @@ -57,45 +58,71 @@ public int getCounter(Counter counter) { return

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-27 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1581906606 ## apache-rat-core/src/main/java/org/apache/rat/report/claim/ClaimStatistic.java: ## @@ -57,45 +58,71 @@ public int getCounter(Counter counter) { return

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-27 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1581906451 ## apache-rat-core/src/main/java/org/apache/rat/ReportConfiguration.java: ## @@ -179,31 +177,31 @@ public boolean isDryRun() { /** * @return The

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-27 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1581906153 ## apache-rat-core/src/main/java/org/apache/rat/Report.java: ## @@ -452,11 +452,11 @@ private static IReportable getDirectory(String baseDirectory,

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-27 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1581906050 ## apache-rat-core/src/main/java/org/apache/rat/Defaults.java: ## @@ -57,6 +62,10 @@ public class Defaults { public static final String

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-27 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1581905166 ## apache-rat-core/src/main/java/org/apache/rat/Defaults.java: ## @@ -57,6 +62,10 @@ public class Defaults { public static final String

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-26 Thread via GitHub
Claudenw commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2079782431 Well, this blew up to something bigger than I wanted, but... I added a default exclusion for "*.json" files in the `Defaults` class and used that to configure the ReportConfiguration

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-26 Thread via GitHub
ottlinger commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2079291560 Pls add a reference to all the old tickets in the changelog & thanks for taking care of the old tickets/bugs.

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-26 Thread via GitHub
Claudenw commented on PR #240: URL: https://github.com/apache/creadur-rat/pull/240#issuecomment-2078759677 I am adding a file filter to remove JSON files. Initially this will be hard-coded. I will open a subsequent ticket to generalize it so that we can define a list of extensions to
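
A hard-coded JSON exclusion of the kind mentioned above could be sketched as follows. This is an illustration with the JDK's `java.io.FilenameFilter` only; the PR itself may use a different filter type, and the class name `JsonExclusionSketch` and constant `EXCLUDE_JSON` are hypothetical.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Locale;

public class JsonExclusionSketch {
    /** Accepts every file except those whose name ends in ".json" (case-insensitive). */
    static final FilenameFilter EXCLUDE_JSON =
            (dir, name) -> !name.toLowerCase(Locale.ROOT).endsWith(".json");

    public static void main(String[] args) {
        File dir = new File(".");
        // The filter only inspects the name, so the files need not exist.
        System.out.println(EXCLUDE_JSON.accept(dir, "pom.xml"));
        System.out.println(EXCLUDE_JSON.accept(dir, "package.json"));
    }
}
```

Generalizing this to a configurable list of extensions, as the comment proposes for a follow-up ticket, would mainly mean replacing the fixed ".json" suffix with a collection of suffixes.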

Re: [PR] RAT-54: Tika based document analyzer [creadur-rat]

2024-04-25 Thread via GitHub
ottlinger commented on code in PR #240: URL: https://github.com/apache/creadur-rat/pull/240#discussion_r1580070068 ## apache-rat-core/src/main/java/org/apache/rat/api/Document.java: ## @@ -33,47 +36,416 @@ public interface Document { */ enum Type { /** A