I am trying to see whether certain information can be extracted from files
based on patterns defined in a file.
* Use Apache Tika with a hashing approach to manage content extraction
processing.
* Store the URL or document content in a DB and run Apache Tika as the
content extractor.
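A minimal sketch of the pattern-plus-hashing idea, assuming the text has already been extracted by Tika (for example with the `org.apache.tika.Tika` facade's `parseToString(File)` method) and that the patterns file holds one Java regular expression per line, with `#` lines as comments. `PatternExtractor`, `extract`, and the in-memory map standing in for the DB are hypothetical names for illustration, not part of Tika's API. Identical content hashes to the same SHA-256 key, so it only goes through the regex pass once.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Tika class): applies regex patterns, one per
 * line as they might be read from a patterns file, to already-extracted text.
 * Results are cached by a SHA-256 hash of the text; the in-memory map is an
 * assumed stand-in for a DB table.
 */
public class PatternExtractor {
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();
    private final Map<String, Map<String, List<String>>> cache = new HashMap<>();
    private int processedCount = 0;

    public PatternExtractor(List<String> patternLines) {
        for (String line : patternLines) {
            String trimmed = line.trim();
            // Skip blank lines and '#' comment lines in the patterns file.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            patterns.put(trimmed, Pattern.compile(trimmed));
        }
    }

    /** Hex-encoded SHA-256 of the UTF-8 bytes of the text. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Returns every match per pattern, reusing cached results for seen content. */
    public Map<String, List<String>> extract(String text) {
        String key = sha256Hex(text);
        Map<String, List<String>> cached = cache.get(key);
        if (cached != null) return cached; // same hash: skip the regex pass
        processedCount++;
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            List<String> hits = new ArrayList<>();
            Matcher m = e.getValue().matcher(text);
            while (m.find()) hits.add(m.group());
            result.put(e.getKey(), hits);
        }
        cache.put(key, result);
        return result;
    }

    /** Number of distinct texts that actually went through the regex pass. */
    public int processedCount() { return processedCount; }

    public static void main(String[] args) {
        PatternExtractor ex = new PatternExtractor(List.of(
                "# email addresses",
                "[\\w.+-]+@[\\w-]+\\.[\\w.]+",
                "CVE-\\d{4}-\\d+"));
        String text = "Contact admin@example.org about CVE-2023-33201.";
        System.out.println(ex.extract(text));
        ex.extract(text); // identical content: served from the cache
        System.out.println(ex.processedCount());
    }
}
```

In a real setup the map would presumably be replaced by a DB table keyed on the hash (alongside the URL), which is roughly where the second bullet's approach would fit.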

Is Apache Solr better suited for this purpose?
I could only find a few pattern examples here:
https://tika.apache.org/1.21/examples.html#Apache_Tika_API_Usage_Examples

But I wanted to see how the other components are being used at this stage
with Tika in the build image.
Is there any way these components could be used directly with the Apache
Tika build, and if so, what are the steps for that?

On Thu, Jul 6, 2023 at 1:33 PM Tim Allison <talli...@apache.org> wrote:

> > The example on tika-app.jar just outputs the input file contents and the
> > options that are listed are limited,
>
> What are you trying to do?  What are your goals?
>
> On Thu, Jul 6, 2023 at 1:38 PM vijaya Panchak <panchakvij...@gmail.com>
> wrote:
>
>> With -DskipTests it works and generates jar files.
>> I will check that.
>>
>> I noticed a lot of jar files being created with dependencies on each other
>> (Kafka, Solr, Lucene, SQLite, parser types, and others), which cannot run
>> independently.
>>
>> The tika-app.jar example just outputs the input file contents, and the
>> options that are listed are limited.
>>
>> I also checked the Tika book, but it only gives very broad examples based
>> on Wikipedia data and others.
>>
>> Is there a subset of components that can be used as a tutorial, given that
>> there are not many examples?
>>
>>
>> On Jul 6, 2023, at 10:20 AM, Tilman Hausherr <thaush...@t-online.de>
>> wrote:
>>
>> Edit the parent pom.xml so that bouncycastle is at 1.75.
>>
>> Tilman
>>
>> On 06.07.2023 18:01, vijaya Panchak wrote:
>>
>> [INFO] ------------------------------------------------------------------------
>> [ERROR] Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (audit-dependencies) on project tika-parser-digest-commons: Detected 1 vulnerable components:
>> [ERROR]   org.bouncycastle:bcprov-jdk18on:jar:1.73:compile; https://ossindex.sonatype.org/component/pkg:maven/org.bouncycastle/bcprov-jdk18on@1.73?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
>> [ERROR]     * [CVE-2023-33201] CWE-200: Information Exposure (6.5); https://ossindex.sonatype.org/vulnerability/CVE-2023-33201?component-type=maven&component-name=org.bouncycastle%2Fbcprov-jdk18on&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
>> [ERROR]
>> [ERROR] -> [Help 1]
>>
>> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (audit-dependencies) on project tika-parser-digest-commons: Detected 1 vulnerable components:
>>   org.bouncycastle:bcprov-jdk18on:jar:1.73:compile; https://ossindex.sonatype.org/component/pkg:maven/org.bouncycastle/bcprov-jdk18on@1.73?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
>>     * [CVE-2023-33201] CWE-200: Information Exposure (6.5); https://ossindex.sonatype.org/vulnerability/CVE-2023-33201?component-type=maven&component-name=org.bouncycastle%2Fbcprov-jdk18on&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
