THausherr merged PR #1714:
URL: https://github.com/apache/tika/pull/1714
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail:
THausherr merged PR #1716:
URL: https://github.com/apache/tika/pull/1716
THausherr merged PR #1715:
URL: https://github.com/apache/tika/pull/1715
dependabot[bot] opened a new pull request, #1715:
URL: https://github.com/apache/tika/pull/1715
Bumps `aws.version` from 1.12.696 to 1.12.697.
Updates `com.amazonaws:aws-java-sdk-s3` from 1.12.696 to 1.12.697
Changelog
Sourced from
dependabot[bot] opened a new pull request, #1714:
URL: https://github.com/apache/tika/pull/1714
Bumps org.apache.jackrabbit:oak-jackrabbit-api from 1.60.0 to 1.62.0.
dependabot[bot] opened a new pull request, #1716:
URL: https://github.com/apache/tika/pull/1716
Bumps commons-io:commons-io from 2.16.0 to 2.16.1.
On Mon, 8 Apr 2024, Tim Allison wrote:
Not sure we should jump on the bandwagon, but anything we can do to
support smart chunking would benefit us.
Could just be more integrations with parsers that turn out to be useful. I
haven’t had much joy with some. Here’s one that I haven’t evaluated
[
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835077#comment-17835077
]
Lewis John McGibbney commented on TIKA-4232:
It turns out that the original GitHub action I
lewismc commented on PR #15:
URL: https://github.com/apache/tika-helm/pull/15#issuecomment-2043768368
Thank you @ahilmathew really nice patch.
lewismc merged PR #15:
URL: https://github.com/apache/tika-helm/pull/15
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org
I am also very interested in this vector-based search. Indexes are a big
thing right now.
On Mon, Apr 8, 2024, 4:16 PM Michael Wechner
wrote:
> It would be great to have good "semantic chunking" in order to generate
> vector embeddings.
>
> Thanks for the link below, will try to test it.
>
>
It would be great to have good "semantic chunking" in order to generate
vector embeddings.
Thanks for the link below, will try to test it.
Thanks
Michael
On 08.04.24 at 18:29, Tim Allison wrote:
Not sure we should jump on the bandwagon, but anything we can do to support
smart chunking
Not sure we should jump on the bandwagon, but anything we can do to support
smart chunking would benefit us.
Could just be more integrations with parsers that turn out to be useful. I
haven’t had much joy with some. Here’s one that I haven’t evaluated yet:
https://github.com/Filimoa/open-parse
From October 2023:
https://www.brilworks.com/blog/java-11-countdown-to-end-of-support/
Getting 3.x out has taken longer than I had anticipated. Should we
reopen the 17 vs 11 discussion given Eric's input? Or do we continue
with the plan to target 11 in 3.x for the foreseeable future?
On Mon, Apr
bartek commented on code in PR #1712:
URL: https://github.com/apache/tika/pull/1712#discussion_r1555919713
##
tika-pipes/tika-fetchers/tika-fetcher-http/src/main/java/org/apache/tika/pipes/fetcher/http/jwt/JwtGenerator.java:
##
@@ -0,0 +1,64 @@
+package
Time to move on? Lucene 10 will be on 17+, Solr 10 will be on 17+, OpenNLP is
already there….Java 11 is EOL and has been for a while….
Any other file parsers that are being optimized to take advantage of the newer
features that are in recent Java versions that we know about?
> On
Sorry, more correctly:
OpenNLP is effectively EOL'd for our 3.x because OpenNLP >= 2.3.0
requires Java 17 and our 3.x is still on 11.
On Mon, Apr 8, 2024 at 6:30 AM Tim Allison wrote:
>
> All,
> As Brian pointed out, optimaize is no longer maintained, and it has
> some dependencies that have
All,
As Brian pointed out, optimaize is no longer maintained, and it has
some dependencies that have aged out. Should we replace our baseline
langdetect in tika-app and tika-server in 3.x?
I'd say that we should go with our OpenNLP based language detection,
but that, too, is effectively EOL'd
All,
I'm now thinking it would make sense to have one more 3.x beta
release before the final 3.0.0. Are there any breaking changes that we
want to get into 3.x?
I'd like to wait for COMPRESS-675 to be fixed and for COMPRESS-674
to be released before we release 3.0.0-BETA2. Any other items that
THausherr merged PR #1713:
URL: https://github.com/apache/tika/pull/1713