[jira] [Commented] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-15 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827516#comment-17827516 ] Gregory Lepore commented on TIKA-4208: -- I don't mind excluding the SAS parser since i

[jira] [Commented] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-15 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827512#comment-17827512 ] Gregory Lepore commented on TIKA-4208: -- Hmm, here's what I get: java -Xmx6g -jar ..

[jira] [Commented] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-11 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825394#comment-17825394 ] Gregory Lepore commented on TIKA-4208: -- Actually, processing the file separately yiel

[jira] [Commented] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-11 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825384#comment-17825384 ] Gregory Lepore commented on TIKA-4208: -- I extracted all files from the ARC file and w

[jira] [Updated] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-11 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4208: - Attachment: table23.sas7bdat.zip > OOM error in SAS7BDATParser > --- > >

[jira] [Commented] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-08 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824879#comment-17824879 ] Gregory Lepore commented on TIKA-4208: -- java -Xmx4G -Xms4G -jar ../tika.jar file.arc.

[jira] [Created] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-08 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4208: Summary: OOM error in SAS7BDATParser Key: TIKA-4208 URL: https://issues.apache.org/jira/browse/TIKA-4208 Project: Tika Issue Type: Bug Affects Versions:

[jira] [Created] (TIKA-4206) Variation on Zip Bomb

2024-03-03 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4206: Summary: Variation on Zip Bomb Key: TIKA-4206 URL: https://issues.apache.org/jira/browse/TIKA-4206 Project: Tika Issue Type: Bug Affects Versions: 3.0.0-

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-16 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818069#comment-17818069 ] Gregory Lepore commented on TIKA-4198: -- For this set of data from the Bureau of Land

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-16 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818018#comment-17818018 ] Gregory Lepore commented on TIKA-4198: -- This would make a huge difference in my agenc

[jira] [Commented] (TIKA-4188) Add support for ARC files

2024-02-06 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815012#comment-17815012 ] Gregory Lepore commented on TIKA-4188: -- The ones I'm working with are concatenated gz

[jira] [Commented] (TIKA-4187) Add detection for geopackage

2024-02-03 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813969#comment-17813969 ] Gregory Lepore commented on TIKA-4187: -- Got it, thought I was being hasty. Thanks aga

[jira] [Commented] (TIKA-4187) Add detection for geopackage

2024-02-03 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813965#comment-17813965 ] Gregory Lepore commented on TIKA-4187: -- Tim - Thanks for adding this! Apologies for a

[jira] [Created] (TIKA-4188) Add support for ARC files

2024-02-02 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4188: Summary: Add support for ARC files Key: TIKA-4188 URL: https://issues.apache.org/jira/browse/TIKA-4188 Project: Tika Issue Type: Improvement Repo

[jira] [Commented] (TIKA-3992) Add common missing mimes based on Common Crawl data

2023-06-30 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739089#comment-17739089 ] Gregory Lepore commented on TIKA-3992: -- Got a chance to download the May/June CommonC

[jira] [Created] (TIKA-4090) Add magic for SolidWorks eDrawing Electronic Assembly Data File format

2023-06-16 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4090: Summary: Add magic for SolidWorks eDrawing Electronic Assembly Data File format Key: TIKA-4090 URL: https://issues.apache.org/jira/browse/TIKA-4090 Project: Tika

[jira] [Created] (TIKA-4089) Add magic for Teeworlds/DDRace Map Format

2023-06-16 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4089: Summary: Add magic for Teeworlds/DDRace Map Format Key: TIKA-4089 URL: https://issues.apache.org/jira/browse/TIKA-4089 Project: Tika Issue Type: Sub-task

[jira] [Updated] (TIKA-4089) Add magic for Teeworlds/DDRace Map Format

2023-06-16 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4089: - Attachment: TeeTacToe.map Time_Calculator.map Volleyball.map

[jira] [Updated] (TIKA-4088) Add magic for SEG Y format

2023-06-15 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4088: - Description: The SEG Y format occurs 2,390 times (roughly) in the latest Common Crawl dataset. No

[jira] [Created] (TIKA-4088) Add magic for SEG Y format

2023-06-15 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4088: Summary: Add magic for SEG Y format Key: TIKA-4088 URL: https://issues.apache.org/jira/browse/TIKA-4088 Project: Tika Issue Type: Sub-task Report

[jira] [Updated] (TIKA-4087) Add magic for Warcraft III Map format

2023-06-15 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4087: - Description: The Warcraft III Map format occurs 876 times in the latest Common Crawl dataset. No

[jira] [Updated] (TIKA-4086) Add magic for X-Moto Replay format

2023-06-15 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4086: - Description: The X-Moto Replay format occurs 1,610 times in the latest Common Crawl dataset. No k

[jira] [Updated] (TIKA-4085) Add magic for Unreal Engine Package format

2023-06-15 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4085: - Description: The Unreal Engine Package format occurs 917 times in the latest Common Crawl dataset

[jira] [Updated] (TIKA-4084) Add magic for SquashFS Format

2023-06-15 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4084: - Description: The SquashFS format appears 1,025 times in the latest Common Crawl dataset. No known

[jira] [Updated] (TIKA-4083) Add magic for ClamAV CDiff files

2023-06-15 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4083: - Description: The ClamAV CDIFF format appears 1,582 times in the latest Common Crawl dataset. No k

[jira] [Created] (TIKA-4087) Add magic for Warcraft III Map format

2023-06-15 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4087: Summary: Add magic for Warcraft III Map format Key: TIKA-4087 URL: https://issues.apache.org/jira/browse/TIKA-4087 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-4086) Add magic for X-Moto Replay format

2023-06-15 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4086: Summary: Add magic for X-Moto Replay format Key: TIKA-4086 URL: https://issues.apache.org/jira/browse/TIKA-4086 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-4085) Add magic for Unreal Engine Package format

2023-06-15 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4085: Summary: Add magic for Unreal Engine Package format Key: TIKA-4085 URL: https://issues.apache.org/jira/browse/TIKA-4085 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-4084) Add magic for SquashFS Format

2023-06-15 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4084: Summary: Add magic for SquashFS Format Key: TIKA-4084 URL: https://issues.apache.org/jira/browse/TIKA-4084 Project: Tika Issue Type: Sub-task Rep

[jira] [Created] (TIKA-4083) Add magic for ClamAV CDiff files

2023-06-15 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4083: Summary: Add magic for ClamAV CDiff files Key: TIKA-4083 URL: https://issues.apache.org/jira/browse/TIKA-4083 Project: Tika Issue Type: Sub-task

[jira] [Comment Edited] (TIKA-4081) Add magic for ZIM format

2023-06-14 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732764#comment-17732764 ] Gregory Lepore edited comment on TIKA-4081 at 6/14/23 9:10 PM: -

[jira] [Commented] (TIKA-4081) Add magic for ZIM format

2023-06-14 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732764#comment-17732764 ] Gregory Lepore commented on TIKA-4081: -- My bad, I wrote some code last week to sort t

[jira] [Updated] (TIKA-4081) Add magic for ZIM format

2023-06-14 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4081: - Attachment: gutenberg_ar_all_2021-05-1.zim wikipedia_ln_all_nopic_2021-03.zim > Ad

[jira] [Updated] (TIKA-4081) Add magic for ZIM format

2023-06-14 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4081: - Attachment: (was: 0a9541750cd3bf696a3b2d31f22636b52be02322ebf8d7e80d53ac0c28038f16) > Add mag

[jira] [Commented] (TIKA-4072) Add magic for Atari Floppy Disk Image Format

2023-06-14 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732737#comment-17732737 ] Gregory Lepore commented on TIKA-4072: -- Which I think could also be written as: 9

[jira] [Commented] (TIKA-4072) Add magic for Atari Floppy Disk Image Format

2023-06-14 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732735#comment-17732735 ] Gregory Lepore commented on TIKA-4072: -- That works. I also have a handful that have t

[jira] [Updated] (TIKA-4072) Add magic for Atari Floppy Disk Image Format

2023-06-14 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4072: - Attachment: ANIME_B-1.ATR > Add magic for Atari Floppy Disk Image Format > ---

[jira] [Updated] (TIKA-4072) Add magic for Atari Floppy Disk Image Format

2023-06-14 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4072: - Attachment: GAMES050-1.ATR lepix-cin.atr > Add magic for Atari Floppy Disk Image F

[jira] [Created] (TIKA-4081) Add magic for ZIM format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4081: Summary: Add magic for ZIM format Key: TIKA-4081 URL: https://issues.apache.org/jira/browse/TIKA-4081 Project: Tika Issue Type: Sub-task Reporter

[jira] [Created] (TIKA-4080) Add magic for Planetary Data System Version 2 format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4080: Summary: Add magic for Planetary Data System Version 2 format Key: TIKA-4080 URL: https://issues.apache.org/jira/browse/TIKA-4080 Project: Tika Issue Type: S

[jira] [Updated] (TIKA-4078) Add magic for IDL Binary Format Save File format

2023-06-13 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4078: - Description: The IDL Binary Format Save File format occurs 9,005 times in the latest Common Crawl

[jira] [Created] (TIKA-4079) Add magic for Planetary Data System Version 3 format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4079: Summary: Add magic for Planetary Data System Version 3 format Key: TIKA-4079 URL: https://issues.apache.org/jira/browse/TIKA-4079 Project: Tika Issue Type: S

[jira] [Created] (TIKA-4078) Add magic for IDL Binary Format Save File format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4078: Summary: Add magic for IDL Binary Format Save File format Key: TIKA-4078 URL: https://issues.apache.org/jira/browse/TIKA-4078 Project: Tika Issue Type: Sub-t

[jira] [Created] (TIKA-4077) Add magic for Modified Maximum Method Digisonde Portable Sounder File format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4077: Summary: Add magic for Modified Maximum Method Digisonde Portable Sounder File format Key: TIKA-4077 URL: https://issues.apache.org/jira/browse/TIKA-4077 Project: Tik

[jira] [Created] (TIKA-4076) Add magic for Touhou Project Replay File format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4076: Summary: Add magic for Touhou Project Replay File format Key: TIKA-4076 URL: https://issues.apache.org/jira/browse/TIKA-4076 Project: Tika Issue Type: Sub-ta

[jira] [Created] (TIKA-4075) Add magic for GRAPPA Database RADX File

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4075: Summary: Add magic for GRAPPA Database RADX File Key: TIKA-4075 URL: https://issues.apache.org/jira/browse/TIKA-4075 Project: Tika Issue Type: Sub-task

[jira] [Updated] (TIKA-4074) Add magic for TeX Virtual Font format

2023-06-13 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4074: - Description: The TeX Virtual Font format occurs 6,047 times in the second most recent Common Craw

[jira] [Created] (TIKA-4074) Add magic for TeX Virtual Font format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4074: Summary: Add magic for TeX Virtual Font format Key: TIKA-4074 URL: https://issues.apache.org/jira/browse/TIKA-4074 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-4073) Add magic for Guitar Pro format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4073: Summary: Add magic for Guitar Pro format Key: TIKA-4073 URL: https://issues.apache.org/jira/browse/TIKA-4073 Project: Tika Issue Type: Sub-task R

[jira] [Created] (TIKA-4072) Add magic for Atari Floppy Disk Image Format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4072: Summary: Add magic for Atari Floppy Disk Image Format Key: TIKA-4072 URL: https://issues.apache.org/jira/browse/TIKA-4072 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-4071) Add magic for Jigdo Download Template format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4071: Summary: Add magic for Jigdo Download Template format Key: TIKA-4071 URL: https://issues.apache.org/jira/browse/TIKA-4071 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-4070) Add magic for MS-DOS Compression Format (SZDD Variant)

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4070: Summary: Add magic for MS-DOS Compression Format (SZDD Variant) Key: TIKA-4070 URL: https://issues.apache.org/jira/browse/TIKA-4070 Project: Tika Issue Type:

[jira] [Updated] (TIKA-4069) Add magic for Mach-O format

2023-06-13 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4069: - Description: The Mach-O format occurs 542 times in the latest Common Crawl dataset. There is no k

[jira] [Updated] (TIKA-4068) Add magic for FAT Disk Image format

2023-06-13 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4068: - Attachment: 00dee3ef376fa8830f74713e5bf0043d19b026d40e7543782f6c37818b154502 3ba28c

[jira] [Created] (TIKA-4069) Add magic for Mach-O format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4069: Summary: Add magic for Mach-O format Key: TIKA-4069 URL: https://issues.apache.org/jira/browse/TIKA-4069 Project: Tika Issue Type: Sub-task Repor

[jira] [Created] (TIKA-4068) Add magic for FAT Disk Image format

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4068: Summary: Add magic for FAT Disk Image format Key: TIKA-4068 URL: https://issues.apache.org/jira/browse/TIKA-4068 Project: Tika Issue Type: Sub-task

[jira] [Updated] (TIKA-4067) Add magic for ASPRS Lidar data

2023-06-13 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4067: - Description: The ASPRS Lidar data format occurs over 11,000 times in the latest Common Crawl data

[jira] [Created] (TIKA-4067) Add magic for ASPRS Lidar data

2023-06-13 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4067: Summary: Add magic for ASPRS Lidar data Key: TIKA-4067 URL: https://issues.apache.org/jira/browse/TIKA-4067 Project: Tika Issue Type: Sub-task Re

[jira] [Commented] (TIKA-4060) Add magic to audio/aac in tika-mimetypes.xml

2023-06-07 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730315#comment-17730315 ] Gregory Lepore commented on TIKA-4060: -- I'm not 100% sure, but I think the offset is

[jira] [Comment Edited] (TIKA-4053) Improve detection of text files

2023-06-05 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727655#comment-17727655 ] Gregory Lepore edited comment on TIKA-4053 at 6/5/23 3:05 PM: --

[jira] [Updated] (TIKA-4060) Add magic to audio/aac in tika-mimetypes.xml

2023-05-31 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4060: - Attachment: cb1bec08898db7a733b42ac44bdd76b6177cd3a07a2435a83fd99b7453d564d1 > Add magic to audio/

[jira] [Updated] (TIKA-4060) Add magic to audio/aac in tika-mimetypes.xml

2023-05-31 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4060: - Attachment: 067aece423d8694a891a61a45ac0e870914bc1314ef510ac40b36ca3397843ef > Add magic to audio/

[jira] [Created] (TIKA-4060) Add magic to audio/aac in tika-mimetypes.xml

2023-05-31 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4060: Summary: Add magic to audio/aac in tika-mimetypes.xml Key: TIKA-4060 URL: https://issues.apache.org/jira/browse/TIKA-4060 Project: Tika Issue Type: Sub-task

[jira] [Commented] (TIKA-4059) Consider parsing common gzipped formats like we do with package files

2023-05-31 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728037#comment-17728037 ] Gregory Lepore commented on TIKA-4059: -- Looks like the SIARD people have developed th

[jira] [Commented] (TIKA-4059) Consider parsing common gzipped formats like we do with package files

2023-05-31 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728027#comment-17728027 ] Gregory Lepore commented on TIKA-4059: -- [http://justsolve.archiveteam.org/wiki/NII]

[jira] [Commented] (TIKA-3999) audio/xm audio/x-mod

2023-05-31 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727934#comment-17727934 ] Gregory Lepore commented on TIKA-3999: -- In order to make this list more manageable I

[jira] [Created] (TIKA-4058) Add file extension .rmd160 to tika-mimetypes.xml

2023-05-30 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4058: Summary: Add file extension .rmd160 to tika-mimetypes.xml Key: TIKA-4058 URL: https://issues.apache.org/jira/browse/TIKA-4058 Project: Tika Issue Type: Sub-t

[jira] [Commented] (TIKA-4053) Improve detection of text files

2023-05-30 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727655#comment-17727655 ] Gregory Lepore commented on TIKA-4053: -- ref: https://issues.apache.org/jira/browse/T

[jira] [Updated] (TIKA-4054) Add various file identifications to reduce application/octet-stream

2023-05-25 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4054: - Description: Catch all task for various format identification data which are currently being iden

[jira] [Updated] (TIKA-4054) Add various file identifications to reduce application/octet-stream

2023-05-25 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4054: - Description: Catch all task for various format identification data which are currently being iden

[jira] [Updated] (TIKA-4054) Add various file identifications to reduce application/octet-stream

2023-05-25 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4054: - Description: Catch all task for various format identification data which are currently being iden

[jira] [Created] (TIKA-4054) Add various file identifications to reduce application/octet-stream

2023-05-25 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4054: Summary: Add various file identifications to reduce application/octet-stream Key: TIKA-4054 URL: https://issues.apache.org/jira/browse/TIKA-4054 Project: Tika

[jira] [Created] (TIKA-4053) Improve detection of text files

2023-05-25 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4053: Summary: Improve detection of text files Key: TIKA-4053 URL: https://issues.apache.org/jira/browse/TIKA-4053 Project: Tika Issue Type: Sub-task R

[jira] [Created] (TIKA-4052) application/x-cdf

2023-05-25 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4052: Summary: application/x-cdf Key: TIKA-4052 URL: https://issues.apache.org/jira/browse/TIKA-4052 Project: Tika Issue Type: Sub-task Reporter: Grego

[jira] [Commented] (TIKA-4004) font/otf application/vnd.ms-opentype

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725986#comment-17725986 ] Gregory Lepore commented on TIKA-4004: -- What files are you seeing OTTO in? All of the

[jira] [Commented] (TIKA-4004) font/otf application/vnd.ms-opentype

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725979#comment-17725979 ] Gregory Lepore commented on TIKA-4004: -- Original format spec at: https://www.w3.o

[jira] [Commented] (TIKA-4004) font/otf application/vnd.ms-opentype

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725978#comment-17725978 ] Gregory Lepore commented on TIKA-4004: -- I downloaded some of the original files refer

[jira] [Updated] (TIKA-4004) font/otf application/vnd.ms-opentype

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4004: - Attachment: index.html_id=45_and_type=eot index.html_id=67_and_type=eot

[jira] [Commented] (TIKA-4003) application/vnd.isac.fcs

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725964#comment-17725964 ] Gregory Lepore commented on TIKA-4003: -- My samples only show 2.0 and 3.0, but the spe

[jira] [Commented] (TIKA-4002) application/vnd.tcpdump.pcap

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725963#comment-17725963 ] Gregory Lepore commented on TIKA-4002: -- There are two versions of pcap in PRONOM, I m

[jira] [Commented] (TIKA-3999) audio/xm audio/x-mod

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725919#comment-17725919 ] Gregory Lepore commented on TIKA-3999: -- Note that ffprobe is able to extract metadata

[jira] [Updated] (TIKA-4000) application/vnd.msa-disk-image

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4000: - Attachment: DREAMZ2B.MSA SOTART2.MSA TIKBGBB2.MSA > application/vn

[jira] [Updated] (TIKA-4004) font/otf application/vnd.ms-opentype

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4004: - Attachment: aller-bold.eot aller-light.eot fleurons.eot > font/otf

[jira] [Commented] (TIKA-3996) audio/x-sap

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725906#comment-17725906 ] Gregory Lepore commented on TIKA-3996: -- Offset 0: 5341500D0A Specification at: h

[jira] [Updated] (TIKA-3996) audio/x-sap

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-3996: - Attachment: airwolf.sap ala_ma_kota.sap alchemia.sap > audio/x-sap

[jira] [Commented] (TIKA-4002) application/vnd.tcpdump.pcap

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725904#comment-17725904 ] Gregory Lepore commented on TIKA-4002: -- [https://www.nationalarchives.gov.uk/PRONOM/F

[jira] [Updated] (TIKA-4002) application/vnd.tcpdump.pcap

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4002: - Attachment: fmt_779_pcap_Packet_Capture_small_capture.pcap > application/vnd.tcpdump.pcap > --

[jira] [Commented] (TIKA-4003) application/vnd.isac.fcs

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725900#comment-17725900 ] Gregory Lepore commented on TIKA-4003: -- [https://www.nationalarchives.gov.uk/PRONOM/F

[jira] [Updated] (TIKA-4003) application/vnd.isac.fcs

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-4003: - Attachment: 3215apc_14.fcs BD-FACS_Aria_II-Compensation_Controls_B515_Stained_C

[jira] [Commented] (TIKA-4005) application/x-endnote-style

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725896#comment-17725896 ] Gregory Lepore commented on TIKA-4005: -- Mostly correct, it looks like the signature w

[jira] [Commented] (TIKA-3992) Add common missing mimes based on Common Crawl data

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725873#comment-17725873 ] Gregory Lepore commented on TIKA-3992: -- Looking at the full-table.csv file there are

[jira] [Commented] (TIKA-4005) application/x-endnote-style

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725796#comment-17725796 ] Gregory Lepore commented on TIKA-4005: -- PRONOM/Siegfried signature information. http

[jira] [Commented] (TIKA-4004) font/otf application/vnd.ms-opentype

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725795#comment-17725795 ] Gregory Lepore commented on TIKA-4004: -- Magic looks like this: 02 00 02 00 at offset

[jira] [Comment Edited] (TIKA-3999) audio/xm audio/x-mod

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725782#comment-17725782 ] Gregory Lepore edited comment on TIKA-3999 at 5/24/23 12:50 PM:

[jira] [Commented] (TIKA-4000) application/vnd.msa-disk-image

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725783#comment-17725783 ] Gregory Lepore commented on TIKA-4000: -- Additional information on this format at:

[jira] [Commented] (TIKA-3999) audio/xm audio/x-mod

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725782#comment-17725782 ] Gregory Lepore commented on TIKA-3999: -- Attachment added for 130 tracker/module forma

[jira] [Updated] (TIKA-3999) audio/xm audio/x-mod

2023-05-24 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Lepore updated TIKA-3999: - Attachment: mods.xlsx > audio/xm audio/x-mod > > > Key: TIKA-

[jira] [Commented] (TIKA-3999) audio/xm audio/x-mod

2023-05-23 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725568#comment-17725568 ] Gregory Lepore commented on TIKA-3999: -- There is a chance of collisions with other ma

[jira] [Commented] (TIKA-3999) audio/xm audio/x-mod

2023-05-23 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725530#comment-17725530 ] Gregory Lepore commented on TIKA-3999: -- I'm all for increasing the accuracy of format

[jira] [Comment Edited] (TIKA-3999) audio/xm audio/x-mod

2023-05-23 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725391#comment-17725391 ] Gregory Lepore edited comment on TIKA-3999 at 5/23/23 6:02 PM: -
