[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-18 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1130447575 No, that probably won't work. Sorry. If you send me some examples, I can try some things. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-18 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1130439472 If NaN is the only problem, is there any way to tell Jackson to be lax? Maybe something like:

[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-18 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1130433951 Can you tell if they're writing UTF-8? Are there any accented or other non-ASCII characters in the data that you can use to figure out what their default encoding is?
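
One way to approach the encoding question above is to run a strict UTF-8 decode over a sample of the tool's output. The following is only a sketch using the JDK's `CharsetDecoder`; the class name `Utf8Check` and method `isValidUtf8` are hypothetical and do not come from the PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika or the PR under discussion.
public class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 with no malformed sequences. */
    public static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Pure-ASCII input passes this check under almost any single-byte encoding, so the result is only informative when the sample contains non-ASCII bytes, such as accented characters.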

[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-18 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1130431570 Can you guarantee that reading per line will be ok on this json-disaster? If so, that's the way to go. The other thing is that it may be better to specify the encoding and work on the
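
Reading per line with an explicitly specified encoding could look roughly like the sketch below. It assumes the output is safe to split on line breaks, which is exactly what the comment above asks about. The names `LineReaderSketch` and `forEachLine` are hypothetical and are not taken from the PR.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.function.Consumer;

// Hypothetical helper, not part of Tika or the PR under discussion.
public class LineReaderSketch {

    /**
     * Decodes the stream with the given charset and passes each line to the
     * handler without holding the whole input in memory.
     * Returns the number of lines read.
     */
    public static int forEachLine(InputStream in, Charset charset,
                                  Consumer<String> handler) throws IOException {
        int count = 0;
        // Passing the charset explicitly avoids depending on the platform default.
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        }
        return count;
    }
}
```

Handing lines to a callback rather than collecting them keeps memory use flat for large files; the caller decides whether to repair, skip, or parse each line.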

[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-18 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1130425745 > If I use buffer reader I get the correct output but it's slower: 3s vs 10s (it's quite a large file) > > ``` > > > >//FileInputStream fis = new

[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-16 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1127896442 > > @nddipiazza @tballison This looks messy, can you advise a way to clean it up? A better way of doing it? Still think it's worth having the comments there? > > OMG, what a mess. The

[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-13 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1126530862 > @nddipiazza @tballison This looks messy, can you advise a way to clean it up? A better way of doing it? Still think it's worth having the comments there? OMG, what a mess. The

[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-13 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1126298774 > should we use TestContainers to test this within a docker container to make sure it works? or is it sufficient to just run test only if dwgread is installed? Maybe? I worry about

[GitHub] [tika] tballison commented on pull request #558: TIKA-1735 - Adding DWGRead parser to Tika if available

2022-05-13 Thread GitBox
tballison commented on PR #558: URL: https://github.com/apache/tika/pull/558#issuecomment-1126297587 > are you kidding me? you're awesome! all looks great to me. +1 Thank you!