monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130493082
> No, that probably won't work. Sorry. If you send me some examples, I can
try some things.
Can't send you examples unfortunately :(
I did manage to process thousands of files today w
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130489653
> Can you guarantee that reading per line will be ok on this json-disaster?
If so, that's the way to go.
>
> The other thing is that you'll want to specify the encoding on your
read
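Below is a minimal sketch of the line-by-line approach with the charset stated explicitly on both the read and the write. It assumes the files are UTF-8 and newline-delimited; neither is confirmed in the thread. The class and method names are illustrative only. Because only whole lines are written, a half-filled buffer can never be re-emitted. Note that `readLine()` drops the original line terminator, so this version writes `\n` after every line, including the last.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineByLineCopy {

    // Copies text one line at a time. Only complete lines are written, so a
    // partially filled buffer can never be re-emitted. The caller owns (and
    // closes) both streams.
    static void copyLines(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line);  // readLine() strips the line terminator...
            out.write('\n');  // ...so write a newline back after every line
        }
        out.flush();
    }

    // File variant: UTF-8 is an assumption, stated explicitly instead of
    // relying on the platform default charset.
    static void copyFile(Path src, Path dst) throws IOException {
        try (Reader in = Files.newBufferedReader(src, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(dst, StandardCharsets.UTF_8)) {
            copyLines(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        copyLines(new StringReader("{\"a\": 1}\r\n{\"b\": \"caf\u00e9\"}"), out);
        System.out.print(out); // CRLF normalized to LF, trailing newline added
    }
}
```

A side benefit of `Files.newBufferedReader(path, charset)`: it throws `MalformedInputException` on bytes that are not valid in the given charset. An encoding mismatch therefore surfaces as an error instead of silently corrupted output.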
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130489224
> Can you tell if they're writing utf8? Are there any accented or other
non-ASCII characters in the data that you can use to figure out what their
default encoding is?
If you can h
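One way to test the "are they writing UTF-8?" question on a sample file is to decode it strictly and see whether the decoder rejects it. This is a heuristic sketch, not Tika's detection logic, and the class name is hypothetical. Pure-ASCII data passes trivially and tells you nothing. A failure on bytes >= 0x80 suggests a single-byte charset such as ISO-8859-1 or windows-1252. For real detection, Tika's own `CharsetDetector` (in `org.apache.tika.parser.txt`) is the heavier-weight option.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // True if the bytes decode as strict UTF-8: malformed input is reported
    // as an exception instead of being replaced with U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Pure ASCII is also valid UTF-8, so only bytes >= 0x80 carry any
    // information about which encoding the writer used.
    static boolean hasNonAscii(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);        // e-acute = C3 A9
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // e-acute = E9
        System.out.println(isValidUtf8(utf8) + " " + isValidUtf8(latin1));
    }
}
```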
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130486579
> No, that probably won't work. Sorry. If you send me some examples, I can
try some things.
Yeah, we'd be OK if Jackson allowed "nan" as well as "NaN", as then we could use
JsonReadFeature
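Jackson's `JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS` accepts the `NaN` spelling but, per the comment above, not lowercase `nan`. One possible workaround is to rewrite bare `nan` tokens (any letter case) to `NaN` outside string literals before handing the text to Jackson. This sketch assumes those bare tokens are the only offending values. `normalizeNan` is a hypothetical helper, not part of Jackson or Tika, and it costs one extra pass over the text.

```java
public class NanNormalizer {

    // Hypothetical helper: rewrites bare nan tokens (any case) that appear
    // outside string literals to "NaN". Text inside strings is copied as-is.
    static String normalizeNan(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < json.length()) {
                    out.append(json.charAt(i + 1)); // copy the escaped char verbatim
                    i += 2;
                    continue;
                }
                if (c == '"') inString = false;
                i++;
            } else if (c == '"') {
                inString = true;
                out.append(c);
                i++;
            } else if (json.regionMatches(true, i, "nan", 0, 3)
                    && !isWordChar(json, i - 1) && !isWordChar(json, i + 3)) {
                out.append("NaN"); // standalone token, not part of a longer word
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if position p holds a letter, digit or underscore.
    private static boolean isWordChar(String s, int p) {
        if (p < 0 || p >= s.length()) return false;
        char c = s.charAt(p);
        return Character.isLetterOrDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        String in = "{\"x\": nan, \"y\": [NAN, 1.5], \"label\": \"nan stays\"}";
        System.out.println(normalizeNan(in));
    }
}
```

With Jackson 2.10 or later, the normalized string could then be read by a mapper built with `JsonMapper.builder().enable(JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS).build()`.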
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130393422
If I use a BufferedReader I get the correct output, but it's slower: 3s vs 10s
(it's quite a large file).
```java
public void jsonConvert() throws FileNotFoundException, IOException {
```
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130370795
If I read byte by byte (i.e. `byte[] bytes = new byte[1];`), I get the correct
result:
[image: screenshot of the result]
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130203562
Help! @tballison @nddipiazza Any idea why this section would sometimes
write extra lines out? On some JSON files, when cleaning up, it writes out the
file correctly and then appends another
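The code behind this question isn't shown, so the following is an assumption about the cause, not a diagnosis. "Correct output followed by an extra fragment" is the classic symptom of writing the whole buffer after a partial `read()`. That would also explain the byte-by-byte comment above: a 1-byte read can never be partial. Here is a sketch of the buggy and fixed patterns, with hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedCopy {

    // Suspected buggy pattern: writes the whole buffer even when read() filled
    // only part of it, so the last chunk re-emits stale bytes from the
    // previous iteration.
    static void copyBuggy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buffer = new byte[bufSize];
        while (in.read(buffer) != -1) {
            out.write(buffer);
        }
    }

    // Fixed pattern: write only the n bytes that were actually read.
    static void copyFixed(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buffer = new byte[bufSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    static String run(boolean fixed, String text, int bufSize) throws IOException {
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (fixed) {
            copyFixed(in, out, bufSize);
        } else {
            copyBuggy(in, out, bufSize);
        }
        return out.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        String text = "line1\nline2\nline3\n";      // 18 bytes
        System.out.print(run(false, text, 8));      // buggy: extra "2\nline" after the real end
        System.out.println("---");
        System.out.print(run(true, text, 8));       // fixed: identical to the input
        System.out.println("---");
        System.out.print(run(false, text, 1));      // 1-byte buffer: always full, bug hidden
    }
}
```

The fixed version keeps the bulk-buffer reads, so it should not pay the per-byte cost of the 1-byte workaround.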
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1128022252
>
> > > @nddipiazza @tballison This looks messy, can you advise a way to clean
it up? A better way of doing it? Still think it's worth having the comments
there?
> >
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1126521837
@nddipiazza @tballison
This looks messy, can you advise a way to clean it up? A better way of doing
it? Still think it's worth having the comments there?
https://github.com/apache/ti
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1126153049
> Should we use TestContainers to test this within a Docker container to
make sure it works? Or is it sufficient to run the test only if dwgread is
installed?
Would thi
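For the "run the test only if dwgread is installed" option, here is a minimal sketch of the availability check using only the JDK. `isOnPath` is a hypothetical helper, and a Testcontainers-based setup is not shown. In a JUnit 5 test, the result could feed `Assumptions.assumeTrue(...)`, which marks the test as skipped rather than failed when the tool is missing.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class ExecutableCheck {

    // Hypothetical helper: true if a regular, executable file called `name`
    // exists in one of the PATH directories (".exe" is also tried on Windows).
    static boolean isOnPath(String name) {
        String path = System.getenv("PATH");
        if (path == null || name.isEmpty()) return false;
        boolean windows = System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) continue;
            try {
                if (isExecutableFile(Paths.get(dir, name))) return true;
                if (windows && isExecutableFile(Paths.get(dir, name + ".exe"))) return true;
            } catch (InvalidPathException e) {
                // skip malformed PATH entries
            }
        }
        return false;
    }

    // Directories can also be "executable" (searchable), so require a regular file.
    private static boolean isExecutableFile(Path p) {
        return Files.isRegularFile(p) && Files.isExecutable(p);
    }

    public static void main(String[] args) {
        System.out.println("dwgread on PATH: " + isOnPath("dwgread"));
    }
}
```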
monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1126150480
> @tballison @monkmachine
>
> > Or do you want to use our current parser only if the dwg executable is
not available.
>
> I would vote +1 on _use current parser only if the dw
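Here is a sketch of the "use the current parser only if the dwg executable is not available" rule. The enum values and class are hypothetical labels, not Tika classes. The availability check is passed in (for example, a PATH lookup), and its result is cached, so the check runs once per instance rather than once per document. Caching is a design choice here: it trades picking up a newly installed tool at runtime for not shelling out or scanning PATH on every parse.

```java
import java.util.function.BooleanSupplier;

public class DwgParserSelection {

    // Hypothetical labels for the two code paths discussed in the thread.
    enum Strategy { EXTERNAL_DWGREAD, CURRENT_PARSER }

    private final BooleanSupplier dwgreadAvailable;
    private Boolean cached; // null until the first check

    DwgParserSelection(BooleanSupplier dwgreadAvailable) {
        this.dwgreadAvailable = dwgreadAvailable;
    }

    // Prefer dwgread; fall back to the current parser only when it is missing.
    synchronized Strategy choose() {
        if (cached == null) {
            cached = dwgreadAvailable.getAsBoolean();
        }
        return cached ? Strategy.EXTERNAL_DWGREAD : Strategy.CURRENT_PARSER;
    }

    public static void main(String[] args) {
        System.out.println(new DwgParserSelection(() -> false).choose());
    }
}
```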