Hi Jan,
On 30/08/2023 14:58, Jan Eerdekens wrote:
> Hi,
>
> We've been evaluating and using Jena for about 1.5 years now, but have
> recently run into a perplexing issue. In a lot of different scenarios
> (ways of using Jena), we are getting exceptions like the one below:
>
> Caused by: org.apache.thrift.protocol.TProtocolException: Unrecognized type 0
>     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:140) ~[fuseki-server.jar:4.8.0]
>
> The different scenarios where it has happened are:
> - LOADing data into a dataset
> - compacting a dataset
> - querying a dataset
>
> In all those cases we've run into trouble and got an exception that
> mentions *org.apache.jena.tdb2.TDBException: NodeTableTRDF/Read* and
> *org.apache.thrift.protocol.TProtocolException: Unrecognized type 0*.
>
> What can cause this? This looks kinda similar to this mailing list
> question, https://www.mail-archive.com/[email protected]/msg20409.html,
> where it seems data corruption that potentially isn't recoverable is
> mentioned?
>
> The first time I encountered this issue was while running a series of
> sequential LOAD commands to prepare a large dataset for load testing. I
> used files of around 50 MB (I started off with bigger ones), and after
> about 20 to 25 LOADs I would get this error (the completion time of each
> LOAD also kept going up). For this scenario I was running locally (Jena
> Fuseki in Docker/Rancher) and running only the LOADs, with not much else
> apart from the occasional SELECT (via the Fuseki UI) to check performance
> while LOADing. Could that somehow cause data corruption and the exception
> we're seeing?

"Unrecognized type 0" has come up in a couple of cases.

It means the node table is corrupt, but the damage was done silently
at some point in the past. The "Unrecognized type 0" exception happens
some time later (not a few seconds later, but either after a restart or
after a long period of usage that has churned the node cache, possibly
many months).

There have been some fixes around compaction that addressed bugs in
this area; compaction has been the most common source of this problem.

Was this database originally created before 4.8.0?

If not, do you have a fixed scenario that recreates the situation with
4.9.0? If so, please raise a GitHub issue for it.
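
For the sequential-LOAD case, a fixed scenario could be driven by a small program along the lines of this minimal sketch (Java 11+, JDK `java.net.http` only). It sends each source as a SPARQL Update `LOAD` over the SPARQL 1.1 Protocol, prints how long each one takes, and stops at the first failure. The endpoint `http://localhost:3030/ds/update` (a dataset named `ds` on Fuseki's default port) and the class name `LoadRepro` are assumptions for illustration only. The IRIs given on the command line are resolved by the server, so `file:` IRIs must name paths visible inside the Fuseki container.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LoadRepro {
    // Assumed endpoint: dataset "ds" on Fuseki's default port (adjust as needed).
    static final String UPDATE_ENDPOINT = "http://localhost:3030/ds/update";

    // Builds a SPARQL Update LOAD for one source IRI; the server resolves the IRI.
    static String loadUpdate(String sourceIri) {
        return "LOAD <" + sourceIri + ">";
    }

    // Wraps the update in an HTTP POST, as described by the SPARQL 1.1 Protocol.
    static HttpRequest buildRequest(String endpoint, String update) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/sparql-update")
                .POST(HttpRequest.BodyPublishers.ofString(update))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (String iri : args) {
            long start = System.nanoTime();
            HttpResponse<String> resp = client.send(
                    buildRequest(UPDATE_ENDPOINT, loadUpdate(iri)),
                    HttpResponse.BodyHandlers.ofString());
            long millis = (System.nanoTime() - start) / 1_000_000;
            // Log per-LOAD timing so a steady slowdown is visible.
            System.out.printf("%s -> HTTP %d in %d ms%n", iri, resp.statusCode(), millis);
            if (resp.statusCode() / 100 != 2) {
                // Stop at the first failure and show the server's error body.
                System.err.println(resp.body());
                break;
            }
        }
    }
}
```

It could be run as `java LoadRepro.java file:///staging/part-01.ttl file:///staging/part-02.ttl` (hypothetical paths). Stopping at the first non-2xx response keeps the database in the state where the error first appeared, which is what a bug report needs.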

Another possible cause is another OS process interfering with the files
(in the container OS or the host OS). What operating system is the host
machine?

While TDB2 endeavours to protect against multiple copies of TDB using
the same files, that protection is imperfect when the database is on a
Docker volume mounted into two containers.

In one other report, the cause seemed to be a backup process running
over the files. We didn't get to the root cause of that one.

Andy

> regards,
> Jan Eerdekens