Re: difference between 3.13 and 3.17

Andy Seaborne Tue, 17 Aug 2021 12:12:10 -0700



On 17/08/2021 06:54, [email protected] wrote:

Hello, you were right, there were still unnecessary graphs in my source file. They have been removed in source2.zip.
The difference between the datasets 'target_in_memory' (NG combination done in the in-memory dataset) and 'target_persistent' (NG combination done in the persistent dataset) is that the 'target_in_memory' dataset doesn't have the predicate <http://www.example.org/rdf/data/pxt/isPresentationOfVariable> at all.
Unfortunately my SPARQL knowledge is not good (as you must have noticed), but I'm still responsible for this NG combination stuff and the combineNGs.sparql script, which was coded by a guy who has left the office.


This is going to be difficult.

Aside from the difference between datasets, the WHERE clauses that use ?g inside "GRAPH ?g" are wrong: older versions of Jena executed them incorrectly, and that is now fixed. The inner ?g is undefined in the BIND, and that leads to some patterns giving no results.

So, regardless of anything else, that's going to need fixing in your script, and the expected answers are going to need verifying.

---

The SPARQL script is a series of SPARQL Update statements separated by semicolons, so bisect to find the first update statement that shows a difference. Chop the last half of them off and see if the modified script produces differences. If yes, repeat. If no, chop only the last quarter off.
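The bisection can be sketched in Python. This is a minimal sketch under stated assumptions, not a Jena tool: `split_update` and `first_diverging` are hypothetical helpers, and `differs` is a callback you would write yourself (for example, one that runs the given statements against fresh in-memory and persistent datasets and compares the results). Splitting naively on ";" does not work, because ";" also separates predicate-object lists inside braces, so the splitter only cuts at brace depth 0 and skips strings, IRIs and "#" comments.

```python
import re
from typing import Callable, List, Optional

# IRIREF token from the SPARQL 1.1 grammar: '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
IRIREF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')


def split_update(script: str) -> List[str]:
    """Split a SPARQL Update request into its operations (heuristic sketch).

    A ';' ends an operation only at brace depth 0 and outside string
    literals, IRIs and '#' comments.
    """
    parts, start, depth, i = [], 0, 0, 0
    while i < len(script):
        c = script[i]
        if c in "\"'":
            # Skip a string literal, including the """long""" forms.
            quote = script[i:i + 3] if script[i:i + 3] in ('"""', "'''") else c
            i += len(quote)
            while i < len(script) and not script.startswith(quote, i):
                i += 2 if script[i] == "\\" else 1
            i += len(quote)
            continue
        if c == "<":
            m = IRIREF.match(script, i)
            if m:  # skip an IRI such as <http://example.org/a#b;c>
                i = m.end()
                continue
        if c == "#":  # a comment runs to the end of the line
            nl = script.find("\n", i)
            i = len(script) if nl == -1 else nl
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        elif c == ";" and depth == 0:
            parts.append(script[start:i].strip())
            start = i + 1
        i += 1
    parts.append(script[start:].strip())
    return [p for p in parts if p]


def first_diverging(ops: List[str],
                    differs: Callable[[List[str]], bool]) -> Optional[int]:
    """Return the smallest k such that differs(ops[:k]) is True.

    Assumes an empty prefix shows no difference and that, once a
    difference appears, longer prefixes keep showing it.
    """
    if not differs(ops):
        return None
    lo, hi = 0, len(ops)  # differs(ops[:hi]) is known to be True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if differs(ops[:mid]):
            hi = mid
        else:
            lo = mid
    return hi
```

`ops[k - 1]` is then the first statement whose result differs, and `";\n".join(ops[:k])` is the reduced script. Keeping a prefix of the script, rather than an arbitrary subset, preserves the PREFIX declarations at the top and the order of the updates.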

Also, replacing the first 20 lines with the rewrite I posted makes it easier - no need for servers, execute with the command-line tools "update" or "tdbupdate".


Similarly, bisect the data to find a smaller data sample.
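The same halving idea applies to the data. A minimal sketch, assuming the data has first been converted to N-Quads (one quad per line; Jena's `riot` command can write N-Quads from TriG) and that `reproduces` is a hypothetical callback that loads the given lines into fresh datasets, runs the script and reports whether the results still differ. This is a simplified form of delta debugging: it keeps whichever half still shows the difference and stops when neither half does on its own.

```python
from typing import Callable, List


def shrink(quads: List[str],
           reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Halve the data for as long as the difference still reproduces."""
    if not reproduces(quads):
        raise ValueError("the full data set must show the difference")
    while len(quads) > 1:
        mid = len(quads) // 2
        first, second = quads[:mid], quads[mid:]
        if reproduces(first):
            quads = first
        elif reproduces(second):
            quads = second
        else:
            break  # the difference needs quads from both halves
    return quads
```

If it stops with many quads left, the difference depends on quads from both halves, and smaller chunks (for example, one subject and its triples at a time) have to be tried by hand.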

Bisecting is easier when the data and intent are understood, and I don't know your application.

Also attached is changes.zip, a Word document in which the changes (data missing from the 'target_in_memory' dataset) are highlighted in yellow.

I'll look some more _sometime_, but to be fair to everyone, it has to fit around other reports.


    Andy

Br Jaana


Andy Seaborne wrote on 16.8.2021 19:35:
It's a bit smaller, but I notice it still has the "graph ?g { }" and the
data still has a default graph.

Does it really need all that data? I'd be surprised if it takes more
than one subject and its triples to show a difference.
There are multiple update steps - which is the first that makes a difference?
And which outcome is right and which is wrong?

Not knowing the right answer makes it much harder and much more
time-consuming to work out what's going on.

The data load is the same as a PUT of the data without the default
graph - so the first 20 lines can be done with a "curl -XPUT".
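For comparison with the "curl -XPUT" suggestion, a minimal Python sketch that builds the same kind of request with the standard library. It assumes, as the suggestion implies, that Fuseki accepts a PUT of TriG on the dataset URL and replaces the dataset contents with it; `put_dataset` is a hypothetical helper, and the TriG should contain only the named graphs (no default graph).

```python
import urllib.request


def put_dataset(dataset_url: str, trig: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT of TriG data to a dataset URL."""
    return urllib.request.Request(
        dataset_url,
        data=trig,
        method="PUT",
        headers={"Content-Type": "application/trig"},
    )
```

Passing the result to `urllib.request.urlopen` sends it, e.g. with the bytes of source.trig and the target URL http://localhost:3031/target_in_memory used later in this thread.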

    Andy

On 16/08/2021 11:09, [email protected] wrote:
Hello,
Sorry for providing you with too much data for reproducing the problem.
Here's a much smaller data set for the source and a slightly smaller script for combining the NGs,
and the steps to reproduce:

1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031
3) unzip and upload the attachment source.zip into the source apache-jena-fuseki-server on port 3030
curl -XPOST --header 'Content-Type: application/trig' --data-binary @source.trig http://localhost:3030/source
4) create one in-memory dataset (e.g. target_in_memory) and one persistent dataset (e.g. target_persistent) on the target apache-jena-fuseki-3.17.0-server running on port 3031
5) if needed, update the source apache-jena-fuseki-server and source dataset in the combine_NGs2.sparql script
6) run the combine_NGs2.sparql script in the in-memory dataset and the persistent dataset of the target apache-jena-fuseki-3.17.0-server, for instance in the jena-fuseki GUI query tab
or using curl:
curl -i -H "Content-Type: application/sparql-update" -X POST http://localhost:3031/target_in_memory/update --data-binary "@./combine_NGs2.sparql"
curl -i -H "Content-Type: application/sparql-update" -X POST http://localhost:3031/target_persistent/update --data-binary "@./combine_NGs2.sparql"
Compare the results. Even from the jena-fuseki GUI edit page, when opening the resulting default graphs in the editor tab, it can be seen that the in-memory dataset has less data than the persistent one.
As I've said before, this issue didn't occur with apache-jena-fuseki-3.13.
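The two curl calls can also be scripted so that both targets always get exactly the same update. A minimal Python sketch using only the standard library; `update_request` and `run_on_targets` are hypothetical helpers, and the URLs are the two target datasets from the steps above.

```python
import urllib.request

TARGETS = [
    "http://localhost:3031/target_in_memory",
    "http://localhost:3031/target_persistent",
]


def update_request(dataset_url: str, update: str) -> urllib.request.Request:
    """Same request as the curl POST of application/sparql-update to <ds>/update."""
    return urllib.request.Request(
        dataset_url.rstrip("/") + "/update",
        data=update.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/sparql-update"},
    )


def run_on_targets(script_path: str) -> None:
    """Send one update script to every target dataset (needs the server running)."""
    with open(script_path, encoding="utf-8") as f:
        update = f.read()
    for url in TARGETS:
        with urllib.request.urlopen(update_request(url, update)) as resp:
            print(url, resp.status)
```

`run_on_targets("combine_NGs2.sparql")` then plays the role of step 6; `urlopen` raises `urllib.error.HTTPError` if Fuseki rejects the update.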
And about your questions:
Where is the 3.13.0 server?
To see that this doesn't happen with 3.13, just replace the 3.17 server with 3.13 in my steps above. In my steps, I guess, the source server can be 3.13 or 3.17.
Does it need the data pulled from another server rather than executing on
already loaded data?
We didn't manage to combine the NGs within one server; we expected that Jena would try to use a proxy or something like that...
Br, Jaana

Andy Seaborne wrote on 13.8.2021 23:02:
On 13/08/2021 12:03, [email protected] wrote:
Andy Seaborne wrote on 11.8.2021 17:38:
Hi there,

There isn't enough information to see what's happening.
Hello,

see steps to repeat the issue below.
I've got all the parts of the example - it's not minimal though.
What is a small amount of data and a shorter update script that shows the problem?
Does it need the data pulled from another server rather than executing on
already loaded data?

There is a data file of 363,559 quads (which has warnings) and a SPARQL
update script of 241 lines.

To work out what is going on, someone has to reduce that large setup
to the part that causes the difference.

The first thing that script does is delete all the local data and pull
some, not all, data from the source server. Is that step necessary?

I don't believe it needs all the data and all the script to show a
difference, nor that it needs to pull the data out of one server and
put it in the local store in order to be different. Why not just load
something directly?


The rest of the update does some kind of manipulation of the data - I
don't understand what it is trying to do - its purpose relates to the
data model.

You are in a much better place to reduce that large script to a
minimal one that shows a difference because it's your application.

Does it need all those steps together to show the difference or just
one of them?  (BTW each update step is done independently: there'll be
a point where the answers start diverging.)

Looking at it though, the use of

where {
  {
    graph ?g { }.
  }
  graph ?g {
    .. some pattern ..
    .. some BIND involving ?g ..
  }
}

is pretty suspect.
Omit the first part and move the BIND to after the GRAPH block:

  graph ?g {
    .. some pattern ..
  }
  .. some BIND involving ?g ..
    Andy
Where is the 3.13.0 server?
1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031
3) unzip and upload the attachment ds.zip into the source apache-jena-fuseki-server on port 3030 using the command
curl -XPOST --header 'Content-Type: application/trig' --data-binary @ds.trig http://<source apache-jena-fuseki-server>:3030/<source dataset>
4) create one in-memory dataset and one persistent dataset on the target apache-jena-fuseki-3.17.0-server
5) update the source apache-jena-fuseki-server and source dataset in the combine_NGs.sparql file
6) run the combine_NGs.sparql script in the in-memory dataset and the persistent dataset of the target apache-jena-fuseki-3.17.0-server
Run how?
7) run query

    SELECT ?subject ?predicate ?object
      WHERE {
       ?subject ?predicate ?object
      }
in the in-memory dataset and the persistent dataset of the target apache-jena-fuseki-3.17.0-server and compare the results.
See the attachments in_memory.png and persistent.png for my results after the above procedure.
Those are screenshots of a count; just do

SELECT (Count(*) AS ?C) { ?s ?p ?o }
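To compare the two datasets without screenshots, the count can be fetched and read programmatically. A minimal sketch, assuming Fuseki's /query endpoint and the standard SPARQL 1.1 JSON results format; `count_request` and `parse_count` are hypothetical helpers. Note that `{ ?s ?p ?o }` counts only the default graph (unless the dataset is configured to use the union of its named graphs as the default graph), which is where the combined data ends up here.

```python
import json
import urllib.parse
import urllib.request

COUNT_QUERY = "SELECT (COUNT(*) AS ?C) { ?s ?p ?o }"


def count_request(dataset_url: str) -> urllib.request.Request:
    """GET the count query from the dataset's /query endpoint, asking for JSON."""
    params = urllib.parse.urlencode({"query": COUNT_QUERY})
    return urllib.request.Request(
        dataset_url.rstrip("/") + "/query?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )


def parse_count(body: str) -> int:
    """Read ?C from a SPARQL 1.1 JSON results document."""
    bindings = json.loads(body)["results"]["bindings"]
    return int(bindings[0]["C"]["value"])
```

For each target, `parse_count(urllib.request.urlopen(count_request(url)).read().decode("utf-8"))` gives the number to compare.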
Jaana
The first thing to do is dump, or Fuseki backup, the database from
each setup and see if they are the same.
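Comparing two dumps can be done as a set difference of N-Quads lines. A minimal sketch, assuming each dump is N-Quads (a Fuseki backup is a gzip-compressed N-Quads file) and that the data has no blank nodes: blank node labels are not stable between dumps, so with blank nodes this line comparison reports false differences and a graph-isomorphism check is needed instead. `read_dump` and `diff_dumps` are hypothetical helpers.

```python
import gzip
from typing import Set, Tuple


def read_dump(path: str) -> str:
    """Read a gzip-compressed N-Quads file, such as a Fuseki backup."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()


def quad_lines(nquads: str) -> Set[str]:
    """Non-empty, non-comment N-Quads lines, trimmed of surrounding whitespace."""
    return {line.strip() for line in nquads.splitlines()
            if line.strip() and not line.lstrip().startswith("#")}


def diff_dumps(a: str, b: str) -> Tuple[Set[str], Set[str]]:
    """Return (quads only in dump a, quads only in dump b)."""
    qa, qb = quad_lines(a), quad_lines(b)
    return qa - qb, qb - qa
```

Two empty sets mean the dumps hold the same quads.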

Then if they are, send a minimal reproducible example [1].
Something someone else can run.

    Andy

[1]
https://stackoverflow.com/help/minimal-reproducible-example


On 11/08/2021 13:35, [email protected] wrote:
Hello,
My jena-fuseki database consists of several named graphs. In order to provide users a GraphQL-like interface to jena-fuseki, I have to combine my NGs into one big default graph for HyperGraphQL (https://www.hypergraphql.org/), which provides the interface.
At some point the users started to get less data than before, and when I investigated the issue I noticed that this started after upgrading jena-fuseki from 3.13 to 3.17!
To combine the NGs I'm using the following command:
curl -i -H "Content-Type: application/sparql-update" -X POST http://<myTargetHost>:8061/<myDs>/update --data-binary "@./combine-NGs.sparql"
(see NGs_to_be_combined.txt for a hint of my source database, and combine-NGs.sparql for the executed script).
So I run the combine-NGs.sparql script in the target Jena host and the target dataset.
If I use an in-memory dataset as my target dataset, I get only half of the triples compared to the number of triples with a persistent dataset. This happens only with jena-fuseki 3.17.
In 3.13 I haven't seen this issue!

Br, Jaana
