On 03/01/2022 17:44, robert.ba...@tiscali.it wrote:
Hi,

You are right, I was not clear in my request. Let me try to explain better.
I have a knowledge base of over a billion triples.
I am testing a query that returns about 2 million results (in the future I will have many queries that return a lot of data).
On the client side I have to allow the results to be downloaded in CSV format (on asynchronous request, not through batch).

How long does it take?

But, with these volumes of data, we can have 2 types of errors:
- OutOfMemory on the result (I can increase the heap size...)

How are you making the query? (what software?)

Fuseki will stream results back and, together with the Jena client code, can provide an end-to-end streaming solution.

The fastest results format is the binary Thrift encoding.

RDFConnectionFuseki will use this.

Some queries don't stream.
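
For a client that is not Jena-based, the streaming behaviour described above can be approximated over plain HTTP. The following is a minimal Python sketch, not the Jena client: it POSTs a SELECT query using the standard SPARQL protocol, asks for text/csv, and copies the response to disk in fixed-size chunks so the full result never has to fit in client memory. The function name stream_select_to_csv, the chunk size and the output path are illustrative assumptions, and it uses CSV rather than the Thrift encoding that RDFConnectionFuseki uses.

```python
import urllib.parse
import urllib.request


def stream_select_to_csv(endpoint, query, out_path, chunk_size=64 * 1024, timeout=None):
    """POST a SPARQL SELECT query and copy the CSV response to a file in chunks.

    Returns the number of bytes written. timeout=None means the client
    itself never times out; any server-side or proxy timeout still applies.
    """
    # SPARQL protocol: query sent as a URL-encoded form field named "query".
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,  # supplying data makes this a POST request
        headers={
            "Accept": "text/csv",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    written = 0
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(out_path, "wb") as out:
        while True:
            # Read at most chunk_size bytes at a time instead of the whole body.
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

Assuming the endpoint mentioned in this thread, a call might look like stream_select_to_csv("http://localhost:3030/ds/query", "SELECT * { ?s ?p ?o } LIMIT 10", "results.csv").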

- Connection timeout on Fuseki (can I increase the configuration timeout?)

What is timing it out? Some intermediary?

Fuseki by default does not have timeouts. Your configuration may set them but the default is unbounded.

If you have set timeouts, you can create another service to the same database with different settings. It shares the TDB database safely.

For this reason I was thinking of using the tdbquery command (the query takes 3 minutes to run with tdbquery). But I can't stop Fuseki to perform the download operation. Fuseki must remain active at all times to answer all other queries.

You can't use tdbquery this way.

It should cause an error saying "already in use" or some such message. There is locking on the file system to detect dual use.

With virtualized setups it may be possible not to get the error, because file systems are weird, but all that has happened is that the locking is not seeing the duplicate use. It does not mean that dual use is possible.

You will corrupt the database.

Corrupt = permanently damage, not recoverable.

    Andy


On 03.01.2022 17:25 Rinor Sefa wrote:

I think if you describe your use case in more detail, it would be easier to get help.

For example, can you clarify:
- "a query": what kind of query?
- "many results": any number?
- What do you consider slow and inefficient, and what would you consider ideal?

Also, why do you think that the HTTP call is the bottleneck? I think that this is a wrong assumption. Try to run a simple query and you will see that the HTTP call is not the bottleneck.

-----Original Message-----
From: robert.ba...@tiscali.it
Sent: Monday, 3 January 2022 12:59
To: users@jena.apache.org
Subject: Use command tdbquery

Hi,


I am using a Fuseki server and need to run a query which returns a lot of results. Using the HTTP call (http://localhost:3030/ds/query=myQuery) is very slow and inefficient. I thought about using the tdbquery command, but I don't want to stop Fuseki. Is there any way to do this?
