Andy, I profiled the query locally in a standard Eclipse/YourKit
configuration, where it took 45s. Looks like this is just a matter
of increasing heap space to allow Fuseki to complete the query now.

(slice 0 10000
 (project (?r ?count)
   (extend ((?count ?.0))
     (group (?s ?r) ((?.0 (count)))
       (join
         (bgp (triple ?x ?r ?s))
         (slice 124639 1000
           (project (?s)
             (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o)))))))))
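
For anyone following along, here is a minimal Python sketch of what the
algebra above computes, run over in-memory tuples. It only illustrates the
semantics (inner slice, join on ?s, group and count). It says nothing about
how Jena or Virtuoso actually execute the plan. The function name `evaluate`
and the list-of-tuples data model are assumptions made for illustration.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def evaluate(triples, inner_offset, inner_limit, outer_offset=0, outer_limit=10000):
    """Toy evaluation of the algebra over a list of (s, p, o) tuples."""
    # (slice off lim (project (?s) (bgp (triple ?s rdf:type ?o))))
    # One row per rdf:type triple, so a subject with two types appears twice.
    typed_subjects = [s for (s, p, _o) in triples if p == RDF_TYPE]
    window = typed_subjects[inner_offset:inner_offset + inner_limit]

    # (join (bgp (triple ?x ?r ?s)) <inner>): ?s is the shared variable, so
    # each inner row pairs with every triple whose object equals that ?s.
    predicates_by_object = {}
    for (_x, r, s) in triples:
        predicates_by_object.setdefault(s, []).append(r)

    # (group (?s ?r) (count)): count joined rows per (?s, ?r) pair.
    counts = Counter()
    for s in window:
        for r in predicates_by_object.get(s, []):
            counts[(s, r)] += 1

    # (project (?r ?count)) followed by the outer (slice 0 10000).
    # Without ORDER BY the row order is unspecified; here it is insertion order.
    rows = [(r, n) for (_s, r), n in counts.items()]
    return rows[outer_offset:outer_offset + outer_limit]


if __name__ == "__main__":
    data = [
        ("a", RDF_TYPE, "T"),
        ("b", RDF_TYPE, "T"),
        ("x1", "knows", "b"),
        ("x2", "knows", "b"),
    ]
    print(evaluate(data, inner_offset=1, inner_limit=1))
```

The point of the sketch is that the pattern ?x ?r ?s has no constant in it,
so the join touches every triple whose object falls inside the 1000-row
window, and the group has to hold a counter per (?s, ?r) pair.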

On Wed, Dec 19, 2018 at 10:57 AM Marco Neumann <marco.neum...@gmail.com> wrote:
>
> So yes, an Apache proxy and a query timeout limit for Fuseki instances
> it will be.
>
> I just checked the same query on an open source Virtuoso instance
> (7.2) with the same data, and it seems that Virtuoso handles the
> request with far fewer resources and runs it to completion. Andy, can
> you enlighten me as to what the main difference is in the treatment of
> the query by Jena (39s) vs Virtuoso (1s)?
>
> On Wed, Dec 19, 2018 at 6:56 AM Laura Morales <laure...@mail.com> wrote:
> >
> > > and needs some explaining why we put open endpoints on the web without 
> > > great restrictions
> >
> > I've always been puzzled by this as well. You never see a publicly
> > reachable PostgreSQL or MariaDB server, or any other database. There is
> > always a layer in between which defines a list of possible requests, and
> > then every request is optimized to retrieve data from the database. With a
> > public endpoint instead, this optimization is not possible since anybody
> > can write any query. I think the reason is simply that a sparql endpoint is
> > supposed to answer any type of query which traverses any path that is not
> > well defined a priori. If you only want the server to serve a specific kind
> > of query instead, you can in fact use some kind of REST API
> > in front of it and translate every request to a sparql query; in this
> > scenario you don't need the endpoint to be public, but you're limiting the
> > type of queries that a user can ask.
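
The "REST API in front of it" idea above can be sketched in Python as a small
whitelist of query templates. This is only an illustration under stated
assumptions: the template names, the `build_query` helper and the `MAX_LIMIT`
cap are all hypothetical, and nothing here is part of Fuseki.

```python
import re

# Hypothetical whitelist: request name -> SPARQL template.
# Doubled braces are literal braces for str.format().
TEMPLATES = {
    "types_of": "SELECT ?o WHERE {{ <{iri}> a ?o }} LIMIT {limit}",
    "count_by_type": "SELECT (count(*) AS ?count) WHERE {{ ?s a <{iri}> }}",
}

# Conservative check mirroring the characters SPARQL forbids inside <...>,
# so a parameter cannot close the IRI and inject extra query text.
_IRI_OK = re.compile(r'^[^\s<>"{}|\\^`]+$')

MAX_LIMIT = 1000  # assumed upper bound on rows a caller may request


def build_query(name, iri, limit=100):
    """Translate a named request into a SPARQL query string."""
    if name not in TEMPLATES:
        raise KeyError("unknown request: " + name)
    if not _IRI_OK.match(iri):
        raise ValueError("invalid IRI parameter")
    limit = max(1, min(int(limit), MAX_LIMIT))
    # str.format() ignores keyword arguments a template does not use.
    return TEMPLATES[name].format(iri=iri, limit=limit)


if __name__ == "__main__":
    print(build_query("types_of", "http://example.org/a", limit=5000))
```

Only the whitelisted shapes ever reach the store, which is exactly the
trade-off described above: predictable cost, but no arbitrary queries.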
> >
> >
> >
> >
> > Sent: Tuesday, December 18, 2018 at 11:40 PM
> > From: "Marco Neumann" <marco.neum...@gmail.com>
> > To: "Bruno P. Kinoshita" <brunodepau...@yahoo.com.br>, users@jena.apache.org
> > Subject: Re: blocking IP to prevent malicious sparql queries
> > It's good to see people using sparql one way or another. It's still an
> > unusual thing in the wild, and it needs some explaining why we put open
> > endpoints on the web without great restrictions. But since this one is
> > intended to be a sandbox to play with and learn, I do indeed take a
> > positive view of this incident.
> >
> > On Tue 18 Dec 2018 at 21:34, Bruno P. Kinoshita
> > <brunodepau...@yahoo.com.br.invalid> wrote:
> >
> > > I think Laura's option is the best/easiest one, and good on you for the
> > > positive point-of-view on these spams Marco! :D
> > > Bruno
> > >
> > > From: Marco Neumann <marco.neum...@gmail.com>
> > > To: users@jena.apache.org
> > > Sent: Wednesday, 19 December 2018 8:58 AM
> > > Subject: Re: blocking IP to prevent malicious sparql queries
> > >
> > > Thank you Laura,
> > >
> > > I was hoping for a quick fix and something along the lines of a fuseki
> > > blacklist filter in the shiro.ini
> > >
> > > but yes the reverse proxy is probably a more sensible approach at this
> > > point.
> > >
> > > In any event good to see sparql spam like this here, it means that the
> > > Semantic Web has most certainly arrived in the mainstream ;)
> > >
> > >
> > >
> > > On Tue, Dec 18, 2018 at 5:35 PM Laura Morales <laure...@mail.com> wrote:
> > >
> > > > > While I think the correct answer is YES (perhaps by implementing a
> > > > > custom filter), I guess the answer is going to be "use a reverse
> > > > > proxy".
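
Whichever layer ends up doing it (reverse proxy or custom filter), the check
itself is small. A minimal Python sketch using only the standard `ipaddress`
module follows. The helper names `make_blocklist` and `is_blocked` are
hypothetical, and this is not Shiro or Fuseki configuration.

```python
import ipaddress


def make_blocklist(entries):
    """Parse single addresses or CIDR ranges into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(remote_addr, blocklist):
    """Return True if remote_addr falls inside any blocked network."""
    addr = ipaddress.ip_address(remote_addr)
    # An IPv4 address is never "in" an IPv6 network, and vice versa.
    return any(addr in network for network in blocklist)


if __name__ == "__main__":
    # The address reported in this thread, as a single-host entry.
    blocklist = make_blocklist(["193.52.210.70"])
    print(is_blocked("193.52.210.70", blocklist))
    print(is_blocked("193.52.210.71", blocklist))
```

Note that behind a proxy the address the application sees is usually the
proxy's own, so a check like this only makes sense at the outermost layer.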
> > > >
> > > >
> > > >
> > > >
> > > > Sent: Tuesday, December 18, 2018 at 6:16 PM
> > > > From: "Marco Neumann" <marco.neum...@gmail.com>
> > > > To: users@jena.apache.org
> > > > Subject: blocking IP to prevent malicious sparql queries
> > > > > Is it possible to block individual IPs with the shiro.ini?
> > > >
> > > > > We have received a number of malicious sparql queries today from
> > > > > an IP in France (193.52.210.70) that continuously issues the
> > > > > following SPARQL query:
> > > >
> > > > SELECT ?r (count(*) AS ?count)
> > > > WHERE{ ?x ?r ?s
> > > > { SELECT ?s WHERE
> > > > { ?s a ?o }
> > > > OFFSET 124639 LIMIT 1000 }
> > > > } GROUP BY ?s ?r OFFSET 0 LIMIT 10000
> > > >
> > > > resulting in:
> > > >
> > > > [2018-12-18 18:10:31] AbstractConnector WARN
> > > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > [2018-12-18 18:10:34] Fuseki WARN [424] RC = 500 : GC overhead limit
> > > > exceeded
> > > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > [2018-12-18 18:10:34] Fuseki INFO [424] 500 GC overhead limit exceeded
> > > > (39.946 s)
> > > >
> > > > and pushes fuseki offline for a few minutes.
> > > >
> > > >
> > > > --
> > > >
> > > >
> > > > ---
> > > > Marco Neumann
> > > > KONA
> > > >
> > >
> > >
> > > --
> > >
> > >
> > > ---
> > > Marco Neumann
> > > KONA
> > >
> > >
> > >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> > KONA
>
>
>
> --
>
>
> ---
> Marco Neumann
> KONA



--


---
Marco Neumann
KONA
