Re: Inline Values and XSD Time Series

Andy Seaborne Sat, 03 Mar 2018 07:58:57 -0800


On 01/03/18 16:57, Marco Neumann wrote:

I'd like to see Jena/TDB be as powerful as possible in the future, but I
also don't mind delegating to an external index for now to get faster
data access. E.g. the Jena spatial extension gives me roughly 10x faster
data access for my kind of queries than similar FILTER-based range queries.


Useful data point.

And yes, there should indeed be a decent audience for improved time series
data performance in Jena as well. There might even be room for
standardization later on.

enjoy the snow,


I was - it started melting :-(

(not that Bristol gets much snow but we had some this year).

Marco


On Thu, Mar 1, 2018 at 5:36 PM, Andy Seaborne <[email protected]> wrote:



On 01/03/18 12:46, Marco Neumann wrote:

a query could look like this
<http://www.lotico.com:3030/lotico/sparql?query=PREFIX+spati
al%3A%3Chttp%3A%2F%2Fjena.apache.org%2Fspatial%23%3E%0D%
0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%
2Frdf-schema%23%3E%0D%0A%0D%0ASelect+*+%0D%0AWHERE%7B%0D%
0A%3Fs+spatial%3AdateRange%282011+2012-03%29.%0D%0A%3Fs+
rdfs%3Alabel+%3Fslabel.%0D%0AFILTER%28regex%28%3Fslabel%
2C%22Andy+Seaborne%22%2C%22i%22%29%29%0D%0A%7D%0D%0A&output=text>


PREFIX spatial:<http://jena.apache.org/spatial#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

Select *
WHERE{
?s spatial:dateRange(2011 2012-03).
?s rdfs:label ?slabel.
FILTER(regex(?slabel,"Andy Seaborne","i"))
}



Is that about doing it all in one index, or about ways to make that query
faster? Both make sense.

Find all

?x :atTime ?v . FILTER ( ?v in some datetime range )

which is about making triple patterns faster when there is a FILTER as
well.

If the triple access to the data can start in the right place and stop in
the right place (a range query), then it will be faster than the current
approach of accessing all values.
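
A minimal sketch of that difference, assuming a plain sorted list of
(value, subject) pairs stands in for the index. This is not TDB code; the
names build_time_index, filter_scan and range_scan and the ex: subjects
are made up for illustration.

```python
import bisect
from datetime import datetime

def build_time_index(triples):
    """Sort (value, subject) pairs so that a time range is one contiguous slice."""
    return sorted((value, subject) for subject, value in triples)

def filter_scan(index, start, end):
    """FILTER style: look at every value and keep those in [start, end)."""
    return [s for v, s in index if start <= v < end]

def range_scan(index, start, end):
    """Range style: binary-search the start and stop positions, then slice."""
    # A 1-tuple sorts before any (value, subject) pair with the same value,
    # so bisect_left finds the first entry whose value is >= the bound.
    lo = bisect.bisect_left(index, (start,))
    hi = bisect.bisect_left(index, (end,))
    return [s for _, s in index[lo:hi]]

# Hypothetical data: subject plus its :atTime value.
triples = [
    ("ex:a", datetime(2010, 6, 1)),
    ("ex:b", datetime(2011, 1, 15)),
    ("ex:c", datetime(2012, 2, 28)),
    ("ex:d", datetime(2012, 3, 1)),
]
index = build_time_index(triples)
start, end = datetime(2011, 1, 1), datetime(2012, 3, 1)
hits = range_scan(index, start, end)
```

Both functions return the same subjects; the range scan only touches the
entries between the two bounds instead of every value.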

That's all doable with the current data on disk (caveat: details!). It
helps widely but isn't optimal. (And it leaves the hard question of how to
do two discriminating selections/filters: in parallel and merge? Do text
and check in time? The other way round?)
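
The three options in that question can be sketched as follows. This is
only an illustration with hypothetical in-memory data (years as plain
integers, dictionaries instead of indexes); which option wins in practice
depends on how selective each filter is.

```python
def in_parallel_and_merge(time_hits, text_hits):
    """Run both selections independently, then intersect the results."""
    return sorted(set(time_hits) & set(text_hits))

def time_then_text(time_hits, labels, needle):
    """Drive from the time selection and check the text on each hit."""
    return sorted(s for s in time_hits
                  if needle.lower() in labels.get(s, "").lower())

def text_then_time(text_hits, times, start, end):
    """Drive from the text selection and check the time on each hit."""
    return sorted(s for s in text_hits
                  if s in times and start <= times[s] < end)

# Hypothetical data: labels and a year per subject.
labels = {"ex:a": "Andy Seaborne", "ex:b": "Marco Neumann",
          "ex:c": "andy seaborne (talk)"}
times = {"ex:a": 2011, "ex:b": 2011, "ex:c": 2015}
start, end, needle = 2011, 2013, "andy seaborne"

time_hits = [s for s, t in times.items() if start <= t < end]
text_hits = [s for s, l in labels.items() if needle in l.lower()]

merged = in_parallel_and_merge(time_hits, text_hits)
by_time = time_then_text(time_hits, labels, needle)
by_text = text_then_time(text_hits, times, start, end)
```

All three produce the same answer; they differ in how many candidates each
one has to touch before the second condition is applied.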


A new index that answers the whole query, or precalculated results for that
query, is separate storage. That is more complex for the end user but it
could be very powerful.

     Andy


On Thu, Mar 1, 2018 at 1:27 PM, Marco Neumann <[email protected]>
wrote:

https://lucidworks.com/2016/02/13/solrs-daterangefield-perform/


On Thu, Mar 1, 2018 at 1:22 PM, Andy Seaborne <[email protected]> wrote:



On 28/02/18 17:53, Marco Neumann wrote:


thank you, it's less than I hoped for



Concrete example?



but certainly more than what I can ask for Andy :)

In short, I'd like to get the xsd:dateTime scan out of the SPARQL
filter and perform a more efficient range query via a date index, similar
to the jena spatial implementation.

I am going to take a look at DateRangeField and see how it performs
relative to a standard SPARQL filter range query.

best,
Marco


On Tue, Feb 27, 2018 at 5:21 PM, Andy Seaborne <[email protected]>
wrote:



On 27/02/18 11:41, Marco Neumann wrote:



Hi Andy, (I presume you wrote the following below) could you please
elaborate on the significance of this contribution in TDB?




Hi Marco,

For certain XSD datatypes, the value is stored in the NodeId itself (64
bits minus the datatype indicator: 56 bits for TDB1, up to 62 bits for
TDB2 for xsd:doubles). It is faster to get the node back out of the
database.

If the value does not fit in the bits available, the long form is used.
In the long form, the NodeId is a pointer into the node table and the
node is stored as the lexical form + datatype (TDB1: in text; TDB2: in
binary / RDF Thrift). This also applies to strings and URIs.
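
A minimal sketch of the inline/long-form split, assuming a 64-bit id with
an 8-bit tag and a 56-bit payload. The tag values, the NodeTable class and
the function names are hypothetical and are not TDB's actual layout or API.

```python
VALUE_BITS = 56
VALUE_MASK = (1 << VALUE_BITS) - 1
PTR_TAG = 0x00   # hypothetical tag: payload is an offset into the node table
INT_TAG = 0x01   # hypothetical tag: payload is the integer value itself

class NodeTable:
    """Stand-in node table holding (lexical form, datatype) pairs."""
    def __init__(self):
        self.nodes = []
        self.index = {}

    def add(self, lexical, datatype):
        key = (lexical, datatype)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def get(self, offset):
        return self.nodes[offset]

def encode_integer(value, table):
    """Inline the value if it fits in 56 bits (signed), else use the long form."""
    lo, hi = -(1 << (VALUE_BITS - 1)), (1 << (VALUE_BITS - 1)) - 1
    if lo <= value <= hi:
        return (INT_TAG << VALUE_BITS) | (value & VALUE_MASK)
    return (PTR_TAG << VALUE_BITS) | table.add(str(value), "xsd:integer")

def decode(node_id, table):
    """Recover (lexical form, datatype); only the long form reads the table."""
    tag, payload = node_id >> VALUE_BITS, node_id & VALUE_MASK
    if tag == INT_TAG:
        if payload >= 1 << (VALUE_BITS - 1):   # undo two's complement
            payload -= 1 << VALUE_BITS
        return (str(payload), "xsd:integer")
    return table.get(payload)

table = NodeTable()
small = encode_integer(-5, table)      # fits: no node table entry
large = encode_integer(2**60, table)   # too big: stored in the node table
```

Decoding the inline id never touches the node table, which is where the
speed difference comes from.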

"The xsd:dateTime and xsd:date ranges cover about 8000 years from
year zero with a precision down to 1 millisecond. Timezone information is
retained to an accuracy of 15 minutes with special timezones for Z and
for no explicit timezone."




That's the limit for xsd:dateTime in 56 bits.
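
To see how such limits fall out of a 56-bit budget, here is one possible
field layout. The field widths, the timezone codes and the function names
are assumptions chosen only so that the widths sum to 56; this is not
claimed to be the layout TDB uses.

```python
# Hypothetical field widths, most significant first; they sum to 56 bits.
FIELDS = [
    ("year", 13),    # 2**13 = 8192, i.e. "about 8000 years from year zero"
    ("month", 4),
    ("day", 5),
    ("hour", 5),
    ("minute", 6),
    ("millis", 16),  # milliseconds within the minute: 0..59999 < 65536
    ("tz", 7),       # 15-minute steps plus two special codes
]
TZ_BIAS = 56    # -14:00 is -56 quarter hours, stored as code 0
TZ_Z = 126      # hypothetical special code for "Z"
TZ_NONE = 127   # hypothetical special code for "no explicit timezone"

def tz_code(offset_minutes):
    """Map a timezone offset in minutes to a 7-bit code (None = no timezone)."""
    if offset_minutes is None:
        return TZ_NONE
    if offset_minutes % 15 != 0:
        raise ValueError("offset is not a multiple of 15 minutes")
    return offset_minutes // 15 + TZ_BIAS

def pack(**values):
    """Pack the fields into one integer; reject anything that does not fit."""
    packed = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        packed = (packed << width) | v
    return packed

def unpack(packed):
    """Reverse of pack: peel fields off from the least significant end."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = packed & ((1 << width) - 1)
        packed >>= width
    return out

stamp = dict(year=2018, month=3, day=3, hour=7, minute=58,
             millis=57000, tz=tz_code(-480))
packed = pack(**stamp)
```

A value whose year is outside 0..8191, or whose offset is not a multiple
of 15 minutes, is rejected here; in the design described above such a value
would use the long form.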


https://jena.apache.org/documentation/tdb/architecture.html#inline-values

does this give us enhanced temporal access methods via TDB that are
exposed as property functions in SPARQL?




What exactly are you looking for here? Range queries or a database you
can view at a point in time? ("Temporal database" can mean either.)

You get the same SPARQL filter capabilities, but the inline form is
faster (measurably, and by quite a lot) because it does not go to the
node table.

Despite caching of the node table, it is still faster to get nodes out
of the DB from the inline form (and I'd like to go faster still).


Point-in-time database:

Not possible in TDB1.
Possible (but not exposed) in TDB2.  TDB2 never forgets!

In particular I'd be interested in range queries on xsd:dateTime here
and the possible use of DateRangeField (SOLR) alongside jena-spatial.




Range queries - it would be possible to start in the right place for a
range scan because the values are in sorted order under this design.

There is possible insert complexity for the different datatypes - it
might need a "this is a value-centric database" flag so that, e.g.,
integers, whether xsd:short or xsd:???, are stored as binary integers,
losing the datatype.

In TDB1, that's true; TDB2 does keep the original datatype. Both are
valid choices for different use cases.

Hope that answers your questions,

       Andy


Best,
Marco



--


---
Marco Neumann
KONA
