I'd like to see Jena/TDB become as powerful as possible in the future, but
I also don't mind delegating to an external index for now to get faster
data access. E.g. the Jena spatial extension gives me roughly 10x faster
data access for my kind of queries than similar FILTER-based range queries.

And yes, there should indeed be a decent audience for improved time series
data performance in Jena as well. There might even be room for
standardization later on.

enjoy the snow,
Marco


On Thu, Mar 1, 2018 at 5:36 PM, Andy Seaborne <a...@apache.org> wrote:

>
>
> On 01/03/18 12:46, Marco Neumann wrote:
>
>> a query could look like this
>> <http://www.lotico.com:3030/lotico/sparql?query=PREFIX+spati
>> al%3A%3Chttp%3A%2F%2Fjena.apache.org%2Fspatial%23%3E%0D%
>> 0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%
>> 2Frdf-schema%23%3E%0D%0A%0D%0ASelect+*+%0D%0AWHERE%7B%0D%
>> 0A%3Fs+spatial%3AdateRange%282011+2012-03%29.%0D%0A%3Fs+
>> rdfs%3Alabel+%3Fslabel.%0D%0AFILTER%28regex%28%3Fslabel%
>> 2C%22Andy+Seaborne%22%2C%22i%22%29%29%0D%0A%7D%0D%0A&output=text>
>>
>>
>> PREFIX spatial:<http://jena.apache.org/spatial#>
>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>
>> Select *
>> WHERE{
>> ?s spatial:dateRange(2011 2012-03).
>> ?s rdfs:label ?slabel.
>> FILTER(regex(?slabel,"Andy Seaborne","i"))
>> }
>>
>
>
> Is that about having it all in one index, or about ways to make that query
> faster? Both make sense.
>
> Find all
>
> ?x :atTime ?v . FILTER ( ?v in some datetime range )
>
> which is about making triple patterns faster when there is a FILTER as
> well.
>
> If the triple access to the data can start in the right place and stop in
> the right place (a range query), then it will be faster than the current
> approach of accessing all values.
>
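A minimal sketch of the "start in the right place, stop in the right place"
idea, in Python rather than anything TDB-specific. The index contents, the
subject names and both function names are made up for illustration; it
assumes timestamps in one fixed ISO format without timezones, so that string
order equals time order.

```python
from bisect import bisect_left

# Hypothetical index: (timestamp, subject) pairs kept in sorted order.
index = sorted([
    ("2010-06-01T00:00:00", "ex:a"),
    ("2011-02-15T12:00:00", "ex:b"),
    ("2011-11-30T08:30:00", "ex:c"),
    ("2012-03-01T00:00:00", "ex:d"),
    ("2013-01-01T00:00:00", "ex:e"),
])

def filter_scan(index, lo, hi):
    """Visit every entry and test it, like a FILTER applied to all values."""
    return [s for (t, s) in index if lo <= t < hi]

def range_scan(index, lo, hi):
    """Binary-search the start and stop positions, then read only that slice."""
    # A 1-tuple (lo,) sorts before any (lo, subject) pair, so bisect_left
    # lands on the first entry whose timestamp is >= lo (likewise for hi).
    start = bisect_left(index, (lo,))
    stop = bisect_left(index, (hi,))
    return [s for (_, s) in index[start:stop]]

lo, hi = "2011-01-01T00:00:00", "2012-01-01T00:00:00"
print(filter_scan(index, lo, hi))
print(range_scan(index, lo, hi))
```

Both calls return the same subjects; the difference is that the range scan
touches only the entries between the two positions instead of all of them.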
> That's all doable with the current data on disk (caveat details!). It helps
> widely but isn't optimal. (And it leaves the hard question of how to do two
> discriminating selections/filters: in parallel and merge? do text and check
> in time? other way round?)
>
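The two options in that question can be sketched in Python as follows. This
is only an illustration under assumptions: the id lists are invented, each
selection is assumed to yield sorted ids, and neither function reflects how
Jena actually evaluates such a query.

```python
def intersect_sorted(a, b):
    """'In parallel and merge': walk two sorted id lists in step."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def probe(candidates, predicate):
    """'Do one and check the other': drive from one selection and test
    the second condition on each candidate."""
    return [c for c in candidates if predicate(c)]

# Hypothetical results of each selection on its own, as sorted ids.
text_hits = [2, 5, 9, 14]
time_hits = [1, 2, 3, 9, 10, 14, 20]

merged = intersect_sorted(text_hits, time_hits)

# Text first, then check time (a set stands in for a cheap per-id time test).
time_set = set(time_hits)
probed = probe(text_hits, lambda node_id: node_id in time_set)

print(merged, probed)
```

Which order is cheaper depends on which selection is more discriminating:
driving from the smaller result keeps the number of checks low.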
>
> A new index that answers that whole query, or precalculated results for
> that query, means separate storage. More complex for the end user but it
> could be very powerful.
>
>     Andy
>
>
>
>>
>> On Thu, Mar 1, 2018 at 1:27 PM, Marco Neumann <marco.neum...@gmail.com>
>> wrote:
>>
>> https://lucidworks.com/2016/02/13/solrs-daterangefield-perform/
>>>
>>> On Thu, Mar 1, 2018 at 1:22 PM, Andy Seaborne <a...@apache.org> wrote:
>>>
>>>>
>>>>
>>>> On 28/02/18 17:53, Marco Neumann wrote:
>>>>
>>>>>
>>>>> thank you, it's less than I hoped for
>>>>>
>>>>
>>>>
>>>> Concrete example?
>>>>
>>>>
>>>>
>>>>> but certainly more than what I
>>>>> can ask for Andy :)
>>>>>
>>>>> In short I'd like to get the xsd:dateTime scan out of the SPARQL
>>>>> filter and perform a more efficient range query via a date index,
>>>>> similar to the jena spatial implementation.
>>>>>
>>>>> I am going to take a look at DateRangeField and see how it performs
>>>>> relative to a standard SPARQL filter range query.
>>>>>
>>>>> best,
>>>>> Marco
>>>>>
>>>>>
>>>>> On Tue, Feb 27, 2018 at 5:21 PM, Andy Seaborne <a...@apache.org>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 27/02/18 11:41, Marco Neumann wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Andy, (I presume you wrote the following below) could you please
>>>>>>> elaborate on the significance of this contribution in TDB?
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Marco,
>>>>>>
>>>>>> For certain XSD datatypes, the value is stored in the NodeId itself
>>>>>> (64 bits minus the datatype indicator: 56 bits for TDB1, up to 62 bits
>>>>>> for TDB2 for xsd:doubles). It is faster to get the node back out of
>>>>>> the database.
>>>>>>
>>>>>> If the value does not fit in the bits available, the long form is
>>>>>> used. In the long form, the NodeId is a pointer into the node table
>>>>>> and the node is stored as the lexical form + datatype (TDB1: in text;
>>>>>> TDB2: in binary / RDF Thrift). This applies to strings and URIs.
>>>>>>
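A rough Python sketch of the inline-versus-long-form distinction described
above. The tag values, the 8-bit tag / 56-bit payload split and the list used
as a node table are assumptions made for illustration, not TDB's actual
layout, and only non-negative integers are handled to keep it short.

```python
PAYLOAD_BITS = 56
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1

# Hypothetical datatype tags held in the top 8 bits of a 64-bit id.
TAG_POINTER = 0x00   # payload is an offset into the node table (long form)
TAG_INTEGER = 0x01   # payload is the integer value itself (inline form)

node_table = []      # stand-in for the on-disk node table

def encode_int(value):
    """Return a 64-bit id: inline if the value fits, else a pointer."""
    if 0 <= value <= PAYLOAD_MASK:
        return (TAG_INTEGER << PAYLOAD_BITS) | value
    # Long form: store lexical form + datatype, keep only the position.
    node_table.append((str(value), "xsd:integer"))
    return (TAG_POINTER << PAYLOAD_BITS) | (len(node_table) - 1)

def decode(node_id):
    """Recover the value; only the long form touches the node table."""
    tag = node_id >> PAYLOAD_BITS
    payload = node_id & PAYLOAD_MASK
    if tag == TAG_INTEGER:
        return payload
    lexical, _datatype = node_table[payload]
    return int(lexical)

small = encode_int(42)        # fits in 56 bits, stored inline
large = encode_int(1 << 60)   # too wide, goes to the node table
print(decode(small), decode(large), len(node_table))
```

Decoding the inline id needs no lookup at all, which is the reason given
above for the inline form being faster.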
>>>>>>
>>>>>>> "The xsd:dateTime and xsd:date ranges cover about 8000 years from
>>>>>>> year zero with a precision down to 1 millisecond. Timezone
>>>>>>> information is retained to an accuracy of 15 minutes with special
>>>>>>> timezones for Z and for no explicit timezone."
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> That's the limit for xsd:dateTime in 56 bits.
>>>>>>
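To see how those figures can fit in 56 bits, here is a small Python
calculation. The field split is an assumption chosen to be consistent with
the quoted numbers (about 8000 years, millisecond precision, 15-minute
timezones); it is not taken from this thread and has not been checked against
the TDB source.

```python
# Assumed field widths in bits (illustrative only).
FIELDS = {
    "year": 13,
    "month": 4,
    "day": 5,
    "hour": 5,
    "minute": 6,
    "millis_in_minute": 16,
    "timezone": 7,
}

total_bits = sum(FIELDS.values())

# 13 bits of year give 2**13 distinct years, i.e. "about 8000".
year_span = 2 ** FIELDS["year"]

# Milliseconds within one minute must fit in the 16-bit field.
millis_per_minute = 60 * 1000
millis_capacity = 2 ** FIELDS["millis_in_minute"]

# XSD timezones run from -14:00 to +14:00; in 15-minute steps that is
# 28 * 4 + 1 offsets, plus two special codes (Z, no timezone).
timezone_codes = 28 * 4 + 1 + 2
timezone_capacity = 2 ** FIELDS["timezone"]

print(total_bits, year_span, millis_per_minute <= millis_capacity,
      timezone_codes <= timezone_capacity)
```

Under these assumed widths the fields add up to exactly 56 bits, and a finer
timezone resolution or a wider year range would not fit without giving up
bits elsewhere.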
>>>>>>
>>>>>>>
>>>>>>> https://jena.apache.org/documentation/tdb/architecture.html#inline-values
>>>>>>>
>>>>>>> does this give us enhanced temporal access methods via TDB that are
>>>>>>> exposed as property functions in SPARQL?
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> What exactly are you looking for here? Range queries or a database you
>>>>>> can
>>>>>> view at a point in time? ("Temporal database" can mean either.)
>>>>>>
>>>>>> You get the same SPARQL filter capabilities but the inline form is
>>>>>> faster (measurable and by quite a lot) because it does not go to the
>>>>>> node table. Despite caching of the node table, it is still faster to
>>>>>> get nodes out of the DB from the inline form (and I'd like to go
>>>>>> faster still).
>>>>>>
>>>>>> Point-in-time database.
>>>>>>
>>>>>> Not possible in TDB1.
>>>>>> Possible (but not exposed) in TDB2.  TDB2 never forgets!
>>>>>>
>>>>>>> In particular I'd be interested in range queries on xsd:dateTime here
>>>>>>> and the possible use of DateRangeField (SOLR) alongside jena-spatial.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Range queries - it would be possible to start in the right place for a
>>>>>> range
>>>>>> scan because the values are in sorted order under this design.
>>>>>>
>>>>>> Insert complexity for the different datatypes possible - it might need
>>>>>> a "this is a value centric database" flag so e.g. integers, whether
>>>>>> xsd:short or xsd:???, are stored as binary integers, losing the
>>>>>> datatype.
>>>>>>
>>>>>> In TDB1, that's true; TDB2 does keep the original datatype. Both are
>>>>>> valid choices for different use cases.
>>>>>>
>>>>>> Hope that answers your questions,
>>>>>>
>>>>>>       Andy
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Marco
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> ---
>>> Marco Neumann
>>> KONA
>>>
>>>
>>
>>
>>


-- 


---
Marco Neumann
KONA
