Re: Thrift problem / corruption on large TDB2 Fuseki dataset

2023-03-29 Thread Osma Suominen
but still there was a Thrift object that apparently didn't follow it. How was it created? Or was it created, serialized to disk, somehow corrupted on-disk and then read back? -Osma -- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Library of Finland P.O. B

Thrift problem / corruption on large TDB2 Fuseki dataset

2023-03-29 Thread Osma Suominen
penjdk version "11.0.18" 2023-01-17 LTS Cheers, Osma [1] https://gist.github.com/osma/d61281160e84ea74e9d7dbc155ffaf69 [2] https://gist.github.com/jeffreycwitt/e7c270aae46f403845c87aa57e4b82af [3] https://issues.apache.org/jira/browse/IMPALA-8252

Re: Performance regressions in Jena and TDB2

2020-12-07 Thread Osma Suominen
Replying to myself, as I did some follow-up tests. Osma Suominen wrote on 4.12.2020 at 18.42: Now this turned into a rather interesting exercise in using git bisect. I was able to track down the change that caused the slowdown. It's this merge c

Re: Performance regressions in Jena and TDB2

2020-12-04 Thread Osma Suominen
" to be "less optimized"; it currently does nothing (very efficiently), but it could silently consume the query results. The warmup would also be better if it wrote the required format to /dev/null. I see that you already did this - great work! -Osma -

Re: Performance regressions in Jena and TDB2

2020-12-02 Thread Osma Suominen
wo years ago, so it's been a while... -Osma -- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Library of Finland P.O. Box 15 (Unioninkatu 36) 00014 HELSINGIN YLIOPISTO Tel. +358 50 3199529 osma.suomi...@helsinki.fi http://www.nationallibrary.fi

Re: Performance regressions in Jena and TDB2

2020-12-01 Thread Osma Suominen
corrected soon in the RDF data set as well when we regenerate it in the next few days. -Osma [1] https://github.com/NatLibFi/Skosmos/pull/1098 [2] https://jena.apache.org/tutorials/sparql_datasets.html

Performance regressions in Jena and TDB2

2020-11-30 Thread Osma Suominen
ly I don't have the skills to work directly on the ARQ optimizer or TDB2 code bases. But I'd be happy to test other variations and potential fixes to these performance problems. Cheers, Osma [1] https://finto.fi/rest/v1/finaf/data?format=text/turtle

Re: Fuseki scripts no longer executable

2020-02-03 Thread Osma Suominen
-1834 Thanks a lot Andy, that was quick! I've checked the most recent SNAPSHOT builds and the problem is now gone, the scripts are again executable. Sorry for forgetting to mention that I used the .tar.gz distribution, thankfully you and Lorenz found that out quickly. -Osma

Fuseki scripts no longer executable

2020-01-31 Thread Osma Suominen
difficult to start a temporary instance of Fuseki... -Osma

Re: Best way to re-index with Jena full text search

2019-09-20 Thread Osma Suominen
because I think Jena creates index entries as triples are added. So the question is what is the most efficient way to re-index existing data or do I have to re-import all data again each time I add a new field? Thanks a lot Best regards

Re: Using Jena text search with all predicates

2019-01-24 Thread Osma Suominen
hemas to get all string properties (property's range is xsd:string) etc. On 28/02/2018 20:16, Osma Suominen wrote: Hi Jim! Your observation is correct. jena-text only indexes the RDF properties you have explicitly configured. The configuration for each property may be different. T

Re: Distro package

2018-10-15 Thread Osma Suominen
ocker image as its database: https://github.com/NatLibFi/Skosmos/blob/master/docker-compose.yml -Osma -- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Library of Finland P.O. Box 26 (Kaikukatu 4) 00014 HELSINGIN YLIOPISTO Tel. +358 50 3199529 osma.suomi...@helsinki.fi

Re: Updating large amounts of data

2018-09-13 Thread Osma Suominen
ly via the HTTP Graph Store API. You can use the s-put tool that comes with Fuseki to access that API. -Osma

Re: Updating large amounts of data

2018-09-13 Thread Osma Suominen
for the move. Why do you need to move the temporary graph? The PUT operation is atomic - the data being loaded will only be visible to queries after the whole operation is complete. -Osma
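The atomic replacement mentioned in this message goes through the SPARQL 1.1 Graph Store HTTP Protocol. As a rough sketch, the request could be built like this in Python; the server URL, the dataset name `ds`, the `/data` service name and the graph IRI are all assumptions for illustration and do not come from the thread.

```python
from urllib.parse import quote
from urllib.request import Request


def build_graph_put(endpoint: str, graph_iri: str, turtle: bytes) -> Request:
    """Build (but do not send) a Graph Store Protocol PUT for one named graph."""
    # The target graph is named in the ?graph= parameter, percent-encoded.
    url = f"{endpoint}?graph={quote(graph_iri, safe='')}"
    return Request(
        url,
        data=turtle,
        method="PUT",  # PUT replaces the graph content; POST would add to it
        headers={"Content-Type": "text/turtle"},
    )


# Hypothetical endpoint and graph IRI, for illustration only.
req = build_graph_put(
    "http://localhost:3030/ds/data",
    "http://example.org/graph/main",
    b"<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Because the whole graph is replaced in one request, no temporary graph or separate move step is needed on the client side.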

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Osma Suominen
ut you're right, a combination of text:query + regex or contains is very fast (see example below). Great that you tried this approach as well and it is fast. -Osma

Re: fuseki text:query : strange results + Lucene configuration

2018-09-12 Thread Osma Suominen
> >> text:storeValues true ; >> text:queryParser text:AnalyzingQueryParser ; >> text:map ( >> [ text:field "title" ; text:predicate dcterms:title ; >> text:analyzer [ a text:ConfigurableAnalyzer ; >> te

Re: fuseki text:query : strange results + Lucene configuration

2018-09-10 Thread Osma Suominen
d "familyName" ; text:predicate foaf:familyName ; text:analyzer [ a text:ConfigurableAnalyzer ; text:tokenizer text:KeywordTokenizer ; text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter) ] ] [ text:field "givenName"

Re: [sparql] Return only one property of many with the same name

2018-04-24 Thread Osma Suominen
8, at 5:54 AM, Laura Morales wrote: If I have this node :Alice :name "Alice", "Alice Smith" ; :age 25 . how can I return *only one* of the ":name" properties with SPARQL? For example return ("Alice", 25)

Re: Corrupted TDB2 database

2018-04-19 Thread Osma Suominen
nce but elsewhere they seem to, maybe because some file process runs on the real hardware (docker), or maybe file locking can be interfered with. Is vg0-root shared in any way? Andy On 16/04/18 15:21, Osma Suominen wrote: Hi Andy! Forgot to answer the VM part - yes, this is a VM ru

Re: Corrupted TDB2 database

2018-04-16 Thread Osma Suominen
and I will send them to you. -Osma

Re: Corrupted TDB2 database

2018-04-16 Thread Osma Suominen
ue to the large number of tracebacks after 10:01), though the rest are much smaller. If you really want I can send these to you e.g. via the Funet Filesender service, which is OK for moving around large files. You will get a download link by e-mail. There are no secrets here, as this is intend

Corrupted TDB2 database

2018-04-16 Thread Osma Suominen
e and created a new one. But I thought I'd report the problem here in case someone else has seen the same. -Osma

Re: CentOS 7/systemd startup script for fuseki

2018-03-08 Thread Osma Suominen
tps://jena.apache.org/documentation/fuseki2/fuseki-run.html#fuseki-service could clarify: "For newer systemd based Linux systems, the script 'fuseki.service' is provided. Please adapt to your paths and settings." Cheers, Joachim -Original Message- From: Osma

Re: CentOS 7/systemd startup script for fuseki

2018-03-02 Thread Osma Suominen
HS. Naturally the chosen directory layout also affects the systemd unit file. -Osma

Re: CentOS 7/systemd startup script for fuseki

2018-03-02 Thread Osma Suominen
for the "fuseki" init.d script in /etc/rc.local, which works after making the latter executable. But perhaps the problem has been solved in a better way. Cheers, Joachim

Re: Using Jena text search with all predicates

2018-02-28 Thread Osma Suominen
, Data Operations Tetherless World Constellation Department of Computer Science Rensselaer Polytechnic Institute

Re: Use PREFIXes by default

2018-02-27 Thread Osma Suominen
, the prefixes need to be in the file itself, otherwise it won't even parse. For JSON-LD the context could also be separate, but often it's inlined. But no, the libraries don't do that. -Osma

Re: Use PREFIXes by default

2018-02-27 Thread Osma Suominen
e argued, I think it makes sense to do prefix handling on the client side and keep the SPARQL protocol "simple but stupid". Then everything needed to answer a query (well, except for the RDF data set of course) will be contained in the HTTP request. -Osma

Re: Spatial query with 'dynamic' dataset

2018-01-30 Thread Osma Suominen
/documentation/query/text-query.html#uid-field-and-automatic-document-deletion

Re: indexing text in HTML content

2018-01-30 Thread Osma Suominen
rable. Nothing is said in these 2 pages: https://jena.apache.org/documentation/notes/typed-literals.html https://jena.apache.org/documentation/query/text-query.html

Re: Slow Lucene query

2017-12-07 Thread Osma Suominen
nfiguration. -Osma

Re: Fuseki all graphs into dataset vs separate graphs

2017-11-27 Thread Osma Suominen
ially loading to TDB via tdbloader and tdbloader2. -Osma

Re: Jena/Fuseki graph sync

2017-11-24 Thread Osma Suominen
rdflib in-memory store, IOMemory)

Re: Jena/Fuseki graph sync

2017-11-24 Thread Osma Suominen
emental, it's a replacement in a single atomic operation, so perhaps somewhat simpler than "deleting the old graph and loading the triples of the .nt file into the graph afterwards" that you suggested. -Osma

Re: Jena/Fuseki graph sync

2017-11-24 Thread Osma Suominen
y (at least not efficiently) to compare blank nodes in two graphs. -Osma

Re: Similar results with full text search

2017-11-23 Thread Osma Suominen
ate Lucene installation. First querying documents from Lucene index, then filtering the result sets with additional meta fields using Jena. This setup is quite complicated, so I was hoping a tighter integration with Jena would make things easier. Br, Mikael On 22.11.2017 22:40, Osma Suominen wrote:

Re: Similar results with full text search

2017-11-22 Thread Osma Suominen
supported by jena-text. What's your use case? How would you like to use it if it existed? -Osma Osma Suominen wrote on 22.11.2017 at 22:37: Hi Mikael! Fuzzy search is a basic Lucene feature, just like prefix searches. You should be able to use it directly via jena-text using a query

Re: Similar results with full text search

2017-11-22 Thread Osma Suominen
eady works right now. -Osma Mikael Pesonen wrote on 22.11.2017 at 15:44: Are there any plans on implementing similar text search for Jena? Until similarity is implemented, is it possible to query similar texts using Lucene directly, bypassing Jena, but with the same data set? Br,

Re: sixteenth anniversary

2017-11-21 Thread Osma Suominen
in Apache SVN - from CVS at SF, SVN at SF and SVN at Apache. Andy -- I like: Like Like - The likeliest place on the web <http://like-like.xenei.com> LinkedIn: http://www.linkedin.com/in/claudewarren

Re: TDB1/TDB2 disk space with and without named graphs

2017-11-16 Thread Osma Suominen
union default graph functionality [1]. It could be added of course, just hasn't been. -Osma [1] https://github.com/rdfhdt/hdt-java/issues/3

Re: TDB1/TDB2 disk space with and without named graphs

2017-11-16 Thread Osma Suominen
taset into a named graph, the size doesn't change. It's still around 5GB. -Osma

Re: TDB1/TDB2 disk space with and without named graphs

2017-11-16 Thread Osma Suominen
named graphs. More than twice the space just because I decide to put the data in a named graph instead of the default graph? And that seems to be the case both for TDB1 (both tdbloader and tdbloader2) and TDB2. -Osma

TDB1/TDB2 disk space with and without named graphs

2017-11-16 Thread Osma Suominen
sing a named graph My larger goal is to decide whether to use TDB1 or TDB2 (or something else, like HDT or Blazegraph...) for a new bibliographic Linked Data service. Disk space is a factor (though not the most important one) in the calculation. -Osma

Re: GC overhead limit exceeded with DELETE/INSERT

2017-09-11 Thread Osma Suominen
5:35: Hi Osma! I'm currently running Jena with the parameter -Xmx3600M. From what I read, "GC overhead limit exceeded" relates to Java garbage collection, so maybe upping memory is not the right solution here? Br, On 11.9.2017 15:29, Osma Suominen wrote: Hi Mikael, How much memory have

Re: GC overhead limit exceeded with DELETE/INSERT

2017-09-11 Thread Osma Suominen
";)) AS ?newURI) } } and get this Error 400: GC overhead limit exceeded Fuseki - version 3.4.0 (Build date: 2017-07-17T11:43:07+) Fewer than a million triples should be affected by this. Is there a solution other than using LIMIT? Br,

Re: moving graph between tdb

2017-08-31 Thread Osma Suominen
> i have a large tdb stored dataset with multiple graphs. can i move one > graph from one to another dataset? what would be the command using http > protokoll? > thank you! > andrew > > > -- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Library o

Re: Query about jena in Raspberry Pi

2017-07-02 Thread Osma Suominen
tion here: https://www.quora.com/Is-there-a-SPARQL-endpoint-or-triple-store-running-on-a-raspberry-pi -Osma

Re: Fw: Re: Adding more than one text-query index

2017-05-25 Thread Osma Suominen
:label ] ) . What happens in this case? Are the "uri" or "label" fields from both map_graph1 and map_graph2 merged? Do I have to call them with different names like "graph1_uri" and "graph2_uri"? Or are they distinct? Because if they are merged, it's

Re: Adding more than one text-query index

2017-05-24 Thread Osma Suominen
phs (from the same dataset) will be stored in one Lucene index, with the graph IRIs of individual entities (triples/quads in practice) stored in the graphField so that queries can be efficiently restricted to a particular graph. You cannot set different jena-text options per graph. -Osma -

Re: At which point should I consider using text-query indexes?

2017-05-23 Thread Osma Suominen
graph. -Osma

Re: At which point should I consider using text-query indexes?

2017-05-23 Thread Osma Suominen
tand if this is only for full-text searches. Or should I use one of these indexes every time I use one of the string functions (https://www.w3.org/TR/sparql11-query/#func-strings) such as CONTAINS, LCASE, etc.?

Re: /data endpoint

2017-04-05 Thread Osma Suominen
- Is "/query" the only endpoint that users need to know if they want to query my graph? Either that or "/sparql", in the default configuration they are defined exactly the same way. Both accept SPARQL queries (not updates). -Osma
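To make the two equivalent query services concrete, here is a small Python sketch that builds a SPARQL Protocol GET URL for either one. The host, port and dataset name `ds` are assumed for illustration; only the service names "/query" and "/sparql" come from the message above.

```python
from urllib.parse import urlencode


def sparql_query_url(server: str, dataset: str, query: str, service: str = "query") -> str:
    """Return a SPARQL Protocol GET URL for one of the two query services."""
    if service not in ("query", "sparql"):
        raise ValueError("expected 'query' or 'sparql'")
    # The query text travels in the 'query' parameter, URL-encoded.
    return f"{server.rstrip('/')}/{dataset}/{service}?{urlencode({'query': query})}"


# Hypothetical server and dataset name.
q = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
print(sparql_query_url("http://localhost:3030", "ds", q))
print(sparql_query_url("http://localhost:3030", "ds", q, service="sparql"))
```

Both URLs carry the same query; which one to publish to users is then only a naming choice.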

Re: Fuseki config file

2017-04-05 Thread Osma Suominen
e you wouldn't worry about namespaces and such. -Osma

Re: Fuseki config file

2017-04-05 Thread Osma Suominen
never tried. There is little reason to use anything other than Turtle, since that's the most convenient syntax for human beings. -Osma

Re: Fuseki config file

2017-04-04 Thread Osma Suominen
05.04.2017, 09:44, Laura Morales wrote: Thanks a lot, that fixed the problem! I guess this is a bug? I'd rather call it a missing feature. There is no autodetection of RDF syntax based on file content, the code uses the filename extension to determine it. -Osma
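The idea of choosing a syntax purely from the filename extension can be illustrated with a short Python sketch. The lookup table lists conventional extensions for common RDF syntaxes; it is an illustration of the approach, not Jena's actual code or its complete list.

```python
from pathlib import PurePath
from typing import Optional

# Conventional file extensions for common RDF syntaxes.
# Illustrative table only; not copied from Jena.
EXTENSION_TO_SYNTAX = {
    ".ttl": "Turtle",
    ".nt": "N-Triples",
    ".nq": "N-Quads",
    ".trig": "TriG",
    ".rdf": "RDF/XML",
    ".jsonld": "JSON-LD",
}


def guess_syntax(filename: str) -> Optional[str]:
    """Guess the syntax from the extension alone; file content is never read."""
    return EXTENSION_TO_SYNTAX.get(PurePath(filename).suffix.lower())


print(guess_syntax("config.ttl"))  # Turtle
print(guess_syntax("config.txt"))  # None: unknown extension, nothing to fall back on
```

With this kind of lookup, a Turtle file saved under an unexpected extension is not recognised, which matches the behaviour described in the message.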

Re: Querying Fuseki

2017-04-04 Thread Osma Suominen
There are many such libraries available (search for "jquery sparql" or "javascript sparql") but I can't recommend any specific one since I haven't really used them myself. -Osma

Re: Fuseki config file

2017-04-04 Thread Osma Suominen
<#dataset> ; . <#dataset> rdf:type tdb:DatasetTDB ; tdb:location "/home/myself/fusekidb" ; # Query timeout on this dataset (1s, 1000 milliseconds) ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1" ] ; # Make the de

Re: Jena + HDT

2017-04-04 Thread Osma Suominen
?ysoc (COUNT(DISTINCT ?w) AS ?count) WHERE { ?w schema:about ?ysoc . FILTER(STRSTARTS(STR(?ysoc), 'http://www.yso.fi/onto/yso/')) } GROUP BY ?ysoc ORDER BY DESC(?count) LIMIT 20 --example query-- 04.04.2017, 13:29, Osma Suominen wrote: 04.04.2017, 13:10, Dave Reynolds wrote:

Re: Jena + HDT

2017-04-04 Thread Osma Suominen
of the resulting TDB is 16GB. -Osma

Re: Jena + HDT

2017-04-04 Thread Osma Suominen
, I'll play around with HDT and Jena today to get some more insights. ) >> Jena HDT is in-memory, right? > Is it? I thought it was an on-disk, compressed, and query-able list of quads... > -- Lorenz Bühmann AKSW group, University of Leipzig Group: h

ANN: jena-text, jena-spatial get newer Lucene, drop Solr support

2017-03-10 Thread Osma Suominen
to a Lucene index. However, a new Elasticsearch implementation for jena-text is currently being developed (see https://issues.apache.org/jira/browse/JENA-1305). That may become an alternative for jena-text/Solr users as well. -Osma

Performance regression between Jena 3.1.0 and 3.2.0

2017-03-09 Thread Osma Suominen
iple solutions for each pattern. -Osma [1] http://api.finto.fi/download/yso/yso-skos.ttl [2] https://github.com/NatLibFi/Skosmos/blob/master/model/sparql/GenericSparql.php#L404

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-03 Thread Osma Suominen
ng for myself, I certainly want to get it included into Jena. It's just a question of fitting it in correctly, which might take a bit of time. --- A. Soroka The University of Virginia Library On Mar 1, 2017, at 1:27 PM, Osma Suominen wrote: Hi Anuj! I have nothing against modularity in

Re: SPARQL query

2017-03-02 Thread Osma Suominen
t specify values for properties that P does not have Claude On Thu, Mar 2, 2017 at 7:30 AM, Osma Suominen wrote: Hi Claude, Do you mean something like this? SELECT ?r WHERE { { # must have at least one of A, B, C { ?r :p :A } UNION { ?r :p :B } UNION { ?r :p :C } } # mus

Re: SPARQL query

2017-03-01 Thread Osma Suominen
. Any help would be appreciated, Claude

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-01 Thread Osma Suominen
you get into shading and the like. We have to do that for Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a thing we want to do any more of than needed, I don't think. --- A. Soroka The University of Virginia Library [1] http://openjdk.java.net/projects/jigsaw/ O

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-01 Thread Osma Suominen
Otherwise, yes, you get into shading and the like. We have to do that for Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a thing we want to do any more of than needed, I don't think. --- A. Soroka The University of Virginia Library [1] http://openjdk.jav

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-03-01 Thread Osma Suominen
dependency issues by including the Lucene libraries that we included in our es specific pom. Have a look at the pom of jena-text-es module here to see how it can be done : https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml Thanks, Anuj Kumar On Wed, Mar 1, 2017 at 7:27 AM, Osma Suom

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-02-28 Thread Osma Suominen
t with having a separate Module for Jena Text ES and see how things go. If they go well, we could extract Solr and Lucene out of Jena Text. Again this is just a suggestion based on my limited industry experience. Thanks, Anuj Kumar On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen wrote: 28.02

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-02-28 Thread Osma Suominen
liar with how to set it up, and the jena-text instructions are pretty vague unfortunately. -Osma

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-02-28 Thread Osma Suominen
2017 at 2:22 PM, Osma Suominen wrote: 14.02.2017, 15:15, anuj kumar wrote: I will do it. But I need to first get the simple test working in order to move forward. I hope someone here can help me. Maybe you need to add an implementWith declaration to TextAssembler.java? -Osma

Re: SPARQL Query from one Fuseki server to another Fuseki server

2017-02-16 Thread Osma Suominen
Sorry I meant SPARQL 1.1 Federated Query spec: http://www.w3.org/TR/sparql11-federated-query/ -Osma 16.02.2017, 14:12, Osma Suominen wrote: Hi Sandor, You need to do a federated query. See the SPARQL 1.1 Query spec. Something like this: SELECT * WHERE { SERVICE <http://other-endpo
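Since the query in this snippet is cut off, here is a hedged Python sketch of how a SERVICE clause of this shape can be assembled. The remote endpoint URL is a placeholder (the address in the original message is truncated and is not reconstructed here), and the helper name is hypothetical.

```python
def federated_query(remote_endpoint: str, pattern: str = "?s ?p ?o") -> str:
    """Wrap a graph pattern in a SERVICE clause that targets a remote endpoint."""
    # Reject characters that may not appear inside a SPARQL IRI reference.
    if any(c in remote_endpoint for c in '<>" {}|\\^`'):
        raise ValueError("not a valid IRI for use in SPARQL")
    return f"SELECT * WHERE {{ SERVICE <{remote_endpoint}> {{ {pattern} }} }}"


# Placeholder endpoint, for illustration only.
print(federated_query("http://example.org/other/sparql"))
# SELECT * WHERE { SERVICE <http://example.org/other/sparql> { ?s ?p ?o } }
```

The resulting query is sent to the local server, which forwards the inner pattern to the remote endpoint as described in the SPARQL 1.1 Federated Query spec.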

Re: SPARQL Query from one Fuseki server to another Fuseki server

2017-02-16 Thread Osma Suominen
FROM WHERE {GRAPH {?s ?p ?o . } } but it provides empty set: - | s | p | o | = - What is wrong? Thanks in advance, Sandor

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-02-14 Thread Osma Suominen
14.02.2017, 15:15, anuj kumar wrote: I will do it. But I need to first get the simple test working in order to move forward. I hope someone here can help me. Maybe you need to add an implementWith declaration to TextAssembler.java? -Osma

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-02-14 Thread Osma Suominen
"DRY" comment in the code showing that somebody else has thought about it too. Also it might be helpful to try to reuse all the Lucene unit tests for ES as well, if you can figure out a way to do that. -Osma -- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Lib

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

2017-02-14 Thread Osma Suominen
Lorenz Bühmann AKSW group, University of Leipzig Group: http://aksw.org - semantic web research center

Re: 10GB file loading on Fuseki

2017-01-23 Thread Osma Suominen
-30 files with .dat and .idn extensions. Could you please help me out? How can I load them into Fuseki now?

Re: 10g data loading in Fuseki

2017-01-18 Thread Osma Suominen
have a .nt file size 10G and I want to upload it into fuseki server as TDB structure not in-memory. After running the server, if I upload it one-time I get SessionTimesOut error, how can I address this problem? Please help me what is your recommendation? Regards, Reihan

Re: Spatial Jena

2017-01-11 Thread Osma Suominen
68 -117.12865 32.56668 -117.13865) . } On 9 January 2017 at 09:25, Osma Suominen wrote: Hi Samur, Can you report this to JIRA with a reproducible full description? I.e. link to the data set you used, your jena-spatial index configuration, Jena/Fuseki versions you used, and of course the que

Re: Spatial Jena

2017-01-09 Thread Osma Suominen
all have the same issue. I think there is some merging happening after accessing the index that makes it slow. I did not look into the code to see where the issue is. Best, On 9 January 2017 at 09:12, Osma Suominen wrote: Hi Samur, I agree, that's really slow. I wonder if that's so

Re: Spatial Jena

2017-01-09 Thread Osma Suominen
eki INFO [6] exec/select [2017-01-09 09:07:34] Fuseki INFO [6] 200 OK (5.334 s) I wonder why it is so slow if the Lucene index is so fast and the result set contains only 17 resources. On 9 January 2017 at 08:55, Osma Suominen wrote: Hi Samur! Does it help if you drop the DISTINCT? -O

Re: Spatial Jena

2017-01-08 Thread Osma Suominen
<http://www.w3.org/2000/01/rdf-schema#> SELECT distinct ?place{ ?place spatial:intersectBox (32.55668 -117.12865 32.56668 -117.13865) . } The same query in solr/lucene takes only 20ms. I wonder why fuseki or jena spatial is so slow. Any clue about it? -- Osma Suominen D.Sc. (Tech),

Re: Compile Forked Version

2017-01-07 Thread Osma Suominen
! java.lang.Exception: Unexpected exception, expected but was at org.apache.jena.riot.tokens.TestTokenizer.tokenFirst(TestTokenizer.java:45) at org.apache.jena.riot.tokens.TestTokenizer.tokenUnit_iri18(TestTokenizer.java:205) The code you are compiling is behind the Apache codebase. It looks like it could

Re: Jena with Lucene 5 or 6

2017-01-03 Thread Osma Suominen
-Osma 3.12.2016, 17:35, Samur Araujo wrote: Is there any plan to migrate Jena/Fuseki to Lucene 5 or 6? Is there any fork available that has already done the migration?

Re: Text Search Using Lucene

2016-11-29 Thread Osma Suominen
Sorry I misinterpreted the StackOverflow link, so please ignore the part about the version. I'm assuming you are using a recent Fuseki version. -Osma 29.11.2016, 10:55, Osma Suominen wrote: Hi Abhishek, What are the contents of the Lucene index directory (called "Lucene"

Re: Text Search Using Lucene

2016-11-29 Thread Osma Suominen
sue a year ago as in this thread http://thread.gmane.org/gmane.comp.apache.jena.user/7892 but there are no solutions there as well. Can someone please help here? Thanks & Regards Abhishek Kumar

Re: command line sparql: --namedGraph clobbers --data?

2016-11-14 Thread Osma Suominen
Hi Andy, Sure thing! https://issues.apache.org/jira/browse/JENA-1261 -Osma 14.11.2016, 19:28, Andy Seaborne wrote: Osma, Could you raise a JIRA so this does not get overlooked? Thanks Andy On 14/11/16 11:12, Osma Suominen wrote: Hi, I noticed a behavior change in the Jena 3.1.1

command line sparql: --namedGraph clobbers --data?

2016-11-14 Thread Osma Suominen
he file specified by --data does end up in the default graph. I can easily work around this by not using this combination of options (after all, --graph is the more explicit way of loading data into the default graph), I was just surprised when a script broke because of this change. -Os

Re: completion with Lucene: desirable from SPARQL

2016-11-04 Thread Osma Suominen
least get the update to version 5 merged into Jena. At the very minimum, making a PR against Jena would indicate (from a legal perspective) that you wish to contribute the work to Apache Jena, so that others can make use of it.

Re: completion with Lucene: desirable from SPARQL

2016-11-03 Thread Osma Suominen
w triples are added for the same subject, but its label is unchanged, then the text index won't see the update and thus the count of references/triples won't be updated either. I may be wrong here, I'm not sure how the update tracking works. -Osma -- Osma Suominen D.Sc. (Tech),

Re: completion with Lucene: desirable from SPARQL

2016-11-03 Thread Osma Suominen
I'll have to implement also the callback for updates like class TextDocProducerTriples in Jena-text. 2016-11-01 13:59 GMT+01:00 Osma Suominen : Hi Jean-Marc, The wildcard queries etc. are basic Lucene features, part of Lucene query syntax, so probably that's why they are not documented

Re: Placement of VALUES block affects performance in 3.1.1

2016-11-02 Thread Osma Suominen
02.11.2016, 13:08, Osma Suominen wrote: It's possible that a MINUS expression could be used instead of FILTER NOT EXISTS and perform better. I will have to test this. But other than switching to MINUS, I can't think of any way to express this constraint on collections without using so

Re: Placement of VALUES block affects performance in 3.1.1

2016-11-02 Thread Osma Suominen
(A,C)) now if A is a complex expression, that is a bad idea (probably). If A is a small VALUES block then it makes sense. It isn't done though. Ok. So a potential future optimization perhaps. -Osma -- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Library of Finl

Re: completion with Lucene: desirable from SPARQL

2016-11-01 Thread Osma Suominen
both weightings. So, in the short term I have to figure out how to add weights to the Lucene - Jena index. Then I have to read what dbPedia lookup does, and other background material. 2016-10-31 16:42 GMT+01:00 Osma Suominen : Hi Jean-Marc, Depending on what exactly you want from such a se

Re: Placement of VALUES block affects performance in 3.1.1

2016-11-01 Thread Osma Suominen
thing wrong? -Osma On 01/11/16 11:03, Osma Suominen wrote: Hi, I'm investigating a performance regression we're seeing with the current Jena 3.1.1-SNAPSHOT compared to 3.1.0. The data in graph <http://www.yso.fi/onto/yso/> is the YSO ontology, available from http://api.finto.

Placement of VALUES block affects performance in 3.1.1

2016-11-01 Thread Osma Suominen
to affect query evaluation order in this way? It appears to me that in the slow version, ?uri is not bound inside the inner FILTER NOT EXISTS, which causes an explosion of results internally. -Osma -- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Library of Finland P.O. Bo

Re: Getting rid of triples with bad URIs

2016-10-31 Thread Osma Suominen
bib/reshaperdf/blob/master/src/main/java/org/gesis/reshaperdf/cmd/correct/CorrectCommand.java On 27/10/16 13:24, james anderson wrote: good afternoon; On 2016-10-27, at 11:46, Osma Suominen wrote: Hi Andy! On 27/10/16 12:21, Andy Seaborne wrote: Shouldn't the conversion to triples check t

Re: completion with Lucene: desirable from SPARQL

2016-10-31 Thread Osma Suominen
here is a code snippet here http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene but a regular Lucene API may exist. [1] https://github.com/dbpedia/lookup [2] https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedi

Re: Getting rid of triples with bad URIs

2016-10-27 Thread Osma Suominen
e) which can do recovery, reporting and splitting the output between good and bad. The current parser can't output to different places. It should be easy to register it as a replacement for the standard one. Okay. I will think about this. But most likely I'll just use a separate

Re: Getting rid of triples with bad URIs

2016-10-27 Thread Osma Suominen
rocess. JSON-LD is a 3rd party system: jsonld-java. Looks to me like Jena is not checking the output from that as it creates the Jena objects because "ParserProfileChecker" is checking for triple problems (literals as subjects etc) and assumes its input terms are valid.
