Re: Output Connector - Apache Marmotta

Joshua Dunham Wed, 09 Sep 2015 09:56:28 -0700

Hi Karl, Rafa,

  I finally had some time to work on this and I have a scheme which (largely) 
works very well, but I have one stumbling block, one comment, and one question.

First, my environment consists of ManifoldCF 2.1, MariaDB (into which I 
imported a small CSV for testing), and Marmotta 3.3.

The really interesting bits are in specifying the Task. I have the MySQL input 
-> metadata adjuster -> filesystem output. MySQL is set up and the connection 
shows as OK, and on starting the job it does write files to the output folder.

Getting the list of IDs works well, no issue there, and I'm not using 
versioning or access tokens yet. The stumbling block has to do with setting up 
the Data Query and the best use of the $URL and $DATA variables. First: I've 
hijacked the $URL into ~ CONCAT("addresses/", id) AS $(URLCOLUMN), which has the 
effect of creating a folder called addresses in the root of the output folder. 
Inside the addresses folder it makes numbered files corresponding to the 
row ID. I can point the root folder path at the Marmotta import directory and 
even use the context templating feature (changing 'addresses' to the real 
context name). That's really slick for an out-of-the-box hack at integration.

What is not apparent is how to use the metadata adjuster to interact with the 
variables in the Data Query. I've followed the guide and made a simple hello, 
False, ${city} statement, but the only bits that are written into the file are 
the contents of the $DATACOLUMN variable. So, given a simple address book in a 
database with columns id, street, city, region, country, post code, latitude, 
longitude ... how should I approach making such a data query? My real use cases 
will be much more complicated, so I'm wondering if you have some explanation 
of how I should use that field, and maybe a small SQL snippet example with 
those columns? :) My end goal is to have a column called out and then use the 
metadata adjuster to simply prefix each column's value with a string. So if the 
city is 'New York' it would write out city:New_York or the like.
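
As an illustration of the end goal described above, here is a minimal sketch 
that builds such an "out" column directly in SQL. It is not ManifoldCF's own 
data query: it runs against an in-memory SQLite database so it can be tried 
standalone, the table name "addresses" and the alias "url_col" are assumptions 
made for the example, and SQLite's || operator stands in for the CONCAT(...) 
that MySQL/MariaDB would use. In a real job the aliases would be the 
connector's substitution variables rather than plain names.

```python
import sqlite3

# Hypothetical table and sample rows, modelled on the address-book example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses "
    "(id INTEGER PRIMARY KEY, street TEXT, city TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO addresses (id, street, city, country) VALUES (?, ?, ?, ?)",
    [(1, "5 Main St", "New York", "US"), (2, "10 High St", "London", "UK")],
)

# Each column value is prefixed with its name and spaces become underscores.
# url_col and out are illustrative aliases, not ManifoldCF variable names.
DATA_QUERY = """
SELECT id,
       'addresses/' || id AS url_col,
       'street:'  || REPLACE(street,  ' ', '_') || ' ' ||
       'city:'    || REPLACE(city,    ' ', '_') || ' ' ||
       'country:' || REPLACE(country, ' ', '_') AS out
FROM addresses
WHERE id IN (?, ?)
ORDER BY id
"""

rows = conn.execute(DATA_QUERY, (1, 2)).fetchall()
for row in rows:
    print(row)
```

Doing the prefixing in the query keeps all of the formatting in one place; 
whether the metadata adjuster can do the same per column is exactly the open 
question above.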

=====

The comment concerns a bit of sample data which could ship with the 
source. It would be very educational if there were a complex but real 
configuration of ManifoldCF that uses a sqlite3 file as input and maybe the 
same database, but a different table, as output?

=====

My question is: why would I need to set up different transform modules? Since 
there is no real config to do in the transform connector (all the good stuff 
seems to be under Task config), I'm not sure why I would need to make more than 
one, rather than reusing a single one and changing the transform parameters 
under the Task?

Thank you!

J

> On 5 July 2015 at 17:27, Karl Wright <[email protected]> wrote:
> Hi Joshua,
> 
> My take:
> 
> --> (A) How I define the data to grab, whether some SQL statement or the
> like. <--
> 
> Have a look at the user documentation here:
> https://manifoldcf.apache.org/release/release-1.9/en_US/end-user-documentation.html#jdbcrepository
> 
> It should be pretty clear how you define what you are looking for.
> 
> --> (B) How to use this data as individual variables which I can arrange
> into a linked data relationship (ManifoldCF mapping module?) <--
> 
> Rafa's previous reply about the RepositoryDocument is appropriate. 
> Basically, an output connector will be handed one of those objects for every
> MCF "document".  The javadoc for it is here:
> 
> https://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/agents/interfaces/RepositoryDocument.html
> 
> --> (C) How difficult would it be to connect to Marmotta's webservice(s).
> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
> support for elasticsearch so maybe I could put something together that
> talks to Marmotta..<--
> 
> You can readily write your own output connector.  There's a book, in fact,
> describing how to do that.  See:
> 
> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
> 
> ... and read Chapter 9.
> 
> Thanks,
> Karl
> 
> 
> On Sun, Jul 5, 2015 at 11:53 AM, Joshua Dunham <[email protected]>
> wrote:
>> 
>> That sounds promising. Would you recommend ManifoldCF for this? If so,
>> do you know of any resources which I can use to get up to speed with
>> using it in this way?
>> 
>> -J
>> 
>>> On 4 July 2015 at 21:48,  <[email protected]> wrote:
>>> Hi Joshua,
>>> 
>>> The ManifoldCF unit of logic in terms of indexing is the Repository
>>> Document, which, simplifying a lot, models a document composed of content
>>> plus metadata (key-value). It should be relatively easy to triplify that
>>> structure and push it to Marmotta using SPARQL update queries or
>>> Marmotta’s Java client for adding resources.
>>> The Generic Database connector uses a set of queries for crawling the
>>> database. You would have to use those queries to get your data. I’m not
>>> completely sure if each record result is converted directly to a
>>> Repository Document; that is something that I would need to check.
>>> 
>>> Hope that helps,
>>> Cheers, Rafa
>>> 
>>> 
>>> 
>>> 
>>> On Sun, Jul 5, 2015 at 2:56 AM, Joshua Dunham <[email protected]>
>>> wrote:
>>>> 
>>>> Hi ManifoldCF Users (and Devs)
>>>> 
>>>> I'm wondering if ManifoldCF can work in my use case. I have some
>>>> random mySQL and Oracle DBs that I would like to connect to and
>>>> extract certain known bits of info, format them each a certain way and
>>>> then store the info in Apache Marmotta [1]. Marmotta is an RDF triple
>>>> store for linked data, so I would need to parse and store the mySQL and
>>>> Oracle DBs' info in a linked format. Creating the relationships etc. is
>>>> no problem for me; I just need something that would let me
>>>> specifically do this.
>>>> 
>>>> From what I've read, ManifoldCF can connect to mySQL and Oracle
>>>> (via non-distributed libraries), and store the results out in several
>>>> target data stores. What isn't clear is
>>>> (A) How I define the data to grab, whether some SQL statement or the
>>>> like.
>>>> (B) How to use this data as individual variables which I can arrange
>>>> into a linked data relationship (ManifoldCF mapping module?)
>>>> (C) How difficult would it be to connect to Marmotta's webservice(s).
>>>> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
>>>> support for elasticsearch so maybe I could put something together that
>>>> talks to Marmotta..
>>>> 
>>>> Would this be possible? If so, could someone point me in the right
>>>> direction?
>>>> 
>>>> Thanks!
>>>> -Joshua
>>>> 
>>>> 
>>>> [1] - http://marmotta.apache.org/index.html
