Re: Output Connector - Apache Marmotta

Joshua Dunham Wed, 09 Sep 2015 12:49:55 -0700

Could you shed any light on the middle part?

=====


What is not apparent is how to use the metadata adjuster to interact with the 
variables in the Data query. I've followed the guide and made a simple hello, 
False, ${city} statement but the only bits that are written into the file are 
the contents of the $DATACOLUMN variable. So, given a simple address book in a 
database with columns, id, street, city, region, country, post code, latitude, 
longitude ... how should I approach making such a data query? 

My real use cases will be much much more complicated so I'm wondering if you 
have some explanation of how I should want to use that field and maybe a small 
SQL snippet example with those columns? :) My end goal is to have a column 
called out and then use the metadata adjuster to simply prepend each column's 
value with a string. So if the city is 'New York' it would write out 
city:New_York or the like.

=====
Thx in advance!

-J
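
A minimal sketch of the prefixing described above, done in the data query
itself rather than in the metadata adjuster. Everything here is an assumption
for illustration: the addresses table and its sample row are hypothetical,
sqlite3 stands in for MariaDB so the example runs without a server, and the
aliases idcol, urlcol and datacol are placeholders for the connector's
$(IDCOLUMN), $(URLCOLUMN) and $(DATACOLUMN) substitutions. It only shows the
string handling; how ManifoldCF treats the result is not covered.

```python
import sqlite3

# Hypothetical stand-in for the address book from the thread. The real setup
# uses MariaDB; sqlite3 is used here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses ("
    "id INTEGER PRIMARY KEY, street TEXT, city TEXT, region TEXT, "
    "country TEXT, post_code TEXT, latitude REAL, longitude REAL)"
)
conn.execute(
    "INSERT INTO addresses VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "5th Avenue", "New York", "NY", "USA", "10001", 40.7128, -74.0060),
)

# Placeholder aliases (idcol, urlcol, datacol) are assumptions standing in
# for the connector's column variables. SQLite concatenates with ||;
# in MariaDB the equivalent would be CONCAT('city:', REPLACE(city, ' ', '_')).
DATA_QUERY = """
SELECT id AS idcol,
       'addresses/' || id AS urlcol,
       'city:' || REPLACE(city, ' ', '_') AS datacol
FROM addresses
WHERE id IN (1)
"""


def run_data_query(connection):
    """Return (id, url, data) tuples with the city value already prefixed."""
    return connection.execute(DATA_QUERY).fetchall()


rows = run_data_query(conn)
for row in rows:
    print(row)
```

With the sample row, the data column comes back as city:New_York, which is
the shape asked about. The same pattern could be repeated per column if each
value needs its own prefix.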

> On Sep 9, 2015, at 1:53 PM, Karl Wright <[email protected]> wrote:
> 
> Hi Joshua,
> 
> "My question is: why would I need to set up different transform modules? Since 
> there is no real config to do in the transform connector (all the good stuff 
> seems to be under Task config) I'm not sure why I would need to make more 
> than one and keep reusing it by changing the transform params under task?"
> 
> While the Metadata Adjuster transformer has no configuration, the model that 
> MCF uses for transformers is just like the model it uses for other kinds of 
> connectors.  Pretend for a moment that you needed to call an external system 
> to do content extraction, then you will see the point.
> 
> Thanks,
> Karl
> 
> 
>> On Wed, Sep 9, 2015 at 12:55 PM, Joshua Dunham <[email protected]> 
>> wrote:
>> Hi Karl, Rafa,
>> 
>>   I finally had some time to work on this and I have a scheme which 
>> (largely) works very well but I have one question, one stumbling block, and 
>> one comment.
>> 
>> First, my environment consists of ManifoldCF 2.1, MariaDB (into which I 
>> imported a small CSV for testing), and Marmotta 3.3.
>> 
>> The really interesting bits are in specifying the Task. I have the MySQL 
>> input -> metadata adjuster -> filesystem output. MySQL is set up and the 
>> connection shows as OK, and on starting the job it does write files to the 
>> output folder.
>> 
>> Getting the list of IDs works well, no issue there, and I'm not using 
>> versioning or access tokens yet. The stumbling block has to do with setting 
>> up the Data Query and the best use of the $URL and $DATA variables. First: 
>> I've hijacked the $URL into ~ CONCAT("addresses/", id) AS $(URLCOLUMN) which 
>> has the effect of creating a folder called addresses in the root of the 
>> output folder. Inside of the addresses folder it makes numbered files 
>> corresponding to the rowID. I can point the root folder path at the marmotta 
>> import directory and even use the context templating feature (setting 
>> 'addresses' into the real context name). That's really slick for an 
>> out-of-the-box hack at integration.
>> 
>> What is not apparent is how to use the metadata adjuster to interact with 
>> the variables in the Data query. I've followed the guide and made a simple 
>> hello, False, ${city} statement but the only bits that are written into the 
>> file are the contents of the $DATACOLUMN variable. So, given a simple 
>> address book in a database with columns, id, street, city, region, country, 
>> post code, latitude, longitude ... how should I approach making such a data 
>> query? My real use cases will be much much more complicated so I'm wondering 
>> if you have some explanation of how I should want to use that field and 
>> maybe a small SQL snippet example with those columns? :) My end goal is to 
>> have a column called out and then use the metadata adjuster to simply 
>> prepend each column's value with a string. So if the city is 'New York' it 
>> would write out city:New_York or the like.
>> 
>> =====
>> 
>> The comment was in regards to a bit of sample data which could ship with the 
>> source. It would be very educational if there was a complex but real 
>> configuration of ManifoldCF that links to a sqlite3 file as input and maybe 
>> the same one input db but a different table as output?
>> 
>> =====
>> 
>> My question is: why would I need to set up different transform modules? Since 
>> there is no real config to do in the transform connector (all the good stuff 
>> seems to be under Task config) I'm not sure why I would need to make more 
>> than one and keep reusing it by changing the transform params under task?
>> 
>> 
>> Thank you!
>> 
>> J
>> 
>> 
>> > On 5 July 2015 at 17:27, Karl Wright <[email protected]> wrote:
>> > Hi Joshua,
>> >
>> > My take:
>> >
>> > --> (A) How I define the data to grab, whether some SQL statement or the
>> > like. <--
>> >
>> > Have a look at the user documentation here:
>> > https://manifoldcf.apache.org/release/release-1.9/en_US/end-user-documentation.html#jdbcrepository
>> >
>> > It should be pretty clear how you define what you are looking for.
>> >
>> > --> (B) How to use this data as individual variables which I can arrange
>> > into a linked data relationship (ManifoldCF mapping module?) <--
>> >
>> > Rafa's previous reply about the RepositoryDocument is appropriate.
>> > Basically, an output connector will be handed one of those objects for
>> > every MCF "document".  The javadoc for it is here:
>> >
>> > https://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/agents/interfaces/RepositoryDocument.html
>> >
>> > --> (C) How difficult would it be to connect to Marmotta's webservice(s).
>> > I'm not familiar with the exact mechanism, but I saw ManifoldCF has
>> > support for elasticsearch so maybe I could put something together that
>> > talks to Marmotta..<--
>> >
>> > You can readily write your own output connector.  There's a book, in fact,
>> > describing how to do that.  See:
>> >
>> > https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
>> >
>> > ... and read Chapter 9.
>> >
>> > Thanks,
>> > Karl
>> >
>> >
>> > On Sun, Jul 5, 2015 at 11:53 AM, Joshua Dunham <[email protected]>
>> > wrote:
>> >>
>> >> That sounds promising. Would you recommend ManifoldCF for this? If so,
>> >> do you know of any resources which I can use to get up to speed with
>> >> using it in this way?
>> >>
>> >> -J
>> >>
>> >>> On 4 July 2015 at 21:48,  <[email protected]> wrote:
>> >>> Hi Joshua,
>> >>>
>> >>> The basic unit ManifoldCF works with when indexing is the Repository
>> >>> Document which, simplifying a lot, models a document composed of
>> >>> content plus metadata (key-value). It should be relatively easy to
>> >>> triplify that structure and push it to Marmotta using SPARQL update
>> >>> queries or Marmotta’s Java client for adding resources.
>> >>> The Generic Database connector uses a set of queries for crawling the
>> >>> database. You would have to use those queries to get your data. I’m
>> >>> not completely sure if each record result is converted directly to a
>> >>> Repository Document; that is something that I would need to check.
>> >>>
>> >>> Hope that helps,
>> >>> Cheers, Rafa
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Jul 5, 2015 at 2:56 AM, Joshua Dunham <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Hi ManifoldCF Users (and Devs)
>> >>>>
>> >>>> I'm wondering if ManifoldCF can work in my use case. I have some
>> >>>> random mySQL and Oracle DB's that I would like to connect to and
>> >>>> extract certain known bits of info, format them each a certain way and
>> >>>> then store the info in Apache Marmotta [1]. Marmotta is an RDF triple
>> >>>> store for linked data so I would need to parse and store the mySQL and
>> >>>> Oracle DB's info into a linked format, which is no problem for me to
>> >>>> create the relationships etc, I just need something that would let me
>> >>>> specifically do this.
>> >>>>
>> >>>> From what I've read, ManifoldCF can connect to mySQL and Oracle
>> >>>> (via non-distributed libraries), and store the results out in several
>> >>>> target data stores. What isn't clear is
>> >>>> (A) How I define the data to grab, whether some SQL statement or the
>> >>>> like.
>> >>>> (B) How to use this data as individual variables which I can arrange
>> >>>> into a linked data relationship (ManifoldCF mapping module?)
>> >>>> (C) How difficult would it be to connect to Marmotta's webservice(s).
>> >>>> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
>> >>>> support for elasticsearch so maybe I could put something together that
>> >>>> talks to Marmotta..
>> >>>>
>> >>>> Would this be possible? If so, could someone point me in the right
>> >>>> direction?
>> >>>>
>> >>>> Thanks!
>> >>>> -Joshua
>> >>>>
>> >>>>
>> >>>> [1] - http://marmotta.apache.org/index.html
>
