Could you shed any light on the middle part,
=====
What is not apparent is how to use the Metadata Adjuster to interact with the
variables in the data query. I've followed the guide and made a simple hello,
False, ${city} statement, but the only bits that are written into the file are
the contents of the $(DATACOLUMN) variable. So, given a simple address book in
a database with columns id, street, city, region, country, post code,
latitude, longitude ... how should I approach making such a data query?
My real use cases will be much, much more complicated, so I'm wondering if you
have some explanation of how I should use that field, and maybe a small
SQL snippet example with those columns? :) My end goal is to have a column
called out and then use the Metadata Adjuster to simply prepend each column's
value with a string. So if the city is 'New York' it would write out
city:New_York or the like.
=====
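To make the end goal concrete, here is a rough Python sketch of the
transformation I'm after, written outside of ManifoldCF entirely. The function
name prefix_columns and the sample row are made up for illustration; this is
not a claim about how the Metadata Adjuster works internally, only the
input/output behaviour I'd like to reproduce with it.

```python
def prefix_columns(row, skip=("id",)):
    """Return 'column:value' strings for one database row.

    Spaces inside a value are replaced with underscores, so the city
    'New York' becomes 'city:New_York'. Columns listed in `skip` and
    columns holding NULL (None) are left out.
    """
    out = []
    for column, value in row.items():
        if column in skip or value is None:
            continue
        out.append(f"{column}:{str(value).replace(' ', '_')}")
    return out


# Hypothetical sample row using some of the address-book columns.
row = {"id": 1, "street": "5 Main St", "city": "New York", "country": "US"}
print(prefix_columns(row))
```

The id column is skipped here only because it already ends up in the file
name via the URL column; that choice is an assumption on my part.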
Thx in advance!
-J
> On Sep 9, 2015, at 1:53 PM, Karl Wright <[email protected]> wrote:
>
> Hi Joshua,
>
> "My question is; why would I need to setup different transform modules? Since
> there is no real config to do in the transform connector (all the good stuff
> seems to be under Task config) I'm not sure why I would need to make more
> than one and keep reusing it by changing the transform params under task?"
>
> While the Metadata Adjuster transformer has no configuration, the model that
> MCF uses for transformers is just like the model it uses for other kinds of
> connectors. Pretend for a moment that you needed to call an external system
> to do content extraction, then you will see the point.
>
> Thanks,
> Karl
>
>
>> On Wed, Sep 9, 2015 at 12:55 PM, Joshua Dunham <[email protected]>
>> wrote:
>> Hi Karl, Rafa,
>>
>> I finally had some time to work on this and I have a scheme which
>> (largely) works very well, but I have one question, one stumbling block, and
>> one comment.
>>
>> First, my environment consists of ManifoldCF 2.1, MariaDB (into which I
>> imported a small CSV for testing), and Marmotta 3.3.
>>
>> The really interesting bits are in specifying the Task. I have the MySQL
>> input -> Metadata Adjuster -> filesystem output. MySQL is set up and the
>> connection shows as OK, and on starting the job it does write files to the
>> output folder.
>>
>> Getting the list of IDs works well, no issue there, and I'm not using
>> versioning or access tokens yet. The stumbling block has to do with setting
>> up the data query and the best use of the $(URLCOLUMN) and $(DATACOLUMN)
>> variables. First: I've hijacked the URL column with roughly
>> CONCAT("addresses/", id) AS $(URLCOLUMN), which has the effect of creating a
>> folder called addresses in the root of the output folder. Inside the
>> addresses folder it makes numbered files corresponding to the row ID. I can
>> point the root folder path at the Marmotta import directory and even use the
>> context templating feature (setting 'addresses' to the real context name).
>> That's really slick for an out-of-the-box hack at integration.
>>
>> What is not apparent is how to use the Metadata Adjuster to interact with
>> the variables in the data query. I've followed the guide and made a simple
>> hello, False, ${city} statement, but the only bits that are written into the
>> file are the contents of the $(DATACOLUMN) variable. So, given a simple
>> address book in a database with columns id, street, city, region, country,
>> post code, latitude, longitude ... how should I approach making such a data
>> query? My real use cases will be much, much more complicated, so I'm
>> wondering if you have some explanation of how I should use that field, and
>> maybe a small SQL snippet example with those columns? :) My end goal is to
>> have a column called out and then use the Metadata Adjuster to simply
>> prepend each column's value with a string. So if the city is 'New York' it
>> would write out city:New_York or the like.
>>
>> =====
>>
>> The comment is about a bit of sample data which could ship with the
>> source. It would be very educational if there were a complex but real
>> configuration of ManifoldCF that links to a sqlite3 file as input and maybe
>> uses the same database, but a different table, as output?
>>
>> =====
>>
>> My question is: why would I need to set up different transform modules?
>> Since there is no real config to do in the transform connector (all the good
>> stuff seems to be under Task config), I'm not sure why I would need to make
>> more than one rather than keep reusing it by changing the transform params
>> under task?
>>
>>
>> Thank you!
>>
>> J
>>
>>
>> > On 5 July 2015 at 17:27, Karl Wright <[email protected]> wrote:
>> > Hi Joshua,
>> >
>> > My take:
>> >
>> > --> (A) How I define the data to grab, whether some SQL statement or the
>> > like. <--
>> >
>> > Have a look at the user documentation here:
>> > https://manifoldcf.apache.org/release/release-1.9/en_US/end-user-documentation.html#jdbcrepository
>> >
>> > It should be pretty clear how you define what you are looking for.
>> >
>> > --> (B) How to use this data as individual variables which I can arrange
>> > into a linked data relationship (ManifoldCF mapping module?) <--
>> >
>> > Rafa's previous reply about the RepositoryDocument is appropriate.
>> > Basically, an output connector will be handed one of those objects for
>> > every MCF "document". The javadoc for it is here:
>> >
>> > https://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/agents/interfaces/RepositoryDocument.html
>> >
>> > --> (C) How difficult would it be to connect to Marmotta's webservice(s).
>> > I'm not familiar with the exact mechanism, but I saw ManifoldCF has
>> > support for elasticsearch so maybe I could put something together that
>> > talks to Marmotta..<--
>> >
>> > You can readily write your own output connector. There's a book, in fact,
>> > describing how to do that. See:
>> >
>> > https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
>> >
>> > ... and read Chapter 9.
>> >
>> > Thanks,
>> > Karl
>> >
>> >
>> > On Sun, Jul 5, 2015 at 11:53 AM, Joshua Dunham <[email protected]>
>> > wrote:
>> >>
>> >> That sounds promising. Would you recommend ManifoldCF for this? If so,
>> >> do you know of any resources which I can use to get up to speed with
>> >> using it in this way?
>> >>
>> >> -J
>> >>
>> >>> On 4 July 2015 at 21:48, <[email protected]> wrote:
>> >>> Hi Joshua,
>> >>>
>> >>> The unit of indexing in ManifoldCF is the Repository Document, which,
>> >>> simplifying a lot, models a document composed of content plus metadata
>> >>> (key-value pairs). It should be relatively easy to triplify that
>> >>> structure and push it to Marmotta using SPARQL update queries or
>> >>> Marmotta's Java client for adding resources.
>> >>> The Generic Database connector uses a set of queries for crawling the
>> >>> database. You would use those queries to get your data. I'm not
>> >>> completely sure whether each result record is converted directly to a
>> >>> Repository Document; that is something I would need to check.
>> >>>
>> >>> Hope that helps,
>> >>> Cheers, Rafa
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Jul 5, 2015 at 2:56 AM, Joshua Dunham <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Hi ManifoldCF Users (and Devs)
>> >>>>
>> >>>> I'm wondering if ManifoldCF can work in my use case. I have some
>> >>>> random MySQL and Oracle DBs that I would like to connect to and
>> >>>> extract certain known bits of info from, format each of them a certain
>> >>>> way, and then store the info in Apache Marmotta [1]. Marmotta is an RDF
>> >>>> triple store for linked data, so I would need to parse the MySQL and
>> >>>> Oracle data and store it in a linked format. Creating the
>> >>>> relationships etc. is no problem for me; I just need something that
>> >>>> would let me do specifically this.
>> >>>>
>> >>>> From what I've read, ManifoldCF can connect to MySQL and Oracle
>> >>>> (via non-distributed libraries) and store the results in several
>> >>>> target data stores. What isn't clear is:
>> >>>> (A) How I define the data to grab, whether some SQL statement or the
>> >>>> like.
>> >>>> (B) How to use this data as individual variables which I can arrange
>> >>>> into a linked data relationship (ManifoldCF mapping module?)
>> >>>> (C) How difficult would it be to connect to Marmotta's webservice(s).
>> >>>> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
>> >>>> support for elasticsearch so maybe I could put something together that
>> >>>> talks to Marmotta..
>> >>>>
>> >>>> Would this be possible? If so, could someone point me in the right
>> >>>> direction?
>> >>>>
>> >>>> Thanks!
>> >>>> -Joshua
>> >>>>
>> >>>>
>> >>>> [1] - http://marmotta.apache.org/index.html