There are a few implementations in Java and in Python, and the Wikipedia page
has a good introduction to it. The last time I used it was about a year
ago, in Python. Once you have implemented it the first time, it becomes quite
straightforward.
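For a quick start, here is a minimal bag-of-words sketch in Python. It is not taken from any particular library, and the function names are hypothetical. It assumes plain whitespace tokenization: each document is lower-cased and split into words, a shared vocabulary is built, and each document becomes a vector of word counts with word order discarded. Real implementations usually also strip punctuation and stop words.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split on whitespace and count each word (order is discarded)."""
    return Counter(text.lower().split())


def to_vectors(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a shared, sorted vocabulary and one count vector per document."""
    bags = [bag_of_words(d) for d in docs]
    vocab = sorted(set().union(*bags))
    # A Counter returns 0 for words it has never seen, so absent words get a 0 count.
    return vocab, [[bag[w] for w in vocab] for bag in bags]


docs = ["the cat sat on the mat", "the dog sat"]
vocab, vectors = to_vectors(docs)
print(vocab)    # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Each position in a vector corresponds to the word at the same position in `vocab`, so the vectors can be compared directly, for example with cosine similarity.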
If you have further questions about BoW, NLP, etc., feel free to drop me an
e-mail off list (as the discussion has digressed a little from Jena) :-)
Bruno

From: Jonathan Camilleri <camilleri....@gmail.com>
To: Bruno P. Kinoshita <ki...@apache.org>
Cc: "users@jena.apache.org" <users@jena.apache.org>
Sent: Tuesday, 29 December 2015 1:11 AM
Subject: Re: Parsing UDFs...

Can you provide more information on BoW?
Is there some quick start guide? :)

Jon

On 28 December 2015 at 12:20, Bruno P. Kinoshita <ki...@apache.org> wrote:

> Hi Jonathan;
>
>
> I have built Jena on Windows a few times, but have never run Jena on it
> (though I think I once started Fuseki 1 on Windows). Still, I believe it
> should work. I don't know if you have a deadline for your project, but even
> so you may find it useful to spend some time going through Jena's
> documentation - http://jena.apache.org/
>
>
> Maybe what you are looking for is Fuseki? It provides a web layer and a
> SPARQL endpoint using Jena (on the downloads page, click on the link for
> Fuseki, not for Jena). Then take a look at
> http://jena.apache.org/documentation/fuseki2/index.html
>
>
> Spend some time reading about RDF, reification, etc., even if you already
> know about these topics, as the notes in Jena's documentation may explain
> more about how Jena works and uses these concepts.
>
>
> Finally, on parsing UDFs: if I understand correctly, you are trying to
> apply an NLP algorithm to the URLs used to identify resources in RDF.
>
>
> If that's the case, and if you want to use BoW or NER (CRF, MaxEnt, etc.),
> you would probably be considering only a part of the URL? For example, for
> http://niwa.co.nz/tax#galaxias_aff_divergens_northern, you would
> extract just galaxias_aff_divergens_northern, getting "galaxias aff
> divergens northern" (which comes from Galaxias aff. divergens 'northern' -
> https://tad.niwa.co.nz/trs#trs/1727994/Galaxias aff. divergens
> 'northern'/summary, FWIW). Or you could implement a simple tokenizer that
> includes the domain as well...
>
>
> If you used BoW, you could, for instance, apply cosine distance to find
> URLs that look similar. If you decide to use a NER classifier, you may
> need a bigger corpus (or several different corpora, depending on your data)
> to classify the URLs correctly. I'm not sure that would work well for your
> assignment; BoW is probably the simplest approach.
>
>
> Hope that helps.
> Bruno
>
>
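A minimal Python sketch of the URL idea discussed above. It assumes the interesting part of each URL is its fragment (after `#`) or, failing that, its last path segment, and that words are separated by `_` or `-`. The helper names `url_tokens` and `cosine_similarity` are hypothetical, and so are the two `example.org` URLs; only the NIWA URL comes from this thread. Each URL becomes a bag of words, and bags are compared with cosine similarity (cosine distance is simply `1 - similarity`).

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit


def url_tokens(url: str, include_domain: bool = False) -> list[str]:
    """Split a URL's local name (fragment, else last path segment) into lower-case words.

    With include_domain=True the host name parts are kept as extra tokens.
    """
    parts = urlsplit(url)
    local = parts.fragment or parts.path.rstrip("/").rsplit("/", 1)[-1]
    tokens = [t.lower() for t in re.split(r"[_\-]+", local) if t]
    if include_domain and parts.hostname:
        tokens = parts.hostname.split(".") + tokens
    return tokens


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two bag-of-words count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


urls = [
    "http://niwa.co.nz/tax#galaxias_aff_divergens_northern",
    "http://example.org/tax#galaxias_divergens",  # hypothetical
    "http://example.org/tax#some_other_taxon",    # hypothetical
]
bags = [Counter(url_tokens(u)) for u in urls]
print(url_tokens(urls[0]))  # ['galaxias', 'aff', 'divergens', 'northern']
print(url_tokens(urls[0], include_domain=True))
# ['niwa', 'co', 'nz', 'galaxias', 'aff', 'divergens', 'northern']
print(round(cosine_similarity(bags[0], bags[1]), 3))  # 0.707
print(cosine_similarity(bags[0], bags[2]))  # 0.0
```

Cosine similarity ignores the overall length of each bag, so a short local name that shares most of its words with a longer one still scores as similar. Ranking all URLs by their similarity to a given one is a simple way to find look-alikes.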
> ------------------------------
> *From:* Jonathan Camilleri <camilleri....@gmail.com>
> *To:* users@jena.apache.org
> *Sent:* Monday, 28 December 2015 9:05 PM
> *Subject:* Re: Parsing UDFs...
>
> I also need help figuring out whether Apache Jena can be installed on
> Windows. I have not yet managed to find a suitable installation guide
> explaining how to start up and stop the service, or something of the
> sort; I just downloaded a bunch of files and realized that they can be
> uncompressed.
>
> On 28 December 2015 at 09:04, Jonathan Camilleri <camilleri....@gmail.com>
> wrote:
>
> > I am trying to come up with an algorithm that parses RDF data and applies
> > machine learning to it, e.g. classifying URLs read from RDF files into
> > categories.
> >
> > The examples I have found so far were a bit limited, so I am asking whether
> > there is any project that is worth mimicking. I have done some experiments
> > with Eclipse, but they are not very complete yet; I am now stuck trying to
> > understand what syntax to use to read particular parts of a UDF file.
> >
> > I have read the tutorials at W3C as well; they appear to provide
> > information on the file formats.
> >
> > Further reading
> > 1. https://en.wikipedia.org/wiki/Bag-of-words_model
> > 2. http://nlp.stanford.edu/software/CRF-NER.shtml
> >
> > See attachments.
> >
> > --
> > Jonathan Camilleri
> >
> > Mobile (MT): ++356 7982 7113
> > E-mail: camilleri....@gmail.com
> > Please consider your environmental responsibility before printing this
> > e-mail.
> >
> > I usually reply to emails within 2 business days. If it's urgent, give
> > me a call.
