Hi,

As another workaround, I was thinking that I could generate the DBpedia
index locally, with all the data, from the dumps (
http://wiki.dbpedia.org/Downloads36), in a way similar to the dbpedia_43k.

What do you think?

Thanks,
David

On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
[email protected]> wrote:

> Hi
>
> I will try to find some time in the evening to reproduce this.
>
> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
> <[email protected]> wrote:
> > Thanks Rupert,
> >
> > I'm trying to follow your instructions but I encounter a couple of issues
> > (probably due to inexperience):
> >  [1] when dropping the config files, they enter some loop of
> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
> > bundle), is that normal?
>
> This is very strange and should not be caused by the FileInstaller.
> Maybe there is some loop between the Sling Installer, which tries to
> install the default configuration, and the FileInstaller, and this
> loop causes the problem under some circumstances.
>
> >  [2] after I restart Stanbol, and try to query an entity from the entityhub
> > I receive the following error:
> >
> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
> > org.apache.felix.http.jetty /entityhub/sites/entity/
> > (java.lang.IllegalStateException: Unable to initialize the Cache with Yard
> > dbpediaCache! This is usually caused by Errors while reading the Cache
> > Configuration from the Yard.) java.lang.IllegalStateException: Unable to
> > initialize the Cache with Yard dbpediaCache! This is usually caused by
> > Errors while reading the Cache Configuration from the Yard.
> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
> >
> >
> > Do I need to initialize the Cache in some way?
> >
> No, you do not. Prepared indexes include a document that
> provides a list of the indexed fields. In the future this may be used to
> determine whether a query can be successfully executed on the local
> index. In addition, it is used when an Entity within the index is
> updated with a newer version.
> However, this configuration is optional. This
> Exception should only appear if the document is present but illegally
> formatted. However, the SolrYard initialized for the dbpediaCache
> should be empty.
>
> Therefore I think it is somehow related to the above problem of
> overriding configurations.
>
> In general, the way the default configuration is loaded is
> suboptimal at the moment. In particular, using a single defaultdata
> bundle for both the OpenNLP models and the dbpedia configuration +
> default index was not a good idea, because one cannot exclude/change
> the dbpedia stuff without affecting other components that depend on
> OpenNLP.
> Therefore I think we need to discuss how to better structure the
> configurations and data needed to run Stanbol.
>
> There is also another issue: the SolrYard copies provided indexes
> only once and does not check for updates. This would make it
> hard to upgrade from the small index provided with the default data
> to a bigger version.
>
> Both of these things are related to the problems and need to be addressed
> before the first Stanbol release. Independently of those, I will try to
> find a simple solution for what you intend to do.
>
> In the meantime I suggest you go for the initially proposed workaround.
>
> best
> Rupert Westenthaler
>
> > Thanks for your help,
> >
> > David
> >
> >
> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
> > [email protected]> wrote:
> >
> >> Hi
> >>
> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
> >> <[email protected]> wrote:
> >> > I solved it the same way, but I lost the caching capabilities.
> >> > Is there any possibility to keep both all the data and the cache?
> >> >
> >> > Andrea
> >> >
> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
> >> >
> >> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
> >> >>
> >> >> Thanks,
> >> >> David
> >> >>
> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
> >> >> [email protected]> wrote:
> >> >>
> >> >>> Hi Rupert,
> >> >>>
> >> >>> I recently updated the Stanbol install, and I found that the RDF returned
> >> >>> by the EntityHub is missing some props (specifically the dbprop, as far
> >> >>> as I can see).
> >> >>>
> >> >>> This is the command that I use for testing:
> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
> >> >>>
> >> >>> which outputs the attached RDF file.
> >> >>>
> >> >>> I cleared the whole sling folder (rm -fr sling) and checked with the
> >> >>> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
> >> >>>
> >> >>> Does this depend on the mapping.txt file?
> >> >>>
> >>
> >> If you plan to create your own dbpedia index, then the mapping.txt
> >> file is the way to configure which properties are
> >> included/excluded.
> >> Typically dbprop values are low quality. They are just naive 1:1
> >> mappings of key-value pairs as found in the info boxes. Because of
> >> this they are excluded from the indexes.
> >>
> >> At runtime the returned data depends on the Cache strategy in use.
> >>
> >> Currently there are three possibilities (configured with the referenced
> >> Site):
> >> 1) no cache: both queries and retrieval use a remote service.
> >> 2) used: Queries are executed by the remote service. Retrieved
> >> Entities are stored locally. The cached data depends on the mappings
> >> defined for the cache.
> >> 3) all: Both queries and retrieval are based on the cache. The remote
> >> service is only used as a fallback in case the cache is not
> >> available (e.g. if you deactivate solrYard).
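
The three strategies listed above can be sketched as a small routing
function. This is only an illustrative model, not Stanbol code: the names
CacheStrategy and backend_for are hypothetical. It also assumes that "used"
falls back to the remote service when the cache is unavailable (the message
only states this for "all"), and that under "used" an Entity is served from
the cache once it has been retrieved and stored.

```python
from enum import Enum


class CacheStrategy(Enum):
    """Hypothetical names for the three strategies described in the message."""
    NONE = "none"
    USED = "used"
    ALL = "all"


def backend_for(strategy, operation, cache_available=True, already_cached=False):
    """Return "remote" or "cache" for an operation ("query" or "retrieve")."""
    if operation not in ("query", "retrieve"):
        raise ValueError("operation must be 'query' or 'retrieve'")

    # 1) no cache: queries and retrieval both use the remote service.
    if strategy is CacheStrategy.NONE:
        return "remote"

    # Fallback when the cache is unavailable (stated for "all" in the
    # message; applying it to "used" as well is an assumption).
    if not cache_available:
        return "remote"

    # 2) used: queries go to the remote service; an Entity is fetched
    # remotely the first time and served from the cache afterwards.
    if strategy is CacheStrategy.USED:
        if operation == "query":
            return "remote"
        return "cache" if already_cached else "remote"

    # 3) all: queries and retrieval are both based on the cache.
    return "cache"


if __name__ == "__main__":
    for strategy in CacheStrategy:
        for operation in ("query", "retrieve"):
            print(strategy.value, operation, backend_for(strategy, operation))
```

Under this model, stopping the solrYard component with the "all" strategy
corresponds to cache_available=False, so every request goes to the remote
service.
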
> >>
> >> So if you are fine with (2), then you could use the configuration
> >> previously used by the stable launcher [1].
> >> I think the easiest way to install it is to add the
> >> Felix File Installer [2] to the Stanbol Environment. You will need to
> >> delete the current referencedSite for dbpedia first and then add the
> >> three configuration files as described by [1].
> >>
> >> If your requirements are not covered by the currently available options,
> >> it would be nice if you could write a short user story, because I am
> >> thinking about how to improve this feature and input like that would
> >> be really valuable.
> >>
> >> best
> >> Rupert Westenthaler
> >>
> >> [1] The dbpedia config consists of three files: the referenced site,
> >> cache and solryard components with the "-dbpedia" endings.
> >>
> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
> >>
> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
> >>
> >> p.s. I keep this part because it describes very well how the cache
> >> strategy "used" works:
> >> >>>>> Hi David
> >> >>>>>
> >> >>>>> Assuming that you are using the default distribution of Apache Stanbol.
> >> >>>>>
> >> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi are handled as follows:
> >> >>>>> - only the first request is answered by retrieving the Entity from DBpedia.org
> >> >>>>> - the information is stored in a local cache; in the process, the values of
> >> >>>>>   the documents are filtered (see (a) for details)
> >> >>>>> - the cached version is returned
> >> >>>>>
> >> >>>>> (a) The default configuration for dbpedia stores all fields but
> >> >>>>> filters literal values so that only values with the languages
> >> >>>>> "en, de, fr, it, es" or no language are stored.
> >> >>>>>
> >> >>>>>
> >> >>>>> Assuming that you started from zero when updating to a new version,
> >> >>>>> this also means that you have downloaded a new version of this Entity
> >> >>>>> from DBpedia.
> >> >>>>>
> >>
> >> --
> >> | Rupert Westenthaler             [email protected]
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
> >
> >
> >
> > --
> > David Riccitelli
> >
> > Interact SpA
> > Via A. Bargoni 78 (scala F)
> > 00153 Roma
> >
> > T +39 06 58318 301
> > F +39 06 58318 303
> >
>
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303
