Hi

I will try to find some time in the evening to reproduce this.

On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
<[email protected]> wrote:
> Thanks Rupert,
>
> I'm trying to follow your instructions but I encounter a couple of issues
> (probably due to inexperience):
>  [1] when dropping the config files, they enter some loop of
> REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
> bundle), is that normal?

This is very strange and should not be caused by the FileInstaller.
Maybe there is a loop between the Sling Installer, which tries to
install the default configuration, and the FileInstaller, and this
loop shows up under some circumstances.

>  [2] after I restart Stanbol, and try to query an entity from the entityhub
> I receive the following error:
>
> 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
> org.apache.felix.http.jetty /entityhub/sites/entity/
> (java.lang.IllegalStateException: Unable to initialize the Cache with Yard
> dbpediaCache! This is usually caused by Errors while reading the Cache
> Configuration from the Yard.) java.lang.IllegalStateException: Unable to
> initialize the Cache with Yard dbpediaCache! This is usually caused by
> Errors while reading the Cache Configuration from the Yard.
> at
> org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>
>
> Do I need to initialize the Cache in some way?
>
No, you do not. Prepared indexes include a document that provides a
list of the indexed fields. In the future this may be used to
determine whether a query can be successfully executed on the local
index. In addition, it is used when an Entity within the index is
updated with a newer version.
However, this configuration is optional. This exception should only
appear if the document is present but illegally formatted. The
SolrYard initialized for the dbpediaCache should be empty, so no such
document should exist there.
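
As a rough sketch of the behaviour described above (this is not the
actual Stanbol code; the function name, the exception class and the
JSON format of the document are all made up for illustration): a
missing configuration document is fine, only a present but malformed
one is an error.

```python
import json
from typing import FrozenSet, Optional


class CacheConfigError(Exception):
    """Hypothetical error: the config document exists but is malformed."""


def load_indexed_fields(raw_document: Optional[str]) -> FrozenSet[str]:
    """Return the indexed fields listed in the optional config document.

    Assumption for this sketch: the document is a JSON list of field names.
    """
    if raw_document is None:
        # The document is optional: a missing one is not an error.
        return frozenset()
    try:
        parsed = json.loads(raw_document)
    except json.JSONDecodeError as exc:
        # Present but unreadable: this is the case that should raise.
        raise CacheConfigError("config document is not valid JSON") from exc
    if not isinstance(parsed, list) or not all(isinstance(f, str) for f in parsed):
        raise CacheConfigError("config document must be a list of field names")
    return frozenset(parsed)
```

With an empty Yard the first branch applies, so the exception you see
suggests that some document was found and could not be read.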

Therefore I think it is somehow related to the above problem of
overriding configurations.

In general, the way the default configuration is loaded is
suboptimal at the moment. In particular, using a single defaultdata
bundle for both the OpenNLP models and the dbpedia configuration +
default index was not a good idea, because one cannot exclude/change
the dbpedia stuff without affecting other components that depend on
OpenNLP.
Therefore I think we need to discuss how to better structure the
configurations and data needed to run Stanbol.

There is also another issue: the SolrYard copies provided indexes
only once and does not check for updates. This would make it hard
to upgrade from the small index provided with the default data
to a bigger version.

Both of these things are related to the problems and need to be
addressed before the first Stanbol release. Independently of those,
I will try to find a simple solution for what you intend to do.

In the meantime I suggest you go for the initially proposed workaround.
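
For reference, here is a minimal sketch of the three cache strategies
described further down in this thread, assuming the workaround means
using the "used" strategy. It is not the Entityhub implementation: all
names are hypothetical, literals are modelled as (text, language)
pairs, and two behaviours are my own assumptions because the thread
does not spell them out (returning None for a cache miss under "all",
and falling back to the remote service for every strategy when the
cache is unavailable).

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple

# A literal is modelled as (text, language); None means "no language".
Literal = Tuple[str, Optional[str]]
Entity = Dict[str, List[Literal]]

# Languages kept by the default dbpedia cache mapping, as listed in the thread.
KEPT_LANGUAGES = {"en", "de", "fr", "it", "es", None}


class CacheStrategy(Enum):
    NONE = "none"  # queries and retrieval use the remote service
    USED = "used"  # retrieved entities are stored locally
    ALL = "all"    # queries and retrieval are based on the cache


def filter_literals(entity: Entity) -> Entity:
    """Keep all fields, but drop literals in languages that are not kept."""
    return {
        field: [lit for lit in values if lit[1] in KEPT_LANGUAGES]
        for field, values in entity.items()
    }


def get_entity(
    entity_id: str,
    strategy: CacheStrategy,
    cache: Optional[Dict[str, Entity]],
    fetch_remote: Callable[[str], Entity],
) -> Optional[Entity]:
    """Look up an entity; cache=None models an unavailable cache."""
    if strategy is CacheStrategy.NONE or cache is None:
        # No cache configured, or cache unavailable: use the remote service.
        return fetch_remote(entity_id)
    if entity_id in cache:
        return cache[entity_id]
    if strategy is CacheStrategy.USED:
        # First request: fetch remotely, store the filtered version,
        # and return the cached (filtered) version.
        cache[entity_id] = filter_literals(fetch_remote(entity_id))
        return cache[entity_id]
    # ALL with a working cache: a miss is treated as "not found" (assumption).
    return None
```

Under "used", only the first request for an entity goes to DBpedia;
later requests return the filtered copy from the local cache.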

best
Rupert Westenthaler

> Thanks for your help,
>
> David
>
>
> On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> Hi
>>
>> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>> <[email protected]> wrote:
>> > I solved it in the same way, but at the cost of losing the caching capabilities.
>> > Is there any possibility to keep both all the data and the cache?
>> >
>> > Andrea
>> >
>> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>> >
>> >> Ok, stopping the solrYard dbpedia_43k component solved for me.
>> >>
>> >> Thanks,
>> >> David
>> >>
>> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>> >> [email protected]> wrote:
>> >>
>> >>> Hi Rupert,
>> >>>
>> >>> I recently updated the Stanbol install, and I found that the RDF returned
>> >>> by the EntityHub is missing some props (specifically the dbprop as far as I
>> >>> can see).
>> >>>
>> >>> This is the command that I use for testing:
>> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>> >>>
>> >>> which outputs the attached RDF file.
>> >>>
>> >>> I cleared all of the sling folder (rm -fr sling) and checked with the
>> >>> SPARQL end-point at DBpedia, but I wasn't able to fix it.
>> >>>
>> >>> Does this depend on the mapping.txt file?
>> >>>
>>
>> If you plan to create your own dbpedia index, then the mapping.txt
>> file would be the way to configure which properties are
>> included/excluded.
>> Typically dbprop values are low quality. They are just naive 1:1
>> mappings of key value pairs as found in the info boxes. Because of
>> this they are excluded from the indexes.
>>
>> At runtime the returned data depend on the used Cache strategy:
>>
>> Currently there are three possibilities (configured with the referenced Site):
>> 1) no cache: both queries and retrieval use the remote service
>> 2) used: Queries are executed by the remote service. Retrieved
>> Entities are stored locally. The cached data depend on the mappings
>> defined for the cache.
>> 3) all: Both queries and retrieval are based on the cache. The remote
>> service is only used as a fallback in case the cache is not
>> available (e.g. if you deactivate the solrYard).
>>
>> So if you are fine with (2) then you could use the configuration
>> as previously used by the stable launcher [1].
>> I think the easiest way to install this is to add the
>> Felix File Installer [2] to the Stanbol Environment. You will need to
>> delete the current referencedSite for dbpedia first and then add the
>> three configuration files as described by [1].
>>
>> If your requirements are not covered by the currently available options,
>> it would be nice if you could write a short user story, because I am
>> thinking about how to improve this feature and input like that would
>> be really valuable.
>>
>> best
>> Rupert Westenthaler
>>
>> [1] The dbpedia config consists of three files: the referenced site,
>> cache and solryard components with the "-dbpedia" endings.
>>
>> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>>
>> [2] http://felix.apache.org/site/apache-felix-file-install.html
>>
>> p.s. I keep this part because it describes very well how the cache
>> strategy "used" works:
>> >>>>> Hi David
>> >>>>>
>> >>>>> Assuming that you are using the default distribution of Apache Stanbol.
>> >>>>>
>> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will be
>> >>>>> - answered by retrieving the Entity from DBpedia.org only the first time
>> >>>>> - the information is cached in a local cache. In doing so, the values of
>> >>>>> the documents are filtered (see (a) for details)
>> >>>>> - the cached version is returned
>> >>>>>
>> >>>>> (a) The default configuration for dbpedia stores all fields, but
>> >>>>> filters values for literals so that only values with the language "en,
>> >>>>> de, fr, it, es" or no language are stored.
>> >>>>>
>> >>>>>
>> >>>>> Assuming that you have started from zero when updating to a new version,
>> >>>>> this also means that you have downloaded a new version of this Entity
>> >>>>> from DBpedia.
>> >>>>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
> David Riccitelli
>
> Interact SpA
> Via A. Bargoni 78 (scala F)
> 00153 Roma
>
> T +39 06 58318 301
> F +39 06 58318 303
>



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
