Hi

Rather than working on the workaround, I decided to invest some time in
finishing STANBOL-140 and implementing STANBOL-287.
Together with the proposal made in [1] to split up the default data into
several bundles, this should solve the issues described/discussed here.

best
Rupert

[1] http://markmail.org/message/bf7qurmzos45h23b

On Thu, Jul 14, 2011 at 8:34 AM, Rupert Westenthaler
<[email protected]> wrote:
> On Thu, Jul 14, 2011 at 8:30 AM, David Riccitelli
> <[email protected]> wrote:
>> Thanks Rupert,
>>
>> A description of how to do this is available in [1].
>>
>>
>> I can't see the [1] :-)
>
> Does this count as a missing attachment? ^^
>
> [1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/
>
>>
>> David
>>
>> On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <
>> [email protected]> wrote:
>>
>>> Hi
>>>
>>> Yes, this is possible, but it would take quite some time (depending on
>>> the hardware).
>>> A description of how to do this is available in [1].
>>>
>>> Instead of installing the dbpedia.solrindex.zip file as described in
>>> the readme, you could directly:
>>>
>>> * shut down Stanbol
>>> * delete the "dbpedia_43k" index in
>>> "{stanbol-root}/sling/entityhub/solrYard/indexes"
>>> * copy the index located in
>>> "{indexing-root}/indexing/destination/indexes" to
>>> "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
>>> "dbpedia_43k"
>>> * restart Stanbol.
>>>
>>> After that Stanbol should use the new index.
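The manual steps above could be scripted roughly as follows. This is only a sketch under assumptions: the function name `swap_dbpedia_index` is hypothetical, the name of the directory produced under `indexing/destination/indexes` is passed in as a parameter because it is not stated here, and Stanbol must already be shut down before running it and restarted afterwards.

```shell
#!/bin/sh
# Hypothetical helper, not part of Stanbol: replaces the "dbpedia_43k"
# SolrYard index with a locally built one.
# Assumption: Stanbol is shut down before calling this and restarted after.
swap_dbpedia_index() {
  stanbol_root=$1   # {stanbol-root}
  indexing_root=$2  # {indexing-root}
  index_name=$3     # name of the built index directory (assumed, passed in)
  target_dir="$stanbol_root/sling/entityhub/solrYard/indexes"
  source_dir="$indexing_root/indexing/destination/indexes/$index_name"

  # refuse to touch the old index if there is nothing to copy
  if [ ! -d "$source_dir" ]; then
    echo "no index found at $source_dir" >&2
    return 1
  fi
  mkdir -p "$target_dir"
  # delete the old "dbpedia_43k" index ...
  rm -rf "$target_dir/dbpedia_43k"
  # ... then copy the new index under the name "dbpedia_43k"
  cp -R "$source_dir" "$target_dir/dbpedia_43k"
}
```

For example, `swap_dbpedia_index /path/to/stanbol /path/to/indexing dbpedia`, where both paths and the index name are placeholders.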
>>>
>>> Copying the "dbpedia.solrindex.zip" to the datafiles directory and
>>> then changing the value of "Solr Index/Core" in the configuration of
>>> the SolrYard for DBpedia from "dbpedia_43k" to "dbpedia" should also
>>> work.
>>>
>>> best
>>> Rupert
>>>
>>> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
>>> <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > As another workaround, I was thinking that I could actually generate
>>> > the DBpedia index locally with all the data using the dumps
>>> > (http://wiki.dbpedia.org/Downloads36), in a way similar to
>>> > dbpedia_43k.
>>> >
>>> > What do you think?
>>> >
>>> > Thanks,
>>> > David
>>> >
>>> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
>>> > [email protected]> wrote:
>>> >
>>> >> Hi
>>> >>
>>> >> I will try to find some time in the evening to reproduce this.
>>> >>
>>> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
>>> >> <[email protected]> wrote:
>>> >> > Thanks Rupert,
>>> >> >
>>> >> > I'm trying to follow your instructions but I encounter a couple of
>>> >> > issues (probably due to inexperience):
>>> >> >  [1] when dropping the config files, they enter some loop of
>>> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
>>> >> > bundle), is that normal?
>>> >>
>>> >> This is very strange and should not be caused by the FileInstaller.
>>> >> Maybe there is some loop between the Sling Installer (trying to
>>> >> install the default configuration) and the FileInstaller that may
>>> >> cause this under some circumstances.
>>> >>
>>> >> >  [2] after I restart Stanbol and try to query an entity from the
>>> >> > entityhub, I receive the following error:
>>> >> >
>>> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
>>> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
>>> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with
>>> >> > Yard dbpediaCache! This is usually caused by Errors while reading the
>>> >> > Cache Configuration from the Yard.) java.lang.IllegalStateException:
>>> >> > Unable to initialize the Cache with Yard dbpediaCache! This is usually
>>> >> > caused by Errors while reading the Cache Configuration from the Yard.
>>> >> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>>> >> >
>>> >> >
>>> >> > Do I need to initialize the Cache in some way?
>>> >> >
>>> >> No, it does not. Prepared indexes include a document that
>>> >> provides a list of the indexed fields. In the future this may be used
>>> >> to determine whether a query can be successfully executed on the
>>> >> local index or not. In addition, it is used when an Entity within the
>>> >> index is updated with a newer version.
>>> >> However, this configuration is optional and not required. This
>>> >> Exception should only appear if the document is present but
>>> >> malformed. But the SolrYard initialized for the dbpediaCache
>>> >> should be empty, so this document should not exist there at all.
>>> >>
>>> >> Therefore I think it is somehow related to the above problem of
>>> >> overriding configurations.
>>> >>
>>> >> In general, the way the default configuration is loaded is
>>> >> suboptimal at the moment. In particular, using a single defaultdata
>>> >> bundle for both the OpenNLP models and the dbpedia configuration +
>>> >> default index was not a good idea, because one cannot exclude/change
>>> >> the dbpedia stuff without affecting other components that depend on
>>> >> OpenNLP.
>>> >> Therefore I think we need to discuss how to better structure the
>>> >> configurations and data needed to run Stanbol.
>>> >>
>>> >> There is also another issue: the SolrYard copies provided indexes
>>> >> only once and does not check for updates. This makes it hard to
>>> >> upgrade from the small index provided with the default data to a
>>> >> bigger version.
>>> >>
>>> >> Both of these things are related to the problems and need to be
>>> >> addressed before the first Stanbol release. Independently of those, I
>>> >> will try to find a simple solution for what you intend to do.
>>> >>
>>> >> In the meantime I suggest you go for the initially proposed workaround.
>>> >>
>>> >> best
>>> >> Rupert Westenthaler
>>> >>
>>> >> > Thanks for your help,
>>> >> >
>>> >> > David
>>> >> >
>>> >> >
>>> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
>>> >> > [email protected]> wrote:
>>> >> >
>>> >> >> Hi
>>> >> >>
>>> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>>> >> >> <[email protected]> wrote:
>>> >> >> > I solved it the same way, but lost the caching capabilities.
>>> >> >> > Is there any possibility to keep both all the data and the cache?
>>> >> >> >
>>> >> >> > Andrea
>>> >> >> >
>>> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>>> >> >> >
>>> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >> David
>>> >> >> >>
>>> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>>> >> >> >> [email protected]> wrote:
>>> >> >> >>
>>> >> >> >>> Hi Rupert,
>>> >> >> >>>
>>> >> >> >>> I recently updated the Stanbol install, and I found that the RDF
>>> >> >> >>> returned by the EntityHub is missing some props (specifically the
>>> >> >> >>> dbprop ones, as far as I can see).
>>> >> >> >>>
>>> >> >> >>> This is the command that I use for testing:
>>> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>> >> >> >>>
>>> >> >> >>> which outputs the attached RDF file.
>>> >> >> >>>
>>> >> >> >>> I cleared the whole sling folder (rm -fr sling) and checked with
>>> >> >> >>> the SPARQL endpoint at DBpedia, but I wasn't able to fix it.
>>> >> >> >>>
>>> >> >> >>> Does this depend on the mapping.txt file?
>>> >> >> >>>
>>> >> >>
>>> >> >> If you plan to create your own dbpedia index, then the mapping.txt
>>> >> >> file is the way to configure which properties are
>>> >> >> included/excluded.
>>> >> >> Typically dbprop values are of low quality. They are just naive 1:1
>>> >> >> mappings of key/value pairs as found in the infoboxes. Because of
>>> >> >> this they are excluded from the indexes.
>>> >> >>
>>> >> >> At runtime the returned data depend on the cache strategy used.
>>> >> >>
>>> >> >> Currently there are three possibilities (configured with the
>>> >> >> referenced Site):
>>> >> >> 1) no cache: both queries and retrieval use the remote service.
>>> >> >> 2) used: Queries are executed by the remote service. Retrieved
>>> >> >> Entities are stored locally. The cached data depend on the mappings
>>> >> >> defined for the cache.
>>> >> >> 3) all: Both queries and retrieval are based on the cache. The remote
>>> >> >> service is only used as a fallback in case the cache is not
>>> >> >> available (e.g. if you deactivate the solrYard).
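To make the three strategies easier to compare, here is a small model of where a request would be answered. It is a hypothetical illustration (the function `serve_from`, its arguments and the strategy keywords `none`, `used` and `all` are invented for this sketch), not how Stanbol implements caching; it only prints whether the local cache or the remote service answers.

```shell
#!/bin/sh
# Hypothetical model (not Stanbol code) of the three cache strategies.
#   $1 strategy : none | used | all
#   $2 operation: query | retrieve
#   $3 cached   : yes if the requested Entity is already in the cache
#   $4 cache_up : yes if the cache Yard is available
serve_from() {
  strategy=$1; operation=$2; cached=$3; cache_up=$4
  case "$strategy" in
    none)
      # no cache: everything goes to the remote service
      echo remote ;;
    used)
      # queries always go remote; retrievals hit the cache once stored
      if [ "$operation" = retrieve ] && [ "$cached" = yes ] && [ "$cache_up" = yes ]; then
        echo cache
      else
        echo remote   # a retrieved Entity would then be stored locally
      fi ;;
    all)
      # the remote service is only a fallback when the cache is unavailable
      if [ "$cache_up" = yes ]; then echo cache; else echo remote; fi ;;
    *)
      echo "unknown strategy: $strategy" >&2
      return 1 ;;
  esac
}
```

For instance, `serve_from used retrieve yes yes` prints `cache`, while `serve_from used query yes yes` prints `remote`.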
>>> >> >>
>>> >> >> So if you are fine with (2), then you could use the configuration
>>> >> >> previously used by the stable launcher [1].
>>> >> >> I think the easiest way to install this is to add the
>>> >> >> Felix File Installer [2] to the Stanbol environment. You will need to
>>> >> >> delete the current referencedSite for dbpedia first and then add the
>>> >> >> three configuration files as described by [1].
>>> >> >>
>>> >> >> If your requirements are not covered by the currently available
>>> >> >> options, it would be nice if you could write a short user story,
>>> >> >> because I am thinking about how to improve this feature and input
>>> >> >> like that would be really valuable.
>>> >> >>
>>> >> >> best
>>> >> >> Rupert Westenthaler
>>> >> >>
>>> >> >> [1] The dbpedia config consists of three files: the referenced site,
>>> >> >> cache and solrYard components with the "-dbpedia" endings.
>>> >> >>
>>> >> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>>> >> >>
>>> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
>>> >> >>
>>> >> >> p.s. I keep this part because it describes very well how the cache
>>> >> >> strategy "used" works:
>>> >> >> >>>>> Hi David
>>> >> >> >>>>>
>>> >> >> >>>>> Assuming that you are using the default distribution of Apache
>>> >> >> >>>>> Stanbol, requests for http://dbpedia.org/resource/Valentino_Rossi
>>> >> >> >>>>> are handled as follows:
>>> >> >> >>>>> - only the first request is answered by retrieving the Entity
>>> >> >> >>>>>   from DBpedia.org
>>> >> >> >>>>> - the information is stored in a local cache; in the process,
>>> >> >> >>>>>   values of the documents are filtered (see (a) for details)
>>> >> >> >>>>> - the cached version is returned
>>> >> >> >>>>>
>>> >> >> >>>>> (a) The default configuration for dbpedia stores all fields, but
>>> >> >> >>>>> filters literal values so that only values with the language
>>> >> >> >>>>> "en", "de", "fr", "it" or "es", or with no language, are stored.
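The language filter described in (a) amounts to a simple whitelist check. The sketch below is hypothetical (the functions `keeps_literal` and `filter_literals`, and the `lang|value` line format they read, are invented for illustration) and does not reproduce the actual Stanbol mapping configuration.

```shell
#!/bin/sh
# Hypothetical sketch (not Stanbol code) of the literal filter in (a):
# keep a literal if its language is en, de, fr, it or es, or if it has
# no language at all (empty tag).
keeps_literal() {
  case "$1" in
    ""|en|de|fr|it|es) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads "lang|value" lines from stdin and prints only the kept ones.
# "|" is a non-whitespace separator, so an empty language tag survives.
filter_literals() {
  while IFS='|' read -r lang value; do
    if keeps_literal "$lang"; then
      printf '%s|%s\n' "$lang" "$value"
    fi
  done
}
```

For example, piping the lines `en|Valentino Rossi`, `ja|x` and `|untagged` through `filter_literals` keeps the first and the last line.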
>>> >> >> >>>>>
>>> >> >> >>>>>
>>> >> >> >>>>> Assuming that you started from zero when updating to a new
>>> >> >> >>>>> version, this also means that you have downloaded a new version
>>> >> >> >>>>> of this Entity from DBpedia.
>>> >> >>
>>> >> >> --
>>> >> >> | Rupert Westenthaler             [email protected]
>>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>>> >> >> | A-5500 Bischofshofen
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > David Riccitelli
>>> >> >
>>> >> > Interact SpA
>>> >> > Via A. Bargoni 78 (scala F)
>>> >> > 00153 Roma
>>> >> >
>>> >> > T +39 06 58318 301
>>> >> > F +39 06 58318 303
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>



