Hi

Yes, this is possible, but depending on the hardware it will take quite
some time.
A description of how to do this is available in [1].

Instead of installing the dbpedia.solrindex.zip file as described in
the readme, you could directly

* shut down Stanbol
* delete the "dbpedia_43k" index in
"{stanbol-root}/sling/entityhub/solrYard/indexes"
* copy the index located in
"{indexing-root}/indexing/destination/indexes" to
"{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
"dbpedia_43k"
* restart Stanbol.

After that Stanbol should use the new index.
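
In case it helps, the file-system part of the steps above (delete the old
index, copy the new one in under the old name) could be scripted roughly as
follows. This is only a sketch in Python under a few assumptions: the
function name replace_solr_index and its parameters are made up for
illustration and are not part of Stanbol, Stanbol must already be stopped,
and you have to pass the path of the index directory that the indexing tool
actually created below "{indexing-root}/indexing/destination/indexes".

```python
import shutil
from pathlib import Path


def replace_solr_index(stanbol_root, new_index_dir, index_name="dbpedia_43k"):
    """Replace a SolrYard index directory with a copy of a newly built index.

    Hypothetical helper, not part of Stanbol. Stop Stanbol before calling it
    and restart Stanbol afterwards.
    """
    new_index_dir = Path(new_index_dir)
    if not new_index_dir.is_dir():
        raise FileNotFoundError(f"no index directory at {new_index_dir}")

    # Directory layout as given in the steps above.
    indexes = Path(stanbol_root) / "sling" / "entityhub" / "solrYard" / "indexes"
    target = indexes / index_name

    # Delete the old index, if one is present.
    if target.exists():
        shutil.rmtree(target)

    # Copy the new index and give it the name Stanbol is configured to use.
    shutil.copytree(new_index_dir, target)
    return target
```

You would call it with your own paths, e.g.
replace_solr_index("/path/to/stanbol-root", "/path/to/new/index"), and then
restart Stanbol. Copying instead of moving keeps the freshly built index in
the indexing directory, so the copy can be repeated if something goes wrong.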

Copying the "dbpedia.solrindex.zip" to the datafiles directory and
then changing the value of "Solr Index/Core" in the configuration of
the SolrYard for DBpedia from "dbpedia_43k" to "dbpedia" should also
work.

best
Rupert

On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
<[email protected]> wrote:
> Hi,
>
> As another workaround, I was thinking that I could actually generate locally
> the DBpedia index with all the data using the dumps (
> http://wiki.dbpedia.org/Downloads36), in a way similar to the dbpedia_43k.
>
> What do you think?
>
> Thanks,
> David
>
> On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> Hi
>>
>> I will try to find some time in the evening to reproduce this.
>>
>> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
>> <[email protected]> wrote:
>> > Thanks Rupert,
>> >
>> > I'm trying to follow your instructions but I encounter a couple of issues
>> > (probably due to inexperience):
>> >  [1] when dropping the config files, they enter some loop of
>> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
>> > bundle), is that normal?
>>
>> This is very strange and should not be caused by the FileInstaller.
>> Maybe there is some loop between the Sling Installer - trying to
>> install the default configuration and the FileInstaller that may cause
>> this under some circumstances.
>>
>> >  [2] after I restart Stanbol, and try to query an entity from the entityhub
>> > I receive the following error:
>> >
>> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
>> > org.apache.felix.http.jetty /entityhub/sites/entity/
>> > (java.lang.IllegalStateException: Unable to initialize the Cache with
>> Yard
>> > dbpediaCache! This is usually caused by Errors while reading the Cache
>> > Configuration from the Yard.) java.lang.IllegalStateException: Unable to
>> > initialize the Cache with Yard dbpediaCache! This is usually caused by
>> > Errors while reading the Cache Configuration from the Yard.
>> > at
>> >
>> org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>> >
>> >
>> > Do I need to initialize the Cache in some way?
>> >
>> No, you do not. Prepared indexes do include a document that
>> provides a list of the indexed fields. In future this may be used to
>> determine if a query can be successfully executed on the local index
>> or not. In addition this is used in case an Entity within the index is
>> updated with a newer version.
>> However this configuration is optional and is not required. This
>> Exception should only appear if the document is present but illegally
>> formatted. However the SolrYard initialized for the dbpediaCache
>> should be empty.
>>
>> Therefore I think it is somehow related to the above problem of
>> overriding configurations.
>>
>> In general the way the default configuration is loaded is
>> suboptimal at the moment. Especially using a single defaultdata
>> bundle for both the OpenNLP models and the dbpedia configuration +
>> default index was not a good idea, because one cannot exclude/change
>> the dbpedia stuff without affecting other components that depend on
>> OpenNLP.
>> Therefore I think we need to discuss how to better structure the
>> configurations and data needed to run stanbol.
>>
>> There is also another issue: the SolrYard copies provided indexes
>> only once and does not check for updates. This would make it hard
>> to upgrade from the small index provided with the default data
>> to a bigger version.
>>
>> Both of these things are related to the problems and need to be addressed
>> before the first Stanbol release. Independent of those I will try to
>> find a simple solution for what you intend to do.
>>
>> In the meantime I suggest you go for the initially proposed workaround.
>>
>> best
>> Rupert Westenthaler
>>
>> > Thanks for your help,
>> >
>> > David
>> >
>> >
>> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
>> > [email protected]> wrote:
>> >
>> >> Hi
>> >>
>> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>> >> <[email protected]> wrote:
>> >> > I solved it in the same way, but I lose the caching capabilities.
>> >> > Is there any possibility to keep both all the data and the cache?
>> >> >
>> >> > Andrea
>> >> >
>> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>> >> >
>> >> >> Ok, stopping the solrYard dbpedia_43k component solved for me.
>> >> >>
>> >> >> Thanks,
>> >> >> David
>> >> >>
>> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>> >> >> [email protected]> wrote:
>> >> >>
>> >> >>> Hi Rupert,
>> >> >>>
>> >> >>> I recently updated the Stanbol install, and I found that the RDF returned
>> >> >>> by the EntityHub is missing some props (specifically the dbprop as far as I
>> >> >>> can see).
>> >> >>>
>> >> >>> This is the command that I use for testing:
>> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>> >> >>>
>> >> >>> which outputs the attached RDF file.
>> >> >>>
>> >> >>> I cleared all of the sling folder (rm -fr sling) and checked with the
>> >> >>> SPARQL end-point at DBpedia, but I wasn't able to fix it.
>> >> >>>
>> >> >>> Does this depend on the mapping.txt file?
>> >> >>>
>> >>
>> >> If you plan to create your own dbpedia index, then the mapping.txt
>> >> file would be the way to configure which properties are
>> >> included/excluded.
>> >> Typically dbprop values are low quality. They are just naive 1:1
>> >> mappings of key value pairs as found in the info boxes. Because of
>> >> this they are excluded from the indexes.
>> >>
>> >> At runtime the returned data depend on the used Cache strategy:
>> >>
>> >> Currently there are three possibilities (configured with the referenced
>> >> Site)
>> >> 1) no cache: both queries and retrieval use a remote service
>> >> 2) used: Queries are executed by the remote service. Retrieved
>> >> Entities are stored locally. The cached data depend on the mappings
>> >> defined for the cache.
>> >> 3) all: Both queries and retrieval are based on the cache. The remote
>> >> service is only used as fallback in the case that the cache is not
>> >> available (e.g. if you deactivate solrYard).
>> >>
>> >> So if you are fine with (2) then you could use the configuration
>> >> as previously used by the stable launcher [1].
>> >> I think the easiest way to install this is to add the
>> >> Felix File Installer [2] to the Stanbol Environment. You will need to
>> >> delete the current referencedSite for dbpedia first and then add the
>> >> three configuration files as described by [1].
>> >>
>> >> If your requirements are not covered by the currently available options
>> >> it would be nice if you could write a short user story, because I am
>> >> thinking about how to improve this feature and input like that would
>> >> be really valuable.
>> >>
>> >> best
>> >> Rupert Westenthaler
>> >>
>> >> [1] The dbpedia config consists of three files: the referenced site,
>> >> cache and solryard components with the "-dbpedia" endings.
>> >>
>> >>
>> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>> >>
>> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
>> >>
>> >> p.s. I keep this part because it describes very well how the cache
>> >> strategy "used" works:
>> >> >>>>> Hi David
>> >> >>>>>
>> >> >>>>> Assuming that you are using the default distribution of Apache
>> >> Stanbol.
>> >> >>>>>
>> >> >>>>> Requests for  http://dbpedia.org/resource/Valentino_Rossi will be
>> >> >>>>> - only the first time answered by retrieving the Entity from
>> >> >>>>> DBpedia.org
>> >> >>>>> - the information is cached in a local cache. In doing so, values of the
>> >> >>>>> documents are filtered (see (a) for details)
>> >> >>>>> - the cached version is returned
>> >> >>>>>
>> >> >>>>> (a) The default configuration for dbpedia stores all fields, however it
>> >> >>>>> filters values for literals so that only values with the language
>> >> >>>>> "en, de, fr, it, es" or no language are stored.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Assuming that you have started from zero when updating to a new
>> >> >>>>> version, this also means that you have downloaded a new version of this
>> >> >>>>> Entity from DBpedia.
>> >> >>>>>
>> >>
>> >> --
>> >> | Rupert Westenthaler             [email protected]
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>> >
>> >
>> >
>> > --
>> > David Riccitelli
>> >
>> > Interact SpA
>> > Via A. Bargoni 78 (scala F)
>> > 00153 Roma
>> >
>> > T +39 06 58318 301
>> > F +39 06 58318 303
>> >
>>
>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
> David Riccitelli
>
> Interact SpA
> Via A. Bargoni 78 (scala F)
> 00153 Roma
>
> T +39 06 58318 301
> F +39 06 58318 303
>



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
