Hi

Rather than working on the workaround, I decided to invest some time in finishing STANBOL-140 and implementing STANBOL-287. Together with the proposal made in [1] to split up the default data into several bundles, this should solve the issues discussed here.

best
Rupert

[1] http://markmail.org/message/bf7qurmzos45h23b

On Thu, Jul 14, 2011 at 8:34 AM, Rupert Westenthaler <[email protected]> wrote:
> On Thu, Jul 14, 2011 at 8:30 AM, David Riccitelli
> <[email protected]> wrote:
>> Thanks Rupert,
>>
>> A description on how to do this is available in [1].
>>
>>
>> I can't see the [1] :-)
>
> does this count as missing attachment? ^^
>
> [1]
> http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/
>
>>
>> David
>>
>> On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <
>> [email protected]> wrote:
>>
>>> Hi
>>>
>>> Yes this is possible, but would need (depending on the hardware) quite
>>> some time.
>>> A description on how to do this is available in [1].
>>>
>>> Instead of installing the dbpedia.solrindex.zip file as described in
>>> the readme, you could directly
>>>
>>> * shut down Stanbol
>>> * delete the "dbpedia_43k" index in
>>> "{stanbol-root}/sling/entityhub/solrYard/indexes"
>>> * copy the index located in
>>> "{indexing-root}/indexing/destination/indexes" to
>>> "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
>>> "dbpedia_43k"
>>> * restart Stanbol.
>>>
>>> After that Stanbol should use the new index.
>>>
>>> Copying the "dbpedia.solrindex.zip" to the datafiles directory and
>>> then changing the value of "Solr Index/Core" in the configuration of
>>> the SolrYard for dbPedia from "dbpedia_43k" to "dbpedia" should also
>>> work.
>>>
>>> best
>>> Rupert
>>>
>>> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
>>> <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > As another workaround, I was thinking that I could actually generate
>>> > the DBpedia index locally with all the data using the dumps
>>> > (http://wiki.dbpedia.org/Downloads36), in a way similar to the
>>> > dbpedia_43k.
>>> >
>>> > What do you think?
>>> >
>>> > Thanks,
>>> > David
>>> >
>>> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
>>> > [email protected]> wrote:
>>> >
>>> >> Hi
>>> >>
>>> >> I will try to find some time in the evening to reproduce this.
>>> >>
>>> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
>>> >> <[email protected]> wrote:
>>> >> > Thanks Rupert,
>>> >> >
>>> >> > I'm trying to follow your instructions but I encounter a couple of
>>> >> > issues (probably due to inexperience):
>>> >> > [1] when dropping the config files, they enter some loop of
>>> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
>>> >> > bundle), is that normal?
>>> >>
>>> >> This is very strange and should not be caused by the FileInstaller.
>>> >> Maybe there is some loop between the Sling Installer (trying to
>>> >> install the default configuration) and the FileInstaller that may
>>> >> cause this under some circumstances.
>>> >>
>>> >> > [2] after I restart Stanbol, and try to query an entity from the
>>> >> > entityhub I receive the following error:
>>> >> >
>>> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
>>> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
>>> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with
>>> >> > Yard dbpediaCache! This is usually caused by Errors while reading the
>>> >> > Cache Configuration from the Yard.) java.lang.IllegalStateException:
>>> >> > Unable to initialize the Cache with Yard dbpediaCache! This is usually
>>> >> > caused by Errors while reading the Cache Configuration from the Yard.
>>> >> > at
>>> >> > org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>>> >> >
>>> >> >
>>> >> > Do I need to initialize the Cache in some way?
>>> >> >
>>> >> No, you do not. Prepared indexes do include a document that
>>> >> provides a list of the indexed fields.
>>> >> In the future this may be used to
>>> >> determine whether a query can be successfully executed on the local
>>> >> index or not. In addition, this is used in case an Entity within the
>>> >> index is updated with a newer version.
>>> >> However, this configuration is optional and is not required. This
>>> >> Exception should only appear if the document is present but illegally
>>> >> formatted. However, the SolrYard initialized for the dbpediaCache
>>> >> should be empty.
>>> >>
>>> >> Therefore I think it is somehow related to the above problem of
>>> >> overriding configurations.
>>> >>
>>> >> In general the way the default configuration is loaded is
>>> >> suboptimal at the moment. In particular, using a single defaultdata
>>> >> bundle for both the OpenNLP models and the dbpedia configuration +
>>> >> default index was not a good idea, because one cannot exclude/change
>>> >> the dbpedia stuff without affecting other components that depend on
>>> >> OpenNLP.
>>> >> Therefore I think we need to discuss how to better structure the
>>> >> configurations and data needed to run Stanbol.
>>> >>
>>> >> There is also another issue: the SolrYard copies provided indexes
>>> >> only once and does not check for updates. This would make it
>>> >> hard to upgrade from the small index provided with the default data
>>> >> to a bigger version.
>>> >>
>>> >> Both of these things are related to the problems and need to be
>>> >> addressed before the first Stanbol release. Independent of those, I
>>> >> will try to find a simple solution for what you intend to do.
>>> >>
>>> >> In the meantime I suggest you go for the initially proposed workaround.
>>> >>
>>> >> best
>>> >> Rupert Westenthaler
>>> >>
>>> >> > Thanks for your help,
>>> >> >
>>> >> > David
>>> >> >
>>> >> >
>>> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
>>> >> > [email protected]> wrote:
>>> >> >
>>> >> >> Hi
>>> >> >>
>>> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>>> >> >> <[email protected]> wrote:
>>> >> >> > I solved it in the same way, but losing the caching capabilities.
>>> >> >> > Is there any possibility to keep both all the data and the cache?
>>> >> >> >
>>> >> >> > Andrea
>>> >> >> >
>>> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>>> >> >> >
>>> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >> David
>>> >> >> >>
>>> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>>> >> >> >> [email protected]> wrote:
>>> >> >> >>
>>> >> >> >>> Hi Rupert,
>>> >> >> >>>
>>> >> >> >>> I recently updated the Stanbol install, and I found that the RDF
>>> >> >> >>> returned by the EntityHub is missing some props (specifically the
>>> >> >> >>> dbprop as far as I can see).
>>> >> >> >>>
>>> >> >> >>> This is the command that I use for testing:
>>> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>> >> >> >>>
>>> >> >> >>> which outputs the attached RDF file.
>>> >> >> >>>
>>> >> >> >>> I cleared all of the sling folder (rm -fr sling) and checked
>>> >> >> >>> with the SPARQL endpoint at DBpedia, but I wasn't able to fix it.
>>> >> >> >>>
>>> >> >> >>> Does this depend on the mapping.txt file?
>>> >> >> >>>
>>> >> >>
>>> >> >> If you plan to create your own dbpedia index, then the mapping.txt
>>> >> >> file would be the way to configure which properties are
>>> >> >> included/excluded.
>>> >> >> Typically dbprop values are low quality. They are just naive 1:1
>>> >> >> mappings of key value pairs as found in the info boxes. Because of
>>> >> >> this they are excluded from the indexes.
>>> >> >>
>>> >> >> At runtime the returned data depend on the Cache strategy used.
>>> >> >>
>>> >> >> Currently there are three possibilities (configured with the
>>> >> >> referenced Site):
>>> >> >> 1) no cache: both queries and retrieval use a remote service.
>>> >> >> 2) used: Queries are executed by the remote service. Retrieved
>>> >> >> Entities are stored locally. The cached data depend on the mappings
>>> >> >> defined for the cache.
>>> >> >> 3) all: Both queries and retrieval are based on the cache. The remote
>>> >> >> service is only used as a fallback in case the cache is not
>>> >> >> available (e.g. if you deactivate solrYard).
>>> >> >>
>>> >> >> So if you are fine with (2) then you could use the configuration
>>> >> >> as previously used by the stable launcher [1].
>>> >> >> I think the easiest way to install this is to add the
>>> >> >> Felix File Installer [2] to the Stanbol Environment. You will need to
>>> >> >> delete the current referencedSite for dbpedia first and then add the
>>> >> >> three configuration files as described by [1].
>>> >> >>
>>> >> >> If your requirements are not covered by the currently available
>>> >> >> options, it would be nice if you could write a short user story,
>>> >> >> because I am thinking about how to improve this feature and input
>>> >> >> like that would be really valuable.
>>> >> >>
>>> >> >> best
>>> >> >> Rupert Westenthaler
>>> >> >>
>>> >> >> [1] The dbpedia config consists of three files: the referenced site,
>>> >> >> cache and solryard components with the "-dbpedia" endings.
>>> >> >>
>>> >> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>>> >> >>
>>> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
>>> >> >>
>>> >> >> p.s. I keep this part because it describes very well how the cache
>>> >> >> strategy "used" works:
>>> >> >> >>>>> Hi David
>>> >> >> >>>>>
>>> >> >> >>>>> Assuming that you are using the default distribution of Apache
>>> >> >> >>>>> Stanbol.
>>> >> >> >>>>>
>>> >> >> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will be
>>> >> >> >>>>> - only the first time answered by retrieving the Entity from
>>> >> >> >>>>> DBpedia.org
>>> >> >> >>>>> - the information is cached in a local cache. In doing so, the
>>> >> >> >>>>> values of the documents are filtered (see (a) for details)
>>> >> >> >>>>> - the cached version is returned
>>> >> >> >>>>>
>>> >> >> >>>>> (a) The default configuration for dbpedia stores all fields but
>>> >> >> >>>>> filters values for literals so that only values with the
>>> >> >> >>>>> language "en, de, fr, it, es" or no language are stored.
>>> >> >> >>>>>
>>> >> >> >>>>>
>>> >> >> >>>>> Assuming that you have started from zero when updating to a new
>>> >> >> >>>>> version this also means that you have downloaded a new version
>>> >> >> >>>>> of this Entity from dbPedia.
>>> >> >> >>>>>
>>> >> >>
>>> >> >> --
>>> >> >> | Rupert Westenthaler [email protected]
>>> >> >> | Bodenlehenstraße 11 ++43-699-11108907
>>> >> >> | A-5500 Bischofshofen
>>> >> >>
>>> >> >
>>> >> > --
>>> >> > David Riccitelli
>>> >> >
>>> >> > Interact SpA
>>> >> > Via A. Bargoni 78 (scala F)
>>> >> > 00153 Roma
>>> >> >
>>> >> > T +39 06 58318 301
>>> >> > F +39 06 58318 303
>>> >> >

--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
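
As a rough illustration of the manual index replacement quoted in the thread (shut down Stanbol, delete the old "dbpedia_43k" index, copy the locally built index in under that name, restart), the delete-and-copy part could be scripted along these lines. This is only a sketch under assumptions: the function name replace_solr_index and its parameters are made up for illustration, the name of the index directory produced by the indexing tool is not stated in the thread and therefore has to be passed in, and Stanbol is assumed to be stopped before the script runs.

```python
import shutil
from pathlib import Path


def replace_solr_index(stanbol_root, indexing_root, built_index_name,
                       target_name="dbpedia_43k"):
    """Swap the SolrYard index for a locally built one.

    Hypothetical helper, not part of Stanbol. Run it only while Stanbol
    is shut down, then restart Stanbol afterwards.
    """
    # Directory layout as described in the thread.
    indexes_dir = (Path(stanbol_root) / "sling" / "entityhub"
                   / "solrYard" / "indexes")
    source = (Path(indexing_root) / "indexing" / "destination"
              / "indexes" / built_index_name)
    target = indexes_dir / target_name

    # Refuse to delete anything unless the new index is really there.
    if not source.is_dir():
        raise FileNotFoundError(f"built index not found: {source}")

    # Delete the old index, then copy the new one in under the old name.
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

A call would look like replace_solr_index("/path/to/stanbol", "/path/to/indexing", "dbpedia"), where both paths and the built index name "dbpedia" are placeholders to adapt to the local setup.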
