On Thu, Jul 14, 2011 at 8:30 AM, David Riccitelli <[email protected]> wrote:
> Thanks Rupert,
>
> A description on how to do this is available in [1].
>
>
> I can't see the [1] :-)
does this count as missing attachment? ^^

[1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/

>
> David
>
> On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <[email protected]> wrote:
>
>> Hi
>>
>> Yes this is possible, but would need (depending on the hardware) quite
>> some time.
>> A description on how to do this is available in [1].
>>
>> Instead of installing the dbpedia.solrindex.zip file as described in
>> the readme, you could directly
>>
>> * shut down Stanbol
>> * delete the "dbpedia_43k" index in
>>   "{stanbol-root}/sling/entityhub/solrYard/indexes"
>> * copy the index located in
>>   "{indexing-root}/indexing/destination/indexes" to
>>   "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
>>   "dbpedia_43k"
>> * restart Stanbol.
>>
>> After that Stanbol should use the new index.
>>
>> Copying the "dbpedia.solrindex.zip" to the datafiles directory and
>> then changing the value of "Solr Index/Core" in the configuration of
>> the SolrYard for dbPedia from "dbpedia_43k" to "dbpedia" should also
>> work.
>>
>> best
>> Rupert
>>
>> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
>> <[email protected]> wrote:
>> > Hi,
>> >
>> > As another workaround, I was thinking that I could actually generate locally
>> > the DBpedia index with all the data using the dumps
>> > (http://wiki.dbpedia.org/Downloads36), in a way similar to the dbpedia_43k.
>> >
>> > What do you think?
>> >
>> > Thanks,
>> > David
>> >
>> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <[email protected]> wrote:
>> >
>> >> Hi
>> >>
>> >> I will try to find some time in the evening to reproduce this.
>> >>
>> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
>> >> <[email protected]> wrote:
>> >> > Thanks Rupert,
>> >> >
>> >> > I'm trying to follow your instructions but I encounter a couple of issues
>> >> > (probably due to inexperience):
>> >> > [1] when dropping the config files, they enter some loop of
>> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
>> >> > bundle), is that normal?
>> >>
>> >> This is very strange and should not be caused by the FileInstaller.
>> >> Maybe there is some loop between the Sling Installer (trying to
>> >> install the default configuration) and the FileInstaller that may cause
>> >> this under some circumstances.
>> >>
>> >> > [2] after I restart Stanbol, and try to query an entity from the entityhub
>> >> > I receive the following error:
>> >> >
>> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
>> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
>> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with Yard
>> >> > dbpediaCache! This is usually caused by Errors while reading the Cache
>> >> > Configuration from the Yard.) java.lang.IllegalStateException: Unable to
>> >> > initialize the Cache with Yard dbpediaCache! This is usually caused by
>> >> > Errors while reading the Cache Configuration from the Yard.
>> >> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>> >> >
>> >> >
>> >> > Do I need to initialize the Cache in some way?
>> >> >
>> >> No, you do not. Prepared indexes do include a document that
>> >> provides a list of the indexed fields. In future this may be used to
>> >> determine if a query can be successfully executed on the local index
>> >> or not. In addition this is used in case an Entity within the index is
>> >> updated with a newer version.
>> >> However this configuration is optional and is not required.
>> >> This Exception should only appear if the document is present but
>> >> illegally formatted. However the SolrYard initialized for the
>> >> dbpediaCache should be empty.
>> >>
>> >> Therefore I think it is somehow related to the above problem of
>> >> overriding configurations.
>> >>
>> >> In general the way the default configuration is loaded is
>> >> suboptimal at the moment. Especially using a single defaultdata
>> >> bundle for both the OpenNLP models and the dbpedia configuration +
>> >> default index was not a good idea, because one cannot exclude/change
>> >> the dbpedia stuff without affecting other components that depend on
>> >> OpenNLP.
>> >> Therefore I think we need to discuss how to better structure the
>> >> configurations and data needed to run Stanbol.
>> >>
>> >> There is also another issue: the SolrYard copies provided indexes only
>> >> once and does not check for updates. This would make it hard to
>> >> upgrade from the small index provided with the default data to a
>> >> bigger version.
>> >>
>> >> Both of these things are related to the problems and need to be addressed
>> >> before the first Stanbol release. Independent of those I will try to
>> >> find a simple solution for what you intend to do.
>> >>
>> >> In the meantime I suggest you go for the initially proposed workaround.
>> >>
>> >> best
>> >> Rupert Westenthaler
>> >>
>> >> > Thanks for your help,
>> >> >
>> >> > David
>> >> >
>> >> >
>> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <[email protected]> wrote:
>> >> >
>> >> >> Hi
>> >> >>
>> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>> >> >> <[email protected]> wrote:
>> >> >> > I solved it in the same way, but losing the caching capabilities.
>> >> >> > Is there any possibility to keep both all the data and the cache?
>> >> >> >
>> >> >> > Andrea
>> >> >> >
>> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>> >> >> >
>> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> David
>> >> >> >>
>> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <[email protected]> wrote:
>> >> >> >>
>> >> >> >>> Hi Rupert,
>> >> >> >>>
>> >> >> >>> I recently updated the Stanbol install, and I found that the RDF returned
>> >> >> >>> by the EntityHub is missing some props (specifically the dbprop as far as I
>> >> >> >>> can see).
>> >> >> >>>
>> >> >> >>> This is the command that I use for testing:
>> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>> >> >> >>>
>> >> >> >>> which outputs the attached RDF file.
>> >> >> >>>
>> >> >> >>> I cleared all of the sling folder (rm -fr sling) and checked with the
>> >> >> >>> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
>> >> >> >>>
>> >> >> >>> Does this depend on the mapping.txt file?
>> >> >> >>>
>> >> >>
>> >> >> If you plan to create your own dbpedia index, then the mapping.txt
>> >> >> file would be the way to configure which properties are
>> >> >> included/excluded.
>> >> >> Typically dbprop values are low quality. They are just naive 1:1
>> >> >> mappings of key value pairs as found in the info boxes. Because of
>> >> >> this they are excluded from the indexes.
>> >> >>
>> >> >> At runtime the returned data depend on the Cache strategy in use:
>> >> >>
>> >> >> Currently there are three possibilities (configured with the referenced Site)
>> >> >> 1) no cache: both queries and retrieval use the remote service
>> >> >> 2) used: Queries are executed by the remote service. Retrieved
>> >> >>    Entities are stored locally.
>> >> >>    The cached data depend on the mappings defined for the cache.
>> >> >> 3) all: Both queries and retrieval are based on the cache. The remote
>> >> >>    service is only used as fallback in the case that the cache is not
>> >> >>    available (e.g. if you deactivate solrYard).
>> >> >>
>> >> >> So if you are fine with (2) then you could use the configuration
>> >> >> as previously used by the stable launcher [1].
>> >> >> I think the easiest way to install this is to add the
>> >> >> Felix File Installer [2] to the Stanbol Environment. You will need to
>> >> >> delete the current referencedSite for dbpedia first and then add the
>> >> >> three configuration files as described by [1].
>> >> >>
>> >> >> If your requirements are not covered by the currently available options
>> >> >> it would be nice if you could write a short user story, because I am
>> >> >> thinking about how to improve this feature and input like that would
>> >> >> be really valuable.
>> >> >>
>> >> >> best
>> >> >> Rupert Westenthaler
>> >> >>
>> >> >> [1] The dbpedia config consists of three files: the referenced site,
>> >> >> cache and solryard components with the "-dbpedia" endings.
>> >> >>
>> >> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>> >> >>
>> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
>> >> >>
>> >> >> p.s. I keep this part because it describes very well how the cache
>> >> >> strategy "used" works:
>> >> >> >>>>> Hi David
>> >> >> >>>>>
>> >> >> >>>>> Assuming that you are using the default distribution of Apache Stanbol.
>> >> >> >>>>>
>> >> >> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will be
>> >> >> >>>>> - only the first time answered by retrieving the Entity from DBpedia.org
>> >> >> >>>>> - the information is cached in a local cache.
By that values >> of >> >> the >> >> >> >>>>> documents are filtered (see (a) for details) >> >> >> >>>>> - the cached version is returned >> >> >> >>>>> >> >> >> >>>>> (a) The default configuration for dbpedia stores all fields >> >> however >> >> >> >>>>> filters values for literals so that only values with the >> language >> >> >> "en, >> >> >> >>>>> de, fr, it, es" or no language are stored. >> >> >> >>>>> >> >> >> >>>>> >> >> >> >>>>> Assuming that you have started for zero when updating to a new >> >> >> version >> >> >> >>>>> this also means that you have downloaded a new version of this >> >> Entity >> >> >> >>>>> from dbPedia. >> >> >> >>>>> >> >> >> >> >> >> -- >> >> >> | Rupert Westenthaler [email protected] >> >> >> | Bodenlehenstraße 11 ++43-699-11108907 >> >> >> | A-5500 Bischofshofen >> >> >> >> >> > >> >> > >> >> > >> >> > -- >> >> > David Riccitelli >> >> > >> >> > Interact SpA >> >> > Via A. Bargoni 78 (scala F) >> >> > 00153 Roma >> >> > >> >> > T +39 06 58318 301 >> >> > F +39 06 58318 303 >> >> > >> >> >> >> >> >> >> >> -- >> >> | Rupert Westenthaler [email protected] >> >> | Bodenlehenstraße 11 ++43-699-11108907 >> >> | A-5500 Bischofshofen >> >> >> > >> > >> > >> > -- >> > David Riccitelli >> > >> > Interact SpA >> > Via A. Bargoni 78 (scala F) >> > 00153 Roma >> > >> > T +39 06 58318 301 >> > F +39 06 58318 303 >> > >> >> >> >> -- >> | Rupert Westenthaler [email protected] >> | Bodenlehenstraße 11 ++43-699-11108907 >> | A-5500 Bischofshofen >> > > > > -- > David Riccitelli > > Interact SpA > Via A. Bargoni 78 (scala F) > 00153 Roma > > T +39 06 58318 301 > F +39 06 58318 303 > -- | Rupert Westenthaler [email protected] | Bodenlehenstraße 11 ++43-699-11108907 | A-5500 Bischofshofen
