Hi Steve

I think I have found the problem and provided a fix with revision 1147784 [1].
If you still have issues, please feel free to reopen STANBOL-259 [2].

best
Rupert

[1] http://svn.apache.org/viewvc?view=revision&revision=1147784
[2] https://issues.apache.org/jira/browse/STANBOL-259

On Thu, Jul 14, 2011 at 8:37 AM, Steve Reiner <[email protected]> wrote:
> Rupert,
>
> Thanks for the help
>
> I can use Stanbol running in a Linux vmware vm until next week.
>
> Thanks,
> Steve
> -----Original Message-----
> From: Rupert Westenthaler [mailto:[email protected]]
> Sent: Wednesday, July 13, 2011 11:22 PM
> To: [email protected]
> Subject: Re: index file issue on windows
>
> Hi
>
>
> On Thu, Jul 14, 2011 at 4:27 AM, Steve Reiner
> <[email protected]> wrote:
>> Know I should add to jira, just want to make sure I didn't need to
>> do some additional step to get the index to work
>>
>> Was actually getting a different error, not the cache thing, but index
>> not yet installed when use /engines
>>
>> On Windows with latest code get the index not yet installed error (and
>> weirdly also with what I built 7/10 that used to work with the
>> sling/datafiles workaround on Windows) (Linux with 7/10 code is still
>> fine):
>
> Did you delete the {stanbol}/sling folder after upgrading to the newest
> version? If not, you might still be using the old version, because the
> /sling folder contains a cache that is not overwritten with the new
> version just because the launcher jar file is replaced.
>
>>
>> (org.apache.stanbol.enhancer.servicesapi.EngineException:
>> 'NamedEntityTaggingEngine' failed to process content item
>> 'urn:content-item-sha1-88a2b5f6520df87e4567c06b48e742b7d1c71e9c' with
>> type
>> 'text/plain': org.apache.stanbol.entityhub.servicesapi.yard.YardException:
>> SolrIndex entityhub is not available. The necessary Index is not yet
>> installed.)
>> org.apache.stanbol.enhancer.servicesapi.EngineException:
>> 'NamedEntityTaggingEngine' failed to process content item
>> 'urn:content-item-sha1-88a2b5f6520df87e4567c06b48e742b7d1c71e9c' with
>> type
>> 'text/plain': org.apache.stanbol.entityhub.servicesapi.yard.YardException:
>> SolrIndex entityhub is not available. The necessary Index is not yet
>> installed.
>> at
>> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.computeEnhancements(NamedEntityTaggingEngine.java:323)
>>
> Since revision 1144364 the BundleDataFileProvider (the one that seems not to
> work on Windows) is also used to load the entityhub index.
> Therefore the initialization of the SolrYard used by the Entityhub will
> not work either. As I am writing this I realize that this would also
> prevent the initialization of any other SolrYard (such as the
> dbpediaCache), because the default initialization also relies on the same
> BundleDataFileProvider to load the required Solr configuration. So this
> would also explain the problems you had with the workaround I suggested.
>
> The two required files are in this directory [1]. If you copy them to the
> {stanbol-root}/datafiles directory it should solve the problem.
> After copying the files there you will need to deactivate/activate the
> SolrYards so they pick up these files.
>
> [1]
> http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/
>
>> Have workarounds in sling/datafiles (have en-*.bin,
>> dbpedia_43k.solrindex.zip )
>> (change for STANBOL-259, as Fabian commented, didn't fix the en-*.bin
>> load issue, still needed the workaround)
>>
>
> I will look into that next week when I am back in the office. I can not do
> much without access to a Windows box.
>
>> From the felix web console "Stanbol Data File Provider"
>> Seems to be looking for entityhub.solrindex.zip and not finding it
>> (tried having dbpedia_43k.solrindex.zip copied to
>> entityhub.solrindex.zip in datafiles but got same error after restart
>> and engine use)
>>
>
> You need to restart the SolrYard because it looks up the required files
> during activation. Restarting the Engine will not cause the SolrYard to be
> restarted.
>
>> Tried also after being clean: blow away sling dir, mvn clean, run
>> shell script to get defaultdata files, mvn install -DskipTests
>> MAVEN_OPTS=-Xmx1024M -XX:MaxPermSize=128M in env
>>
>
> I am really sorry for all this writing without coming up with a real
> solution, but it is really hard to solve Windows related problems without
> access to a Windows box. So if you are not in a hurry it would maybe be
> more effective to delay working on this until next week.
>
> best
> Rupert Westenthaler
>
>> Steve
>> -----Original Message-----
>> From: Steve Reiner [mailto:[email protected]]
>> Sent: Wednesday, July 13, 2011 12:09 AM
>> To: '[email protected]'
>> Subject: RE: EntityHub and DBpedia
>>
>> I am getting something like this too after updating with the code
>> checked in yesterday. Problem wasn't there in the code the day before.
>>
>> (using /engines page)
>>
>> -----Original Message-----
>> From: David Riccitelli [mailto:[email protected]]
>> Sent: Tuesday, July 12, 2011 11:58 PM
>> To: [email protected]
>> Subject: Re: EntityHub and DBpedia
>>
>> Thanks Rupert,
>>
>> I'm trying to follow your instructions but I encounter a couple of
>> issues (probably due to inexperience):
>> [1] when dropping the config files, they enter some loop of
>> REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
>> bundle), is that normal?
>> [2] after I restart Stanbol, and try to query an entity from the
>> entityhub I receive the following error:
>>
>> 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
>> org.apache.felix.http.jetty /entityhub/sites/entity/
>> (java.lang.IllegalStateException: Unable to initialize the Cache with
>> Yard dbpediaCache! This is usually caused by Errors while reading the
>> Cache Configuration from the Yard.) java.lang.IllegalStateException:
>> Unable to initialize the Cache with Yard dbpediaCache! This is usually
>> caused by Errors while reading the Cache Configuration from the Yard.
>> at
>> org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>>
>>
>> Do I need to initialize the Cache in some way?
>>
>> Thanks for your help,
>>
>> David
>>
>>
>> On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
>> [email protected]> wrote:
>>
>>> Hi
>>>
>>> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>>> <[email protected]> wrote:
>>> > I solved in the same way, but losing the caching capabilities.
>>> > Is there any possibility to keep both all the data and the cache?
>>> >
>>> > Andrea
>>> >
>>> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>>> >
>>> >> Ok, stopping the solrYard dbpedia_43k component solved for me.
>>> >>
>>> >> Thanks,
>>> >> David
>>> >>
>>> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>>> >> [email protected]> wrote:
>>> >>
>>> >>> Hi Rupert,
>>> >>>
>>> >>> I recently updated the Stanbol install, and I found that the RDF
>>> >>> returned by the EntityHub is missing some props (specifically the
>>> >>> dbprop as far as I can see).
>>> >>>
>>> >>> This is the command that I use for testing:
>>> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>> >>>
>>> >>> which outputs the attached RDF file.
>>> >>>
>>> >>> I cleared all of the sling folder (rm -fr sling) and checked with
>>> >>> the SPARQL end-point at DBpedia, but I wasn't able to fix it.
>>> >>>
>>> >>> Does this depend on the mapping.txt file?
>>> >>>
>>>
>>> If you plan to create your own dbpedia index, then the mapping.txt
>>> file would be the way to configure which properties are
>>> included/excluded.
>>> Typically dbprop values are low quality. They are just naive 1:1
>>> mappings of key value pairs as found in the info boxes. Because of
>>> this they are excluded from the indexes.
>>>
>>> At runtime the returned data depend on the used Cache strategy:
>>>
>>> Currently there are three possibilities (configured with the
>>> referenced Site)
>>> 1) no cache: both queries and retrieval use a remote service
>>> 2) used: Queries are executed by the remote service. Retrieved
>>> Entities are stored locally. The cached data depend on the mappings
>>> defined for the cache.
>>> 3) all: Both queries and retrieval are based on the cache. The remote
>>> service is only used as fallback in the case that the cache is not
>>> available (e.g. if you deactivate solrYard).
>>>
>>> So if you are fine with (2) then you could use the configuration
>>> as previously used by the stable launcher [1].
>>> I think the easiest way to install this is to add the
>>> Felix File Installer [2] to the Stanbol Environment. You will need to
>>> delete the current referencedSite for dbpedia first and then add the
>>> three configuration files as described by [1].
>>>
>>> If your requirements are not covered by the currently available
>>> options it would be nice if you could write a short user story,
>>> because I am thinking about how to improve this feature and input
>>> like that would be really valuable.
>>>
>>> best
>>> Rupert Westenthaler
>>>
>>> [1] The dbpedia config consists of three files: the referenced site,
>>> cache and solryard components with the "-dbpedia" endings.
>>>
>>> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>>>
>>> [2] http://felix.apache.org/site/apache-felix-file-install.html
>>>
>>> p.s. I keep this part because it describes very well how the cache
>>> strategy "used" works:
>>> >>>>> Hi David
>>> >>>>>
>>> >>>>> Assuming that you are using the default distribution of Apache
>>> >>>>> Stanbol.
>>> >>>>>
>>> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will
>>> >>>>> be
>>> >>>>> - only the first time answered by retrieving the Entity from
>>> >>>>> DBpedia.org
>>> >>>>> - the information is cached in a local cache. In doing so, values
>>> >>>>> of the documents are filtered (see (a) for details)
>>> >>>>> - the cached version is returned
>>> >>>>>
>>> >>>>> (a) The default configuration for dbpedia stores all fields
>>> >>>>> but filters literal values so that only values with
>>> >>>>> the language "en, de, fr, it, es" or no language are stored.
>>> >>>>>
>>> >>>>>
>>> >>>>> Assuming that you have started from zero when updating to a new
>>> >>>>> version, this also means that you have downloaded a new version
>>> >>>>> of this Entity from DBpedia.
>>> >>>>>
>>> >>>
>>> --
>>> | Rupert Westenthaler [email protected]
>>> | Bodenlehenstraße 11 ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>
>>
>> --
>> David Riccitelli
>>
>> Interact SpA
>> Via A. Bargoni 78 (scala F)
>> 00153 Roma
>>
>> T +39 06 58318 301
>> F +39 06 58318 303
>>
>>
>
>
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
>
>

--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
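
For readers following the thread, the three cache strategies from the Jul 11 message (no cache, "used", "all") can be pictured roughly as below. This is only a minimal sketch of entity retrieval under stated assumptions: the names CacheStrategy, CachedSiteSketch, getEntity and setCacheAvailable are hypothetical and are not the Stanbol Entityhub API, entities are plain strings, and the mapping-based filtering of cached values is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical names throughout; this is NOT the Stanbol Entityhub API.
enum CacheStrategy { NONE, USED, ALL }

final class CachedSiteSketch {
    private final CacheStrategy strategy;
    // Stand-in for the remote service (e.g. DBpedia).
    private final Function<String, Optional<String>> remote;
    // Stand-in for the local cache yard.
    private final Map<String, String> cache = new HashMap<>();
    private boolean cacheAvailable = true;
    int remoteCalls = 0; // counts remote lookups, for illustration only

    CachedSiteSketch(CacheStrategy strategy, Function<String, Optional<String>> remote) {
        this.strategy = strategy;
        this.remote = remote;
    }

    // Simulates deactivating the yard behind the cache.
    void setCacheAvailable(boolean available) {
        this.cacheAvailable = available;
    }

    private Optional<String> fetchRemote(String id) {
        remoteCalls++;
        return remote.apply(id);
    }

    Optional<String> getEntity(String id) {
        switch (strategy) {
            case NONE:
                // 1) no cache: every retrieval goes to the remote service.
                return fetchRemote(id);
            case USED: {
                // 2) used: answer from the cache if present, otherwise
                // retrieve remotely and store the result locally.
                if (cacheAvailable && cache.containsKey(id)) {
                    return Optional.of(cache.get(id));
                }
                Optional<String> entity = fetchRemote(id);
                if (cacheAvailable) {
                    entity.ifPresent(value -> cache.put(id, value));
                }
                return entity;
            }
            case ALL:
                // 3) all: the cache is authoritative; the remote service is
                // only a fallback when the cache is not available.
                if (cacheAvailable) {
                    return Optional.ofNullable(cache.get(id));
                }
                return fetchRemote(id);
            default:
                throw new IllegalStateException("unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        CachedSiteSketch site = new CachedSiteSketch(
                CacheStrategy.USED, uri -> Optional.of("<rdf for " + uri + ">"));
        String id = "http://dbpedia.org/resource/Valentino_Rossi";
        site.getEntity(id); // first request: retrieved remotely, then cached
        site.getEntity(id); // second request: served from the local cache
        System.out.println("remote calls: " + site.remoteCalls); // prints "remote calls: 1"
    }
}
```

In this sketch an "all" site with an available but empty cache returns nothing, which matches the description that the remote service is only consulted when the cache itself is unavailable.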
