The query for Paris is not working, as I indexed only the Chinese files. I also tried Chinese strings instead of "Paris"; that does not work either. I know that those entries are present in dbpedia.
How do you start in "-l DEBUG" mode? I use the command

java -Xmx1024m -XX:MaxPermSize=256m -Xdebug \
  -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n \
  -jar launchers/full/target/org.apache.stanbol.launchers.full-0.10.0-incubating-SNAPSHOT.jar

to start the Stanbol server.

-harish

On Wed, Aug 22, 2012 at 9:41 PM, Rupert Westenthaler <[email protected]> wrote:
> Hi,
>
> Usually when one does not get the expected results it is related to
> the data contained by the dbpedia referenced site. So I will try to
> provide some information on how best to debug what is happening.
>
> Can you maybe provide data for some entities by posting the results
> of an Entityhub query such as
>
>     curl -H "Accept: application/rdf+xml" \
>       "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Paris"
>
> You can also use another entity than Paris if it is more representative
> for your data.
>
> Another interesting thing to do is:
>
> 1) start Stanbol in DEBUG mode (by adding the "-l DEBUG" option
>    when starting)
> 2) send a document to the Enhancer
> 3) you should now see the Solr queries used in the log (you might need
>    to filter the extensive logging for the component
>    "org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory")
> 4) check those queries manually by sending them to
>
>     http://localhost:8080/solr/default/dbpedia/select?q=
>
> BTW: You can also look at the data stored in the Solr index by
> requesting a document via its URI, e.g.
>
>     http://localhost:8080/solr/default/dbpedia/select?q=uri:http\://dbpedia.org/resource/Paris
>
> This should help in looking into your issue.
>
> best
> Rupert
>
> On Thu, Aug 23, 2012 at 12:57 AM, harish suvarna <[email protected]> wrote:
> > I am finally successful after converting some Chinese dbpedia dump files
> > to UTF-8. But I can't hit any dbpedia links in Stanbol using this Solr
> > dump. I am just wondering whether I should pre-process the Chinese
> > dbpedia dump files.
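For step 4 above, note that the ':' characters inside the entity URI must be escaped in the Solr query (as in Rupert's `uri:http\://...` example), and that the URI in the Entityhub request should be URL-encoded. A minimal sketch that builds both request URLs with Python's standard library, assuming the default host/port and index name from the thread:

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8080"  # Stanbol launcher default used in this thread
entity = "http://dbpedia.org/resource/Paris"

# Entityhub lookup: the entity URI goes into the "id" query parameter,
# URL-encoded so its own '://' does not break the request URL.
entityhub_url = (
    BASE
    + "/entityhub/site/dbpedia/entity?"
    + urlencode({"id": entity})
)

# Direct Solr lookup: in the Lucene/Solr query syntax the ':' characters
# inside the URI value must be backslash-escaped, as in Rupert's example.
escaped = entity.replace(":", r"\:")
solr_url = (
    BASE
    + "/solr/default/dbpedia/select?"
    + urlencode({"q": "uri:" + escaped})
)

print(entityhub_url)
print(solr_url)
```

You can then fetch either URL with curl (or `urllib.request`) against a running launcher; if the Solr lookup returns the document but the Entityhub lookup does not, the problem is more likely in the site configuration than in the index.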
> > I uploaded the new jar file successfully as a new bundle
> > <http://localhost:8080/system/console/bundles/179>. Then I defined a
> > new engine using this referenced site 'dbpedia'. I do not have any
> > other dbpedia Solr dump. The chain says it is active and all 3 engines
> > are available.
> > If I put in the dbpedia Solr index from Ogrisel (1.19 GB), it works
> > fine; I get some dbpedia links. Am I missing anything else?
> > I did add the instance_types and person_data from the English dump.
> >
> > -harish
> >
> > On Tue, Aug 21, 2012 at 6:22 PM, harish suvarna <[email protected]> wrote:
> >>
> >> On Mon, Aug 20, 2012 at 9:30 PM, Rupert Westenthaler <
> >> [email protected]> wrote:
> >>
> >>> On Tue, Aug 21, 2012 at 2:30 AM, harish suvarna <[email protected]>
> >>> wrote:
> >>> >>
> >>> >> I had not yet had time to look at dbpedia 3.8. They might have
> >>> >> changed the names of some dump files. Generally "instance_types"
> >>> >> is very important (it provides the information about the type of
> >>> >> an Entity). "person_data" includes additional information for
> >>> >> persons; AFAIK that information is not included in the default
> >>> >> configuration of the dbpedia indexing tool.
> >>> >>
> >>> > Not all language dumps have these files. Japanese and Italian also
> >>> > do not have these files. These files are listed in the readme
> >>> > file. Hence I was looking for them.
> >>> >
> >>> Types are the same for all languages. Therefore they are only
> >>> available in English.
> >>> I am not sure about "person_data", but it might be the same.
> >>>
> >>> In other words: if you build an index for a specific language you
> >>> need to include the English dumps of those files that are not
> >>> language specific.
> >>>
> >> I will try this. Thanks a lot.
> >>
> >>> >> > I get a java exception.
> >>> >>
> >>> >> The included exceptions look like the RDF file containing the
> >>> >> Chinese labels is not well formatted. Experience says that this
> >>> >> is most likely related to char-encoding issues. This was also the
> >>> >> case with some dbpedia 3.7 files (see the special treatment of
> >>> >> some files in the shell script of the dbpedia indexing tool).
> >>> >>
> >>> > OK. I will try to debug this.
> >>> >
> >>
> >> I converted labels_zh.nt to UTF-8 using MS Word. MS Word adds the
> >> BOM bytes though, so I needed to remove them. Then labels_zh.nt went
> >> through, but long_abstracts has the same problem, so I am still
> >> working on these other files.
> >> Thanks a lot for all your patience and all the Stanbol teachings.
> >>
> >> --
> >> Thanks
> >> Harish
> >
> > --
> > Thanks
> > Harish
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11             ++43-699-11108907
> | A-5500 Bischofshofen

-- 
Thanks
Harish
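The MS Word round-trip can be avoided with a short script. This is only a sketch of the conversion described above, assuming the problem files are in a legacy encoding (GB18030 is my guess for the Chinese dumps; the actual source encoding may differ) and that a UTF-8 BOM may have been prepended by an earlier conversion:

```python
import codecs

def to_clean_utf8(data: bytes, source_encoding: str = "gb18030") -> bytes:
    """Return `data` as UTF-8 bytes without a BOM.

    If the input already carries a UTF-8 BOM (as files saved by MS Word
    often do), just strip it; otherwise decode from `source_encoding`.
    The gb18030 default is only a guess for the Chinese dbpedia dumps.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # errors="strict" makes bad byte sequences fail loudly instead of
    # silently writing a corrupted .nt file into the index.
    return data.decode(source_encoding, errors="strict").encode("utf-8")

# Hypothetical usage on one of the dump files mentioned in the thread:
# with open("labels_zh.nt", "rb") as f:
#     cleaned = to_clean_utf8(f.read())
# with open("labels_zh.utf8.nt", "wb") as f:
#     f.write(cleaned)
```

Running this over each dump file before indexing would replace both the MS Word conversion and the manual BOM removal in one step.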
