Thank you, Lewis! So should I assume that 2.3 is "broken"? I'll try the upcoming 2.4 as you suggested.

Apart from that, I still have one more question: what is the best way to debug Any23 issues like this?

On Wed, Sep 30, 2020 at 8:26 PM Lewis John McGibbney <[email protected]> wrote:
> Hi Mauro,
>
> On 2020/09/24 11:28:24, Mauro Asprea <[email protected]> wrote:
> > Hello, what am I doing wrong?
> >
> > I downloaded the CLI binary distribution and verified that Google does find
> > the embedded JSON-LD triples, as you can see here:
> > https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fwww.monster.com%2Fjobs%2Fsearch%2F%3Fq%3DRuby%26where%3DAustin__2C-TX
>
> Just a quick note: it is impossible to know what kind of 'standard' Google
> uses for the Structured Data Testing Tool. As you can see, it is also being
> deprecated pretty quickly.
>
> >
> > Then I ran any23, but I got no quads/triples...
> >
> > hamilcar:apache-any23-cli-2.3:> bin/any23 rover -p -s -t "https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" -f json -e html-embedded-jsonld -l monster-jsonld.log
>
> Using 2.4 RC#1 (https://dist.apache.org/repos/dist/dev/any23/2.4/), I
> get the following results, which include one Organization, one ItemList
> and 29 itemListElement entries:
>
> ./bin/any23 rover -p -s -t "https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" -f json -e html-embedded-jsonld -l monster-jsonld.log -o monster.json
>
> ------------------------------------------------------------------------
> Apache Any23 :: rover
> ------------------------------------------------------------------------
>
> >Summary:
> -total calls: 1
> -total triples: 128
> -total runtime: 30 ms!
> -tripls/ms: 4
> -ms/calls: 30
> >Extractor: html-embedded-jsonld
> -total calls: 1
> -total triples: 128
> -total runtime: 30 ms!
> -tripls/ms: 4
> -ms/calls: 30
>
> ------------------------------------------------------------------------
> Apache Any23 SUCCESS
> Total time: 2s
> Finished at: Wed Sep 30 11:11:04 PDT 2020
> Final Memory: 107M/367M
> ------------------------------------------------------------------------
>
> I suggest that you upgrade to 2.4.
>
> >
> > How can I increase the logging level to see any hidden debug messages?
>
> You would need to literally hack the source code to add more debug
> logging.
>
> I created a ticket to address the log4j appender issue:
> https://issues.apache.org/jira/browse/ANY23-454
>
> >
> > Also, as you can see, this webpage has an embedded JSON-LD script that is
> > not being picked up by the extractor. Help?
> >
>
> If I remove the extractor flag (i.e. -e html-embedded-jsonld), I get
> a lot more results. Some of these are, however, trivial in nature, so they
> would need to be filtered out.
>
> >Summary:
> -total calls: 21
> -total triples: 189
> -total runtime: 688 ms!
> -tripls/ms: 0
> -ms/calls: 32
> >Extractor: html-head-icbm
> -total calls: 1
> -total triples: 0
> -total runtime: 6 ms!
> -tripls/ms: 0
> -ms/calls: 6
> >Extractor: html-mf-geo
> -total calls: 1
> -total triples: 0
> -total runtime: 1 ms!
> -tripls/ms: 0
> -ms/calls: 1
> >Extractor: html-head-meta
> -total calls: 1
> -total triples: 16
> -total runtime: 5 ms!
> -tripls/ms: 3
> -ms/calls: 5
> >Extractor: html-mf-adr
> -total calls: 1
> -total triples: 0
> -total runtime: 1 ms!
> -tripls/ms: 0
> -ms/calls: 1
> >Extractor: html-mf-hcalendar
> -total calls: 1
> -total triples: 0
> -total runtime: 1 ms!
> -tripls/ms: 0
> -ms/calls: 1
> >Extractor: html-mf-hresume
> -total calls: 1
> -total triples: 0
> -total runtime: 1 ms!
> -tripls/ms: 0
> -ms/calls: 1
> >Extractor: html-mf-hreview
> -total calls: 1
> -total triples: 0
> -total runtime: 1 ms!
> -tripls/ms: 0
> -ms/calls: 1
> >Extractor: consolidation-extractor
> -total calls: 1
> -total triples: 0
> -total runtime: 0 ms!
> -ms/calls: 0
> >Extractor: html-xpath
> -total calls: 1
> -total triples: 0
> -total runtime: 0 ms!
> -ms/calls: 0
> >Extractor: html-head-title
> -total calls: 1
> -total triples: 1
> -total runtime: 1 ms!
> -tripls/ms: 1
> -ms/calls: 1
> >Extractor: html-mf-hcard
> -total calls: 1
> -total triples: 0
> -total runtime: 0 ms!
> -ms/calls: 0
> >Extractor: html-rdfa11
> -total calls: 1
> -total triples: 44
> -total runtime: 33 ms!
> -tripls/ms: 1
> -ms/calls: 33
> >Extractor: html-mf-hreview-aggregate
> -total calls: 1
> -total triples: 0
> -total runtime: 1 ms!
> -tripls/ms: 0
> -ms/calls: 1
> >Extractor: html-mf-license
> -total calls: 1
> -total triples: 0
> -total runtime: 3 ms!
> -tripls/ms: 0
> -ms/calls: 3
> >Extractor: html-mf-xfn
> -total calls: 1
> -total triples: 0
> -total runtime: 2 ms!
> -tripls/ms: 0
> -ms/calls: 2
> >Extractor: html-mf-species
> -total calls: 1
> -total triples: 0
> -total runtime: 1 ms!
> -tripls/ms: 0
> -ms/calls: 1
> >Extractor: html-mf-hlisting
> -total calls: 1
> -total triples: 0
> -total runtime: 0 ms!
> -ms/calls: 0
> >Extractor: html-microdata
> -total calls: 1
> -total triples: 0
> -total runtime: 2 ms!
> -tripls/ms: 0
> -ms/calls: 2
> >Extractor: html-mf-hrecipe
> -total calls: 1
> -total triples: 0
> -total runtime: 0 ms!
> -ms/calls: 0
> >Extractor: html-embedded-jsonld
> -total calls: 1
> -total triples: 128
> -total runtime: 627 ms!
> -tripls/ms: 0
> -ms/calls: 627
> >Extractor: html-head-links
> -total calls: 1
> -total triples: 0
> -total runtime: 2 ms!
> -tripls/ms: 0
> -ms/calls: 2

-- 
Mauro Asprea
E-Mail: [email protected]
Mobile: +34 654297582
Keybase: https://keybase.io/brutuscat
