Hi Mauro,

On 2020/09/24 11:28:24, Mauro Asprea <[email protected]> wrote:
> Hello, what am I doing wrong?
>
> I downloaded the CLI binary distribution and verified that google does find
> the embedded LD-JSON triplets as you see here
> https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fwww.monster.com%2Fjobs%2Fsearch%2F%3Fq%3DRuby%26where%3DAustin__2C-TX

Just a quick note. It is impossible to know what kind of 'standard' Google uses for the Structured Data Testing Tool. As you can see, it is also being deprecated pretty quickly.

>
> Then I run any23 but I get no quads/triplets...
>
> hamilcar:apache-any23-cli-2.3:> bin/any23 rover -p -s -t "
> https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" -f json
> -e html-embedded-jsonld -l monster-jsonld.log

Using the 2.4 RC#1 (https://dist.apache.org/repos/dist/dev/any23/2.4/) I get the following results, which include one Organization, one ItemList and 29 itemListElements:

./bin/any23 rover -p -s -t "https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" -f json -e html-embedded-jsonld -l monster-jsonld.log -o monster.json
------------------------------------------------------------------------
Apache Any23 :: rover
------------------------------------------------------------------------

>Summary: -total calls: 1 -total triples: 128 -total runtime: 30 ms! -tripls/ms: 4 -ms/calls: 30
>Extractor: html-embedded-jsonld -total calls: 1 -total triples: 128 -total runtime: 30 ms! -tripls/ms: 4 -ms/calls: 30

------------------------------------------------------------------------
Apache Any23 SUCCESS
Total time: 2s
Finished at: Wed Sep 30 11:11:04 PDT 2020
Final Memory: 107M/367M
------------------------------------------------------------------------

I suggest that you upgrade to 2.4.

>
> How can I increase the logging level to see any hidden debug messages?

You would need to literally hack the source code to add more debug logging. I created a ticket to address the log4j appender issue - https://issues.apache.org/jira/browse/ANY23-454

>
> Also as you can see, this webpage has an embedded LD+JSON script that is
> not being picked up by the extractor. Help?
>

If I remove the extractor flag, i.e. -e html-embedded-jsonld, then I get a lot more results. Some of these are, however, trivial in nature, so they would need to be filtered out.

>Summary: -total calls: 21 -total triples: 189 -total runtime: 688 ms! -tripls/ms: 0 -ms/calls: 32
>Extractor: html-head-icbm -total calls: 1 -total triples: 0 -total runtime: 6 ms! -tripls/ms: 0 -ms/calls: 6
>Extractor: html-mf-geo -total calls: 1 -total triples: 0 -total runtime: 1 ms! -tripls/ms: 0 -ms/calls: 1
>Extractor: html-head-meta -total calls: 1 -total triples: 16 -total runtime: 5 ms! -tripls/ms: 3 -ms/calls: 5
>Extractor: html-mf-adr -total calls: 1 -total triples: 0 -total runtime: 1 ms! -tripls/ms: 0 -ms/calls: 1
>Extractor: html-mf-hcalendar -total calls: 1 -total triples: 0 -total runtime: 1 ms! -tripls/ms: 0 -ms/calls: 1
>Extractor: html-mf-hresume -total calls: 1 -total triples: 0 -total runtime: 1 ms! -tripls/ms: 0 -ms/calls: 1
>Extractor: html-mf-hreview -total calls: 1 -total triples: 0 -total runtime: 1 ms! -tripls/ms: 0 -ms/calls: 1
>Extractor: consolidation-extractor -total calls: 1 -total triples: 0 -total runtime: 0 ms! -ms/calls: 0
>Extractor: html-xpath -total calls: 1 -total triples: 0 -total runtime: 0 ms! -ms/calls: 0
>Extractor: html-head-title -total calls: 1 -total triples: 1 -total runtime: 1 ms! -tripls/ms: 1 -ms/calls: 1
>Extractor: html-mf-hcard -total calls: 1 -total triples: 0 -total runtime: 0 ms! -ms/calls: 0
>Extractor: html-rdfa11 -total calls: 1 -total triples: 44 -total runtime: 33 ms! -tripls/ms: 1 -ms/calls: 33
>Extractor: html-mf-hreview-aggregate -total calls: 1 -total triples: 0 -total runtime: 1 ms! -tripls/ms: 0 -ms/calls: 1
>Extractor: html-mf-license -total calls: 1 -total triples: 0 -total runtime: 3 ms! -tripls/ms: 0 -ms/calls: 3
>Extractor: html-mf-xfn -total calls: 1 -total triples: 0 -total runtime: 2 ms! -tripls/ms: 0 -ms/calls: 2
>Extractor: html-mf-species -total calls: 1 -total triples: 0 -total runtime: 1 ms! -tripls/ms: 0 -ms/calls: 1
>Extractor: html-mf-hlisting -total calls: 1 -total triples: 0 -total runtime: 0 ms! -ms/calls: 0
>Extractor: html-microdata -total calls: 1 -total triples: 0 -total runtime: 2 ms! -tripls/ms: 0 -ms/calls: 2
>Extractor: html-mf-hrecipe -total calls: 1 -total triples: 0 -total runtime: 0 ms! -ms/calls: 0
>Extractor: html-embedded-jsonld -total calls: 1 -total triples: 128 -total runtime: 627 ms! -tripls/ms: 0 -ms/calls: 627
>Extractor: html-head-links -total calls: 1 -total triples: 0 -total runtime: 2 ms! -tripls/ms: 0 -ms/calls: 2
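
As a starting point for that filtering, it can help to see which extractors actually contributed triples. The following is a minimal sketch in Python, not part of Any23: the function name productive_extractors and the REPORT excerpt are my own illustration, and the regular expression simply assumes the report keeps the ">Extractor: ... -total triples: N" layout shown above.

```python
import re

# Excerpt of the rover report above; paste the full report here in practice.
REPORT = """\
>Extractor: html-head-icbm -total calls: 1 -total triples: 0 -total runtime: 6 ms! -tripls/ms: 0 -ms/calls: 6
>Extractor: html-head-meta -total calls: 1 -total triples: 16 -total runtime: 5 ms! -tripls/ms: 3 -ms/calls: 5
>Extractor: html-head-title -total calls: 1 -total triples: 1 -total runtime: 1 ms! -tripls/ms: 1 -ms/calls: 1
>Extractor: html-rdfa11 -total calls: 1 -total triples: 44 -total runtime: 33 ms! -tripls/ms: 1 -ms/calls: 33
>Extractor: html-microdata -total calls: 1 -total triples: 0 -total runtime: 2 ms! -tripls/ms: 0 -ms/calls: 2
>Extractor: html-embedded-jsonld -total calls: 1 -total triples: 128 -total runtime: 627 ms! -tripls/ms: 0 -ms/calls: 627
"""

# Assumed record layout: extractor name, then call count, then triple count.
# \s+ also matches newlines, so this works if each field is on its own line.
RECORD = re.compile(
    r">Extractor:\s*(\S+)\s+-total calls:\s*\d+\s+-total triples:\s*(\d+)"
)


def productive_extractors(report):
    """Return {extractor name: triple count}, skipping extractors with 0 triples."""
    counts = {}
    for name, triples in RECORD.findall(report):
        if int(triples) > 0:
            counts[name] = int(triples)
    return counts


if __name__ == "__main__":
    found = productive_extractors(REPORT)
    # Largest contributors first.
    for name, triples in sorted(found.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {triples}")
```

For the run above, only html-embedded-jsonld, html-rdfa11, html-head-meta and html-head-title report any triples, so those are the names worth considering when deciding what to pass to -e.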
