Hi Mauro,

On 2020/09/24 11:28:24, Mauro Asprea <[email protected]> wrote: 
> Hello, what am I doing wrong?
> 
> I downloaded the CLI binary distribution and verified that google does find
> the embedded LD-JSON triplets as you see here
> https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fwww.monster.com%2Fjobs%2Fsearch%2F%3Fq%3DRuby%26where%3DAustin__2C-TX

Just a quick note: it is impossible to know what kind of 'standard' Google 
uses for the Structured Data Testing Tool. As you can see, the tool is also 
being deprecated. 

> 
> Then I run any23 but I get no quads/triplets...
> 
> hamilcar:apache-any23-cli-2.3:> bin/any23 rover -p -s -t "
> > https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX"; -f json
> > -e html-embedded-jsonld -l monster-jsonld.log

Using the 2.4 RC#1 (https://dist.apache.org/repos/dist/dev/any23/2.4/), I get 
the following results, which include the one Organization, one ItemList and 29 
itemListElements:

./bin/any23 rover -p -s -t \
  "https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" \
  -f json -e html-embedded-jsonld -l monster-jsonld.log -o monster.json

------------------------------------------------------------------------
Apache Any23 :: rover
------------------------------------------------------------------------

>Summary:
   -total calls: 1
   -total triples: 128
   -total runtime: 30 ms!
   -tripls/ms: 4
   -ms/calls: 30
>Extractor: html-embedded-jsonld
   -total calls: 1
   -total triples: 128
   -total runtime: 30 ms!
   -tripls/ms: 4
   -ms/calls: 30

------------------------------------------------------------------------
Apache Any23 SUCCESS
Total time: 2s
Finished at: Wed Sep 30 11:11:04 PDT 2020
Final Memory: 107M/367M
------------------------------------------------------------------------

I suggest upgrading to 2.4.

> 
> 
> How can I increase the logging level to see any hidden debug messages?

You would need to modify the source code to add more debug logging. 

I created a ticket to address the log4j appender issue - 
https://issues.apache.org/jira/browse/ANY23-454

> 
> Also as you can see, this webpage has an embedded LD+JSON script that is
> not being picked up by the extractor. Help?
> 

If I remove the extractor flag (i.e. -e html-embedded-jsonld), all extractors 
run and I get many more results. Some of these are trivial, however, and 
would need to be filtered out.

>Summary:
   -total calls: 21
   -total triples: 189
   -total runtime: 688 ms!
   -tripls/ms: 0
   -ms/calls: 32
>Extractor: html-head-icbm
   -total calls: 1
   -total triples: 0
   -total runtime: 6 ms!
   -tripls/ms: 0
   -ms/calls: 6
>Extractor: html-mf-geo
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-head-meta
   -total calls: 1
   -total triples: 16
   -total runtime: 5 ms!
   -tripls/ms: 3
   -ms/calls: 5
>Extractor: html-mf-adr
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hcalendar
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hresume
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hreview
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: consolidation-extractor
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-xpath
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-head-title
   -total calls: 1
   -total triples: 1
   -total runtime: 1 ms!
   -tripls/ms: 1
   -ms/calls: 1
>Extractor: html-mf-hcard
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-rdfa11
   -total calls: 1
   -total triples: 44
   -total runtime: 33 ms!
   -tripls/ms: 1
   -ms/calls: 33
>Extractor: html-mf-hreview-aggregate
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-license
   -total calls: 1
   -total triples: 0
   -total runtime: 3 ms!
   -tripls/ms: 0
   -ms/calls: 3
>Extractor: html-mf-xfn
   -total calls: 1
   -total triples: 0
   -total runtime: 2 ms!
   -tripls/ms: 0
   -ms/calls: 2
>Extractor: html-mf-species
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hlisting
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-microdata
   -total calls: 1
   -total triples: 0
   -total runtime: 2 ms!
   -tripls/ms: 0
   -ms/calls: 2
>Extractor: html-mf-hrecipe
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-embedded-jsonld
   -total calls: 1
   -total triples: 128
   -total runtime: 627 ms!
   -tripls/ms: 0
   -ms/calls: 627
>Extractor: html-head-links
   -total calls: 1
   -total triples: 0
   -total runtime: 2 ms!
   -tripls/ms: 0
   -ms/calls: 2
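
If you want to see at a glance which extractors actually contributed triples, 
a small script can parse the summary text above. This is only a rough sketch: 
it assumes the report layout is exactly what rover printed here, the function 
names are my own (they are not part of Any23), and SAMPLE is an excerpt of 
the output above with most of the zero-triple extractors left out.

```python
import re

# Excerpt of the rover report above (zero-triple extractors mostly omitted).
SAMPLE = """\
>Summary:
   -total calls: 21
   -total triples: 189
   -total runtime: 688 ms!
   -tripls/ms: 0
   -ms/calls: 32
>Extractor: html-head-meta
   -total calls: 1
   -total triples: 16
   -total runtime: 5 ms!
>Extractor: html-mf-adr
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
>Extractor: html-head-title
   -total calls: 1
   -total triples: 1
   -total runtime: 1 ms!
>Extractor: html-rdfa11
   -total calls: 1
   -total triples: 44
   -total runtime: 33 ms!
>Extractor: html-embedded-jsonld
   -total calls: 1
   -total triples: 128
   -total runtime: 627 ms!
"""


def parse_rover_report(text):
    """Map each section name ("Summary" or an extractor name) to its metrics."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # Section header: ">Summary:" or ">Extractor: <name>"
            name = line[1:].rstrip(":")
            if name.startswith("Extractor:"):
                name = name.split(":", 1)[1].strip()
            current = sections.setdefault(name, {})
        elif line.startswith("-") and current is not None:
            # Metric line such as "-total runtime: 30 ms!"; separator lines
            # made only of dashes have no colon and are skipped.
            match = re.match(r"-([^:]+):\s*(\d+)", line)
            if match:
                current[match.group(1).strip()] = int(match.group(2))
    return sections


def productive_extractors(sections):
    """Return (extractor, triples) pairs with at least one triple, largest first."""
    pairs = [
        (name, metrics.get("total triples", 0))
        for name, metrics in sections.items()
        if name != "Summary"
    ]
    return sorted((p for p in pairs if p[1] > 0), key=lambda p: -p[1])


if __name__ == "__main__":
    for name, triples in productive_extractors(parse_rover_report(SAMPLE)):
        print(f"{name}: {triples}")
```

In the full run above, only html-embedded-jsonld, html-rdfa11, html-head-meta 
and html-head-title produce triples, and their counts add up to the 189 
reported in the summary.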
