Hi Meraj,

Running the website you've provided through any23-vm.apache.org results in the following output:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix doac: <http://ramonantonio.net/doac/0.1/#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.macmall.com/p/Apple-Mac-Mini/product~dpno~13312163~pdp.ijcehjd> dcterms:title "MacMall | Apple Mac mini dual-core Intel Core i5 1.4GHz (Turbo Boost up to 2.7GHz), 4GB RAM, 500GB Hard Drive, Intel HD Graphics 5000, Mac OS X Yosemite MGEM2LL/A"@en .

_:node7ab4123bafd8f45a207e47585841b13 a <http://schema.org/Product> ;
    <http://schema.org/Product/description> "Mac mini Dual-Core Intel Core i5 1.4GHz, 4GB DDR3 memory, 500GB SATA hard drive, Intel HD Graphics 5000 processor, 802.11ac Wi-Fi, Bluetooth, Gigabit Ethernet, HDMI, SDXC card slot, Two Thunderbolt 2 Ports, Audio in/out, IR receiver"@en ;
    <http://schema.org/Product/name> """ Apple Mac mini dual-core Intel Core i5 1.4GHz (Turbo Boost up to 2.7GHz), 4GB RAM, 500GB Hard Drive, Intel HD Graphics 5000, Mac OS X Yosemite (MGEM2LL/A) """@en .

_:node9d41b06013eb3d847ae58af99799bbb a <http://schema.org/Offer> ;
    <http://schema.org/Offer/price> "$479.00"@en ;
    <http://schema.org/Offer/availability> <http://schema.org/InStock> .

_:node7ab4123bafd8f45a207e47585841b13 <http://schema.org/Product/offers> _:node9d41b06013eb3d847ae58af99799bbb .

_:nodeaa9a2f42eeabd0b7dfbbd7dfcf6f9 a <http://schema.org/AggregateRating> ;
    <http://schema.org/AggregateRating/bestRating> "Null"@en ;
    <http://schema.org/AggregateRating/ratingValue> "Null"@en ;
    <http://schema.org/AggregateRating/reviewCount> "2"@en .

_:node7ab4123bafd8f45a207e47585841b13 <http://schema.org/Product/aggregateRating> _:nodeaa9a2f42eeabd0b7dfbbd7dfcf6f9 .

<http://www.macmall.com/p/Apple-Mac-Mini/product~dpno~13312163~pdp.ijcehjd> <http://www.w3.org/1999/xhtml/microdata#item> _:node7ab4123bafd8f45a207e47585841b13 ;
    dcterms:title "MacMall | Apple Mac mini dual-core Intel Core i5 1.4GHz (Turbo Boost up to 2.7GHz), 4GB RAM, 500GB Hard Drive, Intel HD Graphics 5000, Mac OS X Yosemite MGEM2LL/A"@en ;
    <http://www.w3.org/1999/xhtml/vocab#nofollow> <http://www.facebook.com/share.php?u=<;url\>> ;
    <http://www.w3.org/1999/xhtml/vocab#ALTERNATE-STYLESHEET> <http://www.macmall.com/p/Apple-Mac-Mini/product~dpno~13312163~pdp.ijcehjd//mall/stylesheet/wbd.css> ;
    <http://www.w3.org/1999/xhtml/vocab#canonical> <http://www.macmall.com/p/Apple-Mac-Mini/product~dpno~13312163~pdp.ijcehjd> ;
    <http://www.w3.org/1999/xhtml/vocab#ALTERNATE-STYLESHEET> <http://www.macmall.com/p/Apple-Mac-Mini/product~dpno~13312163~pdp.ijcehjd///i2.cc-inc.com/sprite/css/mainMenuExtended02.css> ,
        <http://www.macmall.com/p/Apple-Mac-Mini/product~dpno~13312163~pdp.ijcehjd//css/search/typeahead/reset.css?typeahead-widget-1.1.1> ,
        <http://www.macmall.com/p/Apple-Mac-Mini/product~dpno~13312163~pdp.ijcehjd//css/reset.css?ver=1> ;
    <http://www.w3.org/1999/xhtml/vocab#generator> "ToolTwist"@en ;
    <http://www.w3.org/1999/xhtml/vocab#description> "Apple Mac mini dual-core Intel Core i5 1.4GHz (Turbo Boost up to 2.7GHz), 4GB RAM, 500GB Hard Drive, Intel HD Graphics 5000, Mac OS X Yosemite MGEM2LL/A for $479.00 at macmall.com.
Systems - Mac Mini - Mac Mini w/ Intel Core i5 Duo Processor - 1.4 GHz Mac Mini Computers from macmall.com."@en ;
    <http://www.w3.org/1999/xhtml/vocab#keywords> "Apple Mac mini dual-core Intel Core i5 1.4GHz (Turbo Boost up to 2.7GHz), 4GB RAM, 500GB Hard Drive, Intel HD Graphics 5000, Mac OS X Yosemite, Apple Mac Mini, Mac Mini w/ Intel Core i5 Duo Processor, 1.4 GHz Mac Mini Computers, macmini, 3TED Systems"@en ;
    <http://www.w3.org/1999/xhtml/vocab#format-detection> "telephone=no"@en ;
    <http://www.w3.org/1999/xhtml/vocab#p:domain_verify> "e02911354daa2202c515e76b11f9561b"@en ;
    <http://www.w3.org/1999/xhtml/vocab#robots> "noodp,noydir"@en .

I think we can improve upon this by supporting both the
xmlns:og="http://opengraphprotocol.org/schema/" and
xmlns:fb="http://www.facebook.com/2008/fbml" namespaces; right now it
appears that we don't. In particular, the overwhelming majority of triples
coming from this page appear to come from the microdata parser, as they are
extracted from the microdata itemprop attributes.

One thing I've noticed is that although the HTML TitleExtractor [0] is being
called, the HTMLMetaExtractor [1] is not! We need to investigate this
further.

[0] https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/html/TitleExtractor.java
[1] https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/html/HTMLMetaExtractor.java

On Fri, Jan 16, 2015 at 9:07 AM, <user-digest-h...@any23.apache.org> wrote:

>
> I am trying to retrieve an object value (as in subject-predicate-object)
> from a web page using the following code. The page I am
> extracting it from is
> http://www.macmall.com/p/Apple-Mac-Mini/product~dpno~13312163~pdp.ijcehjd
>
> As you can clearly see, it has og markup using RDFa; however, the code
> below fails to extract any og values. Can you please let me know
> what I might be doing wrong? The property name that I am trying to
> extract is OGP.imageUrl
>
> Thanks.
>
> private static String retrieveOGPProperty(String URL,String propertyName) {
>
>     logger.trace("Entering the method retrieveOGPProperty ");
>     String propertyValue = null;
>     OGP ogp = OGP.getInstance();
>     org.openrdf.sail.Sail store = new org.openrdf.sail.memory.MemoryStore();
>     try {
>         store.initialize();
>     } catch (SailException e) {
>         // TODO Auto-generated catch block
>         e.printStackTrace();
>     }
>     try {
>         org.openrdf.repository.RepositoryConnection conn = new
>             org.openrdf.repository.sail.SailRepository(store).getConnection();
>         RepositoryResult<org.openrdf.model.Statement> statements =
>             conn.getStatements(RDFUtils.uri(URL), ogp.imageURL, null,false);
>         if(statements.hasNext()){
>             //get the first property
>             Value object = statements.next().getObject();
>             propertyValue = object.stringValue();
>         }
>     } catch (RepositoryException e) {
>         // Log the error and ignore it
>         logger.error("Error occured while extracting the OGP property "+propertyName,e);
>     }
>
>     logger.trace("Exiting the method retrieveOGPProperty ");
>
>     return propertyValue;
> }
>

--
*Lewis*