Hello Robert,

I confirm that the version of the service is the same as the latest binary distribution that you can download from [1] (0.7.0-incubating).
To programmatically extract the same data that you obtain from the live service, check that the Any23 extraction [2] is run with the same parameters [3]. The service implementation is defined in [4]. In particular, what generally produces differences in the data layout is the Metadata Nesting flag [5], which connects the graph forest [6] extracted from a page into a single connected graph representing the original HTML DOM nesting relationships of the forest.

Hope it helps.

Best,
Mic

[1] http://any23.apache.org/download.html
[2] org.apache.any23.Any23#extract
[3] org.apache.any23.extractor.ExtractionParameters
[4] org.apache.any23.servlet.WebResponder
[5] org.apache.any23.extractor.ExtractionParameters#METADATA_NESTING_FLAG
[6] http://en.wikipedia.org/wiki/Tree_(graph_theory)

On 2 October 2012 21:50, Robert Meusel <rob...@informatik.uni-mannheim.de> wrote:

> Below an example of the difference. I used the example document from the
> first mail:
>
> Any23.org
> ---------
>
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Recipe> .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/fn> "Receta de Tarta de naranja y
> chocolate" .
> _:node16687929411e1c5598e05ffddf69fac5 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:node16687929411e1c5598e05ffddf69fac5 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "Para el bizcocho
> (molde 22cm):" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:node16687929411e1c5598e05ffddf69fac5 .
> _:nodeda504a8bc7ff3ed52c337355cfc2f32 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:nodeda504a8bc7ff3ed52c337355cfc2f32 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "4 huevos" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:nodeda504a8bc7ff3ed52c337355cfc2f32 .
> _:node4cbc939e99bcbe4916b8052cce951f0 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:node4cbc939e99bcbe4916b8052cce951f0 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "120g
> az\u00C3\u00BAcar" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:node4cbc939e99bcbe4916b8052cce951f0 .
> _:node18b572d55f1f49c8742f5832f7ca1df7 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:node18b572d55f1f49c8742f5832f7ca1df7 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "80g harina" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:node18b572d55f1f49c8742f5832f7ca1df7 .
> _:node4f2936f11ef2132f52cf8a136c79c51 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:node4f2936f11ef2132f52cf8a136c79c51 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "20g harina de
> ma\u00C3\u00ADz" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:node4f2936f11ef2132f52cf8a136c79c51 .
> _:noded6a7db9359456aa3d6662954f5c2963 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:noded6a7db9359456aa3d6662954f5c2963 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "1cucharada cacao
> en polvo" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:noded6a7db9359456aa3d6662954f5c2963 .
> _:node54b5ecd17d4ddec85fa5364a8a7392 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:node54b5ecd17d4ddec85fa5364a8a7392 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "Para la tarta de
> naranja:" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:node54b5ecd17d4ddec85fa5364a8a7392 .
> _:nodec65a69f3455ceda8fcc2afa0f2d49c65 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:nodec65a69f3455ceda8fcc2afa0f2d49c65 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "2sobres gelatina
> de naranja" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:nodec65a69f3455ceda8fcc2afa0f2d49c65 .
> _:node6aee7ca9c9f4793ccfdb3e77f9afce <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:node6aee7ca9c9f4793ccfdb3e77f9afce <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "1 litro leche" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:node6aee7ca9c9f4793ccfdb3e77f9afce .
> _:noded3d876338d9752e298e643aaae7858 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:noded3d876338d9752e298e643aaae7858 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "Para la
> cobertura:" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:noded3d876338d9752e298e643aaae7858 .
> _:node783dfe412abb82751fb50ae23646ff <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:node783dfe412abb82751fb50ae23646ff <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "1trozo chocolate
> de cobertura" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:node783dfe412abb82751fb50ae23646ff .
> _:nodef57df9a870a7e5cc6ed67f3e7a8d623 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:nodef57df9a870a7e5cc6ed67f3e7a8d623 <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "1trozo
> mantequilla" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:nodef57df9a870a7e5cc6ed67f3e7a8d623 .
> _:node66ec8bf0ef1ec431142c21cc1a968f2e <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> .
> _:node66ec8bf0ef1ec431142c21cc1a968f2e <
> http://vocab.sindice.net/any23#hrecipe/ingredientName> "trozos de
> naranja" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/ingredient>
> _:node66ec8bf0ef1ec431142c21cc1a968f2e .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/yield> "Para 8 personas" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/instructions>
> "Preparaci\u00C3\u00B3n paso a paso \nPreparar el bizcocho: Separamos
> las yemas de las claras y a\u00C3\u00B1adimos a las yemas el
> az\u00C3\u00BAcar. Montamos la mezcla con la batidora de varillas, y
> a\u00C3\u00B1adimos poco a poco, con ayuda de una esp\u00C3\u00A1tula, el
> cacao. Montamos las claras al punto de nieve y las a\u00C3\u00B1adimos con
> la mezcla anterior. Precalentamos el horno arriba y abajo a 200\u00C2\u00BA
> mientras echamos la mezcla en un molde engrasado. Una vez listo el horno,
> lo bajamos a 170\u00C2\u00BA y lo horneamos en posici\u00C3\u00B3n central
> durante 8 o 10 minutos. Luego meter a la nevera y dejar enfriar. \n
> \nPara la tarta de naranja: poner al fuego medio litro de leche y cuando
> est\u00C3\u00A9 hiviendo a\u00C3\u00B1adir los 2 sobres de gelatina de
> naranja hasta que queden disueltos y despu\u00C3\u00A9s a\u00C3\u00B1adir
> los otros 500ml de leche fr\u00C3\u00ADa. Verter encima de la base de
> bizcocho.\n \nPara la cobertura: Fundir el chocolate y echarlo sobre la
> capa anterior (Cuando est\u00C3\u00A9 un poco cuajado, para que no se
> mezcle). Y decorar encima del chocolate con trozos de naranja caramelizada.
> Dejar enfriar en la nevera y ya se puede servir." .
> _:noded72dc17d6916ba79e19876c5baf93e6a <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Duration> .
> _:noded72dc17d6916ba79e19876c5baf93e6a <
> http://vocab.sindice.net/any23#hrecipe/durationTime> "20-40 min" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/duration>
> _:noded72dc17d6916ba79e19876c5baf93e6a .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/photo> <
> http://any23.org/descargas/foto.aspx?id=10881&w=340&h=280> .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/photo> <
> http://any23.org/descargas/foto.aspx?id=10882&w=103&h=68> .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/photo> <
> http://any23.org/descargas/foto.aspx?id=10883&w=103&h=68> .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/photo> <
> http://any23.org/descargas/foto.aspx?id=10884&w=103&h=68> .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/photo> <
> http://any23.org/descargas/foto.aspx?id=3603&w=120&h=80> .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/author> "noelia21" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag> "/recetas/tags/batidora.aspx"
> .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag>
> "/recetas/tags/chocolate.aspx" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag> "/recetas/tags/dulces.aspx" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag> "/recetas/tags/exotica.aspx" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag> "/recetas/tags/horno.aspx" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag>
> "/recetas/tags/internacional.aspx" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag>
> "/recetas/tags/mediterranea.aspx" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag> "/recetas/tags/naranjas.aspx"
> .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag> "/recetas/tags/postres.aspx" .
> _:node8d8610c0d82b27c73ca7b13cb9dd17c4 <
> http://vocab.sindice.net/any23#hrecipe/tag> "/recetas/tags/tartas.aspx" .
> <http://any23.org/tmp/> <http://www.facebook.com/2008/fbmlapp_id>
> "203441256369548" .
> <http://any23.org/tmp/> <http://opengraphprotocol.org/schema/language>
> "es" .
> <http://any23.org/tmp/> <http://opengraphprotocol.org/schema/title>
> "Receta de Tarta de naranja y chocolate - Gallina Blanca" .
> <http://any23.org/tmp/> <http://opengraphprotocol.org/schema/url> "
> http://www.gallinablanca.es/receta/tarta-de-naranja-y-chocolate.aspx" .
> <http://any23.org/tmp/> <http://opengraphprotocol.org/schema/image> "
> http://www.gallinablanca.es/descargas/foto.aspx?id=10881&w=340&h=280" .
> <http://any23.org/tmp/> <http://opengraphprotocol.org/schema/description>
> "La receta de Tarta de naranja y chocolate se prepara con: Para el
> bizcocho (molde 22cm):, 4 huevos, 120g az\u00C3\u00BAcar, 80g harina, 20g
> harina de ma\u00C3\u00ADz, 1cucharada cacao en polvo, Para la tarta de
> naranja:, 2sobres gelatina de naranja, 1 litro leche, Para la cobertura:,
> 1trozo chocolate de cobertura, 1trozo mantequilla, trozos de naranja" .
> <http://any23.org/tmp/> <http://any23.org/tmp/publisher> <
> https://plus.google.com/105493319455787602116> .
> <http://any23.org/tmp/css/thickbox.css> <http://any23.org/tmp/stylesheet>
> <https://plus.google.com/105493319455787602116> .
> <http://any23.org/tmp/css/thickbox_ie.css> <
> http://any23.org/tmp/stylesheet> <
> https://plus.google.com/105493319455787602116> .
> <http://any23.org/tmp/css/style.css> <http://any23.org/tmp/stylesheet> <
> https://plus.google.com/105493319455787602116> .
> <http://any23.org/tmp/css/print.css> <http://any23.org/tmp/stylesheet> <
> https://plus.google.com/105493319455787602116> .
> <http://any23.org/tmp/> <http://any23.org/tmp/canonical> <
> http://www.gallinablanca.es/receta/tarta-de-naranja-y-chocolate.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/batidora.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/chocolate.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/dulces.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/exotica.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/horno.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/internacional.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/mediterranea.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/naranjas.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/postres.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/tag> <
> http://any23.org/tmp//recetas/tags/tartas.aspx> .
> <http://any23.org/tmp/> <http://any23.org/tmp/nofollow> <
> http://www.gallinablancastar.com> .
> <http://any23.org/tmp/> <http://any23.org/tmp/nofollow> <
> https://www.confianzaonline.es/empresas/gallinablanca.htm> .
> <http://any23.org/tmp/> <http://any23.org/tmp/nofollow> <
> http://www.calidalia.org> .
>
>
>
> Use Java Lib Directly
> ---------------------
>
> _:node291f9650fb7cb180dfe9f2a517da3d4 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Recipe> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node2db92a81f15e0e34a74f1938ee71661 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node6a8e73a8c7b7ae98c4de29d7f105de5 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node312ed52e474c88e73381ffb1cced7f <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node843d8e81527589708c293b86acfd6538 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:nodeb4701607270bb5b5f068a6ffd5d36 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node9d82df158631ba6b1b86e67eafc2f18 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node446bd1cf164674a42ec6be12e1176ea <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node55db47b314a4fe5d30f78def4f35e1 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:noded1566ddf4b7bde404d53c0b0f3ff880 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:nodead1a35d4c581857a9369c5239492625 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:noded1b19daab03c227b181f52f289b90db <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node611354e4ae2246391f952effa94a4aa <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:nodea3c6b44caad7f06a660231ee54e614 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Ingredient> <http://www.test.de>
> <ex:html-mf-hrecipe> .
> _:node412ea813c154eb362482233c5be4a0e4 <
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
> http://vocab.sindice.net/any23#hrecipe/Duration> <http://www.test.de>
> <ex:html-mf-hrecipe> .
>
> Thanks,
> Robert
>
> -----Original Message-----
> From: Robert Meusel [mailto:rob...@informatik.uni-mannheim.de]
> Sent: Tuesday, 2 October 2012 21:14
> To: user@any23.apache.org
> Subject: Re: Irregularities with HRecipeExtractor
>
> Hi
>
> Any23.org extracts all the information included in the document, such as
> the links between recipe and ingredients and the values of the ingredients:
> the whole tree. My Java code, which uses the same lib version as any23.org,
> returns just a single triple for each occurrence of an ingredient or
> recipe, e.g. _node... #type ingredient. All other information is left out,
> such as the values of the nodes and the connection between recipe and
> ingredient.
>
> Thanks for your help
>
> Robert
>
>
>
> Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
>
> >Hi Robert,
> >
> >2012/10/2 Robert Meusel <rob...@informatik.uni-mannheim.de>:
> >
> >> Does anybody know where these differences come from and how we can fix them?
> >
> >What are the differences?
> >
> >Thank you
> >Lewis
> >
>

--
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: m...@michelemostarda.com
site: http://www.michelemostarda.com
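
A per-predicate count makes the gap between two dumps like the ones quoted in this thread easy to see. The following is only a minimal sketch and not part of Any23: `PredicateDiff`, `countPredicates` and `missingFrom` are hypothetical names, the parsing is a simple regular expression rather than a real N-Triples parser, and the sample statements are abbreviated from the outputs above with shortened blank node labels.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an Any23 class: counts predicates in line-based RDF dumps.
final class PredicateDiff {

    // Subject is a blank node or an IRI; the predicate (group 1) is always an IRI.
    // Anything after the predicate is accepted, so extra context columns are tolerated.
    private static final Pattern STATEMENT =
            Pattern.compile("^\\s*(?:_:\\S+|<[^>]*>)\\s+<([^>]*)>\\s+.*\\.\\s*$");

    // Returns predicate IRI -> number of statements using it; unparseable lines are skipped.
    static Map<String, Integer> countPredicates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = STATEMENT.matcher(line);
            if (m.matches()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the predicates (with their counts) present in reference but absent from other.
    static Map<String, Integer> missingFrom(Map<String, Integer> reference,
                                            Map<String, Integer> other) {
        Map<String, Integer> missing = new TreeMap<>(reference);
        missing.keySet().removeAll(other.keySet());
        return missing;
    }

    public static void main(String[] args) {
        String rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
        String hrecipe = "http://vocab.sindice.net/any23#hrecipe/";

        // Abbreviated sample of the service output (blank node labels shortened).
        List<String> service = Arrays.asList(
                "_:recipe <" + rdfType + "> <" + hrecipe + "Recipe> .",
                "_:ing <" + rdfType + "> <" + hrecipe + "Ingredient> .",
                "_:ing <" + hrecipe + "ingredientName> \"4 huevos\" .",
                "_:recipe <" + hrecipe + "ingredient> _:ing .");

        // Abbreviated sample of the library output, including its two trailing columns.
        List<String> library = Arrays.asList(
                "_:a <" + rdfType + "> <" + hrecipe + "Recipe> <http://www.test.de> <ex:html-mf-hrecipe> .",
                "_:b <" + rdfType + "> <" + hrecipe + "Ingredient> <http://www.test.de> <ex:html-mf-hrecipe> .");

        Map<String, Integer> missing =
                missingFrom(countPredicates(service), countPredicates(library));
        missing.forEach((predicate, count) ->
                System.out.println("missing in library output: " + predicate + " (" + count + ")"));
    }
}
```

On these samples it reports the `hrecipe/ingredient` and `hrecipe/ingredientName` predicates as missing from the library output, which matches the difference Robert describes: only the `rdf:type` statements survive.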