Re: [Dbpedia-discussion] image url hash keys incorrect?

2011-12-06 Thread Tommy Chheng
This solution is also flawed. Check with Batman_Kane.jpg. I recommend using org.apache.commons.codec.digest.DigestUtils#md5Hex. Relying on a commonly used library is a lot less bug-prone. On Mon, Dec 5, 2011 at 4:17 PM, Tommy Chheng tommy.chh...@gmail.com wrote: Thanks to the folks

Re: [Dbpedia-discussion] image url hash keys incorrect?

2011-12-05 Thread Tommy Chheng
submit a patch but I'm unsure how to do so. On Sat, Dec 3, 2011 at 6:38 PM, Tommy Chheng tommy.chh...@gmail.com wrote: I'm using ImageExtractor#getImageUrl in the extraction_framework to get the url of an image. val md = MessageDigest.getInstance("MD5") val messageDigest

[Dbpedia-discussion] image url hash keys incorrect?

2011-12-03 Thread Tommy Chheng
I'm using ImageExtractor#getImageUrl in the extraction_framework to get the url of an image. val md = MessageDigest.getInstance("MD5") val messageDigest = md.digest(fileName.getBytes) val md5 = (new BigInteger(1, messageDigest)).toString(16) val hash1 =
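
One known pitfall with the conversion quoted above: `BigInteger.toString(16)` drops leading zeros, so any digest whose first hex digit is `0` comes out shorter than 32 characters. Whether this is the exact cause discussed in the thread is an assumption, since the preview is truncated. The sketch below contrasts the two conversions; `Md5HexSketch`, `naiveHex` and `md5Hex` are hypothetical names used only for illustration and are not part of the extraction framework.

```scala
import java.math.BigInteger
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper, not part of the DBpedia extraction framework.
object Md5HexSketch {
  // Encode explicitly as UTF-8 so the result does not depend on the
  // platform default charset (getBytes without an argument does).
  private def digest(s: String): Array[Byte] =
    MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8))

  // Same conversion as the quoted snippet: leading zeros are lost.
  def naiveHex(s: String): String =
    new BigInteger(1, digest(s)).toString(16)

  // Format every byte as two hex digits, so the result is always 32 characters.
  def md5Hex(s: String): String =
    digest(s).map(b => "%02x".format(b & 0xff)).mkString
}
```

For example, the MD5 of `"a"` is `0cc175b9c0f1b6a831c399e269772661`; `naiveHex("a")` returns only the last 31 characters, while `md5Hex("a")` returns the full value.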

Re: [Dbpedia-discussion] Help with Dbpedia Extraction Framework

2011-12-01 Thread Tommy Chheng
When using mvn scala:run, use MAVEN_OPTS=-Xmx rather than JAVA_OPTS. The dump also comes in 27 files rather than one big one. You can use these instead. -- @tommychheng qwiki.com On Wed, Nov 30, 2011 at 11:01 PM, Amit Kumar amitk...@yahoo-inc.com wrote: Hi Pablo, Thanks for your

[Dbpedia-discussion] ImageExtractor getImageUrl

2011-10-20 Thread Tommy Chheng
I'm trying to use the ImageExtractor but it doesn't seem to work with utf8 characters correctly. For this filename: 東京大学総合研究博物館小石川分館0001.jpg I get http://upload.wikimedia.org/wikipedia/commons/1/1e/古市公威像0103.JPG But i should get:
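
For context, a minimal sketch of how such a URL can be derived from a non-ASCII filename. It assumes the usual Wikimedia upload layout (first hex digit of the filename's MD5, then the first two hex digits, then the filename), which matches the `1/1e/` path in the message above, and it assumes the filename must be hashed as UTF-8 bytes with spaces replaced by underscores. `ImageUrlSketch` and `imageUrl` are hypothetical names, not the framework's `ImageExtractor#getImageUrl`, and the filename is left unescaped, as in the message.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical sketch, not the framework's ImageExtractor.
object ImageUrlSketch {
  val CommonsBase = "http://upload.wikimedia.org/wikipedia/commons/"

  // MD5 of the UTF-8 bytes, as a zero-padded 32-character hex string.
  private def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b & 0xff))
      .mkString

  def imageUrl(fileName: String): String = {
    // Assumption: spaces are normalized to underscores before hashing.
    val name = fileName.replace(' ', '_')
    val hash = md5Hex(name)
    CommonsBase + hash.substring(0, 1) + "/" + hash.substring(0, 2) + "/" + name
  }
}
```

Calling `fileName.getBytes` without a charset uses the platform default, so the same filename can hash differently on machines that are not configured for UTF-8; passing the charset explicitly avoids that.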

[Dbpedia-discussion] Abstract parsing error

2011-08-09 Thread Tommy Chheng
I'm using the AbstractExtractor to grab abstract from wiki pages. It incorrectly parses http://en.wikipedia.org/wiki/Special:Export/Achilles I get this whole text block as one TextNode rather than a TextNode, SectionNode, TextNode. has come to mean a person's principal weakness.

[Dbpedia-discussion] Category Node?

2011-07-18 Thread Tommy Chheng
I'm trying to use the WikiParser to determine the category list of a wikipedia page. The category tags are represented as TextNode objects but when I print out the toWikiText, I get an empty string. Should categories be TextNodes and if so, what's the correct way to extract the category name from the

Re: [Dbpedia-discussion] Compiling extraction_framework problems

2011-07-10 Thread Tommy Chheng
Thanks, it was an internet connection problem. I re-compiled and it worked now. Are these packages custom built in some fashion? Curious to know why they are different from the mvnrepository mirror in version number style and package names. -- Tommy Chheng On Sunday, July 10, 2011 at 9:07 AM

[Dbpedia-discussion] Hosting dbpedia jars on a public maven repo?

2011-07-10 Thread Tommy Chheng
Are there plans to deploy the dbpedia jars to a public mvn repo? I'll be happy to make the changes in the pom files if it's ok to do this. -- Tommy Chheng -- All of the data generated in your IT infrastructure

[Dbpedia-discussion] failed dependencies(harvester2/jbind) in extraction_framework main/live

2011-07-10 Thread Tommy Chheng
+ <url>https://oss.sonatype.org/content/repositories/releases</url> + </repository> + -- Tommy Chheng http://tommy.chheng.com

[Dbpedia-discussion] Compiling extraction_framework problems

2011-07-09 Thread Tommy Chheng
to successfully build the extraction_framework? Thanks for helping me get started with developing with dbpedia! -- Tommy Chheng