Re: [Dbpedia-gsoc] GSoC2014 Extraction using Map Reduce

2014-03-21 Thread Simon Wichmann
Hi gsoc2014 fellows, I will paste my thoughts from my proposal about 4.3 -- Extraction Parallelization here. Maybe it is of some use for whoever will work on this task. 1. download step: You mention that Wikimedia are capping the number of per-IP connections to 2. This means that if

Re: [Dbpedia-gsoc] GSoC2014 Extraction using Map Reduce

2014-03-21 Thread Andrea Di Menna
Hi all, a very quick note on point 1 (download step), based on my personal experience: it is also possible to get Wikipedia dumps from official mirrors (http://dumps.wikimedia.org/mirrors.html). I have personally used the Your.org mirror, which has no cap on connections. I have

Re: [Dbpedia-gsoc] GSoC2014 Extraction using Map Reduce

2014-03-19 Thread Abhijit Pratap Singh Tomar
Hi Dimitris, I was looking over the GitHub code mentioned in the discussion. First of all, can you give me an idea of how big a handicap it would be not to know Scala for this project? I am not familiar with Scala at all, but if we will be using only a specific subset of the language then I

Re: [Dbpedia-gsoc] GSoC2014 Extraction using Map Reduce

2014-03-17 Thread Dimitris Kontokostas
Hello Abhijit, (ccing the gsoc list) As I mentioned in my previous mail to the list [1], we have a list of mentors (4 at the time). After the selection period, one will be the *main* mentor. By dumps, we mean the Wikipedia language dumps [2]. This is what DBpedia processes to extract RDF.

Re: [Dbpedia-gsoc] GSoC2014 Extraction using Map Reduce

2014-03-06 Thread Dimitris Kontokostas
Hello Abhijit and welcome to the DBpedia community, please take a look at the following pages for details and feel free to ask questions: http://wiki.dbpedia.org/gsoc2014/ideas/ExtractionwithMapReduce/