Hi gsoc2014 fellows,
I will paste my thoughts from my proposal about 4.3 -- Extraction
Parallelization here. Maybe it is of some use for whoever will work on
this task.
1. download step:
You mention that Wikimedia is capping the number of per-IP
connections to 2. This means that if
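For the record, that per-IP cap can be respected by driving all downloads through a fixed-size worker pool. A minimal sketch (in Python; the URL list and the `fetch` body are illustrative placeholders, not the real DBpedia download code):

```python
# Sketch: cap concurrent dump downloads at 2, matching the per-IP limit.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 2   # Wikimedia's per-IP cap mentioned above
active = 0            # downloads currently in flight
peak = 0              # highest concurrency observed
lock = threading.Lock()

def fetch(url):
    """Placeholder for an actual HTTP download of one dump file."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # stand-in for the real transfer
    with lock:
        active -= 1
    return url

# Illustrative URLs only; real dump file names differ.
urls = ["http://dumps.wikimedia.org/enwiki/latest/part-%d" % i
        for i in range(6)]

# A pool of 2 workers guarantees at most 2 simultaneous connections.
with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
    done = list(pool.map(fetch, urls))

print(peak <= MAX_CONNECTIONS, len(done))
```

The same idea carries over to whatever HTTP client the framework ends up using: the pool size, not the number of files, bounds the connection count.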
Hi all,
a very quick note to you all re point 1 (download step), based on my
personal experience:
it is also possible to get Wikipedia dumps from official mirrors
(http://dumps.wikimedia.org/mirrors.html).
I have personally used the Your.org mirror, which has no cap on connections.
I have
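Since the official mirrors keep the same directory layout as dumps.wikimedia.org, switching an existing dump URL to a mirror is just a host swap. A small sketch (the mirror base below is an example; check the mirrors page above for the current list):

```python
# Sketch: point an existing dump URL at a mirror host instead of the
# official server. Assumes the mirror preserves the path layout, which
# the official mirrors do; the mirror base is illustrative.
OFFICIAL = "http://dumps.wikimedia.org"
MIRROR = "http://dumps.wikimedia.your.org"  # example mirror base

def to_mirror(url):
    """Rewrite an official dump URL to use the mirror; leave others alone."""
    if url.startswith(OFFICIAL):
        return MIRROR + url[len(OFFICIAL):]
    return url

print(to_mirror("http://dumps.wikimedia.org/enwiki/latest/"
                "enwiki-latest-pages-articles.xml.bz2"))
```

With an uncapped mirror, the 2-connection limit from point 1 stops being the bottleneck for the download step.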
Hi Dimitris,
I was looking over the GitHub code mentioned in the discussion. First of
all, can you give me an idea of how big a handicap it would be not to
know Scala for this project? I am not familiar with Scala at all, but if we
will be using only a specific subset of the language then I
Hello Abhijit,
(ccing the gsoc list)
As I mentioned in my previous mail to the list [1], we have a list of
mentors (four at the time of writing). After the selection period, one of them
will be the *main* mentor.
By dumps, we refer to Wikipedia language dumps [2]. This is what DBpedia
processes to extract RDF.
Hello Abhijit and welcome to the DBpedia community,
please take a look at the following pages for details and feel free to ask
questions
http://wiki.dbpedia.org/gsoc2014/ideas/ExtractionwithMapReduce/