There is no 10GB limit, but it is the recommended bucket size if you
want to split up the file, according to my recent discussion with the
archive.org team, who have been helping me optimize the storage.
My idea is to make smaller blocks that can be fetched quickly
and that people fo
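As a rough illustration of that splitting idea, here is a minimal sketch in Python, assuming a plain byte-level split into roughly 10GB parts (the input file name, part naming, and buffer size are illustrative assumptions, not what was actually used):

    import os

    CHUNK = 10 * 1024 ** 3   # ~10 GB per part, the recommended bucket size
    BUF = 64 * 1024 ** 2     # copy in 64 MB buffers so memory use stays low

    def split_file(path, chunk=CHUNK, buf=BUF):
        """Split `path` into numbered .partNNNN files of at most `chunk` bytes."""
        part = 0
        with open(path, "rb") as src:
            data = src.read(buf)
            while data:
                part_name = "%s.part%04d" % (path, part)
                written = 0
                with open(part_name, "wb") as dst:
                    while data and written < chunk:
                        dst.write(data)
                        written += len(data)
                        data = src.read(min(buf, chunk - written) or buf)
                print(part_name, os.path.getsize(part_name), "bytes")
                part += 1

    if __name__ == "__main__":
        split_file("planet-dump.osm.bz2")  # hypothetical input file name

A byte-level split like this only helps with upload and fetch granularity; splitting along record boundaries (per-page or per-revision blocks) would be needed if each block is meant to be usable on its own.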
There is no such 10GB limit; see
http://archive.org/details/ARCHIVETEAM-YV-6360017-6399947 (a 238 GB example).
ArchiveTeam/WikiTeam is uploading some dumps to the Internet Archive; if you
want to join the effort, use the mailing list
https://groups.google.com/group/wikiteam-discuss to avoid wasting resources.
Hello people,
I have completed my first set of uploads of the OSM/FOSM dataset (350GB
unpacked) to archive.org:
http://osmopenlayers.blogspot.de/2012/05/upload-finished.html
We can do something similar with Wikipedia. The bucket size of
archive.org is 10GB, so we need to split up the data in a way that
On Thu, May 17, 2012 at 07:43:09AM -0400, Anthony wrote:
>
> In fact, I think someone at WMF should contact Amazon and see if
> they'll let us conduct the experiment for free, in exchange for us
> creating the dump for them to host as a public data set
> (http://aws.amazon.com/publicdatasets/).
On 17/05/12 12:49, Anthony wrote:
Please have someone at WMF coordinate this so that there aren't
multiple requests made. In my opinion, it should preferably be made
by a WMF employee.
Fill out the form at
https://aws-portal.amazon.com/gp/aws/html-forms-controller/aws-dataset-inquiry
Tell them
On Thu, May 17, 2012 at 8:11 AM, Thomas Dalton wrote:
> On 17 May 2012 12:43, Anthony wrote:
>> In fact, I think someone at WMF should contact Amazon and see if
>> they'll let us conduct the experiment for free, in exchange for us
>> creating the dump for them to host as a public data set
>> (htt
On 17 May 2012 12:43, Anthony wrote:
> In fact, I think someone at WMF should contact Amazon and see if
> they'll let us conduct the experiment for free, in exchange for us
> creating the dump for them to host as a public data set
> (http://aws.amazon.com/publicdatasets/).
What dump are you going
Please have someone at WMF coordinate this so that there aren't
multiple requests made. In my opinion, it should preferably be made
by a WMF employee.
Fill out the form at
https://aws-portal.amazon.com/gp/aws/html-forms-controller/aws-dataset-inquiry
Tell them you want to create a public data se
On Thu, May 17, 2012 at 7:27 AM, J Alexandr Ledbury-Romanov
wrote:
> I'd like to point out that the increasingly technical nature of this
> conversation probably belongs either on wikitech-l, or off-list, and that
> the strident nature of the comments is fast approaching inappropriate.
Really? I
I'd like to point out that the increasingly technical nature of this
conversation probably belongs either on wikitech-l, or off-list, and that
the strident nature of the comments is fast approaching inappropriate.
Alex
Wikimedia-l list administrator
2012/5/17 Anthony
> On Thu, May 17, 2012 at
On Thu, May 17, 2012 at 2:06 AM, John wrote:
> On Thu, May 17, 2012 at 1:52 AM, Anthony wrote:
>> On Thu, May 17, 2012 at 1:22 AM, John wrote:
>> > Anthony, the process is linear: you have a PHP script inserting X number
>> > of rows per Y time frame.
>>
>> Amazing. I need to switch all my databas
On Thu, May 17, 2012 at 6:06 AM, John wrote:
> If you're willing to foot the bill for the new hardware
> I'll gladly prove my point
Given the millions of dollars that Wikipedia has, it should not be a
problem to provide such resources for a good cause like that.
--
James Michael DuPont
Member of F
On Thu, May 17, 2012 at 1:52 AM, Anthony wrote:
> On Thu, May 17, 2012 at 1:22 AM, John wrote:
> > Anthony, the process is linear: you have a PHP script inserting X number
> > of rows per Y time frame.
>
> Amazing. I need to switch all my databases to MySQL. It can insert X
> rows per Y time frame,
On Thu, May 17, 2012 at 1:22 AM, John wrote:
> Anthony, the process is linear: you have a PHP script inserting X number of
> rows per Y time frame.
Amazing. I need to switch all my databases to MySQL. It can insert X
rows per Y time frame, regardless of whether the database is 20
gigabytes or 20 teraby
Well, to be honest, I am still upset about how much data is deleted
from Wikipedia because it is not "notable";
there are so many articles that I might be interested in that are lost
in the same garbage as spam and other things.
We should make non-notable articles and non-harmful ones available in
t
Anthony, the process is linear: you have a PHP script inserting X number of
rows per Y time frame. Yes, rebuilding the externallinks, links, and langlinks
tables will take some additional time and won't scale. However, I have been
working with the Toolserver since 2007 and I've lost count of the number of
time
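To make the "X rows per Y time frame" argument concrete, here is a back-of-the-envelope estimate in Python; both figures are hypothetical placeholders, not measurements, and the linear model deliberately ignores the index-rebuild and scaling concerns raised elsewhere in this thread:

    revisions = 500 * 10**6   # assumed number of revisions to import
    rows_per_second = 2000    # assumed sustained insert rate on the given hardware

    seconds = revisions / rows_per_second
    print("estimated import time: %.1f days" % (seconds / 86400.0))  # ~2.9 days at these assumed figures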
On Thu, May 17, 2012 at 12:45 AM, John wrote:
> Simple.wikipedia is nothing like en.wikipedia I care to dispute that
> statement, All WMF wikis are setup basically the same (an odd extension here
> or there is different, and different namespace names at times) but for the
> purpose of recovery sim
*Simple.wikipedia is nothing like en.wikipedia* I care to dispute that
statement. All WMF wikis are set up basically the same (an odd extension
here or there is different, and different namespace names at times), but for
the purpose of recovery simplewiki_p is a very standard example. This issue
isn't
On Thu, May 17, 2012 at 12:30 AM, John wrote:
> I'll run a quick benchmark and import the full history of simple.wikipedia to
> my laptop wiki on a stick, and give an exact duration
Simple.wikipedia is nothing like en.wikipedia. For one thing, there's
no need to turn on $wgCompressRevisions with
I'll run a quick benchmark and import the full history of simple.wikipedia
to my laptop wiki on a stick, and give an exact duration
On Thu, May 17, 2012 at 12:26 AM, John wrote:
> Toolserver is a clone of the WMF servers minus files. They run a database
> replication of all wikis. These times ar
Toolserver is a clone of the WMF servers minus files. They run a database
replication of all wikis. These times are dependent on available hardware
and may vary, but should provide a decent estimate.
On Thu, May 17, 2012 at 12:23 AM, Anthony wrote:
> On Thu, May 17, 2012 at 12:18 AM, John wrote
On Thu, May 17, 2012 at 12:18 AM, John wrote:
> Take a look at http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps for
> exactly how to import an existing dump. I know the process of re-importing
> a cluster for the Toolserver is normally just a few days when they have the
> needed dumps.
To
On Thu, May 17, 2012 at 12:13 AM, John wrote:
> That two-week estimate was given as a worst-case scenario. In the best case
> we are talking as little as a few hours for the smaller wikis, to 5 days or
> so for a project the size of enwiki. (See
> http://lists.wikimedia.org/pipermail/xmldatadumps-l/2
Take a look at http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps for
exactly how to import an existing dump. I know the process of re-importing
a cluster for the Toolserver is normally just a few days when they have the
needed dumps.
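For reference, the import described on that manual page comes down to MediaWiki's maintenance scripts; here is a minimal sketch of driving them from Python, where the install path and dump file name are assumptions:

    import subprocess

    MEDIAWIKI = "/var/www/mediawiki"              # assumed MediaWiki install path
    DUMP = "simplewiki-pages-meta-history.xml"    # assumed (decompressed) dump file

    # importDump.php streams the XML dump into the wiki's database.
    subprocess.check_call(
        ["php", MEDIAWIKI + "/maintenance/importDump.php", DUMP])

    # The manual recommends rebuilding derived data such as recent changes afterwards.
    subprocess.check_call(
        ["php", MEDIAWIKI + "/maintenance/rebuildrecentchanges.php"])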
On Thu, May 17, 2012 at 12:13 AM, John wrote:
> that
That two-week estimate was given as a worst-case scenario. In the best case
we are talking as little as a few hours for the smaller wikis, to 5 days or
so for a project the size of enwiki. (See
http://lists.wikimedia.org/pipermail/xmldatadumps-l/2012-May/000491.html for
progress on image dumps.)
On We
On Thu, May 17, 2012 at 12:03:02AM -0400, John wrote:
> Except for files, getting a content clone up is relatively easy, and can be
> done fairly quickly (i.e. less than two weeks for everything). I
> know there is talk about getting an rsync setup for images.
Ouch, 2 weeks. We need the ima
Except for files, getting a content clone up is relatively easy, and can be
done fairly quickly (i.e. less than two weeks for everything). I
know there is talk about getting an rsync setup for images.
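On the rsync idea for images, here is a sketch of what a mirror pull might look like; the source endpoint is a placeholder, since the thread only says such a setup is being discussed:

    import subprocess

    SOURCE = "rsync://example.org/wikimedia-images/"  # placeholder, no public endpoint is named
    DEST = "/data/mirror/images/"

    subprocess.check_call([
        "rsync",
        "-a",         # archive mode: preserve times, permissions, symlinks
        "--partial",  # keep partially transferred files so large syncs can resume
        "--delete",   # drop files removed upstream so the mirror stays consistent
        SOURCE,
        DEST,
    ])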
On Wed, May 16, 2012 at 11:11:04PM -0400, John wrote:
> I know from experience that a wiki can be rebuilt from any one of the
> dumps that are provided; (pages-meta-current), for example, contains
> everything needed to reboot a site except its user database
> (names/passwords etc.). See
> http://www