Re: [Wiki-research-l] My data summit working groups
The main limitation is that MongoDB has only rudimentary support for
parallelism. I'm trying to design a system that various departments can
use as a data source, and the statistics on the Editor Trends page show
MongoDB maxed out for days to dump en.wiki. I'd like more ability to
grow capacity, especially long-term.

On Sun, Feb 13, 2011 at 15:43, Steven Walling wrote:
> On Sun, Feb 13, 2011 at 3:32 PM, David Strauss wrote:
>>
>> > Edit history in an accessible form -- create a queryable NoSQL form of
>> > data dumps
>>
>> I'd like to get this started ASAP. I think we can set up a bridge to
>> synchronize directly from MediaWiki to a tool like Cassandra. It will
>> provide a superior source for both XML dumps and analysis.
>
> See http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software for an
> already ongoing project very similar to this notion.

--
David Strauss | da...@davidstrauss.net | +1 512 577 5827 [mobile]

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] My data summit working groups
On Sun, Feb 13, 2011 at 3:32 PM, David Strauss wrote:
> > Edit history in an accessible form -- create a queryable NoSQL form of
> > data dumps
>
> I'd like to get this started ASAP. I think we can set up a bridge to
> synchronize directly from MediaWiki to a tool like Cassandra. It will
> provide a superior source for both XML dumps and analysis.

See http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software for an
already ongoing project very similar to this notion.
[Wiki-research-l] My data summit working groups
> Edit history in an accessible form -- create a queryable NoSQL form of
> data dumps

I'd like to get this started ASAP. I think we can set up a bridge to
synchronize directly from MediaWiki to a tool like Cassandra. It will
provide a superior source for both XML dumps and analysis.

> Data dumps -- ongoing improvements of the data dump creation process

I think we can improve this process by working on a queryable NoSQL
system that syncs directly from MediaWiki. It should allow us to produce
dumps in parallel and with more bandwidth than querying MySQL.

> Privacy -- making sure we act consistently with the letter and intent of
> our privacy policy ( http://wikimediafoundation.org/wiki/Privacy_policy )
> in developing new analytics solutions

I'm happy to share thoughts here and participate in discussions.

> Big Data ad hoc mining infrastructure -- working through design
> considerations for a NoSQL cluster

This seems to go hand-in-hand with the first two working groups.

> Fundraiser Analytics & Testing -- group devoted to QA of existing systems

I'm trying to ramp down my work here so I can move on to the other
challenges.

--
David Strauss | da...@davidstrauss.net | +1 512 577 5827 [mobile]
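
As an illustration of the "produce dumps in parallel" idea in the message above, here is a minimal Python sketch. It is not the design proposed on the list. It assumes the revision store can be queried by page-ID range, and it uses a hypothetical in-memory dictionary (REVISIONS) in place of a real NoSQL cluster. The function names split_ranges, dump_range and parallel_dump are likewise invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a queryable revision store keyed by page ID.
# A real deployment would query the NoSQL cluster here instead.
REVISIONS = {
    1: ["rev 1a", "rev 1b"],
    2: ["rev 2a"],
    3: ["rev 3a", "rev 3b", "rev 3c"],
    4: ["rev 4a"],
}


def split_ranges(first_id, last_id, parts):
    """Split the inclusive range [first_id, last_id] into at most `parts` contiguous chunks."""
    total = last_id - first_id + 1
    size = -(-total // parts)  # ceiling division
    return [(start, min(start + size - 1, last_id))
            for start in range(first_id, last_id + 1, size)]


def dump_range(id_range):
    """Return (page_id, revision) rows for one page-ID range, in page-ID order."""
    start, end = id_range
    return [(page_id, rev)
            for page_id in range(start, end + 1)
            for rev in REVISIONS.get(page_id, [])]


def parallel_dump(first_id, last_id, workers):
    """Dump disjoint page-ID ranges concurrently and concatenate them in range order."""
    ranges = split_ranges(first_id, last_id, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(dump_range, ranges)  # map() yields results in input order
        return [row for chunk in chunks for row in chunk]
```

Because the ranges are disjoint and the results are joined in range order, the parallel dump contains the same rows in the same order as a single sequential pass. How well this scales in practice depends on the store serving concurrent range reads, which is the capacity concern raised in this thread.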