Re: [Wiki-research-l] My data summit working groups

2011-02-18 Thread David Strauss
The main limitation is that MongoDB has only rudimentary support for
parallelism. I'm trying to design a system that various departments
can use as a data source, and the statistics on the Editor Trends page
show MongoDB maxed out for days just to dump en.wiki. I'd like more room
to grow capacity, especially in the long term.

On Sun, Feb 13, 2011 at 15:43, Steven Walling wrote:
> On Sun, Feb 13, 2011 at 3:32 PM, David Strauss wrote:
>>
>> > Edit history in an accessible form -- create a queryable NoSQL form of
>> > data dumps
>>
>> I'd like to get this started ASAP. I think we can set up a bridge to
>> synchronize directly from MediaWiki to a tool like Cassandra. It will
>> provide a superior source for both XML dumps and analysis.
>
> See http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software for an
> already ongoing project very similar to this notion.
>
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>



-- 
David Strauss
   | da...@davidstrauss.net
   | +1 512 577 5827 [mobile]



Re: [Wiki-research-l] My data summit working groups

2011-02-13 Thread Steven Walling
On Sun, Feb 13, 2011 at 3:32 PM, David Strauss wrote:

> > Edit history in an accessible form -- create a queryable NoSQL form of
> > data dumps
>
> I'd like to get this started ASAP. I think we can set up a bridge to
> synchronize directly from MediaWiki to a tool like Cassandra. It will
> provide a superior source for both XML dumps and analysis.
>

See http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software for an
already ongoing project very similar to this notion.


[Wiki-research-l] My data summit working groups

2011-02-13 Thread David Strauss
> Edit history in an accessible form -- create a queryable NoSQL form of
> data dumps

I'd like to get this started ASAP. I think we can set up a bridge to
synchronize directly from MediaWiki to a tool like Cassandra. It will
provide a superior source for both XML dumps and analysis.

> Data dumps -- ongoing improvements of the data dump creation process

I think we can improve this process by working on a queryable NoSQL
system that syncs directly from MediaWiki. It should allow us to produce
dumps in parallel and with more bandwidth than querying MySQL.

> Privacy -- making sure we act consistently with the letter and intent
> of our privacy policy
> ( http://wikimediafoundation.org/wiki/Privacy_policy ) in developing new
> analytics solutions

I'm happy to share thoughts here and participate in discussions.

> Big Data ad hoc mining infrastructure -- working through design
> considerations for a NoSQL cluster

This seems to go hand-in-hand with the first two working groups.

> Fundraiser Analytics & Testing -- group devoted to QA of existing
> systems

I'm trying to ramp down my work here so I can move on to the other
challenges.

-- 
David Strauss
   | da...@davidstrauss.net
   | +1 512 577 5827 [mobile]


