Re: [Wikitech-l] Wikimedia logging infrastructure

2010-08-10 Thread Tim Starling
On 10/08/10 15:16, Rob Lanphier wrote:
 We have a single collection point for all of our logging, which is
 actually just a sampling of the overall traffic (designed to be
 roughly one out of every 1000 hits).  The process is described here:
 http://wikitech.wikimedia.org/view/Squid_logging
 
 My understanding is that this code is also involved somewhere:
 http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/
 ...but I'm a little unclear on the relationship between that code
 and the code in trunk/udplog.

Maybe you should find out who wrote the relevant code and set up the
relevant infrastructure, and ask them directly. It's not difficult to
find out who it was.

 At any rate, there are a couple of problems with the way that it works:
 1.  Once we saturate the NIC on the logging machine, the quality of
 our sampling degrades pretty rapidly.  We've generally had a problem
 with that over the past few months.

We haven't saturated any NICs.

-- Tim Starling



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wikimedia logging infrastructure

2010-08-10 Thread Rob Lanphier
On Mon, Aug 9, 2010 at 11:17 PM, Tim Starling tstarl...@wikimedia.org wrote:
 On 10/08/10 15:16, Rob Lanphier wrote:
 We have a single collection point for all of our logging, which is
 actually just a sampling of the overall traffic (designed to be
 roughly one out of every 1000 hits).  The process is described here:
 http://wikitech.wikimedia.org/view/Squid_logging

 My understanding is that this code is also involved somewhere:
 http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/
 ...but I'm a little unclear on the relationship between that code
 and the code in trunk/udplog.

 Maybe you should find out who wrote the relevant code and set up the
 relevant infrastructure, and ask them directly. It's not difficult to
 find out who it was.

Well, yes, I was hoping you'd weigh in on this thread.

 At any rate, there are a couple of problems with the way that it works:
 1.  Once we saturate the NIC on the logging machine, the quality of
 our sampling degrades pretty rapidly.  We've generally had a problem
 with that over the past few months.

 We haven't saturated any NICs.

Sorry, I assumed it was a NIC.  There has been packet loss, from what
I understand.  I'll leave it at that.

Rob



Re: [Wikitech-l] Sentence-level editing

2010-08-10 Thread Daniel Friesen
Jan Paul Posma wrote:
 Hello everyone,

 As this is my first post to the mailing list, let me introduce myself 
 shortly. My name is Jan Paul Posma, and I'm a 20 year old Computer Science 
 student from the Netherlands. I was introduced to MediaWiki by Roan Kattouw, 
 contractor for the Usability Initiative, who also happens to be a friend of 
 mine. :-)

 The reason for mailing to the list is the research I'll be conducting this 
 year: building a new editor for MediaWiki. Now I guess this has been 
 discussed over and over again, but this is a bit different. Instead of 
 building a true WYSIWYG editor, I'm proposing to build an editor that's based 
 on adding extra markup to the original, rendered page. This extra markup 
 provides the ability to edit these segments. With this approach, it's 
 possible to slowly enable editing for different elements. First, we can 
 enable editing for simple sentences (thus the title sentence-level 
 editing). Simple in this context means: without most wikicodes. I.e. only 
 links are allowed, and perhaps bold and italic. This editor can be extended 
 step by step to include other elements, such as references, images,
 templates, lists, tables, etc.

 The last few weeks I've worked on some prototypes to illustrate this idea.
 You can find the most advanced prototype here: 
 http://janpaulposma.nl/sle/prototype/prototype3.html
 The full project proposal and prototypes can be found here: 
 http://www.mediawiki.org/wiki/User:JanPaul123/Sentence-level_editing

 Right now I'm not looking for anything in specific, just whether or not you 
 think this is a good idea, technically feasible, etc. If you have suggestions 
 of any kind I'll be happy to hear them!

 Thanks for your time!
 Regards,
 Jan Paul Posma
   
Interesting.
I find the switch from content to a 3-row textarea a little jarring. A
further developed version could make use of contentEditable to provide
the same editing ability but without the negative side effects of the
textarea. This could be done in one of two ways. One is an actual
lightweight wysiwyg with floating undo, bold, italic and link/unlink
buttons (being sure that when the user instinctively types
[[internal link]] it turns into a link; Wikia failed there). The other
is to convert the simple content back to WikiText and show the syntax
([[, '', etc.) inline, with floating cancel and preview buttons, using
contentEditable purely as a textarea replacement that doesn't disrupt
the page flow as much.

If you do start to get into the contentEditable stuff later, be sure to
contact me about it. I was working on a company project that could
really have used a good inline wysiwyg editor, but I found all the
wysiwyg editors out there lacking. They don't support contentEditable,
they don't let you detach the toolbar from the editor (or better, give
you an API and let you build the toolbar yourself), and they're such a
horrid mess of bloated code that I can't for the life of me find a way
to extract the components that clean up the mess the browsers make of
the HTML and abstract the wysiwyg API. And the browsers do make a mess
of things. So while I've had to make do with a workaround since we
haven't launched yet, I've been hoping to eventually finish creating an
MIT or MIT/GPL licensed ContentEditable library that abstracts the
basic wysiwyg mess that every editor seems to be reinventing and makes
rolling your own custom wysiwyg editor easier. Depending on what state
we're at whenever you get to that, we could collaborate on it. If
we've already launched, I could probably spin it as a small contract or
sponsorship for an open-source project that we could use and that would
also be useful in your project and in others. Or perhaps enough time
will have elapsed that we've already hired someone to develop the
library and left it open for people to use freely.

-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Re: [Wikitech-l] Wikimedia logging infrastructure

2010-08-10 Thread Robert Rohde
Rob,

I'm not completely sure whether or not you are talking about the same
logging infrastructure that leads to our traffic stats at
stats.grok.se [1].  However, having worked with those stats and the
raw files provided by Domas [2], I am pretty sure that those squid
traffic stats are intended to be a complete traffic sample (or nearly
so) and not a 1/1000 sample.

We have done various fractionated samples in the past, but I believe
the squid logs used for traffic stats at the present time are not
fractionated.

If you are talking about a different process of logging not associated
with the traffic logs, then I apologize for my confusion.

-Robert Rohde

[1] http://stats.grok.se/
[2] http://dammit.lt/wikistats/



On Mon, Aug 9, 2010 at 10:16 PM, Rob Lanphier ro...@wikimedia.org wrote:
 Hi everyone,

 We're in the process of figuring out how we fix some of the issues in
 our logging infrastructure.  I'm both sending this email out to get
 the more knowledgeable folks to chime in about where I've got the
 details wrong, and for general comment on how we're doing our logging.
  We may need to recruit contract developers to work on this stuff, so
 we want to make sure we have clear and accurate information available,
 and we need to figure out what exactly we want to direct those people
 to do.

 We have a single collection point for all of our logging, which is
 actually just a sampling of the overall traffic (designed to be
 roughly one out of every 1000 hits).  The process is described here:
 http://wikitech.wikimedia.org/view/Squid_logging

 My understanding is that this code is also involved somewhere:
 http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/
 ...but I'm a little unclear on the relationship between that code
 and the code in trunk/udplog.

 At any rate, there are a couple of problems with the way that it works:
 1.  Once we saturate the NIC on the logging machine, the quality of
 our sampling degrades pretty rapidly.  We've generally had a problem
 with that over the past few months.
 2.  We'd like to increase the granularity of logging so that we can do
 more sophisticated analysis.  For example, if we decide to run a test
 banner to a limited audience, we need to make sure we're getting more
 complete logs for that audience or else we're not getting enough data
 to do any useful analysis.

 If this were your typical commercial operation, the answer would be
 "why aren't you just logging into Streambase?" (or some other data
 warehousing storage solution).  I'm not suggesting that we do that (or
 even look at any of the solutions that bill themselves as open source
 alternatives), since, while our needs are increasing, we still aren't
 planning to be anywhere near as sophisticated as a lot of data
 tracking orgs.  Still, it's worth asking questions about our existing
 setup.  Should we be looking to optimize our existing single-box setup,
 extending our software to have multi-node collection, or looking at a
 whole new collection strategy?

 Rob



Re: [Wikitech-l] Wikimedia logging infrastructure

2010-08-10 Thread Daniel Kinzler
Robert Rohde wrote:
 Rob,
 
 I'm not completely sure whether or not you are talking about the same
 logging infrastructure that leads to our traffic stats at
 stats.grok.se [1].  However, having worked with those stats and the
 raw files provided by Domas [2], I am pretty sure that those squid
 traffic stats are intended to be a complete traffic sample (or nearly
 so) and not a 1/1000 sample.

A lot of people seem to be confused about this. After getting contradictory info
at WikiSym about this, I asked Domas, and he confirmed that it counts the
complete traffic; it's not sampled. But it's only *counting*.

Any extra logs that contain detailed info about individual requests *are*
sampled, and are generally temporary.
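
To make the distinction concrete, here is a minimal Python sketch, not the actual collector code. Every request increments a per-page counter, while a detailed line is kept for only one request in 1000. The class name, the every-Nth sampling rule and the record layout are assumptions made for illustration.

```python
from collections import Counter


class TrafficLog:
    """Counts every hit, but keeps detailed lines for only 1 in `rate` hits."""

    def __init__(self, rate=1000):
        self.rate = rate
        self.seen = 0
        self.counts = Counter()   # complete: one increment per request
        self.sampled = []         # sampled: detailed lines, 1 in `rate`

    def record(self, page, detail):
        self.seen += 1
        self.counts[page] += 1
        # Hypothetical sampling rule: keep every `rate`-th request.
        if self.seen % self.rate == 0:
            self.sampled.append((page, detail))


log = TrafficLog(rate=1000)
for i in range(5000):
    log.record("Main_Page", "request %d" % i)
```

The counter stays exact regardless of the sampling rate; only the detailed list shrinks.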

-- daniel





Re: [Wikitech-l] Sentence-level editing

2010-08-10 Thread Roan Kattouw
2010/8/10 Andrew Garrett agarr...@wikimedia.org:
 I do notice that the Preview button for sentence-level editing
 doesn't quite work (it shows the old text). There's some stuff
 missing, but I assume that this is because it's not finished yet.

It couldn't really work, could it? Cross-domain restrictions prevent
it from running sentences you enter on janpaulposma.nl through the
parser (sentences can contain links) on en.wikipedia.org . Could be
fixed by either setting up a MediaWiki instance on the same domain
(which is of limited use) or by writing a quick proxy script for
api.php (much more useful, as that'll return 'real' results), but I'm
guessing this stage is really just about the UI.
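
For illustration only, the proxy idea could look roughly like this in Python: a script on the prototype's own domain receives the sentence and forwards it to the wiki's api.php with action=parse, so the browser never makes a cross-domain request. The function names, the endpoint constant, the placeholder title and the User-Agent string are assumptions, not part of the prototype.

```python
import json
import urllib.parse
import urllib.request

# Assumed upstream endpoint; the thread only names the en.wikipedia.org domain.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(wikitext, title="Sandbox"):
    """Return the api.php URL that asks the wiki to render `wikitext`."""
    params = {
        "action": "parse",
        "format": "json",
        "title": title,   # page context for the parse (placeholder value)
        "text": wikitext,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def proxy_preview(wikitext):
    """Fetch the rendered HTML server-side and hand it back to the page."""
    request = urllib.request.Request(
        build_parse_url(wikitext),
        headers={"User-Agent": "sle-preview-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["parse"]["text"]["*"]
```

Only `build_parse_url` is free of network access; `proxy_preview` would be called from whatever request handler the same-domain script uses.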

Roan Kattouw (Catrope)



Re: [Wikitech-l] [Testing] Selenium

2010-08-10 Thread Benedikt Kaempgen
Hi,

Thanks a lot for your reply, Markus.

With your hints, I have got the SimpleSeleniumTestSuite working on my
installation.

In order to have a go at SMW testing, I am interested in two things, mainly:

* Where should extension test suites be placed and registered for testing?
This is already being discussed on the mailing list and I hope there will be a
consensus soon.

* What functionality does your testing framework provide? It would be good to
have documentation of both the functions already implemented and the planned or
soon-to-come ones. I know that Selenium already has many built-in tests,
but your framework has the potential to provide simple MW- and
extension-specific tests that motivate developers (even non-technicians) to
develop system tests.

Let me know if I can help.

Regards

Benedikt

--
Karlsruhe Institute of Technology (KIT)
Institute of Applied Informatics and Formal Description Methods (AIFB)

Benedikt Kämpgen
Research Associate

Kaiserstraße 12
Building 11.40
76131 Karlsruhe

Phone: +49 721 608-7946
Fax: +49 721 608-6580
Email: benedikt.kaemp...@kit.edu
Web: http://www.kit.edu/

KIT - University of the State of Baden-Wuerttemberg and National Research
Center of the Helmholtz Association


-Original Message-
From: wikitech-l-boun...@lists.wikimedia.org 
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Markus Glaser
Sent: Tuesday, August 03, 2010 10:53 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] [Testing] Selenium

Hi Benedikt,

the framework was reworked several times over the last few weeks, so I am afraid the
documentation is slightly out of date. I will update it in the next few days. As
of now, you have to add your test classes to the autoloader, then adapt
these settings and put them in your LocalSettings.php:

$wgEnableSelenium = true;
$wgGroupPermissions['sysop']['selenium'] = true;
$wgSeleniumTestSuites = array(
'SimpleSeleniumTestSuite',
);
// use no protocol here
$wgSeleniumTestsSeleniumHost = 'localhost';
// use of protocol is mandatory! also, selenium requests a trailing slash
$wgSeleniumTestsWikiUrl = 'http://localhost/phase3/';
$wgSeleniumServerPort = ;
$wgSeleniumTestsWikiUser  = 'WikiSysop';
$wgSeleniumTestsWikiPassword  = 'password';
$wgSeleniumTestsBrowsers = array(
'firefox' => '*chrome d:\\Firefox35\\firefox.exe',
'iexplorer' => '*iexploreproxy',
'opera' => '*chrome /usr/bin/opera',
);
$wgSeleniumTestsUseBrowser = 'firefox';

You can find a sample test in the maintenance/tests/selenium folder, which 
consists of a test case and a test suite. It's the test suite you have to add 
to the autoloader. For the sample test, this has already been done in the trunk.

Cheers,
Markus


-Original Message-
From: wikitech-l-boun...@lists.wikimedia.org
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Benedikt Kaempgen
Sent: Tuesday, August 03, 2010 5:38 PM
To: Wikimedia developers
Subject: [Wikitech-l] [Testing] Selenium

Hello,

In order to test SMW, I would like to try out your Selenium testing framework, 
as described here [1]. 

Two things are not that clear to me:

- "As of now, you have to manually add the test file to
maintenance/tests/RunSeleniumTests.php. This will be replaced by a command line
argument in the future." What exactly is one supposed to do here?

- Also, the Architecture section mentions some files that I cannot find
in /trunk/phase3, e.g., selenium/SimpleSeleniumTest or
selenium/LocalSeleniumSettings.php.sample. Why are they missing?

Regards,

Benedikt

[1] http://www.mediawiki.org/wiki/SeleniumFramework


--
Karlsruhe Institute of Technology (KIT)
Institute of Applied Informatics and Formal Description Methods (AIFB)

Benedikt Kämpgen
Research Associate

Kaiserstraße 12
Building 11.40
76131 Karlsruhe, Germany

Phone: +49 721 608-7946
Fax: +49 721 608-6580
Email: benedikt.kaemp...@kit.edu
Web: http://www.kit.edu/

KIT - University of the State of Baden-Wuerttemberg and National Research 
Center of the Helmholtz Association




Re: [Wikitech-l] Wikimedia logging infrastructure

2010-08-10 Thread Mark Bergsma
On 10-08-10 07:16, Rob Lanphier wrote:
 At any rate, there are a couple of problems with the way that it works:
 1.  Once we saturate the NIC on the logging machine, the quality of
 our sampling degrades pretty rapidly.  We've generally had a problem
 with that over the past few months.
   

As already stated elsewhere, we didn't really saturate any NICs, just
some socket buffers. Because of the large number of configured log
pipes, the software (udp2log) could not empty the socket buffers fast
enough.

 If this were your typical commercial operation, the answer would be
 "why aren't you just logging into Streambase?" (or some other data
 warehousing storage solution).  I'm not suggesting that we do that (or
 even look at any of the solutions that bill themselves as open source
 alternatives), since, while our needs are increasing, we still aren't
 planning to be anywhere near as sophisticated as a lot of data
 tracking orgs.  Still, it's worth asking questions about our existing
 setup.  Should we be looking to optimize our existing single-box setup,
 extending our software to have multi-node collection, or looking at a
 whole new collection strategy?

   

Besides the ideas that are currently being kicked around of improving or
rewriting the udp log collection software, there's also always the
short-term, easy option of sending a multicast UDP stream, and having
multiple collectors with distinct log pipes setup. E.g. one machine for
the sampled logging, and another, independent machine to do all the
special purpose log streams. I do like more efficient software solutions
rather than throwing more iron at the problem, though. :)
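
As a rough sketch of the multicast option, assuming plain IPv4 multicast and a made-up group address, each collector could open its socket along these lines in Python. The function names and the 239.1.2.3 group are hypothetical; nothing here is taken from the real udp2log setup.

```python
import socket


def membership_request(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure: group address followed by local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_collector(group, port):
    """Open a UDP socket that receives the multicast log stream."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Lets more than one collector process bind the same port on one host.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


# Hypothetical group address, used only to show the packed request.
MREQ = membership_request("239.1.2.3")
```

Each collector would call `open_collector` with the same group and port and then feed only its own set of log pipes, so a slow special-purpose filter on one collector cannot hold up the sampled logging on another.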

-- 
Mark Bergsma m...@wikimedia.org
Operations Engineer, Wikimedia Foundation




Re: [Wikitech-l] Sentence-level editing

2010-08-10 Thread Jan Paul Posma
On 10-Aug-2010, at 11:05, Roan Kattouw wrote:

 2010/8/10 Andrew Garrett agarr...@wikimedia.org:
 I do notice that the Preview button for sentence-level editing
 doesn't quite work (it shows the old text). There's some stuff
 missing, but I assume that this is because it's not finished yet.

Ah, I guess I wasn't quite clear on that. These are prototypes, 
user-interface mashups, without actual server-side logic behind it.

The next step of the project will be to write the server-side stuff that
matches the sentences, and find out how good this is, what the edge cases are
and how those are handled, etc.
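
A very rough idea of what such server-side matching might start from, written in Python purely as an illustration: split the text into sentences and mark as editable only those that contain nothing beyond plain text, links and bold/italic. The splitter is deliberately naive, and the list of "complex" constructs is an assumption rather than the project's actual rule set.

```python
import re

# Constructs treated as "not simple" in this sketch: templates, tables,
# HTML-like tags such as <ref>, and file/image links.
COMPLEX_MARKUP = re.compile(
    r"\{\{|\}\}|\{\||\|\}|<[^>]+>|\[\[(?:File|Image):", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def is_simple(sentence):
    """True if the sentence has only plain text, links, bold or italic."""
    return COMPLEX_MARKUP.search(sentence) is None


def editable_sentences(text):
    """Pair each sentence with a flag saying whether it may be edited inline."""
    return [(s, is_simple(s)) for s in split_sentences(text)]
```

Edge cases show up immediately: a link such as [[St. Louis]] would be cut in two by this splitter, which is the kind of problem the matching step has to measure.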

Finally, it can be made into a working plugin which only does the
sentence-editing. The other things like references, images, etc. are a bit
harder. These can be built one by one to extend this editor, but I think that
the sentence-level editor alone can already be quite useful.

Besides, with only the ability to edit the sentences there will be enough 
challenges already: performance, security, handling edit collisions, 
localization, etc.

Anyway, it would take quite a while to be able to do everything shown in the 
third prototype. It shows what we *could* have eventually. The plan is that in 
half a year the second prototype will be fully functional and available as an 
extension.

Best regards, Jan Paul




Re: [Wikitech-l] Sentence-level editing

2010-08-10 Thread Jan Paul Posma

 Interesting.
 I find the switch from content to a 3-row textarea a little jarring. A
 further developed version could make use of contentEditable to provide
 the same editing ability but without the negative side effects of the
 textarea.

Actually, showing the original wikitext is one of the goals of this
user interface. The goal of this project is to lower the barriers for novice
users and to provide a better way for them to learn the wikitext syntax. Hiding
the wikitext completely would defeat this purpose.

In the best cases novice users would first edit some sentences, while noticing 
the wikitext syntax, but only some links and perhaps bold/italics. This is much 
less scary than a big textarea with lots of code. Then they may experiment with 
references, images, templates (once these are implemented), so they can slowly 
learn more. Eventually they can either be advanced users of this editor, or try 
the editor we have now, which will always be more powerful. The editor I'm 
proposing *isn't* a total replacement for the current editor, just an addition.

Best regards, Jan Paul


Re: [Wikitech-l] Sentence-level editing

2010-08-10 Thread Nikola Smolenski
On 08/10/2010 11:59 AM, Jan Paul Posma wrote:
 Finally, it can be made into a working plugin which only does the
 sentence-editing. The other things like references, images, etc. are a bit
 harder. These can be built one by one to extend this editor, but I think that
 the sentence-level editor alone can already be quite useful.

An idea: perhaps instead of extending the sentence editor, you could
introduce new kinds of editors for different types of content. For 
example, you could have an image editor with + and - buttons to change 
the size of the image and those ≡ buttons to change the alignment of the 
image; or a template editor that would just present a list of fields to 
fill in. This should be both easier to do for you, and easier to use for 
the end user.


Re: [Wikitech-l] Wikimedia logging infrastructure

2010-08-10 Thread Domas Mituzas
Hi!
 multiple collectors with distinct log pipes setup. E.g. one machine for
 the sampled logging, and another, independent machine to do all the
 special purpose log streams. I do like more efficient software solutions
 rather than throwing more iron at the problem, though. :)

Frankly, we could have the same on a single machine - e.g. two listeners on the
same multicast stream - for SMP perf :-)

Domas


Re: [Wikitech-l] a bug with wikipedia table dump

2010-08-10 Thread Platonides
Forwarding to xmldatadumps-l

Alexander Sibiryakov wrote:
 Hello.
 
 I found a bug with the dump of the 'page' table in the latest update on the
 Wikipedia dump service (http://download.wikimedia.org).
 
 This file shouldn't be empty
 http://download.wikimedia.org/enwiki/20100730/enwiki-20100730-page.sql.gz but 
 it is.
 
 http://download.wikimedia.org/enwiki/20100730/ status is 'done' for it.
 
 Thanks for reading.



Re: [Wikitech-l] Sentence-level editing

2010-08-10 Thread Roan Kattouw
2010/8/10 Neil Kandalgaonkar ne...@wikimedia.org:
 This will introduce a big chasm in between making a simple sentence
 edit, and then making a big change like cleaning up a whole paragraph.
 But that's quite acceptable and we can build on this to make paragraph
 editors, image inserters, infobox editors, and so on.

Exactly. It's true that, at first, there'll be a rather large gap
between the 'simple' sentence-level editor and the 'full'
old-fashioned edit page, but the long-term vision (voiced by some, at
least, and I agree with it) is to close most (not all, that can't
really be done) of that gap with editors that can accomplish other
simple-ish tasks.

Roan Kattouw (Catrope)



Re: [Wikitech-l] Wikimedia logging infrastructure

2010-08-10 Thread Rob Lanphier
Hi Mark,

Thanks for the helpful reply.  Comments inline:

On Tue, Aug 10, 2010 at 2:54 AM, Mark Bergsma m...@wikimedia.org wrote:
 As already stated elsewhere, we didn't really saturate any NICs, just
 some socket buffers. Because of the large number of configured log
 pipes, the software (udp2log) could not empty the socket buffers fast
 enough.

Based on this and IRC conversations with Tim and Domas, here's my
understanding of things now (restating to make sure that I
understand):

The current system is a single-threaded application that takes packets
in synchronously, and spits them out to several places based on the
configuration file described here:
http://wikitech.wikimedia.org/view/Squid_logging

One problem that we're hitting with this
daemon^H^H^H^H^H^Hlistener is that when it gets too bogged down with a
complex configuration, it doesn't get around to emptying the socket
buffer.  Since it's single threaded, it's handling each of the
configured logging destinations before reading the next packet.  We're
not CPU-bound at this point.  The existing solution seems to start
flaking out at 40% CPU with a complicated configuration, and is
humming along at 20% with the current simplified config.  The problem
is that we're blocking while we fire up awk or whatever on the logging
side, and overflowing the socket buffer.

A solution that Tim and others are kicking around is reworking the
listener in one or more of the following ways:
1.  Move to some non-blocking networking library (e.g. Boost asio, libevent)
2.  Go multi-threaded
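
One way to picture the multi-threaded option, as a hedged Python sketch rather than a design for udp2log itself: the loop that reads the socket only hands packets to a bounded queue, and a separate worker thread does the slow per-destination work. The class and method names are invented for this example.

```python
import queue
import threading


class Listener:
    """Reader hands packets to a bounded queue; a worker does the slow part."""

    def __init__(self, handle, maxsize=1024):
        self.handle = handle            # slow per-packet work (log pipes)
        self.packets = queue.Queue(maxsize)
        self.dropped = 0
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def on_packet(self, packet):
        """Called by the socket-reading loop; never blocks."""
        try:
            self.packets.put_nowait(packet)
        except queue.Full:
            self.dropped += 1           # counted loss instead of silent loss

    def _drain(self):
        while True:
            packet = self.packets.get()
            if packet is None:          # sentinel: shut down
                break
            self.handle(packet)

    def close(self):
        self.packets.put(None)
        self.worker.join()
```

The reading loop would then be little more than `listener.on_packet(sock.recv(65535))` in a `while True`, so the kernel socket buffer is emptied promptly and any loss is counted in `dropped` rather than happening unseen in the kernel.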

Mark, as you point out, we could go with some multicast solution if we
need to split it up among boxes.  As Domas points out, we could even
go multi-process on the same box without really maxing it out.

The solutions we're talking about seem to solve the socket buffer
problem, but it sounds like we may also need to get some clearer
requirements on any new functionality that's needed.  It sounds like
we'll be able to get some more mileage out of the existing solution
with some of the reworking described above.  It's not entirely clear
yet if this buys us enough capacity+capability for the increased
requirements.  I'll check in with Tomasz and others working on
fundraiser stuff to find out more.

Rob



Re: [Wikitech-l] Sentence-level editing

2010-08-10 Thread Aryeh Gregor
On Mon, Aug 9, 2010 at 6:55 PM, Jan Paul Posma jp.po...@gmail.com wrote:
 The last few weeks I've worked on some prototypes to illustrate this idea.
 You can find the most advanced prototype here: 
 http://janpaulposma.nl/sle/prototype/prototype3.html
 The full project proposal and prototypes can be found here: 
 http://www.mediawiki.org/wiki/User:JanPaul123/Sentence-level_editing

 Right now I'm not looking for anything in specific, just whether or not you 
 think this is a good idea, technically feasible, etc. If you have suggestions 
 of any kind I'll be happy to hear them!

This looks like an excellent incremental improvement in editing
usability.  One major problem I've seen (anecdotally) with people
trying to edit typical Wikipedia articles is that they get intimidated
by the wall of wikitext, which often has large templates and things
obscuring the text they were trying to get at.  I think the problem of
wikitext complexity will be greatly mitigated if you can go into some
edit mode where you can just click a sentence to edit it.  New users
can then readily ignore the funny square brackets and such, and we
don't have to deal with trying to convert between WYSIWYG and
wikitext.



Re: [Wikitech-l] Wikimedia logging infrastructure

2010-08-10 Thread Platonides
Rob Lanphier wrote:
 Since it's single threaded, it's handling each of the
 configured logging destinations before reading the next packet.  We're
 not CPU-bound at this point.  The existing solution seems to start
 flaking out at 40% CPU with a complicated configuration, and is
 humming along at 20% with the current simplified config.  The problem
 is that we're blocking while we fire up awk or whatever on the logging
 side, and overflowing the socket buffer.

awk (or whatever filter is configured) is only launched once, then reused.
So the problem is that the filters don't read the pipe data fast enough, so
the pipe fills up to its 64 KB capacity and writes to it finally block.


 A solution that Tim and others are kicking around is reworking the
 listener in one or more of the following ways:
 1.  Move to some non-blocking networking library (e.g. Boost asio, libevent)
 2.  Go multi-threaded
 
 The solutions we're talking about seem to solve the socket buffer
 problem, but it sounds like we may also need to get some clearer
 requirements on any new functionality that's needed.  It sounds like
 we'll be able to get some more mileage out of the existing solution
 with some of the reworking described above.  It's not entirely clear
 yet if this buys us enough capacity+capability for the increased
 requirements.  I'll check in with Tomasz and others working on
 fundraiser stuff to find out more.
 
 Rob

Going multithreaded is really easy for a socket listener. However, not so
much in the LogProcessors. If they are shared across threads, you may
end up with all threads blocked in the fwrite, and if they aren't shared,
the files may easily get corrupted (depending on what exactly you are
doing with them).

Since the problem is that the socket buffer fills, it surprised me that
the server didn't increase SO_RCVBUF. That's not a solution but should
help (already set in /proc/sys/net/core/rmem_default ?).
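
For reference, asking for a larger receive buffer is a single socket option. The Python sketch below is only an illustration (the 4 MiB figure is arbitrary and the function name is made up); it requests a size and reads back what the kernel actually granted, since the request can be clamped by system limits.

```python
import socket


def open_with_rcvbuf(requested_bytes):
    """Create a UDP socket and ask the kernel for a larger receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    # The kernel may clamp the request (on Linux, to net.core.rmem_max),
    # so read the value back instead of trusting the request.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted


sock, granted = open_with_rcvbuf(4 * 1024 * 1024)
print("requested 4 MiB, kernel granted %d bytes" % granted)
sock.close()
```

A bigger buffer only absorbs longer bursts; if the listener is slower than the incoming stream on average, the buffer still fills eventually.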

The real issue is: what are you placing on your pipes that is so slow
to read from them?
Optimizing those scripts could be a simpler solution.
It wouldn't be hard to make the pipe writes non-blocking, properly blaming
the slow pipes that couldn't be written to.
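
A minimal sketch of non-blocking pipe writes with per-pipe blame, in Python for brevity (the real listener is not written in Python, and the class and pipe names here are hypothetical). When a pipe is full the line is dropped and counted against that pipe instead of stalling the listener.

```python
import os


class PipeWriter:
    """Writes log lines to a pipe without ever blocking the listener."""

    def __init__(self, name, fd):
        self.name = name
        self.fd = fd
        self.written = 0
        self.dropped = 0
        os.set_blocking(fd, False)

    def write(self, line):
        try:
            # Lines shorter than PIPE_BUF are written whole or not at all.
            os.write(self.fd, line)
            self.written += 1
        except BlockingIOError:
            self.dropped += 1   # pipe full: the reader is too slow


# Demo: a reader that never reads, so the pipe fills up.
read_fd, write_fd = os.pipe()
slow = PipeWriter("slow-filter", write_fd)
for _ in range(2000):
    slow.write(b"x" * 199 + b"\n")
print("%s: wrote %d, dropped %d" % (slow.name, slow.written, slow.dropped))
os.close(read_fd)
os.close(write_fd)
```

Reporting the `dropped` count per pipe points directly at the filter that cannot keep up, while the other destinations keep receiving every line.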

