[Wikitech-l] Live recent changes feed

2013-03-09 Thread Victor Vasiliev

Hi everybody,

For a long time it has been acknowledged that our current way of serving
the recent changes feed to users (IRC with formatting using funny control
codes) is one of the worst suited for this purpose. It made life
miserable both for users who had to parse it (since nobody is actually
reading it from IRC) and for developers who had to fit that thing into the
IRC line length limit. Time passed, and many ways were suggested to fix
this (including 
and
),
but
nobody actually went ahead and made it work.

After the recent discussion on this list I realized that this has been
under discussion for as long as four years, so I went WTF and decided to
Just Go Ahead and Fix It. As a result, I made a patch to MediaWiki which
allows it to output the recent changes feed in JSON:
   

Also, I wrote a daemon which captures this feed and serves it through
WebSockets and through a simple text-oriented protocol which serves the
same JSON without WebSocket wrapping (for poor souls writing in languages
without proper WebSocket support):


This daemon is written in Python using Twisted and Autobahn and it takes
~200 lines of code (initial version took ~80).
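
As a rough illustration of what consuming the text-oriented protocol might
look like on the client side, here is a minimal sketch. It assumes one JSON
object per newline-terminated line; that framing, the LineJSONReader name and
the "wiki"/"title" fields are assumptions made for illustration, not the
daemon's documented interface.

```python
import json


class LineJSONReader:
    """Accumulates raw bytes and returns one decoded JSON object per line."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add received bytes; return the list of complete changes decoded."""
        self._buffer += data
        # Everything before the last newline is complete; keep the remainder.
        *lines, self._buffer = self._buffer.split(b"\n")
        return [json.loads(line.decode("utf-8")) for line in lines if line.strip()]


reader = LineJSONReader()
# A change may arrive split across two reads from the socket.
first = reader.feed(b'{"wiki": "enwiki", "title": "Foo"}\n{"wiki": "de')
second = reader.feed(b'wiki", "title": "Bar"}\n')
```

The buffer holds any partial line until its newline arrives, so a change split
across two reads is still decoded exactly once.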

As a bonus, this involves no XML streaming in any form (unlike XMPP or
PubSubHubbub), so the unicorns are happy and unharmed, and minds of
programmers implementing this will remain unfried.

I hope that getting recent changes in a reasonable format is now a matter
of code review and deployment, and we will finally get something
reasonable to work with (with access from web browsers!).

-- Victor.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Live recent changes feed

2013-03-09 Thread Liangent
On Sun, Mar 10, 2013 at 1:19 PM, Victor Vasiliev  wrote:
> Hi everybody,
>
> For long time it was acknowledged that our current way of serving the
> recent changes feed to users (IRC with formatting using funny control
> codes) is one of the worst-suited for this purpose. It made the life
> miserable both for users who had to parse it (since nobody is actually
> reading it from IRC)

Note that some people *are* actually reading it from IRC. For example,
I read it when I'm starting a bot run, if the bot is made quickly and
doesn't output much info.

-Liangent


Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Kevin Israel
On 03/10/2013 12:19 AM, Victor Vasiliev wrote:
> After recent discussion on this list I realized that this has been
> in discussion for as long as four years I went WTF and decided to
> Just Go Ahead and Fix It. As a result, I made a patch to MediaWiki
> which allows it to output recent changes feed in JSON: 
> 
> 
> Also, I wrote a daemon which captures this feed and serves them
> through WebSockets and simple text-oriented protocol [...] : 
> 
> 
> This daemon is written in Python using Twisted and Autobahn and it
> takes ~200 lines of code (initial version took ~80).

One thing you should consider is whether to escape non-ASCII
characters (characters above U+007F) or to encode them using UTF-8.

Python's json.dumps() escapes these characters by default
(ensure_ascii = True). If you don't want them escaped (as hex-encoded
UTF-16 code units), it's best to decide now, before clients with
broken UTF-8 support come into use.
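
To make the default behaviour concrete, here is a small sketch using only the
standard json module. The sample strings are chosen for illustration; the
emoji is there to show the "UTF-16 code units" point for a character outside
the Basic Multilingual Plane.

```python
import json

summary = "繁简破坏"      # BMP characters: one \uXXXX escape each
emoji = "\U0001F600"     # outside the BMP: escaped as a UTF-16 surrogate pair

escaped = json.dumps(summary)                  # ensure_ascii=True is the default
raw = json.dumps(summary, ensure_ascii=False)  # characters are left as-is
pair = json.dumps(emoji)                       # two \uXXXX escapes for one character
```

With the default setting the output is pure ASCII; with ensure_ascii=False the
caller still has to encode the resulting string (for example as UTF-8) before
writing it to a socket.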

I recently made a [patch][1] (not yet merged) that would add an opt-in
"UTF8_OK" feature to FormatJson::encode(). The new option would
unescape everything above U+007F (except for U+2028 and U+2029, for
compatibility with JavaScript eval() based parsing).
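
The same idea can be sketched on the Python side. The dumps_utf8_ok helper
below is hypothetical (it is not the MediaWiki patch, which is PHP): it leaves
non-ASCII text unescaped but keeps U+2028 and U+2029 escaped, relying on the
fact that in serialized JSON those two characters can only occur inside
strings.

```python
import json


def dumps_utf8_ok(value):
    """Serialize with raw non-ASCII text, but keep U+2028/U+2029 escaped."""
    text = json.dumps(value, ensure_ascii=False)
    # These two characters are line terminators in older JavaScript,
    # so eval()-based parsers choke on them when they appear raw.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


encoded = dumps_utf8_ok({"comment": "繁简\u2028破坏"})
```

A conforming JSON parser decodes the result to the same value either way; the
escaping only matters for clients that evaluate the text as JavaScript.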

> I hope that now getting recent changes via reasonable format is a
> matter of code review and deployment, and we will finally get
> something reasonable to work with (with access from web
> browsers!).

I don't consider encoding "撤销由158.64.77.102于2013年1月22日 (二)
16:46的版本24659468中的繁简破坏" (90 bytes using UTF-8) as

"\u64a4\u9500\u7531158.64.77.102\u4e8e2013\u5e741\u670822\u65e5
(\u4e8c)
16:46\u7684\u7248\u672c24659468\u4e2d\u7684\u7e41\u7b80\u7834\u574f"
(141 bytes)

to be reasonable at all for a brand-new protocol running over an 8-bit
clean channel.
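
The two sizes can be checked with a short sketch. It assumes the edit summary
has a single space between "(二)" and "16:46" (the line break above is taken
to be mail wrapping) and counts the surrounding double quotes of the JSON
string.

```python
import json

summary = "撤销由158.64.77.102于2013年1月22日 (二) 16:46的版本24659468中的繁简破坏"

# Size on the wire when non-ASCII characters are sent as UTF-8.
utf8_size = len(json.dumps(summary, ensure_ascii=False).encode("utf-8"))

# Size on the wire when every non-ASCII character becomes a \uXXXX escape.
escaped_size = len(json.dumps(summary).encode("ascii"))
```

Each of the 17 CJK characters costs 3 bytes in UTF-8 but 6 bytes as an escape,
which accounts for the whole difference between the two figures.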

[1]: https://gerrit.wikimedia.org/r/#/c/50140/

-- 
Wikipedia user PleaseStand
http://en.wikipedia.org/wiki/User:PleaseStand


Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Brian Wolff
On 2013-03-10 1:20 AM, "Victor Vasiliev"  wrote:
> Hi everybody,
>
> For long time it was acknowledged that our current way of serving the
> recent changes feed to users (IRC with formatting using funny control
> codes) is one of the worst-suited for this purpose. It made the life
> miserable both for users who had to parse it (since nobody is actually
> reading it from IRC) and for developers who had to fit that thing into
> IRC line length limit. Time passed, and many ways were suggested to fix
> this (including 
> and
> <https://www.mediawiki.org/wiki/Requests_for_comment/Structured_data_push_notification_support_for_recent_changes>),
> but nobody actually went ahead and made it work.

Good work. It's wonderful to see people just go and fix things that need
fixing instead of the usual bikesheddingness that often takes place.

-bawolff

P.S. I too used to read the IRC RC feed (back when I was an active editor
at enwikinews). It can be useful on smaller projects.

Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Victor Vasiliev
On 03/10/2013 06:30 AM, Kevin Israel wrote:
> On 03/10/2013 12:19 AM, Victor Vasiliev wrote:
> One thing you should consider is whether to escape non-ASCII
> characters (characters above U+007F) or to encode them using UTF-8.

"Whatever the JSON encoder we use does".

> Python's json.dumps() escapes these characters by default
> (ensure_ascii = True). If you don't want them escaped (as hex-encoded
> UTF-16 code units), it's best to decide now, before clients with
> broken UTF-8 support come into use.

As long as it does not add newlines, this is perfectly fine protocol-wise.
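
A quick sketch of why either escaping choice is safe for a line-based
protocol: Python's json.dumps, when no indent argument is passed, escapes
newlines inside strings and emits the whole document on one line. The sample
change below is made up for illustration.

```python
import json

change = {"comment": "line one\nline two", "title": "Примечание"}

# Serialize once with \uXXXX escapes and once with raw non-ASCII text.
lines = [json.dumps(change, ensure_ascii=flag) for flag in (True, False)]

# True if any serialized form contains a raw newline character.
has_raw_newline = any("\n" in line for line in lines)
```

This only holds for the compact form; passing indent to json.dumps would
introduce real newlines and break the framing.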

> I recently made a [patch][1] (not yet merged) that would add an opt-in
> "UTF8_OK" feature to FormatJson::encode(). The new option would
> unescape everything above U+007F (except for U+2028 and U+2029, for
> compatibility with JavaScript eval() based parsing).

The part between MediaWiki and the daemon does not matter that much
(except for hitting the size limit on packets, and even then we are on
WMF's internal network, so we should not expect any packet loss and
problems with fragmentation). The daemon extracts the wiki name from the
JSON it received, so it reencodes the change anyways in the middle.
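
The decode-and-reencode step described above might look roughly like this.
The route_change function and the "wiki" field name are assumptions for
illustration; the sketch only shows that the outgoing encoding is independent
of whatever escaping MediaWiki used on the UDP side.

```python
import json


def route_change(packet):
    """Decode one UDP payload; return (wiki, line to send to subscribers)."""
    change = json.loads(packet.decode("utf-8"))
    wiki = change["wiki"]  # hypothetical field name
    # Re-serializing here is what decouples the two sides of the daemon.
    line = json.dumps(change, ensure_ascii=False).encode("utf-8") + b"\n"
    return wiki, line


wiki, line = route_change(b'{"wiki": "zhwiki", "comment": "\\u7e41\\u7b80"}')
```

Here the input arrives with escapes and leaves as UTF-8; dropping
ensure_ascii=False would send escapes out again instead.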

>> I hope that now getting recent changes via reasonable format is a
>> matter of code review and deployment, and we will finally get
>> something reasonable to work with (with access from web
>> browsers!).
> 
> I don't consider encoding "撤销由158.64.77.102于2013年1月22日 (二)
> 16:46的版本24659468中的繁简破坏" (90 bytes using UTF-8) as
> 
> "\u64a4\u9500\u7531158.64.77.102\u4e8e2013\u5e741\u670822\u65e5
> (\u4e8c)
> 16:46\u7684\u7248\u672c24659468\u4e2d\u7684\u7e41\u7b80\u7834\u574f"
> (141 bytes)
> 
> to be reasonable at all for a brand-new protocol running over an 8-bit
> clean channel.
> 

That's your bikeshed, not mine.

-- Victor.



Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Kevin Israel
On 03/10/2013 06:27 PM, Victor Vasiliev wrote:
> On 03/10/2013 06:30 AM, Kevin Israel wrote:
>> On 03/10/2013 12:19 AM, Victor Vasiliev wrote:
>> One thing you should consider is whether to escape non-ASCII
>> characters (characters above U+007F) or to encode them using UTF-8.
> 
> "Whatever the JSON encoder we use does".
> 
>> Python's json.dumps() escapes these characters by default
>> (ensure_ascii = True). If you don't want them escaped (as hex-encoded
>> UTF-16 code units), it's best to decide now, before clients with
>> broken UTF-8 support come into use.
> 
> As long as it does not add newlines, this is perfectly fine protocol-wise.

If "Whatever the JSON encoder we use does" means that one day, the
daemon starts sending UTF-8 encoded characters, it is quite possible
that existing clients will break because of previously unnoticed
encoding bugs. So I would like to see some formal documentation of the
protocol.

>> I recently made a [patch][1] (not yet merged) that would add an opt-in
>> "UTF8_OK" feature to FormatJson::encode(). The new option would
>> unescape everything above U+007F (except for U+2028 and U+2029, for
>> compatibility with JavaScript eval() based parsing).
> 
> The part between MediaWiki and the daemon does not matter that much
> (except for hitting the size limit on packets, and even then we are on
> WMF's internal network, so we should not expect any packet loss and
> problems with fragmentation). The daemon extracts the wiki name from the
> JSON it received, so it reencodes the change anyways in the middle.

It's good to know that it's quite easy to change the format of the
internal UDP packets without breaking existing clients -- that it's
possible to start using UTF-8 on the UDP side if necessary.

-- 
Wikipedia user PleaseStand
http://en.wikipedia.org/wiki/User:PleaseStand


Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Bartosz Dziewoński

On Mon, 11 Mar 2013 00:11:59 +0100, Kevin Israel  wrote:


> If "Whatever the JSON encoder we use does" means that one day, the
> daemon starts sending UTF-8 encoded characters, it is quite possible
> that existing clients will break because of previously unnoticed
> encoding bugs. So I would like to see some formal documentation of the
> protocol.


It's 2013. If something still doesn't support receiving UTF-8 data and sending 
it back without corrupting the text, it should be chucked out of the window 
like now.

And I don't mean things like properly determining the length of a string etc., 
as these are not UTF-8 specific, and *are* hard to get right; I mean not 
breaking binary data.


--
Matma Rex


Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Petr Bena
I appreciate that someone is doing something, but this should have been
discussed more. I would like to highlight that our goal should NOT be to
do this in the way that is simplest for MediaWiki developers to
implement and simplest for devops to set up and maintain. Our goal
should be to make this feed as simple as possible to integrate into the
target application (bot, tool) for the developers of that tool. The ideal
feed should be simple enough to be parseable by something as trivial
as a shell script with netcat or telnet on a remote server (with absolutely
no need for 3rd party libraries). I am fine with using JSON as
one option, but if it's the only option this new feed is supposed to
provide, it will be very hard to implement in some tools. Basically,
anything that requires extra libraries will make it harder
than it actually is, even though it could be more flexible and faster.


Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Tyler Romeo
On Sun, Mar 10, 2013 at 7:34 PM, Petr Bena  wrote:

> I appreciate someone does something, but this should have been more
> discussed. I would like to highlight that our goal should NOT be to do
> this a way that is most simple for developers of mediawiki to
> implement and most simple for devops to maintain and setup. Our goal
> should be to make this feed most simple to implement into target
> application (bot, tool) for the developers of that tool. The ideal
> feed should be pretty simple to be parseable by something as trivial
> as a shell script with netcat or telnet on remote server (absolutely
> no need to use some 3rd party libraries). I am fine with using JSON as
> one option, but if it's the only option this new feed is supposed to
> provide, it will be very hard to implement in some tools. Basically
> anything what will require some extra libraries will make it harder
> than it actually is - despite it could be more flexible and faster.
>

I agree with the discussion, but I think this is a good starting point.

You can expect some patches from me soon. ;)

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Brian Wolff
>
> If "Whatever the JSON encoder we use does" means that one day, the
> daemon starts sending UTF-8 encoded characters, it is quite possible
> that existing clients will break because of previously unnoticed
> encoding bugs. So I would like to see some formal documentation of the
> protocol.

The JSON standard is pretty clear that any character can be escaped using \u
 code point, or you can just have things be UTF-8. If clients break
because they can't handle that, that is the client's fault. It's not a hard
requirement.
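
Seen from the client side, a conforming parser makes the two forms
indistinguishable. A minimal sketch with Python's json module, using a sample
string chosen for illustration:

```python
import json

escaped_wire = b'"\\u7e41\\u7b80\\u7834\\u574f"'  # ASCII-only document
utf8_wire = '"繁简破坏"'.encode("utf-8")           # same text as raw UTF-8

# Decode the bytes first, then parse; both paths yield the same str.
from_escaped = json.loads(escaped_wire.decode("utf-8"))
from_utf8 = json.loads(utf8_wire.decode("utf-8"))
```

A client only breaks on the switch if it skips the decoding step or treats
the payload as ASCII.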

I see no reason why we couldn't change later if need be. Furthermore, I see
no reason why we would care which way we went on that issue. The raw JSON
isn't meant for human eyes.

-bawolff

Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Victor Vasiliev
On 03/10/2013 07:34 PM, Petr Bena wrote:
> I appreciate someone does something, but this should have been more
> discussed.

Well, we can discuss this now.  I don't like the discussions which end
up with "here's a design of our superpony which should have those 9000+
features", and then we have no progress.  I think it's sort of nice we
have a starting point now.

> I would like to highlight that our goal should NOT be to do
> this a way that is most simple for developers of mediawiki to
> implement and most simple for devops to maintain and setup. Our goal
> should be to make this feed most simple to implement into target
> application (bot, tool) for the developers of that tool.

There are different types of clients, and they all have their own
"easiest format".  For web browsers, WebSockets is pretty much the only
serious option, and hence this is something which we really want to support.
The WebSocket protocol is also very complex, so for non-browser apps I added
the second protocol (I am still not sure how good this idea is).

> The ideal
> feed should be pretty simple to be parseable by something as trivial
> as a shell script with netcat or telnet on remote server (absolutely
> no need to use some 3rd party libraries). I am fine with using JSON as
> one option, but if it's the only option this new feed is supposed to
> provide, it will be very hard to implement in some tools.

We want this to be a machine-readable feed.  The two most widespread
universal formats for transferring machine-readable structured data are
XML and JSON. JSON parsing is almost always easier than XML (if it's
not, then it's probably the fault of the JSON library in use).

The text-based protocol works with netcat (that's how I tested it).
However, it turns out that awk and sed are not well-suited for parsing
structured data (have you ever tried to parse XML from a shell script?).

> Basically
> anything what will require some extra libraries will make it harder
> than it actually is - despite it could be more flexible and faster.

I don't think I buy into the statement that the lack of a built-in JSON
parser is an issue with JSON, and not with that language's library structure.

-- Victor.



Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Tyler Romeo
My main concern with the program in its current state is the lack of
sufficient design. I mean, both the Configuration and MessageRouter objects
are glorified dictionaries (or defaultdicts), and global variables are used
for the router and config.

Also, the config protocol is almost definitely a bad idea. Since it's
unauthenticated, the only way to guarantee security is to use a Unix socket
(or some other only-locally-accessible method), at which point you already
have the means of stopping the server and reading the config. Finally,
stats should be fine if publicly available. In other words, the only useful
thing the control protocol could be used for is reloading the configuration.

Other than that, minor quirks, such as handleJSONCommand, a protocol
function, being put in the Subscriber class.

And, of course, there's the issue of performance. Python doesn't handle
threads, and since Twisted isn't multiprocess AFAIK, this might not be able
to handle that many connections.

Finally, other than WebSocket and the socket interface, the one
other subscription method we should have is some sort of HTTP hook call,
i.e., it sends an HTTP request to the subscriber. This allows event-driven
clients without having a socket constantly open.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Victor Vasiliev
On 03/10/2013 08:59 PM, Tyler Romeo wrote:
> My main concern with the program in its current state is the lack of
> sufficient design. I mean, both the Configuration and MessageRouter objects
> are glorified dictionaries (or defaultdicts), and global variables are used
> for the router and config.

That's quite intentional. The intent was to make it as simple as possible.

Anyways, this thing is currently 200 lines of code, so refactoring it is
really simple. I guess at some point every simple 200-line Python script
has to turn into a beautiful, elegant 600-line Python script.

(or into a 50-line Haskell script where nobody really understands what's
going on, but that's the other story)

> Also, the config protocol is almost definitely a bad idea. Since it's
> unauthenticated, the only way to guarantee security is to use a Unix socket
> (or some other only-locally-accessible method), at which point you already
> have the means of stopping the server and reading the config. Finally,
> stats should be fine if publicly available. In other words, the only useful
> thing the control protocol could be used for is reloading the configuration.

Eh... it is a Unix socket. The only reason I added it was to
support configuration reloading, because doing that through SIGUSR1
would bring us to the signal minefield.

> Other than that, minor quirks, such as handleJSONCommand, a protocol
> function, being put in the Subscriber class.

Well, there are two classes doing almost the same thing, but having
incompatible interfaces due to being derived from different protocol
classes. I wanted to fix it first, but that would involve multiple
inheritance, so I decided just to offload the feature to Subscriber. Of
course, I could have created another class called JSONSubscriber for that.

As usual, patches welcome.

> And, of course, there's the issue of performance. Python doesn't handle
> threads, and since Twisted isn't multiprocess AFAIK, this might not be able
> to handle that many connections.

Well, the issue here is that you essentially have a simple program which
takes a message from one port and then resends it to many others. Even if
threads were of help here, Python works better with I/O-bound
multithreading than with other sorts.
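
The "one message in, many out" core can be sketched without any networking.
The MessageRouter name echoes the class mentioned earlier in the thread, but
the body below is a guess for illustration, not the daemon's code; a real
version would hand lines to Twisted transports rather than plain callbacks.

```python
from collections import defaultdict


class MessageRouter:
    """Maps a wiki name to the set of callbacks subscribed to it."""

    def __init__(self):
        self._subscribers = defaultdict(set)

    def subscribe(self, wiki, callback):
        self._subscribers[wiki].add(callback)

    def unsubscribe(self, wiki, callback):
        self._subscribers[wiki].discard(callback)

    def publish(self, wiki, line):
        """Hand one already-encoded line to every subscriber of `wiki`."""
        # Copy the set so a callback may unsubscribe while we iterate.
        for callback in list(self._subscribers[wiki]):
            callback(line)


received = []


def on_line(line):
    received.append(line)


router = MessageRouter()
router.subscribe("enwiki", on_line)
router.publish("enwiki", b'{"title": "Foo"}\n')
router.publish("dewiki", b'{"title": "Bar"}\n')  # nobody subscribed
```

Per message the work is one dictionary lookup plus one write per subscriber,
which is why the workload is dominated by I/O rather than computation.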

> Finally, other than WebSocket and the socket interface, the one
> other subscription method we should have it some sort of HTTP hook call,
> i.e., it sends an HTTP request to the subscriber. This allows event-driven
> clients without having a socket constantly open.

I am not sure what exactly you mean by that.


Thank you for your feedback.

-- Victor.


Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Matthew Flaschen
On 03/10/2013 07:53 PM, Brian Wolff wrote:
> Json standard is pretty clear that any character can be escaped using \u
>  code point or you can just have things be utf8. If clients break
> because they can't handle that, that is the client's fault. Its not a hard
> requirement.

Just a note, the JSON RFC (https://www.ietf.org/rfc/rfc4627.txt, section
3) explicitly allows any of the main Unicode encodings (UTF-32, UTF-16,
UTF-8) with both endiannesses (except of course UTF-8).
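
Section 3 of that RFC also describes how a receiver can tell the encodings
apart: since the first two characters of a JSON text are always ASCII, the
pattern of zero octets in the first four bytes identifies the encoding. A
minimal sketch of that rule follows; the function name is made up here, and
BOM handling is left out.

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of a JSON text from its first four octets."""
    if len(data) < 4:
        return "utf-8"  # too short to inspect; assume the default
    zeros = tuple(byte == 0 for byte in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(zeros, "utf-8")           # xx xx xx xx


text = '{"wiki": "enwiki"}'
encodings = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
results = {enc: detect_json_encoding(text.encode(enc)) for enc in encodings}
```

For a feed that documents UTF-8 as its only encoding, clients can skip this
check entirely.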

UTF-8 *is* the default encoding, and it's the best choice, but not the
only one.

Matt Flaschen


Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Tyler Romeo
On Sun, Mar 10, 2013 at 10:38 PM, Victor Vasiliev  wrote:

> > Finally, other than WebSocket and the socket interface, the one
> > other subscription method we should have it some sort of HTTP hook call,
> > i.e., it sends an HTTP request to the subscriber. This allows
> > event-driven clients without having a socket constantly open.
>
> I am not sure what exactly do you mean by that.


When a message is sent, it is delivered by the daemon submitting an HTTP
POST request to a registered client URI. This is a commonly used scheme for
push notification delivery, such as when using Amazon's notification
service.
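
For concreteness, one delivery in such a scheme might be built like this,
using only the standard library. The build_delivery helper, the callback URL
and the payload fields are hypothetical; the sketch constructs the request
but deliberately does not send it.

```python
import json
import urllib.request


def build_delivery(callback_url, change):
    """Build (but do not send) the POST request carrying one change."""
    body = json.dumps(change).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_delivery(
    "https://subscriber.example/rc-hook",
    {"wiki": "enwiki", "title": "Foo"},
)
# urllib.request.urlopen(request, timeout=5) would perform the delivery.
```

The daemon would need one such request per change per subscriber, plus some
policy for what to do when the subscriber does not answer.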

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Brian Wolff
On 2013-03-11 12:26 AM, "Tyler Romeo"  wrote:
>
> On Sun, Mar 10, 2013 at 10:38 PM, Victor Vasiliev wrote:
>
> > > Finally, other than WebSocket and the socket interface, the one
> > > other subscription method we should have it some sort of HTTP hook
> > > call, i.e., it sends an HTTP request to the subscriber. This allows
> > > event-driven clients without having a socket constantly open.
> >
> > I am not sure what exactly do you mean by that.
>
>
> When a message is sent, it is delivered by the daemon submitting an HTTP
> POST request to a registered client URI. This is a commonly used scheme
> for push notification delivery, such as when using Amazon's notification
> service.

Wait, so it just sends HTTP POST requests to some address until explicitly
told to stop? That sounds like an incredibly bad idea (if I understand it
correctly):

* If you forget to unsubscribe, we send you POST requests until the end of
  eternity.
* DoS vector: register the URL of someone you don't like. Register 100
  variants from the same domain. Push enwikipedia's RC feed there.

In any case, I don't see the need to have every form of push API imaginable
implemented. Especially not initially, but even in general.

-bawolff

Re: [Wikitech-l] Live recent changes feed

2013-03-10 Thread Tyler Romeo
On Sun, Mar 10, 2013 at 11:53 PM, Brian Wolff  wrote:

> *if you forget to unsubscribe we send you post requests until the end of
> eternity.
>

Have it cut off if it receives an invalid HTTP response.

> *dos vector - register someone you don't like's url. Register 100
> variants from the same domain. Push enwikipedia's rc feed there.


Or you roll out some EC2 instances and open 100 sockets. (And before
you say rate-limit based on IP address, the same can be done for the HTTP
idea.)

> In any case, I don't see the need to have every form of push api imaginable
> implemented. Especially not initially but even in general.


Agreed, but this is a pretty basic one. In fact, if you use HTTP
keep-alive, it's almost identical to the TCP push method anyway, except that
the subscriber can use a web server rather than rolling their own socket
client.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] Live recent changes feed

2013-03-11 Thread Brian Wolff
On 2013-03-11 1:11 AM, "Tyler Romeo"  wrote:
>
> > *dos vector - register someone you don't like's url. Register 100
> > variants from the same domain. Push enwikipedia's rc feed there.
>
>
> Or you roll out some EC2 instances and open 100 sockets. (And before
> you say rate-limit based on IP address, the same can be done for the HTTP
> idea.)
>

I mean you could use such a service to DoS somebody else. If you can open
sockets, then it's your own server receiving the traffic.

Sure, you could add some mechanism to prove you own the domain where you
want the RC updates to be sent, but things can get rather complex.

--bawolff

Re: [Wikitech-l] Live recent changes feed

2013-03-11 Thread Jeroen De Dauw
Hey,

> Sure you could add some mechanism to prove you own the domain where you
> want the rc updates to be sent, but things can get rather complex.
>

Google uses, or at least used to use, the following to do exactly that:

On request, provide an auth file to the user that includes some unique
identifier. Require this file to be made available via the domain in
question. Have the user point to the location where it is made available
and check whether it is actually there. If so, the domain is authenticated.

That seems rather simple to create.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--

Re: [Wikitech-l] Live recent changes feed

2013-03-11 Thread Brian Wolff
On 2013-03-11 3:46 PM, "Jeroen De Dauw"  wrote:
>
> Hey,
>
> > Sure you could add some mechanism to prove you own the domain where you
> > want the rc updates to be sent, but things can get rather complex.
> >
>
> Google uses, or at least used to use, the following to do exactly that:
>
> On request provide an auth file to the user which includes some unique
> identifier. Require this file to be made available via the domain in
> question. Have the user point to the location where it is made available
> and check if it is actually there. If so, domain authenticated.
>
> That seems rather simple to create.

I think that proves my point: what you describe is not what Google does.
Google tells the user the path for the file (I believe the usual place is
in the root of the domain). The user does not pick the path. Otherwise I
could prove I own Wikipedia (assuming MIME types weren't checked) by using
action=raw to serve the file from a wiki page.

Things that are this finicky to make secure should be avoided, IMO.
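
To illustrate the difference, here is a rough sketch of a check in which the
service, not the user, picks the path. The file name pattern
`/rcfeed-verify-<token>.txt` and both function names are made up for
illustration; this is not a description of Google's actual mechanism:

```python
import hmac
import secrets
from urllib.parse import urlsplit, urlunsplit


def new_token() -> str:
    # Unguessable identifier handed to the user on request.
    return secrets.token_hex(16)


def verification_url(callback_url: str, token: str) -> str:
    """Build the one URL the service is willing to fetch for this domain.

    Only the scheme and host of the callback are kept. Its path, query and
    fragment are discarded, so a URL such as ...?action=raw on a
    user-editable page cannot be offered as proof.
    """
    parts = urlsplit(callback_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("callback must be an absolute http(s) URL")
    path = "/rcfeed-verify-%s.txt" % token  # hypothetical fixed location
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def file_proves_ownership(fetched_body: str, token: str) -> bool:
    # Constant-time comparison of the fetched file contents with the token.
    return hmac.compare_digest(fetched_body.strip().encode("utf-8"),
                               token.encode("utf-8"))
```

For a callback of `https://en.wikipedia.org/w/index.php?title=Foo&action=raw`
the only URL checked would be in the root of en.wikipedia.org, which an
ordinary editor cannot write to.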

-bawolff

Re: [Wikitech-l] Live recent changes feed

2013-03-11 Thread Jeroen De Dauw
Hey,

> what you describe is not what google does.
> Google tells the user the path for the file (i believe the usual place is
> in the root of the domain). The user does not pick the path. Otherwise I
> could prove I own wikipedia (assuming mime types weren't checked) by using
> action=raw.
>

Good point, I remembered that wrong then.

> I think that proves my point - what you describe is not what google does.
>

https://en.wikipedia.org/wiki/Fallacy_fallacy

The approach still seems simple. In fact, it seems simpler. So why
would we not want to use it?

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--

Re: [Wikitech-l] Live recent changes feed

2013-03-11 Thread Tyler Romeo
Honestly, the solution could be as simple as requiring that the HTTP
response have a certain header or something.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] Live recent changes feed

2013-03-11 Thread Brian Wolff
On 2013-03-11 4:32 PM, "Tyler Romeo"  wrote:
>
> Honestly, the solution could be as simple as requiring that the HTTP
> response have a certain header or something.

OK. I withdraw my security-related objections :). Some sort of header-based
checking to make sure the POSTs are wanted sounds sane (provided that a GET
request is used at the very start to verify this; POST requests to arbitrary
unverified URLs can be dangerous).
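
One way this handshake could look, as a sketch only. The header name
`X-RCFeed-Confirm`, the idea of echoing back a token, and the injected
`http_get` callable are all assumptions made for the example; the thread only
agrees on "a certain header" checked via an initial GET:

```python
from typing import Callable, Mapping, Set, Tuple

# Hypothetical header name, not defined anywhere in this thread.
CONFIRM_HEADER = "X-RCFeed-Confirm"

# http_get(url, request_headers) -> (status, response_headers)
HttpGet = Callable[[str, Mapping[str, str]], Tuple[int, Mapping[str, str]]]


def subscriber_confirms(url: str, token: str, http_get: HttpGet) -> bool:
    """Send one GET and require the response to echo the token in a header."""
    try:
        status, headers = http_get(url, {CONFIRM_HEADER: token})
    except OSError:
        return False
    if status != 200:
        return False
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get(CONFIRM_HEADER.lower()) == token


def subscribe(url: str, token: str, http_get: HttpGet,
              subscribers: Set[str]) -> bool:
    # No POST is ever sent to a URL that has not passed the GET check.
    if subscriber_confirms(url, token, http_get):
        subscribers.add(url)
        return True
    return False
```

A server that was registered by a third party would not know to echo the
token, so it would receive a single GET and nothing more.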

-bawolff

Re: [Wikitech-l] Live recent changes feed

2013-03-11 Thread Tyler Romeo
On Mon, Mar 11, 2013 at 3:53 PM, Brian Wolff  wrote:

> (provided that very
> initially a get request is used to verify this. Post requests to arbitrary
> unverified urls can be dangerous.).
>

Totally agreed. If anything the GET request could be used to obtain initial
information about the client, such as which channels to subscribe to.
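
As a sketch of that idea, the body of the GET response could carry a small
JSON document listing the wanted channels. The `{"channels": [...]}` shape and
the function name are assumptions for illustration; no such format is
specified in this thread:

```python
import json
from typing import List, Set


def channels_from_handshake(body: str, available: Set[str]) -> List[str]:
    """Extract the requested channels from a handshake response body.

    Anything malformed, or any channel the daemon does not offer, is
    ignored rather than treated as an error.
    """
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []
    requested = data.get("channels", []) if isinstance(data, dict) else []
    if not isinstance(requested, list):
        return []
    return [c for c in requested if isinstance(c, str) and c in available]
```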

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com