[Wikidata-bugs] [Maniphest] T332953: Migrate PipelineLib repos to GitLab

2023-04-18 Thread Eevans
Eevans added a comment.


  I can be the point of contact for mediawiki/services/kask 
<https://gerrit.wikimedia.org/g/mediawiki/services/kask>, and am ready when you 
are!

TASK DETAIL
  https://phabricator.wikimedia.org/T332953

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Eevans
Cc: Eevans, Seddon, MSantos, kevinbazira, odimitrijevic, BTullis, Ottomata, 
calbon, fgiunchedi, WMDE-leszek, leila, fkaelin, ItamarWMDE, elukey, 
KartikMistry, santhosh, Martaannaj, sbassett, bking, bd808, Ladsgroup, Krinkle, 
Legoktm, tstarling, Physikerwelt, dcausse, Jdrewniak, taavi, hnowlan, 
Michaelcochez, cjming, Jdforrester-WMF, dduvall, Aklapper, thcipriani, 
Bellucii32, Themindcoder, Stevemunene, Adamm71, Jersione, Itsmeduncan, 
Hellket777, Cleo_Lemoisson, Brielikethecheese, LisafBia6531, JArguello-WMF, 
Astuthiodit_1, Atieno, 786, EChetty, TheReadOnly, Biggs657, karapayneWMDE, 
toberto, joanna_borun, Simonmaignan, Invadibot, DAbad, MPhamWMF, Devnull, 
maantietaja, Juan90264, Muchiri124, Confetti68, Anerka, Alter-paule, Beast1978, 
CBogen, Un1tY, Nintendofan885, Akuckartz, Otr500, Hook696, WDoranWMF, Ddurigon, 
MJL, Kent7301, brennen, Mateo1977, EvanProdromou, joker88john, Legado_Shulgin, 
ReaperDawn, CucyNoiD, Nandana, NebulousIris, Namenlos314, aezell, 
skpuneethumar, Gaboe420, Zylc, Giuliamocci, Davinaclare77, Abdeaitali, 
Cpaulf30, 1978Gage2001, Techguru.pc, Lahi, Operator873, Gq86, Af420, Xinbenlv, 
Vacio, Sharvaniharan, Bsandipan, scblr, Xover, GoranSMilovanovic, SPoore, 
TBolliger, Chicocvenancio, Hfbn0, QZanden, EBjune, Tbscho, Taquo, LawExplorer, 
catalandres, Eginhard, Lewizho99, Zppix, JJMC89, Maathavan, TerraCodes, DDJJ, 
_jensen, rosalieper, Agabi10, PEarleyWMF, Neuronton, RuyP, Liudvikas, 
Scott_WUaS, Pchelolo, Karthik_sripal, Izno, Wong128hk, Luke081515, Bsadowski1, 
Niharika, Wikidata-bugs, Jitrixis, aude, Bawolff, Dbrant, Dinoguy1000, 
Gryllida, Lydia_Pintscher, faidon, Grunny, ssastry, scfc, Alchimista, Arlolra, 
csteipp, Mbch331, Jay8g, Krenair
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T204024: Store WikibaseQualityConstraint check data in persistent storage instead of in the cache

2022-12-01 Thread Eevans
Eevans removed a project: Cassandra.

TASK DETAIL
  https://phabricator.wikimedia.org/T204024

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Eevans
Cc: ItamarWMDE, WMDE-leszek, Lydia_Pintscher, Pintoch, Tpt, Smalyshev, Eevans, 
daniel, mobrovac, Jonas, Lucas_Werkmeister_WMDE, Aklapper, Addshore, 
Astuthiodit_1, karapayneWMDE, joanna_borun, Invadibot, Devnull, maantietaja, 
Muchiri124, Akuckartz, Eihel, holger.knust, RhinosF1, Legado_Shulgin, 
ReaperDawn, Nandana, Davinaclare77, Techguru.pc, Lahi, Gq86, GoranSMilovanovic, 
Hfbn0, QZanden, Esc3300, merbst, LawExplorer, Zppix, _jensen, rosalieper, 
Agabi10, Scott_WUaS, Pchelolo, Wong128hk, abian, Hardikj, Wikidata-bugs, aude, 
faidon, Mbch331, Jay8g, fgiunchedi, LSobanski


[Wikidata-bugs] [Maniphest] [Updated] T178445: flapping monitoring for recommendation_api on scb

2020-01-10 Thread Eevans
Eevans edited projects, added Core Platform Team Workboards (Clinic Duty Team); 
removed Core Platform Team.

TASK DETAIL
  https://phabricator.wikimedia.org/T178445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Eevans
Cc: jcrespo, brennen, Joe, Volans, mobrovac, Smalyshev, Gehel, Stashbot, 
Aklapper, Dzahn, darthmon_wmde, ET4Eva, Legado_Shulgin, Nandana, Davinaclare77, 
Qtn1293, Techguru.pc, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, 
Chicocvenancio, Th3d3v1ls, Hfbn0, QZanden, EBjune, LawExplorer, Avner, Zppix, 
_jensen, rosalieper, Scott_WUaS, Pchelolo, FloNight, Wong128hk, Eevans, 
Hardikj, Wikidata-bugs, aude, Capt_Swing, faidon, Mbch331, Rxy, Jay8g, 
fgiunchedi, WDoranWMF, holger.knust, EvanProdromou, Agabi10
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T204024: Store WikibaseQualityConstraint check data in an SQL table instead of in the cache

2018-09-12 Thread Eevans
Eevans added a comment.

In T204024#4576751, @daniel wrote:
This use case seems similar to caching parsoid HTML, which is done in RESTbase and backed by Cassandra. It's similar because it's re-generated upon edit, and accessed from clients upon view, via an API. It's also similar in that losing this data is not absolutely critical, as it can be regenerated, but having to re-generate all of it may cause a problematic spike in load on application servers (and databases and the query service).

However, in contrast to the parsoid use case, information does not need to be stored for old revisions.

As to the model: the wikidata folks will have the details, but as far as I'm aware, it's a JSON blob for each Wikidata entity (items, properties, etc.). Granularity could be increased to per-statement blobs.

Purging is, as far as I know, currently only done per edit of the subject. However, use cases for bulk purges exist (in particular, when constraint definitions change), but they are just ignored at the moment. I could be wrong about that, though.


If I understand the above correctly, we're saying that this is strictly key/value, where the key is an entity ID and the value an opaque JSON blob. When the subject is edited, the value is overwritten with the most recent constraint check. And when the format of constraint definitions changes, we need to be able to bulk-purge previous entries in the obsolete format. Is this correct?
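In pseudo-Python, the semantics I'm imagining look something like this (a toy sketch; all names and types are made up):

```python
# Toy sketch of the semantics described above; names and types are hypothetical.
store = {}  # entity ID -> (format_version, opaque JSON blob)

def on_edit(entity_id, check_json, format_version):
    # Each edit of the subject simply overwrites the previous value.
    store[entity_id] = (format_version, check_json)

def bulk_purge(obsolete_version):
    # When constraint definitions change format, drop every entry
    # written in the obsolete format.
    stale = [k for k, (v, _) in store.items() if v == obsolete_version]
    for entity_id in stale:
        del store[entity_id]
```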

Some additional questions...

An opaque k/v store won't allow anything but discrete lookup by entity ID, so how are violations queried? In other words, this seems to be only a small part of the larger model; what does that look like, and why are we creating this separation (i.e. what problem does this solve)?

Numbers on the total count of entities and the size of the values will be important, of course, but perhaps most important will be some idea of access patterns. How frequently will entities be (over)written? How often read? I realize the answer is probably a distribution, and that this may involve some educated guesswork.

What happens if constraint definitions change? Are we able to wholesale drop the older ones? Is the constraint check inlined on a miss, and is the latency (and additional load) under such circumstances acceptable? Or will some sort of transition be needed, where we fall back to the older check when it's available and replace entries gradually?
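In other words, something like the following read path (a sketch of the trade-off; `recompute` and `schedule` are hypothetical stand-ins for the inline check and a deferred refresh):

```python
CURRENT_VERSION = 2  # hypothetical format version of constraint checks

def get_check(store, entity_id, recompute, schedule):
    """Sketch of the miss/fallback trade-off described above."""
    entry = store.get(entity_id)
    if entry is not None and entry[0] == CURRENT_VERSION:
        return entry[1]  # hit, current format
    if entry is not None:
        # Transitional fallback: serve the old-format check now and
        # refresh asynchronously, spreading out the regeneration load.
        schedule(entity_id)
        return entry[1]
    # True miss: the constraint check is inlined, paying the latency
    # (and application-server load) right here.
    value = recompute(entity_id)
    store[entity_id] = (CURRENT_VERSION, value)
    return value
```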

I'll probably have more questions.

TASK DETAIL
  https://phabricator.wikimedia.org/T204024

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Eevans
Cc: Eevans, daniel, mobrovac, Jonas, Lucas_Werkmeister_WMDE, Aklapper, Addshore, 
Lahi, Gq86, GoranSMilovanovic, QZanden, merbst, LawExplorer, Agabi10, Hardikj, 
Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, fgiunchedi


[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2016-01-12 Thread Eevans
Eevans added a comment.

In https://phabricator.wikimedia.org/T116247#1927791, @Ottomata wrote:

> I believe we can close this task, ja?  Got a few defined here: 
> https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki


+1


TASK DETAIL
  https://phabricator.wikimedia.org/T116247

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: mobrovac, Eevans
Cc: RobLa-WMF, Nuria, gerritbot, intracer, EBernhardson, Smalyshev, yuvipanda, 
Hardikj, daniel, aaron, GWicke, mobrovac, MZMcBride, bd808, JanZerebecki, 
Halfak, Krenair, brion, chasemp, Eevans, mmodell, Ottomata, Matanya, Aklapper, 
JAllemandou, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
Mbch331, jeremyb





[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-12-01 Thread Eevans
Eevans added a comment.

In https://phabricator.wikimedia.org/T116247#1839888, @Ottomata wrote:

> @gwicke and I discussed the schema/revision in meta issue in IRC today. He 
> had an idea that I quite like!
>
> @gwicke suggested that instead of using (schema, revision) to uniquely ID a 
> schema, we just use a URI.  EventLogging does this already with schemas 
> stored in meta.wikimedia.org, but the URI resolution is done behind the 
> scenes.  Explicitly setting meta.schema to a URI in each event allows us to 
> easily look up a schema outside of any EventLogging/EventBus context.  I 
> believe it would be easy to support this in EventLogging code as long as 
> extracting the schema name and revision from the URI is standardized.  
> Whatever the URI is, its last two path elements should be name/revision, e.g. 
> `.../schemas/jsonschema/{title}/{rev}`.
>
> This would certainly solve the issues that @nuria and I had about not 
> including schema ids in the events.
>
> Thoughts?  I'll look into the implementation of this tomorrow to make sure 
> there isn't something that would make this difficult.


I like it.
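For what it's worth, extracting name/revision under that convention is trivial (a sketch; the URI shape is just the example from above):

```python
from urllib.parse import urlsplit

def schema_name_rev(uri):
    # Per the proposed convention, the last two path elements of a schema
    # URI are name/revision, e.g. .../schemas/jsonschema/{title}/{rev}.
    parts = urlsplit(uri).path.rstrip('/').split('/')
    return parts[-2], int(parts[-1])
```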


TASK DETAIL
  https://phabricator.wikimedia.org/T116247

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: mobrovac, Eevans
Cc: Nuria, gerritbot, intracer, EBernhardson, Smalyshev, yuvipanda, Hardikj, 
daniel, aaron, GWicke, mobrovac, MZMcBride, bd808, JanZerebecki, Halfak, 
Krenair, brion, chasemp, Eevans, mmodell, Ottomata, Matanya, Aklapper, 
JAllemandou, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
RobLa-WMF, Mbch331, jeremyb





[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Eevans
Eevans added a comment.

In https://phabricator.wikimedia.org/T116247#1749452, @Ottomata wrote:

> Right, but how would you do this in say, Hive?  Or in bash?


In bash:

  $ sudo apt-get install uuid
  $ ID=$(uuid -v 1)
  $ grep "content: time" <(uuid -d $ID)
  content: time:  2015-10-26 15:16:20.026434.0 UTC

In Java (applicable to Hive?):

  import java.util.Date;
  import java.util.UUID;
  
  public class Time {
      public static void main(String... args) {
          UUID id = UUID.fromString(args[0]);
          // Convert 100-ns intervals since 1582-10-15 to ms since the Unix epoch.
          double timestamp = (id.timestamp() - 0x01b21dd213814000L) * 100 / 1e6;
          System.out.println(new Date((long) timestamp));
      }
  }

Anyway, I don't object to including the redundant ISO 8601 timestamp; I just 
wanted to make sure it was clear that it's not at all difficult to extract a 
timestamp from a v1 UUID (and it's even less onerous when you consider that 
code like this would be tucked away in a helper somewhere).


TASK DETAIL
  https://phabricator.wikimedia.org/T116247

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Eevans
Cc: EBernhardson, Smalyshev, yuvipanda, Hardikj, daniel, aaron, GWicke, 
mobrovac, MZMcBride, bd808, JanZerebecki, Halfak, Krenair, brion, chasemp, 
Eevans, mmodell, Ottomata, Mattflaschen, Matanya, Aklapper, JAllemandou, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, RobLa-WMF, jeremyb





[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread Eevans
Eevans added a comment.

In https://phabricator.wikimedia.org/T116247#1748095, @Ottomata wrote:

> > So the producer would store the same timestamp twice? UUID v1 already 
> > contains it.
>
> Could you provide an example of what this UUID would look like?
>
> A reason for having a timestamp-only field is so that applications can use it 
> for time-based logic without having to also know how to extract the timestamp 
> out of an overloaded UUID.


Using Python as an example (and sticking strictly to what's in the standard 
lib):

  import datetime
  from uuid import uuid1
  
  u = uuid1()
  
  # u.time is in 100-ns intervals since 1582-10-15; convert to Unix seconds.
  print datetime.datetime.fromtimestamp((u.time - 0x01b21dd213814000L)*100/1e9)

The constant `0x01b21dd213814000` is the number of 100-ns intervals between the 
epoch that v1 UUIDs use (1582-10-15 00:00:00) and the standard Unix epoch 
(1970-01-01).
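That constant is easy to sanity-check:

```python
from datetime import datetime

# 100-ns intervals between the UUID v1 epoch (1582-10-15, the date of the
# Gregorian calendar reform) and the Unix epoch (1970-01-01).
delta = datetime(1970, 1, 1) - datetime(1582, 10, 15)
assert delta.days * 86400 * 10**7 == 0x01b21dd213814000
```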


TASK DETAIL
  https://phabricator.wikimedia.org/T116247

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Eevans
Cc: EBernhardson, Smalyshev, yuvipanda, Hardikj, daniel, aaron, GWicke, 
mobrovac, MZMcBride, bd808, JanZerebecki, Halfak, Krenair, brion, chasemp, 
Eevans, mmodell, Ottomata, Mattflaschen, Matanya, Aklapper, JAllemandou, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, RobLa-WMF, jeremyb





[Wikidata-bugs] [Maniphest] [Commented On] T114443: EventBus MVP

2015-10-16 Thread Eevans
Eevans added a comment.

In https://phabricator.wikimedia.org/T114443#1731284, @GWicke wrote:

> In https://phabricator.wikimedia.org/T114443#1730753, @Eevans wrote:
>
> > 1. Already leverages a (really slick) JSON schema registry 
> > <https://meta.wikimedia.org/wiki/Category:Schemas_%28active%29?status=active>
>
>
> Optionally fetching schemas from a URL isn't that hard really. Example code:
>
>   if (/^https?:\/\//.test(schema)) {
> return preq.get(schema);
>   } else {
> return readFromFile(schema);
>   }
>
>
> This lets us support files for core events, and fetching schemas from meta 
> for EL. Schema validation is a call to a library.


The main reason I listed this as a benefit is that I don't understand why we 
need to distinguish between classes of events in this way (at the 
architectural level).  Since EL already has an answer for schema registry, it 
seemed like an advantage.

However, if we assume that we need an additional class of in-tree schemas, then 
the inverse is also true: it would be just as trivial to implement reading from 
the filesystem.
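A Python mirror of the snippet above, to illustrate (`fetch_url` and `read_file` are hypothetical stand-ins for whatever HTTP client and loader are in use):

```python
import re

def get_schema(schema, fetch_url, read_file):
    # Same branch as the JS snippet quoted above: anything that looks like
    # an http(s) URL is fetched remotely; everything else is read from the
    # in-tree filesystem.
    if re.match(r'^https?://', schema):
        return fetch_url(schema)
    return read_file(schema)
```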

> > 1. Provides a pluggable, composable architecture with support for a wide 
> > range of readers/writers
>
> How would this be an advantage for the EventBus portion? Many third-party 
> users will actually only want a minimal event bus, and EL doesn't seem to 
> help with this from what I have seen.


For starters, it means that we have alternatives for environments where Kafka 
is overkill (small third-party installations, dev environments, mw-vagrant, 
etc.).  Using sqlite instead of Kafka, for example, is already supported.

There is also a tremendous amount of flexibility here, and even if we assume 
that we need none of it now, we can't assume we never will.  Having the 
ability to compose arbitrary event stream topologies from/to a wide variety 
of sources/sinks, to multiplex, and to add in-line processing sounds like a 
great set of capabilities to base such a project on.

> > - schema registry availability
>
> There are more concerns here than just availability (although that's 
> important, too).
>
> Third-party users won't necessarily want to give their service access to the 
> internet in order to fetch schemas. We need to provide a way to retrieve a 
> full set of core schemas, and a git repository is an easy way to achieve this.


Third parties could use our schema registry, or use the same extension we do 
to host one of their own.  Or, as mentioned elsewhere, we could export 
snapshots of the relevant schemas via CI to ship alongside the code (this 
seems safe, as a revision is immutable).

> We also need proper code review and versioning for core schemas, and wikis 
> don't really support code review. We could consider storing pointers to 
> schemas (URLs) instead of the actual schemas in git, but this adds complexity 
> without much apparent benefit:


I would say that both versioning and review are well covered here.  I get your 
point that it's not as specialized as code review tooling might be, but wikis 
are an established means for collaboration.

> Workflow with schemas in git:
>
> 1. create a patch with a schema change
> 2. code review
>
> Workflow with pointers to schemas (URLs) in git:
>
> 1. save a new schema on meta; note revision id
> 2. create a patch with a schema URL change
> 3. code review


That doesn't seem too onerous to me.

> > For performance, it needs to be Good Enough(tm), where Good Enough should 
> > be something we can quantify based on factors like latency, throughput, and 
> > capacity costs that aren't prohibitively expensive when weighed against 
> > other factors (e.g. engineering effort).
>
> See https://phabricator.wikimedia.org/T88459#1604768. tl;dr: It's not 
> necessarily clear that saving very little code (see above) for EL schema 
> fetching outweighs the cost of additional hardware.


I always find these things difficult to quantify; there are so many variables. 
If, hypothetically speaking, it only saved us a week, what is that worth?  What 
could we do with another week (opportunity cost)?

Also, how do you quantify the value of using a piece of software that other 
teams are already using?  Where you have a wider set of active developers, and 
more eyes on it?  Where ops is already familiar with it?

I don't pretend to know the answers to these.


TASK DETAIL
  https://phabricator.wikimedia.org/T114443

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ottomata, Eevans
Cc: mark, MZMcBride, Krinkle, EBernhardson, bd808, Joe, dr0ptp4kt, madhuvishy, 
Nuria, ori, faidon, 

[Wikidata-bugs] [Maniphest] [Commented On] T114443: EventBus MVP

2015-10-16 Thread Eevans
Eevans added a comment.

In https://phabricator.wikimedia.org/T114443#1726139, @Eevans wrote:

> I've expanded upon @gwicke's prototype a bit, progress here: 
> https://github.com/wikimedia/restevent


Actually, after some additional research, I think that we should strongly 
consider the approach that @Ottomata was advocating for here 
<https://gerrit.wikimedia.org/r/#/c/235671/2>, namely that we base this service 
off the existing `eventlogging` Python module 
<https://phabricator.wikimedia.org/diffusion/EEVL/browse/master/server/>.

Rationale (in no particular order):

1. Already leverages a (really slick) JSON schema registry 
<https://meta.wikimedia.org/wiki/Category:Schemas_%28active%29?status=active>
2. Already handles validation against said registry
3. Provides a pluggable, composable architecture with support for a wide range 
of readers/writers
4. Builds on an existing code base that is already used in production

TL;DR: it looks like the quickest, easiest route to meeting the MVP objectives 
(plus a number of bonus objectives, for free).

From https://phabricator.wikimedia.org/T88459 (starting about here 
<https://phabricator.wikimedia.org/T88459#1601022>, I think), the objections as 
I understand them seem to be:

1. schema registry availability
2. performance

Both of these, I think, are tractable.

For the schema registry, it should be relatively straightforward to make the 
service more available.

For performance, it needs to be Good Enough(tm), where Good Enough should be 
something we can quantify based on factors like latency, throughput, and 
capacity costs that aren't prohibitively expensive when weighed against other 
factors (e.g. engineering effort).


TASK DETAIL
  https://phabricator.wikimedia.org/T114443

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ottomata, Eevans
Cc: mark, MZMcBride, Krinkle, EBernhardson, bd808, Joe, dr0ptp4kt, madhuvishy, 
Nuria, ori, faidon, aaron, GWicke, mobrovac, Eevans, Ottomata, Matanya, 
Aklapper, JAllemandou, jkroll, Smalyshev, Hardikj, Wikidata-bugs, Jdouglas, 
RobH, aude, Deskana, Manybubbles, daniel, JanZerebecki, RobLa-WMF, Jay8g, 
fgiunchedi, Dzahn, jeremyb, Legoktm, chasemp, Krenair





[Wikidata-bugs] [Maniphest] [Commented On] T114443: EventBus MVP

2015-10-14 Thread Eevans
Eevans added a comment.

I've expanded upon @gwicke's prototype a bit, progress here: 
https://github.com/wikimedia/restevent


TASK DETAIL
  https://phabricator.wikimedia.org/T114443

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ottomata, Eevans
Cc: mark, MZMcBride, Krinkle, EBernhardson, bd808, Joe, dr0ptp4kt, madhuvishy, 
Nuria, ori, faidon, aaron, GWicke, mobrovac, Eevans, Ottomata, Matanya, 
Aklapper, JAllemandou, jkroll, Smalyshev, Hardikj, Wikidata-bugs, Jdouglas, 
RobH, aude, Deskana, Manybubbles, daniel, JanZerebecki, RobLa-WMF, Jay8g, 
fgiunchedi, Dzahn, jeremyb, Legoktm, chasemp, Krenair





[Wikidata-bugs] [Maniphest] [Commented On] T114443: EventBus MVP

2015-10-06 Thread Eevans
Eevans added a comment.

> A message queue is not a database; it's a router. ...


Of course, but I drew the analogy because in both cases you have readers and 
writers of structured data.  That this particular use case is log-structured 
rather than object-relational doesn't change that aspect.


TASK DETAIL
  https://phabricator.wikimedia.org/T114443

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Eevans
Cc: EBernhardson, bd808, Joe, dr0ptp4kt, madhuvishy, Nuria, ori, faidon, aaron, 
GWicke, mobrovac, Eevans, Ottomata, Matanya, Aklapper, JAllemandou, jkroll, 
Smalyshev, Hardikj, Wikidata-bugs, Jdouglas, RobH, aude, Deskana, Manybubbles, 
mark, JanZerebecki, RobLa-WMF, fgiunchedi, Dzahn, jeremyb, chasemp, Krenair





[Wikidata-bugs] [Maniphest] [Commented On] T114443: EventBus MVP

2015-10-05 Thread Eevans
Eevans added a comment.

In https://phabricator.wikimedia.org/T114443#1701296, @Joe wrote:

> Apart from the concerns on a practical use case which I agree with, I have a 
> big doubt about the implementation idea:
>
> I am in general a fan of the paradigm that it's better to beg for forgiveness 
> than to ask for permission, and of Postel's robustness principle, so I don't 
> really see what purpose a service in front of Kafka would serve, apart from 
> introducing another piece of software that could fail, and some latency.
>
> Messages we send onto Kafka will in any case be verified on the receiving end 
> (considering them "trusted" would be foolish), so we will need to write 
> validation libraries in basically all the languages we consume our data 
> from; this is the standard way to build communications protocols, and I don't 
> see a good reason for introducing a level of indirection here.


Why is that?  Why would they //need// to be verified on the receiving end?

I see this as being somewhat analogous to a database.  In any database you 
//could// store your data opaquely, allow each client to marshal it according 
to some shared notion of schema, and then have every client validate (the 
untrustworthy input) on read, but how is that better?  If the data is 
structured according to a well-defined schema, why not let the system 
persisting it apply those constraints on write?  Assuming the goal is to 
disseminate these events to an arbitrary number of independently implemented 
systems, the latter approach would provide better guarantees about the 
integrity of the data, and eliminate a lot of redundancy among 
implementations.

> So, I have two questions I'd like an answer to:
>
> - What is the advantage of having a service validate messages before they get 
> into the queue (Kafka or other, it doesn't really matter)?


It ensures a single, consistent set of constraints on events, independent of 
the various producer/consumer implementations.

> - Why is building libraries that do the validation based on shared schemas 
> not enough?


A service provides a single high-level abstraction that hides the details of 
the underlying implementation (allowing said implementation to be transparently 
changed), eliminates redundancy among implementations, and prevents a single 
buggy producer from propagating corrupt events to all consumers.
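To make that concrete, the gate I'm describing amounts to something like this (a toy sketch; a required-fields check stands in for full JSON Schema validation, and all names are made up):

```python
import json

def accept(event, schemas, produce):
    # Validate-on-write: the event is checked against its declared schema
    # before it ever reaches the queue, so one buggy producer is rejected
    # here instead of poisoning every downstream consumer.
    schema = schemas[event['meta']['schema']]
    missing = [f for f in schema['required'] if f not in event]
    if missing:
        raise ValueError('missing required fields: %s' % missing)
    produce(json.dumps(event))
```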


TASK DETAIL
  https://phabricator.wikimedia.org/T114443

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Eevans
Cc: EBernhardson, bd808, Joe, dr0ptp4kt, madhuvishy, Nuria, ori, faidon, aaron, 
GWicke, mobrovac, Halfak, Eevans, Ottomata, Matanya, Aklapper, JAllemandou, 
jkroll, Smalyshev, Hardikj, Wikidata-bugs, Jdouglas, RobH, aude, Deskana, 
Manybubbles, mark, JanZerebecki, RobLa-WMF, fgiunchedi, Dzahn, jeremyb, 
chasemp, Krenair


