Smalyshev added a comment.

We could probably pre-process the input, yes. Though I am not sure we should encourage these things... while something like a case-insensitive match is common search functionality, pasting URLs etc. with magic rules seems to be going a bit too far. But if the old one supported it, fine. I wish there were some docs or tests, though, showing which extra syntaxes we process. So far I've got:

  • URL - should it match only Wikidata URLs? Would http://google.com/Q42 also look for Q42 (which is kinda weird)?
  • Parens removal - (Q42) is the same as Q42. Should it work for others too, e.g. (P42)? (L42)? (Douglas Adams)? Should (Q42 and Q42)))) and ())Q42() also work?

If this fails, the regex /.*(\b\w{2,})/s tried to grep the last ASCII sequence from the user's input, and parse that as an entity ID.
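
To make the behaviour of that quoted regex concrete, here is a rough Python rendering of it. This is only a sketch: the name last_token is made up for illustration, and re.ASCII is an assumption meant to approximate how \w behaves in PCRE without the /u flag.

```python
import re

# Python rendering of /.*(\b\w{2,})/s.
# re.DOTALL corresponds to the /s flag; re.ASCII is an assumption that
# approximates PCRE's \w when the pattern is not in Unicode mode.
LAST_TOKEN = re.compile(r".*(\b\w{2,})", re.DOTALL | re.ASCII)


def last_token(text):
    """Return the last run of two or more word characters, or None."""
    match = LAST_TOKEN.match(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in ["http://google.com/Q42", "(Q42)", "())Q42()", "Douglas Adams"]:
        print(repr(sample), "->", last_token(sample))
```

Because the leading .* is greedy, the capture group ends up on the last such run in the input, so any URL ending in /Q42 and any amount of surrounding punctuation all collapse to Q42, and a plain label like "Douglas Adams" collapses to "Adams".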

That's not how the Elastic search currently works. In general, we don't have ID parsing etc. In fact, the Elastic index has no idea what an "ID" is. We match the title against the Elastic index. We could add an extra step of ID parsing, but this complicates things quite a bit, since if we don't run the search we don't have the necessary data. We could parse the input, extract the ID and then run the search on it, but that looks like duplicating the work. Anyway, if we have pre-processing rules, we could probably do it.
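
A pre-processing step of that kind could look roughly like the sketch below. Everything in it is an assumption for the sake of discussion, not a description of existing code: the function name preprocess_query, the accepted URL shapes, and the choice to strip only a single pair of parentheses around something that looks like an ID. It picks the strict answer to each of the open questions above, and the result is still just a string that goes into the normal title match.

```python
import re

# Assumed shape of an entity ID: Q, P or L followed by a positive number.
ENTITY_ID = r"[QPL][1-9]\d*"

# Assumption: only wikidata.org URLs are unwrapped, in /wiki/ or /entity/ form.
WIKIDATA_URL = re.compile(
    r"^https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/"
    r"(?:Property:|Lexeme:)?(" + ENTITY_ID + r")$",
    re.IGNORECASE,
)

# Assumption: exactly one pair of parentheses around an ID-like token.
PARENS = re.compile(r"^\((" + ENTITY_ID + r")\)$", re.IGNORECASE)


def preprocess_query(raw):
    """Normalize the user's input before it is sent to the title match."""
    text = raw.strip()
    for pattern in (WIKIDATA_URL, PARENS):
        match = pattern.match(text)
        if match:
            return match.group(1).upper()
    # Anything else is passed through untouched and searched as typed.
    return text


if __name__ == "__main__":
    for sample in [
        "https://www.wikidata.org/wiki/Q42",
        "http://google.com/Q42",
        "(Q42)",
        "(Douglas Adams)",
        "())Q42()",
    ]:
        print(repr(sample), "->", repr(preprocess_query(sample)))
```

With these rules http://google.com/Q42, (Douglas Adams) and ())Q42() are searched literally, which is stricter than what the old regex did. Loosening any of them would be a matter of changing one pattern.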


TASK DETAIL
https://phabricator.wikimedia.org/T179061
