Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread Andre Klapper
On Wed, 2015-06-10 at 02:01 -0400, MZMcBride wrote:
 a list of a dozen features that are missing (search by file size,
 by color, by image file format, etc.).

Also see https://phabricator.wikimedia.org/T101089 and
https://phabricator.wikimedia.org/T101087


-- 
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread Lydia Pintscher
On Wed, Jun 10, 2015 at 8:01 AM, MZMcBride z...@mzmcbride.com wrote:
 I have two recurring thoughts about search lately, since you asked.

 First, multimedia search is absolutely horrible, basically non-existent.
 If you go to Wikimedia Commons and try its search functionality and then
 compare to any other media service on the Internet, you can quickly come
 up with a list of a dozen features that are missing (search by file size,
 by color, by image file format, etc.).

To really make this awesome we need structured data support for
Commons with Wikidata. We'll be making more progress on it in the
second half of this year but there is a lot to do.

[snip]

 Beyond these two points, it's vitally important that we be able to
 arbitrarily query Wikidata soon. I'm hoping this functionality is live on
 Wikimedia wikis by the end of 2015. And speaking to APIs specifically, we
 really need to focus on projects such as Wiktionary and Wikisource that
 are desperately in need of API support to serialize and add structure to
 what are currently very fragile blobs of wikitext markup.

Please give feedback on the latest proposal for Wikidata support for
Wiktionary: 
https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2015-05


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.


Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread MZMcBride
Dan Garry wrote:
In line with that, one of our primary responsibilities is to ensure that
our search APIs are stable, fast, and easy to use. We'd love to hear from
the people that are using our APIs, so we can learn what you love about
them, what frustrates you, and what we can do to improve them for you.

I have two recurring thoughts about search lately, since you asked.

First, multimedia search is absolutely horrible, basically non-existent.
If you go to Wikimedia Commons and try its search functionality and then
compare to any other media service on the Internet, you can quickly come
up with a list of a dozen features that are missing (search by file size,
by color, by image file format, etc.).

Second, Wikimedia still hasn't aggregated and released anonymized search
data. People use Special:Search daily and they encounter a page of search
results instead of having a redirect take them to the appropriate
destination. Or, sometimes worse, there's no coverage at all of what our
users are searching for. It's a long tail, yes, but we could start filling
in gaps if we had data about what users are looking for. We could save
users a lot of time and build better sites by analyzing what users are
looking for and not finding or what they're looking for and not
immediately being redirected toward. And yes, of course, there are privacy
considerations (the infamous AOL case, etc.), but nothing insurmountable.

Beyond these two points, it's vitally important that we be able to
arbitrarily query Wikidata soon. I'm hoping this functionality is live on
Wikimedia wikis by the end of 2015. And speaking to APIs specifically, we
really need to focus on projects such as Wiktionary and Wikisource that
are desperately in need of API support to serialize and add structure to
what are currently very fragile blobs of wikitext markup.

MZMcBride

P.S. RIP, SAD.




Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread Nikolas Everett
On Tue, Jun 9, 2015 at 2:19 AM, Gergo Tisza gti...@wikimedia.org wrote:

 On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawo...@gmail.com wrote:

  Additionally, from the help page, it's not entirely clear about some of
  the limitations. e.g. You can't do incategory:Foo OR intitle:bar.
  regexes on intitle don't seem to work over the whole title, only word
  level tokens (I think, maybe? I'm a bit unclear on how the regex
  operator works).
 

 Being able to see a parse tree of the search expression would be nice, like
 with the parse/expandtemplates APIs. That would make it easier to find out
 whether the search fails because the query is parsed differently from what
 you imagined, or because there really is nothing to return.


You can _kind of_ get that now by adding the cirrusDumpQuery URL parameter.
But it only dumps the query as sent by Cirrus to Elasticsearch and that
contains a query_string query that Elasticsearch (Lucene really) parses on
its own.
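
As a rough illustration of the cirrusDumpQuery suggestion above, here is a minimal sketch that only builds such a URL. The index.php/Special:Search form, the Commons host, and passing the parameter with an empty value are assumptions for illustration; the thread itself only says that the parameter needs to be added to the URL.

```python
from urllib.parse import urlencode


def dump_query_url(search_text, base="https://commons.wikimedia.org/w/index.php"):
    """Build a search URL that also carries the cirrusDumpQuery parameter."""
    params = {
        "title": "Special:Search",
        "search": search_text,
        # Assumption: the parameter only has to be present, so the value is empty.
        "cirrusDumpQuery": "",
    }
    return base + "?" + urlencode(params)


url = dump_query_url("intitle:cat incategory:Felis_silvestris_catus")
print(url)
```

Opening the printed URL in a browser should show the query Cirrus sends to Elasticsearch rather than a results page, with the caveat from the message above that the query_string part is still parsed later by Lucene.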

One interesting option would be to make a way for Cirrus to return
Elasticsearch's explain results. It's not perfect because it only explains
why things are found and scored the way they are, but it doesn't explain why
things aren't found. Exporting the actual parsed query is more ambitious.

Nik

Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread Nikolas Everett
On Mon, Jun 8, 2015 at 7:16 PM, Brian Wolff bawo...@gmail.com wrote:

 You can't do incategory:Foo OR intitle:bar.
 regexes on intitle don't seem to work over the whole title, only word
 level tokens (I think, maybe? I'm a bit unclear on how the regex
 operator works).


intitle is word level, though you can do phrase searching. It's pretty much
the same as a regular search but limited to the title field.
incategory:Foo OR intitle:Bar is a limitation I'm working on now. No idea
when it'll be available. The limitation comes from us trying to be cute with
the command parsing in Cirrus and not writing a whole grammar for the query
language.
Regexes only work for wikitext. This is a somewhat arbitrary decision on my
part - we need to make special ngram fields to accelerate the regex
searching and we only do that for wikitext. We _can_ do it for other fields
at the cost of update time and disk space.
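
To make the ngram point concrete, here is a toy sketch of the general idea of using a trigram index to narrow down candidates before running a regex. It is not how Cirrus or Elasticsearch implement it; the TrigramIndex class, the sample documents, and the required_literal argument (a substring the caller knows every match must contain) are all invented for illustration.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text, lowercased."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy in-memory index: trigram -> ids of documents containing it."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Every indexed field pays this cost on each update, plus the
        # storage for the postings (the "update time and disk space" cost).
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, pattern, required_literal):
        """Run the regex only over documents that contain every trigram
        of required_literal. Extracting that literal from the regex
        automatically is left out of this sketch."""
        candidates = set(self.docs)
        for gram in trigrams(required_literal):
            candidates &= self.postings.get(gram, set())
        regex = re.compile(pattern, re.IGNORECASE)
        return sorted(d for d in candidates if regex.search(self.docs[d]))


index = TrigramIndex()
index.add(1, "{{Infobox cat|name=Tom}}")
index.add(2, "{{Infobox dog|name=Rex}}")
index.add(3, "No template here")
print(index.search(r"\{\{Infobox\s+cat", "Infobox"))
```

The regex itself is still evaluated against the full text, so the index only reduces how many documents have to be scanned. A literal shorter than three characters yields no trigrams and falls back to scanning everything.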

Nik

Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread Brian Wolff

 To really make this awesome we need structured data support for
 Commons with Wikidata. We'll be making more progress on it in the
 second half of this year but there is a lot to do.


Sure, to really make that awesome, yeah, you need Wikidata. But we are
far away from hitting the point where we need Wikidata. In fact the
three examples McBride gave don't need Wikidata. MIME type and file
size are easily programmatically available already. And unless I'm
mistaken, functionally dependent metadata, like an algorithmically
determined main image colour, is out of scope for Wikidata.
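
For reference, a minimal sketch of where that metadata is already exposed, assuming the standard prop=imageinfo module of api.php with iiprop=size|mime. The file title is a placeholder, and the sketch only builds the request URL rather than fetching anything. Note that this reads the data per file; it is not a way to search by it.

```python
from urllib.parse import urlencode


def imageinfo_url(file_title, base="https://commons.wikimedia.org/w/api.php"):
    """Build an api.php URL asking for a file's size and MIME type."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "titles": file_title,
        "iiprop": "size|mime",  # byte size and dimensions, plus MIME type
        "format": "json",
    }
    return base + "?" + urlencode(params)


# "File:Example.jpg" is only a placeholder title for illustration.
info_url = imageinfo_url("File:Example.jpg")
print(info_url)
```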

--bawolff


Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread Lydia Pintscher
On Wed, Jun 10, 2015 at 7:36 PM, Brian Wolff bawo...@gmail.com wrote:
 To really make this awesome we need structured data support for
 Commons with Wikidata. We'll be making more progress on it in the
 second half of this year but there is a lot to do.

 Sure, to really make that awesome, yeah, you need Wikidata. But we are
 far away from hitting the point where we need Wikidata. In fact the
 three examples McBride gave don't need Wikidata. MIME type and file
 size are easily programmatically available already.

Yeah of course.

 And unless I'm
 mistaken, functionally dependent metadata, like an algorithmically
 determined main image colour, is out of scope for Wikidata.

We've been thinking about this a bit but no decision has been made.
It'd be nice to make these accessible in the same way as other
properties without needing to store and maintain them the same way.
We've been thinking about some kind of fake properties for example.
But we'll worry about that when we get there.
We're getting a bit off-topic. Sorry, Dan.


Cheers
Lydia


Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-09 Thread Gergo Tisza
On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawo...@gmail.com wrote:

 Additionally, from the help page, it's not entirely clear about some of
 the limitations. e.g. You can't do incategory:Foo OR intitle:bar.
 regexes on intitle don't seem to work over the whole title, only word
 level tokens (I think, maybe? I'm a bit unclear on how the regex
 operator works).


Being able to see a parse tree of the search expression would be nice, like
with the parse/expandtemplates APIs. That would make it easier to find out
whether the search fails because the query is parsed differently from what
you imagined, or because there really is nothing to return.

Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-08 Thread S Page
On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawo...@gmail.com wrote:

 The search api (by which I mean query=search in api.php) is somewhat
 poorly documented. You have to dig to find
 https://www.mediawiki.org/wiki/Help:CirrusSearch .


I recently added https://www.mediawiki.org/wiki/API:Search_and_discovery
which clarifies the connection with Help:CirrusSearch, and mentions other
kinds of searching like geosearch.


 I would much prefer
 that the relevant documentation was included in the normal api.php
 auto-generated help.


https://gerrit.wikimedia.org/r/216899 changes the
'apihelp-query+search-param-search' message in
https://www.mediawiki.org/wiki/Special:ApiHelp/query+search to
*srsearch*

Search for page titles and page content that match this value. You can use
the search string to invoke special wiki search features, depending on what
its search backend implements.
But API query search can only use CirrusSearch features if it's installed.
I think Extension:CirrusSearch could handle the 'APIGetAllowedParams' hook
to modify this help text. If I understand correctly, it might be easier
to interpose WMF-specific help text that links to mw:Help:CirrusSearch in a
'wikimedia-apihelp-query+search-param-search' key in
extensions/WikimediaMessages/i18n/wikimediaoverrides/en.json ; I tried it
locally and it didn't work.



 Even better would be if that API allowed users to
 specify the options using normal URL parameters (as separate
 options from using operators in the search string). It's also not
 entirely clear from the API that the search options differ
 depending on which extensions you have installed.


What do you mean? Beyond special terms in srsearch I'm not aware of any
changes to query+search's sr parameters depending on extensions.


 Additionally, from the help page, it's not entirely clear about some of
 the limitations. e.g. You can't do incategory:Foo OR intitle:bar.
 regexes on intitle don't seem to work over the whole title, only word
 level tokens (I think, maybe? I'm a bit unclear on how the regex
 operator works).


Yes it's not a full reference.

-- 
=S Page  WMF Tech writer

Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-08 Thread Brian Wolff
On 6/8/15, Dan Garry dga...@wikimedia.org wrote:
 Do you use our search API? If so, I'd like to hear from you!

 The Discovery Department
 https://wikimediafoundation.org/wiki/Staff_and_contractors#Discovery at
 the Wikimedia Foundation is tasked with building a path of discovery to
 relevant and trusted knowledge. In line with that, one of our primary
 responsibilities is to ensure that our search APIs are stable, fast, and
 easy to use. We'd love to hear from the people that are using our APIs, so
 we can learn what you love about them, what frustrates you, and what we can
 do to improve them for you.

 I'd prefer that you keep the comments about the API itself rather than the
 relevance of the results it returns; I plan to start a separate thread
 about the result relevance, since they're separate topics.

 If you have some feedback, please reply in this thread or reach out to me
 privately.

 Thanks!

 Dan

 --
 Dan Garry
 Product Manager, Discovery
 Wikimedia Foundation

The search API (by which I mean query=search in api.php) is somewhat
poorly documented. You have to dig to find
https://www.mediawiki.org/wiki/Help:CirrusSearch . I would much prefer
that the relevant documentation was included in the normal api.php
auto-generated help. Even better would be if that API allowed users to
specify the options using normal URL parameters (as separate
options from using operators in the search string). It's also not
entirely clear from the API that the search options differ
depending on which extensions you have installed.

Additionally, from the help page, it's not entirely clear about some of
the limitations. e.g. You can't do incategory:Foo OR intitle:bar.
regexes on intitle don't seem to work over the whole title, only word
level tokens (I think, maybe? I'm a bit unclear on how the regex
operator works).

Cheers,
Brian


Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-08 Thread Brian Wolff
On 6/8/15, S Page sp...@wikimedia.org wrote:
 On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawo...@gmail.com wrote:

 The search api (by which I mean query=search in api.php) is somewhat
 poorly documented. You have to dig to find
 https://www.mediawiki.org/wiki/Help:CirrusSearch .


 I recently added https://www.mediawiki.org/wiki/API:Search_and_discovery
 which clarifies the connection with Help:CirrusSearch, and mentions other
 kinds of searching like geosearch.


Last I looked at the docs was about 6 months ago. Glad to hear they're
improving.


 I would much prefer
 that the relevant documentation was included in the normal api.php
 auto-generated help.


 https://gerrit.wikimedia.org/r/216899 changes the
 'apihelp-query+search-param-search' message in
 https://www.mediawiki.org/wiki/Special:ApiHelp/query+search to
 *srsearch*

 Search for page titles and page content that match this value. You can use
 the search string to invoke special wiki search features, depending on what
 its search backend implements.
 But API query search can only use CirrusSearch features if it's installed.
 I think Extension:CirrusSearch could handle the 'APIGetAllowedParams' hook
 to modify this help text. If I understand correctly, it might be easier
 to interpose WMF-specific help text that links to mw:Help:CirrusSearch in a
 'wikimedia-apihelp-query+search-param-search' key in
 extensions/WikimediaMessages/i18n/wikimediaoverrides/en.json ; I tried it
 locally and it didn't work.

It shouldn't be WMF specific (since it's not WMF specific like TOS
links); it should be specific to CirrusSearch.

One possible implementation would be to do an override message (I
would note, that the wikimediaoverride messages aren't direct
overrides, they are replacement messages used by other code that does
the overriding). In my original email I was thinking more from a user
perspective of what I'd like to see, without thought to how it would
be implemented. Without looking at the code, I would probably favour
an extra hook just for the search module, instead of using the generic
hook.



 Even better would be if that API allowed users to
 specify the options using normal URL parameters (as separate
 options from using operators in the search string). It's also not
 entirely clear from the API that the search options differ
 depending on which extensions you have installed.


 What do you mean? Beyond special terms in srsearch I'm not aware of any
 changes to query+search's sr parameters depending on extensions.


Yeah, that doesn't happen currently. I think it should be the case; it
would mesh much better with the MediaWiki API if instead of doing
https://commons.wikimedia.org/w/api.php?action=query&list=search&srsearch=Black+incategory:Felis_silvestris_catus&srnamespace=6
you could do something like
https://commons.wikimedia.org/w/api.php?action=query&list=search&srincategory=Felis_silvestris_catus&srsearch=Black&srnamespace=6
. Especially if all the parameters were documented in the normal API
way, I think it would represent a big boon to discovering the hidden
features of search. (I appreciate it might be a lot of work to express
all the search options possible, but the original email sounded like
it wanted a wishlist).
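
Until something like that exists, a client can approximate the wished-for interface itself. The sketch below is a hypothetical client-side helper, not an existing API feature: it accepts separate incategory and intitle arguments and folds them into the srsearch string that list=search accepts today. The function name and its keyword arguments are made up for illustration.

```python
from urllib.parse import urlencode


def build_search_params(text, incategory=None, intitle=None, namespace=None):
    """Fold separate (hypothetical) filter arguments into one srsearch string."""
    terms = [text] if text else []
    if incategory:
        terms.append("incategory:" + incategory)
    if intitle:
        terms.append("intitle:" + intitle)
    params = {
        "action": "query",
        "list": "search",
        "srsearch": " ".join(terms),
    }
    if namespace is not None:
        params["srnamespace"] = str(namespace)
    return params


params = build_search_params(
    "Black", incategory="Felis_silvestris_catus", namespace=6
)
print("https://commons.wikimedia.org/w/api.php?" + urlencode(params))
```

Values containing spaces would need quoting inside the search string, which this sketch does not handle. It also does nothing for discoverability, which is the main argument for documenting real parameters in the normal API help.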

--
bawolff
