[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-05-30 Thread daniel
daniel added a comment.
To enable this, the entry for 'PrefetchingTermLookup' in data-access/src/PerRepositoryServiceWiring.php needs to be changed to instantiate an ElasticTermLookup instead of a BufferingTermLookup, depending on config.TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, danielCc: gerritbot, WMDE-leszek, Smalyshev, hoo, Aklapper, aude, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, LawExplorer, Vali.matei, Lewizho99, Maathavan, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-05-30 Thread daniel
daniel added a comment.
Note: there are several places in the code that call TermIndex::getTermsOfEntities(). These would need to be converted to using a TermLookup in order to benefit from this.TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, danielCc: gerritbot, WMDE-leszek, Smalyshev, hoo, Aklapper, aude, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, LawExplorer, Vali.matei, Lewizho99, Maathavan, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-03-16 Thread gerritbot
gerritbot added a comment.
Change 416641 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Implement TermLookup with ElasticSearch

https://gerrit.wikimedia.org/r/416641TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Smalyshev, gerritbotCc: gerritbot, WMDE-leszek, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Majesticalreaper22, Ahmed123, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, LawExplorer, Vali.matei, Lewizho99, Maathavan, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-03-05 Thread gerritbot
gerritbot added a comment.
Change 416641 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] [WIP] Implement TermLookup with ElasticSearch

https://gerrit.wikimedia.org/r/416641TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: gerritbotCc: gerritbot, WMDE-leszek, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-03-05 Thread Smalyshev
Smalyshev added a comment.
Another API quirk: getLabel(s) and getDescription(s) return null when the term does not exist. getPrefetchedTerm returns null when term has not been prefetched and false when it does not exist. Probably needs to be unified, otherwise it is annoying to integrate those two.TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: WMDE-leszek, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-03-05 Thread thiemowmde
thiemowmde added a comment.

@Smalyshev wrote:
I don't see where term types are defined […]


That's exactly the issue I referred to. These "term types" being strings is mostly an implementation detail of how the wb_terms table is designed. The currently specified term types can be found in TermIndexEntry. Please use these constants to make it easier to find places working with these term types.


TermIndexEntry::TYPE_LABEL = "label"
TermIndexEntry::TYPE_ALIAS = "alias"
TermIndexEntry::TYPE_DESCRIPTION = "description"


but could there be more?

We played around with ideas to have other types for #lexicographical_data, but we are most probably not going to do this. So these 3 types are all that exist.

Is Elastic implementation supposed to support them […]

Please either return no result if a caller asks for an unknown term type, or throw an exception.

Also, there's an overlap of coverage - you can use getLabel or you can use getPrefetchedTerm(... 'label' ...).

This is just one of the many problems of this combination of older and newer interfaces. Please focus on the more narrow methods (everything without $termType) if possible.

code dealing with prefetching, buffering, resolving different repos […] should all this code be reimplemented for ElasticSearch implementation of prefetchTerms or can it be somehow reused? All that code relies on TermBuffer/TermIndex now.

We might need to split the existing implementations. But I believe it is easier to start over and not touch the existing implementations. For example, you can start by implementing nothing but a TermLookup. If this works, you can add a small TermBuffer implementation on top that allows for prefetching. Since TermBuffer alone is not very useful, you might need to use the PrefetchingTermLookup interface as a decorator instead:

class PrefetchingElasticTermLookup implements PrefetchingTermLookup {

	private $lookup;

	public function __construct( ElasticTermLookup $lookup ) {
		$this->lookup = $lookup;
	}

	public function prefetchTerms( array $entityIds, array $termTypes = null, array $languageCodes = null ) {
		/* TBD */
	}

	public function getPrefetchedTerm( EntityId $entityId, $termType, $languageCode ) {
		switch ( $termType ) {
			case TermIndexEntry::TYPE_LABEL:
return $this->getLabel( $entityId, $languageCode );
			case TermIndexEntry::TYPE_DESCRIPTION:
return $this->getDescription( $entityId, $languageCode );
			default:
throw new \InvalidArgumentException( "Not defined for \$termType \"$termType\"" );
		}
	}

	public function getLabel( EntityId $entityId, $languageCode ) {
		$this->lookup->getLabel( $entityId, $languageCode );
	}

	public function getLabels( EntityId $entityId, array $languageCodes ) {
		$this->lookup->getLabels( $entityId, $languageCodes );
	}

	public function getDescription( EntityId $entityId, $languageCode ) {
		$this->lookup->getDescription( $entityId, $languageCode );
	}

	public function getDescriptions( EntityId $entityId, array $languageCodes ) {
		$this->lookup->getDescriptions( $entityId, $languageCodes );
	}

}



TermLookup methods are supposed to be dealing with entities from federated repos too, and also working on the WikibaseClient side […]

This might indeed be needed. However, I suggest to not care for these for the moment. We can add such functionality later via dispatchers, decorators, and such.TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: thiemowmdeCc: WMDE-leszek, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-03-02 Thread Smalyshev
Smalyshev added a comment.
So if I look into PrefetchingTermLookup, it's a sum of TermBuffer and TermLookup, which is:

public function prefetchTerms( array $entityIds, array $termTypes = null, array $languageCodes = null )
public function getPrefetchedTerm( EntityId $entityId, $termType, $languageCode )
public function getLabel( EntityId $entityId, $languageCode );
public function getLabels( EntityId $entityId, array $languageCodes );
public function getDescription( EntityId $entityId, $languageCode );
public function getDescriptions( EntityId $entityId, array $languageCodes );

I don't see where term types are defined and how they relate to labels/descriptions - though I guess 'label' is type for labels and 'description' for descriptions, but could there be more? Is Elastic implementation supposed to support them or should throw if they're encountered? There are constants in TermIndexEntry, but I assume those are implementation details for SQL part and I should stay away from them?EntityTermLookupBase seems to just use plain strings, not constants. Is it ok to keep doing this?

Also, there's an overlap of coverage - you can use getLabel or you can use getPrefetchedTerm(... 'label' ...). When which one is supposed to be used? Should getLabel use prefetched data if available?

I see there's a bunch of code dealing with prefetching, buffering, resolving different repos etc. organizing - should all this code be reimplemented for ElasticSearch implementation of prefetchTerms or can it be somehow reused? All that code relies on TermBuffer/TermIndex now.

As I understand, now TermLookup methods are supposed to be dealing with entities from federated repos too, and also working on the WikibaseClient side where there's no repo config. However, if we want to have ElasticSearch implementation, this means we need access to configuration of target repo, and more specifically, to its CirrusSearch settings (at least basic ones like index names, cluster names, etc. - we probably don't need search profiles for just fetching labels). I am not sure those are easily available. So do we need this capability for ElasticSearch implementation? If yes, we'd need to figure out how to get the right configs... Or we could duplicate config in client configs - that would probably work for index names - which are also very conventional and derived from wiki names, so we could also create them probably - but connection settings still need to come from somewhere, unless we assume everybody uses the same endpoint. Which is I think the case in production, but not sure how safe it is to rely on it.TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: WMDE-leszek, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-03-01 Thread Smalyshev
Smalyshev added a comment.
Also noticing something strange: WikibaseRepo::getPrefetchingTermLookup returns PrefetchingTermLookup, but implementation uses $this->getWikibaseServices()->getTermBuffer() which returns TermBuffer - which doesn't have to be implementing PrefetchingTermLookup (even though existing concrete implementations of TermBuffer seem to implement it).TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Rzuwig, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JanZerebecki, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-03-01 Thread Smalyshev
Smalyshev added a comment.
@thiemowmde Thanks, now I understand the idea much better. However, it seems to be that it might be better to implement TermIndex, not TermLookup - since otherwise we do not get to benefit from caching/prefetching functionality that specific TermLookup classes implement on top of TermIndex and PrefetchingTermLookup. Does it make sense or am I missing something here?TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Rzuwig, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JanZerebecki, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-01-30 Thread thiemowmde
thiemowmde added a comment.
We do have a TermLookup interface with a few implementations, one that directly accesses the entity, and one that accesses wb_terms. This ticket suggests an other implementation that queries the Cirrus/Elastic index.

The features of TermLookup are currently used in quite a lot of very different places:


In the getLabelByLanguage function in Lua.
In the LinkBegin hook handler that makes sure proper labels are displayed when linking to Wikidata entities.
In the LabelDescriptionLookup implementations, which are again used in about two dozen different places.


It might be possible to replace some of these usages with new Cirrus/Elastic based infrastructure that does not implement TermLookup. However, there are so many places, it doesn't seem to be a good idea to replace them all. The easiest way forward seems to be a new Cirrus/Elastic based TermLookup that can then be used wherever necessary, and slowly replace especially the remaining wb_terms lookups.TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: thiemowmdeCc: Rzuwig, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JanZerebecki, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2018-01-30 Thread Smalyshev
Smalyshev added a comment.
@Lydia_Pintscher, @daniel Would like a bit more clarity about this one - what needs to be done here (if at all)?TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Rzuwig, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JanZerebecki, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2017-12-12 Thread Smalyshev
Smalyshev added a comment.
I think we have all pieces for this, but not sure what API should be there or whether we still want this?TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Rzuwig, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JanZerebecki, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Lahi, Gq86, GoranSMilovanovic, QZanden, Vali.matei, Volker_E, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143706: Implement TermLookup based on Elastic

2017-02-07 Thread Smalyshev
Smalyshev added a comment.
Just to be clear, which lookup(s) we're talking about:


term -> entity ID
term+language -> entity ID
entity+language -> term(s)
entity -> term(s) in any language
TASK DETAILhttps://phabricator.wikimedia.org/T143706EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Rzuwig, Smalyshev, hoo, Liuxinyu970226, Aklapper, aude, JanZerebecki, JeroenDeDauw, Tobi_WMDE_SW, thiemowmde, adrianheine, Lydia_Pintscher, Ricordisamoa, daniel, Vali.matei, D3r1ck01, Izno, Wikidata-bugs, GWicke, Mbch331, Jay8g___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs