Hi everyone,

One more thing - should I create a new thread for discussing the
prototyping of my project (the entity suggester), the issues I run into
along the way, and any requests for help? Or should I just stick to this
old thread?
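While I'm at it: to make option (ii) from my earlier mail (quoted below)
a bit more concrete, here is a rough sketch of the kind of extraction
script I have in mind - stream the wikidatawiki pages XML dump and emit
one item/property pair per claim, as training data for the
recommendation engine. The class name and the regex are just
placeholders of mine, and the exact JSON layout inside the dump pages is
an assumption here; a real version would use a proper JSON parser
instead of a regex.

import java.io.FileInputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class DumpPropertyExtractor {

    // Crude stand-in for a real JSON parser: pull out anything that
    // looks like a property id in the page text. The actual JSON layout
    // of the dump needs to be checked - this pattern is an assumption.
    private static final Pattern PROP = Pattern.compile("\"[Pp](\\d+)\"");

    // args[0]: a *decompressed* pages XML dump (e.g. after bzcat)
    public static void main(String[] args) throws Exception {
        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream(args[0]));
        String title = null;
        while (xml.hasNext()) {
            if (xml.next() != XMLStreamConstants.START_ELEMENT) continue;
            String tag = xml.getLocalName();
            if ("title".equals(tag)) {
                title = xml.getElementText();
            } else if ("text".equals(tag) && title != null
                    && title.matches("Q\\d+")) {
                // item pages are titled Q<number> in the main namespace
                Matcher m = PROP.matcher(xml.getElementText());
                while (m.find()) {
                    System.out.println(title + "\tP" + m.group(1));
                }
            }
        }
        xml.close();
    }
}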
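And in the spirit of Lydia's advice quoted below ("property A is usually
used with value X, Y or Z" - keep it simple and stupid), the first
version of the suggester could be nothing more than a co-occurrence
counter over those pairs: rank the properties an item does not yet have
by how often they accompany the properties it already has. Again just a
sketch - class and method names are illustrative, and it assumes the
tab-separated pairs produced above. No census data, no per-property
special cases:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CooccurrenceSuggester {

    // cooc.get(a).get(b) = number of items carrying both property a and b
    private final Map<String, Map<String, Integer>> cooc =
            new HashMap<String, Map<String, Integer>>();

    // Build co-occurrence counts from "item<TAB>property" lines, the
    // format assumed from the extraction sketch above.
    public void train(String tsvPath) throws IOException {
        Map<String, Set<String>> propsByItem =
                new HashMap<String, Set<String>>();
        BufferedReader in = new BufferedReader(new FileReader(tsvPath));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split("\t");
                if (f.length != 2) continue;
                Set<String> props = propsByItem.get(f[0]);
                if (props == null) {
                    propsByItem.put(f[0], props = new HashSet<String>());
                }
                props.add(f[1]);
            }
        } finally {
            in.close();
        }
        for (Set<String> props : propsByItem.values()) {
            for (String a : props) {
                Map<String, Integer> row = cooc.get(a);
                if (row == null) {
                    cooc.put(a, row = new HashMap<String, Integer>());
                }
                for (String b : props) {
                    if (a.equals(b)) continue;
                    Integer c = row.get(b);
                    row.put(b, c == null ? 1 : c + 1);
                }
            }
        }
    }

    // Rank properties the item does not have yet by how often they
    // co-occur with the properties it already has.
    public List<String> suggest(Set<String> existing, int topN) {
        final Map<String, Integer> score = new HashMap<String, Integer>();
        for (String a : existing) {
            Map<String, Integer> row = cooc.get(a);
            if (row == null) continue;
            for (Map.Entry<String, Integer> e : row.entrySet()) {
                if (existing.contains(e.getKey())) continue;
                Integer s = score.get(e.getKey());
                score.put(e.getKey(),
                        s == null ? e.getValue() : s + e.getValue());
            }
        }
        List<String> ranked = new ArrayList<String>(score.keySet());
        Collections.sort(ranked, new Comparator<String>() {
            public int compare(String x, String y) {
                return score.get(y) - score.get(x);
            }
        });
        return ranked.subList(0, Math.min(topN, ranked.size()));
    }
}

For example, after train("pairs.tsv"), calling suggest() with an item's
current set of properties would give the top-N candidate properties to
offer in the entity suggester UI.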
Cheers,
Nilesh

On Sat, May 4, 2013 at 11:05 PM, Nilesh Chakraborty <nil...@nileshc.com> wrote:

> Thanks for the help, Katie. I'll be looking into how Solr has been
> integrated with the GeoData extension. About wikidata-vagrant, no problem,
> I'll install it by following this page
> <http://www.mediawiki.org/wiki/Extension:Wikibase>.
>
> You're right, raw DB access can be painful and I'd need to rewrite a lot
> of code. I'm considering two options:
>
> *i)* Using the database-related code in the wikidata extension (I'm
> studying the DataModel classes and how they interact with the database) to
> fetch what I need and feed it into the recommendation engine.
>
> *ii)* Not accessing the DB at all. Instead, I can write map-reduce scripts
> that extract the training data and everything I need for each Item from
> the wikidatawiki data dump and feed it into the recommendation engine. I
> can use a cron job to download the latest data dump when it becomes
> available and run the scripts on it. I don't think it would be an issue
> even if the engine lags behind by the interval at which the dumps are
> generated, since the whole recommendation business is about approximation
> anyway.
>
> My request to the devs and the community: please discuss the pros and
> cons of each method and suggest which one you think would be best, mainly
> in terms of performance. I personally feel that option (ii) would be
> cleaner.
>
> Cheers,
> Nilesh
>
>
> On Fri, May 3, 2013 at 3:53 PM, aude <aude.w...@gmail.com> wrote:
>
>> On Fri, May 3, 2013 at 5:39 AM, Nilesh Chakraborty <nil...@nileshc.com> wrote:
>>
>> > Hi Lydia,
>> >
>> > I am currently drafting my proposal; I shall submit it within a few
>> > hours, once the initial version is complete.
>> >
>> > I installed mediawiki-vagrant on my PC and it went quite smoothly. I
>> > could do all the usual things through the browser, and I logged into
>> > the mysql server to examine the database schema.
>> >
>> > I also began to clone the wikidata-vagrant
>> > <https://github.com/SilkeMeyer/wikidata-vagrant> repo. But it seems
>> > that the 'git submodule update --init' part would take a long time -
>> > if I'm not mistaken, it's a huge download (excluding the 'vagrant up'
>> > command, which alone takes around 1.25 hours to download everything).
>> > I wanted to clarify something before downloading it all.
>> >
>> > Since the entity suggester will be working with wikidata, it'll
>> > obviously need to access the whole live dataset from the database (not
>> > the xml dump) to make the recommendations. I tried searching for
>> > database access APIs or high-level REST APIs for wikidata, but
>> > couldn't figure out how to do that. Could you point me to the proper
>> > documentation?
>>
>> One of the best examples of a MediaWiki extension interacting with a
>> Java service is how Solr is used. Solr is still pretty new at Wikimedia,
>> though. It is used with the GeoData extension, and the geodata api
>> modules in turn query Solr.
>>
>> I think Solr gets updated via a cronjob (solrupdate.php) which creates
>> jobs in the job queue. Not 100% sure of the exact details.
>>
>> I do not think direct access to the live database is very practical. I
>> think the data (json blobs) would in any case need to be indexed in some
>> particular way to support what the entity selector needs to do.
>>
>> http://www.mediawiki.org/wiki/Extension:GeoData
>>
>> The Translate extension also uses Solr in some way, though I am not very
>> familiar with the details.
>>
>> On the operations side, puppet is used to configure everything. The
>> puppet git repo is available to see how things are done.
>>
>> https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=tree;f=modules/solr;hb=HEAD
>>
>> > And also, what is the best way to add a few .jar files to wikidata and
>> > execute them with custom commands ('nohup java blah.jar --blah blah' -
>> > running them as daemons)? I can of course set it up on my development
>> > box inside virtualbox - but I want to know how to "integrate" it into
>> > the system, so that any other user can download vagrant and wikidata
>> > and have the jars all ready and running. What is the proper
>> > development workflow for this?
>>
>> wikidata-vagrant is maintained on github, though I think it might not
>> work perfectly right now. We need to update it - that's on our to-do
>> list - and perhaps it could be moved to gerrit. I do not know about
>> integrating the jars, but it should be possible.
>>
>> Cheers,
>> Katie Filbert
>>
>> [answering from this email, as I am not subscribed to wikitech-l on my
>> wikimedia.de email]
>>
>> > Thanks,
>> > Nilesh
>> >
>> > On Sun, Apr 28, 2013 at 3:01 AM, Nilesh Chakraborty
>> > <nil...@nileshc.com> wrote:
>> >
>> > > Awesome. Got it.
>> > >
>> > > I see what you mean - great, thank you. :)
>> > >
>> > > Cheers,
>> > > Nilesh
>> > > On Apr 28, 2013 2:56 AM, "Lydia Pintscher"
>> > > <lydia.pintsc...@wikimedia.de> wrote:
>> > >
>> > >> On Sat, Apr 27, 2013 at 11:14 PM, Nilesh Chakraborty
>> > >> <nil...@nileshc.com> wrote:
>> > >> > Hi Lydia,
>> > >> >
>> > >> > That helps a lot, and makes it way more interesting. Rather than
>> > >> > this being a one-size-fits-all solution, it seems to me that each
>> > >> > property, or each type of property (e.g. different relationships),
>> > >> > will need individual attention and different methods/metrics for
>> > >> > recommendation.
>> > >> >
>> > >> > The examples you gave - continents, sex, relations like
>> > >> > father/son, uncle/aunt/spouse, or place-oriented properties like
>> > >> > place of birth, country of citizenship, ethnic group etc. - each
>> > >> > type has a certain pattern to it (if a person was born in the US,
>> > >> > the US should be one of the countries he was a citizen of; US
>> > >> > census/ethnicity statistics could be used to predict ethnic
>> > >> > group, and so on). I'm already starting to chalk out a few
>> > >> > patterns and how they can be used for recommendation. In my
>> > >> > proposal, should I go into detail about these? Or should I just
>> > >> > give a few examples and explain how the algorithms would work, to
>> > >> > convey the idea?
>> > >>
>> > >> Give some examples and how you'd handle them. You definitely don't
>> > >> need to have it for all properties. What's important is giving an
>> > >> idea of how you'd tackle the problem. Give the reader the
>> > >> impression that you know what you are talking about and can handle
>> > >> the larger problem.
>> > >>
>> > >> Also: don't make the system too intelligent - for example, having
>> > >> it know about US census data. Keep it simple and stupid for now.
>> > >> Things like "property A is usually used with value X, Y or Z"
>> > >> should cover a lot already and are likely enough for most cases.
>> > >>
>> > >> Cheers
>> > >> Lydia
>> > >>
>> > >> --
>> > >> Lydia Pintscher - http://about.me/lydia.pintscher
>> > >> Community Communications for Technical Projects
>> > >>
>> > >> Wikimedia Deutschland e.V.
>> > >> Obentrautstr. 72
>> > >> 10963 Berlin
>> > >> www.wikimedia.de
>>
>> --
>> @wikimediadc / @wikidata

--
A quest eternal, a life so small! So don't just play the guitar, build one.
You can also email me at cont...@nileshc.com or visit my website
<http://www.nileshc.com/>

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l