Hi Denny,

Thanks! I am not sure yet how accurate it will be. If it doesn't meet
expectations, I may need to look into optimizing the model, trying different
metrics, etc. I haven't really thought about those yet.

> what do the two properties without a value mean here?

Let me explain those. You can take a look at the wiki pages mentioned on
this page - https://github.com/nilesh-c/wikidata-entity-suggester

Currently, records like these are stored in the recommendation engine:
100,32,7
60,151,7
...
56,152----10256,12
...

In the first kind, each line is an <item>,<property> pair followed by a
relative affinity of 7. Now, suppose we have lots of "city" items and their
respective properties, and someone starts adding another item that is a
city. As they add properties to that item (properties that generally belong
to a city, of course), the entity suggester will suggest "similar"
properties, irrespective of whether they have entered any values for them.
We are not even talking about "values" here. If the user *does* add values,
better recommendations are fetched. This is primarily about fetching
recommendations for "properties".
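
A minimal sketch of how the sample lines above could be parsed, assuming my
reading of the format is right: item ID, then either a bare property ID or
a property ID and value ID joined by "----", then the affinity. The names
Record and parse_record are made up for illustration and are not part of
the suggester's code.

```python
from typing import NamedTuple, Optional

SEPARATOR = "----"  # joins property ID and value ID in the sample lines


class Record(NamedTuple):
    item: int
    prop: int
    value: Optional[int]  # None when the line has a bare property
    affinity: float


def parse_record(line: str) -> Record:
    """Parse one 'item,property[----value],affinity' line."""
    item, feature, affinity = line.strip().split(",")
    prop, sep, value = feature.partition(SEPARATOR)
    return Record(int(item), int(prop), int(value) if sep else None, float(affinity))


records = [parse_record(s) for s in ("100,32,7", "60,151,7", "56,152----10256,12")]
```

Here records[0] is a bare item-property pair, while records[2] carries both
a property and a value.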

So, if someone starts adding a new city called "Wonderland" and adds
properties like "is in the administrative unit"
<http://www.wikidata.org/wiki/Property:P131> or "head of local government"
<http://www.wikidata.org/wiki/Property:P6>, the suggester will tell the user
that "country" <http://www.wikidata.org/wiki/Property:P17> and "flag image"
<http://www.wikidata.org/wiki/Property:P41> are probably some properties
that they should add. At least that's the idea.
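
To make the "Wonderland" idea concrete, here is a toy sketch. It is NOT how
the engine works internally (that is the Myrrix model); it just counts
property co-occurrence across items, which is a much simpler stand-in. The
item names and the function suggest_properties are hypothetical; only the
property IDs (P131, P6, P17, P41) come from the example above.

```python
from collections import Counter


def suggest_properties(known_items, new_props, top_n=2):
    """Rank properties that co-occur with the ones already entered."""
    new_props = set(new_props)
    scores = Counter()
    for props in known_items.values():
        overlap = len(new_props & props)
        if overlap == 0:
            continue  # this item shares nothing with the new one
        for p in props - new_props:
            scores[p] += overlap  # weight by how similar the item is
    return [p for p, _ in scores.most_common(top_n)]


# Hypothetical existing "city" items and the property IDs they carry.
cities = {
    "city_a": {131, 6, 17, 41},
    "city_b": {131, 17, 41},
    "city_c": {6, 17},
}

# "Wonderland" so far has P131 and P6.
suggestions = suggest_properties(cities, {131, 6})
```

With this toy data, suggestions comes out as [17, 41], i.e. "country" and
"flag image".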

Now, suggesting values: the current implementation of value suggestion is
just a side addition and might not be very accurate. What I intend to add
afterwards is something like this: after the user enters stuff like
41,32,45----462347,.... blah blah, they ask for "value" suggestions for
property 31, i.e. suggesting "values" for properties.

So, in brief, what currently happens:
Suggest "property-value" mappings to new "item".
Suggest "properties" to new "item".
(A new item means an anonymous item: one without an ID, yet to be added.)

What I need to add:
Suggest "value" to a "property".
(This is exactly what you were expecting)

In essence, we combine these 3 types of recommendations and do some magic.
I hope this helps you to understand it better. :)

> How quickly are updates processed by the backend, any idea?

On an Intel Core i5-2500K quad-core machine with 4 GB RAM, this dataset (17th
April dump, wikidatawiki-20130417-pages-meta-current.xml.bz2
<http://dumps.wikimedia.org/wikidatawiki/20130417/wikidatawiki-20130417-pages-meta-current.xml.bz2>):

Data points (pairs) - 8360275
Items - 1965516
Properties and Property-Value pairs - 686318

took about 45 minutes to build the CSV files and 15-20 minutes to build the
Myrrix model, so about 1 hour in total. Parallelizing the CSV file building
would probably bring that time down a bit, though I'm not certain.
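
For a rough sense of scale, the numbers above work out as follows. This is
plain arithmetic on the figures quoted in this mail, not a benchmark.

```python
data_points = 8_360_275   # pairs in the 2013-04-17 dump
csv_minutes = 45          # approximate CSV build time
model_minutes = (15, 20)  # approximate Myrrix model build time range

# Total rebuild time range, in minutes.
total_minutes = tuple(csv_minutes + m for m in model_minutes)

# Average CSV-building throughput, in data points per second.
csv_rate = data_points / (csv_minutes * 60)

print(f"total: {total_minutes[0]}-{total_minutes[1]} min")
print(f"CSV build: about {csv_rate:.0f} data points/s")
```

So a full rebuild is roughly 60-65 minutes, and the CSV step averages
around 3,100 data points per second.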

Adding new data (items, properties, etc.) at runtime is pretty much
instantaneous: adding a batch of 1000 data points will probably take about 1
second, and adding 10 data points about 100 ms (including the PHP client's
time and all). These are just estimates from what I've experienced; I
haven't done any proper benchmarks myself.
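
If we assume (and this is only an assumption) that the time for a batch is
a fixed per-request overhead plus a constant cost per data point, the two
rough estimates above pin down both numbers. The helper fit_line is made up
for this sketch.

```python
def fit_line(p1, p2):
    """Fit t = fixed + per_point * n through two (n, t_ms) estimates."""
    (n1, t1), (n2, t2) = p1, p2
    per_point = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_point * n1
    return fixed, per_point


# (batch size, estimated milliseconds) from the figures above.
fixed_ms, per_point_ms = fit_line((10, 100.0), (1000, 1000.0))

print(f"fixed overhead: about {fixed_ms:.0f} ms")
print(f"per data point: about {per_point_ms:.2f} ms")
```

That gives roughly 91 ms of fixed overhead per request and about 0.91 ms
per data point, which would mean small batches are dominated by the
round-trip rather than by the engine itself. Proper benchmarks would be
needed to confirm this.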

Cheers,
Nilesh




On Wed, May 22, 2013 at 5:23 PM, Denny Vrandečić <
denny.vrande...@wikimedia.de> wrote:

> Awesome, that looks already pretty promising!
>
> I am not completely sure I understand a few things:
>
> 107----4167410
> 106
> 107----215627
> 156
>
> what do the two properties without a value mean here?
>
> I would have expected:
>
> 107----4167410
> 107----215627
>
> and now ask for suggested values for 31,
> or for suggested properties to add.
>
> But these are already details. The results seem pretty promising.
>
> How quickly are updates processed by the backend, any idea?
>
> 2013/5/21 Nilesh Chakraborty <nil...@nileshc.com>
>
> > Hello,
> >
> > I have some updates on the Entity Suggester prototype. Here are the two
> > repos:
> > 1. https://github.com/nilesh-c/wikidata-entity-suggester
> > 2. https://github.com/nilesh-c/wes-php-client
> >
> > As it stands now, deployment-wise, I have a single Java war file that's
> > deployed on Tomcat. And there's a PHP client that can be used from PHP
> > code to push data into or fetch suggestions from that engine.
> >
> > I have made a simple, crude demo that you can access here:
> > http://home.nileshc.com/wesTest.php
> > You can find the code for it in the wes-php-client repo. It's hosted on
> > my home desktop temporarily. I am having some non-technical problems with
> > the VPS I'm managing and customer support is working on it. After it
> > starts to work, I may try deploying this to the VPS. So, if you have to
> > face an embarrassing 404 page, I'm really sorry, I'll be working on it. If
> > it stays up, well and good. :)
> >
> > You can give it a bunch of property IDs, or a bunch of property-value
> > pairs, or a mix of both; select the type of recommendation and hit
> > "Get suggestions!" :) Feedback is much appreciated.
> >
> > Cheers,
> > Nilesh
> >
> >
> > On Tue, May 14, 2013 at 2:36 AM, Matthew Flaschen
> > <mflasc...@wikimedia.org> wrote:
> >
> > > On 05/13/2013 04:28 PM, Nilesh Chakraborty wrote:
> > > > Hi Matt,
> > > >
> > > > Yes, you're right, they are available as separately licensed
> > > > downloads. Only the stand-alone "Serving Layer" is needed for the
> > > > Entity Suggester. It's licensed under Apache v. 2.0. Since I'm using
> > > > the software as-is, without any code modifications, I suppose it's
> > > > compatible with what Wikidata would allow?
> > >
> > > Apache 2.0-licensed software should be fine, even if you do need/want
> > > to modify it.
> > >
> > > Matt Flaschen
> > >
> > > _______________________________________________
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > >
> >
> >
> >
> > --
> > A quest eternal, a life so small! So don't just play the guitar, build
> > one.
> > You can also email me at cont...@nileshc.com or visit my
> > website <http://www.nileshc.com/>
> >
>
>
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> Körperschaften I Berlin, Steuernummer 27/681/51985.
>



-- 
A quest eternal, a life so small! So don't just play the guitar, build one.
You can also email me at cont...@nileshc.com or visit my
website <http://www.nileshc.com/>
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
