Sweeeeeeet ! :) Looking forward to trying this out.
This will be a great tool !
PS : I found this dataset that may interest some of you :
http://www.mpi-inf.mpg.de/yago-naga/yago/
++
On 06/01/2011 04:40 PM, Rupert Westenthaler wrote:
Hi
Based on your request, I have spent the last two days working on several
improvements to the Indexing Tool.
Most importantly, the Indexing Util now directly creates a Bundle that,
when installed in the Entityhub, will create all the necessary
Entityhub components to use the indexed RDF data as a Referenced Site.
I have also created the generic RDF configuration with a lot of
additional documentation.
I am currently working on some final things. So expect to see the
stuff in the SVN tomorrow.
best
Rupert Westenthaler
On Wed, Jun 1, 2011 at 10:32 AM, Olivier Grisel wrote:
2011/6/1 Florent André:
Hi Rupert,
Thanks for your valuable answers !
In fact, if I get it now, indexing in the Entityhub is not just
about building an index, but about creating a new (offline) entity hub.
You said :
The Solr Yard provides better performance especially for big Datasets.
...
The Clerezza is fine for smaller data sets.
Do you have a "magic number" (a vague one will be fine :) ) that defines the
limit for a big dataset ?
The SolrYard implementation should be pretty scalable (tens or
hundreds of millions of entities). The ClerezzaYard will suffer from a
limitation though. It won't scale to more than a couple of
thousand entities as long as the following is not fixed:
https://issues.apache.org/jira/browse/CLEREZZA-466
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel