On 11/08/13 22:29, Tom Morris wrote:
> On Sat, Aug 10, 2013 at 2:30 PM, Markus Krötzsch
> <mar...@semantic-mediawiki.org> wrote:
>> Anyway, if you restrict yourself to tools that are installed by
>> default on your system, then it will be difficult to do many
>> interesting things with a 4.5G RDF file ;-) Seriously, the RDF dump
>> is really meant specifically for tools that take RDF inputs. It is
>> not very straightforward to encode all of Wikidata in triples, and
>> it leads to some inconvenient constructions (especially a lot of
>> reification). If you don't actually want to use an RDF tool and you
>> are just interested in the data, then there would be easier ways of
>> getting it.
>
> A single fact per line seems like a pretty convenient format to me.
> What format do you recommend that's easier to process?

I'd suggest a custom format that at least keeps each data value on a
single line. For example, in RDF you have to do two joins to find all
items that have a property with a date in the year 2010. Even with a
one-triple-per-line serialisation you cannot grep for this, because the
item and the date end up in different triples, and hence on different
lines. So I think a less normalised representation would be nicer for
direct text-based processing; I would probably prefer a format where one
statement is encoded on one line. But it really depends on what you want
to do. Maybe you could also remove some data to obtain something that is
easier to process.
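
To illustrate the point about joins, here is a rough Python sketch. It
is only an illustration under assumptions: the "ex:" names, the
tab-separated flat layout and both helper functions are made up for this
example and are not the vocabulary or format of the actual dump. Dates
are assumed to be plain ISO strings.

```python
# Hypothetical reified triples (subject, predicate, object).
# The "ex:" names are invented for illustration only.
triples = [
    ("ex:Q1", "ex:P10s", "ex:Q1-S1"),    # item -> statement node
    ("ex:Q1-S1", "ex:P10v", "ex:V1"),    # statement node -> value node
    ("ex:V1", "ex:time", "2010-05-17"),  # value node -> date
    ("ex:Q2", "ex:P10s", "ex:Q2-S1"),
    ("ex:Q2-S1", "ex:P10v", "ex:V2"),
    ("ex:V2", "ex:time", "1999-01-01"),
]


def items_with_year_reified(triples, year):
    """Find items with a date in `year`, walking back through two joins."""
    prefix = f"{year}-"
    # Select value nodes whose date falls in the requested year.
    values = {s for s, p, o in triples
              if p == "ex:time" and o.startswith(prefix)}
    # Join 1: statement nodes that point to one of those value nodes.
    statements = {s for s, p, o in triples if o in values}
    # Join 2: items that point to one of those statement nodes.
    return sorted({s for s, p, o in triples if o in statements})


# Hypothetical denormalised layout: one statement per line,
# tab-separated as item, property, value.
flat_lines = [
    "ex:Q1\tex:P10\t2010-05-17",
    "ex:Q2\tex:P10\t1999-01-01",
]


def items_with_year_flat(lines, year):
    """Same query on the flat layout: a single pass, no joins."""
    prefix = f"{year}-"
    found = set()
    for line in lines:
        item, _prop, value = line.split("\t")
        if value.startswith(prefix):
            found.add(item)
    return sorted(found)


print(items_with_year_reified(triples, 2010))
print(items_with_year_flat(flat_lines, 2010))
```

With the flat layout, the same query can be answered by looking at each
line on its own, which is what makes grep-style processing possible.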
Markus
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l