The metadata used to be included in the image table, but about six months ago it was moved out to External Storage. See https://phabricator.wikimedia.org/T275268#7178983
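For anyone handling this in code, here is a minimal Go sketch that decodes a metadata value of the shape quoted below and extracts the blob references. It assumes every value looks like the single example in this thread. The names imageMetadata and parseBlobRefs are hypothetical. Treating the number after "tt:" as a text table row id is an assumption based on Mitar's reading of it, not a documented format. The sketch only locates the blobs. Fetching their contents still requires the text table or External Storage, which does not appear to be available as a dump.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// imageMetadata mirrors the one img_metadata value quoted in this thread:
// {"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}
// Assumption: other rows share this shape. Older rows may use a different
// (non-JSON) encoding, which this sketch does not handle and reports as an error.
type imageMetadata struct {
	Data  json.RawMessage   `json:"data"`
	Blobs map[string]string `json:"blobs"`
}

// parseBlobRefs decodes a metadata value and returns, for each blob name,
// the numeric id after the "tt:" prefix. Reading "tt:" as a text table
// row reference is an assumption, not a documented contract.
func parseBlobRefs(raw string) (map[string]uint64, error) {
	var md imageMetadata
	if err := json.Unmarshal([]byte(raw), &md); err != nil {
		return nil, fmt.Errorf("decoding metadata: %w", err)
	}
	refs := make(map[string]uint64, len(md.Blobs))
	for name, addr := range md.Blobs {
		if !strings.HasPrefix(addr, "tt:") {
			return nil, fmt.Errorf("blob %q: unsupported address %q", name, addr)
		}
		id, err := strconv.ParseUint(strings.TrimPrefix(addr, "tt:"), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("blob %q: %w", name, err)
		}
		refs[name] = id
	}
	return refs, nil
}

func main() {
	raw := `{"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}`
	refs, err := parseBlobRefs(raw)
	if err != nil {
		panic(err)
	}
	// Sort the blob names so the output order is deterministic.
	names := make([]string, 0, len(refs))
	for name := range refs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s -> text table row %d\n", name, refs[name])
	}
}
```

Run on the example value, it prints "data -> text table row 609531648" and "text -> text table row 609531649". Any address without the "tt:" prefix is rejected rather than guessed at.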
On Fri, 4 Feb 2022 at 20:44, Mitar <mmi...@gmail.com> wrote:
> Hi!
>
> Will do. Thanks.
>
> After going through the image table dump, it seems not all data is in
> there. For example, page count for Djvu files is missing. Instead of
> metadata in the image table dump, a reference to text table [1] is
> provided:
>
> {"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}
>
> But that table itself does not seem to be available as a dump? Or am I
> missing something or misunderstanding something?
>
> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>
>
> Mitar
>
> On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> >
> > This looks great! If you like, you might add the link and a brief description to this page: https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that more people can find and use the library :-)
> >
> > (Anyone else have tools they wrote and use, that aren't on this list? Please add them!)
> >
> > Ariel
> >
> > On Fri, Feb 4, 2022 at 2:31 AM Mitar <mmi...@gmail.com> wrote:
> >>
> >> Hi!
> >>
> >> If it is useful to anyone else, I have added to my library [1] in Go
> >> for processing dumps support for processing SQL dumps directly,
> >> without having to load them into a database. So one can process them
> >> directly to extract data, like dumps in other formats.
> >>
> >> [1] https://gitlab.com/tozd/go/mediawiki
> >>
> >>
> >> Mitar
> >>
> >> On Thu, Feb 3, 2022 at 9:13 AM Mitar <mmi...@gmail.com> wrote:
> >> >
> >> > Hi!
> >> >
> >> > I see. Thanks.
> >> >
> >> >
> >> > Mitar
> >> >
> >> > On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF <ar...@wikimedia.org> wrote:
> >> > >
> >> > > The media/file descriptions contained in the dump are the wikitext of the revisions of pages with the File: prefix, plus the metadata about those pages and revisions (user that made the edit, timestamp of edit, edit comment, and so on).
> >> > >
> >> > > Width and height of the image, the media type, the sha1 of the image and a few other details can be obtained by looking at the image.sql.gz file available for download for the dumps for each wiki. Have a look at https://www.mediawiki.org/wiki/Manual:Image_table for more info.
> >> > >
> >> > > Hope that helps!
> >> > >
> >> > > Ariel Glenn
> >> > >
> >> > > On Wed, Feb 2, 2022 at 10:45 PM Mitar <mmi...@gmail.com> wrote:
> >> > >>
> >> > >> Hi!
> >> > >>
> >> > >> I am trying to find a dump of all imageinfo data [1] for all files on
> >> > >> Commons. I thought that "Articles, templates, media/file descriptions,
> >> > >> and primary meta-pages" XML dump would contain that, given the
> >> > >> "media/file descriptions" part, but it seems this is not the case. Is
> >> > >> there a dump which contains that information? And what is "media/file
> >> > >> descriptions" then? Wiki pages of files?
> >> > >>
> >> > >> [1] https://www.mediawiki.org/wiki/API:Imageinfo
> >> > >>
> >> > >>
> >> > >> Mitar
> >> > >>
> >> > >> --
> >> > >> http://mitar.tnode.com/
> >> > >> https://twitter.com/mitar_m
> >> >
> >> > --
> >> > http://mitar.tnode.com/
> >> > https://twitter.com/mitar_m
> >>
> >> --
> >> http://mitar.tnode.com/
> >> https://twitter.com/mitar_m
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
_______________________________________________ Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org