Hi! I made this ticket [1] to track regaining access to metadata as a dump.

[1] https://phabricator.wikimedia.org/T301039

Mitar

On Tue, Feb 8, 2022 at 2:32 AM Platonides <platoni...@gmail.com> wrote:
>
> The metadata used to be included in the image table, but 6 months ago it was moved out to External Storage. See https://phabricator.wikimedia.org/T275268#7178983
>
> On Fri, 4 Feb 2022 at 20:44, Mitar <mmi...@gmail.com> wrote:
>>
>> Hi!
>>
>> Will do. Thanks.
>>
>> After going through the image table dump, it seems not all data is in there. For example, the page count for DjVu files is missing. Instead of the metadata itself, the image table dump provides a reference to the text table [1]:
>>
>> {"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}
>>
>> But that table itself does not seem to be available as a dump? Or am I missing or misunderstanding something?
>>
>> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>>
>> Mitar
>>
>> On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn WMF <ar...@wikimedia.org> wrote:
>> >
>> > This looks great! If you like, you might add the link and a brief description to this page: https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that more people can find and use the library :-)
>> >
>> > (Anyone else have tools they wrote and use that aren't on this list? Please add them!)
>> >
>> > Ariel
>> >
>> > On Fri, Feb 4, 2022 at 2:31 AM Mitar <mmi...@gmail.com> wrote:
>> >>
>> >> Hi!
>> >>
>> >> If it is useful to anyone else: I have added support for processing SQL dumps directly, without having to load them into a database, to my Go library for processing dumps [1]. So one can extract data from them directly, just like from dumps in other formats.
>> >>
>> >> [1] https://gitlab.com/tozd/go/mediawiki
>> >>
>> >> Mitar
>> >>
>> >> On Thu, Feb 3, 2022 at 9:13 AM Mitar <mmi...@gmail.com> wrote:
>> >> >
>> >> > Hi!
>> >> >
>> >> > I see. Thanks.
>> >> >
>> >> > Mitar
>> >> >
>> >> > On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF <ar...@wikimedia.org> wrote:
>> >> > >
>> >> > > The media/file descriptions contained in the dump are the wikitext of the revisions of pages with the File: prefix, plus the metadata about those pages and revisions (the user who made the edit, the timestamp of the edit, the edit comment, and so on).
>> >> > >
>> >> > > The width and height of the image, the media type, the sha1 of the image and a few other details can be obtained from the image.sql.gz file available for download in the dumps for each wiki. Have a look at https://www.mediawiki.org/wiki/Manual:Image_table for more info.
>> >> > >
>> >> > > Hope that helps!
>> >> > >
>> >> > > Ariel Glenn
>> >> > >
>> >> > > On Wed, Feb 2, 2022 at 10:45 PM Mitar <mmi...@gmail.com> wrote:
>> >> > >>
>> >> > >> Hi!
>> >> > >>
>> >> > >> I am trying to find a dump of all imageinfo data [1] for all files on Commons. I thought that the "Articles, templates, media/file descriptions, and primary meta-pages" XML dump would contain that, given the "media/file descriptions" part, but it seems this is not the case. Is there a dump which contains that information? And what is "media/file descriptions" then? Wiki pages of files?
>> >> > >>
>> >> > >> [1] https://www.mediawiki.org/wiki/API:Imageinfo
>> >> > >>
>> >> > >> Mitar
>> >> > >>
>> >> > >> --
>> >> > >> http://mitar.tnode.com/
>> >> > >> https://twitter.com/mitar_m
>> >> > >> _______________________________________________
>> >> > >> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
>> >> > >> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
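For anyone post-processing image.sql.gz in the meantime, here is a minimal Go sketch that pulls the blob addresses out of a metadata value like the `{"data":[],"blobs":{...}}` sample quoted in this thread and turns each `tt:` address into a numeric ID. It assumes the JSON shape seen in that single sample (not a documented schema) and assumes that a `tt:` address names a text table row by its ID. The helper names `blobTextIDs` and `textTableID` are hypothetical and not part of any library. It does not fetch the blobs themselves, since, per the thread, the text table does not appear to be available as a dump.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// imageMetadata mirrors the shape of the single sample metadata value quoted
// in the thread. ASSUMPTION: other rows may use a different shape.
type imageMetadata struct {
	Data  json.RawMessage   `json:"data"`  // kept raw; empty ([]) in the sample
	Blobs map[string]string `json:"blobs"` // blob name -> blob address
}

// textTableID (hypothetical helper) extracts the number from an address such
// as "tt:609531648". ASSUMPTION: that number is the text table row ID.
func textTableID(address string) (int64, error) {
	const prefix = "tt:"
	if !strings.HasPrefix(address, prefix) {
		return 0, fmt.Errorf("not a text table address: %q", address)
	}
	return strconv.ParseInt(strings.TrimPrefix(address, prefix), 10, 64)
}

// blobTextIDs (hypothetical helper) parses one metadata value and maps each
// blob name to the text table ID its address points to.
func blobTextIDs(raw string) (map[string]int64, error) {
	var meta imageMetadata
	if err := json.Unmarshal([]byte(raw), &meta); err != nil {
		return nil, err
	}
	ids := make(map[string]int64, len(meta.Blobs))
	for name, address := range meta.Blobs {
		id, err := textTableID(address)
		if err != nil {
			return nil, fmt.Errorf("blob %q: %w", name, err)
		}
		ids[name] = id
	}
	return ids, nil
}

func main() {
	// The sample value from the thread.
	raw := `{"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}`
	ids, err := blobTextIDs(raw)
	if err != nil {
		panic(err)
	}
	// Sort names so the output order is deterministic (map order is not).
	names := make([]string, 0, len(ids))
	for name := range ids {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s -> text table ID %d\n", name, ids[name])
	}
}
```

Run against the sample, this prints `data -> text table ID 609531648` and `text -> text table ID 609531649`. Keeping `data` as `json.RawMessage` lets the sketch tolerate either an empty array or an object in that field without committing to a schema.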