Hi!

I made this ticket [1] to track regaining access to metadata as a dump.

[1] https://phabricator.wikimedia.org/T301039


Mitar

On Tue, Feb 8, 2022 at 2:32 AM Platonides <platoni...@gmail.com> wrote:
>
> The metadata used to be included in the image table, but it was moved out
> to External Storage 6 months ago. See
> https://phabricator.wikimedia.org/T275268#7178983
>
>
> On Fri, 4 Feb 2022 at 20:44, Mitar <mmi...@gmail.com> wrote:
>>
>> Hi!
>>
>> Will do. Thanks.
>>
>> After going through the image table dump, it seems not all data is in
>> there. For example, the page count for DjVu files is missing. Instead of
>> metadata in the image table dump, a reference to the text table [1] is
>> provided:
>>
>> {"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}
>>
>> But that table itself does not seem to be available as a dump? Or am I
>> missing something or misunderstanding something?
>>
>> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>>
>>
>> Mitar
>>
>> On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn WMF <ar...@wikimedia.org> wrote:
>> >
>> > This looks great! If you like, you might add the link and a brief
>> > description to this page:
>> > https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that more
>> > people can find and use the library :-)
>> >
>> > (Anyone else have tools they wrote and use, that aren't on this list? 
>> > Please add them!)
>> >
>> > Ariel
>> >
>> > On Fri, Feb 4, 2022 at 2:31 AM Mitar <mmi...@gmail.com> wrote:
>> >>
>> >> Hi!
>> >>
>> >> If it is useful to anyone else, I have added support for processing
>> >> SQL dumps directly to my Go library for processing dumps [1], without
>> >> having to load them into a database. So one can process them directly
>> >> to extract data, as with dumps in other formats.
>> >>
>> >> [1] https://gitlab.com/tozd/go/mediawiki
>> >>
>> >>
>> >> Mitar
>> >>
>> >> On Thu, Feb 3, 2022 at 9:13 AM Mitar <mmi...@gmail.com> wrote:
>> >> >
>> >> > Hi!
>> >> >
>> >> > I see. Thanks.
>> >> >
>> >> >
>> >> > Mitar
>> >> >
>> >> > On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF <ar...@wikimedia.org> wrote:
>> >> > >
>> >> > > The media/file descriptions contained in the dump are the wikitext of 
>> >> > > the revisions of pages with the File: prefix, plus the metadata about 
>> >> > > those pages and revisions (user that made the edit, timestamp of 
>> >> > > edit, edit comment, and so on).
>> >> > >
>> >> > > Width and height of the image, the media type, the sha1 of the image
>> >> > > and a few other details can be obtained by looking at the
>> >> > > image.sql.gz file available for download for the dumps for each wiki.
>> >> > > Have a look at https://www.mediawiki.org/wiki/Manual:Image_table for
>> >> > > more info.
>> >> > >
>> >> > > Hope that helps!
>> >> > >
>> >> > > Ariel Glenn
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Wed, Feb 2, 2022 at 10:45 PM Mitar <mmi...@gmail.com> wrote:
>> >> > >>
>> >> > >> Hi!
>> >> > >>
>> >> > >> I am trying to find a dump of all imageinfo data [1] for all files on
>> >> > >> Commons. I thought that "Articles, templates, media/file descriptions,
>> >> > >> and primary meta-pages" XML dump would contain that, given the
>> >> > >> "media/file descriptions" part, but it seems this is not the case. Is
>> >> > >> there a dump which contains that information? And what is "media/file
>> >> > >> descriptions" then? Wiki pages of files?
>> >> > >>
>> >> > >> [1] https://www.mediawiki.org/wiki/API:Imageinfo
>> >> > >>
>> >> > >>
>> >> > >> Mitar
>> >> > >>
>> >> > >> --
>> >> > >> http://mitar.tnode.com/
>> >> > >> https://twitter.com/mitar_m
>> >> > >> _______________________________________________
>> >> > >> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
>> >> > >> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
