[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-09 Thread Mitar
Hi!

I created this ticket [1] to track regaining access to the metadata in a dump.

[1] https://phabricator.wikimedia.org/T301039


Mitar

On Tue, Feb 8, 2022 at 2:32 AM Platonides wrote:
>
> The metadata used to be included in the image table, but it was moved out
> to External Storage 6 months ago. See
> https://phabricator.wikimedia.org/T275268#7178983
>
>



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-07 Thread Platonides
The metadata used to be included in the image table, but it was moved out
to External Storage 6 months ago. See
https://phabricator.wikimedia.org/T275268#7178983


On Fri, 4 Feb 2022 at 20:44, Mitar wrote:

> Hi!
>
> Will do. Thanks.
>
> After going through the image table dump, it seems not all data is in
> there. For example, the page count for DjVu files is missing. Instead of
> the metadata itself, the image table dump provides a reference to the
> text table [1]:
>
> {"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}
>
> But that table itself does not seem to be available as a dump? Or am I
> missing something or misunderstanding something?
>
> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>
>
> Mitar


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-05 Thread Ariel Glenn WMF
The text table itself is not dumped, because some entries in it may be
related to hidden revisions or deleted pages, and thus not viewable by
ordinary users.

The text id is given in the content dumps as an XML tag before the wrapped
wikitext content, and you can associate the items that way.
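
As a rough illustration of associating pages with text ids, here is a
minimal Python sketch. It is an assumption that the id appears as an `id`
attribute on the `<text>` element; the thread does not spell out the exact
layout, so check it against a real dump. The sample fragment and the id
12345 are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def text_ids(stream):
    """Yield (page title, text id) pairs while streaming through the XML.

    Assumption: the text id is an "id" attribute on <text>. Elements
    without that attribute are skipped.
    """
    title = None
    for _, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text" and elem.get("id") is not None:
            yield title, int(elem.get("id"))
        elif name == "page":
            elem.clear()  # release memory held by a finished page


# Hypothetical fragment, made up for this example.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>File:Example.djvu</title>
    <revision>
      <id>1</id>
      <text id="12345" bytes="11">Description</text>
    </revision>
  </page>
</mediawiki>"""

pairs = list(text_ids(io.BytesIO(SAMPLE)))
print(pairs)
```

Streaming with `iterparse` avoids loading a whole dump into memory, which
matters for files of this size.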

Ariel

On Fri, Feb 4, 2022 at 10:43 PM Mitar wrote:

> Hi!
>
> Will do. Thanks.
>
> After going through the image table dump, it seems not all data is in
> there. For example, the page count for DjVu files is missing. Instead of
> the metadata itself, the image table dump provides a reference to the
> text table [1]:
>
> {"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}
>
> But that table itself does not seem to be available as a dump? Or am I
> missing something or misunderstanding something?
>
> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>
>
> Mitar


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-04 Thread Mitar
Hi!

Will do. Thanks.

After going through the image table dump, it seems not all data is in
there. For example, the page count for DjVu files is missing. Instead of
the metadata itself, the image table dump provides a reference to the
text table [1]:

{"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}

But that table itself does not seem to be available as a dump? Or am I
missing something or misunderstanding something?

[1] https://www.mediawiki.org/wiki/Manual:Text_table
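
To make the structure of that value concrete, here is a minimal Python
sketch that pulls the numeric ids out of the "blobs" entries. It assumes,
as described above, that the "tt:" prefix marks a text table address; the
function name `text_table_ids` is made up for this example.

```python
import json


def text_table_ids(metadata_json: str) -> dict[str, int]:
    """Map each blob name to the numeric id in its "tt:<id>" address."""
    blobs = json.loads(metadata_json).get("blobs", {})
    ids = {}
    for name, address in blobs.items():
        scheme, _, ident = address.partition(":")
        # Assumption: "tt" marks a text table address; anything else is skipped.
        if scheme == "tt" and ident.isdigit():
            ids[name] = int(ident)
    return ids


# The value quoted above from the image table dump.
sample = '{"data":[],"blobs":{"data":"tt:609531648","text":"tt:609531649"}}'
ids = text_table_ids(sample)
print(ids)
```

The ids only say where the metadata lives; the blobs themselves still have
to come from somewhere other than the image table dump.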


Mitar

On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn WMF wrote:
>
> This looks great! If you like, you might add the link and a brief
> description to this page:
> https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that more people
> can find and use the library :-)
>
> (Anyone else have tools they wrote and use, that aren't on this list? Please 
> add them!)
>
> Ariel


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Ariel Glenn WMF
This looks great! If you like, you might add the link and a brief
description to this page:
https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that more people
can find and use the library :-)

(Anyone else have tools they wrote and use, that aren't on this list?
Please add them!)

Ariel

On Fri, Feb 4, 2022 at 2:31 AM Mitar wrote:

> Hi!
>
> If it is useful to anyone else, I have added support to my Go library
> for processing dumps [1] for reading SQL dumps directly, without having
> to load them into a database. So one can process them directly to
> extract data, like dumps in other formats.
>
> [1] https://gitlab.com/tozd/go/mediawiki
>
>
> Mitar


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi!

If it is useful to anyone else, I have added support to my Go library
for processing dumps [1] for reading SQL dumps directly, without having
to load them into a database. So one can process them directly to
extract data, like dumps in other formats.

[1] https://gitlab.com/tozd/go/mediawiki
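
For readers who want to see the general idea of reading an SQL dump without
a database, here is a minimal Python sketch. It is not the Go library's
API or implementation. It assumes mysqldump-style lines of the form
INSERT INTO ... VALUES (...),(...); with backslash-escaped strings, and the
sample row data is hypothetical.

```python
import gzip
import io

# Backslash escapes assumed to appear inside quoted strings.
ESCAPES = {"0": "\0", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a"}


def parse_values(line):
    """Yield one tuple per row of an INSERT ... VALUES line.

    Quoted strings are unescaped, NULL becomes None, and numbers are
    left as strings.
    """
    start = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or start < 0:
        return
    i, n = start + len(" VALUES "), len(line)
    while i < n and line[i] == "(":
        i += 1
        row = []
        while True:
            if line[i] == "'":
                i += 1
                chars = []
                while line[i] != "'":
                    if line[i] == "\\":
                        i += 1
                        chars.append(ESCAPES.get(line[i], line[i]))
                    else:
                        chars.append(line[i])
                    i += 1
                i += 1  # skip the closing quote
                row.append("".join(chars))
            else:
                j = i
                while line[j] not in ",)":
                    j += 1
                token = line[i:j]
                row.append(None if token == "NULL" else token)
                i = j
            if line[i] == ")":
                i += 1
                break
            i += 1  # skip the comma between fields
        yield tuple(row)
        if i < n and line[i] == ",":
            i += 1  # skip the comma between rows


def rows_from_dump(stream):
    """Yield rows from every INSERT line of a text stream."""
    for line in stream:
        yield from parse_values(line)


# Hypothetical two-row sample, compressed in memory to mimic a .sql.gz file.
SAMPLE = r"INSERT INTO `image` VALUES ('A.djvu',120,NULL,'it\'s'),('B,(1).png',7,3.5,'');" + "\n"
raw = gzip.compress(SAMPLE.encode("utf-8"))
with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8", errors="replace") as stream:
    rows = list(rows_from_dump(stream))
print(rows)
```

For a real file, pass the path to `gzip.open` instead of the in-memory
buffer. Decoding with errors="replace" is a shortcut; columns holding
binary data would need byte-level parsing instead.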


Mitar

On Thu, Feb 3, 2022 at 9:13 AM Mitar wrote:
>
> Hi!
>
> I see. Thanks.
>
>
> Mitar


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi!

I see. Thanks.


Mitar

On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF wrote:
>
> The media/file descriptions contained in the dump are the wikitext of the 
> revisions of pages with the File: prefix, plus the metadata about those pages 
> and revisions (user that made the edit, timestamp of edit, edit comment, and 
> so on).
>
> Width and height of the image, the media type, the sha1 of the image and a
> few other details can be obtained by looking at the image.sql.gz file 
> available for download for the dumps for each wiki. Have a look at 
> https://www.mediawiki.org/wiki/Manual:Image_table for more info.
>
> Hope that helps!
>
> Ariel Glenn


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-02 Thread Ariel Glenn WMF
The media/file descriptions contained in the dump are the wikitext of the
revisions of pages with the File: prefix, plus the metadata about those
pages and revisions (user that made the edit, timestamp of edit, edit
comment, and so on).

Width and height of the image, the media type, the sha1 of the image and a
few other details can be obtained by looking at the image.sql.gz file
available for download for the dumps for each wiki. Have a look at
https://www.mediawiki.org/wiki/Manual:Image_table for more info.
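
One way to check which fields a given image.sql.gz actually carries is to
read the column names from the CREATE TABLE statement at the top of the
file. The Python sketch below assumes a mysqldump-style layout with one
backtick-quoted column per line. The sample schema is abbreviated and
hypothetical; only the column names in it are taken from the manual page
above.

```python
import gzip
import io
import re

# A column definition line starts with a backtick-quoted name.
COLUMN = re.compile(r"^\s*`(\w+)`\s")


def column_names(stream, table):
    """Return the column names in the CREATE TABLE statement for `table`."""
    names, inside = [], False
    for line in stream:
        if line.startswith(f"CREATE TABLE `{table}`"):
            inside = True
        elif inside:
            if line.startswith(")"):
                break  # end of the CREATE TABLE statement
            match = COLUMN.match(line)
            if match:  # key and index lines do not match
                names.append(match.group(1))
    return names


# Abbreviated, hypothetical schema for illustration only.
SAMPLE = """CREATE TABLE `image` (
  `img_name` varbinary(255) NOT NULL DEFAULT '',
  `img_width` int(11) NOT NULL DEFAULT 0,
  `img_height` int(11) NOT NULL DEFAULT 0,
  `img_media_type` varbinary(32) DEFAULT NULL,
  `img_sha1` varbinary(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`img_name`),
  KEY `img_sha1` (`img_sha1`(10))
) ENGINE=InnoDB DEFAULT CHARSET=binary;
"""
raw = gzip.compress(SAMPLE.encode("utf-8"))
with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8", errors="replace") as stream:
    names = column_names(stream, "image")
print(names)
```

The position of each name in the list gives the index of that field in
every row of the INSERT statements that follow.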

Hope that helps!

Ariel Glenn



On Wed, Feb 2, 2022 at 10:45 PM Mitar wrote:

> Hi!
>
> I am trying to find a dump of all imageinfo data [1] for all files on
> Commons. I thought that the "Articles, templates, media/file descriptions,
> and primary meta-pages" XML dump would contain that, given the
> "media/file descriptions" part, but it seems this is not the case. Is
> there a dump which contains that information? And what are "media/file
> descriptions" then? Wiki pages of files?
>
> [1] https://www.mediawiki.org/wiki/API:Imageinfo
>
>
> Mitar