[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2024-03-15 Thread Physikerwelt
Physikerwelt added a comment.


  In T287164#9194358, @DD063520 wrote:
  
  > Hi, we have an extension for this:
  >
  > https://gitlab.the-qa-company.com/FrozenMink/batchingestionextension
  >
  > it is based on the ideas proposed by AddShore.
  
  Nice.
  
  (FYI) I had a similar import problem in 2020: I was about to submit a paper, 
but the importer was too slow to finish before the deadline. I solved this by 
writing my own import script:
  
  
https://phabricator.wikimedia.org/rEMASe12cd7a9d47a289a189f4283cfac5ff57588044b
  
  **handling of foreign keys**
  
  It has one feature that I have not seen in other importers and that I find 
pretty helpful. Often, you have foreign keys in your data model. Take, for 
example, the users and posts in the StackExchange example from above. If you 
start with an empty Wikibase, both users and posts will be empty, so you don't 
know the item IDs of the referenced Wikibase items upfront. To handle that, I 
created a `references` field in my data model, which replaces external foreign 
key IDs with internal Wikibase QIDs.
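  
  The idea of such a `references` field can be sketched as follows. This is a minimal Python sketch and not the code of the linked import script: the names `resolve_references`, `import_table`, `stub_create_item` and the field `owner_user_id` are hypothetical, and the stub stands in for a real call that creates an item and returns its QID.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (table name, external id) -> Wikibase item ID assigned at import time
IdMap = Dict[Tuple[str, str], str]


def resolve_references(record: dict, references: Dict[str, str], id_map: IdMap) -> dict:
    """Return a copy of record with external foreign keys replaced by QIDs.

    references maps a field name to the table that field points into,
    e.g. {"owner_user_id": "users"} (hypothetical field name).
    """
    resolved = dict(record)
    for field, target_table in references.items():
        external_id = record.get(field)
        if external_id is None:
            continue  # optional foreign key, nothing to resolve
        key = (target_table, str(external_id))
        if key not in id_map:
            raise KeyError(f"{field}={external_id!r}: {target_table} row not imported yet")
        resolved[field] = id_map[key]
    return resolved


def import_table(
    table: str,
    rows: Iterable[dict],
    references: Dict[str, str],
    id_map: IdMap,
    create_item: Callable[[dict], str],
) -> None:
    """Create one item per row and remember which QID each external id got."""
    for row in rows:
        payload = resolve_references(row, references, id_map)
        qid = create_item(payload)
        id_map[(table, str(row["id"]))] = qid


# Stand-in for the real item creation; it only hands out Q1, Q2, ...
created: List[dict] = []


def stub_create_item(payload: dict) -> str:
    created.append(payload)
    return f"Q{len(created)}"


id_map: IdMap = {}
# Referenced tables go first so their QIDs are known when posts are imported.
import_table("users", [{"id": 17, "name": "alice"}], {}, id_map, stub_create_item)
import_table(
    "posts",
    [{"id": 5, "title": "Hello", "owner_user_id": 17}],
    {"owner_user_id": "users"},
    id_map,
    stub_create_item,
)
print(created[1])  # {'id': 5, 'title': 'Hello', 'owner_user_id': 'Q1'}
```

  This sketch assumes the tables can be imported in dependency order; circular references would need a second pass that patches the items after creation.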
  
  I recently started to write new custom PHP import scripts, but I think using 
a batch REST API, as presented in the batchingestionextension, would be much 
more convenient.

TASK DETAIL
  https://phabricator.wikimedia.org/T287164

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Physikerwelt
Cc: Physikerwelt, Lhaaitsma, Harej, Alan_Ang-WMDE, So9q, Eli, erik_s_paulson, 
georginaburnett-wmde, WMDE-leszek, DD063520, Afandian, Daniel_Mietchen, Tarrow, 
aidhog, johanricher, Addshore, Masssly, danshick-wmde, Thadguidry, Aklapper, 
RShigapov, Ullasoff, RickiJay-WMDE, Danny_Benjafield_WMDE, roti_WMDE, jdfraine, 
Astuthiodit_1, Eposthumus, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2024-03-15 Thread Physikerwelt
Physikerwelt added a project: NFDI.



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2024-01-03 Thread darthmon_wmde
darthmon_wmde added a project: Product-Feature.



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2024-01-02 Thread darthmon_wmde
darthmon_wmde added a project: Wikibase Suite Team.



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2023-09-25 Thread DD063520
DD063520 added a comment.


  Hi, we have an extension for this:
  
  https://gitlab.the-qa-company.com/FrozenMink/batchingestionextension
  
  it is based on the ideas proposed by AddShore.



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2023-09-25 Thread Lhaaitsma
Lhaaitsma added a comment.


  Hi all, I'm also interested in this item's progress. Is anyone still working 
on this?



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2022-06-22 Thread So9q
So9q added a comment.


  Any news on this? Is something hindering it from being triaged?



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-11-30 Thread Addshore
Addshore added a comment.


  > Dragan Espenschied: The REST API is probably not going to end up calling 
different PHP for doing the actual API work than the "action API" I assume?
  
  Improving bulk imports and making editing / importing / data loading faster 
are likely two separate concerns here.
  The REST API will probably call some different code, but the majority of this 
core part of editing would likely stay the same.
  
  > So if an improvement was made on the action API level before the REST API 
was ready, the REST API would end up getting it too?
  
  Yes



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-11-04 Thread RShigapov
RShigapov added a comment.


  At WikidataCon 2021 we had an open meeting of the Wikibase Stakeholder Group 
and an interactive roadmapping session. During that session we worked 
interactively on a roadmap Miro board. I am copying the discussion about this 
ticket here:
  
Anonymous: How can we push this? Can this really be done without Wikimedia 
Deutschland?

Adam Shorland: I'd like to think that most of the problems here in some way 
can be worked on without WMDE, but collaboration there would always be needed.

The question around API performance is: are we going for small 
improvements, or waiting for the rewrite to REST?

Renat Shigapov: When will the REST API be ready?

Adam Shorland: Although it has been designed and feedback has been gathered, 
work on the implementation has not started yet.

Dragan Espenschied: The REST API is probably not going to end up calling 
different PHP for doing the actual API work than the "action API" I assume? So 
if an improvement was made on the action API level before the REST API was 
ready, the REST API would end up getting it too?



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-10-14 Thread RShigapov
RShigapov added a comment.


  Hi folks! Any suggestions on how to move this forward a bit? It seems we have 
not even agreed on a dataset for testing, let alone the rest.



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-08-10 Thread Thadguidry
Thadguidry added a comment.


  Hi @aidhog (Aidan), in my opinion: "No, not a good test case for this need". 
The only reason is that it's ASCII-only (chars < 128), so it doesn't let us 
ensure proper load handling for data in all languages, i.e. multilingual data 
with non-ASCII characters (code points >= 128) encoded as UTF-8, etc.
  DBLP.xml is, however, a great test case for any SAX parser, as I can see in its 
PDF documentation: https://dblp.uni-trier.de/xml/docu/dblpxml.pdf
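  
  Whether a candidate dataset exercises non-ASCII handling at all can be checked with a few lines. This is a rough Python sketch; the function name `non_ascii_ratio` and the sample labels are made up for illustration.

```python
def non_ascii_ratio(text: str) -> float:
    """Fraction of characters whose code point lies outside ASCII (> 127)."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 127) / len(text)


# Hypothetical sample labels; a real check would stream over the dataset.
samples = ["Database systems", "Z\u00fcrich", "\u6771\u4eac"]
for label in samples:
    # UTF-8 needs more than one byte for every non-ASCII character.
    print(label, non_ascii_ratio(label), len(label.encode("utf-8")))
```

  A dataset where this ratio is 0 for every label would not exercise multi-byte UTF-8 handling during import.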
  
  Ideally, we need to find a CC-0 public domain data set (or even create or 
generate one) in UTF-8 in both JSON and RDF/XML. Let's leave out CSV for now, 
since pre-processing CSV files into JSON records or RDF/XML is best done in 
other tools that can handle those conversions more easily.
  
  Something like the British National Library's Linked Open Data - Serials LOD 
samples file 
https://www.bl.uk/bibliographic/downloads/BNBLODSerials_sample_rdf.zip (or the 
full file 
https://www.bl.uk/bibliographic/downloads/BNBLODSerials_202106_rdf.zip) 
available here https://www.bl.uk/collection-metadata/downloads#



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-08-10 Thread aidhog
aidhog added a comment.


  Regarding the dataset, we are working on inserting DBLP into Wikibase. It 
might be a good test case? The scale is sufficient to be a challenge without 
being overwhelming, and the dumps are available for download.
  
  A downside is that it's mostly monolingual (mostly English labels, but in the 
case of papers in other languages, to the best of my knowledge, there is no 
translation, and no indication of language).
  
  Another option, of course, would be to use self-contained extracts of 
Wikidata for testing. :)



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-08-04 Thread RShigapov
RShigapov updated the task description.



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-07-28 Thread RShigapov
RShigapov added a comment.


  Regarding the dataset for testing: it would be good to formulate some 
requirements for it. Then we could just create a synthetic dataset. For 
example, in the performance analysis with RaiseWikibase I used randomly 
generated strings of fixed length and different numbers of claims. I did it 
for one datatype only (`string`). The public dataset for this ticket should 
probably cover more datatypes.
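  
  A synthetic dataset of this kind could be generated roughly as follows. This is a Python sketch under stated assumptions and not the code used in the RaiseWikibase analysis: the function names are made up, the item layout is only loosely modelled on Wikibase entity JSON, and the property ID `P1` is a placeholder that would have to exist with datatype `string`.

```python
import random
import string


def random_string(rng: random.Random, length: int) -> str:
    """Random alphanumeric string of exactly `length` characters."""
    return "".join(rng.choices(string.ascii_letters + string.digits, k=length))


def synthetic_item(rng: random.Random, n_claims: int, length: int = 10,
                   property_id: str = "P1") -> dict:
    """One item with an English label and n_claims string statements."""
    statements = [
        {
            "mainsnak": {
                "snaktype": "value",
                "property": property_id,
                "datavalue": {"value": random_string(rng, length), "type": "string"},
            },
            "type": "statement",
            "rank": "normal",
        }
        for _ in range(n_claims)
    ]
    return {
        "type": "item",
        "labels": {"en": {"language": "en", "value": random_string(rng, length)}},
        "claims": {property_id: statements} if statements else {},
    }


def synthetic_dataset(n_items: int, n_claims: int, length: int = 10,
                      seed: int = 0) -> list:
    """Reproducible list of synthetic items; the seed fixes the content."""
    rng = random.Random(seed)
    return [synthetic_item(rng, n_claims, length) for _ in range(n_items)]


dataset = synthetic_dataset(n_items=3, n_claims=5, length=8, seed=1)
print(len(dataset), len(dataset[0]["claims"]["P1"]))  # 3 5
```

  Fixing the seed keeps runs comparable across importers; covering more datatypes would mean adding further statement builders next to the `string` one.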



[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API

2021-07-27 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.
