[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-05-09 Thread dr0ptp4kt
dr0ptp4kt added a comment. On the gaming-class 2018 desktop, although the `bufferCapacity` value at 100 sped things up as described on this here ticket, application of the CPU governor change did not seem to have any additional bearing (it took 2.47 days as compared to its previous

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-05-09 Thread dr0ptp4kt
dr0ptp4kt added a comment. And for the second run in T362920: Benchmark Blazegraph import with increased buffer capacity (and other factors) we saw that this took about 3089 minutes, or about 2.15 days, for the scholarly article entity graph

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-05-06 Thread dr0ptp4kt
dr0ptp4kt added a comment. In T362920: Benchmark Blazegraph import with increased buffer capacity (and other factors) we saw that this took about 3702 minutes, or about 2.57 days, for the scholarly article entity with the CPU governor change
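As a quick check on the durations quoted in these comments, the minute counts can be converted to days. This is only a minimal arithmetic sketch; the helper name `minutes_to_days` and the run labels are illustrative and not taken from the ticket.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def minutes_to_days(minutes: float) -> float:
    """Convert a wall-clock duration in minutes to days."""
    return minutes / MINUTES_PER_DAY


# Minute counts as quoted in the comments; labels are illustrative only.
runs = [("run reported 2024-05-06", 3702), ("second run", 3089)]

for label, minutes in runs:
    print(f"{label}: {minutes} min = {minutes_to_days(minutes):.2f} days")
```

Both figures come out in days rather than hours: roughly 2.57 and 2.15 respectively.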

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-10 Thread dr0ptp4kt
dr0ptp4kt added a comment. Good news. With the N-triples style scholarly entity graph files, with a buffer capacity of 100, a write retention queue capacity of 4000, and a heap size of 31g, on the gaming-class desktop, it took about 2.40 days. Recall that with buffer capacity of

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-08 Thread dr0ptp4kt
dr0ptp4kt added a comment. Update: With the buffer capacity at 100, file number 550 of the scholarly graph was imported as of `Mon Apr 8 03:22:08 PM CDT 2024`. So, under 28 hours so far (buffer capacity at 10 was more than 36 hours). Processing

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-07 Thread dr0ptp4kt
dr0ptp4kt added a comment. With bufferCapacity at 100, kicked it off again with the scholarly article entity graph files: ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ date | tee loadData.log; time ./loadData.sh -n wdq -d /mnt/firehose/split_0/nt_wd_schol -s 0 -e 0

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-07 Thread dr0ptp4kt
dr0ptp4kt added a comment. Update. On the gaming-class machine it took about 3.25 days to import the scholarly article entity graph, using a buffer capacity of 10 (compare this with 5.875 days on wdqs1024). This resulted in
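To put the two durations side by side, the following sketch computes the ratio and the wall-clock time saved. It uses only the two figures quoted in the comment (3.25 days and 5.875 days); the function and variable names are illustrative.

```python
def speedup(baseline_days: float, candidate_days: float) -> float:
    """How many times faster the candidate run was than the baseline."""
    return baseline_days / candidate_days


wdqs1024_days = 5.875  # as quoted for wdqs1024
desktop_days = 3.25    # as quoted for the gaming-class machine, buffer capacity 10

ratio = speedup(wdqs1024_days, desktop_days)
saved_hours = (wdqs1024_days - desktop_days) * 24

print(f"{ratio:.2f}x faster, about {saved_hours:.0f} hours saved")
```

On these numbers the desktop run is roughly 1.8 times faster, a difference of about 63 hours.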

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-05 Thread dr0ptp4kt
dr0ptp4kt added a comment. Just updating on how far along this run is, file 550 of the scholarly article entity side of the graph is being processed. There are files 0 through 1023 for this side of the graph. Note that I did think to `tee` output this time around so that generally/hopefully

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-04 Thread dr0ptp4kt
dr0ptp4kt added a comment. Following roughly the procedure in P54284 to rename the Spark-produced graph files (and updating `loadData.sh` with `FORMAT=part-%05d-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz` and still having a `date` call after
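The `FORMAT` value above is a printf-style pattern with a zero-padded, five-digit file index. A minimal sketch of how such a pattern expands to file names, assuming `loadData.sh` substitutes the file number into `FORMAT` printf-style (the helper `file_name` is hypothetical):

```python
# Pattern copied from the comment; %05d is a zero-padded five-digit index.
FORMAT = "part-%05d-46f26ac6-0b21-4832-be79-d7c8709f33fb-c000.ttl.gz"


def file_name(index: int) -> str:
    """Expand the printf-style pattern for a given file number."""
    return FORMAT % index


print(file_name(0))    # first file in the set
print(file_name(550))  # an arbitrary file number, for illustration
```

Python's `%` operator follows the same `%05d` convention as shell `printf`, so the expansion should match what the script would produce under that assumption.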

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-03 Thread dr0ptp4kt
dr0ptp4kt added a comment. This morning of April 3 around 6:25 AM I had SSH'd to check progress, and it was working, but going slowly, similar to the day before. It was on a file number in the 1200s, but I didn't write down the number or copy terminal output; I do remember seeing it was

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-02 Thread dr0ptp4kt
dr0ptp4kt added a comment. Now this is interesting: we're now past 4 days (about 4 days and 1 hour) of this running, and with buffer capacity at 10 instead of 100 (but this time without any gap between the batches of files), there's still a good way to go yet. Processing

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-04-01 Thread dr0ptp4kt
dr0ptp4kt added a comment. The run with buffer at 100 and heap size at 31g and queue capacity at 4000 on the gaming-class desktop completed. Processing wikidump-01332.ttl.gz totalElapsed=13580ms,

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-21 Thread dr0ptp4kt
dr0ptp4kt added a comment. **AWS EC2 servers** After exploring a battery of EC2 servers, four instance types were selected and the commands posted were run. The configuration most like our `wdqs1021-1023` servers (third generation Intel Xeon) is listed first. The fastest option

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-21 Thread dr0ptp4kt
dr0ptp4kt added a comment. By the way, I'm attempting a run for the first 1332 munged files (one shy of the 1333 where it terminated last time around) with buffer at 100 and heap size at 31g and queue capacity at 4000 on the gaming-class desktop to see whether this imports smoothly and

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-20 Thread dr0ptp4kt
dr0ptp4kt added a comment. The run to check with heap size of 31g, queue capacity of 8000, and buffer at 100 stalled at file 107.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-20 Thread dr0ptp4kt
dr0ptp4kt added a comment. Attempting a run with a **queue capacity of 8000** and buffer of 100 and heap size of 16g on the gaming-class desktop to mimic the MacBook Pro, things were slower than a queue capacity of 4000 and buffer of 100 and heap size of 31g on the gaming-class

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-19 Thread dr0ptp4kt
dr0ptp4kt added a comment. **About Amazon Neptune** Amazon Neptune was set to import using the simpler N-Triples file format with its serverless configuration at 128 NCUs (about 256 GB of RAM with some attendant CPU). We don't use N-Triples files in our existing import process, but it

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-19 Thread dr0ptp4kt
dr0ptp4kt added a comment. **Going for the full import** Further import commenced from there with a `bufferCapacity` of 100: ubuntu22:~/rdf/dist/target/service-0.3.138-SNAPSHOT$ date Mon Mar 4 06:31:06 PM CST 2024

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-19 Thread dr0ptp4kt
dr0ptp4kt added a comment. **More about bufferCapacity** Similarly, a run with 150 munged files was attempted with the buffer in RWStore.properties increased from 10 to 100, with the target as the NVMe. com.bigdata.rdf.sail.bufferCapacity=100
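The change described here is a one-line edit to a Java-style properties file. A minimal sketch of making that edit programmatically, under stated assumptions: the property name `com.bigdata.rdf.sail.bufferCapacity` is the one quoted in the comment, while the helper `set_property` and the sample file contents are hypothetical and do not reproduce the real RWStore.properties.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with key set to value, appending it if absent."""
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")  # replace the existing setting
            found = True
        else:
            out.append(line)              # keep comments and other keys as-is
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical sample contents, not the real RWStore.properties.
sample = "# sample contents, not the real file\ncom.bigdata.rdf.sail.bufferCapacity=10\n"

print(set_property(sample, "com.bigdata.rdf.sail.bufferCapacity", "100"))
```

The sketch only handles simple `key=value` lines; properties files that use `:` separators or line continuations would need a fuller parser.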

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-19 Thread dr0ptp4kt
dr0ptp4kt added a comment. **More about NVMe versus SSD** Runs were also done to see the effects on 150 munged files (out of a set of 2202 files) from the full Wikidata import, which allows for exercising more disk related pieces. This was tried with both types of target disk - SATA SSD

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-19 Thread bking
bking closed subtask T358727: Reclaim recently-decommed CP host for WDQS (see T352253) as Resolved.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-08 Thread bking
bking added a comment. @ssingh @dr0ptp4kt hold up on the testing on your hosts for now...we might be able to get an NVMe into this year's budget, will let you know. @dr0ptp4kt If you want to run i/o tests on the existing hosts, I recommend the approach detailed in this wikitech page

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-08 Thread dr0ptp4kt
dr0ptp4kt added a subscriber: ssingh. dr0ptp4kt added a comment. @ssingh would you mind if the following command is run on one of the newer cp hosts with a new higher write throughput NVMe? If so, got a recommended node? I don't have access, but I think @bking may. `sudo sync; sudo

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-08 Thread dr0ptp4kt
dr0ptp4kt added a comment. Thanks @bking ! It looks like the NVMe in this one is not a higher speed one for writes, and I'm also wondering if perhaps its write performance has degraded with age. I'll paste in the results here, but this was slower than the other servers, ironically (although

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread bking
bking added a comment. @dr0ptp4kt `wdqs1025` should be ready for your I/O tests. Let us know how it goes!

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread gerritbot
gerritbot added a comment. Change 1009574 **merged** by Bking: [operations/puppet@production] wdqs: move monitoring logic into role declaration https://gerrit.wikimedia.org/r/1009574

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread gerritbot
gerritbot added a project: Patch-For-Review.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread gerritbot
gerritbot added a comment. Change 1009574 had a related patch set uploaded (by Bking; author: Bking): [operations/puppet@production] wdqs: make "monitoring_tier" var optional https://gerrit.wikimedia.org/r/1009574

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread dr0ptp4kt
dr0ptp4kt updated the task description.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread dr0ptp4kt
dr0ptp4kt updated the task description.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread dr0ptp4kt
dr0ptp4kt added a comment. First, adding some commands that were used for Blazegraph imports on Ubuntu 22.04. I had originally tried a good number of EC2 instance types, and then after that went back to focus on just four of them with a sequence of repeatable commands (this wasn't scripted,

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-06 Thread dr0ptp4kt
dr0ptp4kt updated the task description.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-06 Thread bking
bking reopened subtask T358727: Reclaim recently-decommed CP host for WDQS (see T352253) as Open.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-05 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-05 Thread gerritbot
gerritbot added a comment. Change 1008943 **merged** by Bking: [operations/puppet@production] partman: configure wdqs1025 partioning https://gerrit.wikimedia.org/r/1008943

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-05 Thread gerritbot
gerritbot added a comment. Change 1008943 had a related patch set uploaded (by Bking; author: Bking): [operations/puppet@production] partman: configure wdqs1025 partioning

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-05 Thread gerritbot
gerritbot added a project: Patch-For-Review.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-05 Thread bking
bking added a subtask: T358727: Reclaim recently-decommed CP host for WDQS (see T352253).

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-04 Thread dr0ptp4kt
dr0ptp4kt updated the task description.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-04 Thread dr0ptp4kt
dr0ptp4kt moved this task from Incoming to Current work on the Wikidata-Query-Service board. dr0ptp4kt removed a project: Wikidata-Query-Service.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-04 Thread dr0ptp4kt
dr0ptp4kt updated the task description.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-04 Thread dr0ptp4kt
dr0ptp4kt updated the task description.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-04 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-04 Thread dr0ptp4kt
dr0ptp4kt changed the task status from "Open" to "In Progress". dr0ptp4kt triaged this task as "Medium" priority. dr0ptp4kt claimed this task. dr0ptp4kt added projects: Wikidata-Query-Service, Discovery-Search (Current work). dr0ptp4kt updated the task description.