Neunhoef added a subscriber: Neunhoef.
Neunhoef added a comment.

This is a report about an actual experiment.

I downloaded 20150126.json.gz to an AWS r3.2xlarge instance (61 GB RAM, 80 GB 
SSD, 8 vCPUs) and then used ArangoDB 2.4.1 to import the first 2961954 
documents in the file. I created the same indexes as you and used this 
command:

  gunzip -c 20150126.json.gz | grep -v '^\[' | sed -e 's/},$/}/g' \
    | head -n 2961954 \
    | time arangoimp --file - --type json --collection wikidata --overwrite true
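
To illustrate what the filter stages in front of arangoimp do, here is a minimal sketch on a made-up three-entity sample. It assumes the dump is laid out as a JSON array with "[" on its own line, one entity per line with a trailing comma, and a closing "]". The function names sample_dump and to_json_lines are hypothetical and only serve this illustration.

```shell
# Hypothetical sample in the assumed dump layout: "[" on its own line,
# one JSON object per line with a trailing comma, then a closing "]".
sample_dump() {
  printf '%s\n' '[' '{"id":"Q1"},' '{"id":"Q2"},' '{"id":"Q3"}' ']'
}

# Same filter stages as in the import command:
#   grep -v '^\['    drops the opening bracket line
#   sed 's/},$/}/g'  strips the trailing comma after each object
#   head -n N        keeps the first N documents (and so never reaches "]")
to_json_lines() {
  grep -v '^\[' | sed -e 's/},$/}/g' | head -n "$1"
}

# Print the first two documents, one JSON object per line.
sample_dump | to_json_lines 2
```

Note that the closing "]" is not removed by grep or sed; in the command above it is dropped only because head stops before the end of the file.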

Here are the usage statistics:

- The import took 15 minutes on that machine.
- The database used at most 10.0 GB resident memory during the import.
- Once the WAL had been flushed following the import, it used 9.2 GB.
- This comes from:
  - 6.811.237.720 bytes actual data (shaped), this is about 2300 b/document
  - 489.390.288 bytes of shape data, this is about 165 b/document
  - 672.169.048 bytes of data for all four indexes together, this is about 226 
b/document
- The actual data files on disk (data+shapes without indexes): 7.280.736.336 
bytes, which is about 2458 b/document
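
The per-document figures follow from dividing each byte count by the number of imported documents. A small sketch to recompute them, using only the numbers quoted above (the helper name per_doc is hypothetical):

```shell
# Number of documents imported, as given in the import command.
docs=2961954

# Print bytes per document, rounded to the nearest whole byte.
per_doc() {
  awk -v bytes="$1" -v docs="$docs" 'BEGIN { printf "%.0f\n", bytes / docs }'
}

per_doc 6811237720   # shaped document data
per_doc 489390288    # shape data
per_doc 672169048    # all four indexes together
per_doc 7280736336   # data files on disk (data + shapes)
```

Because this rounds to the nearest byte, a result can differ by one from a figure that was truncated.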

For comparison: the raw (unzipped) JSON data for these documents was 
7.390.007.296 bytes (as reported by arangoimp).

When I shut down the database server and restart it, loading this 
collection (which rebuilds the 4 indexes in memory) takes about 263 
seconds, which is well below 10 minutes. Explicit unloading is pretty fast. 
Reloading the collection in the still-running server takes about as long.


TASK DETAIL
  https://phabricator.wikimedia.org/T88549

To: Smalyshev, Neunhoef
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, 
Wikidata-bugs, aude, GWicke, daniel


