Neunhoef added a subscriber: Neunhoef.
Neunhoef added a comment.

This is a report about an actual experiment.
I downloaded 20150126.json.gz to an AWS r3.2xlarge instance (61 GB RAM, 80 GB SSD, 8 vCPUs) and then used ArangoDB V 2.4.1 to import the first 2961954 documents in the file. I created the same indexes as you and used this command:

    gunzip -c 20150126.json.gz | grep -v '^\[' | sed -e 's/},$/}/g' | head -n 2961954 | time arangoimp --file - --type json --collection wikidata --overwrite true

Here are the usage statistics:

- The import took 15 minutes on that machine.
- The database used at most 10.0 GB resident memory during the import.
- After the WAL was flushed following the actual import, it used 9.2 GB.
- This comes from:
  - 6.811.237.720 bytes actual data (shaped), this is about 2300 b/document
  - 489.390.288 bytes of shape data, this is about 165 b/document
  - 672.169.048 bytes of data for all four indexes together, this is about 226 b/document
- The actual data files on disk (data+shapes without indexes): 7.280.736.336 bytes, which is about 2458 b/document

For comparison: the raw (unzipped) JSON data for these documents was 7.390.007.296 bytes (as reported by arangoimp).

When I shut down the database server and restart it, loading this collection (which rebuilds the 4 indexes in memory) takes about 2-3 minutes, which is well below 10 minutes. Explicit unloading is pretty fast. Reloading the collection in the still-running server takes about as long.

TASK DETAIL
  https://phabricator.wikimedia.org/T88549

To: Smalyshev, Neunhoef
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke, daniel

_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
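
The per-document averages quoted in the report can be re-derived from the byte totals and the document count. The following is only a minimal sketch of that arithmetic: the byte totals and the document count are copied from the report, while the dictionary labels and the helper name `bytes_per_document` are made up for illustration and are not part of ArangoDB or arangoimp.

```python
# Figures copied from the report; nothing here is measured anew.
DOCUMENTS = 2_961_954  # number of documents imported

# Byte totals as stated in the report (labels are illustrative only).
sizes_in_bytes = {
    "shaped data": 6_811_237_720,
    "shape data": 489_390_288,
    "four indexes": 672_169_048,
    "data files on disk": 7_280_736_336,
    "raw JSON": 7_390_007_296,
}


def bytes_per_document(total_bytes: int, documents: int = DOCUMENTS) -> float:
    """Return the average number of bytes per document."""
    return total_bytes / documents


# Average size per document for every category in the report.
per_document = {
    name: bytes_per_document(total) for name, total in sizes_in_bytes.items()
}

for name, value in per_document.items():
    print(f"{name}: {value:.0f} b/document")
```

Running the script prints one average per category, which can be compared line by line against the "b/document" values in the statistics list.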