If this comes out garbled again the answer is java 1.6 on 1.8Ghz windows xp
i get about 500,000 rows/second reading single character values via xerial jdbc > From: john.elr...@fenestra.com > Jason Freshwater wrote: > > 2. Read every record into my application and use a hashtable to > > do my own select distinct. This is pretty good, about 0.6s to > > achieve the distinct list (I can count each distinct value > > along the way which is a useful side effect). > > Would you mind my asking what language and what OS you are using? My > best performance tests have given ~10,000 rows per second on Windows > based machines, your results indicate two orders of magnitude better. > > John Sure, its worth just clarifying one thing the sample script I attached generates a ~1M row table, the sample data I refer to timings of is a 384,000 record csv file with 6 short fields e.g. ABCDEF,Z,A,X1,200.0,10.0 per record. So there is maybe a factor of 3 difference in amount of data I'm running java 1.6 on windows xp using the xerial jdbc driver which unpacks the native libraries for sqlite when it runs. 1.8GHz core 2 duo, with 4Gb ram (but none of this is memory bound) if you are interested, output from my test program is as follows I dont think i have made any silly errors of arithmetic - but you never know! I use stat.executeUpdate("pragma synchronous=OFF;"); stat.executeUpdate("pragma count_changes=OFF;"); stat.executeUpdate("pragma journal_mode=OFF;"); stat.executeUpdate("pragma read_uncomitted=OFF;"); stat.executeUpdate("pragma temp_store=MEMORY;"); stat.executeUpdate("pragma cache_size=1280;"); stat.executeUpdate("pragma page_size=4096;"); before doing anything numbers suggest about 500,000 rows/second reading (last result set below) --------- # read csv and manually insert via jdbc insert into sd values (?, ?, ?, ?, ?, ?) 100000 200000 300000 time for 384001 inserts: 6734 ms 57024.2 rows/second # run select distinct in db and read results select distinct row_ from sd ROW_ A B C D E F G H I J K L M N O P query time for 16 rows = 2531ms # this just reads the data from all columns# in the select and sums the length select rowid, row_ from sd order by row_ total data length 2576895 query time for 384000 rows = 3969ms # read the rows without sorting select row_ from sd total data length 384000 query time for 384000 rows = 563ms # distinct with count select row_,count(row_) from sd group by 1 ROW_ count(row_) A 24000 B 24000 C 24000 D 24000 E 24000 F 24000 G 24000 H 24000 I 24000 J 24000 K 24000 L 24000 M 24000 N 24000 O 24000 P 24000 query time for 16 rows = 2578ms # this is the hashmap distinct with the counts printed below D = 24000 O = 24000 K = 24000 H = 24000 C = 24000 P = 24000 B = 24000 L = 24000 G = 24000 I = 24000 M = 24000 A = 24000 F = 24000 J = 24000 N = 24000 E = 24000 query time for hash distinct: select row_ from sd: 384000 rows 593ms total time 14500ms _________________________________________________________________ Use Hotmail to send and receive mail from your different email accounts. http://clk.atdmt.com/UKM/go/167688463/direct/01/ _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users