On 2022-03-07 01:08:40 [+0200], Lasse Collin wrote:
> Hello!
Hi,

> I committed something. The liblzma part shouldn't need any big changes,
> I hope. There are a few FIXMEs but some of them might actually be fine
> as is. The xz side is just an initial commit, there isn't even
> --memlimit-threading option yet (I will add it).
> 
> Testing is welcome. It would be nice if someone who has 12-24 hardware
> threads could test if it scales well. One needs a file with like a
> hundred blocks, so with the default xz -6 that means a 2.5 gigabyte
> uncompressed file, smaller if one uses, for example, --block-size=8MiB
> when compressing.

I made a test file:
    Stream    Blocks      CompOffset    UncompOffset        CompSize      UncompSize  Ratio  Check      Padding
         1       777               0               0   2.386.777.028  19.540.326.400  0,122  CRC64            0

One block is 25.165.824 bytes.
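
As a quick sanity check of those numbers (just a sketch, the variable
names are mine, the figures are the ones from the listing): the block
size works out to 24 MiB, and the uncompressed size divided by it,
rounded up, gives the 777 blocks reported.

```python
# Figures copied from the listing; the names are illustrative only.
uncompressed_size = 19_540_326_400  # bytes (UncompSize column)
block_size = 25_165_824             # bytes per block

# Ceiling division: the last block may be only partly filled.
blocks = -(-uncompressed_size // block_size)

print(block_size // 1024**2, "MiB per block")
print(blocks, "blocks")
```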

32 cores:

| $ time ./src/xz/xz -tv tars.tar.xz -T0
| tars.tar.xz (1/1)
|   100 %      2.276,2 MiB / 18,2 GiB = 0,122   1,6 GiB/s       0:11
|
| real    0m11,162s
| user    5m44,108s
| sys     0m1,988s

256 cores:
| $ time ./src/xz/xz -tv tars.tar.xz -T0
| tars.tar.xz (1/1)
|   100 %      2.276,2 MiB / 18,2 GiB = 0,122   3,4 GiB/s       0:05
|
| real    0m5,403s
| user    4m0,298s
| sys     0m24,315s

It appears to work :) If I see this right, either the file is too small
or xz is too fast, but xz does not seem to manage to create more than
100 threads.
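
A rough way to see how many CPUs were actually kept busy is to divide
the CPU time (user + sys) by the wall-clock time from the two runs.
This is only a sketch: the helper name is made up, and the result is an
average over the whole run, not a peak thread count. It comes out at
roughly 31 for the 32-core run and roughly 49 for the 256-core run.

```python
def busy_cpus(real_s, user_s, sys_s):
    """Average number of CPUs kept busy: CPU time divided by wall time."""
    return (user_s + sys_s) / real_s

# Times taken from the two `time` outputs, converted to seconds.
run_32 = busy_cpus(real_s=11.162, user_s=5 * 60 + 44.108, sys_s=1.988)
run_256 = busy_cpus(real_s=5.403, user_s=4 * 60 + 0.298, sys_s=24.315)

print(f"32 cores:  {run_32:.1f} CPUs busy on average")
print(f"256 cores: {run_256:.1f} CPUs busy on average")
```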

And decompression to disk:
| $ time ~bigeasy/xz/src/xz/xz -dvk tars.tar.xz -T0
| tars.tar.xz (1/1)
|   100 %      2.276,2 MiB / 18,2 GiB = 0,122   746 MiB/s       0:24
|
| real    0m25,064s
| user    3m49,175s
| sys     0m29,748s

It appears to block at around 10 to 14 threads or so, and then it hangs
at the end until disk I/O finishes. Decent.
Assuming disk I/O is slow, say 10 MiB/s, and we had 388 CPUs (blocks/2),
would it then decompress the whole file into memory and get stuck on
disk I/O?

In terms of scaling, xz -tv of that same file with -T1…64:

| CPUS: 1
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 85 MiB/s, 3:38
| 
| real  3m38,047s
| user  3m37,404s
| sys   0m0,626s
| CPUS: 2
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 171 MiB/s, 1:49
| 
| real  1m49,296s
| user  3m41,529s
| sys   0m1,433s
| CPUS: 3
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 256 MiB/s, 1:12
| 
| real  1m12,832s
| user  3m40,929s
| sys   0m1,199s
| CPUS: 4
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 341 MiB/s, 0:54
| 
| real  0m54,616s
| user  3m40,596s
| sys   0m1,161s
| CPUS: 5
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 425 MiB/s, 0:43
| 
| real  0m43,900s
| user  3m41,306s
| sys   0m1,038s
| CPUS: 6
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 510 MiB/s, 0:36
| 
| real  0m36,587s
| user  3m41,527s
| sys   0m1,076s
| CPUS: 7
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 591 MiB/s, 0:31
| 
| real  0m31,568s
| user  3m41,559s
| sys   0m1,079s
| CPUS: 8
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 676 MiB/s, 0:27
| 
| real  0m27,579s
| user  3m42,098s
| sys   0m0,966s
| CPUS: 9
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 758 MiB/s, 0:24
| 
| real  0m24,614s
| user  3m42,318s
| sys   0m1,119s
| CPUS: 10
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 844 MiB/s, 0:22
| 
| real  0m22,111s
| user  3m41,353s
| sys   0m1,152s
| CPUS: 11
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 923 MiB/s, 0:20
| 
| real  0m20,219s
| user  3m43,327s
| sys   0m1,311s
| CPUS: 12
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,0 GiB/s, 0:18
| 
| real  0m18,442s
| user  3m41,710s
| sys   0m1,110s
| CPUS: 13
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,1 GiB/s, 0:17
| 
| real  0m17,067s
| user  3m42,102s
| sys   0m1,176s
| CPUS: 14
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,1 GiB/s, 0:15
| 
| real  0m15,861s
| user  3m41,978s
| sys   0m1,171s
| CPUS: 15
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,2 GiB/s, 0:14
| 
| real  0m14,866s
| user  3m42,247s
| sys   0m1,108s
| CPUS: 16
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,3 GiB/s, 0:13
| 
| real  0m13,936s
| user  3m41,086s
| sys   0m1,017s
| CPUS: 17
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,4 GiB/s, 0:13
| 
| real  0m13,200s
| user  3m42,171s
| sys   0m1,137s
| CPUS: 18
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,5 GiB/s, 0:12
| 
| real  0m12,539s
| user  3m43,286s
| sys   0m1,355s
| CPUS: 19
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,5 GiB/s, 0:11
| 
| real  0m11,949s
| user  3m44,354s
| sys   0m1,111s
| CPUS: 20
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,6 GiB/s, 0:11
| 
| real  0m11,216s
| user  3m42,635s
| sys   0m1,202s
| CPUS: 21
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,7 GiB/s, 0:10
| 
| real  0m10,655s
| user  3m41,742s
| sys   0m1,123s
| CPUS: 22
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,8 GiB/s, 0:10
| 
| real  0m10,232s
| user  3m42,328s
| sys   0m1,211s
| CPUS: 23
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,9 GiB/s, 0:09
| 
| real  0m9,812s
| user  3m42,091s
| sys   0m0,935s
| CPUS: 24
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,9 GiB/s, 0:09
| 
| real  0m9,448s
| user  3m42,343s
| sys   0m1,220s
| CPUS: 25
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,0 GiB/s, 0:09
| 
| real  0m9,099s
| user  3m42,985s
| sys   0m1,226s
| CPUS: 26
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,1 GiB/s, 0:08
| 
| real  0m8,750s
| user  3m43,389s
| sys   0m1,401s
| CPUS: 27
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,2 GiB/s, 0:08
| 
| real  0m8,444s
| user  3m43,105s
| sys   0m1,245s
| CPUS: 28
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,3 GiB/s, 0:08
| 
| real  0m8,119s
| user  3m43,075s
| sys   0m1,103s
| CPUS: 29
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,3 GiB/s, 0:07
| 
| real  0m7,850s
| user  3m43,279s
| sys   0m1,202s
| CPUS: 30
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,4 GiB/s, 0:07
| 
| real  0m7,601s
| user  3m43,112s
| sys   0m1,043s
| CPUS: 31
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,5 GiB/s, 0:07
| 
| real  0m7,381s
| user  3m43,070s
| sys   0m1,354s
| CPUS: 32
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,5 GiB/s, 0:07
| 
| real  0m7,241s
| user  3m44,362s
| sys   0m1,247s
| CPUS: 33
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,6 GiB/s, 0:06
| 
| real  0m7,027s
| user  3m44,586s
| sys   0m1,152s
| CPUS: 34
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,7 GiB/s, 0:06
| 
| real  0m6,822s
| user  3m44,385s
| sys   0m1,475s
| CPUS: 35
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,8 GiB/s, 0:06
| 
| real  0m6,637s
| user  3m44,306s
| sys   0m1,263s
| CPUS: 36
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,8 GiB/s, 0:06
| 
| real  0m6,479s
| user  3m45,268s
| sys   0m0,991s
| CPUS: 37
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,9 GiB/s, 0:06
| 
| real  0m6,336s
| user  3m45,405s
| sys   0m1,175s
| CPUS: 38
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,0 GiB/s, 0:06
| 
| real  0m6,183s
| user  3m45,455s
| sys   0m1,153s
| CPUS: 39
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,0 GiB/s, 0:05
| 
| real  0m6,021s
| user  3m45,547s
| sys   0m1,331s
| CPUS: 40
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,1 GiB/s, 0:05
| 
| real  0m5,902s
| user  3m45,937s
| sys   0m1,224s
| CPUS: 41
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,2 GiB/s, 0:05
| 
| real  0m5,772s
| user  3m46,520s
| sys   0m1,261s
| CPUS: 42
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,2 GiB/s, 0:05
| 
| real  0m5,650s
| user  3m46,616s
| sys   0m1,276s
| CPUS: 43
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,3 GiB/s, 0:05
| 
| real  0m5,545s
| user  3m46,671s
| sys   0m1,474s
| CPUS: 44
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,4 GiB/s, 0:05
| 
| real  0m5,429s
| user  3m46,988s
| sys   0m1,264s
| CPUS: 45
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,4 GiB/s, 0:05
| 
| real  0m5,338s
| user  3m46,985s
| sys   0m1,598s
| CPUS: 46
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,5 GiB/s, 0:05
| 
| real  0m5,248s
| user  3m47,202s
| sys   0m1,724s
| CPUS: 47
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,6 GiB/s, 0:05
| 
| real  0m5,138s
| user  3m47,641s
| sys   0m1,339s
| CPUS: 48
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,6 GiB/s, 0:05
| 
| real  0m5,054s
| user  3m48,088s
| sys   0m1,335s
| CPUS: 49
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,7 GiB/s, 0:04
| 
| real  0m4,981s
| user  3m48,815s
| sys   0m1,397s
| CPUS: 50
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,8 GiB/s, 0:04
| 
| real  0m4,890s
| user  3m48,999s
| sys   0m1,601s
| CPUS: 51
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,8 GiB/s, 0:04
| 
| real  0m4,786s
| user  3m48,623s
| sys   0m1,382s
| CPUS: 52
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,9 GiB/s, 0:04
| 
| real  0m4,720s
| user  3m49,048s
| sys   0m1,555s
| CPUS: 53
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,9 GiB/s, 0:04
| 
| real  0m4,658s
| user  3m49,990s
| sys   0m1,712s
| CPUS: 54
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,0 GiB/s, 0:04
| 
| real  0m4,603s
| user  3m52,079s
| sys   0m1,757s
| CPUS: 55
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,1 GiB/s, 0:04
| 
| real  0m4,485s
| user  3m50,508s
| sys   0m1,509s
| CPUS: 56
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,1 GiB/s, 0:04
| 
| real  0m4,444s
| user  3m51,148s
| sys   0m1,764s
| CPUS: 57
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,2 GiB/s, 0:04
| 
| real  0m4,381s
| user  3m51,783s
| sys   0m1,816s
| CPUS: 58
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,3 GiB/s, 0:04
| 
| real  0m4,306s
| user  3m51,901s
| sys   0m1,671s
| CPUS: 59
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,3 GiB/s, 0:04
| 
| real  0m4,250s
| user  3m51,997s
| sys   0m1,809s
| CPUS: 60
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,4 GiB/s, 0:04
| 
| real  0m4,199s
| user  3m52,443s
| sys   0m1,889s
| CPUS: 61
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,4 GiB/s, 0:04
| 
| real  0m4,168s
| user  3m53,326s
| sys   0m1,906s
| CPUS: 62
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,5 GiB/s, 0:04
| 
| real  0m4,114s
| user  3m52,766s
| sys   0m2,308s
| CPUS: 63
| tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,5 GiB/s, 0:04
| 
| real  0m4,074s
| user  3m53,676s
| sys   0m2,001s
| CPUS: 64
| tars.tar.xz: 2.272,9 MiB / 18,2 GiB = 0,122, 4,6 GiB/s, 0:03
| 
| real  0m4,023s
| user  3m53,527s
| sys   0m1,899s

Ideal time with 64 CPUs = time of 1 CPU / 64 = (3 * 60 + 38) / 64 =
3.40625 s, against 4,023 s measured.

Looks okay.
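
The same comparison for a few more thread counts, as a small sketch.
The helper names are mine, and the "real" times are copied from the
runs above. Speedup is the 1-CPU time divided by the N-CPU time, and
efficiency is speedup divided by N; at 64 threads this gives roughly
85 % of the ideal.

```python
def parse_time(text):
    """Convert a `time` value such as '3m38,047s' (comma decimals) to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds.replace(",", "."))

# "real" times from the -T1…64 runs, keyed by thread count.
real_times = {
    1: parse_time("3m38,047s"),
    16: parse_time("0m13,936s"),
    32: parse_time("0m7,241s"),
    64: parse_time("0m4,023s"),
}

def efficiency(threads):
    """Measured speedup over one thread, divided by the thread count."""
    speedup = real_times[1] / real_times[threads]
    return speedup / threads

for n in sorted(real_times):
    ideal = real_times[1] / n
    print(f"-T{n}: ideal {ideal:.2f}s, measured {real_times[n]:.2f}s, "
          f"efficiency {efficiency(n):.0%}")
```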

> If the input is broken, it should produce as much output as the
> single-threaded stable version does. That is, if one thread detects an
> error, the data before that point is first flushed out before the error
> is reported. This has pros and cons. It would be easy to add a flag to
> allow switching to fast error reporting for applications that don't
> care about partial output from broken files.

I guess most of them don't care because an error is usually an abort,
the sooner, the better. It is probably the exception that you want to
decompress it despite the error and maybe go on with the next block and
see what is left.

Sebastian
