@Araq - fabulous. I think a B-Tree in the stdlib would be great.
@mratsim - I think you meant `collections.intsets` with a 't' which does _not_
yield items() in key order.
The built-in `set[up to int16]` type _does_ (implicitly) order its keys,
though. @cdome also did not mention how large the
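To make that implicit ordering concrete, a minimal sketch (variable names mine): a built-in `set` is a bitvector, so iteration yields elements in ascending key order no matter the insertion order, which `IntSet` does not guarantee.

```nim
# Built-in sets are bitvectors, so iteration walks the bits
# in ascending order regardless of insertion order.
var s: set[uint8]
for x in [7'u8, 2, 5]:
  s.incl x

var order: seq[uint8] = @[]
for x in s:
  order.add x
doAssert order == @[2'u8, 5, 7]
```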
@Tiberium, in the `csources/build.sh` context you should only need to remove
the `-w` and change the `-fno-strict-aliasing` in `COMP_FLAGS` since it's just
a shell script with a zillion gcc invocations. Compiling other nim code, I only
had to change my nim.cfg's `gcc.options.always = "-Wall 2>>/
@lightness1024 - good point. Another argument in favor of "KeyOrderedTable"
would be that it would probably show up in searches of the more generic
"OrderedTable". I don't think Nim has a key-ordered collection (EDIT - in the
stdlib) right now, but Araq has said he has a B-Tree waiting in the wings.
Patching the stdlib to use concepts where it makes sense instead of `openArray`
is also a fine idea. It's used a lot (`algorithm`, `strutils`,
`sequtils`, `random`, `stats`, etc.). `Iterable` & `Indexable` concepts make
sense to me as a decomposition for all those interface use-cases.
One other wrinkle more in line with this thread topic is making things work for
general object types but specific standard key types. In C one might handle
that by taking an `offsetof` for where the key field is within an object and
`sizeof` for the size of object in the array. In Nim, one coul
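The C idiom can be mirrored in Nim; a hedged sketch of my own (the record type is hypothetical, and the offset assertion assumes the usual 64-bit alignment):

```nim
type Rec = object
  weight: float64   # 8 bytes on typical targets
  key: int32        # the field we want to locate

# Byte offset of `key` within Rec, and the stride between
# consecutive Recs in an array - the C offsetof/sizeof idiom.
let keyOff = offsetOf(Rec, key)
let stride = sizeof(Rec)
doAssert keyOff == 8                        # assuming usual alignment
doAssert stride >= keyOff + sizeof(int32)
```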
@aedt - with that C program I posted and just changing the numbers, I see no
change in behavior at all:
N   fib(N)        log_2(Fib(N))  WallTime   UsrCPU  log_1.618(Tm)
57  365435256832  38.410825      88.338572  88.29   9.31268150798
56  225851408384  37.716583      54.570624  54.
If zeroed memory is required semantically, then it's better to compare against
`calloc` not `malloc` in an apples-to-apples sense.
Also, at least on Linux you need to be careful in benchmarking this sort of
thing. As you get to larger allocations the OS can actually give the process
many lin
(I should perhaps qualify beyond -d:release using gcc or clang with high
optimization levels on the back end since jil210 revealed in [another
thread](https://forum.nim-lang.org/t/3198) the baseline 5X performance mystery
arose from using tcc as a back end.)
Also, for the curious, the reason C+
Some greps can indeed be quite good.
Really, I think this algorithmic intelligence should be able to be just used
from Nim `strutils`. Nim strings know their length. Many/most of those
read-only procs like `strutils.find` "should" be able to work on types like
`MemSlice` or really some kind of
It really shouldn't _need_ to break anything, though mistakes can always be
made in some initial pass.
I would think you might also like long option key normalization so that one
doesn't have to remember when typing commands `--how-toSpell_OptKeys-exactly`.
Entering commands is often more ad h
Also, in terms of optimization just replacing:
for str in iter(): file.writeLine(str)
with
for str in iter(): file.write(str & "\n")
yielded a 1.3X speed-up (I.e., it took the time down to 0.14 seconds).
In case anyone else is curious about aedt's fibonacci benchmarks, I found an
explanation for the approx 6..8X time difference of C vs Nim (with gcc as a
backend, on Linux). It took digging into assembly to figure it out, though.
With the Nim version, the nested calls/various module boilerplate a
Thanks for the compliments. You should probably learn about backward shift
deletion. It's a little tricky the first time you see it, but ultimately it's
better than tombstones and a bunch of ad hoc garbage collection rules. You can
look at `lib/pure/collections/tables.nim`. There are really only
Nice, boia01! In my same `/usr/share/nim/**.nim` test I get 768 microseconds
for your version and 2080us for just doing the memSlices approach..So, 2.7X
speed up, a bit less than the 4X I saw when I last compared the approach in two
C versions..maybe unrolling. Dunno.
@alfrednewman - if the sta
@Stefan_Salewski - I think this idea of having a family of overloaded procs to
sort on standard types is very good. I have almost raised it myself several
times.
It would be useful for `cmp`-less `sort` procs on such types to be _stable_
sorts (meaning elements with identical keys retain relative order).
Looking at the implementation of strutils.unescape, it seems to only interpret
the xHH syntax that escape outputs. Of course, the compiler proper is often
changing those raw/quoted string forms into special characters. So, maybe there
is some other approach/trick that could work..sort of like an
So, building on the 40% or so of the benchmarks def did, I rounded out the
other 60% and gave a PR to his repo. With more like hours than years of my time
per program profiling/tweaking/optimizing, I got the Nim implementations to
mostly parity with the top performers (Nim was in the top 3..6 im
Fair point about the LLVM backends, though I don't know their level of C struct
compatibility or if they can lever internal LLVM apis to get that. Anyway, just
trying to help. Code complexity jello (squeeze one place, watch others expand)
is very surely not an exact science.
You've probably al
Garry: --opt:size (and things like gcc -Os) instructs compilers to choose
trade-offs to make object code smaller rather than faster. E.g., on Oderwat's
code, I get 0.19 seconds with just -d:release (with gcc-6.1 on Linux x86_64,
i7-6700k @4.5GHz). If I additionally specify --opt:size the time ri
You can also put in [any of your
nim.cfgs](https://nim-lang.org/docs/nimc.html#compiler-usage-configuration-files)
hint[Processing]=off
You can also disable other categories of diagnostics besides "Processing"
similarly.
Yeah...I thought about \n higher up but didn't want to change the semantics too
much. With that change my time goes down to 0.10 seconds (almost a full 3X
faster than the original 0.28 --opt:size version on the same machine). About
55..60% of that 0.10 seconds is all just libc fwrite stuff. The
I get the same perfect line with the Nim fib(47..57) as well. { Well, slightly
slower: log_1.618(wall) =~ N - 47.35. So, 1.618**(47.67-47.35) = 1.16 or around
16% slower. }
@lightness1024 - fair criticism of C++ STL.
Modulo prime reduction of the hash code to a table address/array index is slow
because division is slow. In my tests using modulo can sometimes **triple** the
entire lookup time for simple integer keys due to the slowness of division! I
recommend you
`collections.heapqueue` does mostly what you want. You would have to remove
duplicates yourself, like below, but probably with an iterator called something
like `unique`.
import collections.heapqueue
var heap = newHeapQueue[int]()
let data = [1, 3, 5, 1, 7, 9, 2, 4, 6, 8]
for x in data: heap.push(x)
This may be obvious, but has anyone else tried changing `-fno-strict-aliasing`
to `-fstrict-aliasing -Wstrict-aliasing` in their `nim.cfg` and seeing if any
gcc warnings arise? When I try this on Linux with gcc-6.3 and gcc-7.2 and devel
nim I don't see any warnings at all. You may also need to t
It sounds like you will have many regular files (i.e., random access/seekable
inputs as opposed to things like Unix pipes). On Linux with glibc,
memfiles.open is probably the fastest approach which uses memchr internally to
find line boundaries. E.g. (right from memfiles documentation),
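The example that got cut off here is presumably close to the one in the `memfiles` docs; a self-contained variant (the temp file name is mine):

```nim
import memfiles, os

# Map a file and iterate its lines; splitting uses memchr internally.
writeFile("mf_demo.txt", "alpha\nbeta\n")
var mf = memfiles.open("mf_demo.txt")
var n = 0
for line in lines(mf):
  inc n
mf.close
removeFile("mf_demo.txt")
doAssert n == 2
```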
+1 on the feature. I also sometimes re-use buffers that would benefit from this.
I also got `nim c`'s code running twice as fast as `nim cpp`'s code. See some
previous discussion:
[https://forum.nim-lang.org/t/1779](https://forum.nim-lang.org/t/1779) \- there
are cases where Nim's generated C code "inspires" the C compiler to "unroll the
last few recursions" and save 4x o
I am not quite sure what "original" benchmark you mean - perhaps the 4.6 s for
Nim vs 6.0 s for C on the original github page at the top of this thread. That
may be small scale code layout/caching/branch predictor effects as you say.
However, the difference in this thread of 8X is _almost certai
No, `gcc` doesn't -- at least not for me with a factor of about 7X, almost
identical to @adrianv 's results. I used both an earlier gcc-7 and a gcc-8.2
just now.
If you don't want to disassemble your binary, that's ok. You can check this
with `gcc -pg` and `gprof` which will count calls for you.
It might be a lot of things, but I'm telling you - there is a `gcc`
optimization, very finicky about context (and probably compiler flags, too),
that reduces function calls by almost exactly his speed-up factor. There's
really no need to guess if I'm right, though. Just add `-pg` to the gcc flag
`for i in 1..12: echo -i and 3`
+1 on some kind of invisibility/private-ness override feature. Another bonus of
that is to use it as a temporary workaround if some 3rd party package has made
something "overly" private as viewed by an "expert" user. While this has come
up in the context of library writing, sometimes users of li
+1 for labels rather than alternate repos (or more labels such as "random idea"
or "half-baked" or such).
But, also +1 if Nim core devs want to aggressively close issues that are too
incomplete to be a "real" RFC or are considered settled questions.
I think a vote would be fine. Ballot box stuffing is possible, but probably
identifiable via sheer numbers since the Nim community is so small unless the
vote gets HackerNews'd or Slashdotted. I also agree that the people who matter
most would never participate in such a vote because they've alr
Does anyone out there routinely use this feature of diverging from the style of
an `import` or as I mentioned just follow the lib's conventions? Part of
@dom96's survey should perhaps ask if that aspect is just a "theoretical
nicety".
I mean, someone cared enough about project-internal style co
Well said, @arnetheduck.
Honestly, for the true master, you could do some pluggable system that allowed
macro/compile-time proc-driven identifier transformation with the current rules
as a simple activation maybe as a pragma on `import`. Then people could
override that current "compromise" batc
By the way, if you are trying to "right size" a set or table to streamline
resizing costs, you can use rightSize(number of members to support). This is
easier than trying to guess the power of two at which such a population is
supported for both initSet and initTable. rightSize uses the same for
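A hedged sketch of that usage, in the stdlib spelling of that era (exact capacities depend on the Nim version's `rightSize` formula):

```nim
import sets, tables

# Ask for room for ~1000 members up front instead of guessing
# the right power of two yourself.
var s = initSet[int](rightSize(1000))
var t = initTable[string, int](rightSize(1000))
doAssert rightSize(1000) >= 1000
```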
Every language has nested loops. My view is that the original article about
C++20 ranges conceived this test/benchmark to be about the cost, if any, of
abstractions, not exactly the performance of "nested loops however you write
it" as suggested by Timothee's code or "the fastest algorithm for g
Oh, I got your point and tried to emphasize that. I wasn't arguing against you
anywhere that I know of. I totally agree with your a,b,&c. I suspect we don't
disagree on anything real at all, but are just discussing different aspects. I
tried to express praise for your approach (if you happened t
Also, I guess a TL;DR part C) - I was never arguing against the tautology that
a faster algorithm is faster. That is kind of a weird straw man position. No
idea how quoting 600 microseconds for it left that impression, but maybe bears
correcting the record. (I didn't assess correctness, though p
Those are all fine points. Asm can sometimes make a bigger difference than
people conditioned to not question compiler output expect (and there are many
such people). This is especially with vector units. A few years back, I wrote
an AVX2 vectorized "minimum" function that ran 24x (twenty four t
@akavel \- you may (or may not) have noticed the "Group By Type" drop-down menu
on the doc page for any given module (e.g. clicking on `tables` from
`theindex`). That's sort of like what you're asking for.
The big `theindex.html` doesn't have that Group By feature, but perhaps should.
The optio
@akavel \- you're welcome.
@both \- I'd have to agree that in a language like Nim which relies heavily on
type-based overloads that long lists of the same ident in that side-bar/ToC
section is not helpful. Right now each is a hyperlink to the specific one. So,
collapsing would remove that choic
Or `[1,2,3].len` which is curiously not among the various listed choices for
whatever reason, and which is even more visually distinct from array indexing,
IMO.
What mashigan wrote may well be best suited to your exact purpose. Another
possibility at least worth a shout out is `collections/critbits` which
implements an efficient `keysWithPrefix`. For more fuzzy still than prefix
matching, you could also do some kind of efficient edit distance-based thin
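For the prefix route, a small sketch of `keysWithPrefix` (keys come back in lexicographic order since critbit trees are sorted):

```nim
import critbits

var t: CritBitTree[void]
for w in ["foo", "foot", "fob", "bar"]:
  t.incl w

var hits: seq[string] = @[]
for k in t.keysWithPrefix("foo"):
  hits.add k
doAssert hits == @["foo", "foot"]
```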
If you want to _actually_ calculate Fibonacci numbers not use it as a (poor)
benchmark for funcalls/recursive overheads or for other purposes, then you may
also find this interesting:
[https://sahandsaba.com/five-ways-to-calculate-fibonacci-numbers-with-python-code.html](https://sahandsaba.com/f
You can also use `memfiles`. There writing/reading is the same as accessing
memory. Besides being possibly simpler presenting an "as if you already read
the whole file into a buffer" view, it may also be much more efficient,
especially for byte-at-a-time operation where other APIs might do a lot
A short reply like this may be inadequate to explain virtual memory mechanisms
if you have never heard of them before. That said, if you have heard in the
past and forgotten this may help.
The `newMemMapFileStream` will call `memfiles.open` with default flags. Default
flags typically just looku
If you can't break your habit, you can always add a few lines near the top
(before all the `release`-dependent switching) of your `nim.cfg`:
@if r:   # Allow short alias -d:r to activate fast release mode
  define:release
@end
Perhaps somewhere you picked u
Either automatic or manual vectorization can also allow twice as many `float32`
numbers to be handled per vector instruction vs `float64` on x86 just like the
ARM or GPU cases. You may need `-march=native` or `-mavx` compiler flags (or
manual intrinsics/assembly) to activate that feature, though
@mratsim is probably intending to refer to the `..` including `50` in Nim while
the C `for` with a `<` excludes `50` doing about 2% less work, but the
terseness and style of his comment may leave the wrong impression.
for i in 0..50: echo i
indeed compiles dow
With such a brief comment, it's hard to know which is why I said "probably
intending". Only one person knows. ;) Maybe he did think iterators cost more.
You are right I did misread his 1..50 as 0..50 {after looking at the first
version of the ggibson Nim code, not the 2nd where he confusingly sw
As @Stefan mentioned the string stuff can be slow. My `MemSlice` techniques
might be more broadly interesting. The tokenizer below only works for "strict"
one-character delimiting, not, e.g., repeated whitespace. Also, I agree
completely with @arnetheduck that algorithm choices matter more than
Just a quick follow-up, with gcc-8.3 on Linux x86_64 Skylake CPU,
profile-guided optimization did get that 176ms time down to 150 ms (With
-d:useNimHash it was 152 ms), a 1.17x boost to 1.43x faster than the C in
@enthus1ast 's `wp.c`.
Of course, with 291,000 unique keys the average external ch
Oh, and two more stats about my input data important to reason about my timings
- 43 MB and 4.8e6 words total (coincidentally close to 1e-3 times my
1/4.8GHz clock cycle).
So, average string length around 43e6/4.8e6 =~ 9 bytes (also a web server log),
and about 150e-3/4.8e6 =~ 31 nsec =~
I wrote: "most lookups are failed lookups", but Oops! 291e3 unique/4.8e6 total
= 6% ==> 94% of lookups are successful. So, the `memcmp` that happens only
after hash codes match does happen almost all the time, and so my particular
data set is more like 16B/(32B+9B) = 39% cached, not 50%. This do
Nim defines a
proc allocCStringArray*(a: openArray[string]): cstringArray
in the `system` module (no need to explicitly `import` it). There is also a
corresponding:
proc deallocCStringArray*(a: cstringArray)
You probably want to u
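Presumably the elided advice is to pair the two calls; a minimal sketch:

```nim
# Build a NULL-terminated char** for a C API, then free it.
let argv = allocCStringArray(["prog", "-a", "input.txt"])
doAssert $argv[0] == "prog"
doAssert $argv[2] == "input.txt"
deallocCStringArray(argv)
```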
You're welcome. Nim core devs are **_very_** willing to work with any and all
comers on pull requests to improve documentation. Ignorance by otherwise
reasonably patient and resourceful newcomers is **_not_** easy to simulate (in
any project, really). In some ways it's a resource that decays wit
You probably want `proc getFileHandle*(f: File): FileHandle` from the system
module (no need to import). That just calls through to the C fileno().
There is also `proc open*(a1: cstring, a2: cint): cint` in the `posix` module:
import posix
let fd = open("/dev/mydevice", O_RDONLY)
if you want an unbuffered raw file handle which sounds like it might be the
case (and you don't need portability which is implied by
Well, to make it work without the rest of your type environment, I changed just
the first couple lines of your posted code to:
import posix, selectors
proc readDevice(dev: string) =
let devfd = posix.open(dev, posix.O_RDWR)
Then it compiled fine for me.
0.19.0 is very old in "Nim years" { like "dog years", but even more compressed
;-) }. 0.19.2 works better, but personally I would recommend `git clone
https://github.com/Araq/Nim` and learning how to build that/run right out of
the build. It's not too hard to follow the instructions. I think `sh
You're welcome, and I'm very glad you didn't give up so quickly!
FWIW, I was not sure it was a problem with that version of Nim -- only that
when I changed the type environment enough to compile your code that it worked
for me on a devel version. In the future, providing more whole code context
While it is a little unusual to see it, open addressed tables can support
multiple keys (as a set or associatively as a table). I believe most other
collision resolution schemes can, as well. So, a Nim `Table` is what some
people call a "multi-set". Whether you have unique keys depends upon the
The relevant discussion:
[https://github.com/nim-lang/Nim/pull/2078#issuecomment-73744634](https://github.com/nim-lang/Nim/pull/2078#issuecomment-73744634)
A documentation PR seems reasonable. Go for it. If you need to undo multiple
add calls to `Table` then you should be able loop until you don't find the key
anymore, but yeah, you don't know what order values for the key will arrive in
that loop. There's no secondary order discipline like FIFO or
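The loop-until-gone idea can be sketched like this (`Table.add`, which permits duplicate keys, was the entry point under discussion; it was later deprecated):

```nim
import tables

var t = initTable[string, int]()
t.add("k", 1)          # add() inserts even when the key already exists
t.add("k", 2)
while t.hasKey("k"):   # undoing multiple adds takes a loop
  t.del("k")
doAssert "k" notin t
```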
@Stefan_Salewski, you can also just call the libc memchr (which is what the
current memfiles does to delimit lines aka slices until string conversion). A
good memchr will do the SSE/AVX internally. For example this program:
import memfiles, os, times
var f = memfiles.open(paramStr(1))
@Stefan - this is a classic application of `containsOrIncl` in the set APIs (or
`mgetOrPut` in the table APIs) which are very close to the core "search or
insert" hash table algorithm.
import sets
proc uniqCp[T](s: openArray[T]): seq[T] =
  newSeq(result, 0)
  var h = initSet[T](rightSize(s.len))
  for x in s:
    if not h.containsOrIncl(x): result.add x
Perhaps you want strutils.escape? I.e.:
import strutils
echo escape("\n\t")
Output is "\x0A\x09"
Its output is designed to be as portable an 'input' to
de-escapers/escape-interpreters as possible. So you will see things like \x0A
and \x09 instead of the \n \t because the
If you use memfiles then you can just map the whole file and let the OS page
things in from disk on demand as necessary. This gives you a "whole file buffer
for almost free" kind of user interface. After the first run or file
creation/etc. there may be zero/little IO and also very little copying
> For my use case it is very important, that I can get all the sizes at compile
> time
You may not be able to make all these things available at Nim-compile-time
because they are not fully determined until the C (or whatever) backend compile
is reached. For example,
type foo obje
While doofenstein makes some good points especially about fragility with regard
to adding new fields and old data files, there is some real efficiency charm to
a "just memory map and go" pure binary native data approach. It's not always
fun to be parsing data files again and again. So, I would a
flyx wrote:
> And yes, something between array and seq would be nice. But currently, Nim's
> type system does not allow a type with a runtime-calculated length parameter.
> Allowing it would be a hack similar to C VLAs.
No hack involved here. The length parameter can be part of the object like
Not sure what will be easiest for you to use/understand..I thought the patty
approach pretty nice. That said, to answer your last question, the macros
module has a few interesting things along the lines of what you were asking
for: `parseExpr`, `parseStmt`, and `emit`. The first two can compile
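As a tiny illustration of `parseStmt` compiling a string into code (the macro name is mine):

```nim
import macros

macro fromString(code: static string): untyped =
  # Turn a string of Nim source into an AST at compile time.
  parseStmt(code)

fromString("proc double(x: int): int = 2 * x")
doAssert double(21) == 42
```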
"order" is just ambiguous - ordered by a key or ordered by an action like
insertion. Some resolve that ambiguity easily in their heads and others require
it to be spelled out. For the same reason, "KeyOrdered" would be clearer than
"sorted". Similarly, "PutOrdered" might be better than the curre
While excess can be bad, and there are inconsistencies that should be repaired,
I think a "batteries included" stdlib is a good thing. E.g., the recent
`enumerate` for loop macro in `tests/macros/tforloop_macro1.nim` should
probably be in `lib/pure/sugar.nim`. I think the duplication (`re` and `
`CountTable` is just a histogram and the sorting is by the bin counts not by
the keys. Keys are visited in "hash order". `hashes` does use an identity hash
for integers which could create some confusion from a simple test program if
the modulo the table size post hash transform doesn't change th
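A quick sketch of the count-order (not key-order) behavior:

```nim
import tables

var c = initCountTable[char]()
for ch in "abracadabra":
  c.inc ch
c.sort()                         # reorders by descending count, not by key
doAssert c.largest == ('a', 5)   # 'a' occurs 5 times
```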
Nim `Table` and `HashSet` should be caching the hash code values in their
tables and also using that as a "comparison prefix" (comparing string values
only if the integer hash codes already match). The string hash of the new
inputs is ineliminable - or rather has to be done at least once anyway.
@timothee - @amalek beat me to the punch, but user-visible compilation time is
very sensitive to backend compiler/gcc options. Consider two invocations
compiling my [cligen](https://github.com/c-blake/cligen) test suite (30
programs with somewhat complex macros running at compile-time):
You're welcome, Garry.
Stefan - a simple check for whether pattern has any regex metacharacters (e.g.
'.', '*', etc.) can decide if a pattern is definitely not a regular expression.
If there are any metacharacters in pattern, well only the user can know and
there would have to be a command-line
Ok. This works (replicates Nim performance) for me (on gcc-7.1.0 x86_64 Linux -
not sure about others):
#include <stdio.h>

float fibonacci(int x) {
    return x < 2 ? (float)x : fibonacci(x - 1) + fibonacci(x - 2);
}

void NimMainInner() {
    printf("%.0f\n", fibonacci(50));
}
One other gotcha in this "test" to see how close we already are is that it
seems nim catches the `stderr` output of the gcc invocations. In that sense the
nim `gcc` is as if it were `gcc -w` anyway. If you capture the commands from
`--verbosity:2` and re-execute them from a `/bin/sh` script, you
So, there have been various on again/off again conversations on github about
how the stdlib parseopt could/should behave. Those conversations seemed to have
a pretty narrow audience..basically participants on these github issue threads:
[4620](https://github.com/nim-lang/Nim/issues/4620)
[68
@jlp765 - good catch. I thought of that, too (I actually wrote that `memSlices`
stuff), and almost went back and added a note later, but you beat me to it.
I still am unaware about relative timings on platforms other than what I
personally use and would be interested to hear reports, but on Lin
What Stefan means is doing something like this:
let pattern = paramStr(1)
let rex = pattern.re
and then using `rex` instead of `pattern.re` in the two spots inside the
`while(fIn.readLine(line))` loop.
That one change made this run about 4X faster on my machine. [ With memfiles
@twetzel59 - I use fork (and even vfork) all the time. They work just fine.
@Krux02 - I have never used compiler-as-a-service mode which seems originally
targeted at IDEs, but that feature may apply to testing as well. See
`compiler/service.nim` and `tests/testament/caasdriver.nim`..maybe search
I believe what you are running into is this terminal driver issue:
> [http://unix.stackexchange.com/questions/131105/how-to-read-over-4k-input-without-new-lines-on-a-terminal](http://unix.stackexchange.com/questions/131105/how-to-read-over-4k-input-without-new-lines-on-a-terminal)
Oh, sure. There isn't _usually_ a 1.618**4 kind of work reduction in play,
though. Araq would know, but I doubt the reason NimMainInner is called
through a volatile function pointer is to trick gcc's optimizer, though that is
a happy side effect.
To me, this was just a performance mystery th
I did a little searching and compiling and I'd have to say Jehan is right here
about "absence of evidence". See, [this stackoverflow
thread](https://stackoverflow.com/questions/21214875/gcc-accuracy-of-strict-aliasing-warnings)
for a very simple example to people that is (still in 2017) too comp
@flyx - c99 (and gcc long before it) have VLAs of the
fixed-after-initialization variety...mostly syntactic sugar for alloca, of
course. All that is surely more commonly used than Ada and required no fancy
containers or any constraints about being a leaf function. Such an array style
may be dif
This recursion unpacking/unrolling trick that gcc does (at call-site if
insulated by call via volatile function ptr, and always inside the recursive
impl) is, in my experience, a rare compiler optimization, but maybe it will
catch on. clang does _neither_. If you `objdump -D` the executable (or
Memory hierarchies since the 1990s have broadened the relevance of 1970s
practical wisdom for disk storage. A solid hash table with locality and solid
B-Tree often win today. A great stdlib should have both.
A more obscure bonus of a B-Tree (when compared to, say, balanced binary trees)
is chea
Yeah..Depending on what he's doing, same-file dynamic estimation might also
work. Good point, @jlp765.
On my system the 5.4 MB of `/usr/share/nim/**.nim` gets counted in about 4
milliseconds - over 1.3 GB/sec, probably faster than all but the most
powerhouse nvme/disk array IO. This is why I su