Nice, boia01! In my same `/usr/share/nim/**.nim` test I get 768 microseconds
for your version and 2080 microseconds for just doing the memSlices approach.
So, a 2.7x speed-up, a bit less than the 4x I saw when I last compared the
approach in two C versions. Maybe unrolling. Dunno.
@alfrednewman - if the sta…
Just for fun, I ported and hacked together a self-contained Nim version of
Daniel Lemire's avxcount:
[https://gist.github.com/aboisvert/3f89bc0ae0a2168fcf35ccca98177f6a](https://gist.github.com/aboisvert/3f89bc0ae0a2168fcf35ccca98177f6a)
(I didn't bother with the loop-unrolled versions)
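For context, the hot loop that a SIMD line counter accelerates is just a byte scan for `'\n'`. Here is a minimal scalar sketch of my own (not code from the gist) showing the baseline the AVX version speeds up:

```nim
# Scalar baseline for the task avxcount accelerates: count '\n' bytes.
# The AVX2 version replaces this loop with vector compares and popcounts.
# This sketch is mine, not code from the gist.
proc countNewlines(data: openArray[char]): int =
  for c in data:
    if c == '\n':
      inc result

when isMainModule:
  echo countNewlines("one\ntwo\nthree\n")  # 3
```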
Hi, thanks to all of you for the help.
Yes, I'm pre-calculating things. In the data orchestration process I'm involved
in, I can usually estimate the time of a rendering based on the number of rows
I'm processing. It is a linear process and the processing time is typically not
much affected as…
Yeah. Depending on what he's doing, same-file dynamic estimation might also
work. Good point, @jlp765.
On my system the 5.4 MB of `/usr/share/nim/**.nim` gets counted in about 4
milliseconds - over 1.3 GB/s, probably faster than all but the most
powerhouse NVMe/disk-array IO. This is why I su…
How giant is a "giant text file"?
On my machine a 75 MB file takes roughly 0.12 s to count the lines (it is dummy
data, so not very random).
If the file is gigabytes in size, then close enough might be good enough.
I didn't see @cblake mention it, but you could count the bytes to read 100
lines of a big f…
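A sketch of that estimation idea, assuming Unix `"\n"` line endings and roughly uniform line lengths; the helper name `estimateLineCount` is mine, not from the thread:

```nim
import os, streams

# Estimate total lines: read the first `sampleLines` lines, compute the
# average bytes per line, and divide the file size by it. Assumes Unix
# "\n" endings and roughly uniform line lengths (a rough estimate only).
proc estimateLineCount(path: string; sampleLines = 100): int =
  let total = getFileSize(path)        # cheap filesystem metadata
  var s = newFileStream(path, fmRead)
  if s == nil: return 0
  defer: s.close()
  var sampledBytes = 0
  var n = 0
  var line = ""
  while n < sampleLines and s.readLine(line):
    sampledBytes += line.len + 1       # +1 for the stripped "\n"
    inc n
  if sampledBytes == 0: return 0
  result = int(total * n.int64 div sampledBytes.int64)
```

For truly uniform data (e.g. fixed-width records) this is exact; for skewed line lengths it only gives a ballpark figure.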
So, one other thing that is _probably_ obvious but bears mentioning just in
case it didn't occur to @alfrednewman - the number of addressable/seekable
bytes in a file is usually maintained by any modern filesystem on any modern OS
as cheaply accessed metadata. So, if what you really need is not…
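Getting that byte count needs no read of the file contents at all, just a metadata lookup (the filename below is a placeholder):

```nim
import os

# getFileSize reads filesystem metadata (stat on POSIX), so it costs
# one syscall, not a scan of the file's bytes.
let nBytes = getFileSize("foo.txt")   # "foo.txt" is a placeholder path
echo nBytes
```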
Please also compare this thread:
[https://forum.nim-lang.org/t/1164#18006](https://forum.nim-lang.org/t/1164#18006)
I have not yet used SIMD instructions myself in Nim, but there are some hints
in the Forum already.
For line counting, the different end-of-line marks for Unix/Windows/Mac make…
Guys, thank you for your help.
@Stefan_Salewski, yes, speed is an important point for me. I found the link you
provided (about SIMD) very interesting ... however, I do not know how to do
this using Nim. Could you please help?
Even to help newbies like me, I thought to include the response of this…
If speed is really important for you, you may consider SIMD instructions.
D. Lemire gave an example for this in his nice blog:
[https://lemire.me/blog/2017/02/14/how-fast-can-you-count-lines/](https://lemire.me/blog/2017/02/14/how-fast-can-you-count-lines/)
@jlp765 - good catch. I thought of that, too (I actually wrote that `memSlices`
stuff), and almost went back and added a note later, but you beat me to it.
I'm still unaware of relative timings on platforms other than what I
personally use and would be interested to hear reports, but on Lin…
Even faster (avoiding some string allocations):
import memfiles
var i = 0
for line in memSlices(memfiles.open("foo")):
  inc(i)
echo i
It sounds like you will have many regular files (i.e., random-access/seekable
inputs as opposed to things like Unix pipes). On Linux with glibc,
memfiles.open is probably the fastest approach; memSlices uses memchr
internally to find line boundaries. E.g. (right from the memfiles
documentation):
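The referenced example follows this pattern; the sketch below is my paraphrase of the memSlices idiom, not a verbatim quote of the documentation, and the filename is a placeholder:

```nim
import memfiles

# Count lines via a memory-mapped file: memSlices yields '\n'-delimited
# slices of the mapping directly, with no per-line string allocation.
proc countLines(path: string): int =
  var mf = memfiles.open(path)
  defer: mf.close()
  for slice in memSlices(mf):
    inc result

when isMainModule:
  echo countLines("foo")   # "foo" is a placeholder filename
```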
Hello,
Before processing a giant txt file, I need to know in advance how many lines
that file has. Since I will have to process multiple files, it would be
important to perform this line-counting operation as quickly as possible.
What is the fastest way to know how many lines a txt file has?
I…