This might still be optimizable in pure Racket; otherwise, mixing Racket with R might not be a bad idea for this and other reasons.

Details...

I played with this briefly late last night after emails with Ryan, without finding a substantially faster way that still looked elegant as Racket code. It did appear that the hit was *not* from GC (not even when a huge list was involved, which can be bad for some GCs), but from either the number parsing or the basic file port I/O. (BTW, the "regexp-match*" approach was more expensive than I would've guessed.)

If speeding this up were important for a consulting client, I would next do something that didn't look elegant as Racket code, and put it in a reusable module with an elegant interface. Offhand, I would probably next try one of the following two approaches, and if neither of those worked, make a C extension that was called once per file: (1) byte-by-byte reads from buffered I/O with a handwritten Racket DFA, probably doing the conversion to a floating-point number as we go; or (2) unbuffered block reads into byte strings, sized for optimal file block I/O, and then parsing numbers out of those byte strings quickly. (Just doing a quick brain dump here, since clients need me to do different things now.)
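
To make approach (1) a bit more concrete, here is a rough, unbenchmarked sketch of what a handwritten DFA might look like. The name "read-floats" is made up for illustration, and the input format is an assumption: numbers with an optional sign, digits, and an optional fractional part (no exponents), with any other byte treated as a separator. I have not measured whether this actually beats the approaches tried so far.

```racket
#lang racket/base

;; Sketch only: `read-floats` is a hypothetical helper, not a library function.
;; Assumed input: optional sign, digits, optional fraction, no exponents.
;; Any other byte is treated as a separator between numbers.
(define (read-floats in)
  (let loop ([state 'between] ; one of 'between, 'int, 'frac
             [sign 1.0]
             [mant 0.0]       ; all digits seen so far, as a flonum
             [div 1.0]        ; 10^(number of fractional digits)
             [acc '()])       ; numbers read so far, in reverse order
    (define b (read-byte in))
    (cond
      [(eof-object? b)
       (reverse (if (eq? state 'between)
                    acc
                    (cons (/ (* sign mant) div) acc)))]
      [(<= 48 b 57) ; ASCII digit
       (define m (+ (* 10.0 mant) (- b 48)))
       (if (eq? state 'frac)
           (loop 'frac sign m (* 10.0 div) acc)
           (loop 'int sign m div acc))]
      [(and (= b 45) (eq? state 'between)) ; leading "-"
       (loop 'int -1.0 0.0 1.0 acc)]
      [(and (= b 43) (eq? state 'between)) ; leading "+"
       (loop 'int 1.0 0.0 1.0 acc)]
      [(and (= b 46) (not (eq? state 'frac))) ; first "." in a number
       (loop 'frac sign mant div acc)]
      [(eq? state 'between) ; separator while between numbers
       (loop 'between 1.0 0.0 1.0 acc)]
      [else ; separator that ends the current number
       (loop 'between 1.0 0.0 1.0 (cons (/ (* sign mant) div) acc))])))
```

For example, (read-floats (open-input-string "3.25 -0.5\n42,+7.125")) should produce '(3.25 -0.5 42.0 7.125), and (call-with-input-file "data.txt" read-floats) would run it over a file port (the file name is just a placeholder). Accumulating the digits as one value and dividing once at the end avoids some of the rounding drift you would get from adding up tenths and hundredths separately, but it is still only accurate while the digits fit in a flonum's 53 bits.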

Tools like Mathematica and R presumably have had their read-lots-of-numbers-from-a-file code made pretty fast. It's OK to call R or Mathematica judiciously from Racket for big data and other number-crunching purposes. (I have a consulting client who mixes these two tools well, and currently calls out from Racket to an isolated R process on other cores, through a stdio interface. Originally, they did this with in-process C extensions, but separate processes are much better for a few reasons.) That said, I understand that Dr. Neil T. is well on the way to putting more R functionality into pure Racket, so I assume more and more people over time will be doing their numeric work in pure Racket.

Neil V.

____________________
 Racket Users list:
 http://lists.racket-lang.org/users