Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-06-08 Thread Const V
I did implementations and profiling of the following: *BigGrepStrFnd* - Boyer-Moore (grep); *BigGrepBytes* - Rabin-Karp; *BigGrepStr* - Rabin-Karp; *BigGrepScan* - search with a sliding window. Additionally, I implemented them using concurrency. Tested on 100 files, each containing one 100MB line. Searching for
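The concurrent variant is not shown in the post; below is a rough sketch of one way to fan the search out across files (my own code, not the poster's BigGrep* implementations - the file list, buffer sizes, and the searchFile helper are illustrative):

package main

import (
    "bufio"
    "bytes"
    "log"
    "os"
    "sync"
)

// searchFile scans one file and sends every line containing pat to out.
func searchFile(path string, pat []byte, out chan<- []byte) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    sc := bufio.NewScanner(f)
    sc.Buffer(make([]byte, 1<<20), 1<<30) // allow the 100MB one-line files
    for sc.Scan() {
        if bytes.Contains(sc.Bytes(), pat) {
            out <- append([]byte(nil), sc.Bytes()...) // copy: Scan reuses its buffer
        }
    }
    return sc.Err()
}

func main() {
    pat := []byte("test")
    files := os.Args[1:] // e.g. the 100 files of one 100MB line each

    out := make(chan []byte, 16)
    var wg sync.WaitGroup
    for _, path := range files {
        wg.Add(1)
        go func(p string) {
            defer wg.Done()
            if err := searchFile(p, pat, out); err != nil {
                log.Println(p, err)
            }
        }(path)
    }
    go func() { wg.Wait(); close(out) }()

    w := bufio.NewWriter(os.Stdout)
    defer w.Flush()
    for line := range out {
        w.Write(line)
        w.WriteByte('\n')
    }
}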

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Amnon
So what happens when you run your program > /dev/null ? For testing I would write a function that reads from an io.Reader and writes to an io.Writer. Write a unit test which uses a bytes.Buffer to capture the output. On Monday, 9 May 2022 at 04:59:37 UTC+1 Const V wrote: > I'm using OSX. > > The
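A hedged sketch of that unit test (my own; it assumes a grep(pat []byte, r io.Reader, w io.Writer) error function in the same package, like the one Amnon posted elsewhere in the thread):

package main

import (
    "bytes"
    "strings"
    "testing"
)

func TestGrepFindsMatchingLines(t *testing.T) {
    in := strings.NewReader("no match here\nthis line has test in it\nanother line\n")
    var out bytes.Buffer // captures the output instead of writing to stdout

    if err := grep([]byte("test"), in, &out); err != nil {
        t.Fatal(err)
    }
    got := out.String()
    if !strings.Contains(got, "this line has test in it") {
        t.Errorf("matching line missing from output: %q", got)
    }
    if strings.Contains(got, "no match here") {
        t.Errorf("non-matching line leaked into output: %q", got)
    }
}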

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Amnon BC
Why don't you try redirecting stdout to /dev/null and see how your program behaves. Also, which OS are you using? On Sun, May 8, 2022 at 11:36 PM Const V wrote: > reading 1 line '\n' delimited 100MB file > r1 := bufio.NewReader(file) > s := ReadWithReadLine(r1) > InputProcessing(strings.NewRead

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Bakul Shah
On May 7, 2022, at 1:24 PM, Constantine Vassilev wrote: > > I need to write a program that reads STDIN and should output every line that > contains a search word "test" to STDOUT. > > How can I test that considering the problem is a line can be 100s of MB long > (\n is line end) and tens of M

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Amnon BC
On Sun, May 8, 2022 at 10:41 PM Const V wrote: > write to stdout is not working for MB long strings That is very surprising indeed. How did you reach that conclusion? How can we replicate that failure?

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Robert Engels
Way over-complicating this. Use a buffered reader. Keep track of the position where the last newline was seen. It is a trivial state machine: find 'test', then continue to the next newline. Seek to the stored last-newline position, then buffered-read and write to stdout until the next newline. > On May 8, 2022, at
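A minimal sketch of that state machine as I read it (my own code, not Robert's; it assumes stdin is backed by a real file, e.g. prog < big.txt, since the seek/re-read step cannot work on a pipe, and the pattern and buffer size are illustrative):

package main

import (
    "bufio"
    "bytes"
    "io"
    "log"
    "os"
)

func main() {
    f := os.Stdin // must support ReadAt, i.e. stdin redirected from a file
    out := bufio.NewWriter(os.Stdout)
    defer out.Flush()

    r := bufio.NewReaderSize(f, 1<<20)
    pat := []byte("test")

    var pos, lineStart int64 // absolute byte offsets into the input
    matched := false
    var carry []byte // last few bytes of the previous chunk, so a match spanning chunks is still seen

    for {
        chunk, err := r.ReadSlice('\n') // a piece of the current line, or the rest of it
        if len(chunk) > 0 {
            // append copies the chunk here; wasteful but fine for a sketch.
            if !matched && bytes.Contains(append(carry, chunk...), pat) {
                matched = true // remember the match; keep streaming to the newline
            }
            pos += int64(len(chunk))
            if chunk[len(chunk)-1] == '\n' || err == io.EOF { // reached end of line
                if matched {
                    // Re-read just the matching line; ReadAt does not disturb the main cursor.
                    sec := io.NewSectionReader(f, lineStart, pos-lineStart)
                    if _, err := io.Copy(out, sec); err != nil {
                        log.Fatal(err)
                    }
                }
                lineStart, matched, carry = pos, false, nil
            } else if n := len(pat) - 1; len(chunk) >= n {
                carry = append([]byte(nil), chunk[len(chunk)-n:]...)
            }
        }
        if err == io.EOF {
            return
        }
        if err != nil && err != bufio.ErrBufferFull {
            log.Fatal(err)
        }
    }
}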

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Const V
Pre-allocating a buffer is not an option; it should be dynamic. On Sunday, May 8, 2022 at 1:24:40 PM UTC-7 Barnim Dzwillo wrote: > I had a similar use case in the past and got the best performance when > using ReadSlice() instead of scanner.Scan(). > See sample code here: https://go.dev/play/p/Ef

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Const V
Reallocating a buffer is not an option; it should be dynamic. On Sunday, May 8, 2022 at 1:26:34 PM UTC-7 Const V wrote: > Using r.ReadLine() I can successfully read 100 MB line in a string, using > the following conditional statement which is > increasing the buffer until '\n' is encountered. >

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Const V
Using r.ReadLine() I can successfully read a 100 MB line into a string, using the following loop, which grows the buffer until '\n' is encountered: for isPrefix && err == nil { line, isPrefix, err = r.ReadLine() ln = append(ln, line...) } Now the last problem is how to search
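The preview cuts off at the question of how to search the accumulated line; here is a small self-contained sketch of that loop plus the search step (my own completion, assuming the goal is still printing lines that contain "test"):

package main

import (
    "bufio"
    "bytes"
    "io"
    "log"
    "os"
)

func main() {
    r := bufio.NewReader(os.Stdin)
    out := bufio.NewWriter(os.Stdout)
    defer out.Flush()
    pat := []byte("test")

    for {
        line, isPrefix, err := r.ReadLine()
        if err == io.EOF {
            return
        }
        if err != nil {
            log.Fatal(err)
        }
        ln := append([]byte(nil), line...) // copy: ReadLine reuses its buffer
        for isPrefix && err == nil {       // the poster's loop: grow until '\n'
            line, isPrefix, err = r.ReadLine()
            ln = append(ln, line...)
        }
        if err != nil && err != io.EOF {
            log.Fatal(err)
        }
        if bytes.Contains(ln, pat) { // the missing search step
            out.Write(ln)
            out.WriteByte('\n')
        }
        if err == io.EOF {
            return
        }
    }
}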

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread Const V
Using r.ReadLine() I can successfully read a 100 MB line into a string, using the following loop, which grows the buffer until '\n' is encountered: for isPrefix && err == nil { line, isPrefix, err = r.ReadLine() ln = append(ln, line...) } Now the last problem is how to sear

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-08 Thread 'Barnim Dzwillo' via golang-nuts
I had a similar use case in the past and got the best performance when using ReadSlice() instead of scanner.Scan(). See sample code here: https://go.dev/play/p/EfvadCURcXt On Sunday, May 8, 2022 at 7:25:29 AM UTC+2 Amnon wrote: > So you raise a couple of questions: > > 1) How about handling rune
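For reference, a rough sketch of what a ReadSlice-based loop can look like (my own code, not the code behind the playground link; the 1MB buffer size and the fall-back accumulation for oversized lines are my assumptions):

package main

import (
    "bufio"
    "bytes"
    "io"
    "log"
    "os"
)

func main() {
    r := bufio.NewReaderSize(os.Stdin, 1<<20)
    out := bufio.NewWriter(os.Stdout)
    defer out.Flush()
    pat := []byte("test")

    var long []byte // scratch for lines longer than the reader's buffer
    for {
        chunk, err := r.ReadSlice('\n') // a view into the reader's buffer, no per-line allocation
        switch {
        case err == bufio.ErrBufferFull:
            long = append(long, chunk...) // the line keeps going; accumulate and read more
            continue
        case err == io.EOF && len(chunk) == 0 && len(long) == 0:
            return
        case err != nil && err != io.EOF:
            log.Fatal(err)
        }
        line := chunk
        if len(long) > 0 {
            line = append(long, chunk...)
        }
        if bytes.Contains(line, pat) {
            out.Write(line)
            if line[len(line)-1] != '\n' {
                out.WriteByte('\n') // the final line may lack a trailing newline
            }
        }
        long = long[:0]
        if err == io.EOF {
            return
        }
    }
}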

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Amnon
So you raise a couple of questions: 1) How about handling runes? The nice thing about UTF-8 is you don't have to care. If you are searching for an all-ASCII word like 'test', you can simply compare byte by byte - the letter t is represented by 0x74, and this byte in the search buffer can only repr
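A small illustration of that point (my own, not from the thread): UTF-8 never reuses ASCII byte values inside multi-byte sequences, so a plain byte search for an ASCII needle is already rune-safe.

package main

import (
    "bytes"
    "fmt"
)

func main() {
    line := []byte("προτεραιότητα: run the test suite - 试验")
    fmt.Println(bytes.Contains(line, []byte("test"))) // true: found in the ASCII part
    fmt.Println(bytes.Contains(line, []byte("xyz")))  // false: multi-byte runes cannot fake an ASCII match
}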

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread 'Dan Kortschak' via golang-nuts
On Sat, 2022-05-07 at 16:16 -0700, Const V wrote: > The question is will scanner.Scan handle a line of 100s MB? No, at least not by default (https://pkg.go.dev/bufio#Scanner.Buffer). But at that point you want to start questioning why you're doing what you're doing. Your invocation of grep can
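A minimal sketch of what that looks like in practice (my own, based on the linked Scanner.Buffer doc; the 1MB starting buffer and 1GB cap are arbitrary choices):

package main

import (
    "bufio"
    "bytes"
    "log"
    "os"
)

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    // The default max token size is 64KB (bufio.MaxScanTokenSize); raise it
    // so Scan can return lines of hundreds of MB.
    scanner.Buffer(make([]byte, 1024*1024), 1<<30)

    pat := []byte("test")
    out := bufio.NewWriter(os.Stdout)
    defer out.Flush()

    for scanner.Scan() {
        if bytes.Contains(scanner.Bytes(), pat) {
            out.Write(scanner.Bytes())
            out.WriteByte('\n') // Scan strips the trailing newline
        }
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err) // e.g. bufio.ErrTooLong if a line exceeds the 1GB cap
    }
}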

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Const V
The question is: will scanner.Scan handle a line of 100s of MB? On Saturday, May 7, 2022 at 2:49:08 PM UTC-7 Amnon wrote: > How about something like > > func grep(pat []byte, r io.Reader, w io.Writer) error { > scanner := bufio.NewScanner(r) > for scanner.Scan() { > if (bytes.Contain

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Const V
Here is what I came up with: func TestGrep1(t *testing.T) { cmd := exec.Command("./read.bash") fmt.Printf("%v\n", cmd) stdout, err := cmd.StdoutPipe() if err != nil { log.Fatal(err) } if err := cmd.Start(); err != nil { log.Fatal(err) } fmt.Printf("%v\n", stdout) find := []byte{'b', 'u', 'f', 'i', '
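A hedged sketch of how that test could be wired up (my own; ./read.bash comes from the post, the "test" pattern is illustrative since the actual byte slice is cut off in the preview, and the grep helper mirrors the one Amnon posted): start the command, stream its stdout through the search, and only then Wait.

package main

import (
    "bufio"
    "bytes"
    "io"
    "os/exec"
    "testing"
)

func grep(pat []byte, r io.Reader, w io.Writer) error {
    sc := bufio.NewScanner(r)
    sc.Buffer(make([]byte, 1<<20), 1<<30) // allow the very long lines under test
    for sc.Scan() {
        if bytes.Contains(sc.Bytes(), pat) {
            w.Write(sc.Bytes())
            w.Write([]byte("\n"))
        }
    }
    return sc.Err()
}

func TestGrepFromSubprocess(t *testing.T) {
    cmd := exec.Command("./read.bash")
    stdout, err := cmd.StdoutPipe()
    if err != nil {
        t.Fatal(err)
    }
    if err := cmd.Start(); err != nil {
        t.Fatal(err)
    }
    var out bytes.Buffer
    if err := grep([]byte("test"), stdout, &out); err != nil {
        t.Fatal(err)
    }
    if err := cmd.Wait(); err != nil { // Wait only after stdout has been fully drained
        t.Fatal(err)
    }
    t.Logf("matched output: %d bytes", out.Len())
}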

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Const V
Now the next question is whether I have to handle runes. On Saturday, May 7, 2022 at 3:31:31 PM UTC-7 kortschak wrote: > On Sat, 2022-05-07 at 15:18 -0700, Amnon wrote: > > The other interesting question is what algorithm we use to find the > > pattern in each line. > > Generally bytes.Contains uses R

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread 'Dan Kortschak' via golang-nuts
On Sat, 2022-05-07 at 15:18 -0700, Amnon wrote: > The other interesting question is what algorithm we use to find the > pattern in each line. > Generally bytes.Contains uses Rabin-Karp. But as the pattern is the > word "test" which is only 4 bytes long, > a brute force search is used, using SSE typ

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Amnon
The other interesting question is what algorithm we use to find the pattern in each line. Generally bytes.Contains uses Rabin-Karp. But as the pattern is the word "test" which is only 4 bytes long, a brute force search is used, using SSE type instructions where available. So the naive Go approac

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Amnon
p.s. If you changed the above code to use strings rather than []byte it would run many times slower due to the cost of allocation. On Saturday, 7 May 2022 at 22:49:08 UTC+1 Amnon wrote: > How about something like > > func grep(pat []byte, r io.Reader, w io.Writer) error { > scanner := bufio

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Amnon
How about something like

func grep(pat []byte, r io.Reader, w io.Writer) error {
    scanner := bufio.NewScanner(r)
    for scanner.Scan() {
        if bytes.Contains(scanner.Bytes(), pat) {
            w.Write(scanner.Bytes())
        }
    }
    return scanner.Err()
}

and for extra speed, j

Re: [go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Jan Mercl
On Sat, May 7, 2022 at 10:24 PM Constantine Vassilev wrote: > I need to write a program that reads STDIN and should output every line that > contains a search word "test" to STDOUT. Piping the data through grep(1) would be my first option.

[go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it?

2022-05-07 Thread Const V
I need to write a program that reads STDIN and outputs every line that contains the search word "test" to STDOUT. How can I test that, considering that a line can be 100s of MB long (\n is the line end) and tens of MB of info is passed to it?

[go-nuts] Which is the most efficient way to read STDIN lines 100s of MB long and tens of MB info is passed to it

2022-05-07 Thread Constantine Vassilev
I need to write a program that reads STDIN and outputs every line that contains the search word "test" to STDOUT. How can I test that, considering that a line can be 100s of MB long (\n is the line end) and tens of MB of info is passed to it?