I implemented and profiled the following:
*BigGrepStrFnd* - Boyer-Moore (grep)
*BigGrepBytes* - Rabin-Karp
*BigGrepStr* - Rabin-Karp
*BigGrepScan* - search with a sliding window
I also implemented concurrent versions of each.
Tested on 100 files, each containing one 100MB line. Searching for
So what happens when you run your program with its output redirected (> /dev/null)?
For testing, I would write a function that reads from an io.Reader and
writes to an io.Writer.
Then write a unit test that uses a bytes.Buffer to capture the output.
On Monday, 9 May 2022 at 04:59:37 UTC+1 Const V wrote:
> I'm using OSX.
>
> The
Why don't you try redirecting stdout to /dev/null and see how your program
behaves.
Also, which OS are you using?
On Sun, May 8, 2022 at 11:36 PM Const V wrote:
> reading 1 line '\n' delimited 100MB file
> r1 := bufio.NewReader(file)
> s := ReadWithReadLine(r1)
> InputProcessing(strings.NewRead
On May 7, 2022, at 1:24 PM, Constantine Vassilev wrote:
>
> I need to write a program that reads STDIN and should output every line that
> contains a search word "test" to STDOUT.
>
> How can I test this, considering that a line can be 100s of MB long
> (\n is line end) and tens of M
On Sun, May 8, 2022 at 10:41 PM Const V wrote:
> write to stdout is not working for MB long strings
>
>>
>>
That is very surprising indeed.
How did you reach that conclusion?
How can we replicate that failure?
--
You received this message because you are subscribed to the Google Groups
"golang-n
You're way overcomplicating this. Use a buffered reader and keep track of the
position where the last newline was seen. Finding ‘test’ is a trivial state
machine that runs until the next newline. On a match, seek back to the stored
newline position, then do a buffered read and write to stdout until the next newline.
> On May 8, 2022, at
Pre-allocating a buffer is not an option; it should be dynamic.
On Sunday, May 8, 2022 at 1:24:40 PM UTC-7 Barnim Dzwillo wrote:
> I had a similar use case in the past and got the best performance when
> using ReadSlice() instead of scanner.Scan().
> See sample code here: https://go.dev/play/p/Ef
Pre-allocating a buffer is not an option; it should be dynamic.
On Sunday, May 8, 2022 at 1:26:34 PM UTC-7 Const V wrote:
> Using r.ReadLine(), I can successfully read a 100 MB line in full with
> the following loop, which
> grows the buffer until '\n' is encountered.
>
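Using r.ReadLine(), I can successfully read a 100 MB line in full with the
following loop, which grows the buffer until '\n' is encountered:
var ln []byte
var err error
isPrefix := true // must start true so the loop runs at least once
for isPrefix && err == nil {
	var line []byte
	line, isPrefix, err = r.ReadLine()
	ln = append(ln, line...) // copy: line is only valid until the next read
}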
Now the last problem is how to search
I had a similar use case in the past and got the best performance when
using ReadSlice() instead of scanner.Scan().
See sample code here: https://go.dev/play/p/EfvadCURcXt
On Sunday, May 8, 2022 at 7:25:29 AM UTC+2 Amnon wrote:
> So you raise a couple of questions:
>
> 1) How about handling rune
So you raise a couple of questions:
1) How about handling runes?
The nice thing about UTF-8 is that you don't have to care. If you are searching
for the ASCII word 'test', you can simply compare byte by byte: the letter t is
represented by 0x74, and this byte in the search buffer can
only repr
On Sat, 2022-05-07 at 16:16 -0700, Const V wrote:
> The question is will scanner.Scan handle a line of 100s MB?
No, at least not by default (https://pkg.go.dev/bufio#Scanner.Buffer).
But at that point you want to start questioning why you're doing what
you're doing.
Your invocation of grep can
The question is: will scanner.Scan handle a line of 100s of MB?
On Saturday, May 7, 2022 at 2:49:08 PM UTC-7 Amnon wrote:
> How about something like
>
> func grep(pat []byte, r io.Reader, w io.Writer) error {
> scanner := bufio.NewScanner(r)
> for scanner.Scan() {
> if (bytes.Contain
Here is what I came up with:
func TestGrep1(t *testing.T) {
cmd := exec.Command("./read.bash")
fmt.Printf("%v\n", cmd)
stdout, err := cmd.StdoutPipe()
if err != nil {
log.Fatal(err)
}
if err := cmd.Start(); err != nil {
log.Fatal(err)
}
fmt.Printf("%v\n", stdout)
find := []byte{'b', 'u', 'f', 'i', '
Now the next question is whether I have to handle runes.
On Saturday, May 7, 2022 at 3:31:31 PM UTC-7 kortschak wrote:
> On Sat, 2022-05-07 at 15:18 -0700, Amnon wrote:
> > The other interesting question is what algorithm we use to find the
> > pattern in each line.
> > Generally bytes.Contains uses R
On Sat, 2022-05-07 at 15:18 -0700, Amnon wrote:
> The other interesting question is what algorithm we use to find the
> pattern in each line.
> Generally bytes.Contains uses Rabin-Karp. But as the pattern is the
> word "test" which is only 4 bytes long,
> a brute force search is used, using SSE typ
The other interesting question is which algorithm we use to find the pattern
in each line.
Generally, bytes.Contains uses Rabin-Karp. But as the pattern is the word
"test", which is only 4 bytes long, a brute-force search is used, with SSE-type
instructions where available.
So the naive Go approac
P.S. If you changed the above code to use strings rather than []byte,
it would run many times slower due to the cost of allocation.
On Saturday, 7 May 2022 at 22:49:08 UTC+1 Amnon wrote:
> How about something like
>
> func grep(pat []byte, r io.Reader, w io.Writer) error {
> scanner := bufio
How about something like
import (
	"bufio"
	"bytes"
	"io"
)

func grep(pat []byte, r io.Reader, w io.Writer) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		if bytes.Contains(scanner.Bytes(), pat) {
			w.Write(scanner.Bytes())
			w.Write([]byte{'\n'}) // Scan strips the trailing newline
		}
	}
	return scanner.Err()
}
and for extra speed, j
On Sat, May 7, 2022 at 10:24 PM Constantine Vassilev wrote:
> I need to write a program that reads STDIN and should output every line that
> contains a search word "test" to STDOUT.
Piping the data through grep(1) would be my first option.
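If the rest of the program must stay in Go, it can still delegate to grep(1). The following is a sketch using `os/exec`. It assumes a POSIX `grep` on `PATH`. `-F` matches a fixed string, and `--` ends option parsing. Exit status 1 means "no lines matched", so it is treated as success here. `grepExternal` and `grepExternalString` are hypothetical names.

```go
package main

import (
	"errors"
	"io"
	"os/exec"
	"strings"
)

// grepExternal streams r through grep -F and writes matching lines to w.
func grepExternal(pat string, r io.Reader, w io.Writer) error {
	cmd := exec.Command("grep", "-F", "--", pat)
	cmd.Stdin = r
	cmd.Stdout = w
	err := cmd.Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
		return nil // grep found no matching lines
	}
	return err
}

// grepExternalString is a hypothetical wrapper for in-memory input.
func grepExternalString(pat, input string) (string, error) {
	var out strings.Builder
	err := grepExternal(pat, strings.NewReader(input), &out)
	return out.String(), err
}
```

With `cmd.Stdin = os.Stdin` and `cmd.Stdout = os.Stdout`, the data never passes through Go buffers at all.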
I need to write a program that reads STDIN and outputs to STDOUT every line
that contains the search word "test".
How can I test this, considering that a line can be 100s of MB
long (\n marks the line end) and tens of MB of input are passed to it?