> On 01 Mar 2017, at 21:57, Conrad Meyer wrote:
>
> Hi Bruce,
>
> On my laptop (Intel(R) Core(TM) i5-3320M CPU — Ivy Bridge) I still see
> a little worse performance with this patch.
Hi Bruce & Conrad,
I gave both patches a try.
It's a real use case, iSCSI throughput.
Both target and initiator
On Wed, 1 Mar 2017, Conrad Meyer wrote:
On Wed, Mar 1, 2017 at 9:27 PM, Bruce Evans wrote:
On Wed, 1 Mar 2017, Conrad Meyer wrote:
On my laptop (Intel(R) Core(TM) i5-3320M CPU -- Ivy Bridge) I still see
a little worse performance with this patch. Please excuse the ugly
graphs, I don't have
On Wed, Mar 1, 2017 at 9:27 PM, Bruce Evans wrote:
> On Wed, 1 Mar 2017, Conrad Meyer wrote:
>
>> On my laptop (Intel(R) Core(TM) i5-3320M CPU — Ivy Bridge) I still see
>> a little worse performance with this patch. Please excuse the ugly
>> graphs, I don't have a better graphing tool set up at t
> On 02 Mar 2017, at 06:27, Bruce Evans wrote:
>
> On Wed, 1 Mar 2017, Conrad Meyer wrote:
>
>> On my laptop (Intel(R) Core(TM) i5-3320M CPU — Ivy Bridge) I still see
>> a little worse performance with this patch. Please excuse the ugly
>> graphs, I don't have a better graphing tool set up at t
On Wed, 1 Mar 2017, Conrad Meyer wrote:
On my laptop (Intel(R) Core(TM) i5-3320M CPU -- Ivy Bridge) I still see
a little worse performance with this patch. Please excuse the ugly
graphs, I don't have a better graphing tool set up at this time:
https://people.freebsd.org/~cem/crc32/sse42_bde.p
Hi Bruce,
On my laptop (Intel(R) Core(TM) i5-3320M CPU — Ivy Bridge) I still see
a little worse performance with this patch. Please excuse the ugly
graphs, I don't have a better graphing tool set up at this time:
https://people.freebsd.org/~cem/crc32/sse42_bde.png
https://people.freebsd.org/~cem
On Mon, 27 Feb 2017, Conrad Meyer wrote:
On Thu, Feb 2, 2017 at 12:29 PM, Bruce Evans wrote:
I've almost finished fixing and optimizing this. I didn't manage to fix
all the compiler pessimizations, but the result is within 5% of optimal
for buffers larger than a few K.
Did you ever get to a
On Thu, Feb 2, 2017 at 12:29 PM, Bruce Evans wrote:
> I've almost finished fixing and optimizing this. I didn't manage to fix
> all the compiler pessimizations, but the result is within 5% of optimal
> for buffers larger than a few K.
Hi Bruce,
Did you ever get to a final patch that you are sat
Hi guys, Conrad, Bruce,
May I ask whether there is any news on this, please?
We have now been running Conrad's commit for more than 3 weeks on 2 iSCSI
initiators / targets with CRC32C digests enabled, without issue :)
Thank you very much again for this !
Shall we then think about "fixing" the last one or two remaining things
On 01/31/17 16:36, Bruce Evans wrote:
gcc-4.2.1 is an ancient compiler. Good riddance.
I prefer it.
This change also breaks compilation with clang v3.6 (FYI)
--HPS
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo
On Thu, 2 Feb 2017, Konstantin Belousov wrote:
On Tue, Jan 31, 2017 at 03:26:32AM +, Conrad E. Meyer wrote:
+ compile-with "${CC} -c ${CFLAGS:N-nostdinc} ${WERROR} ${PROF} -msse4
${.IMPSRC}" \
BTW, new gcc has -mcrc32 option, but clang 3.9.1 apparently does not.
I've almost fi
On Tue, Jan 31, 2017 at 03:26:32AM +, Conrad E. Meyer wrote:
> + compile-with "${CC} -c ${CFLAGS:N-nostdinc} ${WERROR} ${PROF} -msse4
> ${.IMPSRC}" \
BTW, new gcc has -mcrc32 option, but clang 3.9.1 apparently does not.
On Tue, 31 Jan 2017, Conrad Meyer wrote:
On Tue, Jan 31, 2017 at 7:16 PM, Bruce Evans wrote:
Another reply to this...
On Tue, 31 Jan 2017, Conrad Meyer wrote:
On Tue, Jan 31, 2017 at 7:36 AM, Bruce Evans wrote:
On Tue, 31 Jan 2017, Bruce Evans wrote:
I
think there should be no alignment
On Tue, Jan 31, 2017 at 7:16 PM, Bruce Evans wrote:
> Another reply to this...
>
> On Tue, 31 Jan 2017, Conrad Meyer wrote:
>
>> On Tue, Jan 31, 2017 at 7:36 AM, Bruce Evans wrote:
>>>
>>> On Tue, 31 Jan 2017, Bruce Evans wrote:
>>> I
>>> think there should be no alignment on entry -- just assume
Another reply to this...
On Tue, 31 Jan 2017, Conrad Meyer wrote:
On Tue, Jan 31, 2017 at 7:36 AM, Bruce Evans wrote:
On Tue, 31 Jan 2017, Bruce Evans wrote:
Unrolling (or not) may be helpful or harmful for entry and exit code.
Helpful, per my earlier benchmarks.
I
think there should be n
On Tue, 31 Jan 2017, Conrad Meyer wrote:
On Tue, Jan 31, 2017 at 7:36 AM, Bruce Evans wrote:
On Tue, 31 Jan 2017, Bruce Evans wrote:
Unrolling (or not) may be helpful or harmful for entry and exit code.
Helpful, per my earlier benchmarks.
I
think there should be no alignment on entry -- ju
On Tue, Jan 31, 2017 at 7:36 AM, Bruce Evans wrote:
> On Tue, 31 Jan 2017, Bruce Evans wrote:
> Unrolling (or not) may be helpful or harmful for entry and exit code.
Helpful, per my earlier benchmarks.
> I
> think there should be no alignment on entry -- just assume the buffer is
> aligned in th
On Tue, 31 Jan 2017, Bruce Evans wrote:
On Mon, 30 Jan 2017, Conrad Meyer wrote:
On Mon, Jan 30, 2017 at 9:26 PM, Bruce Evans wrote:
On Tue, 31 Jan 2017, Conrad E. Meyer wrote:
Log:
calculate_crc32c: Add SSE4.2 implementation on x86
This breaks building with gcc-4.2.1,
gcc-4.2.1 is an
On Mon, 30 Jan 2017, Conrad Meyer wrote:
On Mon, Jan 30, 2017 at 9:26 PM, Bruce Evans wrote:
On Tue, 31 Jan 2017, Conrad E. Meyer wrote:
Log:
calculate_crc32c: Add SSE4.2 implementation on x86
This breaks building with gcc-4.2.1,
gcc-4.2.1 is an ancient compiler. Good riddance.
I pre
Hi Bruce,
On Mon, Jan 30, 2017 at 9:26 PM, Bruce Evans wrote:
> On Tue, 31 Jan 2017, Conrad E. Meyer wrote:
>
>> Log:
>> calculate_crc32c: Add SSE4.2 implementation on x86
>
>
> This breaks building with gcc-4.2.1,
gcc-4.2.1 is an ancient compiler. Good riddance.
>> Added: head/sys/libkern/x8
On Tue, 31 Jan 2017, Conrad E. Meyer wrote:
Log:
calculate_crc32c: Add SSE4.2 implementation on x86
This breaks building with gcc-4.2.1, and depends on using non-kernel clang
headers for clang.
Modified: head/sys/conf/files.amd64
=
On Mon, Jan 30, 2017 at 7:26 PM, Conrad E. Meyer wrote:
> (The CRC instruction takes 1 cycle but has 2-3 cycles of latency.)
My mistake, it's not 2 anywhere. It's just 3 cycles on all
workstation/server CPUs since Nehalem. Different on Atom chips and
AMD.
Best,
Conrad
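For anyone who wants to sanity-check an accelerated CRC32C variant like the ones discussed in this thread, a minimal portable sketch of a bit-at-a-time reference may help. It is not taken from the commit: the name crc32c_ref is hypothetical, and the convention used here (initial value 0xFFFFFFFF, final inversion) is an assumption of this sketch rather than a statement about how calculate_crc32c is called in the kernel. The only thing it shares with the SSE4.2 instruction is the Castagnoli polynomial, used here in reflected form as 0x82F63B78.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference helper (not from r313006): CRC-32C computed
 * one bit at a time.  Slow, but easy to compare against a faster
 * implementation.
 */
static uint32_t
crc32c_ref(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;	/* assumed initial value */

	while (len-- > 0) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++) {
			/* Reflected Castagnoli polynomial, applied when the low bit is set. */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
		}
	}
	return (crc ^ 0xFFFFFFFFu);	/* assumed final inversion */
}
```

With this convention, the nine ASCII bytes "123456789" give the standard CRC-32C check value 0xE3069283, which is a convenient first test for any faster implementation.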
Author: cem
Date: Tue Jan 31 03:26:32 2017
New Revision: 313006
URL: https://svnweb.freebsd.org/changeset/base/313006
Log:
calculate_crc32c: Add SSE4.2 implementation on x86
Derived from an implementation by Mark Adler.
The fast loop performs three simultaneous CRCs over subsets of the