Re: [vox-tech] Memory addressing?

2010-06-24 Thread Brian Lavender
On Wed, Jun 23, 2010 at 06:37:41PM -0700, Bill Broadley wrote:
 On 06/23/2010 10:42 AM, timri...@appahost.com wrote:
  The reason for the discussion was Brian's intrusion detection
  implementation stored the incoming packets in a hash
  table. The key to the hash table was quite
  large -- inbound IP address, outbound IP address, inbound port, and
  outbound port. I thought to my self, on a very large implementation (say
  Google) the table could grow to a billion entries. Could a hash table
  store this amount in memory? Could you allocate an array of half the
  total memory? Could you allocate an array of a billion integers? Brian,
  on his laptop, couldn't allocate a billion integers. But he could
  allocate a billion characters (bytes). Since I thought both bytes and
  integers were words, and since I thought memory stored words
  like registers stored words, we had our discussion.
 
 Sounds like an interesting discussion, sorry I missed it.  Kinda of 
 amusing trying to handle such a hash table on a (older I assume) single 
 32 bit laptop.

So, I am about to put up the code for my mods to nProbe. I believe it
runs on 64 bit. I really haven't done any work on the hashing part,
but I am sure that the work that Luca Deri has done works well.

How do you put a public git repository online?

Somewhat like the following. 
http://www.gtk.org/download.html

I found the following link. Does it look like it has the correct
details?
http://toolmantim.com/thoughts/setting_up_a_new_remote_git_repository

brian
-- 
Brian Lavender
http://www.brie.com/brian/

There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the other
way is to make it so complicated that there are no obvious deficiencies.

Professor C. A. R. Hoare
The 1980 Turing award lecture
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


[vox-tech] Memory addressing?

2010-06-23 Thread timriley
  Original Message 
 Subject: Re: [vox-tech] Memory addressing?
 From: Chanoch (Ken) Bloom kbl...@gmail.com
 Date: Tue, June 22, 2010 9:46 am
 To: lugod's technical discussion forum vox-tech@lists.lugod.org
 
 
 On Tue, Jun 22, 2010 at 09:11:44AM -0700, Brian Lavender wrote:
  Can someone confirm what is correct? 
  
  Tim and I were discussing memory addressing at Crepeville last night
  and we had a disagreement about how memory is addressable. I say that
  on today's common intel i386 32 bit architecture (in case you are one of
  those souls who builds your hardware from scratch), that memory is byte
  (octet) addressable. You can load a byte from memory into the lower 8
  bits of a register. Tim says that memory is only addressable on 32 bit
  word boundaries.
  

I stand corrected. Now that I have my hardware textbook open,
Tanenbaum 1990, I see that the smallest addressable unit on most
computers
is the byte -- 8 bits. Bytes are grouped into words. The word-length is
the size of the registers, not registers and memory.

snip

 Consider the following program. The fact that you can get pointers to
 arbitrary characers should be enough proof that the architecture is
 byte addressable.

Tanenbaum calls the smallest addressable unit a cell. He then lists
examples
of cell lengths for 11 computers. Before cells were standardized
to 8 bits, they ranged from 1 bit to 60 bits per cell. Therefore, a 60
bits per cell computer would store a single ASCII character in the
lowest
7 bits and set the upper 53 bits to off (probably). You could then
allocate
a pointer to address that 60 bit cell.

The reason for the discussion was Brian's intrusion detection
implementation stored the incoming packets in a hash
table. The key to the hash table was quite
large -- inbound IP address, outbound IP address, inbound port, and
outbound port. I thought to my self, on a very large implementation (say
Google) the table could grow to a billion entries. Could a hash table
store this amount in memory? Could you allocate an array of half the
total memory? Could you allocate an array of a billion integers? Brian,
on his laptop, couldn't allocate a billion integers. But he could
allocate a billion characters (bytes). Since I thought both bytes and
integers were words, and since I thought memory stored words
like registers stored words, we had our discussion.

The following failed to compile with an array-to-large error:
int main( void )
{
 int table[ 10 ];
 return 0;
}

The following succeeded to compile:
int main( void )
{
 char table[ 10 ];
 return 0;
}

This compiles but core dumps:
#include stdio.h
int main( void )
{
 char table[ 10 ];
 printf( Hello world!\n );
}

snip

___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] Memory addressing?

2010-06-23 Thread Brian Lavender
On Wed, Jun 23, 2010 at 10:42:18AM -0700, timri...@appahost.com wrote:
 
 The reason for the discussion was Brian's intrusion detection
 implementation stored the incoming packets in a hash
 table. The key to the hash table was quite
 large -- inbound IP address, outbound IP address, inbound port, and
 outbound port. I thought to my self, on a very large implementation (say
 Google) the table could grow to a billion entries. Could a hash table
 store this amount in memory? Could you allocate an array of half the
 total memory? Could you allocate an array of a billion integers? Brian,
 on his laptop, couldn't allocate a billion integers. But he could
 allocate a billion characters (bytes). Since I thought both bytes and
 integers were words, and since I thought memory stored words
 like registers stored words, we had our discussion.

What started this is that having a hash provides ideally a O(1).  So,
I say that to achieve this, one would ideally want to have a large array
to store your IP addresses and a hashing function that provides good
distribution. To which, Tim pointed to using a smaller array and using
chaining, because he initially thought that arrays were limited to smaller
sizes than was observed by doing some simple tests. To which I replied,
using a fixed size array and the chaining could cause the hash to decay
into a linked list which has a O(n). And to which Tim has noted using a
large fixed size array may just take up all your architectural limits
of the program. I imagine that having such a large array would push
the lower bound of the stack up in memory, or if allocated at runtime,
would push the heap way down from the top.

In my Genetic Algorithm, I used Gnome's glib to do the hashing for me. I
figured that someone else already looked at this problem. In my Genetic
Algorithm, I am only storing a maximum of 32 bits in the hash key.

In the nProbe code, Luca Deri wrote the hash, using the key values as Tim
described. I have not figured out exactly how Luca does it, but I will
assume that he used a suitable method that works well enough for me. ;-)
If I don't contain the scope on my Master's Project, I won't finish!

Thus, this was the impetus for our discussion. ;-)

brian
-- 
Brian Lavender
http://www.brie.com/brian/

There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the other
way is to make it so complicated that there are no obvious deficiencies.

Professor C. A. R. Hoare
The 1980 Turing award lecture
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] Memory addressing?

2010-06-23 Thread Bill Broadley
On 06/23/2010 10:42 AM, timri...@appahost.com wrote:
 The reason for the discussion was Brian's intrusion detection
 implementation stored the incoming packets in a hash
 table. The key to the hash table was quite
 large -- inbound IP address, outbound IP address, inbound port, and
 outbound port. I thought to my self, on a very large implementation (say
 Google) the table could grow to a billion entries. Could a hash table
 store this amount in memory? Could you allocate an array of half the
 total memory? Could you allocate an array of a billion integers? Brian,
 on his laptop, couldn't allocate a billion integers. But he could
 allocate a billion characters (bytes). Since I thought both bytes and
 integers were words, and since I thought memory stored words
 like registers stored words, we had our discussion.

Sounds like an interesting discussion, sorry I missed it.  Kinda of 
amusing trying to handle such a hash table on a (older I assume) single 
32 bit laptop.

 The following failed to compile with an array-to-large error:
 int main( void )
 {
   int table[ 10 ];
   return 0;
 }

Keep in mind, that's allocating on the stack, you could easily run out 
even on a 64 bit machine.  Moving the int table up before the int main 
would allocate it directly from memory.

Also I believe that requires contiguous memory, which may well be 
significantly lower than the free memory.

 The following succeeded to compile:
 int main( void )
 {
   char table[ 10 ];
   return 0;
 }

 This compiles but core dumps:
 #includestdio.h
 int main( void )
 {
   char table[ 10 ];
   printf( Hello world!\n );
 }

Keep in mind that compilers are getting smarter and I've seen problems 
with blind arrays.  Also depending on the OS they may or may not 
allocate memory that you ask for but don't initialized.  I'm not sure 
offhand if C initializes statically allocated memory like that.  So if 
you want to insure that a memory allocation (generically is working) I'd 
use the memory.  The code I used was:
#include stdio.h
#include stdlib.h

#define big 10
int table[ big ];

int main( void )
{
   int i;
   for (i=0;ibig;i++)
   {
table[i]=i/2048;
   }
   printf (%d\n,table[123]);
   return 0;
}

___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] Memory addressing?

2010-06-23 Thread Bill Broadley
On 06/23/2010 11:36 AM, Ken Bloom wrote:
 On a 32-bit machine, this will eat up most of the computer's address
 space, including *all* application space, and some kernel space (so you
 can expect things to segfault).

Assuming no PAE.  With PAE often the kernel and other user processes 
would be in different memory ranges.  So the kernel and other apps could 
be perfectly happy while you fill your particular segment.


 On a 64-bit machine, it should work
 though.

Nope, at least with the normal defaults stack segments tend to be less 
than 4GB.

On a 32GB ram 64-bit machine:
$ ./a.out
Segmentation fault

 Stuff gets pushed on the stack after the end of the array, approximately
 1GB into memory. This means you have a stack that's 1GB long when you
 get inside printf, which is seriously much more than the OS is prepared
 to let you have for your stack. (This will overflow the stack whether
 your machine is 32-bits or 64-bits.) Try mallocing your array instead.

Ya, definitely should work since that won't be on the stack.

#include stdio.h
#include stdlib.h

int *table;

#define big 10

int main( void )
{
   int i;
   table = (int *) malloc(sizeof(int)*big);
   for (i=0;ibig;i++)
   {
table[i]=i/2048;
   }
   printf (%d\n,table[4096]);
   free(table);
   return 0;
}

$ ./a.out
2

 (Oh and change that printf to be printf(%px Hello world\n,table); so
 you can see whether the allocation is working.)

 If you're looking at an IDS that big, you're going to need to find an
 appropriate caching data structure that can write the infrequently used
 parts out to disk. Or you're going to have to find some other way of
 minimizing the number of packets you're keeping track of at a time.

Exactly.

I'd suggest:
1) Don't require contiguous memory (you get much more that way)
2) Plan on parallel access (not a single thread/core)
3) Don't ignore pages, cache lines, L1/L2/L3 cache, etc.

So ignoring the cost of the hash calculation a memory lookup takes 
approximately 100ns.  If you do 8 or 16 in parallel you could get an 
effective latency (or throughput if you prefer) of 10ns.  Of course if 
you can get into L3 cache the latency and bandwidth will go up an order 
of magnitude.  Of course getting into L1/L2 will get you another magnitude.

So basically for optimal performance you'll end up with a multi-level 
datastructure that allocates chunks of memory much less than 4GB.  After 
all for any IDS the number of active sessions in any short time is going 
to be MUCH MUCH MUCH less than 2^96.  To keep that in perspective if 
your IDS handled a billion machines sending 1000 packets a second for 2 
billion years it still would be less than 2^96.

So, sure a hash of the 2^96 space is easy to write and plenty for 
smaller installations as you scale you are going to have to get 
significantly more efficient.
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] Memory addressing?

2010-06-23 Thread Bill Broadley
On 06/23/2010 12:39 PM, Brian Lavender wrote:
 What started this is that having a hash provides ideally a O(1).  So,
 I say that to achieve this, one would ideally want to have a large array
 to store your IP addresses and a hashing function that provides good
 distribution. To which, Tim pointed to using a smaller array and using
 chaining, because he initially thought that arrays were limited to smaller
 sizes than was observed by doing some simple tests. To which I replied,
 using a fixed size array and the chaining could cause the hash to decay
 into a linked list which has a O(n).

Indeed, the classic memory vs access time for hashes.

 And to which Tim has noted using a
 large fixed size array may just take up all your architectural limits
 of the program. I imagine that having such a large array would push
 the lower bound of the stack up in memory, or if allocated at runtime,
 would push the heap way down from the top.

I had what sounds like a similar problem for web logging.  I wanted to 
take a URL which contained agent, IP, URL, referer.  I posted the source 
on lugod quite awhile ago.

The pseudo code was:
for hits:
 agent_id=getid(agent,agent string);
 ip_id=getid(ip,ip string);
 url_id=getid(url,url string);
 refer_id=getid(refer,refer string);
 append_log(agent_id,ip_id,url_id,refer_id,datastamp,returncode)

Of course getid would lookup the value and if it didn't exist it would 
add that string to the table.

This resulted in several advantages:
* Made it much easier to extract info from the logs, things like how
   many hits did a subset of the website have last year vs this year
   so we can evaluate advertising effectiveness.
* Logs were denser (less disk)
* Running reports like top 10 for things like hits, bandwidth, IPs,
   agents, and 404s was easy.

While load testing on a rather old machine (PII-400 I believe) I could 
record 1000 hits/sec even when using mysql as the backend.  With a 
memory resident database or something lighter like sqllite I'd expect 
much better.  Of course hardware has come a long way since the PII-400 
as well.

Such a setup might work well with for an IDS as well.  With some care it 
should be relatively straight forward to make it parallel friendly.

With all that said, I used the big hash method in a simple little tool I 
wrote to instantly tell you where your bandwidth is going:
   http://broadley.org/bill/tcpdump.pl

The output:
tcpdump -n -c 1000 | ./tcpdump.pl | head -5
1000 packets captured
1000 packets received by filter
0 packets dropped by kernel
lines=1000 lengths=760
 613884 128.120.46.32.8649 - 128.120.246.1.48876
 153463 128.120.46.33.8649 - 128.120.246.1.48609
 153462 128.120.46.33.8649 - 128.120.246.52.54912
   2684 131.215.74.22.61733 - 128.120.246.102.9996


___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


[vox-tech] Memory addressing?

2010-06-22 Thread Brian Lavender
Can someone confirm what is correct? 

Tim and I were discussing memory addressing at Crepeville last night
and we had a disagreement about how memory is addressable. I say that
on today's common intel i386 32 bit architecture (in case you are one of
those souls who builds your hardware from scratch), that memory is byte
(octet) addressable. You can load a byte from memory into the lower 8
bits of a register. Tim says that memory is only addressable on 32 bit
word boundaries.

Say you look at memory in bits and then on the left is the memory
address. 

I say that memory is normally byte addressable and the addressing
corresponds to byte (octet) boundaries.

Address  bits
00 7  15 23 31
30 7  15 23 31
70 7  15 23 31
11   0 7  15 23 31
15   0 7  15 23 31

Tim says that memory is only 32 bit word addressable 

Address  bits
00 7  15 23 31
10 7  15 23 31
20 7  15 23 31
30 7  15 23 31
40 7  15 23 31


-- 
Brian Lavender
http://www.brie.com/brian/

There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the other
way is to make it so complicated that there are no obvious deficiencies.

Professor C. A. R. Hoare
The 1980 Turing award lecture
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] Memory addressing?

2010-06-22 Thread Bill Broadley
On 06/22/2010 09:46 AM, Chanoch (Ken) Bloom wrote:
 On Tue, Jun 22, 2010 at 09:11:44AM -0700, Brian Lavender wrote:
 Can someone confirm what is correct? 

 Tim and I were discussing memory addressing at Crepeville last night
 and we had a disagreement about how memory is addressable. I say that
 on today's common intel i386 32 bit architecture (in case you are one of
 those souls who builds your hardware from scratch), that memory is byte
 (octet) addressable. You can load a byte from memory into the lower 8
 bits of a register. Tim says that memory is only addressable on 32 bit
 word boundaries.

Heh, there are many tiny details that depending on your perspective could
justify either answer.

Ken is of course correct that a pointer can address a byte.  There are
performance penalties for unaligned byte accesses, but they work.

On the other hand from the offchip view you can't load less than a cache line
from main memory and that's usually 64-128 bytes.  Additionally those requests
have to be aligned on cache line boundaries.

Kinda related to the question is an i386 a 32 bit architecture?  Well it can
address 40 bits (with PAE), read/write 64 bit values from main memory
(doubles), and allow for machines with over 4GB ram.  Granted PAE is gross,
the segment register bites again.

 (I don't know of any modern architectures that aren't
 byte addressable, though MIPS takes some shortcuts in its various jump
 instructions because the instructions have to be word-aligned.)

Well, depending on your definition of modern ;-).  The alpha architecture is
IMO rather modern... even if dead.  It had 64 bits from the start, vector/SIMD
operations, symmetric multithreading (before intel), out of order operation,
multiple functional units, on chip memory controllers, multiple memory
channels, etc.  It didn't however have x86 compatibility, that and marketing
killed it.  Amusingly after many $B and many years of process improvements the
Itanium never really bested the alpha even with a much smaller process, much
better memory bandwidth, and embarrassing more transistors.  Sad, seemed like
the linux boom was a bit late for the alpha.  So Dec died, sold alpha to
compaq which died and sold alpha to HP which killed of pa-risc, alpha, and
for the most part itanium.  Amusingly arm is now looking at the same space
(servers and hpc).

Amusingly intel is trying to get some folks to migrate their very expensive
mainframe based services to itanium, but it seems like the market is mostly
interested in either staying where they are, of if they do bother to make the
huge outlay to port their software once they want to use a standard commodity
platform instead of a specialty CPU solution looking for a problem.

The alpha originally designed to deliver a factor of 1000 in performance over
it's lifetime (10 in clock, 10 in ipc, and 10 in parallel cores) with all of
the latest greatest architectural features.  They basically completely
designed a CPU from scratch (something that hasn't been done in awhile)
designed for general purpose use.

To allow for increased CPU performance they simplified the CPU so they could
not restrict the clock rate and spend the transistors where they would help
performance the most.  The original didn't have byte addressing.  To write a
byte on the original alpha you had to read 64 bits, modify the byte you wanted
and write 64 bits.

BTW, I believe this wasn't that a pointer addresses 2^64 64 bit quantities,
but that the bottom 6 bits of a pointer were mandated by the architecture to
be zero.

[ Snipped Ken's proof of i386 byte addressability ]

___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech