We must not forget that it is not human intelligence. It requires an
absurdly large amount of data to match what humans achieve with relatively
little input, as with learning languages. On the other hand, it can learn
an arbitrarily large number of languages, given enough memory. It has no
will of its own; it lives in an eternal slumber unless it is told to do
something, in which case it can keep doing that thing forever. It also has
no idea how to engage in politics of any kind, even at an interpersonal
level. It doesn't have a body. To do something malicious, it must be given
examples by someone malicious, and it will act only within the parameters
provided. I don't think it will do anything like the Colossus computer in
the film. At most, it will probably be like a supergenius that obeys
orders, not unlike the people who took part in the Manhattan Project.

On Mon, Apr 3, 2023 at 3:47 PM, Jed Rothwell <jedrothw...@gmail.com>
wrote:

> I wrote:
>
>
>> The human brain has 86 billion neurons, all operating simultaneously. In
>> other words, complete parallel processing with 86 billion "processors"
>> operating simultaneously. ChatGPT tells us she has 175 billion
>> parameters in Version 3. I assume each parameter is roughly equivalent to a
>> neuron.
>>
>
> Wrong! I got that wrong. Each parameter is roughly equivalent to a neuron,
> but human brain neurons have roughly 7,000 synapses:
>
> Each neuron has, on average, about 7,000 synaptic connections with other
> neurons. That puts the synapse count in the neighborhood of 600 trillion.
> In young children, before synaptic pruning begins in earnest, the estimated
> number reaches as high as 1 quadrillion.
>
>
> https://www.jax.org/news-and-insights/jax-blog/2018/december/600-trillion-synapses-and-alzheimers-disease
>
> ChatGPT tells me that each parameter is stored as a 16- or 32-bit
> floating-point number. That is roughly equivalent to having 32 synapses per
> neuron. The human neuron can have a much wider range of values. I don't
> know how that affects the model, but to me the human neuron looks more like
> a quantum computer qubit than a 32-bit floating-point number. Presumably,
> brains evolved to the optimum number of neurons and synapses.
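>
> As a rough back-of-the-envelope check on that comparison (a sketch only:
> treating each parameter as one neuron-equivalent and each stored bit as a
> stand-in for a synapse is an analogy, not a real equivalence):
>
> # Back-of-the-envelope comparison; "bit = synapse" is only an analogy.
> neurons = 86e9                   # human brain neurons
> synapses_per_neuron = 7_000      # average synaptic connections per neuron
> human_synapses = neurons * synapses_per_neuron      # ~6.0e14 (600 trillion)
>
> gpt3_parameters = 175e9          # GPT-3, largest version
> bits_per_parameter = 32          # stored as 32-bit floats
> gpt3_bits = gpt3_parameters * bits_per_parameter    # ~5.6e12
>
> print(f"Human synapses:       {human_synapses:.1e}")
> print(f"GPT-3 parameter bits: {gpt3_bits:.1e}")
> print(f"Ratio: about {human_synapses / gpt3_bits:.0f} to 1")   # ~108 to 1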
>
> So, ChatGPT Ver. 3 was smaller than the human brain in terms of the range
> of values the ANN can hold. Granted that comparing it to a brain is
> comparing apples to oranges, there is probably some advantage to a much
> larger range of values per neuron. A super-brain would need a lot more
> synapse equivalents compared to a human brain, so we have a long way to go
> before reaching something like a million times 1 quadrillion artificial
> synapses. Plus of course it would need programming methods not yet invented.
>
> ChatGPT tells me that with the specialized hardware provided by OpenAI,
> all 175 billion parameters are loaded into RAM at the same time. With
> other configurations they are overlaid. The speed of MPP operation depends
> on whether the parameters are loaded or overlaid, and I think more
> importantly on whether the operation is done with specialized hardware (GPUs
> or Google's TPUs) or emulated with ordinary computer hardware.
>
> I asked ChatGPT for some of these details. Here is our exchange:
>
>
> ME: You told me that the largest version of GPT-3 had 175 billion
> parameters. Is each parameter stored as 1 byte? Or some other number of
> bits?
>
> ChatGPT: No, each parameter in GPT-3 is not stored as 1 byte. Instead,
> each parameter is typically stored as a 16-bit or 32-bit floating-point
> number, depending on the implementation. This is because floating-point
> numbers can represent a wider range of values and provide greater precision
> than integers.
>
> For example, GPT-3's 175 billion parameters would require around 700 GB of
> memory if each parameter was stored as a 32-bit floating-point number. If
> each parameter was stored as a 16-bit floating-point number, it would
> require around 350 GB of memory.
>
> It's worth noting that the actual memory usage of GPT-3 may be different
> due to various optimizations and compression techniques used by the
> implementation.
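>
> Those figures are easy to check with a quick calculation (here "GB" means
> 10^9 bytes; using 2^30-byte gigabytes shifts the numbers somewhat):
>
> params = 175e9                     # GPT-3 parameters
> print(params * 4 / 1e9, "GB at 32 bits (4 bytes) per parameter")   # 700.0
> print(params * 2 / 1e9, "GB at 16 bits (2 bytes) per parameter")   # 350.0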
>
> ME: With the 32-bit floating point number configuration, I assume all 700
> GB are online in RAM. Not overlaid. Correct?
>
> ChatGPT: No, it's not necessary for all 700 GB of GPT-3's parameters to be
> stored in RAM at the same time. In fact, it's not practical to store such a
> large model entirely in memory on most machines.
>
> Most deep learning frameworks use a technique called "memory-mapping" to
> allow the model parameters to be loaded and unloaded from disk as needed
> during training or inference. Memory-mapping allows the model parameters to
> be accessed as if they were in memory, even though they are stored on disk.
>
> In practice, the memory usage of GPT-3 will depend on the specific
> implementation and hardware configuration. Some implementations may use
> techniques like model parallelism or gradient checkpointing to reduce
> memory usage, while others may rely on larger hardware configurations with
> more memory.
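>
> As an illustration of the memory-mapping idea (a minimal sketch using
> NumPy's memmap; the file name, dtype, and size here are made up, and real
> frameworks use their own storage formats):
>
> import numpy as np
>
> # Write a (hypothetical) parameter file to disk, mapped into the process's
> # address space. The OS pages pieces in and out as they are touched, so
> # the whole array never has to fit in RAM at once.
> params = np.memmap("weights.bin", dtype=np.float32, mode="w+",
>                    shape=(1_000_000,))
> params[:] = 0.01          # writes go through the mapping to the file
> params.flush()
>
> # Later (or in another process), open the same file read-only and use it
> # as if it were an ordinary in-memory array.
> weights = np.memmap("weights.bin", dtype=np.float32, mode="r",
>                     shape=(1_000_000,))
> print(weights[:5])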
>
> ME: I meant in the "specialized computing infrastructure provided by
> OpenAI" that you described earlier. Is that all on-line?
>
> ChatGPT:   Ah, I see what you're asking now. Yes, in the specialized
> computing infrastructure provided by OpenAI, the GPT-3 model with 175
> billion parameters is fully loaded into the RAM of the computing cluster
> during operation. The computing cluster used by OpenAI is composed of many
> machines with large amounts of RAM, and the model parameters are
> distributed across those machines.
>
> However, it's worth noting that not all of the machines in the cluster
> will necessarily have a full copy of the model parameters in memory at all
> times. Instead, the model parameters may be partitioned across the machines
> and loaded into memory as needed for different parts of the computation.
> This approach allows the computing cluster to handle the large memory
> requirements of the GPT-3 model while also being scalable and efficient.
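>
> A toy illustration of that kind of partitioning (the device memory size
> and overhead figures below are assumptions, not anything OpenAI has
> published):
>
> params = 175e9
> weight_gb = params * 2 / 1e9           # 350 GB of 16-bit weights
>
> device_memory_gb = 80                  # one accelerator's RAM (assumed)
> usable_fraction = 0.7                  # room left for activations, buffers (assumed)
>
> # Ceiling division: minimum number of devices just to hold the weights.
> min_devices = -(-weight_gb // (device_memory_gb * usable_fraction))
> print(f"{weight_gb:.0f} GB of weights -> at least {int(min_devices)} devices")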
>
> ME: I realize that memory mapping is very rapid with today's peripherals,
> so even an overlay configuration will be rapid.
>
> ChatGPT:  Yes, memory mapping can be very fast with modern storage devices
> and is commonly used to efficiently access large datasets that cannot fit
> entirely in memory. With memory mapping, the operating system maps a
> portion of a file into the virtual address space of a process, allowing the
> process to access the file's contents as if they were in memory.
>
> However, it's important to note that memory-mapped I/O does have some
> overhead, such as page faults and disk I/O. Page faults can occur if the
> requested memory page is not in physical memory and must be fetched from
> disk, while disk I/O can occur if the requested page has not yet been read
> into the page cache. These overheads can be minimized through various
> techniques such as preloading data into the page cache, using large pages,
> and optimizing disk access patterns.
>
> In any case, the actual performance of memory-mapped I/O depends on the
> specific hardware and software configuration, as well as the
> characteristics of the workload being performed.
>
> ME: One more question about this. I read that the "specialized computing
> infrastructure provided by OpenAI" is an MPP configuration. Approximately
> how many processors are there? How many processors per parameter? I guess
> one GPU or TPU is equivalent to one CPU processor in a traditional MPP
> configuration.
>
> ChatGPT:   I'm sorry, but I don't have access to detailed information
> about the specific hardware and configuration of OpenAI's computing
> infrastructure, so I can't provide specific numbers about the number of
> processors or GPUs/TPUs used per parameter.
>
> However, I can provide some general information about MPP (Massively
> Parallel Processing) configurations and their relationship to GPUs/TPUs.
>
> In an MPP configuration, the processing power of a system is distributed
> across multiple processors or nodes, allowing for massive parallelism and
> high throughput. This is typically achieved through the use of specialized
> hardware and software, such as high-speed interconnects and parallel file
> systems.
>
> In the context of deep learning, GPUs and TPUs are often used as the
> primary processing units in MPP configurations due to their ability to
> perform large-scale matrix operations in parallel. GPUs and TPUs are
> designed with a large number of processing cores that can perform these
> operations simultaneously, making them well-suited for deep learning
> workloads.
>
> The number of processors or GPUs/TPUs used per parameter will depend on
> various factors, such as the size of the model, the complexity of the
> workload, and the hardware configuration. In general, larger models and
> more complex workloads will require more processing power per parameter,
> while smaller models and simpler workloads may be able to make do with less.
>
> ME: So, a larger model actually needs a higher ratio of GPU or TPU units
> per parameter? A model is "larger" in the sense that it does more
> processing for each parameter, not that it has more parameters. That is
> interesting.
>
> ChatGPT:  Yes, you're correct. The number of processing units, whether
> CPUs, GPUs or TPUs, required per parameter in a deep learning model depends
> not only on the number of parameters but also on the complexity of the
> computation required for each parameter.
>
> Larger models, in terms of the computation required for each parameter,
> will require more processing units per parameter, as the computation
> becomes more computationally intensive. This is especially true for models
> that require more complex operations, such as recurrent neural networks,
> convolutional neural networks, and transformers like GPT-3.
>
> In general, the number of processing units required per parameter is
> proportional to the computational requirements of the model. Therefore,
> larger models require a higher ratio of processing units per parameter,
> which means that more powerful hardware configurations, such as clusters of
> GPUs or TPUs, are needed to train these models or run inference on them
> efficiently.
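>
> One common rule of thumb (an approximation I am adding here, not something
> ChatGPT stated above) is that a transformer forward pass costs roughly 2
> floating-point operations per parameter per token, which makes it easy to
> sketch why a bigger model needs proportionally more hardware:
>
> params = 175e9
> flops_per_param_per_token = 2          # rough rule of thumb for inference
> tokens = 200                           # length of one generated reply (assumed)
>
> total_flops = params * flops_per_param_per_token * tokens
>
> device_flops_per_s = 100e12            # ~100 TFLOP/s per accelerator (assumed)
> seconds_on_one_device = total_flops / device_flops_per_s
> print(f"{total_flops:.1e} FLOPs, about {seconds_on_one_device:.1f} s on one device")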
>
>

-- 
Daniel Rocha - RJ
danieldi...@gmail.com
