The development of neural networks over the last 30 years has been a journey of continuous learning and improvement. Here’s a brief summary of the key stages:
- 1986: Supervised Learning and State Units: Supervised learning of sequential patterns was introduced. Memory neurons, also known as ‘state units’, gave the network an internal state. These systems could generate data by feeding the output back in as the next input, learning to follow a trajectory through phase space [1].
- 1990: Finding Structure in Time: Jeffrey Elman used neural networks to learn language rather than simple sequential patterns. The network learned word boundaries on its own and clustered words by meaning, demonstrating that networks could learn hierarchical interpretations [1].
- 2011: Character-Level Prediction: A team aimed to improve the compression of text files by predicting the next character. They found that learning was happening, but the network hit its capacity limit when trying to maintain coherent context over long sequences [1].
- 2017: OpenAI and Sentiment Neuron: OpenAI trained a neural network on 82 million Amazon product reviews and discovered a ‘sentiment neuron’. By fixing this neuron's value, the network could be made to generate reviews with the sentiment forced to be positive or negative [1].
- 2017: Attention Mechanism: The ‘Attention Is All You Need’ paper offered a solution to the memory constraint via attention: a dynamic layer that adapts connection weights based on the context of the input. This led directly to the development of transformers [1].
- 2018: GPT: OpenAI applied the attention approach to the ‘next word’ prediction problem, training the model on 7,000 books. This showed the potential of language models to generalize from their training data to arbitrary tasks [1].
- 2019: GPT-2: OpenAI released GPT-2, trained on a dataset scraped from the web. It showed capability in reading comprehension, summarization, translation, and question answering [1].
- 2020: GPT-3: OpenAI scaled the network up roughly 100x with GPT-3, which had 175 billion parameters and 96 layers. It introduced in-context learning, where the network picks up new tasks from the prompt alone, without changing its weights [1].
- 2023: ChatGPT: InstructGPT became the consumer-facing product ChatGPT, used by 100 million people. It could talk to itself and think out loud, following a chain of reasoning for as long as needed [1].
This journey shows the evolution of neural networks from simple pattern recognition to complex language understanding and generation. The development of these models has led to a new computing paradigm where the computer operates at the level of thoughts, and the prompt is the program [1].
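The attention mechanism mentioned in the 2017 entry can be sketched in a few lines: each output is a weighted average of value vectors, with the weights computed on the fly from the input itself rather than fixed at training time. A minimal NumPy sketch of scaled dot-product attention (a toy illustration under simplified assumptions, not any particular model's implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: outputs are context-dependent
    weighted averages of the value rows V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)  # one 4-d output vector per token
```

Because the weights are recomputed for every input, the effective "connections" adapt to context, which is what lets transformers maintain coherence over long sequences without recurrence.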
Citations:
[1] https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/6597253/9b051b72-17f0-425e-bc76-496af3657547/Perplexitysummmarisatin.pdf
[2] https://towardsdatascience.com/a-concise-history-of-neural-networks-2070655d3fec
[3] https://news.mit.edu/2023/large-language-models-in-context-learning-0207
[4] https://stackoverflow.com/questions/62228981/what-is-freezing-unfreezing-a-layer-in-neural-networks
[5] https://www.geeksforgeeks.org/gpt-4-vs-gpt-3/
[6] https://web.stanford.edu/class/ee373b/30years.pdf
[7] https://news.ycombinator.com/item?id=35516167
[8] https://www.ecva.net/papers/eccv2020/papersECCV/papers/123720188.pdf
[9] https://www.searchenginejournal.com/gpt-4-vs-gpt-3-5/482463/
[10] https://www.skynettoday.com/overviews/neural-net-history
[11] http://ai.stanford.edu/blog/understanding-incontext/
[12] https://aitechtrend.com/mastering-layer-freezing-a-key-to-faster-neural-network-training/
[13] https://fireflies.ai/blog/gpt3-vs-4
[14] https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414
[15] https://arxiv.org/abs/2311.08360
[16] https://arxiv.org/pdf/2006.06888.pdf
[17] https://simplified.com/blog/ai-writing/chatgpt-vs-gpt-3/
[18] https://en.wikipedia.org/wiki/Timeline_of_machine_learning
[19] https://www.pnas.org/doi/10.1073/pnas.2310142120?doi=10.1073%2Fpnas.2310142120
[20] https://forums.fast.ai/t/can-anyone-explain-me-what-does-freeze-and-unfreeze-do/42025
[21] https://www.almabetter.com/bytes/articles/chat-gpt-4-vs-gpt-3-5
[22] https://www.wired.com/story/ai-pioneer-explains-evolution-neural-networks/
[23] https://arxiv.org/abs/2305.12600
[24] https://www.semanticscholar.org/paper/LayerOut%3A-Freezing-Layers-in-Deep-Neural-Networks-Goutam-Balasubramanian/9a75abc5a02d2489f5e6685c948781ff96641be5
[25] https://towardsdatascience.com/all-you-need-to-know-about-in-context-learning-55bde1180610
Claude Version:
Here is a brief summary of the key developments in neural networks over the last 30 years, based on the information from the search results:
1986
- Introduction of backpropagation algorithm for training multilayer perceptrons (MLPs) by Rumelhart, Hinton and Williams. This led to renewed interest in neural networks.
Early 1990s
- Development of recurrent neural networks (RNNs) to model sequential data. However, training RNNs was difficult due to the vanishing gradient problem.
1991
- Sepp Hochreiter identified the vanishing gradient problem in RNNs in his thesis. Long short-term memory (LSTM) networks, which he later developed with Jürgen Schmidhuber (published 1997), addressed this.
2012
- AlexNet won the ImageNet competition using convolutional neural networks (CNNs), proving their effectiveness for computer vision tasks. This led to widespread adoption of deep learning.
2014
- Ian Goodfellow introduced generative adversarial networks (GANs). GANs enabled generating new data points, like images.
2015
- Attention mechanisms introduced, allowing models to capture long-range dependencies in sequences without the vanishing gradients that plague RNNs.
2017
- Transformer networks introduced, which are based entirely on attention mechanisms without recurrence. Transformers have become very popular in natural language processing.
2020
- Development of huge transformer models like GPT-3, with over 175 billion parameters, demonstrating few-shot and zero-shot learning.
So in summary, key innovations like backpropagation, CNNs, RNNs/LSTMs, GANs, attention, and transformers have driven neural network capabilities over the last 30 years. Scale has also played an increasingly important role recently.
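The vanishing gradient problem that motivated LSTMs comes down to repeated multiplication: backpropagating through T time steps multiplies the gradient by one recurrent factor per step, so with a factor below 1 the signal decays exponentially. A toy sketch with a scalar linear "RNN" h_t = w · h_{t-1} (a deliberately simplified illustration, not a real RNN implementation):

```python
# For h_t = w * h_{t-1}, the gradient of h_T with respect to h_0 is w**T:
# one factor of the recurrent weight per time step.
def gradient_through_time(w, steps):
    grad = 1.0
    for _ in range(steps):
        grad *= w  # one multiplicative factor per backpropagated step
    return grad

print(gradient_through_time(0.9, 100))  # ~2.7e-5: the signal from 100 steps back has vanished
print(gradient_through_time(1.1, 100))  # ~1.4e4: with |w| > 1 the gradient explodes instead
```

LSTMs avoid this by routing the cell state through an additive path whose gates keep the effective multiplicative factor close to 1, so gradients neither vanish nor explode over long sequences.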
Citations:
[1] https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/6597253/9b051b72-17f0-425e-bc76-496af3657547/Perplexitysummmarisatin.pdf
[2] https://towardsdatascience.com/a-concise-history-of-neural-networks-2070655d3fec
[3] https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html
[4] https://www.codesofinterest.com/2017/07/milestones-of-deep-learning.html?m=1
[5] https://web.stanford.edu/class/ee373b/30years.pdf
[6] https://en.wikipedia.org/wiki/History_of_artificial_neural_networks
[7] https://www.techtarget.com/whatis/A-Timeline-of-Machine-Learning-History
[8] https://www.skynettoday.com/overviews/neural-net-history
[9] https://libguides.aurora.edu/ChatGPT/History-of-AI-and-Neural-Networks
[10] https://www.kdnuggets.com/a-brief-history-of-the-neural-networks
[11] https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414
[12] https://pub.towardsai.net/a-brief-history-of-neural-nets-472107bc2c9c
[13] https://soundand.design/10-historical-milestones-in-the-development-of-ai-systems-b99f21a606a9
[14] https://en.wikipedia.org/wiki/Timeline_of_machine_learning
[15] https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history2.html
[16] https://www.wired.com/story/ai-pioneer-explains-evolution-neural-networks/