The development of neural networks over the last 30 years has been a journey of continuous learning and improvement. Here’s a brief summary of the key stages:
- 1986: Supervised Learning and State Units: Supervised learning of sequential patterns was introduced. Memory neurons, also known as ‘state units’, gave the network an internal state. These systems could generate data by feeding the output back in as the next input, learning to follow a trajectory through phase space [1].
- 1990: Finding Structure in Time: Jeffrey Elman used neural networks to learn language rather than simple sequential patterns. The network learned word boundaries on its own and clustered words by meaning, demonstrating that networks could learn hierarchical interpretations [1].
- 2011: Character-Level Prediction: A team aimed to improve the compression of text files by predicting the next character. They found that learning was happening, but the network hit its capacity limit when trying to maintain coherent context over long sequences [1].
- 2017: OpenAI and Sentiment Neuron: OpenAI trained a neural network on 82 million Amazon product reviews and discovered a ‘sentiment neuron’. By fixing this neuron's value, the network could be made to generate reviews with the sentiment forced to be positive or negative [1].
- 2017: Attention Mechanism: The ‘Attention Is All You Need’ paper offered a solution to the memory constraint via attention: a dynamic layer that adapts connection weights based on the context of the input. This led directly to the development of transformers [1].
- 2018: GPT: OpenAI applied the attention approach to the ‘next word’ prediction problem, training the model on 7,000 books. This showed the potential of language models to generalize from their training data to arbitrary tasks [1].
- 2019: GPT-2: OpenAI released GPT-2, trained on a dataset scraped from the web. It showed capability in reading comprehension, summarization, translation, and question answering [1].
- 2020: GPT-3: OpenAI scaled the network up roughly 100x with GPT-3, which had 175 billion parameters and 96 layers. It introduced in-context learning, where the network picks up new tasks from the prompt alone, without changing its weights [1].
- 2023: ChatGPT: InstructGPT became the consumer-facing product ChatGPT, used by 100 million people. It could talk to itself and think out loud, following a chain of reasoning for as long as needed [1].
This journey shows the evolution of neural networks from simple pattern recognition to complex language understanding and generation. The development of these models has led to a new computing paradigm where the computer operates at the level of thoughts, and the prompt is the program [1].
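The attention mechanism mentioned in the 2017 entry can be sketched in a few lines: each output is a weighted average of value vectors, with the weights computed on the fly from the input itself rather than fixed at training time. A minimal NumPy sketch of scaled dot-product attention (a toy illustration under simplified assumptions, not any particular model's implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: outputs are context-dependent
    weighted averages of the value rows V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)  # one 4-d output vector per token
```

Because the weights are recomputed for every input, the effective "connections" adapt to context, which is what lets transformers maintain coherence over long sequences without recurrence.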
Citations:
[1] https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/6597253/9b051b72-17f0-425e-bc76-496af3657547/Perplexitysummmarisatin.pdf
[2] https://towardsdatascience.com/a-concise-history-of-neural-networks-2070655d3fec
[3] https://news.mit.edu/2023/large-language-models-in-context-learning-0207
[4] https://stackoverflow.com/questions/62228981/what-is-freezing-unfreezing-a-layer-in-neural-networks
[5] https://www.geeksforgeeks.org/gpt-4-vs-gpt-3/
[6] https://web.stanford.edu/class/ee373b/30years.pdf
[7] https://news.ycombinator.com/item?id=35516167
[8] https://www.ecva.net/papers/eccv2020/papersECCV/papers/123720188.pdf
[9] https://www.searchenginejournal.com/gpt-4-vs-gpt-3-5/482463/
[10] https://www.skynettoday.com/overviews/neural-net-history
[11] http://ai.stanford.edu/blog/understanding-incontext/
[12] https://aitechtrend.com/mastering-layer-freezing-a-key-to-faster-neural-network-training/
[13] https://fireflies.ai/blog/gpt3-vs-4
[14] https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414
[15] https://arxiv.org/abs/2311.08360
[16] https://arxiv.org/pdf/2006.06888.pdf
[17] https://simplified.com/blog/ai-writing/chatgpt-vs-gpt-3/
[18] https://en.wikipedia.org/wiki/Timeline_of_machine_learning
[19] https://www.pnas.org/doi/10.1073/pnas.2310142120?doi=10.1073%2Fpnas.2310142120
[20] https://forums.fast.ai/t/can-anyone-explain-me-what-does-freeze-and-unfreeze-do/42025
[21] https://www.almabetter.com/bytes/articles/chat-gpt-4-vs-gpt-3-5
[22] https://www.wired.com/story/ai-pioneer-explains-evolution-neural-networks/
[23] https://arxiv.org/abs/2305.12600
[24] https://www.semanticscholar.org/paper/LayerOut%3A-Freezing-Layers-in-Deep-Neural-Networks-Goutam-Balasubramanian/9a75abc5a02d2489f5e6685c948781ff96641be5
[25] https://towardsdatascience.com/all-you-need-to-know-about-in-context-learning-55bde1180610
Claude Version:
Here is a brief summary of the key developments in neural networks over the last 30 years, based on the information from the search results:
1986
- Introduction of backpropagation algorithm for training multilayer perceptrons (MLPs) by Rumelhart, Hinton and Williams. This led to renewed interest in neural networks.
Early 1990s
- Development of recurrent neural networks (RNNs) to model sequential data. However, training RNNs was difficult due to the vanishing gradient problem.
1991
- Sepp Hochreiter identified the vanishing gradient problem in RNNs in his thesis. Long short-term memory (LSTM) networks, which he later developed with Jürgen Schmidhuber (published 1997), addressed this.
2012
- AlexNet won the ImageNet competition using convolutional neural networks (CNNs), proving their effectiveness for computer vision tasks. This led to widespread adoption of deep learning.
2014
- Ian Goodfellow introduced generative adversarial networks (GANs). GANs enabled generating new data points, like images.
2015
- Attention mechanisms introduced, allowing models to capture long-range dependencies in sequences without the vanishing gradients that plague RNNs.
2017
- Transformer networks introduced, which are based entirely on attention mechanisms without recurrence. Transformers have become very popular in natural language processing.
2020
- Development of huge transformer models like GPT-3, with over 175 billion parameters, demonstrating few-shot and zero-shot learning.
So in summary, key innovations like backpropagation, CNNs, RNNs/LSTMs, GANs, attention, and transformers have driven neural network capabilities over the last 30 years. Scale has also played an increasingly important role recently.
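The vanishing gradient problem that motivated LSTMs comes down to repeated multiplication: backpropagating through T time steps multiplies the gradient by one recurrent factor per step, so with a factor below 1 the signal decays exponentially. A toy sketch with a scalar linear "RNN" h_t = w · h_{t-1} (a deliberately simplified illustration, not a real RNN implementation):

```python
# For h_t = w * h_{t-1}, the gradient of h_T with respect to h_0 is w**T:
# one factor of the recurrent weight per time step.
def gradient_through_time(w, steps):
    grad = 1.0
    for _ in range(steps):
        grad *= w  # one multiplicative factor per backpropagated step
    return grad

print(gradient_through_time(0.9, 100))  # ~2.7e-5: the signal from 100 steps back has vanished
print(gradient_through_time(1.1, 100))  # ~1.4e4: with |w| > 1 the gradient explodes instead
```

LSTMs avoid this by routing the cell state through an additive path whose gates keep the effective multiplicative factor close to 1, so gradients neither vanish nor explode over long sequences.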
Citations:
[1] https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/6597253/9b051b72-17f0-425e-bc76-496af3657547/Perplexitysummmarisatin.pdf
[2] https://towardsdatascience.com/a-concise-history-of-neural-networks-2070655d3fec
[3] https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html
[4] https://www.codesofinterest.com/2017/07/milestones-of-deep-learning.html?m=1
[5] https://web.stanford.edu/class/ee373b/30years.pdf
[6] https://en.wikipedia.org/wiki/History_of_artificial_neural_networks
[7] https://www.techtarget.com/whatis/A-Timeline-of-Machine-Learning-History
[8] https://www.skynettoday.com/overviews/neural-net-history
[9] https://libguides.aurora.edu/ChatGPT/History-of-AI-and-Neural-Networks
[10] https://www.kdnuggets.com/a-brief-history-of-the-neural-networks
[11] https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414
[12] https://pub.towardsai.net/a-brief-history-of-neural-nets-472107bc2c9c
[13] https://soundand.design/10-historical-milestones-in-the-development-of-ai-systems-b99f21a606a9
[14] https://en.wikipedia.org/wiki/Timeline_of_machine_learning
[15] https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history2.html
[16] https://www.wired.com/story/ai-pioneer-explains-evolution-neural-networks/