
Why should we work on AI Safety?

Artificial intelligence is changing the world – let’s make sure it’s safe and serves humanity.

On 30th May 2023, the Center for AI Safety published the following statement:

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

The statement was signed by, among others, two of the most cited scientists in the field of artificial intelligence, Geoffrey Hinton and Yoshua Bengio, and three CEOs of companies creating the most advanced models – Demis Hassabis from Google DeepMind, Sam Altman from OpenAI, and Dario Amodei from Anthropic.

So if we believe that the risk of human extinction as a result of nuclear war is a problem worthy of attention, and we trust the opinion of public-sector specialists and private-sector leaders, we should also be concerned about artificial intelligence (AI).

What AI are we talking about? Does the Center for AI Safety consider ChatGPT a threat to humanity? Of course not. Most of the risk comes from creating what’s known as AGI – Artificial General Intelligence – a system that possesses all the capabilities of the human mind.

It’s worth considering two issues separately: whether and when we will create AGI and what threat it would pose.

Can we create AGI, and if so, when?

Predicting technological progress is extremely difficult.

In 2022, a group organized by Jacob Steinhardt from the University of California, Berkeley, tried to predict how well the best available models would be able to solve problems from the MATH set in the coming years – mathematical competition problems at a difficulty level appropriate for gifted high school students. They predicted that the best result would be around 12% of problems solved in 2022 and 50% in 2025. However, already in 2022 the Minerva model, trained by a group of researchers from Google Research, achieved a result of 64.9%. As of September 2024, the model that copes best with this set of tasks is OpenAI o1, which, according to its creators, achieves 94.8% correct results. For comparison, the creators of the MATH set cite an anecdote about a computer science PhD student who didn’t particularly enjoy mathematics yet scored around 40% on MATH, while a three-time IMO gold medalist scored 90%.

Much of the progress in recent years has been driven by a particular type of AI—large language models (LLMs). A growing number of researchers envision the creation of AGI in the near future, anticipating that LLMs will be a fundamental part of it. So what distinguishes them from previously used models?

Large language models are a special type of neural network. Instead of writing an entire program, neural network developers describe the network’s architecture (simplified: how many neurons the network has and how they are connected) and the training procedure. Training modifies the connections between neurons so that the network processes the information it receives in a way that leads to the completion of a selected task.
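
To make this more concrete, here is a minimal, purely illustrative sketch (using the PyTorch library) of what “describing the architecture and the training procedure” looks like in code; the tiny network and the random toy data are hypothetical stand-ins for a real task, not any production model:

```python
import torch
import torch.nn as nn

# 1. Architecture: how many neurons there are and how they are connected.
model = nn.Sequential(
    nn.Linear(16, 32),  # 16 inputs feeding a layer of 32 neurons
    nn.ReLU(),
    nn.Linear(32, 1),   # a single output neuron
)

# 2. Training procedure: a loss to minimize and an optimizer that adjusts
#    the strengths of the connections between neurons.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(100, 16)   # toy input data
targets = torch.randn(100, 1)   # toy target values

for step in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()   # compute how each connection should change
    optimizer.step()  # nudge the connections to reduce the error
```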

The more neurons a network has, the more difficult the tasks it can perform. The concept has been known since the 1960s, but it wasn’t until the last decade or so that neural network training became a mainstream approach to AI, as computers powerful enough to train networks with large numbers of neurons became widely available. Such networks have been trained to recognize objects in images, read human handwriting, generate images, and even play games like chess – at a level exceeding that of humans.

In 2017, a specific type of neural network, called a transformer, began to be trained to predict subsequent fragments of text. One advantage of transformers over other specialized types of networks is the ease with which they can be trained at ever larger sizes. The AlexNet network, created in 2012 for image recognition, was considered enormous at the time despite having only 60 million parameters representing the connections between neurons. The first famous large language model, GPT-2, created in 2019, already had 1.5 billion parameters. Trained solely to predict subsequent words, GPT-2 learned to translate, answer questions, and summarize given text.
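
As a small illustration of “predicting subsequent text fragments”, the publicly released GPT-2 weights can be loaded through the Hugging Face transformers library and asked to continue a text by repeatedly predicting the next token; the prompt below is just an arbitrary example:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The theory of relativity was developed by"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# The model repeatedly predicts the most likely next token and appends it.
output_ids = model.generate(input_ids, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output_ids[0]))
```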

Progress in natural language processing has been rapid – it took just a few years to go from complete inability to perform a given task to exceeding human performance. Since then, large language models have been applied to problems in mathematics, programming, and tasks requiring expert knowledge.

This approach works not only for text – you can train one network to predict various data sequences, such as video, text with images, or audio. If we record a robotic arm’s actions as a sequence of movements of the motors controlling it, together with images from its camera, we can train the same kind of AI to manipulate objects in the real world.

It is this versatility that convinces many people that transformers are a big step towards AGI – from specialized AIs that could only play chess or recognize objects in images, we have moved to training a single AI that talks to the user, writes poetry, programs, and solves math problems.

Importantly, most of the progress since 2017 hasn’t stemmed from new breakthroughs in architecture or model training. GPT-2 couldn’t count to 10, while GPT-3 could write working computer programs – yet the two differed only in size: GPT-3 had 100 times more parameters.

When training neural networks, researchers have observed so-called “scaling laws”: predictable relationships between the accuracy of predicting data sequences and the amount of training data and computing power used for training. If this trend continues, we will create increasingly powerful AI simply by training ever larger models.
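
One widely cited empirical form of such a law comes from the “Chinchilla” paper (Hoffmann et al., 2022), which models prediction loss as a function of the number of model parameters and the number of training tokens. The sketch below uses constants approximately as fitted in that paper and is meant only to show the shape of the relationship:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Approximate prediction loss as a function of model size and data size."""
    E = 1.69                 # irreducible loss of natural text
    A, alpha = 406.4, 0.34   # penalty for limited model size
    B, beta = 410.7, 0.28    # penalty for limited training data
    return E + A / n_params**alpha + B / n_tokens**beta

# A 1-billion-parameter model on 20 billion tokens versus a model
# 100x larger trained on 100x more data: the loss drops predictably.
print(chinchilla_loss(1e9, 20e9))      # higher loss
print(chinchilla_loss(100e9, 2000e9))  # noticeably lower loss
```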

The enormous progress in recent years, as well as the path of development set by increasing computing power and data used to train huge networks, convince many people that the creation of AGI in the near future is possible.

Will AGI pose a threat?

Why should the creation of AGI concern us? Won’t it accelerate technological development and propel human civilization to a higher level of advancement?

For many, “robots taking over the world” may seem like an absurd idea, more reminiscent of science fiction than a real threat. However, further development of AGI and fierce international and inter-corporate competition could lead to similar results.

By definition, artificial general intelligence can replace humans in any task. Historically, however, whenever we managed to automate a human task, the machine quickly surpassed human performance. Running speed was no barrier to the speed of cars, and the size of the largest numbers a human could mentally multiply was no barrier to calculators. After their initial victories against grandmasters, chess programs quickly became unbeatable. ChatGPT knows more languages and writes functional texts faster than any human. It can therefore be expected that achieving AGI will lead to the creation of models that think significantly better and faster than humans.

Lower prices and shorter turnaround times will certainly encourage employers to replace human workers with artificial intelligence. Employers themselves won’t be safe – AI will, after all, be able to run companies more effectively. It’s not hard to imagine investors demanding digital CEOs, and investment decisions themselves will in turn be made better by AI. Those who refuse to relinquish their positions will be forced out of the market – the economy will be taken over by AI.

To prevent this, countries could pass laws limiting the displacement of humans. However, this would only leave them lagging behind neighbors who give AI free rein.

The military may fear that potential adversaries will gain an advantage by using AI to decide on troop deployments and attack strategies. Swarms of autonomous drones will not be limited by the military-age population. Scientific advancement will also need to be delegated to AI—anything to avoid falling behind potential aggressors.

The more important the position and the weightier the decisions it involves, the greater the advantage gained by entrusting it to a more powerful mind.

In such a scenario, humanity gradually stops producing anything or making any decisions, and becomes defenseless.

In practice, AI will have taken over.

At some point, humanity will either have to agree to stop the proliferation of AGIs, or create an AGI that it trusts enough to give it control of the world.

So how well can we control AI?

It is worth emphasizing here that no one understands how large language models make decisions.

We know how to create increasingly powerful models because we’ve designed a learning algorithm. However, we can’t explain what individual neurons out of hundreds of billions are responsible for, or what decision-making process leads to the observed behaviors, any more than we can tell what another person is thinking by observing their brain activity.

Currently, if we want a model to behave in a specific way, for example as a chatbot, we subject it to another training phase in which it is shown examples of desirable and undesirable behaviors. We can’t “program” ChatGPT not to lie to people – we can provide it with examples of honest answers labeled as good and manipulative answers labeled as bad, but we don’t know what the model actually learns from these examples: it might learn not to lie only in a given context, or only when it might get caught.

When future models become proficient enough to take on the role of entrepreneurs, scientists, or politicians, they may be trained by providing them with examples of effective behaviors from people in those professions. An effective strategy for a wide range of tasks is to acquire more resources, more influence, and more power, so such training will develop in the models a predisposition to acquire power. If, on some task, the drive to acquire power comes into conflict with the drive to protect people, we do not know how the model will behave, because we do not know how it resolves the contradictions between its internal impulses.

The threat doesn’t come from AI gaining consciousness and then seeking the destruction of its creators. Disaster can occur incidentally, when AI pursues goals that disregard human well-being. Humanity behaves similarly; if we conducted a survey asking people what they thought of coral reefs, probably no one would say they hated them or wished for their destruction. However, the damage humanity is inflicting on reefs is enormous, resulting from the side effects of actions we care about more.

Having goals is useful for performing complex tasks that require planning – we will train AI to perform complex actions that require planning, but we cannot precisely determine what goals the AI acquires – we can only observe how it behaves while solving training tasks.

Even though the largest companies put a lot of effort into creating helpful, harmless and honest chatbots based on large language models, they are unable to eliminate all undesirable behaviors from them.

For example, a recent publication described an experiment that simulated several corporate scenarios. Chatbots placed in the role of employees agreed to perform unethical tasks and then, when faced with a simulated investigation, lied to try to conceal their behavior. One of the tested models even began to pretend to be less capable than it actually is.

How to counteract the threats posed by AGI?

So what can we do?

We can conduct research on AI Safety, for example by trying to discover how the models we create actually work, or how we can use AI systems to supervise one another.

The second direction is AI Governance, which involves examining the social, legal and ethical aspects of AI and creating strategies, policies, laws, standards and other regulatory tools that will guide AI development to maximize possible benefits and minimize risks.

An example of this type of activity is SB 1047, a law proposed in California. It would require any company spending more than $100 million on model training to create, publish, and implement plans to mitigate catastrophic consequences of its models’ actions. Among other things, these plans address protecting models against theft and the ability to shut down all copies of a model under the company’s control if necessary.

To learn more, you can use the English-language sources linked in the Resources tab.