
The Research About What AI Can and Cannot Do


What Are the Bounds of the Research?

I’m not a youngster, and I’ve watched several new tools develop over the years. All of them go through what I refer to as the three-generation pattern:

  • Generation 1: What is it?
  • Generation 2: How can it be used?
  • Generation 3: When should it be used?

Generation 1 invents the tool to solve a problem it has. Generation 2, impressed with the tool, begins experimenting with it to see what other problems it can solve, and starts using the tool everywhere, including places where it shouldn’t be used. Because of this, generation 3 understands the strengths and weaknesses of the tool and uses it more judiciously.

Right now, AI tooling is deep within generation 2. I am trying to collect the knowledge and glean what wisdom generation 3 will have. Unfortunately, several people have decided that this tooling is their path to power and money, and they are spending a great deal of time and effort convincing others that the tooling is capable of doing things that it cannot. This meant I had to add the extra step of separating fact from fiction. I have found a lot of people heavily invested in the fiction.

When Should LLMs Be Used?

It is Probabilistic, not Deterministic.

Throughout the history of computing, applications have been designed to be deterministic. 1 + 1 always equals 2. When a column of numbers in a spreadsheet is added, the answer is the same whether the computer running it is a Mac, Windows, or Linux machine. The executable compiled from unchanged source code behaves in exactly the same way no matter how many times or where it is compiled. The outcome of any given action can be determined in advance and is trusted.
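
To make the contrast concrete, here is a rough Rust analogy (not an LLM, just ordinary code): arithmetic gives the same answer on every run, while anything seeded with per-process randomness, such as the standard library’s HashMap iteration order, can differ between two runs of the same binary.

```rust
use std::collections::HashMap;

fn main() {
    // Deterministic: 1 + 1 is 2 on every machine, on every run.
    assert_eq!(1 + 1, 2);

    // Unspecified and run-to-run variable: Rust's HashMap uses a
    // randomly seeded hasher, so two runs of this same binary can
    // print the entries in different orders.
    let mut map = HashMap::new();
    for key in ["alpha", "beta", "gamma", "delta"] {
        map.insert(key, key.len());
    }
    for (key, value) in &map {
        println!("{key}: {value}");
    }
}
```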

AI tooling breaks this pattern: the answer is not always the same. A personal example of this occurred while I was doing some Gemma4 testing. I created a new Rust project, which provides a simple “Hello, world!” program as a starting point, and then ran a prompt asking the model to add unit testing to the application. The first time, it added the unit tests to the main.rs file. The second time, when I repeated the same steps, it created a lib.rs file and placed all of the tests and functions in it. Both solutions worked, but they were different. The same result is not guaranteed.
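
I did not keep the model’s exact output, but a minimal sketch of the first layout looks like this (the greeting function and test names are my reconstruction, not the model’s verbatim code):

```rust
// src/main.rs -- variant 1: tests placed directly in the binary crate.
fn main() {
    println!("{}", greeting());
}

fn greeting() -> String {
    "Hello, world!".to_string()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn greeting_is_hello_world() {
        assert_eq!(greeting(), "Hello, world!");
    }
}
```

The second run instead created a src/lib.rs, moved the function and the `#[cfg(test)]` module into it, and left main.rs calling into the library. Both variants compile and both pass `cargo test`; they are simply different answers to the same prompt.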

Additionally, there is growing evidence that simply changing the word order of a request can change the outcome. The Hidden Position Bias in LLMs: Why Your AI Might Fail When It’s Asked to Choose is an article about how the choices AIs make depend on the order in which the options are entered into the request. Harvard Just Caught AI Lying to Every Executive in America tells a story about the author protecting a friend from a chatbot: he had the friend open a series of chatbot sessions, without logging in, and ask the bot to confirm three different causes for the problems with his systems. The chatbot always agreed with whichever cause was mentioned in that session. If AI were deterministic, the answers would have agreed with each other.

Because it is probabilistic, we run into the following related issue:

The AI can Lie.

There are a lot of stories about lawyers using AI to write briefs. Example: [How to Use ChatGPT to Ruin Your Legal Career](https://www.youtube.com/watch?v=oqSYljRYDEM). Though the example is about people not doing the work they were hired to do, the problem was caused by ChatGPT making up information it did not have.

Unfortunately, this seems to be baked into the system: [LLMs Will Always Hallucinate, and We Need to Live With This](https://arxiv.org/html/2409.05746v1). Though I cannot cite it at this time, I found an article about how most of the wrong answers came from a small number of nodes within the LLM. The experimenter was able to increase accuracy by decreasing the use of those nodes; however, deactivating the nodes entirely caused the LLM to lose the ability to answer in natural language.

This means that the AI is best used when the output can be verified. This works well for code building tasks because testing, both automated and manual, can confirm the output of the LLM. It also works well for finding information, like the top selling, most durable, least liked… because that information can be verified externally.

This also means that AI should not be used to make a decision. As the above-mentioned paper points out, it’s not possible to have all available data on a subject. When there is missing data, the AI will start making stuff up. Therefore you cannot be sure if the decision was made on actual data or on something the machine made up.

It is an Advanced Pattern Recognition System.

I’ve not seen any evidence of AI doing any actual thinking. Thinking requires planning and understanding, and all I am seeing is pattern recognition. This is part of thinking, but not all of it.

There are several examples showing this. One example is Harvard and MIT Study: AI Models Are Not Ready to Make Scientific Discoveries, where the AI was asked to predict the force applied between two planetary bodies. The mathematics has been known since the 1600s, but the AI produced seemingly random forces that happened to pull the planet into a standard orbit.
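
For reference, the deterministic relationship the models failed to recover is Newton’s law of universal gravitation, F = G * m1 * m2 / r^2. A few lines of Rust are enough to compute it (the Earth/Moon figures below are illustrative, not taken from the study):

```rust
// Newton's law of universal gravitation: F = G * m1 * m2 / r^2.
const G: f64 = 6.674e-11; // gravitational constant, m^3 kg^-1 s^-2

fn gravitational_force(m1_kg: f64, m2_kg: f64, r_m: f64) -> f64 {
    G * m1_kg * m2_kg / (r_m * r_m)
}

fn main() {
    let earth = 5.972e24; // mass of Earth, kg
    let moon = 7.348e22; // mass of the Moon, kg
    let distance = 3.844e8; // mean Earth-Moon distance, m
    // Prints roughly 1.98e20 N, the same value every time it runs.
    println!("force: {:.3e} N", gravitational_force(earth, moon, distance));
}
```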

Another example came from researchers trying to measure speed improvements based on LLM prompts. They were testing prompts to solve puzzles like the Tower of Hanoi, which can be solved with a simple algorithm (sketched below). Two prompts were created; both explained the rules of the puzzle and asked the AI to solve it, but one included a description of the algorithm and the other did not. Both prompts took the same amount of time. If the LLM were actually thinking, it would have realized it was given the answer and used it.
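
The simple algorithm in question is the classic recursion: move n-1 disks to the spare peg, move the largest disk to the target, then move the n-1 disks back on top. A sketch in Rust:

```rust
// Classic recursive solution to Tower of Hanoi.
// Move n disks from `from` to `to`, using `spare` as the extra peg.
fn hanoi(n: u32, from: char, to: char, spare: char) {
    if n == 0 {
        return;
    }
    hanoi(n - 1, from, spare, to); // clear the way for the largest disk
    println!("move disk {n}: {from} -> {to}");
    hanoi(n - 1, spare, to, from); // restack the smaller disks on top
}

fn main() {
    hanoi(3, 'A', 'C', 'B'); // prints the 7 moves for 3 disks
}
```

This is the kind of description the second prompt handed to the model, and being given it made no measurable difference.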

Yet another is found in The Cartoon that Breaks AI. Though the producer points out that he doesn’t understand why this is occurring, he notes that even small children are able to follow along even though the AI is not. The most likely reason is that there’s not enough training data for Dr. Seuss, and the AI cannot extrapolate. Or in simple terms, it can find a similar pattern, but it cannot make a new one.

I can see where this can be useful. Attacking a piece of software with every known method of breaking it could make an application more secure. Calling up existing solutions to a known problem is convenient. It’s an excellent index of what has happened, but it cannot predict.

Similar to Outputs, Security is Different with AI.

Security is about keeping people out. It’s about making sure that only the appropriate people have access. The problem is that AI asks to be treated with the most trusted status.

For AI to be useful, it must first train on data. This means giving it access to data behind the fence. That could be mitigated by building and maintaining the models yourself, but it most likely means hiring an outside source, since training requires additional compute and specialists in data analytics. But that is not all: every prompt that is sent out is also being sent to this third party. This means that they also know what types of questions are being asked of the data.

If the party providing the service were unscrupulous, they could easily duplicate your services using the data and prompts given to them. That is a very high level of trust.