Script: Seriously, what the Hell is AI?

As featured in this ipXchange thought piece. Written by Dr. Eamon Standing.

I think it’s fair to say that AI (Artificial Intelligence) is THE hot topic at the moment. The funny thing is, AI isn’t a new concept. Not even nearly. But now everyone is suddenly talking about it, and more importantly, they’re trying to shove it into any product they can to add value to what they’re offering. To quote Guy at CES, “you want AI with that?”

But surely not everyone who offers AI with their product can really have this innovation within their grasp. Not with the amount of development it requires, right? So is there a bunch of phonies out there making big bucks from the AI fad? Or perhaps the real problem is that each person’s definition and understanding of AI is drastically different – I’m sure the cultural definition has changed even within the course of my lifetime. Which raises the question:

Seriously, what the Hell is AI? And maybe more importantly, what is not AI?

Picture this: a man sits at a narrow desk in a small room. There is a door to his right, the source of the knocking that woke him from his nap, but it is locked. He locked it. He likes his privacy, for good reason, but now that he’s awake, he knows what he must do. He watches as a piece of paper is slid under the door, and in black ink, a small string of Chinese characters is written clearly for him to read. He picks up the note, but he cannot read it. This man does not understand Chinese.

He turns to the desk and boots up the bulky 1980s computer, which is built to run a single program for a single purpose that he’s been assured will make him a very rich man. He uses a basic graphical user interface to replicate what is shown on the piece of paper, being careful to ensure that every aspect of each intricate character is correctly replicated on the screen. He hits return, and after a few seconds, a different set of characters appears in response. He jots them down on the same piece of paper and promptly slides his ‘answer’ under the door. Then he waits, and before long, another string of characters appears at his feet.

The conversation continues for around 20 minutes. By now, he’s getting the hang of the computer program again, and rewriting the characters faster than he’s ever done before. Then as suddenly as it started, the messages stop.

“OK, then,” he mutters to himself before switching off the machine and taking a nap. He’s earnt it.

On the other side of the door, another man – a Chinese man – shows the messages to his superiors. They all look impressed. “Yeah, his handwriting sucks and he’s a bit of a recluse, but boy is he a great advisor or what!” And for another ten years, no member of the organisation realises that the well-paid man in the small room knows and understands nothing about what this company does.

Hello, ipXchange community! It’s your usual host Dr. Eamon Standing coming at you with another thought piece to get you thinking, prefaced, as last time, with a little story – because I love writing stories.

What you just heard was a slightly embellished retelling of ‘The Chinese Room Argument’, a philosophical thought experiment first published in 1980 by John Searle as an argument against Alan Turing’s “Turing Test” being an adequate way of evaluating whether a machine possesses human-level intelligence. Artificial intelligence.

The Turing test proposes that in a text-only chat environment, if a human evaluator cannot tell the difference between a conversation with a machine and one with another human being, conducted as separate exchanges, then the machine must therefore be exhibiting intelligent behaviour at the same level as a human.

The Turing test is one of the core experiments in the field of AI, and given that this was first conceived in 1950, we are going back a LONG way. Of course, Searle’s Chinese Room Argument shows that just because an intelligent conversation can be had, it does not mean that a computer – or its operator – has any understanding of what is being asked of it. Not at a “theory of mind” level, which I think is a great introduction to asking the question of just what the Hell is AI?!

I’d like to preface what follows by stating that any conclusions drawn during this thought piece are merely my opinions, but they are based on independent research and my observations of how many people have used the phrase AI throughout my lifetime. And I firmly believe that there is a lot of confusion as to what AI is simply because the generally understood definition of AI has changed in recent years. In some ways, it is arguable that we have simply narrowed our definition of AI to fit what is easy to market as being impressive so that we can shut down some of the competition. Bravo, big tech. Bravo.

I’ll start by winding back the clock to a 2016 article by academic AI researcher Arend Hintze – sorry if I’m butchering his name or any others that I have or will mention. I think this is far enough back in time that it cannot be clouded by thoughts of the recent AI trendsetter ChatGPT, so I’ll trust it as a starting point for the modern definition of AI, and it’s a good read; I’ll link it and several other sources of inspiration for this thought piece in the description.

Hintze’s article outlines the four types of AI as they were understood at the time. I think it is safe to say that these four definitions cover what I would argue constitutes AI quite clearly, so I’ll go over them now.

Type I AI: Reactive machines

This is the most basic type of AI. There is no memory of past actions, and it responds purely to a given input and takes action. IBM’s Deep Blue chess-playing supercomputer is cited as the perfect example of this type of AI, and that was able to beat chess grandmasters, so it must be pretty smart, right?

Deep Blue, like other reactive machines, knows the current situation and what actions can be taken from this starting point. In the case of chess, this is the position of the pieces on the board and where they can be moved during the current turn, while ignoring where they have been previously. It can choose its next move based on what is available and predict the opponent’s counter to each of these moves.
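As a very rough illustration of that reactive loop – and nothing like IBM’s actual engine, which searched vastly deeper on custom hardware – here’s a minimal Python sketch of an agent that looks only at the current position, scores every legal move by assuming the opponent will make the reply that hurts it most, and picks the best. `legal_moves`, `apply_move`, and `evaluate` are hypothetical helpers you would supply for your own game:

```python
# Toy reactive game-playing agent: it sees only the current state, never the
# history. It scores each legal move by the worst case after the opponent's
# best counter, then takes the move with the best worst-case score.

def best_move(state, legal_moves, apply_move, evaluate):
    best, best_score = None, float("-inf")
    for move in legal_moves(state, player="us"):
        after_us = apply_move(state, move)
        replies = legal_moves(after_us, player="them")
        if replies:
            # Assume the opponent picks whichever reply is worst for us.
            score = min(evaluate(apply_move(after_us, reply)) for reply in replies)
        else:
            score = evaluate(after_us)
        if score > best_score:
            best, best_score = move, score
    return best
```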

But let us be clear: Deep Blue does not know every single possible move that can ever exist. Its evaluation function – tuned with the help of grandmaster games and an enormous amount of brute-force search, rather than the neural networks behind most modern AI – determines the most effective way to win based on the next moves possible. It is NOT simply sorting through the list of all possible states of the game and choosing the one that comes next. It likely isn’t even capable of storing them, as they are so numerous. And for those wondering what a neural network actually is, all will be revealed once we’ve decided what is and isn’t AI, but I’ll tell you now, it’s pretty important.

Type II AI: Limited memory

These devices expand on the type-I reactive machines to include some sense of time through past memories. Hintze uses autonomous vehicles as an example of this: while they can react to what’s going on immediately, limited memory AI machines can also perceive a rapidly changing information landscape and the patterns that these changes imply, such as approaching a traffic light.

A poorly trained reactive machine might see that a traffic light is red and stop on the spot, failing to recognise that the car also needs to take into account the movement of other vehicles or pedestrians before braking – otherwise there might be a crash. A single snapshot from a visual sensor can’t tell you the velocity of other vehicles, so limited memory AI machines are the next step above reactive machines for interpreting the world around them.
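To make that concrete, here’s a tiny, entirely hypothetical sketch of the ‘limited memory’ part: a single camera frame only gives you a position, but holding on to the last few timestamped observations lets the system estimate velocity – exactly the kind of short-term memory a purely reactive machine lacks.

```python
from collections import deque

class TrackedVehicle:
    """Keep a short memory of recent (time, position) observations so that
    velocity can be estimated - something a single snapshot cannot give you."""

    def __init__(self, history=5):
        self.observations = deque(maxlen=history)  # limited memory, quite literally

    def observe(self, timestamp_s, position_m):
        self.observations.append((timestamp_s, position_m))

    def estimated_velocity(self):
        """Average velocity over the stored window, in metres per second."""
        if len(self.observations) < 2:
            return None  # not enough memory yet to say anything about motion
        (t0, p0), (t1, p1) = self.observations[0], self.observations[-1]
        return (p1 - p0) / (t1 - t0)

# Usage: feed in detections frame by frame, then ask for the velocity estimate.
car = TrackedVehicle()
car.observe(0.0, 10.0)
car.observe(0.1, 11.5)
print(car.estimated_velocity())  # ~15 m/s: worth knowing before you brake
```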

That said, these machines do not learn on the go, at least not within this basic definition. They are still based on neural networks trained for specific actions. The memories of limited memory machines do not translate to experience that can be drawn from, like how to avoid a crash that has happened in the past involving that particular machine.

Type III AI: Theory of mind

This is the tricky one, because it makes Hintze’s article feel somewhat outdated given recent developments in AI, and here is where I feel that the simplest reactive machines have almost been dismissed as such a trivial form of AI that they may not be considered AI at all. I don’t believe this, but I’m sure some people do.

According to Hintze, Type III AI understands how creatures and objects behave in the world and how they respond and react to the things around them. It also highlights how a machine will adjust its behaviour to better suit the emotional needs of a human because it understands that emotions affect behaviour.

At the time of Hintze’s article – and I’ll remind you it was published in 2016 – he states that the boundary between limited memory machines and ‘theory of mind’ is the point where ‘current’ AI technology ends. And this is where I’m unsure of where we’re at. I’ll come back to that momentarily.

Finally, we get to Type IV AI: Self-awareness

This is the type of AI that not only understands humans and the world around it but also understands and forms ideas and representations about itself – it has a consciousness. I’m not going to dwell on this definition because I don’t think we can argue that we’re there yet, and surely no-one is going to argue that this is not AI.

Well…unless you’re like me, and you think that at the point where a machine has a consciousness, and therefore rights as a sentient being, it should be classified as an “inorganic actual intelligence”. See the link in the description for the scene in 1995’s sci-fi anime epic ‘Ghost In The Shell’ that first prompted me to think this way back in my teen years. It truly makes one question the definitions of AI, life, and what it means to be human. Damn, I love Ghost In The Shell…

OK! That’s four definitions of AI: reactive machines, limited memory, theory of mind, and self-awareness. So where does the Turing test fit into all this? Well, it’s not clear that the AI conversationalist needs to be self-aware, but a grasp of theory of mind is probably a requirement for having a conversation that doesn’t make you sound like a stereotypical robot, so perhaps this qualifies as a type-III artificial intelligence.

From some documented interactions with chatbots like ChatGPT, I think we’re on the cusp of type-III AI. I’m sure those following this technology have heard that it will fabricate citations when writing papers and is able to ‘apologise’ and write a better paper to correct its mistakes, as if admitting to the lie. With AI dating chatbots too, you would think there must be some level of theory of mind for people to develop intimate feelings for an avatar on their phone…

But we’ve already established from Searle’s Chinese Room Argument that passing the Turing test can be spoofed by a good program that has absolutely no understanding of the conversation, so is AI present in our man’s computer, or the combination of him and the computer?

Arguably, yes. The computer in the Chinese Room Argument appears to be able to navigate natural language processing, albeit in written form, and this would fall under even type-I AI with a well-trained neural network – it’s doing far more than wake-word detection, and we understand that to be AI, right? But is there really anything inherently different between understanding a written sentence, which can easily be digitised, and taking sensor data from a microphone and converting it into a digitally represented word?

Before I get onto neural networks, let’s just determine whether every chatbot or wake word detection system requires AI. And the short answer is a very quick ‘no’. Not even nearly.

You’ve probably had the frustrating experience of an online chatbot that is built using a decision tree. This simply uses a bunch of selectable answers and provides responses and options based on the previous answer. It’s called a decision tree because if you look at the conversation architecture, it looks like a tree. You initiate the conversation, and each optional answer branches to the next question and its own set of answers, and so on and so on until you’ve got a tonne of branches leading to some conversation end points. I find it usually directs you to a dedicated webpage, whether that’s a how-to guide to solve a problem or a specific product, or it tells you to phone a human-operated helpline – this is usually what I wanted the whole time.

You may be thinking, “of course that’s not AI; it’s multiple choice,” but even chatbots where you type in an answer can operate in this way. So long as your spelling is not appalling, the program can look for keywords in what you’ve typed and still direct you to points on the decision tree. The funny thing about a decision tree is that it does have some resemblance to a neural network, but if every possible question and answer was programmed into it, ultimately ending with a link or phone number, that’s very different to how a neural network works.
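For the curious, a decision-tree chatbot of the kind I’m describing can be sketched in a few lines – this is a made-up toy, not anyone’s real support bot, with placeholder keywords and URLs. Notice that every prompt, branch, and end point has been written out by a human in advance; nothing here is learnt.

```python
# Each node holds a prompt, keyword-matched branches, and possibly an end point
# such as a guide, a product page, or a phone number.
TREE = {
    "start": {"prompt": "Do you need help with 'billing' or a 'technical' problem?",
              "branches": {"billing": "billing", "technical": "technical"}},
    "billing": {"prompt": "Is this about an 'invoice' or a 'refund'?",
                "branches": {"invoice": "invoice_end", "refund": "refund_end"}},
    "technical": {"prompt": "Please call our helpline on 01234 567890.", "branches": {}},
    "invoice_end": {"prompt": "See our invoice guide: example.com/invoices", "branches": {}},
    "refund_end": {"prompt": "See our refund policy: example.com/refunds", "branches": {}},
}

def chatbot():
    node = "start"
    while True:
        print(TREE[node]["prompt"])
        branches = TREE[node]["branches"]
        if not branches:  # a leaf of the tree: guide, product page, or phone number
            return
        reply = input("> ").lower()
        # Crude keyword matching: follow the first branch whose keyword appears
        # in what was typed; if nothing matches, ask the same question again.
        node = next((dest for keyword, dest in branches.items() if keyword in reply), node)

chatbot()
```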

AI – in some cases – debunked. That’s not to say that an AI could not be combined with a decision tree for a better chatbot. In fact, decision tree learning is a common machine learning technique in its own right, and I’ll admit that when I searched for decision trees after writing a first draft of this script and saw that AI was all that was coming up, I panicked a little that I had been guilty of dismissing the most basic AI! But decision trees are a concept that existed long before their use in machine learning – feel free to correct me in the comments – so I think, in many cases, this is not AI, at least not for the chatbot that I just described.

So what about wake-word detection? Well, in my third year at university, I had a similar project where we were given a recording of a set of church bells ringing, as well as a recording of each bell being rung on its own, and the task was to build a system that could show when each individual bell was rung within the recording of all of them playing together – I’ve linked to a paper from 2009 which likely inspired this project.

Without getting into the details, the solution that I used – prompted by my supervisors – was a mathematical technique called non-negative matrix factorisation, and this can also be used for facial recognition. Essentially, the first thing to do is to convert the recording of all the bells ringing together into a spectrogram by performing a Fourier transform.

An audio spectrogram is a 3D representation of sound across the time, amplitude, and frequency dimensions. Other than a pure sine wave – the simplest oscillation, at a single frequency – all audio recordings are a combination of many sine wave frequencies stacked on top of each other. These frequencies – and how the amount of each of them varies over the course of a note – are what give a musical instrument, for example, its distinctive sound. A Fourier transform is just a mathematical operation that separates these frequencies to produce a spectrogram where you can see the change in amplitude – the loudness – of every frequency with time.

Usually, a spectrogram is shown as a 2D image, with time on the horizontal axis, frequency on the vertical axis, and the colour of each pixel indicating the amplitude of each frequency at a given time. But if you imagine that colour is just a number, a spectrogram is a matrix, a grid of numbers, so it’s just a piece of maths that more maths can be applied to.
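If you fancy trying this yourself, here’s a minimal sketch of turning a recording into exactly that matrix, using scipy’s short-time Fourier transform. The filename is just a placeholder, and the window size would need tuning for real bells:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

# Load the group recording (placeholder filename) and build its spectrogram.
sample_rate, audio = wavfile.read("all_bells.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)          # mix stereo down to mono
audio = audio.astype(float)

# Short-time Fourier transform: split the signal into overlapping windows and
# Fourier-transform each one.
freqs, times, Z = stft(audio, fs=sample_rate, nperseg=4096)

# The spectrogram is just a matrix: rows are frequency bins, columns are time
# frames, and each entry is the amplitude of that frequency at that moment.
spectrogram = np.abs(Z)
print(spectrogram.shape)  # (number of frequency bins, number of time frames)
```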

If you then perform a Fourier transform on a recording of a single bell, you get another spectrogram, find when the bell is initially hit, and create a slice of the recording at a single point in time. This gives you a spectral profile – a frequency signature – for that bell and that bell alone; bells have a very complex spectral profile, which makes them great for this experiment. Spectral profiles can also be represented as a matrix with a time dimension of one – they are effectively just in the frequency domain.

So if you repeat the process for all the bells, you’ll have a set of frequency-domain spectral profile matrices and the big spectrogram of the group recording in the frequency and time domains. You can then use non-negative matrix factorisation to get the time-domain matrix which shows when each of these spectral profiles occurs during the recording. Factorising gives you the pieces you need to multiply together to reconstruct what you started with, and the main recording is, after all, simply a combination of all the bells being rung at the correct times.

The result is that by using rather complex mathematics – but mathematics nonetheless – you can very accurately tell when each bell is rung during the group recording, with near-perfect isolation between the different bells. If you then set a threshold on each time-domain occurrence matrix for positive detection of each bell, you have yourself a purely mathematical, non-AI bell ring detector. And you can even add more spectral profiles to detect other instruments. I know this because I’ve done it, and once you get the hang of it, it’s not difficult.
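Here’s a rough sketch of that detection step – not my original university code, and simplified. With the spectral profiles already in hand, each time frame of the group spectrogram can be decomposed with non-negative least squares (the non-negative part of the factorisation, via `scipy.optimize.nnls`) to find how strongly each bell’s profile is present, and a threshold turns that into a yes/no detection. A full non-negative matrix factorisation would also learn the profiles, but here we already have them.

```python
import numpy as np
from scipy.optimize import nnls

def detect_bells(spectrogram, profiles, threshold=0.5):
    """spectrogram: (freq_bins, time_frames) magnitude matrix of the group recording.
    profiles:    (freq_bins, n_bells) matrix, one spectral profile per bell.
    Returns a boolean (n_bells, time_frames) matrix of detections."""
    n_bells = profiles.shape[1]
    activations = np.zeros((n_bells, spectrogram.shape[1]))
    # For each time frame, find the non-negative mix of bell profiles that best
    # reconstructs that column of the spectrogram.
    for t in range(spectrogram.shape[1]):
        activations[:, t], _residual = nnls(profiles, spectrogram[:, t])
    # A bell counts as "rung" when its activation rises above the threshold.
    return activations > threshold

# Usage sketch (shapes only, data is made up):
rng = np.random.default_rng(0)
profiles = rng.random((2049, 6))        # six bells, 2049 frequency bins
spectrogram = rng.random((2049, 400))   # ~400 time frames
print(detect_bells(spectrogram, profiles).shape)  # (6, 400)
```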

While a wake word is a bit more complex to detect than the presence of a bell ringing with a defined spectral profile, I’m sure that a similar technique to the one outlined above could be used to determine whether a wake word is said by adding another time dimension to the analysis, still without the use of AI. Let me be clear: this was just maths.

But I’m also very sure that AI could be useful in improving this technique, because the tone and intonation will change every time you say ‘OK, Google’, and you can’t ship a mass-market product that only works for one person saying the phrase the same way every time. Hey, there’s got to be a reason I keep seeing matrix computing on so many AI chips.

So where do neural networks fit into all this? The key difference between an AI system running a neural network and the situations I’ve just described is that those decision trees and non-negative matrix factorisation rely on all the information being present within the code. The system can’t operate outside of this because it does not recognise anything it wasn’t explicitly given.

In other words, it’s not ‘artificially intelligent’ in the same way as the type-I-to-IV AIs outlined by Hintze, despite being a pretty clever machine system. But as with Deep Blue, artificial intelligence is not simply having a list of all possible answers and quickly looking up what’s suitable given the question. This also shows that you can rule out speed as a defining component of AI, so don’t be fooled by people simply doing fast look-up operations or quick maths. Neither of these features neural networks, machine learning, deep learning, or natural language processing – other subsets of AI, several of which take loose inspiration from how the human brain operates. They don’t get any cleverer, because they just do what they were explicitly programmed to do. Or it’s just maths.

Neural networks – a, if not THE, key component of many AI systems – operate in much the same way as the neurons in our brains, hence the name. Given enough training, which links the neurons into networks that represent an understanding of a concept, a neural network can learn to recognise patterns, and when presented with a challenge or a situation based on new data, it can make a reasonable guess at the best answer or course of action to take. This applies from type-I AI upwards.

That’s a bit abstract, so here’s an example. You train an AI to recognise cats. You feed lots of pictures of cats to the AI so that it builds a neural network centred on what makes a cat a cat – what are the commonalities between every cat it has ever ‘experienced’? – and, if trained correctly, when you show it a new cat, it will tell you that it’s a cat, based on the overlap with its current experience and understanding of what a cat is. You will have done this in your own brain during the course of your childhood, assuming that you knew what a cat was by the time ‘C was for Cat’; it would be pointless to learn the alphabet with things that you had no concept of.
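To show what ‘training’ means in the smallest possible terms, here’s a toy single-layer network – far simpler than anything that could genuinely recognise cats, trained on made-up feature vectors rather than real images – where learning is just nudging weights towards the examples it has seen:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up training data: each row is a feature vector for an image, each label
# is 1 for "cat" and 0 for "not cat". A real system would learn its features
# from pixels with many layers; this toy keeps it to a single layer.
X = rng.random((200, 16))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)   # arbitrary stand-in for "cat-ness"

weights = np.zeros(16)
bias = 0.0
learning_rate = 0.5

def predict(features):
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))  # sigmoid

for _ in range(500):                 # training: nudge weights towards the examples
    p = predict(X)
    error = p - y
    weights -= learning_rate * X.T @ error / len(y)
    bias -= learning_rate * error.mean()

# A new, never-seen example: the network generalises from its training rather
# than looking this exact input up in a stored list.
new_image = rng.random(16)
print(f"P(cat) = {predict(new_image):.2f}")
```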

The differentiator here is that the neural network does not contain every possible cat in existence – how could it? Instead, it ‘understands’ what a cat is, rather than deterministically looking one up from a stored list. Mind you, a neural network trained only on cats might see a picture of a Vietnamese Hmong puppy and identify it as a cat. It’d be wrong, but it’d be damn close.

I think recognition AI trained on sensor data and generative AI are the two types of AI that people think about most these days when the topic comes up. Really, generative AI is taking lots of data, understanding the patterns within that data, and outputting something that resembles what it was trained on. The output is original, even though it might be a fever dream of many different sources combined, and in some ways, it is the mirror image of the recognition AI trained on sensor data.

And anyway: can you honestly say that any original creative work is completely uninspired by patterns that are well received by an audience? With the exception, perhaps, of revolutionary works chalked up to acts of fate or anarchy against the norm. Even J.S. Bach’s chorale harmonies are now taught as formulaic – perhaps algorithmic – given how much he wrote and how thoroughly it has been digested and analysed by musicologists, so that’s just neural network training at the academic level.

But you’re probably getting a little ticked off that I haven’t actually shown what a neural network looks like yet. Well, it’s come to that point in the thought piece, with one of my favourite demonstrations of AI. And no, it’s not new at all.

YouTuber SethBling’s MarI/O genetic algorithm is an example of a type-I-or-II AI with a twist – I’m not 100% sure whether limited memory is used. It is based on a previous work, but he explains it well and illustrates the development of a neural network in real time, so the full video is really worth a watch – link in the description.

MarI/O takes the first level of Super Mario World and puts an AI at the controls, without any prompts to action. All that the AI knows is the present frame of the game – unless it can detect the movement of enemies in time, though these move predictably – and the range of control options available. But it does not know what the controls do, and it doesn’t know what collisions with certain objects will do to it.

The reason this is called a genetic algorithm is that it is trained through random mutations – i.e. applications of the controls – and this results in the program getting better and better at playing this level of Super Mario World through selective breeding of the neural networks based on the maximum fitness of each generation, i.e. how far it gets to the right of the screen, as this is where the level eventually ends. It is quite literally AI survival of the fittest, and the neural network is shown at the top right of the screen in SethBling’s video.

For the first generation, i.e. the first time the program is run, nothing happens. Because the program remains inactive, the generation times out, and the next generation begins. Eventually, random mutations – or changes – in the program cause the AI to push buttons randomly, and in some cases, this increases the fitness of the program – the movement to the right. These actions, implemented within the neural network, are then used as the starting point for the next generation.
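Here’s a bare-bones sketch of that evolutionary loop – not SethBling’s actual NEAT code, which evolves the structure of the neural network itself. This toy just evolves a fixed sequence of button presses, and the fitness function is a made-up stand-in for ‘how far right Mario got’:

```python
import random

GENOME_LENGTH = 60     # one (made-up) button choice per time step
BUTTONS = ["left", "right", "jump", "none"]

def random_genome():
    return [random.choice(BUTTONS) for _ in range(GENOME_LENGTH)]

def mutate(genome, rate=0.05):
    # Random mutations: occasionally swap a button press for a different one.
    return [random.choice(BUTTONS) if random.random() < rate else gene
            for gene in genome]

def fitness(genome):
    """Hypothetical stand-in for 'how far right Mario got before dying or
    timing out'. Here we simply reward pressing 'right', which is roughly what
    the earliest MarI/O generations discover."""
    return sum(1 for gene in genome if gene == "right")

population = [random_genome() for _ in range(50)]

for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]   # selective breeding: keep the fittest
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]
    print(f"generation {generation}: best fitness = {fitness(population[0])}")
```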

More neural connections between the controls and what MarI/O sees end up forming as later generations get further into the level and the AI learns how to operate – and that colliding with certain objects will kill Mario – driven by the maximum fitness levels and evolution through more random mutations. The funny thing is that since the level can be played in multiple ways, different species of MarI/O form as competing neural networks are developed to beat the level.

These neural networks are simply a way of mapping the learnt behaviour of the AI, and as you can see from MarI/O, this looks quite deterministic – i.e. if this, then that – but it really is a remarkable implementation of artificial intelligence. It may not be self-aware, and it’s not going to pass the Turing test, but it’s definitely doing something that can’t be done solely with maths or fast look-up operations of predetermined outcomes. Trained neural networks really are the differentiating factor that makes for a consistent definition of AI that I think everyone can agree on. Machine neural networks are sort of like abstract thought: the system is not told by its own programming that something is correct, but it can use its training to determine whether that something falls in line with what it understands – and isn’t that what intelligence is, artificial or otherwise?

In this thought piece, I started with the Chinese Room Argument, showed how this can be used to scrutinise the classic Turing test for human-level artificial intelligence, and then expanded the definition to include the full range of AI types from reactive machines to self-awareness.

As I said at the beginning, I think there will be many people out there who will be quick to dismiss something as simple as fall detection as not being ‘real’ AI, when in fact, depending on how it is implemented, there might be a small neural network trained to recognise that action. Or alternatively, as shown by what decision trees and mathematical operations like non-negative matrix factorisation can accomplish, it could be all done with fast calculations and be a complete con.

The thing is: AI is not about what’s being done, it’s about how it’s being done. At the end of the day, an image generated by an AI and one created by a great graphic artist will be hard to tell apart, but the fact that the AI-generated hands might be nightmare fuel – because the human artist is much more likely to understand the context of hands – is the dead giveaway in today’s AI landscape.

AI is not a new concept, but the word gets thrown around a lot, and has been for years to describe many forms of intelligent machines, from the ways that enemies move and react in a video game, to deep-fake technology that threatens to destroy our democracy. But now that we’re all getting wise to it, the next time you encounter the phrase ‘artificial intelligence’, have a bit of a think about how what you’re seeing or hearing is being achieved. You might be surprised that there’s a neural network running on something, and in that case, you’ve got yourself some AI.

Thank you for sticking around for this second thought piece on ipXchange. It was so much fun to write and research, and I didn’t even cover a tonne of things that could have been mentioned. Like how robotics is considered by some to be a key cornerstone of AI. That seems a little odd to me because I consider robotics to be an electro-mechanical field rather than something that requires inbuilt intelligence, but I’m sure there is some definition of robotics that makes that statement make a lot more sense. I expect that I’ve either confused a lot of people further, or hopefully clarified some things that you may have been wondering about within this whole AI fad, or innovation, depending on who’s developing it.

As always, a huge thank you to Jake and Harry for making these videos so pretty. I couldn’t do this without you guys, and I’m so looking forward to the next one of these thought pieces. Your editing and graphics are on fire!

Anyway! Subscribe for more disruptive content, and as always, there’s a tonne of technology that you can put to the test on ipXchange.tech; I’ll leave a link to the AI components section in the description. Let us know in the comments if you’ve got a topic you want me to explore for next time, but until then…

Keep designing!
