CEO Blog Series: Chapter 1 – The Basics

“AI must be so complicated I’ll never understand it. Even though the world is changing because of it, I’ll just give up trying to understand it.”

Or not.

(In the chapters that follow, pay attention to the words in quotes: they are key concepts in Deep Learning.)

Before we get into this, two clarifications. First, at this point there is not much “intelligence” in AI: it is mostly about teaching a machine to recognize patterns, and the systems do no “reasoning”. That said, you can go a very long way with pattern matching, and it can sure look like intelligence!

Second, we are only going to be talking about a part of AI called “Deep Learning”. Because “AI” is shorter, we will use the two terms interchangeably.

A lot of AI has a front end and a back end. The front end is an application-specific engine: you feed it something (like a headshot, a landscape scene, or a link to breakfast cereal) and it spits out a set of numbers.

Let’s look at what happens on the back end: as you feed the front end more objects, the back end collects lots and lots of the numbers it spits out. To keep this simple, let’s pretend the front end spits out 3 numbers for each object. Now let’s imagine a 3-dimensional cube of space, right there in the room with you. You take the first object’s 3 numbers and use the first to locate it on the horizontal X-axis, the second to locate it on the vertical Y-axis, and the third to locate it on the depth or Z-axis. So for each object the front end processes, you can find the single point in your cube of space that those three numbers point you to.
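For readers who like to see things concretely, here is a minimal Python sketch of the cube idea. The object names and all of the numbers are made up for illustration; in a real system the front end would produce them.

```python
import math

# Made-up numbers standing in for what a front end might produce.
# Each object gets three numbers: (x, y, z).
dots = {
    "object_1": (0.1, 0.9, 0.9),
    "object_2": (0.9, 0.1, 0.1),
    "object_3": (0.2, 0.8, 0.9),
}

def distance(p, q):
    """Straight-line distance between two dots in the cube."""
    return math.dist(p, q)

# Show where each object lands in the cube.
for name, (x, y, z) in dots.items():
    print(f"{name}: x={x}, y={y}, z={z}")

# Compare how far apart two pairs of dots are.
print(distance(dots["object_1"], dots["object_3"]))
print(distance(dots["object_1"], dots["object_2"]))
```

With these made-up numbers, objects 1 and 3 land much closer to each other than objects 1 and 2 do.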

Let’s pretend the first object we give to the front-end engine is a photo of Muhammad Ali’s face. We get our three numbers, we find the point in our cube of space they point us to (let’s pretend that is in the back top left corner), and we put a dot there, and we know that dot is Muhammad Ali. Perhaps the second set of numbers is for President John F. Kennedy, and his three numbers put him in the near lower right corner of the cube. Then we load in a few thousand other photos of famous people, including some more of Muhammad Ali and JFK. Pretty soon we notice there is a “cluster” of dots at the back top left corner, right next to our “known” photo of Muhammad Ali, and we look at each of them. Turns out, the dots clustered really close to our known photo of Muhammad Ali are ALSO of Muhammad Ali. Next, we look at any dot REALLY near the one we know to be JFK, and sure enough, they are also pictures of him, just taken at different angles or in different lighting or maybe when he was a little younger.

What did we just learn? The back end of most AI/Deep Learning systems is a chunk of space with a lot of dots in it, one dot for each object we care about. And the “nearest neighbors” of a known dot are very likely to be the same object (person, scene, breakfast cereal) as the one we know about. Pretty neat!
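The “nearest neighbors” idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the labels “Ali” and “JFK”, the coordinates, and the helper name nearest_known are all invented here, and real systems use far more numbers per photo.

```python
import math

# Dots we already know the identity of (coordinates are invented).
known_dots = {
    "Ali": (0.05, 0.95, 0.90),   # back top left corner
    "JFK": (0.95, 0.05, 0.10),   # near lower right corner
}

def nearest_known(dot, known):
    """Return the label of the known dot closest to the given dot."""
    return min(known, key=lambda label: math.dist(dot, known[label]))

# A new, unlabeled photo whose three numbers land near the first corner.
new_photo = (0.10, 0.90, 0.85)
print(nearest_known(new_photo, known_dots))  # prints "Ali"
```

The guess is simply “whoever the closest known dot is”, which is why the quality of the numbers coming out of the front end matters so much.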

In our example, it lets us recognize faces in photos. It could just as easily tell us that someone who buys Cheerios is also likely to want to buy milk, because the dot representing milk might be the nearest neighbor of the dot representing Cheerios. What the dots represent, and what they can do for us, changes from one application to the next, which is why the front-end engine has to vary from application to application.

This is, of course, an oversimplification. In reality, useful Deep Learning front ends kick out anywhere from 512 to 2048 numbers for EACH object, not just three, so the math requires a space with 512 or more dimensions, which is REALLY hard to visualize. But the idea of each object ending up as a single dot, and the importance of nearest neighbors, both hold true.
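A space with 512 dimensions is hard to picture, but the arithmetic is the same as in the cube. The sketch below is a rough illustration, not a real front end: the 512 numbers are random stand-ins, and the names random_dot and nudge are hypothetical helpers invented for this example.

```python
import math
import random

DIMENSIONS = 512          # numbers per object, instead of three
rng = random.Random(0)    # fixed seed so the run is repeatable

def random_dot(n):
    """Stand-in for a front end: n random numbers between 0 and 1."""
    return [rng.random() for _ in range(n)]

def nudge(dot, amount):
    """Simulate a slightly different photo of the same subject."""
    return [x + rng.uniform(-amount, amount) for x in dot]

known = {
    "subject_a": random_dot(DIMENSIONS),
    "subject_b": random_dot(DIMENSIONS),
}

# A new dot that is a small variation on subject_a.
new_dot = nudge(known["subject_a"], 0.05)

# Same nearest-neighbor rule as in three dimensions.
closest = min(known, key=lambda label: math.dist(new_dot, known[label]))
print(closest)
```

Nothing about the nearest-neighbor rule changed; only the length of each list of numbers did.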

Of course, this only works if the front-end AI engine can tell two subjects apart, whether you want to recognize a face or you are more interested in what Sally will buy if she just bought Cheerios. Let’s look at the front end next!