Goals:
define AI in multiple ways and contextualize how the term is used
describe how modern AI is built and how that differs from older algorithms
AI is computer science¶
Building artificial intelligence has been a goal of computing research the whole time (computing is a broader field that includes both computer science and computer engineering). Computing is not a very old discipline, but it can trace its roots to math and electrical engineering (and, through that, physics). At some institutions, computer science and computer engineering are in the same department; at others they are separate. Computer science in particular has pursued AI as a goal.
Figure 1: A Venn diagram showing CS as a broad field with AI as a subfield
AI is a field of study, which means there is a community of people engaged in this work.
AI can be done, and has been done, in many ways over time, but the current dominant paradigm is machine learning.
The formal definitions of AI are broad; by those definitions, almost everything in a computer could be considered to "be AI." Right now, though, AI typically refers to a few things:
Large Language Models (and multimodal models with the same basic goal) that produce text and accomplish high-level goals based on English
predictive models, produced by machine learning, that predict specific things or automatically label things
complex systems that combine the two things above with additional logic
Figure 2: A Venn diagram showing CS as a broad field with many subfields
What is common across all of these, and all things in computer science, is algorithms. Algorithms have gotten a lot more attention recently, but they are not fundamentally new: mathematicians have developed and studied them formally for centuries, and informally, people have developed them everywhere.
Figure 3: Algorithms are at the center; they are what is common across all of CS.
How Algorithms are Made¶
A familiar way to think about an algorithm is as a recipe. A recipe consists of a set of ingredients (inputs) and a set of instructions to follow (procedures) to produce a specific dish (output). Computer algorithms describe the procedure to produce an output from a given input in order to solve a problem.
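For example, a minimal sketch in Python: the `average` function below is a hypothetical example, with the list of numbers as the ingredients (inputs), the summing and dividing as the instructions (procedures), and the mean as the dish (output).

```python
def average(numbers):
    """Recipe-style algorithm: ingredients in, dish out."""
    total = 0
    for n in numbers:              # instructions: add up all the numbers...
        total = total + n
    return total / len(numbers)    # ...then divide by how many there are

average([2, 4, 6])  # 4.0
```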
Mathematicians have developed algorithms for centuries by:
Selecting a problem to solve in the world
Representing the relevant part of the world mathematically
Working to solve the problem in the mathematical space
Documenting their process so that it is repeatable
Figure 4: Algorithms are developed by people; they are not naturally occurring things that we observe or discover
Figure 5: The person picks a problem to solve and some part of the world as a context for that solution, whether they attend to the details of this choice carefully or not
Figure 6: A person thinks about the problem and makes choices about each step
Figure 7: In general, we cannot write math that exactly describes the world, so we pick some way to simplify the part of the world we wish to study that we think is relevant.
Figure 8: As they make simplifications, they write down a mathematical representation of the simplification
Figure 9: Once it is represented, they can solve it, applying mathematical techniques
Figure 10: Finally, the steps to re-create the solution for a similar problem are written down so that other people can follow the steps, or apply the algorithm
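As a tiny worked instance of these four steps, consider a made-up example, splitting a restaurant bill evenly:

```latex
% Problem: a group of friends wants to split a restaurant bill evenly.
% Simplification: ignore who ordered what; keep only the total T and the headcount n.
% Mathematical representation and solution: each share s must satisfy n \cdot s = T, so
s = \frac{T}{n}
% Documented algorithm: given a total T and a count n, divide T by n to get each share s.
```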
Computer Scientists do the same, with the main change being that the solution is expressed in a programming language instead of a spoken language, for example Python instead of English.
Figure 11: Computer Scientists might use different tools to develop algorithms or terms to document things, but the process is mostly the same
Figure 12: The main difference is that the final algorithm is written in a programming language for a computer to execute instead of a person following the steps.
The challenge is that, as we try to delegate more complex problems to a computer, the approximations we make get in the way more. Writing an algorithm to add numbers together or to find an exact match for an item in a list is straightforward; writing an algorithm to detect whether a set of pixels represents a person (e.g., whether there is a person in front of a self-driving car) is much more complex.
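For instance, the exact-match case can be fully specified in a few lines; this sketch (the function name is ours, for illustration) spells out every step, something we cannot do for "is there a person in these pixels":

```python
def find_exact_match(items, target):
    """Return the index of the first exact match, or -1 if there is none."""
    for i, item in enumerate(items):
        if item == target:   # the comparison rule is completely specified
            return i
    return -1

find_exact_match(["stop sign", "person", "tree"], "person")  # 1
```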
The traditional way of developing algorithms works well for problems where we have a good mathematical representation of the part of the world we need to compute about, and where people can describe the steps that need to occur in terms of calculations a computer can carry out.
This is where machine learning comes in.
What is ML?¶
In machine learning, we change the process a little.
Instead of solving the problem and figuring out precise steps to carry out, people define a generic strategy, collect a lot of examples, and write a learning algorithm to fill in the details of the generic strategy from the examples. Learning algorithms are developed the way we have always developed algorithms, but then these algorithms essentially write a prediction (or inference) algorithm, which is what gets sent out into the world.
All ML consists of two parts: learning and prediction. These can also go by other names.
Learning may be called:
fitting
optimization
training
Prediction may be called:
inference
testing
A common assumption¶
All ML has some sort of underlying assumptions. Almost all ML relies on two key assumptions, which can be written in many ways:
1. A relationship exists that we can use to determine, or predict, the outcome or target from the input.
2. Given enough examples, a computer can find that relationship.

where:

- the outcome or target is the goal of the task
- the input is the information to be used to predict that target

Written mathematically, the first assumption says there exists some function $f$

$$
y = f(x) \tag{1}
$$

such that the target $y$ can be computed from the input $x$, and the second says there exists a learning algorithm $\mathcal{A}$

$$
\mathcal{A}\left(\{(x_i, y_i)\}_{i=1}^{N}\right) = \hat{f} \approx f \tag{2}
$$

such that, given $N$ example pairs, it recovers a good approximation of that relationship

where:

- $x$ is the input and $y$ is the outcome or target
- $f$ is the true relationship and $\hat{f}$ is the learned approximation of it
Given a row-indexed matrix `training_in` and a vector `training_out` we can write:

```python
data = [(in_i, out_i) for in_i, out_i in zip(training_in, training_out)]
parameters = learning_algo(data)
predictor = lambda input_example: pred_algo(input_example, parameters)
pred_output = predictor(test_input)
```

where[1]:

- `pred_algo` is a function template that uses `parameters` to customize the calculation
- `parameters` can change how `pred_algo` works to adapt it to different contexts
- the `lambda` keyword makes a new function that takes in only the input
- `predictor` takes one input sample and computes the predicted output

and for any valid `test_input` we will get a valid `pred_output`.
This, alone, is not that different from the traditional way of developing algorithms; we have to assume a way to get from some input to the desired output exists for that to happen. However, in machine learning this is a bit more specific: we assume that there is a specific $f$ and $y$ that are available to us[2], and that from the $x$, we can compute a value for $y$.
To make this concrete, this could be as simple as a linear regression: we can predict the tip for a restaurant bill from the total bill by multiplying by some percentage and adding a flat amount. We can determine the percentage and the flat amount from previous bills.
Assume we have vectors $\mathbf{x}$ (the bill totals) and $\mathbf{y}$ (the tips). Ordinary least squares can solve this, or any minimization algorithm can solve:

$$
\mathbf{y} = m\mathbf{x} + b
$$

or, equivalently, element-wise

$$
y_i = m x_i + b
$$

where:

- $m$ is the percentage and $b$ is the flat amount
- together they are the parameters, $\theta = (m, b)$
Given a row-indexed matrix `training_in` and a vector `training_out` we can write:

```python
def learning_algo(data):
    theta0 = initialize_theta()
    abs_pred_error_i = lambda t, x, y: abs(y - pred_algo(x, t))
    total_pred_error = lambda th: sum(abs_pred_error_i(th, x, y) for x, y in data)
    # optimize so that the error is minimized
    theta = minimize(total_pred_error, theta0)
    return theta

def pred_algo(x, theta):
    m, b = theta
    return m * x + b

data = [(in_i, out_i) for in_i, out_i in zip(training_in, training_out)]
parameters = learning_algo(data)
predictor = lambda input_example: pred_algo(input_example, parameters)
pred_output = predictor(test_input)
```

where[1]:

- `parameters` can change how `pred_algo` works to adapt it to different contexts
- the `lambda` keyword makes a new function that takes in only the input
- `predictor` takes one input sample and computes the predicted output
- `minimize` takes a function and initial parameters for it and finds values of the parameters that give the function the smallest possible value

and for any valid `test_input` we will get a valid `pred_output`.
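The sketch above leaves `initialize_theta` and `minimize` unspecified. One way to fill them in, as an assumption rather than the only choice, is to use `scipy.optimize.minimize` with made-up bills and tips:

```python
import numpy as np
from scipy.optimize import minimize

def pred_algo(x, theta):
    m, b = theta
    return m * x + b

def learning_algo(data):
    theta0 = np.array([0.0, 0.0])  # a simple initialize_theta: start at zero
    total_pred_error = lambda th: sum(abs(y - pred_algo(x, th)) for x, y in data)
    # Nelder-Mead handles the non-smooth absolute error
    result = minimize(total_pred_error, theta0, method="Nelder-Mead")
    return result.x

# made-up example bills and tips (generated as roughly 18% of the bill)
training_in = np.array([10.0, 20.0, 35.0, 50.0])
training_out = np.array([1.80, 3.60, 6.30, 9.00])

data = [(in_i, out_i) for in_i, out_i in zip(training_in, training_out)]
parameters = learning_algo(data)
predictor = lambda input_example: pred_algo(input_example, parameters)
print(predictor(25.0))  # about 4.50, i.e. an 18% tip
```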
This generally has to be written mathematically to be solved; the solution is then translated into a programming language for a computer to execute.
A common problem to solve¶
The goal in creating the learning algorithm, then, is to find the right details; if we take the mathematical representation above, we need to find the right $\theta$.
Learning algorithms output that $\theta$ and then allow us to have a complete prediction algorithm.
A learning algorithm and prediction algorithm are linked by a shared model. The prediction algorithm is basically the model treated as a template, so that once the parameters are set it becomes a simple input-output function. The learning algorithm is where people work out how to find the right parameters to make predictions in a specific domain.
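As a small illustration of the template idea, the same `pred_algo` from the example above becomes a different input-output function for each parameter setting (the tipping habits here are made up):

```python
generous_tipper = lambda bill: pred_algo(bill, (0.25, 0.0))   # 25% and no flat amount
delivery_tipper = lambda bill: pred_algo(bill, (0.10, 2.0))   # 10% plus a flat $2

generous_tipper(40.0)  # 10.0
delivery_tipper(40.0)  # 6.0
```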
ML is classified in many ways¶
AI can be classified by how it is developed:
traditional methods (rule-based systems, etc.)
Machine learning
hybrid systems that combine multiple types (agentic AI)
Most current things are ML, and the underlying assumptions come in different forms.
ML can be classified in many different ways too:
when we focus on the learning problem, we classify into supervised and unsupervised learning based on the availability of the target variable, and by the type of prediction we want to make: discrete (classification or clustering) or continuous (regression)
if we focus on what is learned in order to make decisions, we can classify into discriminative or generative
if we focus on the specific assumptions, we can classify by the model class
We can describe a model with each of these descriptors. For example: ChatGPT, Gemini, and Claude are examples of large language models, specifically GPTs, which are a type of generative model implemented with deep learning; they are trained with unsupervised learning initially, followed by supervised learning.
The original HALO player ranking algorithm was also a generative model, but it was primarily used to make predictions of what would be a good matchup, rather than generating new sequences of win/loss for opponent pairs. It was trained with a supervised learning approach on past player matchups.
What is an LLM?¶
While AI has been a research area in computing since the beginning of computing, the term AI came into common use when ChatGPT was released. ChatGPT is a chatbot interface to the GPT family of LLMs. This, and large-scale models of vision for image generation or of audio for audio production, all work on the same basic idea. For LLMs specifically, this is:
model is a simplification of some part of the world
language is a tool for communicating consisting of words and rules about how to combine them
large refers to the number of parameters being big
Specifically, they model language by using a lot of examples and a statistical model. In math:

$$
P\left(w_t \mid w_{t-1}, w_{t-2}, \ldots, w_{t-k}\right) \tag{3}
$$

where $k$ is called the context window.
In English, this says that the model represents a probability distribution of possible next words ($w_t$) given a past sequence of words.
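As a toy illustration of equation (3) only (real LLMs are implemented very differently, as described next), we can estimate next-word probabilities by counting, with a context window of $k = 1$:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# count how often each word follows each single-word context (k = 1)
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_distribution(context):
    """Estimate P(next word | previous word) from the examples."""
    following = counts[context]
    total = sum(following.values())
    return {word: n / total for word, n in following.items()}

next_word_distribution("the")  # {'cat': 0.667, 'mat': 0.333} (approximately)
```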
Equation (3) is implemented in a computer using a neural network. A neural network is a computational model for approximating a function, defined by a number of artificial neurons. Neural networks approximate complex functions by combining a lot of simple functions together.
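A minimal sketch of that last idea (the layer sizes and random weights here are arbitrary, and no learning happens): each artificial neuron computes a weighted sum followed by a simple nonlinearity, and composing layers of them builds up a more complex function.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 1)), np.zeros(4)   # 4 neurons reading 1 input
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # 1 neuron combining those 4

def simple_function(x, W, b):
    # one layer of neurons: weighted sum, then a simple nonlinearity
    return np.maximum(0, W @ x + b)

def network(x):
    # a more complex function built by combining simple ones
    return W2 @ simple_function(np.atleast_1d(x), W1, b1) + b2

network(0.5)
```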
[1] Python is a programming language specifically designed for readability.
[2] There is also unsupervised or semi-supervised learning, where the $y$ is either unknown or only available for some samples, but they still assume that it exists and that the $x$ can be used to compute it.
[3] Quantum computers, which are not yet available for consumer use or even broad research use, represent data with probabilistic qubits instead of traditional binary.