Goals:
define AI in multiple ways and contextualize how the term is used
describe how modern AI is built and how that differs from older algorithms
AI is computer science¶
Building artificial intelligence has been a goal of computing research the whole time (computing is a broader field that includes both computer science and computer engineering). Computing is not a very old discipline, but it can trace its roots to math and electrical engineering (and, through that, physics). At some institutions, computer science and computer engineering are in the same department; at others they are separate. Computer science in particular has pursued AI as a goal.
Figure 1: A Venn diagram showing CS as a broad field with AI as a subfield
AI is a field of study, which means there is a community of people engaged in this work.
AI can be done, and has been done, in many ways over time, but the current dominant paradigm is machine learning.
The formal definitions of AI are broad; by those definitions, almost everything in a computer could be considered to "be AI." Right now, though, AI typically refers to a few things:
Large Language Models (and multimodal models with the same basic goal) that produce text and accomplish high-level goals based on English
predictive models, produced by machine learning, that predict specific things or automatically label things
complex systems that combine the two things above with additional logic
Figure 2: A Venn diagram showing CS as a broad field with many subfields
What is common across all of these, and all things in computer science, is algorithms. Algorithms have gotten a lot more attention recently, but they are not fundamentally new: mathematicians have developed and studied them formally for centuries, and informally, people have developed them everywhere.
Figure 3: Algorithms are at the center; they are what is common across all of CS.
How Algorithms are Made¶
A familiar way to think about an algorithm is as a recipe. A recipe consists of a set of ingredients (inputs) and a set of instructions to follow (procedures) to produce a specific dish (output). Computer algorithms describe the procedure to produce an output from a given input in order to solve a problem.
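For example, a minimal sketch in Python: the `average` function below is a hypothetical example, with the list of numbers as the ingredients (inputs), the summing and dividing as the instructions (procedures), and the mean as the dish (output).

```python
def average(numbers):
    """Recipe-style algorithm: ingredients in, dish out."""
    total = 0
    for n in numbers:              # instructions: add up all the numbers...
        total = total + n
    return total / len(numbers)    # ...then divide by how many there are

average([2, 4, 6])  # 4.0
```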
Mathematicians have developed algorithms for centuries by:
Selecting a problem to solve in the world
Representing the relevant part of the world mathematically
Working to solve the problem in the mathematical space
Documenting their process so that it is repeatable
Figure 4: Algorithms are developed by people; they are not naturally occurring things that we observe or discover
Figure 5: The person picks a problem to solve and some part of the world as a context for that solution, whether they attend to the details of this choice carefully or not
Figure 6: A person thinks about the problem and makes choices about each step
Figure 7: In general, we cannot write math that exactly describes the world, so we pick some way to simplify the part of the world we wish to study that we think is relevant.
Figure 8: As they make simplifications, they write down a mathematical representation of the simplification
Figure 9: Once it is represented, they can solve it, applying mathematical techniques
Figure 10: Finally, the steps to re-create the solution for a similar problem are written down so that other people can follow the steps, or apply the algorithm
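As a tiny worked instance of these four steps, consider a made-up example, splitting a restaurant bill evenly:

```latex
% Problem: a group of friends wants to split a restaurant bill evenly.
% Simplification: ignore who ordered what; keep only the total T and the headcount n.
% Mathematical representation and solution: each share s must satisfy n \cdot s = T, so
s = \frac{T}{n}
% Documented algorithm: given a total T and a count n, divide T by n to get each share s.
```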
Computer Scientists do the same, with the main change being that the solution is expressed in a programming language instead of a spoken language, for example Python instead of English.
Figure 11: Computer Scientists might use different tools to develop algorithms or terms to document things, but the process is mostly the same
Figure 12: The main difference is that the final algorithm is written in a programming language for a computer to execute instead of a person following the steps.
The challenge is that, as we try to delegate more complex problems to a computer, the approximations we make get in the way more. Writing an algorithm to add numbers together or to find an exact match for an item in a list is straightforward; writing an algorithm to detect whether a set of pixels represents a person (e.g., whether there is a person in front of a self-driving car) is much more complex.
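For instance, the exact-match case can be fully specified in a few lines; this sketch (the function name is ours, for illustration) spells out every step, something we cannot do for "is there a person in these pixels":

```python
def find_exact_match(items, target):
    """Return the index of the first exact match, or -1 if there is none."""
    for i, item in enumerate(items):
        if item == target:   # the comparison rule is completely specified
            return i
    return -1

find_exact_match(["stop sign", "person", "tree"], "person")  # 1
```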
The traditional way of developing algorithms works well for problems where we have a good mathematical representation of the part of the world we need to compute about, and where people can describe the steps that need to occur in terms of calculations a computer can carry out.
This is where machine learning comes in.
What is ML?¶
In machine learning, we change the process a little.
Instead of solving the problem and figuring out precise steps to carry out, people define a generic strategy, collect a lot of examples, and write a learning algorithm to fill in the details of the generic strategy from the examples. Learning algorithms are developed the way we have always developed algorithms, but then these algorithms essentially write a prediction (or inference) algorithm, which is what gets sent out into the world.
All ML consists of two parts: learning and prediction. These can also go by other names.
Learning may be called:
fitting
optimization
training
Prediction may be called:
inference
testing
A common assumption¶
All ML has some sort of underlying assumptions. Almost all ML relies on two key assumptions, which can be written in many ways:
1. A relationship exists that we can use to determine, or predict, the outcome or target from the input.
2. Given enough examples, a computer can find that relationship.

where:

- the outcome or target is the goal of the task
- the input is the information to be used to predict that target

Written mathematically, the first assumption says there exists some function $f$

$$
y = f(x) \tag{1}
$$

such that the target $y$ can be computed from the input $x$, and the second says there exists a learning algorithm $\mathcal{A}$

$$
\mathcal{A}\left(\{(x_i, y_i)\}_{i=1}^{N}\right) = \hat{f} \approx f \tag{2}
$$

such that, given $N$ example pairs, it recovers a good approximation of that relationship

where:

- $x$ is the input and $y$ is the outcome or target
- $f$ is the true relationship and $\hat{f}$ is the learned approximation of it
Given a row-indexed matrix `training_in` and a vector `training_out` we can write:

```python
data = [(in_i, out_i) for in_i, out_i in zip(training_in, training_out)]
parameters = learning_algo(data)
predictor = lambda input_example: pred_algo(input_example, parameters)
pred_output = predictor(test_input)
```

where[1]:

- `pred_algo` is a function template that uses `parameters` to customize the calculation
- `parameters` can change how `pred_algo` works to adapt it to different contexts
- the `lambda` keyword makes a new function that takes in only the input
- `predictor` takes one input sample and computes the predicted output

and for any valid `test_input` we will get a valid `pred_output`.
This, alone, is not that different from the traditional way of developing algorithms; we have to assume a way to get from some input to the desired output exists for that to happen. However, in machine learning this is a bit more specific: we assume that there is a specific $f$ and $y$ that are available to us[2], and that from the $x$, we can compute a value for $y$.
To make this concrete, this could be as simple as a linear regression: we can predict the tip for a restaurant bill from the total bill by multiplying by some percentage and adding a flat amount. We can determine the percentage and the flat amount from previous bills.
Assume we have vectors $\mathbf{x}$ (the bill totals) and $\mathbf{y}$ (the tips). Ordinary least squares can solve this, or any minimization algorithm can solve:

$$
\mathbf{y} = m\mathbf{x} + b
$$

or, equivalently, element-wise

$$
y_i = m x_i + b
$$

where:

- $m$ is the percentage and $b$ is the flat amount
- together they are the parameters, $\theta = (m, b)$
Given a row-indexed matrix `training_in` and a vector `training_out` we can write:

```python
def learning_algo(data):
    theta0 = initialize_theta()
    abs_pred_error_i = lambda t, x, y: abs(y - pred_algo(x, t))
    total_pred_error = lambda th: sum(abs_pred_error_i(th, x, y) for x, y in data)
    # optimize so that the error is minimized
    theta = minimize(total_pred_error, theta0)
    return theta

def pred_algo(x, theta):
    m, b = theta
    return m * x + b

data = [(in_i, out_i) for in_i, out_i in zip(training_in, training_out)]
parameters = learning_algo(data)
predictor = lambda input_example: pred_algo(input_example, parameters)
pred_output = predictor(test_input)
```

where[1]:

- `parameters` can change how `pred_algo` works to adapt it to different contexts
- the `lambda` keyword makes a new function that takes in only the input
- `predictor` takes one input sample and computes the predicted output
- `minimize` takes a function and initial parameters for it and finds values of the parameters that give the function the smallest possible value

and for any valid `test_input` we will get a valid `pred_output`.
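The sketch above leaves `initialize_theta` and `minimize` unspecified. One way to fill them in, as an assumption rather than the only choice, is to use `scipy.optimize.minimize` with made-up bills and tips:

```python
import numpy as np
from scipy.optimize import minimize

def pred_algo(x, theta):
    m, b = theta
    return m * x + b

def learning_algo(data):
    theta0 = np.array([0.0, 0.0])  # a simple initialize_theta: start at zero
    total_pred_error = lambda th: sum(abs(y - pred_algo(x, th)) for x, y in data)
    # Nelder-Mead handles the non-smooth absolute error
    result = minimize(total_pred_error, theta0, method="Nelder-Mead")
    return result.x

# made-up example bills and tips (generated as roughly 18% of the bill)
training_in = np.array([10.0, 20.0, 35.0, 50.0])
training_out = np.array([1.80, 3.60, 6.30, 9.00])

data = [(in_i, out_i) for in_i, out_i in zip(training_in, training_out)]
parameters = learning_algo(data)
predictor = lambda input_example: pred_algo(input_example, parameters)
print(predictor(25.0))  # about 4.50, i.e. an 18% tip
```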
This generally has to be written mathematically to be solved; the solution is then translated into a programming language for a computer to execute.
A common problem to solve¶
The goal in creating the learning algorithm, then, is to find the right details; if we take the mathematical representation above, we need to find the right $\theta$.
Learning algorithms output that $\theta$ and then allow us to have a complete prediction algorithm.
A learning algorithm and prediction algorithm are linked by a shared model. The prediction algorithm is basically the model treated as a template, so that once the parameters are set it becomes a simple input-output function. The learning algorithm is where people work out how to find the right parameters to make predictions in a specific domain.
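As a small illustration of the template idea, the same `pred_algo` from the example above becomes a different input-output function for each parameter setting (the tipping habits here are made up):

```python
generous_tipper = lambda bill: pred_algo(bill, (0.25, 0.0))   # 25% and no flat amount
delivery_tipper = lambda bill: pred_algo(bill, (0.10, 2.0))   # 10% plus a flat $2

generous_tipper(40.0)  # 10.0
delivery_tipper(40.0)  # 6.0
```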
ML is classified in many ways¶
AI can be classified by how it is developed:
traditional methods (rule-based systems, etc.)
Machine learning
hybrid systems that combine multiple types (agentic AI)
Most current things are ML, and the underlying assumptions come in different forms.
ML can be classified in many different ways too:
when we focus on the learning problem, we classify into supervised and unsupervised learning based on the availability of the target variable, and by the type of prediction we want to make: discrete (classification or clustering) or continuous (regression)
if we focus on what is learned in order to make decisions, we can classify into discriminative or generative
if we focus on the specific assumptions, we can classify by the model class
We can describe a model with each of these descriptors. For example: ChatGPT, Gemini, and Claude are examples of large language models, specifically GPTs, which are a type of generative model implemented with deep learning; they are trained with unsupervised learning initially, followed by supervised learning.
The original HALO player ranking algorithm was also a generative model, but it was primarily used to make predictions of what would be a good matchup, rather than generating new sequences of win/loss for opponent pairs. It was trained with a supervised learning approach on past player matchups.
What is an LLM?¶
While AI has been a research area in computing since the beginning of computing, the term AI came into common use when ChatGPT was released. ChatGPT is a chatbot interface to the GPT family of LLMs. This, and large-scale models of vision for image generation or of audio for audio production, all work on the same basic idea. For LLMs specifically, this is:
model is a simplification of some part of the world
language is a tool for communicating consisting of words and rules about how to combine them
large refers to the number of parameters being big
Specifically, they model language by using a lot of examples and a statistical model. In math:

$$
P\left(w_t \mid w_{t-1}, w_{t-2}, \ldots, w_{t-k}\right) \tag{3}
$$

where $k$ is called the context window.
In English, this says that the model represents a probability distribution of possible next words ($w_t$) given a past sequence of words.
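As a toy illustration of equation (3) only (real LLMs are implemented very differently, as described next), we can estimate next-word probabilities by counting, with a context window of $k = 1$:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# count how often each word follows each single-word context (k = 1)
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_distribution(context):
    """Estimate P(next word | previous word) from the examples."""
    following = counts[context]
    total = sum(following.values())
    return {word: n / total for word, n in following.items()}

next_word_distribution("the")  # {'cat': 0.667, 'mat': 0.333} (approximately)
```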
Equation (3) is implemented in a computer using a neural network. A neural network is a computational model for approximating a function, defined by a number of artificial neurons. Neural networks approximate complex functions by combining a lot of simple functions together.
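A minimal sketch of that last idea (the layer sizes and random weights here are arbitrary, and no learning happens): each artificial neuron computes a weighted sum followed by a simple nonlinearity, and composing layers of them builds up a more complex function.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 1)), np.zeros(4)   # 4 neurons reading 1 input
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # 1 neuron combining those 4

def simple_function(x, W, b):
    # one layer of neurons: weighted sum, then a simple nonlinearity
    return np.maximum(0, W @ x + b)

def network(x):
    # a more complex function built by combining simple ones
    return W2 @ simple_function(np.atleast_1d(x), W1, b1) + b2

network(0.5)
```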
[1] Python is a programming language specifically designed for readability.
[2] There is also unsupervised or semi-supervised learning, where the $y$ is either unknown or only available for some samples, but they still assume that it exists and that the $x$ can be used to compute it.
[3] Quantum computers, which are not yet available for consumer use or even broad research use, represent data with probabilistic qubits instead of traditional binary.