Chapter 1: What Is Computation?

Every time you send a text message, stream a song, or check tomorrow's weather, you are relying on computation. But what is computation, exactly? Strip away the glowing screens and sleek hardware, and you find a surprisingly simple idea: computation is the systematic transformation of information according to well-defined rules. In this opening chapter we will build that idea from the ground up, starting with the smallest possible piece of information and ending with a question that will motivate the rest of this textbook - are there problems so hard that no classical computer can solve them efficiently?

No prior knowledge is assumed. If you know what a number is, you have everything you need to begin.

1.1 Bits: The Atoms of Information

Imagine you are in a dark room with a single light switch. The switch can be in one of two positions: off or on. That single choice - off or on, no or yes, false or true, 0 or 1 - is the fundamental unit of information. We call it a bit, short for binary digit, a term coined by the mathematician John Tukey in 1947.

Key Concept.

A bit is the smallest possible piece of information. It represents a single choice between two alternatives, which we conventionally label 0 and 1.

One bit is not very expressive on its own. With a single bit you can say "yes" or "no," but not much else. The power of bits comes from combining them. With two bits you can represent four distinct values: 00, 01, 10, and 11. With three bits you get eight values: 000, 001, 010, 011, 100, 101, 110, 111. The pattern is precise:

$$n \text{ bits can represent } 2^n \text{ different values.}$$

Eight bits make a byte, giving $2^8 = 256$ possible values - enough to encode every letter in the English alphabet (uppercase and lowercase), the digits 0 through 9, punctuation marks, and still have room to spare.

Binary and Decimal

In everyday life we use the decimal (base-10) number system. Each digit position represents a power of 10. The number 42 means $4 \times 10^1 + 2 \times 10^0$. The binary (base-2) system works exactly the same way, but each position represents a power of 2 instead of a power of 10. The binary number 101010 means:

$$1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 32 + 8 + 2 = 42$$

So the decimal number 42 and the binary number 101010 represent the same quantity - they are just written in different bases. Every non-negative integer has a binary equivalent, and converting between the two is purely mechanical.

To convert from decimal to binary, you can repeatedly divide by 2 and collect the remainders. For example, to convert 13:

  • $13 \div 2 = 6$ remainder 1
  • $6 \div 2 = 3$ remainder 0
  • $3 \div 2 = 1$ remainder 1
  • $1 \div 2 = 0$ remainder 1

Reading the remainders from bottom to top gives $1101_2 = 13_{10}$.
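
The same procedure is easy to express in code. Below is a minimal Python sketch of the repeated-division method, together with the reverse conversion using place values; the function names are chosen here just for illustration.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeatedly
    dividing by 2 and collecting the remainders."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))      # the remainder is the next bit
        n //= 2
    return "".join(reversed(remainders))   # read remainders bottom to top

def from_binary(bits: str) -> int:
    """Convert a binary string back to an integer using place values."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)       # shift left one place, add the new bit
    return value

print(to_binary(13))           # 1101
print(from_binary("101010"))   # 42
```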

Encoding Everything as Bits

Numbers are an obvious fit for binary, but bits can encode anything once we agree on a scheme:

  • Text. Assign each character a number. In the ASCII standard, the letter "A" is 65 (which is $1000001_2$), "B" is 66, and so on. The modern Unicode standard extends this idea to cover over 150,000 characters from virtually every writing system on Earth.
  • Images. Divide the image into tiny dots called pixels. Store each pixel as three numbers representing how much red, green, and blue light to mix - each between 0 and 255 (one byte per color channel). A 12-megapixel photo thus requires about 36 million bytes (36 MB) before compression.
  • Sound. Sample the air pressure thousands of times per second (CD audio uses 44,100 samples per second) and record each sample as a number. The rapid sequence of numbers faithfully reconstructs the sound wave when played back.

Images, video, your location on a map, the state of a chess game - all of it reduces to long sequences of zeros and ones.
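
To make the text-encoding idea concrete, the short Python sketch below uses the built-in `ord` function to look up each character's numeric code (ASCII values for these characters) and prints it as eight bits. The sample string and formatting are arbitrary choices for this example.

```python
message = "Hi!"
for ch in message:
    code = ord(ch)                # character -> numeric code ('A' would give 65)
    bits = format(code, "08b")    # numeric code -> 8-bit binary string
    print(f"{ch!r} -> {code:3d} -> {bits}")

# 'H' ->  72 -> 01001000
# 'i' -> 105 -> 01101001
# '!' ->  33 -> 00100001
```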

Note.

The idea that any information can be represented as bits is not just an engineering convenience - it is a deep insight that traces back to Claude Shannon's 1948 paper A Mathematical Theory of Communication, which founded the field of information theory. We will return to Shannon's ideas in Chapter 2.

Try the converter below to build intuition for how decimal and binary relate to each other. Type a decimal number and see its binary form, or type a binary string and see the decimal value. The colored boxes show each bit's place value.

[Interactive widget: Binary / Decimal / Hex Converter]

Common Misconception.

People sometimes think that because computers use binary internally, binary is somehow more "fundamental" than decimal. It is not - both are notational choices. Binary is convenient for hardware (it maps naturally to on/off switches), but the underlying mathematics does not depend on the base. What is fundamental is the concept of a bit as a unit of information.

1.2 Logic Gates and Boolean Algebra

Storing bits is useful, but computation requires transforming them. The simplest transformations are performed by logic gates - tiny circuits that take one or two input bits and produce an output bit according to a fixed rule. There are only a handful of fundamental gates, and from these few building blocks every computation that has ever run on any computer is ultimately constructed.

The NOT Gate

The simplest gate is NOT. It takes a single bit and flips it: 0 becomes 1, and 1 becomes 0. In Boolean algebra we write $\text{NOT}(A)$ or $\overline{A}$ or $\lnot A$. Think of it as flipping a light switch.

  A | NOT(A)
  --+-------
  0 |   1
  1 |   0

The AND Gate

The AND gate takes two input bits and outputs 1 only when both inputs are 1. If either input is 0, the output is 0. In everyday language, "I will go to the park if it is sunny AND I have free time" - both conditions must hold. We write $A \land B$.

  A | B | A AND B
  --+---+--------
  0 | 0 |    0
  0 | 1 |    0
  1 | 0 |    0
  1 | 1 |    1

The OR Gate

The OR gate outputs 1 if at least one of its inputs is 1. "I will be happy if I get a raise OR win the lottery" - either one suffices (and both at once is fine too). We write $A \lor B$.

  A | B | A OR B
  --+---+-------
  0 | 0 |   0
  0 | 1 |   1
  1 | 0 |   1
  1 | 1 |   1

The XOR Gate

XOR (exclusive or) outputs 1 when the inputs are different. If both inputs are the same (both 0 or both 1), the output is 0. A handy mnemonic: "one or the other, but not both." We write $A \oplus B$.

  A | B | A XOR B
  --+---+--------
  0 | 0 |    0
  0 | 1 |    1
  1 | 0 |    1
  1 | 1 |    0

Key Concept.

AND, OR, NOT, and XOR are the fundamental logic gates. Any computation - from adding two numbers to rendering a video game - can be expressed as a combination of these gates.
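
The four gates are easy to model in a few lines of Python, which also gives a mechanical way to regenerate the truth tables above. This is only an illustrative sketch - in hardware the gates are transistor circuits, not function calls.

```python
from itertools import product

def NOT(a):    return 1 - a
def AND(a, b): return a & b
def OR(a, b):  return a | b
def XOR(a, b): return a ^ b

def truth_table(gate):
    """Print the truth table of any two-input gate."""
    print(f" A B | {gate.__name__}")
    for a, b in product([0, 1], repeat=2):
        print(f" {a} {b} |  {gate(a, b)}")

truth_table(XOR)
#  A B | XOR
#  0 0 |  0
#  0 1 |  1
#  1 0 |  1
#  1 1 |  0
```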

Boolean Algebra

Just as ordinary algebra lets you manipulate equations with $+$ and $\times$, Boolean algebra lets you manipulate logical expressions with AND, OR, and NOT. The field was invented by George Boole in his 1854 work The Laws of Thought, nearly a century before electronic computers existed. Here are a few of the most useful identities:

  • Identity: $A \land 1 = A$ and $A \lor 0 = A$
  • Null: $A \land 0 = 0$ and $A \lor 1 = 1$
  • Complement: $A \land \overline{A} = 0$ and $A \lor \overline{A} = 1$
  • Idempotence: $A \land A = A$ and $A \lor A = A$
  • Commutativity: $A \land B = B \land A$ and $A \lor B = B \lor A$
  • Associativity: $(A \land B) \land C = A \land (B \land C)$
  • Distributivity: $A \land (B \lor C) = (A \land B) \lor (A \land C)$
  • De Morgan's laws: $\overline{A \land B} = \overline{A} \lor \overline{B}$ and $\overline{A \lor B} = \overline{A} \land \overline{B}$

De Morgan's laws deserve special attention. They tell us we can always swap ANDs for ORs (and vice versa) as long as we also negate the inputs and the output. This flexibility has a remarkable consequence: we can build any logical function from just NAND gates (NOT-AND) or just NOR gates (NOT-OR). A NAND gate outputs 0 only when both inputs are 1; its truth table is the exact complement of AND. The fact that NAND alone is functionally complete - capable of expressing any Boolean function - greatly simplifies chip manufacturing, since factories only need to mass-produce one type of gate.
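
Both claims can be checked mechanically. The sketch below builds NOT, AND, and OR out of nothing but a NAND function and then verifies, for every combination of inputs, that the constructions behave correctly and that De Morgan's laws hold. (The helper names and wiring are one choice among several equivalent ones.)

```python
from itertools import product

def NAND(a, b): return 1 - (a & b)

# Building the other gates from NAND alone:
def NOT(a):    return NAND(a, a)
def AND(a, b): return NOT(NAND(a, b))
def OR(a, b):  return NAND(NOT(a), NOT(b))

for a, b in product([0, 1], repeat=2):
    assert NOT(a) == 1 - a
    assert AND(a, b) == (a & b)
    assert OR(a, b) == (a | b)
    # De Morgan's laws: NOT(A AND B) = NOT(A) OR NOT(B), and dually
    assert NOT(a & b) == ((1 - a) | (1 - b))
    assert NOT(a | b) == ((1 - a) & (1 - b))

print("NAND constructions and De Morgan's laws check out for all inputs.")
```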

Use the interactive truth table below to explore how different gate types behave. Select a gate from the dropdown and watch how the output column changes for every combination of inputs.

[Interactive widget: Truth Table Explorer]

1.3 Building Circuits from Gates

A single logic gate is like a single word - useful, but limited. The real expressive power comes from combining gates into circuits, just as words combine into sentences and paragraphs. In this section we will build up from the simplest useful circuit to something that can genuinely add numbers, illustrating the powerful principle of composition.

The Half Adder

Suppose we want to add two single-bit numbers, $A$ and $B$. Each is either 0 or 1, so the possible sums are:

  A | B | Sum (decimal) | Sum bit S | Carry bit C
  --+---+---------------+-----------+------------
  0 | 0 |       0       |     0     |      0
  0 | 1 |       1       |     1     |      0
  1 | 0 |       1       |     1     |      0
  1 | 1 |       2       |     0     |      1

Look at the pattern carefully. The sum bit $S$ is 1 precisely when the inputs differ - that is the XOR pattern. The carry bit $C$ is 1 only when both inputs are 1 - that is the AND pattern. So:

$$S = A \oplus B \qquad\qquad C = A \land B$$

This tiny two-gate circuit is called a half adder. With nothing more than an XOR gate and an AND gate, we have a device that adds two bits and tells us the sum and carry. It is called "half" because it handles only two inputs; it cannot accommodate a carry coming in from a previous column.
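
In code, the half adder is a single line per output (a sketch, with names chosen for readability):

```python
def half_adder(a: int, b: int):
    s = a ^ b   # sum bit: XOR
    c = a & b   # carry bit: AND
    return s, c

for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> sum {s}, carry {c}")
# 0 + 0 -> sum 0, carry 0
# 0 + 1 -> sum 1, carry 0
# 1 + 0 -> sum 1, carry 0
# 1 + 1 -> sum 0, carry 1
```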

The Full Adder

When adding multi-digit binary numbers column by column (just as you do in long addition), each column after the first might also receive a carry from the previous column. We need a circuit that adds three bits: $A$, $B$, and a carry-in $C_{in}$.

A full adder accomplishes this using two half adders and an OR gate:

  1. First half adder: compute $A \oplus B$ (partial sum) and $A \land B$ (first carry candidate).
  2. Second half adder: add $C_{in}$ to the partial sum, producing the final $S$ and a second carry candidate.
  3. OR gate: the carry-out $C_{out}$ is 1 if either carry candidate is 1.

$$S = A \oplus B \oplus C_{in}$$
$$C_{out} = (A \land B) \lor \bigl((A \oplus B) \land C_{in}\bigr)$$

[Interactive widget: Full Adder Circuit]

Toggle input bits and watch signals propagate through the gates. The carry ripples from one column to the next.

Scaling Up: The Ripple-Carry Adder

Chain several full adders together, with each carry-out feeding into the next column's carry-in, and you have a ripple-carry adder - a circuit that adds multi-bit binary numbers. An 8-bit adder uses 8 full adders in a row. A 64-bit adder (the kind inside your laptop's processor) uses 64. The carry signal "ripples" through the chain from the least significant bit to the most significant, exactly like carrying digits in pencil-and-paper addition.
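
To see this composition in action, here is a small Python sketch: a full adder built from two half adders and an OR, then a ripple-carry adder that chains full adders from the least significant bit upward. The 8-bit width and the function names are our own choices for illustration.

```python
def half_adder(a, b):
    return a ^ b, a & b                    # (sum, carry)

def full_adder(a, b, c_in):
    s1, c1 = half_adder(a, b)              # first half adder
    s, c2 = half_adder(s1, c_in)           # second half adder folds in the carry-in
    return s, c1 | c2                      # carry out if either stage produced a carry

def ripple_carry_add(x, y, width=8):
    result, carry = 0, 0
    for i in range(width):                 # least significant bit first
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i                   # place the sum bit in column i
    return result                          # a carry out of the top bit is dropped

print(ripple_carry_add(42, 13))    # 55
print(ripple_carry_add(200, 100))  # 44 - the true sum 300 wraps around modulo 2**8
```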

Key Concept.

Complex operations are built by connecting simple logic gates in well-defined patterns. This principle of composition - building complex behavior from simple, well-understood parts - is one of the most powerful ideas in all of computing.

Beyond Adders

The same compositional approach applies to every arithmetic and logical operation a computer needs. Subtraction uses adders with a trick called two's complement. Multiplication is implemented through combinations of shifts and additions. Multiplexers select between different data paths, acting like railroad switches for information. Decoders take a binary address and activate exactly one of many output lines - perfect for selecting a memory location.
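
The two's complement trick is worth a quick sketch: to compute $A - B$ on an $n$-bit adder, invert the bits of $B$, add 1, and then add the result to $A$, discarding any carry out of the top bit. The 8-bit width below is an arbitrary choice for illustration.

```python
def twos_complement_subtract(a, b, width=8):
    mask = (1 << width) - 1        # keep only `width` bits
    neg_b = (~b + 1) & mask        # two's complement of b: invert the bits, add 1
    return (a + neg_b) & mask      # an ordinary adder now performs subtraction

print(twos_complement_subtract(42, 13))  # 29
print(twos_complement_subtract(5, 9))    # 252, which is how -4 is represented in 8 bits
```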

Every one of these circuits is, at bottom, just a carefully arranged collection of AND, OR, NOT, and XOR gates. There is nothing magical happening inside a computer chip - only billions of these simple gates working in concert, flipping bits according to their fixed rules.

The Fan-Out Principle

One property of classical bits that is easy to take for granted: they can be copied freely. If you need the value of bit $A$ in two different parts of a circuit, you simply split the wire - a technique called fan-out. This seems so obvious that it barely deserves mention.

Note.

We mention fan-out now because in the quantum world, this innocent ability disappears entirely. The no-cloning theorem of quantum mechanics says that an unknown quantum state cannot be copied. This fundamental constraint will shape everything about quantum circuit design when we reach Part II of this textbook.

1.4 From Circuits to Computers

We have seen that gates combine into circuits, and circuits can add, subtract, compare, and route information. But a calculator that only adds is not a computer. What makes a computer fundamentally different is programmability - the ability to follow different sequences of instructions without being physically rewired.

Layers of Abstraction

Between the raw logic gates and the apps on your phone lies a tower of abstraction layers, each hiding the complexity of the layers beneath:

  1. Transistors. Microscopic electronic switches that physically implement our abstract 0s and 1s. A transistor is either conducting (on, representing 1) or not (off, representing 0). The Intel 4004, the first commercial microprocessor released in 1971, contained about 2,300 transistors. Modern processors contain tens of billions - Apple's M2 chip, for instance, has roughly 20 billion transistors, with features measured in just a few nanometers.
  2. Logic gates. Transistors wired together to perform AND, OR, NOT, and other operations, as we discussed in the previous section.
  3. Functional units. Gates assembled into adders, multipliers, comparators, and memory cells - the building blocks of a processor.
  4. Processor (CPU). Functional units organized around a central control loop that executes instructions one after another (or, in modern designs, many at once through pipelining and parallelism).
  5. Operating system. Software that manages hardware resources and provides a stable platform for applications.
  6. Applications. The programs you actually interact with - web browsers, games, messaging apps, scientific simulations.

Each layer provides a clean interface to the layer above while hiding enormous internal complexity. When you write a text message, you do not think about transistors. When an operating system schedules tasks, it does not think about individual logic gates. This principle of abstraction is what makes it possible for humans to build systems of staggering complexity without drowning in details.

The Stored-Program Concept

The great intellectual leap that created modern computing was the realization that instructions and data are both just sequences of bits and can be stored in the same memory. This is the stored-program concept, articulated by John von Neumann in his 1945 draft report on the EDVAC computer (though the idea had many contributors, including Alan Turing, J. Presper Eckert, and John Mauchly).

Before this insight, "programming" a machine like the ENIAC meant physically rearranging cables and switches - a process that could take days. With the stored-program concept, changing what the computer does is as simple as loading different bits into memory.

Key Concept.

In a stored-program computer, both the instructions (the program) and the data live in the same memory. This means a computer can load new programs without rewiring, modify its own program at runtime, and even treat programs as data to be analyzed and transformed.

The Fetch-Decode-Execute Cycle

At its heart, a processor runs a deceptively simple loop, over and over, billions of times per second:

  1. Fetch: read the next instruction from memory. A special register called the program counter tracks which instruction comes next.
  2. Decode: interpret the instruction to determine what operation to perform and which data to use.
  3. Execute: carry out the operation - perhaps adding two numbers, comparing values, or jumping to a different instruction address.
  4. Store: write the result back to a register or memory, and advance the program counter.

That is it. Every computation your computer has ever performed - every web page rendered, every email sent, every video decoded - is the result of this cycle running at extraordinary speed. A modern processor can complete billions of these cycles per second, with a clock speed typically measured in gigahertz (GHz), where 1 GHz means one billion cycles per second.
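
To make the cycle concrete, here is a toy stored-program machine in Python. The instruction set (LOAD, ADD, JUMP_IF_ZERO, HALT), the register names, and the sample program are all invented for this sketch - a real processor encodes instructions as bit patterns, not tuples - but the fetch-decode-execute-store loop has the same shape.

```python
def run(program):
    registers = {"r0": 0, "r1": 0}
    pc = 0                                  # program counter
    while True:
        op, *args = program[pc]             # FETCH the next instruction
        pc += 1                             # advance the program counter
        if op == "LOAD":                    # DECODE the opcode, then EXECUTE and STORE
            reg, value = args
            registers[reg] = value
        elif op == "ADD":
            dst, src = args
            registers[dst] += registers[src]
        elif op == "JUMP_IF_ZERO":
            reg, target = args
            if registers[reg] == 0:
                pc = target                 # change which instruction is fetched next
        elif op == "HALT":
            return registers

# A tiny program: put 2 and 3 in registers, add them, stop.
program = [
    ("LOAD", "r0", 2),
    ("LOAD", "r1", 3),
    ("ADD", "r0", "r1"),
    ("HALT",),
]
print(run(program))   # {'r0': 5, 'r1': 3}
```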

Moore's Law and Physical Limits

In 1965, Gordon Moore observed that the number of transistors on a chip was doubling at a steady rate - initially every year, a pace he revised in 1975 to every two years - a trend that became known as Moore's Law. For decades, this exponential growth delivered ever-faster, ever-cheaper computers. From the Intel 4004's 2,300 transistors in 1971 to modern chips with tens of billions, the trajectory was breathtaking.

But Moore's Law describes an economic and engineering trend, not a law of physics, and it is running into fundamental physical barriers. As transistors shrink to just a few nanometers - approaching the scale of individual atoms - several problems arise:

  • Heat. Every time a gate switches, it dissipates a small amount of energy. Billions of switches per second on a tiny chip generate significant heat. Landauer's principle, from the thermodynamics of computation, establishes a theoretical minimum energy of $k_B T \ln 2$ (about $2.85 \times 10^{-21}$ joules at room temperature) for each irreversible bit operation - and real gates dissipate far more than this minimum. (A quick numerical check of this figure appears just after this list.)
  • Quantum effects. At the nanometer scale, electrons can tunnel through barriers that should be impenetrable according to classical physics, causing bits to flip unpredictably.
  • Manufacturing precision. Etching features just a few atoms wide requires extraordinary precision and increasingly expensive fabrication facilities.
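
The Landauer figure quoted in the first bullet is easy to check with standard constants (300 K is used here as "room temperature"):

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, joules per kelvin
T = 300              # approximate room temperature, kelvin

energy = k_B * T * math.log(2)
print(f"{energy:.2e} J per irreversible bit operation")   # about 2.9e-21 J
```
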
Note.

The heat problem and Landauer's principle will be explored thoroughly in Chapter 3, on reversible computing. There we will see that the energy cost of computation is intimately connected to whether information is destroyed during the process - a connection that leads directly to the quantum computing paradigm.

These physical limits suggest that simply making classical computers faster will eventually reach a ceiling. If we want to keep solving harder problems, we may need a fundamentally different approach to computation itself. That is precisely the motivation for quantum computing.

1.5 What Makes a Problem "Hard"?

We now know that a computer is, at its core, a machine that flips bits according to rules at tremendous speed. Modern processors perform billions of operations per second. With that kind of raw power, is there any problem a computer cannot handle? The perhaps surprising answer is: yes, many. And understanding why is the thread that will eventually lead us to quantum computing.

Algorithms and Running Time

An algorithm is a precise, step-by-step procedure for solving a problem. A recipe is an algorithm for cooking; long division is an algorithm for dividing numbers. Computer scientists care not just about whether an algorithm works, but about how long it takes as the input grows larger. We express this growth rate using big-O notation.

Suppose you have a list of $n$ names and need to find a particular one:

  • Unsorted list: You might have to check every name one by one. In the worst case, the target is at the very end (or not there at all), requiring $n$ comparisons. We say this takes $O(n)$ time - "linear time." If you double the list, the work roughly doubles.
  • Sorted list: You can use binary search: open to the middle, check whether your target comes before or after, and eliminate half the remaining names. Repeat, and you find the answer in about $\log_2 n$ steps. That is $O(\log n)$ time - extraordinarily efficient. A sorted phone book with a million entries requires only about 20 comparisons.

The gap between $O(n)$ and $O(\log n)$ is significant, but both are considered efficient. The real trouble begins when growth rates become much steeper.
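
Here is a minimal Python sketch of both strategies, instrumented to count comparisons (the names and the use of integers as stand-ins for names are our own simplifications):

```python
def linear_search(items, target):
    steps = 0
    for item in items:             # check every entry until the target appears
        steps += 1
        if item == target:
            return steps
    return steps                   # target absent: we looked at all n entries

def binary_search(sorted_items, target):
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2       # look at the middle entry
        if sorted_items[mid] == target:
            return steps
        elif sorted_items[mid] < target:
            lo = mid + 1           # discard the lower half
        else:
            hi = mid - 1           # discard the upper half
    return steps

entries = list(range(1_000_000))        # a sorted "phone book" of a million entries
print(linear_search(entries, 999_999))  # 1000000 comparisons in the worst case
print(binary_search(entries, 999_999))  # about 20 comparisons
```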

Easy Problems: The Class P

Many important problems can be solved in polynomial time - meaning the number of steps grows as some fixed power of the input size, such as $O(n)$, $O(n^2)$, or $O(n^3)$. These problems form the complexity class called P (for "polynomial time"). Examples include:

  • Sorting a list of numbers ($O(n \log n)$ with efficient algorithms like mergesort)
  • Finding the shortest path in a network ($O(n^2)$ with Dijkstra's algorithm)
  • Multiplying two large numbers
  • Determining whether a number is prime (proven to be in P in 2002 by Agrawal, Kayal, and Saxena)

For problems in P, bigger inputs mean more work, but the growth is tame enough that computers can handle impressively large instances within practical time limits.

Key Concept.

The class P contains problems solvable in polynomial time - problems where the computational work grows manageably as the input gets bigger. These are the problems we consider "efficiently solvable" by classical computers.

Hard to Solve, Easy to Verify: The Class NP

Now consider a different kind of problem. Imagine someone hands you a jigsaw puzzle with 10,000 pieces. Assembling it from scratch might take days. But if someone hands you a completed puzzle, you can verify it is correct almost instantly - just check that all pieces fit and the picture is complete. Many important problems share this asymmetry: hard to solve, but easy to check.

The class NP (nondeterministic polynomial time) contains all problems where a proposed solution can be verified in polynomial time, even if finding that solution might take much longer. Every problem in P is automatically in NP (if you can solve it quickly, you can certainly verify quickly), but NP also contains problems for which no fast solving algorithm is known.

Some famous NP problems:

  • The traveling salesperson problem: Given a list of cities and the distances between them, find the shortest route that visits every city exactly once and returns to the starting point. Verifying a proposed route is easy (just add up the distances), but finding the optimal route appears to require checking a vast number of possibilities.
  • Boolean satisfiability (SAT): Given a Boolean formula, is there an assignment of true/false values to its variables that makes the whole formula true? Checking a proposed assignment is straightforward, but finding one can be enormously difficult.
  • Graph coloring: Can you color the nodes of a network with $k$ colors so that no two connected nodes share the same color?

The P vs NP Question

Here we arrive at the most important open question in computer science, and one of the seven Millennium Prize Problems identified by the Clay Mathematics Institute, each carrying a one-million-dollar prize:

Does P equal NP?

In plain language: if a solution to a problem can be quickly verified, does that guarantee it can also be quickly found? Is the ability to recognize a correct answer just as powerful as the ability to discover one?

Almost all computer scientists believe the answer is no - P does not equal NP - but despite decades of effort, no one has been able to prove it. The question was formally posed by Stephen Cook in 1971, with Leonid Levin arriving independently at the same result in 1973. It remains open more than fifty years later.

Common Misconception.

NP does not stand for "non-polynomial." It stands for "nondeterministic polynomial time," referring to a theoretical model of computation (the nondeterministic Turing machine) that can verify solutions in polynomial time. Many problems in NP - including all problems in P - can be solved in polynomial time.

Exponential Walls

Why does computational complexity matter? Because the difference between polynomial and exponential growth is not a matter of degree - it is a difference in kind. Consider this comparison:

  Growth rate | Name        | n = 10 | n = 50   | n = 100
  ------------+-------------+--------+----------+----------
  O(log n)    | Logarithmic | ~3     | ~6       | ~7
  O(n)        | Linear      | 10     | 50       | 100
  O(n^2)      | Quadratic   | 100    | 2,500    | 10,000
  O(n^3)      | Cubic       | 1,000  | 125,000  | 1,000,000
  O(2^n)      | Exponential | 1,024  | ~10^15   | ~10^30

[Interactive widget: Complexity Growth Chart]

For $n = 100$, a quadratic algorithm does 10,000 operations - trivial for a modern computer. An exponential algorithm requires approximately $10^{30}$ operations. Even a computer performing a trillion ($10^{12}$) operations per second would need roughly $10^{18}$ seconds - more than 30 billion years, over twice the current age of the universe. No amount of engineering can overcome exponential growth through brute force.
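
The arithmetic behind that claim takes only a few lines (a rough estimate; the exact constants do not change the conclusion):

```python
operations = 2 ** 100      # about 1.3e30 steps for an exponential algorithm at n = 100
ops_per_second = 1e12      # a generous trillion operations per second

seconds = operations / ops_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"{seconds:.1e} seconds, or roughly {years:.1e} years")   # ~1.3e18 s, ~4e10 years
```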

The Factoring Problem

One problem of particular importance sits in an intriguing position: integer factorization. Given the number 15, you can quickly see that $15 = 3 \times 5$. But given a number with hundreds of digits - the product of two large primes - the best known classical algorithms take time that grows sub-exponentially but still super-polynomially, roughly $\exp\!\bigl(c \, n^{1/3} (\log n)^{2/3}\bigr)$ for an $n$-digit number (where $c$ is a constant) using the general number field sieve. For a 600-digit number, this could take longer than the age of the universe on today's fastest supercomputers.
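
For contrast with the number field sieve, here is the most naive classical approach - trial division - which is fine for small numbers but whose running time grows exponentially with the number of digits:

```python
def trial_division(n):
    """Find a nontrivial factor of n by trying every candidate up to sqrt(n)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None            # no factor found: n is prime

print(trial_division(15))    # (3, 5)
print(trial_division(2021))  # (43, 47)
# For a 600-digit product of two primes, the loop would need on the order of 10^300 iterations.
```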

Interestingly, factoring is not known to be NP-complete (the hardest problems in NP). It sits in a gray zone - clearly in NP (given proposed factors, you can verify by multiplying), but possibly not as hard as the NP-complete problems. This subtle position is part of what makes factoring so fascinating from both a theoretical and practical standpoint.

The difficulty of factoring is not just academic. The RSA cryptosystem, which secures much of the internet's communication - your bank transactions, encrypted emails, and secure web connections - relies on the assumption that factoring large numbers is computationally infeasible.

Note.

Here is where quantum computing enters the story. In 1994, mathematician Peter Shor discovered a quantum algorithm that can factor large integers in polynomial time - an exponential speedup over the best known classical method. This single result demonstrated that quantum computers are not merely a curiosity but a potential revolution in what is computationally possible. We will build up to understanding Shor's algorithm in Chapter 11. First, we need to understand what makes quantum computation different from everything we have discussed so far.

Looking Ahead

As we will discover in the coming chapters, quantum computing challenges several of the classical assumptions we have built up here:

  • Classical bits are always 0 or 1. Quantum bits (qubits) can exist in superpositions of both states simultaneously.
  • Classical bits can be freely copied (fan-out). Qubits cannot, due to the no-cloning theorem.
  • Classical gates can be irreversible (AND destroys information about its inputs). Quantum gates must be reversible.
  • Classical computation is deterministic. Quantum computation is fundamentally probabilistic, governed by the mathematics of complex amplitudes.

Each of these differences will turn out to be not a limitation but a source of computational power that has no classical analog.

You now know that all information reduces to bits; that bits are transformed by logic gates; that gates compose into circuits, which compose into full computers; and that even these extraordinary machines have fundamental limitations when facing exponentially hard problems. With this foundation firmly in place, we are ready to explore the mathematics of information in Chapter 2, and then the world of reversible computing in Chapter 3 - which will be our bridge into the quantum realm.