Vectors

What a vector is

A vector is an ordered list of numbers. In machine learning, vectors are the fundamental way to represent data points, features, weights, and predictions.

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

A vector in $\mathbb{R}^n$ has $n$ components and lives in $n$-dimensional space. A training example with 5 features is a vector in $\mathbb{R}^5$.
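
In code, a vector is just a one-dimensional array. A minimal sketch with NumPy (the feature values here are made up for illustration):

```python
import numpy as np

# A training example with 5 features (values are illustrative)
x = np.array([5.1, 3.5, 1.4, 0.2, 1.0])

print(x.shape)  # a vector in R^5 has shape (5,)
```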

Two interpretations

As a point: a location in $n$-dimensional space. The vector $(3, 2)$ is the point 3 units right and 2 units up from the origin.

As a direction and magnitude: an arrow from the origin to that point, with a direction and a length. This interpretation is more useful for operations like addition and scaling.

Both interpretations coexist and are useful in different contexts.

Vector operations

Addition: add component-wise. Adding two vectors combines their displacements — geometrically, you follow one arrow then the other.

$$\mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}$$

Scalar multiplication: multiply every component by a number $c$. Scales the length of the vector without changing its direction (unless $c < 0$, which flips the direction).

$$c\mathbf{v} = \begin{pmatrix} cv_1 \\ cv_2 \end{pmatrix}$$
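
Both operations are element-wise, which NumPy expresses directly. A quick sketch with arbitrary example vectors:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

# Addition is component-wise: follow one arrow, then the other
s = u + v        # [4.0, 1.0]

# Scalar multiplication scales every component
d = 2.0 * v      # [6.0, -2.0], same direction, twice the length
f = -1.0 * v     # [-3.0, 1.0], direction flipped
```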

Magnitude (norm)

The magnitude or L2 norm of a vector is its length:

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$

This is the Euclidean distance from the origin to the point $\mathbf{v}$.

A unit vector has magnitude 1. Dividing any non-zero vector by its magnitude normalizes it: $\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$.
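
A short check of the norm and normalization, using the classic 3-4-5 triangle as the example:

```python
import numpy as np

v = np.array([3.0, 4.0])

# L2 norm: sqrt(3^2 + 4^2) = sqrt(25) = 5
norm = np.linalg.norm(v)

# Dividing by the norm yields a unit vector
v_hat = v / norm
print(np.linalg.norm(v_hat))  # magnitude 1
```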

Why vectors matter in ML

In ML, everything is a vector:

  • A training example with $n$ features is a vector $\mathbf{x} \in \mathbb{R}^n$.
  • The weights of a linear model are a vector $\mathbf{w} \in \mathbb{R}^n$.
  • A prediction is a vector (or a scalar, which you can treat as a vector in $\mathbb{R}^1$).
  • A word embedding maps a word to a vector in, say, $\mathbb{R}^{300}$.

The entire computation of a linear model, predicting $\hat{y} = \mathbf{w}^T \mathbf{x} + b$, is a vector operation. Training updates the weight vector. Regularization penalizes its magnitude.
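
That whole prediction is one dot product plus a bias. A sketch with made-up weights and features:

```python
import numpy as np

# Hypothetical weights and features for a linear model
w = np.array([0.5, -0.2, 1.0])
x = np.array([2.0, 1.0, 3.0])
b = 0.1

# w^T x + b: the @ operator computes the dot product
y_hat = w @ x + b
# 0.5*2 + (-0.2)*1 + 1.0*3 + 0.1 = 3.9
```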

Linear combinations and span

A linear combination of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ is any sum $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$ for scalars $c_i$. The set of all possible linear combinations is the span of those vectors.

If you can reach any point in $\mathbb{R}^n$ by linear combinations of a set of $n$ vectors, those vectors span the full space. If they are redundant (one is a multiple of another), they only span a lower-dimensional subspace. This concept underlies matrix rank and linear dependence, covered in the rank lesson.
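
The span of a set of vectors can be probed numerically through matrix rank, foreshadowing the rank lesson. A sketch in $\mathbb{R}^2$ with example vectors:

```python
import numpy as np

# Two independent vectors: their span is all of R^2
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
rank_ab = np.linalg.matrix_rank(np.column_stack([a, b]))  # 2: full space

# A redundant pair (c = 2*b): their span is only a line through the origin
c = 2.0 * b
rank_bc = np.linalg.matrix_rank(np.column_stack([b, c]))  # 1: a 1-D subspace
```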