Friday, September 10, 2021

trace(AB) = trace(BA)

 Last time I showed how we can think of matrix multiplication as (1) a matrix of inner products, or (2) a sum of outer products, and used the latter to unpack the "positive definite" part of the product \(X^tX\). At the end I mentioned that the outer-product version of matrix multiplication makes it easy to show that  trace \(AB\) = trace \( BA \). 

In the late 1980s I took a final exam that asked for the proof, and totally blanked. I mucked around with the sums for a while, and then time ran out. A little embarrassing, but also an indication that I hadn't really grasped the concepts. One can get quite a ways in math by just pushing symbols around, without developing intuition. 

Some Sums

The matrix multiplication \(AB = C\) is usually defined as an array of inner products, so the element in the \(i\)th row and \(j\)th column of the product is 
$$ c_{ij} = \sum_{k=1}^q a_{ik} b_{kj} $$
where \(q\) is the number of columns of \(A\), which must be the same as the number of rows of \(B\) or else the multiplication isn't defined. 
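As a concrete check (my own sketch, not part of the original post), the NumPy code below builds each element of \(AB\) from this inner-product definition and compares the result to the built-in matrix product; the shapes and random seed are arbitrary choices.

```python
import numpy as np

p, q = 3, 4                      # A is p x q, B is q x p (illustrative sizes)
rng = np.random.default_rng(0)
A = rng.normal(size=(p, q))
B = rng.normal(size=(q, p))

C = np.empty((p, p))
for i in range(p):
    for j in range(p):
        # c_ij is the inner product of row i of A with column j of B
        C[i, j] = np.dot(A[i, :], B[:, j])

assert np.allclose(C, A @ B)     # agrees with the built-in matrix product
```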

The trace function just adds up the diagonal of a matrix (usually assumed to be square, although that isn't strictly necessary). So in the product \(AB = C\), we only care about the diagonal as far as the trace is concerned. That subset of elements of \(AB = C\) is 
$$ c_{ii} = \sum_{k=1}^q a_{ik} b_{ki}. $$
If the matrices swap positions, \(BA = D\), then the diagonal elements are
$$ d_{ii} = \sum_{k=1}^p b_{ik} a_{ki}, $$
where  \(p\) is the number of columns of \(B\), which is the same as the number of rows of \(A\). 

The trace is the sum of these diagonal elements, so if trace \(AB\) = trace \(BA\), we must have
$$\sum_{i=1}^p \sum_{k=1}^q a_{ik} b_{ki} = \sum_{i=1}^q \sum_{k=1}^p b_{ik} a_{ki},$$
where \(AB\) is \(p \times p\) and \(BA\) is \(q \times q\). To see that this is true, we just need to swap the names of the counting indices \(i\) and \(k\) on one side of the equation. The names are arbitrary, and after the swap the two expressions are identical. 
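For a quick sanity check (again my own sketch, with arbitrary shapes and seed), both double sums, and the two traces computed directly, agree even when \(A\) and \(B\) are not square:

```python
import numpy as np

p, q = 3, 5                      # A is p x q, B is q x p
rng = np.random.default_rng(1)
A = rng.normal(size=(p, q))
B = rng.normal(size=(q, p))

# trace(AB): sum the diagonal elements c_ii = sum_k a_ik * b_ki
tr_AB = sum(A[i, k] * B[k, i] for i in range(p) for k in range(q))
# trace(BA): sum the diagonal elements d_ii = sum_k b_ik * a_ki
tr_BA = sum(B[i, k] * A[k, i] for i in range(q) for k in range(p))

assert np.isclose(tr_AB, tr_BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```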

This, however, is not very enlightening to me. The problem is that working with inner products forces us to get into the weeds, mucking around with sums. I don't want to do that.

Outer Products

Let's abstract the matrices to think of them as organizations of vectors (one-dimensional matrices). By convention, these are column matrices, so the outer product form is
$$ AB = \begin{bmatrix} a_1 & \dots & a_q \end{bmatrix} \begin{bmatrix} b_1^t \\ \vdots \\ b_q^t \end{bmatrix} = a_1 b_1^t  + \dots + a_q b_q^t $$
 
where \(A\) is expressed as columns and \(B\) as rows (hence the transpose on each vector). Each term on the right is a \(p \times p \) matrix resulting from the outer product of a column of \(A\) and a row of \(B\). The trace takes the diagonal elements of these square matrices, and since the trace of a sum is the sum of the traces, we get a sum of terms, with the first one (the first column of \(A\) times the first row of \(B\)) expressed as
$$\text{tr}(a_1 b_1^t) = \text{tr} \begin{bmatrix} a_{11} b_{11} & \dots & a_{11} b_{p1} \\ \vdots & \ddots & \vdots \\ a_{1p} b_{11} & \dots & a_{1p} b_{p1} \end{bmatrix} = a_{11} b_{11} + \dots + a_{1p} b_{p1} = a_1^t b_1.$$
I'm abusing the notation for convenience. The one-subscript variables are column matrices, and the two-subscript versions are individual elements. The equations above illuminate the key property of the trace function: it converts outer products to inner products: \( \text{tr}(a_1 b_1^t  + \dots + a_q b_q^t) = a_1^t b_1  + \dots + a_q^t b_q \). 
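Here is a minimal sketch of that conversion (my own illustration, arbitrary shapes and seed): the trace of each rank-one outer product \(a_k b_k^t\) is the inner product \(a_k^t b_k\), and summing over \(k\) recovers \(\text{tr}(AB)\).

```python
import numpy as np

p, q = 4, 3                      # A is p x q, B is q x p
rng = np.random.default_rng(2)
A = rng.normal(size=(p, q))
B = rng.normal(size=(q, p))

# AB as a sum of outer products: column k of A times row k of B
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(q))
assert np.allclose(outer_sum, A @ B)

# The trace converts each outer product into an inner product
inner_sum = sum(np.dot(A[:, k], B[k, :]) for k in range(q))
assert np.isclose(inner_sum, np.trace(A @ B))
```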
 
Commuting the original matrix product to \(BA\) gives the inner product form directly:
$$\text{tr} \left( \begin{bmatrix} b_1^t \\ \vdots \\ b_q^t \end{bmatrix} \begin{bmatrix} a_1 & \dots & a_q \end{bmatrix} \right) = b_1^t a_1 + \dots + b_q^t a_q $$
 
Since for real vectors the inner product is symmetric, i.e. \( x^ty = y^tx \), the sums for \( \text{tr} AB \) and \( \text{tr} BA \) are equal.
 

An Application

The trace has interesting properties, such as being the sum of the eigenvalues of a square matrix, but its main character is that it converts outer products to inner products. Or the reverse, which can be a useful analytical trick. The proof below comes from page 95 of Linear Models in Statistics by Alvin C. Rencher.

[Rencher's proof appears here as an image.]

A clever trick in this proof is "expanding" the scalar result of an inner product into a whole matrix, which is allowed because the trace is the same for both. Then the properties of that matrix are used to reduce the expression to the given form. It was difficult for me to understand what was going on here before I developed the intuition about the trace's conversion between inner and outer products.
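To illustrate the trick on its own (this is my sketch, not Rencher's actual derivation): a quadratic form \(x^t A x\) is a scalar, so it equals its own trace, and cycling the factors inside the trace rewrites it as \(\text{tr}(A\,x x^t)\), turning a statement about a scalar into a statement about a matrix.

```python
import numpy as np

n = 4
rng = np.random.default_rng(3)
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

s = x @ A @ x                          # the scalar quadratic form x^t A x
# "Expand" the scalar into a matrix: x^t A x = tr(x^t A x) = tr(A x x^t)
assert np.isclose(s, np.trace(A @ np.outer(x, x)))
```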
