In these lectures we will explore a modern formulation of several aspects of fundamental physics which features symmetries and geometry. As we will see, making use of such mathematical concepts not only deepens our understanding of many classical topics, but also paves the way to many of the great achievements of 20th century physics. Our point of view will mostly be that of action principles and classical field theories (or sometimes just the mechanics of point particles) and we will review those aspects that are needed as we go along. Sometimes it will prove worthwhile to change perspective and consider quantum mechanics instead, but we will limit ourselves to a few crucial topics when doing so.
There are many more advanced topics that are outside the scope of these lectures, but that lie too close for us to completely ignore. The short detours into this terrain that we will undertake are meant to stimulate your intellectual curiosity, but do not form part of the examinable content. The corresponding sections will be clearly marked with a \(^{\bf \ast}\).
These lectures combine material that naturally belongs together, but can rarely be found all in one source. Furthermore, many standard texts on the subjects covered pitch their material at a somewhat higher level. While I hope you take a look at the references given, don’t get discouraged if what you find seems very difficult. These notes form the core of the material you should use in this course and the exercises given reflect the level of understanding I expect you to achieve. Only what is discussed in the lecture notes (and does not have a \(^{\bf \ast}\)) forms the examinable material.
In (classical) theoretical physics, we typically specify which kind of physical system we are talking about by stating the physical degrees of freedom, e.g. some generalized coordinates \(q_i\) and their derivatives \(\dot{q}_i\) with respect to ‘time’, which uniquely determine the configuration of our system, together with an action \(S[q_i,\dot{q}_i]\) that encodes the dynamics. We will introduce actions formally later, so if you are unfamiliar with actions or feel a little rusty, never mind: all that matters for this discussion is that the action is a functional of \(q_i(t)\) and \(\dot{q}_i(t)\) which determines the equations of motion by demanding that physical trajectories are extrema of \(S\).
For systems defined by an action \(S\), a sensible1 definition of a symmetry might be
Definition 1. Let \(g\) be an invertible map \[g:\hspace{.5cm} \begin{aligned} q_i &\rightarrow g(q_i) \\ \dot{q}_i &\rightarrow g(\dot{q}_i) \end{aligned}\] such that \[S[g(q_i),g(\dot{q}_i)] = S[q_i,\dot{q}_i]\, .\] Then \(g\) is called a symmetry of the action \(S\). Note that this must hold true no matter what \(q_i\) and \(\dot{q}_i\) are.
Given an action we can ask about the set \(G\) of all of its symmetries (and this is something we will frequently do), and our definition above has two immediate consequences for that:
The identity map \(g= \mathds{1}\) is a (rather trivial) symmetry.
For any two symmetries \(g\) and \(g'\) we can form another symmetry by composition: \[S[g(g'(q_i)),g(g'(\dot{q}_i))] = S[g'(q_i),g'(\dot{q}_i)] = S[q_i,\dot{q}_i]\, .\] Here it is crucial that \(S\) stays invariant no matter what \(q_i\) and \(\dot{q}_i\) are. In particular, our definition applies just as well when \(g\) acts on \(g'(q_i),g'(\dot{q}_i)\) instead of \(q_i,\dot{q}_i\).
If you consider the above overly formal, we can unpack what it means intuitively: if you can map the system using both \(g\) and \(g'\) without changing \(S\), you might as well first act with \(g'\), which leaves \(S\) unchanged (read the above equation from right to left). Then we subsequently apply \(g\), and again the action does not change. We have hence acted with both maps without changing the action.
The properties of the set of symmetries we have just uncovered are exactly what is called a group (or rather, this is why groups are defined the way they are). Recall the following from Linear Algebra I:
Definition 2. A group is a set \(G\) equipped with an operation \(\circ: G \times G \rightarrow G\) such that
there is an identity element \(e\) in \(G\) such that \(x\circ e = e \circ x = x\) for all \(x \in G\);
if \(x,y \in G\) then \(x\circ y \in G\);
\((x\circ y)\circ z = x \circ (y\circ z)\) for all \(x,y,z \in G\);
for each \(x \in G\), there exists an inverse \(x^{-1}\) in \(G\) with \(x^{-1} \circ x = e\).2
Proposition 1. The symmetries of a classical action form a group.
Proof: The composition \(\circ\) is just the composition of maps here and the identity \(e\) is the identity map \(g = \mathds{1}\). We have just seen that the identity is a symmetry and that the composition of symmetries is a symmetry as well, taking care of i) and ii). Composition of maps is associative (in each case we simply apply three maps one after the other), so we also have iii). Finally, we have assumed that symmetries are invertible maps, so we also have iv). \(\square\)
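To make this concrete, here is a small numerical sketch (my own illustration, not part of the notes): a discretized free-particle action in the plane is invariant under rotations, and the composition of two rotations is again a symmetry. The trajectory, mass and angles are arbitrary choices.

```python
import numpy as np

def action(q, dt=0.01, m=1.0):
    """Discretized free-particle action S = sum over steps of (1/2) m |dq/dt|^2 dt."""
    v = np.diff(q, axis=0) / dt
    return 0.5 * m * np.sum(v**2) * dt

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
q = rng.standard_normal((100, 2))      # an arbitrary trajectory q_i(t) in the plane

g, gp = rotation(0.7), rotation(-1.3)  # two candidate symmetries
S0 = action(q)
assert np.isclose(action(q @ g.T), S0)         # g leaves S invariant
assert np.isclose(action(q @ gp.T), S0)        # so does g'
assert np.isclose(action(q @ (g @ gp).T), S0)  # and so does the composition g∘g'
```

Note that the invariance holds for an arbitrary trajectory, exactly as Definition 1 demands.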
1. Let \(\mathbb{C}\) be the complex numbers and \(\mathbb{C}^* = \mathbb{C}\setminus \{0\}\). Which of these is a group under addition? Which of these is a group under multiplication?
Note that the way we introduced groups did not depend on groups acting in a physics context. More generally, groups naturally appear as symmetries of whatever object you are interested in. Take a moment to think about how you would argue that the defining properties of a group are present for any symmetries that you can think of.
REMARK: In recent years it has become widely appreciated that one needs to rethink the notion of symmetry somewhat when discussing quantum systems with extended objects, where things become more complicated than what we discuss here. There, the suitable notion of symmetry differs from ‘a transformation that leaves the action invariant’, which opens up the possibility that symmetries no longer form groups, and this is exactly what happens. This is an active field of research at present; here is a popular science article about the topic .
Symmetries have remarkable consequences for the physics in that they imply conservation laws via what is called ‘Noether’s theorem’. Understanding which symmetries an action has hence immediately gives us conserved quantities, which we can use in turn to constrain the dynamics. Here is a famous example without which we would not exist: the ‘Standard Model of particle physics’ classically has a symmetry which implies ‘baryon number conservation’. This in turn implies that the proton cannot decay into any of the particles lighter than itself, as this would violate the conservation law. Without this restriction, protons could decay into positrons (the electron’s antiparticle) and the world as we know it would end in a flash. So we can explain that the proton is stable by a symmetry! If you try to come up with an extension of the Standard Model, you had better be careful not to violate this symmetry3.
Instead of investigating if a given action has some group of symmetries, we can hence turn things around and try to construct actions symmetric under a given group \(G\) if the consequences of the symmetry match experiments. This point of view is precisely what people mean when they say that ‘we understand’ fundamental physics using symmetries. Imagine we have good reasons to write down an action \[S[q] = S_1[q] + \alpha S_2[q]\] with a parameter \(\alpha\), and measurements tell us that \(\alpha = 0\). You will immediately write a publication and receive much fame if you can find a symmetry \(G\) which leaves \(S_1\) invariant, but not \(S_2\). In this case we simply add ’invariance under \(G\)’ as a fundamental requirement which forces \(\alpha=0\).
Having established the relevance of groups we of course want to study them in more detail. Two questions that immediately come to mind (and which correspond to topics 1 and 2) are ‘what are they like?’ and ‘how can they act?’. To give you a feeling about the first question, consider the
Example 1. The group \(U(1)\) is the group of complex numbers of unit modulus under multiplication. For \(\boldsymbol{\varphi} \in [0, 2\pi)\) we can write any group element as \[g = e^{i \boldsymbol{\varphi}} \, .\] However, this group is not isomorphic to the interval \([0,2\pi]\), as \(\boldsymbol{\varphi}=0\) and \(\boldsymbol{\varphi} = 2\pi\) are one and the same group element. It is isomorphic to a circle \(S^1\), see figure 1. The fact that this group is topologically non-trivial will turn out to be the reason we can have magnetic monopoles in the second half of this course! In topic 1 we will examine several classes of groups that are also non-trivial spaces (‘manifolds’) in their own right; these are called ‘Lie groups’.
To appreciate the second question, consider a group whose elements are \(n \times n\) matrices (which ones is not really important here). If this group is supposed to act on the generalized coordinates in our theory, you might be tempted to say that these form a vector with \(n\) components on which the group acts as \[\boldsymbol{q} \rightarrow M \boldsymbol{q} \, .\] However, this is not the only possibility. We might say that the generalized coordinates in our theory are \(n \times n\) matrices \(\boldsymbol{Q}\) themselves and the group acts as \[\boldsymbol{Q} \rightarrow M^{-1} \boldsymbol{Q} M \, .\] Note that the set of \(n \times n\) matrices is a vector space as well, but its dimension is \(n^2\), which is different from \(n\). The general study of how groups can act linearly on vector spaces of various dimensions (and which dimensions can occur) is called ‘representation theory’, and we will discuss it in topic 2. As linear maps on vector spaces can always be written in terms of matrices, we can also say that representation theory is the question of how abstract relations such as \[g g' = g''\] can be concretely realized using matrices. In figure 2 you can see a glimpse of the beautiful structures that emerge when asking (and answering) such questions.
2. Consider the set \(S\) of real \(n \times n\) matrices.
Show that \(S\) is a (real) vector space \(V\) of dimension \(n^2\).
Let \(U \subset S\) be the set of matrices with determinant \(1\). Is \(U\) a vector space as well?
For any matrix \(Q\) in \(V\) define a map \[g_M: Q \rightarrow M^{-1} Q M\] where \(M\) is a fixed invertible matrix. Show that \(g_M\) is a linear map on \(V\).
In Epiphany term, we will start investigating what happens when we consider group actions that vary across space-time, i.e. we let \[g = g(t,\vec{x}) \, .\] Such things are called gauge symmetries and form the underpinning for the interactions of the Standard Model of particle physics. Interestingly, demanding such symmetries forces us to have forces which are transmitted by ‘gauge bosons’ such as photons or gluons.
After spending some time formulating gauge theories as classical field theories, we will investigate the impact of the ‘gauge group’ \(G\) on the physics and will discover that the topology of \(G\) plays a central role.
Before giving formal definitions of Lie groups and Lie algebras, let us look at some motivating examples and explore their properties a bit. These examples will serve as templates for all that is to come, so make sure you understand them well4.
1) The group \(U(1)\) derives its name from unitary complex \(1 \times 1\) matrices. Acting with \(g\) on a complex number \(z\) \[\begin{equation} \label{eq:u1_on_c} z \rightarrow g z, \,\,\, g \in U(1), \,\,\, z \in \mathbb{C} \end{equation}\] we require that the inner form \(|z|^2\) remains unchanged. As \[|z|^2 = \bar{z} z \rightarrow \bar{z} \bar{g} g z = |g|^2 |z|^2\] this implies that \(g\) is a complex number of modulus one, \(|g|^2=1\), so that we can write \(g = e^{i \phi}\) (as before). Hence
Definition 1.1. \(U(1)\) is the set of complex numbers of unit modulus.
Proposition 1.1. Using multiplication as the group operation, \(U(1)\) is a group.
Proof: The group operation can be written as \[e^{i \phi } e^{i \phi'} = e^{i(\phi+\phi')}\,.\] Let’s check that this is indeed a group.
there is an identity element \(e\) in \(G\) such that \(x\circ e = e \circ x = x\) for all \(x \in G\);
\(\leftrightarrow\)
we simply use \(1\)
if \(x,y \in G\) then \(x\circ y \in G\);
\(\leftrightarrow\)
the product of any two complex numbers of unit modulus is again of unit modulus
\((x\circ y)\circ z = x \circ (y\circ z)\) for all \(x,y,z \in G\);
\(\leftrightarrow\)
multiplication of complex numbers is associative
for each \(x \in G\), there exists an inverse \(x^{-1}\) in \(G\) such that \(x^{-1} \circ x = e\);
\(\leftrightarrow\)
for \(g = e^{i \phi}\), \(g^{-1}= e^{-i \phi}\).
Note that \(gg' = g'g\) for all \(g,g' \in U(1)\). This is a special property that has a name: ‘abelian’. \(U(1)\) is an example of an abelian group.
Definition 1.2. A group \(G\) is called abelian if \(xy = yx\) for all elements \(x\) and \(y\) of \(G\).
2) Already in the introduction we realized that the map from \(\mathbb{R}\) to \(U(1)\) given by writing \(g = e^{i \phi}\) is not a one-to-one map. If we are to do calculus on \(U(1)\), we cannot simply do calculus on \(\mathbb{R}\) and map that to \(U(1)\) which is a circle. This is the easiest non-trivial example of what we call a manifold.
3) We already introduced \(U(1)\) as acting on complex numbers as in \(\eqref{eq:u1_on_c}\). We might be interested in asking what happens when we perform an infinitesimal transformation, i.e. when \(\phi\) is very close to \(0\). In this case we can approximate the exponential to linear order and get \[z \rightarrow (1+i \phi)z\, .\] The approximation \(e^{i\phi} = (1+i \phi)\) is tangent to the group \(U(1)\) at \(g=1\), see figure 1.2.
We can try to reconstruct finite elements of \(U(1)\) by successive infinitesimal transformations. Let us hence look at \[(1+i \phi)^2 = 1 + 2 i \phi - \phi^2 \, .\] This fails to reproduce the expansion of the exponential \[e^{i\phi} = 1 + i\phi + \frac{(i\phi)^2}{2} + \frac{(i\phi)^3}{3!} + \cdots\] but we can easily fix this by considering instead \[(1+i \phi/2)^2 = 1 + i \phi - \frac{ \phi^2}{4} \, .\] Now we at least have the linear term right, but the quadratic term is still off. Next consider \[(1+i \phi/3)^3 = 1 + i \phi + \frac{ (i\phi)^2}{3} + \frac{(i\phi)^3}{27} \, .\] Again, we are forced to have the \(1/3\) to get the linear term right, and the quadratic term came closer to \(1/2\). This continues as we consider \((1+ i \phi/n)^n\) for higher values of \(n\): we always get the linear term right, and the subsequent terms come closer to the expansion of the complex exponential. One can also understand the need for \(1/n\) as follows: we are trying to reproduce a finite group element with phase \(\phi\) by taking \(n\) consecutive infinitesimal group elements. For these to match up, we need the ‘phase’ of the infinitesimal group elements to be \(\phi/n\).
We might hence guess that we can recover a finite map from an infinitesimal one by looking at \((1+ i \phi/n)^n\) and letting \(n\) go to infinity, which turns out to be correct.
Proposition 1.2. \(\lim_{n \rightarrow \infty }(1+ i \phi/n)^n = e^{i\phi}\)
Proof: We can expand the powers to find \[\begin{aligned} \lim_{n \rightarrow \infty }(1+ i \phi/n)^n &= \lim_{n \rightarrow \infty } \sum_{k=0}^n \frac{(i\phi)^k}{n^k} \binom{n}{k}\\ & = \lim_{n \rightarrow \infty } \sum_{k=0}^n \frac{(i\phi)^k}{k!} \frac{n!}{(n-k)!n^k} \end{aligned}\] This already looks like the series of the exponential except for the factor \[\frac{n!}{(n-k)!n^k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k}\, .\] There are exactly \(k\) factors in the numerator of this fraction, which all approach \(n\) when \(n\rightarrow \infty\). Hence this factor converges to \(1\) for any fixed \(k\). In the sum there are also terms for which \(k\) approaches \(n\), so that we cannot make the above approximation, but these terms come with factors \((i \phi)^k/k!\) which are subleading to the rest of the expression when \(n\rightarrow \infty\). In other words, for every \(n_0\) we can choose a large enough \(n\) such that the first \(n_0\) terms in the exponential series are approximated to any precision by the first \(n_0\) terms in \((1+ i \phi/n)^n\). What this means is that \[\begin{aligned} \lim_{n \rightarrow \infty }(1+ i \phi/n)^n &= \sum_{k=0}^\infty \frac{(i\phi)^k}{k!} = e^{i\phi} \end{aligned}\] \(\square\)
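Proposition 1.2 is also easy to check numerically; here is a short sketch (the values of \(\phi\) and \(n\) are arbitrary choices of mine):

```python
import numpy as np

phi = 0.8
# approximate e^{i phi} by (1 + i phi / n)^n for increasing n
errors = [abs((1 + 1j * phi / n) ** n - np.exp(1j * phi)) for n in (10, 100, 1000)]

# the approximation improves as n grows (the error shrinks roughly like 1/n)
assert errors[0] > errors[1] > errors[2]
assert errors[2] < 1e-3
```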
1.1.
By working out the derivative of \[\lim_{n \rightarrow \infty }(1+ i \phi/n)^n\] with respect to \(\phi\), show that this expression satisfies the same differential equation as \(e^{i\phi}\). You may assume that you can swap the order of the limit and taking the derivative.
As both functions have the same value at \(\phi=0\) this implies that they are equal by the uniqueness of solutions of ordinary differential equations.
Consider a square matrix \(A\) and let \(g = e^{iA}\), which is defined via the Taylor series of the exponential. Show that \[\nonumber \lim_{n \rightarrow \infty }(\mathds{1}+ i A /n)^n = e^{i A}\, .\]
Let’s back up a bit and see what this example has taught us:
We defined a continuous group by demanding that its action on complex numbers leaves the inner form invariant.
This group does not have a one-to-one map to \(\mathbb{R}\), it has a ‘non-trivial topology’ and is isomorphic to \(S^1\). Due to it being a group, it has something \(S^1\) does not have: there is a special point, the identity element \(1\).
We found that infinitesimal transformations are tangent to the group at the identity element. We can recover group elements (and in fact the whole group) by iterating infinitesimal transformations and taking a limit. This is the same as exponentiating the infinitesimal element.
For many of the things that follow, we’ll need to be confident in manipulating expressions involving matrices by using index notation, so let’s review this briefly. For a vector \(\boldsymbol{v}\) in \(\mathbb{R}^n\) we’ll denote the \(n\) components (in the standard basis) by \(v_i\) with \(i\) running from \(1\) to \(n\). Likewise, we denote the components of a square \(n \times n\) matrix \(A\) by \(A_{ij}\) where \(i\) and \(j\) run from \(1\) to \(n\). Here, the first index corresponds to the row and the second to the column of \(A\).
Using this notation, we can write various familiar objects as follows
scalar product: \(\boldsymbol{v} \cdot \boldsymbol{w} = \sum_i v_i w_i\)
multiplication between a matrix and a vector: \(\boldsymbol{w} = A \boldsymbol{v} \Leftrightarrow w_i = \left(A \boldsymbol{v}\right)_i = \sum_j A_{ij} v_j\)
a matrix sandwiched between two vectors: \(\boldsymbol{w}^T A \boldsymbol{v} = \sum_{ij} w_i A_{ij} v_j\)
matrix multiplication \(C = AB \Leftrightarrow C_{ij} = \left(AB\right)_{ij} = \sum_k A_{ik} B_{kj}\)
the vector cross product in \(\mathbb{R}^3\): \(\left(\boldsymbol{w} \times \boldsymbol{v}\right)_i = \sum_{jk}\epsilon_{ijk} w_j v_k\) (recall \(\epsilon_{ijk} = \pm 1\) if \(i,j,k\) is a cyclic/anti-cyclic permutation of \(1,2,3\) and zero otherwise)
What makes such expressions easy to manipulate is that components of vectors, matrices, etc are just numbers, so we can freely rearrange them. Here are two examples: \[\boldsymbol{v}\cdot \boldsymbol{w} = \sum_i v_i w_i = \sum_i w_i v_i = \boldsymbol{w}\cdot \boldsymbol{v}\] or \[\boldsymbol{v} \cdot \left( \boldsymbol{w} \times \boldsymbol{u} \right) = \sum_{ijk} \epsilon_{ijk} v_i w_j u_k = \sum_{ijk} \epsilon_{kij} u_k v_i w_j = \boldsymbol{u} \cdot \left( \boldsymbol{v} \times \boldsymbol{w} \right)\]
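If you want to experiment with such index gymnastics, `numpy.einsum` implements essentially this notation; a small sketch (the epsilon tensor is built by hand, the vectors are random test data):

```python
import numpy as np

# Levi-Civita symbol in three dimensions (indices 0,1,2 instead of 1,2,3)
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0

rng = np.random.default_rng(1)
v, w, u = rng.standard_normal((3, 3))

# cross product via the epsilon symbol: (w x v)_i = eps_ijk w_j v_k
assert np.allclose(np.einsum('ijk,j,k->i', eps, w, v), np.cross(w, v))
# cyclicity of the triple product: v.(w x u) = u.(v x w)
assert np.isclose(np.dot(v, np.cross(w, u)), np.dot(u, np.cross(v, w)))
```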
As you realize, whenever there is a sum in the above expressions, the sum runs over an index which appears twice. Of course one can consider objects which are not of this type, but this is the typical state of affairs for many ‘natural’ kinds of products5. To save time, it is customary to use ‘summation convention’, i.e. to sum whenever an index appears twice and to drop the summation sign. If one wants to write an index twice that is not summed over, one then needs to say so explicitly.
Here are a few more things we’ll need:
Definition 1.3. The transpose of a matrix \(A\) is simply \(A\) with rows and columns swapped, i.e. we can write \[\left( A^T\right)_{ij} := A_{ji} \, .\]
Definition 1.4. The Hermitian conjugate \(A^\dagger\) (‘A dagger’) of a matrix \(A\) is the transpose \(^T\) and complex conjugation applied together: \[A^\dagger := \bar{A}^T \Leftrightarrow \left(A^\dagger\right)_{ij} = \bar{A}_{ji}\]
Definition 1.5. The trace of a square matrix \(A\) is \(trA := A_{jj}\), the sum of its diagonal elements.
Definition 1.6. The determinant of a square matrix \(A\) is \[\det A := \epsilon_{i_1,i_2,\cdots i_n} A_{1 i_1}A_{2 i_2} \cdots A_{n i_n} \, .\] Here \(\epsilon_{i_1,i_2,\cdots i_n} = 1\) for an even permutation of \(\{1,2,\cdots, n\}\), \(\epsilon_{i_1,i_2,\cdots i_n} = -1\) for an odd permutation of \(1,2,\cdots, n\), and \(\epsilon_{i_1,i_2,\cdots i_n} = 0\) else. Recall that an even/odd permutation of \(\{1,2,\cdots, n\}\) is one that can be decomposed into an even/odd number of operations which swap two numbers only.
Note the use of summation convention in the last two definitions!
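As a sanity check of the Levi-Civita formula for the determinant, one can carry out the sum over all permutations explicitly and compare with a library routine; a sketch (the parity helper and the random test matrix are my own choices):

```python
import numpy as np
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (found by sorting with swaps)."""
    s, p = 1, list(perm)
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            s = -s
    return s

def det_levi_civita(A):
    """det A = eps_{i1...in} A_{1 i1} ... A_{n in}: only permutations of the
    column indices contribute, each weighted by the sign of the permutation."""
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[r, p[r]] for r in range(n)])
               for p in permutations(range(n)))

A = np.random.default_rng(2).standard_normal((4, 4))
assert np.isclose(det_levi_civita(A), np.linalg.det(A))
```

This brute-force sum has \(n!\) terms, so it is only a didactic check, not how determinants are computed in practice.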
Proposition 1.3. The determinant has the useful properties:
\(\det AB = \det A \det B\)
\(\det A^T = \det A\)
see MATH1071: Linear Algebra I
1.2. Show using index notation that
\((A+B)^T = A^T + B^T\)
\((AB)^T = B^T A^T\)
\(tr(cA) = c\, tr(A)\)
\(tr(AB) = tr(BA)\)
\(trA^T = trA\)
\(tr(A+B) = trA + trB\)
\((A \boldsymbol{v}) \cdot (B \boldsymbol{w}) = \boldsymbol{v}^T (A^T B)\, \boldsymbol{w}\)
\(\det A^\dagger = \overline{\det A}\)
\(\det cA = c^n \det A\)
where \(A\) and \(B\) are complex \(n \times n\) matrices, \(\boldsymbol{v}\) and \(\boldsymbol{w}\) are vectors with \(n\) components, and \(c\) is a number.
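These identities are meant to be shown with index notation, but a numerical spot-check on random matrices can catch slips in your calculations; a sketch (random test data, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
v, w = rng.standard_normal((2, 3))
c = 2.5

assert np.allclose((A @ B).T, B.T @ A.T)                     # (AB)^T = B^T A^T
assert np.isclose(np.trace(A @ B), np.trace(B @ A))          # tr(AB) = tr(BA)
assert np.isclose((A @ v) @ (B @ w), v @ (A.T @ B) @ w)      # (Av).(Bw) = v^T (A^T B) w
assert np.isclose(np.linalg.det(A.conj().T),
                  np.conj(np.linalg.det(A)))                 # det A^dagger = conj(det A)
assert np.isclose(np.linalg.det(c * A),
                  c**3 * np.linalg.det(A))                   # det cA = c^n det A (here n = 3)
```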
1) The group \(SU(2)\) is the group of special unitary \(2 \times 2\) matrices. Special refers to \(\det g = 1\) and unitary means they keep the inner product (or canonical ‘length’) in \(\mathbb{C}^2\) invariant when multiplying a vector \(\boldsymbol{z} \in \mathbb{C}^2\) by \(g\): \[\boldsymbol{z} = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \rightarrow g \boldsymbol{z} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \, .\] The inner product \[|\boldsymbol{z}|^2 = \bar{\boldsymbol{z}} \cdot \boldsymbol{z} = \bar{z}_1 z_1 + \bar{z}_2 z_2\] transforms as \[|\boldsymbol{z}|^2 \rightarrow \bar{g} \begin{pmatrix} \bar{z}_1 \\ \bar{z}_2 \end{pmatrix} \cdot g \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\]
Let us write this out in components \[\bar{g} \begin{pmatrix} \bar{z}_1 \\ \bar{z}_2 \end{pmatrix} \cdot g \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \bar{g}_{ij} \bar{z}_j \,\, g_{ik} z_k = \bar{g}_{ij} g_{ik} \bar{z}_j z_k = \bar{z}_j g^\dagger_{ji} g_{ik} z_k = \bar{\boldsymbol{z}}^T g^\dagger g \boldsymbol{z} = \boldsymbol{z}^\dagger g^\dagger g \boldsymbol{z}\, .\] The above implies that we need \(g^\dagger = g^{-1}\). We hence make the
Definition 1.7. \(SU(2)\) is the group of complex \(2 \times 2\) matrices \(g\) with \(\det g = 1\) and \(g^\dagger g = \mathds{1}\).
Let us check that all of this makes sense, i.e. that this is indeed a group.
there is an identity element \(e\) in \(G\) such that \(x\circ e = e \circ x = x\) for all \(x \in G\);
\(\leftrightarrow\)
we simply use \(\mathds{1}\) which satisfies \(\mathds{1}^\dagger \mathds{1}= \mathds{1}\) and \(\det \mathds{1}=1\), so is part of \(SU(2)\).
if \(x,y \in G\) then \(x\circ y \in G\);
\(\leftrightarrow\)
for two elements \(g,g'\) in \(SU(2)\) we have \(\det (g g') = \det g \det g' = 1\) and \((gg')^\dagger gg' = (g')^\dagger g^\dagger g g' = \mathds{1}\).
\((x\circ y)\circ z = x \circ (y\circ z)\) for all \(x,y,z \in G\);
\(\leftrightarrow\)
matrix multiplication is associative
for each \(x \in G\), there exists an inverse \(x^{-1}\) in \(G\) such that \(x^{-1} \circ x = e\);
\(\leftrightarrow\)
First note that, as \(g^\dagger g =\mathds{1}\), \(g^\dagger\) plays the role of the inverse. The question hence is whether \(g^\dagger\) is in \(SU(2)\) whenever \(g\) is in \(SU(2)\). We work out \[\det g^\dagger = \det \bar{g}^T = \overline{\det g^T} = \overline{\det g} = 1 \, .\] and \[(g^\dagger)^\dagger g^\dagger = g g^\dagger = \mathds{1}\, .\]
With the operation of matrix multiplication we hence have defined a group. Note that even though we needed to work through some equations using properties of the \(^\dagger\), we could have anticipated this by observing that we defined this group as a set of linear maps (to get associativity) that are a symmetry of the inner form on \(\mathbb{C}^2\).
2) Next, let us describe what \(SU(2)\) looks like and how we can parametrize it. For any invertible complex \(2 \times 2\) matrix \[g = \begin{pmatrix} a& b\\ c & d \end{pmatrix}\] we can write \[g^{-1} = \frac{1}{\det g} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\] As \(\det g = 1\), \(g^{-1} = g^\dagger\) hence implies \[\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \begin{pmatrix} \bar{a} & \bar{c} \\ \bar{b} & \bar{d} \end{pmatrix}\] i.e. the most general matrix in \(SU(2)\) can be written as \[\begin{equation} \label{eq:gen_su2_matrix} g = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} \end{equation}\] and \(\det g = 1\) implies \(|a|^2 + |b|^2 = 1\). As \(a\) and \(b\) are complex numbers, we can write \(a=x_1 + i x_2\) as well as \(b = x_3 + ix_4\) and find \[\begin{equation} \label{eq:su2inr4} SU(2): \{\boldsymbol{x} = (x_1,x_2,x_3,x_4) \in \mathbb{R}^4 | x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1\} \, . \end{equation}\] This is the defining equation of a three-sphere6 , i.e. \(SU(2)\) is a space which looks like the three-sphere \(S^3\) and \(a=x_1 + i x_2\), \(b = x_3 + ix_4\) define an embedding of \(S^3\) into \(\mathbb{R}^4\).
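A quick numerical check of this parametrization (my own illustration): picking a random point on \(S^3\) and building the matrix in \(\eqref{eq:gen_su2_matrix}\) should always give an element of \(SU(2)\).

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(4)
x /= np.linalg.norm(x)                       # a random point on S^3
a, b = x[0] + 1j * x[1], x[2] + 1j * x[3]
g = np.array([[a, b], [-np.conj(b), np.conj(a)]])

assert np.isclose(np.linalg.det(g), 1)           # 'special': det g = 1
assert np.allclose(g.conj().T @ g, np.eye(2))    # 'unitary': g^dagger g = 1
```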
3) In the \(U(1)\) example we managed to write the whole group as the complex exponential of something simple, a real number, which generated the infinitesimal group elements. Let’s try something similar here and write \[g = e^{iA}\] for a matrix \(A\) and a group element \(g\) of \(SU(2)\). The exponential of a matrix is defined via the series \[e^{iA} = \sum_{k=0}^{\infty} \frac{(iA)^k}{k!} \, .\] Let us first impose unitarity \(g^\dagger = g^{-1}\) and see what this implies for \(A\). As \((A^n)^\dagger = (A^\dagger)^n\) we have \[g^\dagger = e^{-i A^\dagger}\, ,\] and as furthermore \[g^{-1} = e^{-iA}\] we find \[A^\dagger = A \, .\] Such matrices are called ‘Hermitian’ and they play an important role in quantum mechanics. You may be familiar with exponentials of Hermitian matrices giving unitary ones from there.
Next we investigate \[\begin{aligned} \det g &= \det e^{iA} \end{aligned}\]
Proposition 1.4. For a \(2 \times 2\) matrix \(A\), we have \(\det e^{iA} = e^{i trA}\).
Proof: Let us first write \[\begin{aligned} \det g &= \det e^{iA} = \det \lim_{n \rightarrow \infty} (\mathds{1}+iA/n)^n = \lim_{n \rightarrow \infty} \det (\mathds{1}+iA/n)^n \\ &= \lim_{n \rightarrow \infty} \left[\det (\mathds{1}+iA/n)\right]^n \end{aligned}\] We can write the determinant explicitly as \[\det (\mathds{1}+ \frac{iA}{n}) = \left(1 + \frac{i A_{11}}{n}\right)\left(1+ \frac{i A_{22}}{n}\right) - \frac{i^2 A_{21}A_{12}}{n^2} = 1 + \frac{i trA}{n} + ... \,,\] where the dots stand for terms of order \(n^{-2}\). In the limit \(n \rightarrow \infty\) these terms are subleading and we hence have \[\det g = \det e^{iA} = \lim_{n \rightarrow \infty} \left(1 + i\, trA/n\right)^n = e^{i trA}\, .\] \(\square\)
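The proposition can also be checked numerically, approximating \(e^{iA}\) by \((\mathds{1}+iA/n)^n\) for large \(n\) as in the proof; a sketch with a random Hermitian \(A\) (the tolerance reflects the \(O(1/n)\) error of the approximation):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
A = (M + M.conj().T) / 2                      # a random Hermitian 2x2 matrix

n = 10**6
g = np.linalg.matrix_power(np.eye(2) + 1j * A / n, n)   # approximates e^{iA}
assert np.isclose(np.linalg.det(g), np.exp(1j * np.trace(A)), atol=1e-4)
```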
1.3. For a general \(k \times k\) matrix \(M\) show that
\(\det e^{M}= e^{trM}\) .
Use this to conclude that for \(g=e^{M}\) we have \(\log \det g = tr\log g\) . Here the \(\log\) of a matrix is defined as the inverse function of the exponential.
The requirement \(\det g = 1\) now implies \(e^{i trA} = 1\), which gives \(trA = 0\) 7. If we can write \(g \in SU(2)\) as a complex exponential, we hence have to use traceless Hermitian \(2 \times 2\) matrices in the exponent. For \(g \in SU(2)\), writing \(g = e^{iA}\) implies that \(A \in T\) where \[T = \left\{ A | A^\dagger = A, trA = 0 \right\}\, .\] Whenever \(A,B\) are in \(T\), then so is \(aA+bB\) for \(a,b \in \mathbb{R}\): we have \[\begin{aligned} tr( aA+bB ) & = a \, trA + b\, trB = 0 \\ (aA+bB)^\dagger & = \bar{a} A^\dagger + \bar{b}B^\dagger = aA +bB \, . \end{aligned}\] This means that \(T\) is a vector space over the real numbers. It is not too hard to convince yourself8 that \(T\) has dimension \(3\). This is a real vector space which contains (in general) complex matrices, so it really pays off to be able to think of vector spaces abstractly! We can make the following choice of basis vectors for \(T\): \[\begin{equation} \label{eq:pauli_matrices} \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \,\,, \hspace{.3cm} \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \,\,, \hspace{.3cm} \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\, . \end{equation}\]
These three matrices are known as the ‘Pauli matrices’9. By a direct computation one can work out that
Proposition 1.5. The Pauli matrices satisfy \[[\sigma_i,\sigma_j] = 2 i \epsilon_{ijk} \sigma_k\] where \([a,b] \equiv ab-ba\) is the commutator.
1.4. Show that \[[\sigma_i,\sigma_j] = 2 i \epsilon_{ijk} \sigma_k\] where \(\sigma_i\) are the Pauli matrices.
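A numerical cross-check of the commutation relations, together with the claim that the \(\sigma_i\) form a basis of \(T\), can look as follows (the coefficient formula \(a_k = tr(\sigma_k A)/2\) uses \(tr(\sigma_i \sigma_j) = 2\delta_{ij}\), a fact not derived above; this is a sketch, not a substitute for the exercise):

```python
import numpy as np

s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0

# [sigma_i, sigma_j] = 2 i eps_ijk sigma_k for all index pairs
for i in range(3):
    for j in range(3):
        comm = s[i] @ s[j] - s[j] @ s[i]
        rhs = 2j * sum(eps[i, j, k] * s[k] for k in range(3))
        assert np.allclose(comm, rhs)

# a random traceless Hermitian A decomposes as A = a_k sigma_k with a_k = tr(sigma_k A)/2
rng = np.random.default_rng(6)
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
A = (M + M.conj().T) / 2
A -= (np.trace(A) / 2) * np.eye(2)
a = [np.trace(s[k] @ A) / 2 for k in range(3)]
assert np.allclose(sum(a[k] * s[k] for k in range(3)), A)
```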
Note in particular that different \(g \in SU(2)\) in general do not commute, i.e. in general \(g_1 g_2 \neq g_2 g_1\). Similarly, the Pauli matrices do not commute with each other, so that in general \[\begin{aligned}
e^{i \alpha_1 \sigma_1} e^{i \alpha_2 \sigma_2} \neq e^{i \alpha_2 \sigma_2} e^{i \alpha_1 \sigma_1} \\
e^{i \alpha_1 \sigma_1} e^{i \alpha_2 \sigma_2} \neq e^{i \alpha_1 \sigma_1 + i \alpha_2 \sigma_2}
\end{aligned}\] \(SU(2)\) is an example of a non-abelian Lie group.
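A numerical illustration of this non-commutativity, approximating the exponentials by the limit of Proposition 1.2 (the angles are arbitrary choices of mine):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])

def exp_i(M, n=10**6):
    """Approximate e^{iM} by (1 + iM/n)^n, as in Proposition 1.2."""
    return np.linalg.matrix_power(np.eye(2) + 1j * M / n, n)

a1, a2 = 0.4, 0.9
g1, g2 = exp_i(a1 * s1), exp_i(a2 * s2)
assert not np.allclose(g1 @ g2, g2 @ g1)                  # the group elements do not commute
assert not np.allclose(g1 @ g2, exp_i(a1 * s1 + a2 * s2)) # and the exponents do not simply add
```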
We can form group elements of \(SU(2)\) by exponentiating arbitrary real linear combinations of the Pauli matrices. Let \(\boldsymbol{\alpha} = (\alpha_1,\alpha_2,\alpha_3)\) and write \[g(\boldsymbol{\alpha}) = e^{i \alpha_j \sigma_j} = e^{i \boldsymbol{\alpha} \cdot \boldsymbol{\sigma} }\, , \,\,\,\,\, \alpha_k \in \mathbb{R}.\] As \(\alpha_k \sigma_k\) is traceless and Hermitian for any real \(\boldsymbol{\alpha}\), it follows that \(g \in SU(2)\).
We hence get a group element for every vector \(\boldsymbol{\alpha} \in \mathbb{R}^3\). This map cannot possibly be injective, as \(\mathbb{R}^3\) is not the same as \(S^3\). To make contact with the earlier characterization \(SU(2) \simeq S^3\), let us try to work out what kind of coordinates the \(\alpha_k\) give us. As a warmup, let us first consider \[g((\alpha_1,0,0)) = e^{i \alpha_1 \sigma_1 } = \sum_{k=0}^\infty \frac{(i \alpha_1 \sigma_1)^k}{k!} = \sum_{k =\mbox{\scriptsize even}} \frac{(i \alpha_1 \sigma_1)^k}{k!} + \sum_{k =\mbox{\scriptsize odd}} \frac{(i \alpha_1 \sigma_1)^k}{k!}\] As \(\sigma_i^2 = \mathds{1}\), all of the matrix powers in the sum over even \(k\) are equal to \(\mathds{1}\), and all of the matrix powers in the sum over odd \(k\) are equal to \(\sigma_1\). Hence \[e^{i \alpha_1 \sigma_1 } = \sum_{k =\mbox{\scriptsize even}} \frac{(i \alpha_1)^k}{k!} \mathds{1}+ \sum_{k =\mbox{\scriptsize odd}} \frac{(i \alpha_1 )^k}{k!} \sigma_1 = \cos(\alpha_1) \mathds{1}+ i \sin(\alpha_1) \sigma_1 \, .\] You can think of this as a generalization of Euler’s formula \(e^{i\phi} = \cos \phi + i \sin \phi\). Note in particular that \(\alpha_1 + 2 \pi\) maps to the same group element as \(\alpha_1\), so we see the non-injectivity of the exponential map explicitly.
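Both the generalized Euler formula and the non-injectivity can be checked numerically; a sketch, again approximating \(e^{iM}\) by \((\mathds{1}+iM/n)^n\) for large \(n\) (the value of \(\alpha\) is arbitrary, and the tolerance reflects the approximation error):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)

def exp_i(M, n=10**6):
    """Approximate e^{iM} by (1 + iM/n)^n, as in Proposition 1.2."""
    return np.linalg.matrix_power(np.eye(2) + 1j * M / n, n)

alpha = 1.2
lhs = exp_i(alpha * s1)
rhs = np.cos(alpha) * np.eye(2) + 1j * np.sin(alpha) * s1
assert np.allclose(lhs, rhs, atol=1e-4)                              # generalized Euler formula
assert np.allclose(exp_i((alpha + 2 * np.pi) * s1), lhs, atol=1e-4)  # alpha and alpha + 2pi agree
```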
As \(\sigma_1\) commutes with itself, we can write \[g((\alpha_1,0,0)) g((\alpha_1',0,0)) = e^{i \alpha_1 \sigma_1 } e^{i \alpha_1' \sigma_1 } = e^{i (\alpha_1 + \alpha_1') \sigma_1 } = g((\alpha_1+\alpha_1',0,0))\] so that matrices of this form are a subgroup of \(SU(2)\):
Definition 1.8. For a group \(G\), \(H \subset G\) is called a subgroup if the elements of \(H\) form a group with the group composition of \(G\).
Let us quickly check that elements of the form \(g((\alpha_1,0,0))\) form a group in their own right (we already know they all lie in \(SU(2)\)). Recall the group axioms and check them one by one:
there is an identity element \(e\) in \(G\) such that \(x\circ e = e \circ x = x\) for all \(x \in G\);
\(\leftrightarrow\)
we simply use \(\alpha_1 = 0\), which gives \(g((0,0,0)) = \mathds{1}\)
if \(x,y \in G\) then \(x\circ y \in G\);
\(\leftrightarrow\)
this works as \(g((\alpha_1,0,0)) g((\alpha_1',0,0)) = g((\alpha_1+\alpha_1',0,0))\)
\((x\circ y)\circ z = x \circ (y\circ z)\) for all \(x,y,z \in G\);
\(\leftrightarrow\)
matrix multiplication is associative
for each \(x \in G\), there exists an inverse \(x^{-1}\) in \(G\) such that \(x^{-1} \circ x = e\);
\(\leftrightarrow\)
we have \(g((\alpha_1,0,0)) g((-\alpha_1,0,0)) = \mathds{1}\)
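The four axioms just checked can also be verified numerically. A quick Python sketch, using the closed form \(e^{i\alpha_1\sigma_1} = \cos(\alpha_1)\mathds{1} + i\sin(\alpha_1)\sigma_1\) derived above (the function name `g1` is our own):

```python
import numpy as np

sigma1 = np.array([[0, 1], [1, 0]], dtype=complex)

def g1(a):
    # closed form e^{i a sigma_1} = cos(a) 1 + i sin(a) sigma_1
    return np.cos(a) * np.eye(2, dtype=complex) + 1j * np.sin(a) * sigma1

a, b = 0.3, 1.1
closure_err = np.max(np.abs(g1(a) @ g1(b) - g1(a + b)))   # closure
identity_err = np.max(np.abs(g1(0.0) - np.eye(2)))        # identity element
inverse_err = np.max(np.abs(g1(a) @ g1(-a) - np.eye(2)))  # inverses
# associativity holds automatically for matrix multiplication
```

Each error is zero up to floating-point rounding.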
This subgroup ‘looks like’ or ‘works in the same way’ as \(U(1)\) parametrized as \(e^{i\phi }\) if we set \(\alpha_1 = \phi\). A more precise way to express this uses the notion of a group isomorphism:
Definition 1.9. For two groups \(G\) and \(H\), a group homomorphism is a map \(f: G \rightarrow H\) such that \[f(g_1 \circ_G g_2) = f(g_1) \circ_H f(g_2)\]
Note that in this definition we are using the group composition in \(G\) on the left side and the group composition in \(H\) on the right side. The map \(f\) is hence compatible with the group structures of \(G\) and \(H\). Note further that this definition does not assume that \(f\) is injective or surjective.
Example 1.1. The map \[g: x \rightarrow g(x)= e^x\] is a group homomorphism from \(\mathbb{C}\) (with composition \(\circ_+\) being addition) to \(\mathbb{C}^*\) (with composition \(\circ_\ast\) being multiplication). We can check that for every \(x,y \in \mathbb{C}\) we find \[g(x \circ_+ y) = g(x+y) = e^{x+y} = e^x e^y = g(x) g(y) = g(x) \circ_\ast g(y) \, .\] Saying that this is a homomorphism is just another way to express the ‘nice’ property of the exponential that \(e^{x+y} = e^x e^y\).
1.5. Let \(f\) be a homomorphism between two groups \(G\) and \(H\). Show that
\(f(e_G) = e_H\) where \(e_G\) and \(e_H\) are the unit elements of \(G\) and \(H\), respectively.
\(f(g^{-1}) = f(g)^{-1}\) for any \(g \in G\).
1.6. \(U(2)\) is the group of complex \(2 \times 2\) matrices \(g\) such that \(g^\dagger = g^{-1}\), with the group composition being matrix multiplication. Let \(F\) be the map which sends \[g \mapsto \det g \, .\] Show that \(F\) is a group homomorphism from \(U(2)\) to \(U(1)\).
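This exercise can be sanity-checked numerically before you prove it. The sketch below is our own construction: random unitaries are obtained from a QR decomposition (a standard trick), and we test the two defining properties of the map \(F\) on random elements of \(U(2)\):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_u2():
    # QR decomposition of a random complex matrix yields a unitary Q
    z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, _ = np.linalg.qr(z)
    return q

u, v = random_u2(), random_u2()
# F(g) = det g lands in U(1): |det g| = 1
unit_modulus_err = abs(abs(np.linalg.det(u)) - 1.0)
# homomorphism property: det(u v) = det(u) det(v)
hom_err = abs(np.linalg.det(u @ v) - np.linalg.det(u) * np.linalg.det(v))
```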
Definition 1.10. For two groups \(G\) and \(H\), a group isomorphism is a map \(f: G \rightarrow H\) which is bijective and a group homomorphism.
If two groups are related by a group isomorphism, they are essentially the same: their elements correspond one-to-one and obey the same group composition rule, i.e. in the present case we can identify \[e^{i \phi} \leftrightarrow g((\phi,0,0))\, ,\] using the earlier presentation of \(U(1)\). As the composition rule of the elements \(g((\alpha_1,0,0))\) is the same as that of the elements of \(U(1)\), this is a group isomorphism. It is hence fair to say that there is a \(U(1)\) sitting inside of \(SU(2)\).
We can summarize the above observation as
Proposition 1.6. There is an injective group homomorphism \(U(1) \rightarrow SU(2)\), the image of which is a \(U(1)\) subgroup of \(SU(2)\).
A similar computation shows that the same works not just for \(\boldsymbol{\alpha}\) of the form \(\boldsymbol{\alpha} = (\alpha_1,0,0)\), but for any family of \(\boldsymbol{\alpha}\)s of the form \(\boldsymbol{\alpha}= t \boldsymbol{\alpha}_0\) with \(\boldsymbol{\alpha}_0\) fixed and \(t \in \mathbb{R}\), so that there are in fact infinitely many \(U(1)\) subgroups of \(SU(2)\). These deserve a special name:
Definition 1.11. A subgroup \(G_{\boldsymbol{\alpha}}\) of \(SU(2)\) whose elements are of the form \[G_{\boldsymbol{\alpha}} = \{ e^{i t \boldsymbol{\alpha} \cdot \boldsymbol{\sigma} } \,|\,t \in \mathbb{R}\}\] for some fixed \(\boldsymbol{\alpha}\) is called the one-parameter subgroup generated by \(\boldsymbol{\alpha}\).
What is nice about the parametrization in terms of exponentials of matrices is that we can easily work out all infinitesimal elements. When \(\boldsymbol{\alpha}\) is very small, we can approximate \[g(\boldsymbol{\alpha}) \simeq \mathds{1}+ i \alpha_j \sigma_j = \mathds{1}+ i\, \boldsymbol{\alpha} \cdot \boldsymbol{\sigma} \,.\] Equivalently, the space of tangent vectors at \(g = \mathds{1}\) is spanned by the \(\sigma_j\) (times \(i\)), see figure 1.3, and a general vector is written as \(i\, \boldsymbol{\alpha} \cdot \boldsymbol{\sigma}\). Convince yourself that this is indeed a vector space. As before, we can get back elements of \(SU(2)\) by an infinite iteration of infinitesimal elements: \[g(\boldsymbol{\alpha}) = \lim_{n \rightarrow \infty} (\mathds{1}+ i\, \boldsymbol{\alpha} \cdot \boldsymbol{\sigma}/n)^n = e^{i \boldsymbol{\alpha} \cdot \boldsymbol{\sigma} }\, .\] What is more, we can think of recovering any infinitesimal generator \(i\boldsymbol{\alpha} \cdot \boldsymbol{\sigma}\) by considering a path \[s(t) = e^{i t \boldsymbol{\alpha} \cdot \boldsymbol{\sigma} }\] in \(SU(2)\) and taking a derivative w.r.t. \(t\) evaluated at \(t=0\) (which corresponds to \(\mathds{1}\in SU(2)\)): \[\left. \frac{\partial s(t)}{\partial t}\right|_{t=0} = \left.\frac{\partial}{\partial t}e^{i t \boldsymbol{\alpha} \cdot \boldsymbol{\sigma} }\right|_{t=0} = i \boldsymbol{\alpha} \cdot \boldsymbol{\sigma} \, .\]
A natural question that is hopefully on your mind is the surjectivity of the map from \(i \boldsymbol{\alpha} \cdot \boldsymbol{\sigma}\) to elements of \(SU(2)\). Can we reach any element via the exponential? We can approach this by brute force, starting with the following exercise.
1.7.
Show that \[\begin{equation} \label{eq:cliffordpauli} \sigma_i \sigma_j + \sigma_j \sigma_i = 2 \delta_{ij} \mathds{1} \end{equation}\] where \(\sigma_i\) are the Pauli matrices.
Show that \[g(\boldsymbol{\alpha}) = e^{i \boldsymbol{\alpha} \boldsymbol{\sigma}} = \begin{pmatrix} \cos(a) + i \sin(a) a_3/a & \sin(a) a_2/a + i \sin(a) a_1/a \\ -\sin(a) a_2/a + i \sin(a) a_1/a &\cos(a) - i \sin(a) a_3/a \end{pmatrix}\] where \(a = \sqrt{\alpha_1^2 + \alpha_2^2+\alpha_3^2}\). [hint: write \(\boldsymbol{\alpha} = a \boldsymbol{n}\) with \(|\boldsymbol{n}|^2=1\), i.e. \(n_j = \alpha_j/a\)]
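The closed form in this exercise is easy to test numerically. A Python sketch (again with a hypothetical `matrix_exp` power-series helper of our own) compares the series for \(e^{i\boldsymbol{\alpha}\cdot\boldsymbol{\sigma}}\) against the stated matrix:

```python
import numpy as np

# the three Pauli matrices sigma_1, sigma_2, sigma_3
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def matrix_exp(A, terms=60):
    # truncated power series for the matrix exponential
    result = np.eye(2, dtype=complex)
    term = np.eye(2, dtype=complex)
    for k in range(1, terms):
        term = term @ A / k
        result += term
    return result

alpha = np.array([0.4, -0.8, 1.2])
a = np.linalg.norm(alpha)
A = sum(alpha[j] * s[j] for j in range(3))
g = matrix_exp(1j * A)

# the closed form from the exercise, with a_j = alpha_j
closed = np.array([
    [np.cos(a) + 1j*np.sin(a)*alpha[2]/a,
     np.sin(a)*alpha[1]/a + 1j*np.sin(a)*alpha[0]/a],
    [-np.sin(a)*alpha[1]/a + 1j*np.sin(a)*alpha[0]/a,
     np.cos(a) - 1j*np.sin(a)*alpha[2]/a]])
formula_err = np.max(np.abs(g - closed))
```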
As expressions of the type \(AB+BA\) appear frequently they are given a special name:
Definition 1.12. For two matrices \(A,B\), the anti-commutator \(\{.,.\}\) is \[\{A,B\} = AB+BA\, .\]
The remaining question is whether we can choose an \(\boldsymbol{\alpha}\) that maps to any given element of \(SU(2)\). We will examine if every point on the \(S^3\) that is \(SU(2)\) is in the image of this map. We write a general group element as \[g = \begin{pmatrix} x_1 + i x_2 & x_3 + i x_4 \\ -x_3 + i x_4 & x_1 - i x_2 \end{pmatrix}\] where \(x_i \in \mathbb{R}\) subject to \(x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1\). First note that \(x_1 \in [-1,1]\). Fixing \(x_1\) gives a slice of \(S^3\) that is a two-sphere of radius \(\sqrt{1 - x_1^2}\). Comparing to the form of \(g(\boldsymbol{\alpha})\), we can always choose \(a\) such that \(\cos(a) = x_1\). This choice is not unique, but for any \(x_1\) there always exists an \(a\) such that it is satisfied. The other variables are mapped as \[\begin{equation} \label{eq:alphavsx} \begin{pmatrix} x_2 \\ x_3 \\ x_4 \end{pmatrix} = \frac{\sin(a)}{a} \begin{pmatrix} \alpha_3 \\ \alpha_2 \\ \alpha_1 \end{pmatrix}\, . \end{equation}\] The question is now whether we can always find \(\boldsymbol{\alpha}\) such that this equation is satisfied for every \(x_1,x_2,x_3,x_4\) subject to \(x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1\). As we have already set \(\cos(a) = x_1\), this fixes the length of \(\boldsymbol{\alpha}\) to be \[\boldsymbol{\alpha}^2 = \alpha_1^2 + \alpha_2^2 + \alpha_3^2 = a^2\,,\] so the admissible \(\boldsymbol{\alpha}\) form a two-sphere as well. The points on this two-sphere are mapped to the two-sphere \[x_2^2 + x_3^2 + x_4^2 = 1 - x_1^2\] by \(\eqref{eq:alphavsx}\), which is a one-to-one map. This is consistent as \[x_2^2 + x_3^2 + x_4^2 = 1-x_1^2 = 1- \cos(a)^2 = \sin(a)^2 = \frac{\sin(a)^2}{a^2} (\alpha_1^2 + \alpha_2^2 +\alpha_3^2)\, .\] Hence we have shown that
Theorem 1.1. Every element of the group \(SU(2)\) can be written as \[g(\boldsymbol{\alpha}) = e^{i \boldsymbol{\alpha}\cdot \boldsymbol{\sigma} }\] where \(\sigma_j\) are a basis for infinitesimal transformations (which can be chosen as the matrices \(\eqref{eq:pauli_matrices}\)) and \(\alpha_j \in \mathbb{R}\) .
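The construction used in the argument above can be turned into a small numerical experiment: draw a random point on \(S^3\), i.e. a random element of \(SU(2)\), and recover an \(\boldsymbol{\alpha}\) that exponentiates to it. All function names below are our own; `g_closed` implements the closed form of exercise 1.7:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)
x /= np.linalg.norm(x)          # a random point on S^3, i.e. an SU(2) element
g = np.array([[x[0] + 1j*x[1], x[2] + 1j*x[3]],
              [-x[2] + 1j*x[3], x[0] - 1j*x[1]]])

# invert the exponential map: cos(a) = x_1 fixes a, then (x2,x3,x4) fix alpha
a = np.arccos(x[0])             # pick a in [0, pi], so sin(a) >= 0
alpha = np.array([x[3], x[2], x[1]]) * a / np.sin(a)   # (alpha_1, alpha_2, alpha_3)

def g_closed(alpha):
    # closed form of exp(i alpha . sigma) from exercise 1.7
    aa = np.linalg.norm(alpha)
    c, s = np.cos(aa), np.sin(aa) / aa
    return np.array([[c + 1j*s*alpha[2], s*alpha[1] + 1j*s*alpha[0]],
                     [-s*alpha[1] + 1j*s*alpha[0], c - 1j*s*alpha[2]]])

surjectivity_err = np.max(np.abs(g_closed(alpha) - g))
```

For a generic random point this reproduces \(g\) to machine precision, illustrating the surjectivity claim.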
1.8. Let \(G\) be the set of complex \(2 \times 2\) matrices of the form \[g = \begin{pmatrix} \alpha & \beta \\ - \bar{\beta} & \bar{\alpha} \end{pmatrix}\] for \(\alpha, \beta \in \mathbb{C}\) and \(|\alpha|^2 + |\beta|^2 \neq 0\).
Show that \(G\) is a group using matrix multiplication as the group operation.
Show that \(SU(2)\) is a subgroup of \(G\).
Show that \(V:= \left\{\gamma | g = e^{i \gamma} \in G\right\}\) is a vector space and find a basis for \(V\).
Let’s back up again and see what this example has taught us:
We defined a continuous group by demanding that its action on vectors in \(\mathbb{C}^2\) leaves the inner product invariant.
This group does not have a one-to-one map to \(\mathbb{R}^3\), it has a ‘non-trivial topology’ and can be identified with \(S^3\). Again there is the special point \(\mathds{1}\) lying on this sphere.
We found that infinitesimal transformations are tangent to the group at the identity element. We can recover group elements by iterating infinitesimal transformations and taking a limit. This is the same as exponentiating the infinitesimal element and gives us a surjective map to \(SU(2)\). This is quite nice, as it means we can do all of our computations using the algebra of Pauli matrices \(\sigma_i\) instead of the group \(SU(2)\).
Definition 1.13. The group \(SO(3)\) is the group of real \(3 \times 3\) matrices \(S\) such that \(S^T = S^{-1}\) and \(\det S=1\).
In this example, we will explore the global structure of \(SO(3)\) by examining a clever map from \(SU(2)\) to \(SO(3)\). Let’s take the components of \(\boldsymbol{v}\) and rearrange them in a \(2 \times 2\) matrix \(M_{\boldsymbol{v}}\): \[M_{\boldsymbol{v}} = \begin{pmatrix}
v_3 & v_1 - i v_2 \\
v_1 + i v_2 & -v_3
\end{pmatrix}\, .\] This is just a funny way of writing \(\mathbb{R}^3\). Note that addition of vectors in \(\mathbb{R}^3\) becomes addition of matrices \(M_v\), and multiplication of vectors in \(\mathbb{R}^3\) by a real number \(c\) becomes multiplication of \(M_v\) by \(c\).
Now consider \[F(g): M_{\boldsymbol{v}} \rightarrow g M_{\boldsymbol{v}} g^\dagger = F(g)[M_{\boldsymbol{v}}]\] for \(g \in SU(2)\). What this means is that for every \(g\) in \(SU(2)\), we get a map \(F(g)\) acting on \(\mathbb{R}^3\).
Proposition 1.7. \(F\) is a group homomorphism from \(SU(2)\) to \(SO(3)\).
: problem class 1
REMARK: This homomorphism is not injective (so it is not a group isomorphism): we have \[F(g) = F(-g) \, ,\] as in both cases we act in the same way on \(\mathbb{R}^3\): \[\begin{aligned}
F(g): &M_{\boldsymbol{v}} \rightarrow g M_{\boldsymbol{v}} g^\dagger \, \\
F(-g): &M_{\boldsymbol{v}} \rightarrow (-g) M_{\boldsymbol{v}} (-g)^\dagger = g M_{\boldsymbol{v}} g^\dagger
\end{aligned}\] The element \(g = - \mathds{1}\) is in its kernel, \[F(-\mathds{1})\left[M_{\boldsymbol{v}}\right] = -\mathds{1}M_{\boldsymbol{v}} (-\mathds{1})^\dagger = M_{\boldsymbol{v}}\,,\] i.e. \(F(-\mathds{1}) = \mathds{1}\in SO(3)\).
You might wonder if the group homomorphism from \(SU(2)\) to \(SO(3)\) is surjective. Consider the following simple example of how a one-parameter subgroup of \(SU(2)\) is mapped: \[g(0,0,\theta) = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix} = e^{i \theta \sigma_3}\, .\] We have \[\begin{aligned} g(0,0,\theta) M_{\boldsymbol{v}} g(0,0,\theta)^\dagger =& \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix} \begin{pmatrix} v_3 & v_1 - i v_2 \\ v_1 + i v_2 & -v_3 \end{pmatrix} \begin{pmatrix} e^{-i\theta} & 0 \\ 0 & e^{i\theta} \end{pmatrix} \\ =& \begin{pmatrix} v_3 & e^{2i \theta} (v_1 - i v_2) \\ e^{-2i \theta} (v_1 + i v_2) & -v_3 \end{pmatrix} \end{aligned} \,.\] As \[\begin{aligned} e^{2i \theta} (v_1 - i v_2) &= (\cos(2\theta)+i \sin(2\theta) ) (v_1 - i v_2) \\ &= v_1 \cos(2\theta) + v_2 \sin(2\theta) -i (v_2 \cos(2\theta)- v_1 \sin(2\theta) ) \end{aligned}\] this map sends \[\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \mapsto \begin{pmatrix} v_1 \cos (2 \theta) + v_2 \sin(2 \theta) \\ v_2 \cos(2 \theta)-v_1 \sin (2 \theta) \\ v_3 \end{pmatrix} = \begin{pmatrix} \cos (2 \theta) & \sin(2 \theta) & 0 \\ - \sin (2 \theta) & \cos(2 \theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1\\ v_2\\ v_3 \end{pmatrix} \, .\] In other words \[F(g(0,0,\theta)) = \begin{pmatrix} \cos (2 \theta) & \sin(2 \theta) & 0 \\ - \sin (2 \theta) & \cos(2 \theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \, .\] This \(g\) hence maps to a rotation around the \(v_3\) axis by \(2 \theta\). Similarly, one can show that rotations around other axes are generated by using other group elements of \(SU(2)\); e.g. for a rotation around the axis \(\boldsymbol{v}_0\) simply use \(e^{i \theta \boldsymbol{v_0} \cdot \boldsymbol{\sigma}}\).
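The map \(F\) can be implemented directly: build \(M_{\boldsymbol{v}}\), conjugate by \(g\), and read off the rotated vector from the matrix entries. The sketch below (our own code) checks that the resulting \(3\times 3\) matrix lies in \(SO(3)\), that \(F(g)=F(-g)\), and that \(g(0,0,\theta)\) indeed maps to a rotation by \(2\theta\):

```python
import numpy as np

def M(v):
    # the 2x2 traceless Hermitian matrix encoding v in R^3
    return np.array([[v[2], v[0] - 1j*v[1]],
                     [v[0] + 1j*v[1], -v[2]]])

def F(g):
    # the 3x3 matrix of the linear map v -> g M(v) g^dagger
    cols = []
    for e in np.eye(3):
        Mv = g @ M(e) @ g.conj().T
        # recover the rotated vector from the matrix entries
        cols.append([Mv[1, 0].real, Mv[1, 0].imag, Mv[0, 0].real])
    return np.array(cols).T

theta = 0.6
g = np.array([[np.exp(1j*theta), 0], [0, np.exp(-1j*theta)]])
R = F(g)

ortho_err = np.max(np.abs(R.T @ R - np.eye(3)))   # R is orthogonal
det_err = abs(np.linalg.det(R) - 1.0)             # det R = 1
kernel_err = np.max(np.abs(F(g) - F(-g)))         # g and -g act identically
# rotation by 2*theta around the third axis, as computed above
expected = np.array([[np.cos(2*theta),  np.sin(2*theta), 0],
                     [-np.sin(2*theta), np.cos(2*theta), 0],
                     [0, 0, 1]])
rot_err = np.max(np.abs(R - expected))
```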
If we can write any element of \(SO(3)\) as a composition of rotations around fixed axes, such as \(\boldsymbol{v}_1\), \(\boldsymbol{v}_2\) and \(\boldsymbol{v}_3\), we have hence proven that the homomorphism from \(SU(2)\) to \(SO(3)\) is surjective. This is indeed true, as the following proposition shows:
Proposition 1.8. Every element \(R\) of \(SO(3)\) can be written as a product of three rotations \(R_{\boldsymbol{v}_i}(\phi_i)\) around fixed axes \(\boldsymbol{v}_i\) by angles \(\phi_i\): \(R = R_{\boldsymbol{v}_1}(\phi_1) R_{\boldsymbol{v}_2}(\phi_2) R_{\boldsymbol{v}_3}(\phi_3)\).
: You can find a proof of this in many texts on mechanics.
Theorem 1.2. There is a surjective group homomorphism from \(SU(2)\) to \(SO(3)\). It is not injective, and its kernel is the cyclic group with two elements: \(\mathbb{Z}_2 = \{\mathds{1},-\mathds{1}\} \subset SU(2)\). As there are exactly two points in \(SU(2)\) that are mapped to each point in \(SO(3)\), this map is called a (double) covering map. We can write \[SO(3) \simeq SU(2)/\mathbb{Z}_2 = S^3/\mathbb{Z}_2\] where the \(\mathbb{Z}_2\) acts by identifying antipodal points on the three-sphere.
: We have shown above that it is a non-injective homomorphism with kernel \(\{\mathds{1},-\mathds{1}\}\) and that any rotation around a fixed axis is in the image. As any element of \(SO(3)\) can be written as a product of three rotations, any element of \(SO(3)\) is in the image of this homomorphism, so it is surjective. We have also seen that \(SU(2)\) is isomorphic to \(S^3\). Antipodal points on \(S^3\) correspond to sending \(x_i \rightarrow -x_i\) for any solution to \[x_1^2 + x_2^2 + x_3^2 + x_4^2 =1.\] As \[g = \begin{pmatrix}
x_1 + i x_2 & x_3 + i x_4 \\
-x_3 + i x_4 & x_1 - i x_2
\end{pmatrix}\,,\] \(g\) and \(-g\) are hence antipodal points in \(SU(2)\). We have seen that \(g\) and \(-g\) are mapped to the same element of \(SO(3)\). If we hence identify \(g\) and \(-g\) in \(SU(2)\), this group homomorphism becomes a group isomorphism. As the map from \(SU(2)\) to \(SO(3)\) is surjective, we conclude that \(SO(3) \simeq SU(2)/\mathbb{Z}_2 = S^3/\mathbb{Z}_2\). \(\square\)
REMARK: \(^\ast\) One may be forgiven for thinking that the group \(SO(3)\) of rotations in \(\mathbb{R}^3\) is a two-sphere by imagining all the positions that a vector \(\boldsymbol{v}\) in \(\mathbb{R}^3\) can be rotated to. This is NOT true, however, as there are non-trivial rotations that leave the chosen vector \(\boldsymbol{v}\) invariant. The subgroup of \(SO(3)\) that leaves any \(\boldsymbol{v}\neq 0\) invariant is a \(U(1) = S^1\) which miraculously combines with \(S^2\) to form \(S^3/\mathbb{Z}_2\). Studying the same for the double cover \(SU(2)\) reveals that \(S^3\) is in fact a fibration of \(S^1\) over \(S^2\). This is called the Hopf fibration and is very pretty.
As a final observation, let us examine the topology of those two groups by considering closed loops, i.e. continuous maps \(\phi:[0,1] \rightarrow G\) such that \(\phi(0) = \phi(1)\). For \(SU(2)\) there are no non-trivial such maps: any closed loop in \(S^3\) can be shrunk to a point. Such spaces are called simply connected.
Now consider a path going from \(\mathds{1}\) to \(-\mathds{1}\) in \(SU(2)\). Under \(F\), this maps to a closed path in \(SO(3)\) that starts and ends at \(\mathds{1}\). Let us see if we can shrink this curve in \(SO(3)\). If we continuously deform this curve, it will still lift to an open curve in \(SU(2)\), although now it may go from \(g\) to \(-g\) in \(SU(2)\). But this means there is no way of shrinking it! Hence \(SO(3)\) is not simply connected. If we consider looping twice around any loop in \(SO(3)\), we can lift to a closed curve in \(SU(2)\), which we already know can be collapsed. We have hence shown that the fundamental group of \(SO(3)\) contains a \(\mathbb{Z}_2\) element (in fact, this is the whole fundamental group). For a given manifold with non-trivial fundamental group, there is a unique way to find a covering space (called universal cover) that is simply connected: \(SU(2)\) is the universal cover of \(SO(3)\).
In this example we have seen:
Another continuous group defined by demanding that its action on vectors in \(\mathbb{R}^3\) leaves the inner product invariant.
This group has an even more interesting topology; it is isomorphic to \(S^3/\mathbb{Z}_2\).
Curiously, we found this via a surjective homomorphism from \(SU(2)\) to \(SO(3)\). This in particular gave us an action of \(SU(2)\) on vectors in \(\mathbb{R}^3\) instead of the usual action on \(\mathbb{C}^2\). This came at the price that \(g\) and \(-g\) act in the same way though.
In the motivating examples, we have mostly used a fairly pedestrian approach, but have already discovered many interesting things. In the following, these will be appropriately formalized and generalized.
The groups we have investigated in our motivating examples were fundamentally different from vector spaces. Whereas we could cover parts of them using (subsets of) \(\mathbb{R}^n\), we could not find one-to-one maps to these as a whole. As you might anticipate, such behaviour is not exclusive to continuous groups, but gives rise to the more general notion of a differentiable manifold. In this section, we will introduce these objects and give some more elementary examples. The basic idea is that a differentiable manifold \(X\) can be covered by open sets, for each of which we can find a continuous one-to-one map to an open set sitting in a vector space, i.e. \(X\) is sewn together from things that we know how to handle.
For ease of exposition and in order not to venture too far into the realms of topology, we will restrict ourselves to manifolds that are given to us as subsets of \(\mathbb{R}^m\), i.e. submanifolds of \(\mathbb{R}^m\). If you like, you can try to think about how what we are saying here might be generalized, or take a look at the literature. Considering subsets of \(\mathbb{R}^m\) is exactly the setup we need to study when talking about continuous groups of matrices. The set of all \(k \times k\) matrices is isomorphic to \(\mathbb{R}^m\) with \(m=k^2\), so that those \(k \times k\) matrices forming some given group \(G\) will naturally sit inside \(\mathbb{R}^m\).
Let us start by reviewing the notions of open and closed sets in \(\mathbb{R}^m\). An open ball in \(\mathbb{R}^m\) centred at \(\boldsymbol{p}\) is the set \[B_r (\boldsymbol{p}) = \{ \boldsymbol{x} \in \mathbb{R}^m \,|\, \parallel\boldsymbol{x}-\boldsymbol{p}\parallel ^2 < r^2 \} \, .\] Using this, we can define open and closed sets as
Definition 1.14. A subset \(U\) of \(\mathbb{R}^m\) is open if for every point \(\boldsymbol{p}\) in \(U\) there is an \(r > 0\) such that \(B_r (\boldsymbol{p})\) is fully contained in \(U\).
Definition 1.15. A subset \(U\) of \(\mathbb{R}^m\) is closed if its complement \(\mathbb{R}^m \setminus U\) is open.
Note that not every subset of \(\mathbb{R}^m\) has to be either closed or open. Furthermore, these properties are not mutually exclusive. Take some time to think of some examples that are open but not closed, closed but not open, not open and not closed, or both open and closed. Defining what we mean by open and closed subsets of a given set is called defining a topology, and the above is called the standard topology of \(\mathbb{R}^m\).
1.9. Which of the following sets are closed in the standard topology of \(\mathbb{R}^m\)? Which are open?
\(\{ 0 < x < \pi \} \subset \mathbb{R}\) with coordinate \(x\)
\(\{x_1 < -2 \} \subset \mathbb{R}^2\) with coordinates \((x_1,x_2)\)
\(\{ 0 < x \leq \pi \} \subset \mathbb{R}\)
\(\{ 0 < x_1 < 1 \} \subset \mathbb{R}^2\) with coordinates \((x_1,x_2)\)
\(\mathbb{R}^n \subseteq \mathbb{R}^n\)
\(\{(x_1,x_2)\subset \mathbb{R}^2 \, |\, x_1^2 \leq 42 - x_2^2\} \subset \mathbb{R}^2\) with coordinates \((x_1,x_2)\)
\(\{(x_1,x_2) |x_1^2 + x_2^2 = 1\} \subset \mathbb{R}^2\)
An important property of the notion of openness is that
Proposition 1.9. Arbitrary unions and finite intersections of open sets in \(\mathbb{R}^m\) are open again. For closed sets of \(\mathbb{R}^m\) it consequently works the opposite way: arbitrary intersections and finite unions are closed again.
: see the following exercise.
1.10. Prove that arbitrary unions and finite intersections of open sets in \(\mathbb{R}^m\) are again open. Why is the intersection of an infinite number of open sets not open in general?
We now want to be able to talk about spaces \(X\) sitting inside \(\mathbb{R}^m\). First we need to know what the open sets of \(X\) are, i.e. introduce a topology. This is easy: we can simply inherit the notions of open and closed from \(\mathbb{R}^m\):
Definition 1.16. For a subset \(X\) of \(\mathbb{R}^m\) we define the induced topology by declaring that \(V \subset X\) is open if \(V = U \cap X\) with \(U\) open in \(\mathbb{R}^m\). Closed sets of \(X\) are then defined as complements (in \(X\)) of open sets of \(X\). This turns \(X\) into a topological space.
Example 1.2. Let us consider \(S^1 \subset \mathbb{R}^2\) defined by \(x_1^2 + x_2^2 = 1\). By intersecting \(S^1\) with open balls we obtain open segments on \(S^1\). Writing \(x_1 = \cos \phi\), \(x_2 = \sin \phi\), these are all of the form \(\phi_1 < \phi < \phi_2\) for some \(\phi_1, \phi_2\). Of course this is what one would have naively considered open sets anyway. Arbitrary unions and finite intersections of these are then again open, as are \(\emptyset\) and all of \(S^1\).
The notion of a topological space allows us to define what we mean by continuous:
Definition 1.17. A map \(f:X\rightarrow Y\) between topological spaces \(X\) and \(Y\) is called continuous if the set \(f^{-1}(U)\) is open in \(X\) whenever \(U\) is open in \(Y\).
Similarly, a map \(f:U_X\rightarrow U_Y\) between open sets \(U_X \subset X\) and \(U_Y \subset Y\) is called continuous if the set \(f^{-1}(V)\) is open in \(X\) for every \(V \subseteq U_Y\) that is open in \(Y\).
For maps \(f:\mathbb{R}^n \rightarrow \mathbb{R}^m\) and using the standard topology, this agrees with the usual \(\epsilon\)-\(\delta\) definition from analysis.
Definition 1.18. A bijective map \(f:X\rightarrow Y\) between topological spaces \(X\) and \(Y\) is called a homeomorphism if both \(f\) and \(f^{-1}\) are continuous.
These are the maps that preserve the structure of topological spaces.
We are now ready to define differentiable manifolds sitting inside \(\mathbb{R}^m\).
Definition 1.19. A subset \(X\) of \(\mathbb{R}^m\) with the induced topology is an \(n\)-dimensional differentiable manifold if the following conditions are met
\(X\) is covered by open sets \(U_i \subset X\) and homeomorphisms \(\phi_i\) that map \(U_i\) to an open subset \(\phi_i(U_i)\) of \(\mathbb{R}^n\). These are called coordinate charts or patches. The collection of patches \((U_i,\phi_i)\) is called an atlas.
We only need countably many \(U_i\) to cover all of \(X\).
The coordinate changes \(\phi_i\circ \phi_j^{-1}\) and their inverses \(\phi_j\circ \phi_i^{-1}\) are \(C^\infty\) (‘smooth’) in their domains, i.e. they are continuous one-to-one maps that have infinitely many continuous derivatives.
The property that we can cover \(X\) by open sets, each of which ’looks like’ an open set in \(\mathbb{R}^n\), places certain restrictions on what \(X\) can look like. For example, \(X =\{ xy=0 \} \subset \mathbb{R}^2\), the union of two lines \(x=0\) and \(y=0\) meeting at the origin, is not a manifold.
Using the topology induced from \(\mathbb{R}^2\), there is no problem defining coordinates away from the point \((x,y)= (0,0)\): we just cut out a little branch and map it to an open set in \(\mathbb{R}\). However, any open set \(U\) containing the point \((x,y)= (0,0)\) also contains (at least a small piece of) both branches. Hence these open sets look like a cross, which is radically different from any open subset of \(\mathbb{R}\). There cannot be any homeomorphism to an open subset of \(\mathbb{R}\) for such a \(U\).
We can make this argument slightly more precise as follows: choose a point \(p_a\) on the line \(x=0\), and an open interval on \(xy=0\) which connects it to \((0,0)\), and then to a second point \(p_b\) on the line \(x=0\) beyond \((0,0)\). As we want a continuous map to \(\mathbb{R}\), this interval must be mapped to an open interval in \(\mathbb{R}\), with \((0,0)\) going to \(0 \in \mathbb{R}\) (say). The image of this interval is an open interval in \(\mathbb{R}\), and its inverse image must be an open set as well, as we need our coordinate map to be a homeomorphism. This inverse image is an open set containing \((0,0)\), so it also contains points on the other branch, which must then be mapped into our interval \(\subset \mathbb{R}\) as well. But this cannot be, as we need a one-to-one map. Note that this problem disappears as soon as you drop either the requirement that our map and its inverse are continuous, or that it is one-to-one.
1.11. Consider the sets of points in \(\mathbb{R}^2\) with coordinates \((x,y)\) defined implicitly by the following relations
\(y = x^3\)
\(xy = c\)
\(x^2+y^4 = 0\)
\(x \geq y\)
\(y^2 + x^3 - 3x - c = 0\)
Using the induced topology from \(\mathbb{R}^2\), decide in each case if it is a differentiable manifold. For relations containing a constant \(c\), decide for which value of \(c\) it is a differentiable manifold.
Example 1.3. Let us try to see how we can make a circle into a differentiable manifold. The first thing to notice is that we need to choose a topology \(\mathcal{O}\) on the set \(\{e^{i\psi}| \psi \in \mathbb{R}\}\). As \(U(1)\) naturally sits inside \(\mathbb{C}\simeq\mathbb{R}^2\), we can use this to define a topology. Using the induced topology, i.e. declaring that \[U \subset U(1) \,\,\, \mbox{is open if and only if}\,\,\, U = U(1) \cap V\] for \(V\) open in \(\mathbb{R}^2\), turns \(U(1)\) into a topological space which is Hausdorff. The properties we need to check are inherited from \(\mathbb{R}^2\) being a Hausdorff topological space.
We already saw that we can write \[g = e^{i \psi}\,,\] which however does not give us good coordinates: \(g(0)=g(2\pi) = 1\), for example. Note that it would be a bad idea to let \(\psi \in (-\pi .. \pi]\). This would not be an open set, so we cannot use this as a coordinate patch. Here is why coordinate patches are defined that way: even though this would give a one-to-one map to \(U(1)\), we still cannot do calculus on \(U(1)\) by doing calculus on \((-\pi .. \pi]\). A smooth function on \((-\pi .. \pi]\) would not even give us a continuous function on \(U(1)\) unless the limit of its value as \(\psi \rightarrow -\pi\) equals its value at \(\pi\).
Let us hence get rid of the multivaluedness of the coordinate \(\psi\) by restricting its range such that \(\psi \in (-\pi .. \pi)\). Now we have a one-to-one map from the open interval \((-\pi .. \pi)\) to all of \(U(1)\) except the point \(g = -1\). To make sure we can cover all of \(U(1)\) by coordinates, we hence need a second patch. Let us set \(g = e^{i\pi + i \theta}\) and again let \(\theta \in (-\pi .. \pi)\). Now we can cover all of \(U(1)\) except \(g=1\). In summary, we have \[\begin{aligned} g& = e^{i\psi} \, , \,\,\, \psi \in (-\pi .. \pi) \\ g& = e^{i\pi + i \theta} \, , \,\,\, \theta \in (-\pi .. \pi) \end{aligned}\] However, we can now describe all points except \(1,-1\) in two ways using either \(\psi\) or \(\theta\). The coordinate changes are \[\begin{aligned} \psi &= \pi + \theta \, ,\,\, \theta < 0 \\ \psi &= -\pi + \theta \, ,\,\, \theta > 0 \end{aligned}\] Now we can use the open intervals described by \(\psi\) and \(\theta\) to construct functions on \(U(1)\), if the functions we consider agree on the overlap region \(g \neq 1,-1\). If it bothers you that the overlap region is composed of two disconnected pieces, it is not difficult to introduce more patches such that the overlap between each pair is either empty or a single interval. Try to write this down clearly for 3 or 4 patches. Note in particular that the choice above in which every patch covers all but a single point is somewhat special.
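The two charts and their transition functions can be made concrete in a few lines of Python (the chart function names below are our own; `np.angle` returns the argument of a complex number in \((-\pi,\pi]\)):

```python
import numpy as np

def chart_psi(g):
    # coordinate psi in (-pi, pi); valid away from g = -1
    return np.angle(g)

def chart_theta(g):
    # coordinate theta in (-pi, pi) with g = e^{i(pi + theta)}; valid away from g = +1
    return np.angle(-g)

def transition(theta):
    # psi as a function of theta on the overlap (g != +1, -1)
    return np.pi + theta if theta < 0 else -np.pi + theta

theta = -0.4                        # a point in one piece of the overlap
g = np.exp(1j * (np.pi + theta))
consistency_err = abs(chart_psi(g) - transition(chart_theta(g)))
```

The transition function reproduces \(\psi\) from \(\theta\) exactly, as required on the overlap.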
Example 1.4. \(^\ast\) [ex:implicitfct] Manifolds and the implicit function theorem.
We can describe a subspace \(X\) of \(\mathbb{R}^3\) given by the vanishing of a scalar function \(f(x,y,z)\): \[X: \{(x,y,z) \in \mathbb{R}^3 \,|\, f(x,y,z) = 0 \} \, ,\] as a manifold as follows. By the implicit function theorem (see AMV II) we can find a function \(g(x,y)\) such that \(f(x,y,g(x,y)) = 0\) in a neighbourhood \(V \subset \mathbb{R}^3\) of a point \((x_0,y_0,z_0)\) where \(\partial f/\partial z (x_0,y_0,z_0)\neq 0\). Let us call \(\hat{U} = V \cap X\) and use \(\hat{x},\hat{y}\) as coordinates in \(\mathbb{R}^2\). For a point \(p = (x,y,z)\) in \(\hat{U}\) we set \[\hat{\phi}: (x,y,z) \rightarrow (\hat{x},\hat{y}) = (x,y)\,.\] If \(\partial f/\partial z (x_0,y_0,z_0)=0\) but e.g. \(\partial f/\partial x (x_0,y_0,z_0) \neq 0\), we can use the same theorem to solve \(x = h(y,z)\) in a patch \(\tilde{U}\): \[\tilde{\phi}: (x,y,z) \rightarrow (\tilde{y},\tilde{z}) = (y,z)\, .\] Recalling that \(z=g(x,y)\) and \(x = h(y,z)\), the coordinate changes are given by \[\hat{\phi}\circ \tilde{\phi}^{-1} (\tilde{y},\tilde{z}) = \hat{\phi} (h(y,z),y,z) = (h(y,z),y)\] and \[\tilde{\phi}\circ \hat{\phi}^{-1} (\hat{x},\hat{y}) = \tilde{\phi} (x,y,g(x,y)) = (y,g(x,y))\, .\] A sketch of this situation is given in figure 1.8. Exercise: using the above strategy, find coordinate patches and coordinate changes on the two-sphere \(S^2\), where \(f = x^2+y^2+z^2-1\).
The only points \((x_0,y_0,z_0)\) at which this strategy fails are those where \[\begin{aligned}
f(x_0,y_0,z_0) & = 0 \\ \partial f/\partial x (x_0,y_0,z_0)=\partial f/\partial y (x_0,y_0,z_0)=\partial f/\partial z (x_0,y_0,z_0) & = 0\, .
\end{aligned}\] Hence \(X\) cannot be given the structure of a differentiable manifold at such points. These points are called singularities of the surface \(f(x,y,z)=0\).
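The chart construction of this example can be tried out numerically on \(S^2\). In the sketch below (our own code, using the explicit solution \(z = \sqrt{1-x^2-y^2}\) on the upper hemisphere in place of the abstract function \(g(x,y)\)) we check the transition map at a point where both charts apply:

```python
import numpy as np

def f(p):
    # S^2 as the zero set of f(x, y, z) = x^2 + y^2 + z^2 - 1
    x, y, z = p
    return x**2 + y**2 + z**2 - 1.0

def phi_hat(p):
    # chart valid where df/dz = 2z != 0: keep (x, y)
    return np.array([p[0], p[1]])

def phi_hat_inverse_upper(xy):
    # solve f = 0 for z on the upper hemisphere: z = g(x, y)
    return np.array([xy[0], xy[1], np.sqrt(1.0 - xy[0]**2 - xy[1]**2)])

def phi_tilde(p):
    # chart valid where df/dx = 2x != 0: keep (y, z)
    return np.array([p[1], p[2]])

p = np.array([0.6, 0.0, 0.8])      # a point where both charts apply
f_err = abs(f(p))
# transition map phi_tilde o phi_hat^{-1} on the overlap
yz = phi_tilde(phi_hat_inverse_upper(phi_hat(p)))
transition_err = np.max(np.abs(yz - np.array([0.0, 0.8])))
```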
Another concept which we encountered in the examples we studied was that of a tangent vector. The way we constructed these was simple and can be immediately generalized. First we need to introduce paths.
Definition 1.20. A path is a continuous map \(S\) from an open interval \((a,b) \subset \mathbb{R}\) to \(X\). Letting \(t \in (a,b) \subset \mathbb{R}\), we can write this as \[S: t \mapsto \boldsymbol{q}(t) \in X \, ,\] where \(\boldsymbol{q}(t)\) is a description of our path using the coordinates on \(\mathbb{R}^m \supset X\). We furthermore demand that \(\boldsymbol{q}(t)\) is a differentiable function from \((a,b)\) to \(\mathbb{R}^m\).
Definition 1.21. A tangent vector at \(\boldsymbol{p}\) is the derivative of a path passing through \(\boldsymbol{p}\) with respect to its parameter \(t\), evaluated at \(\boldsymbol{p}\). Assuming that \(t_0\) is such that \(\boldsymbol{q}(t_0) = \boldsymbol{p}\) we can write \[T_p(S) := \left.\frac{\partial\boldsymbol{q}(t)}{\partial t}\right|_{t_0} \, .\]
In the above definitions, we have used that the manifold \(X\) in question is realized as a submanifold of \(\mathbb{R}^m\), which allows us to write tangent vectors conveniently as sitting inside \(\mathbb{R}^m\). Given the definition of a manifold with coordinate charts, we could also use the coordinates obtained this way to describe tangent vectors. While this is the superior perspective for more abstract and far-reaching applications of the notion of tangent vectors and, more generally, tensors, we’ll stick to this more pedestrian approach.
Example 1.5. Tangent vectors of \(SU(2)\) at \(p = \mathds{1}\) have the form \(i \sum_j \alpha_j \sigma_j\) with \(\sigma_j\) the Pauli matrices and \(\alpha_j \in \mathbb{R}\). This can be seen by writing down a path \[S: t \rightarrow e^{i t \alpha_j \sigma_j} \, ,\] (with a sum over \(j\) implied) which passes through \(\mathds{1}\) at \(t=0\).
We now have \[T_\mathds{1}(S) = \left. \frac{\partial }{\partial t} e^{i t \alpha_j \sigma_j} \right|_{t = 0} = i \alpha_j \sigma_j\, .\]
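If you like to experiment, the following short numerical check (a non-examinable sketch in Python, assuming numpy and scipy are available) confirms both claims of the example: the path stays inside \(SU(2)\), and its derivative at \(t=0\) is \(i\alpha_j\sigma_j\). The coefficients \(\alpha_j\) are arbitrary illustrative values.

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

alpha = np.array([0.3, -0.7, 0.5])          # arbitrary real coefficients
gamma = 1j * sum(a * s for a, s in zip(alpha, sigma))

def path(t):
    # the path S: t -> exp(i t alpha_j sigma_j)
    return expm(t * gamma)

# the path stays in SU(2): unitary with unit determinant
g = path(0.8)
print(np.allclose(g.conj().T @ g, np.eye(2)), np.isclose(np.linalg.det(g), 1))

# its derivative at t = 0 (central finite difference) is i alpha_j sigma_j
h = 1e-6
T = (path(h) - path(-h)) / (2 * h)
print(np.allclose(T, gamma))
```

The determinant is exactly \(1\) (and not just of unit modulus) because \(\gamma\) is traceless.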
Proposition 1.10. The tangent vectors \(T_p(S)\) at a point \(\boldsymbol{p}\) form a real \(n\)-dimensional vector space \(T_pX\), the tangent space at \(p\). : To show this, we need to check that (i) multiples \(c\, T(S)\) for \(c \in \mathbb{R}\) and (ii) sums \(T(S)+T(S')\) also satisfy our definition of what a tangent vector is. Finally, we need to show that (iii) this vector space has a basis of \(n\) elements. We will do this in a local patch where we can choose coordinates and denote the image of \(p\) by \(\boldsymbol{x}_0\).
Given a path \(S: x_i(t)\) that defines a tangent vector \(\left[T_p(S)\right]_i = \partial x_i(t)/\partial t|_{t_0}\) at \(p\), i.e. \(\phi(p) = \boldsymbol{x}_0\), we can consider the path \(S_c\) defined by \(\boldsymbol{x}(c t)\). This path also runs through \(p\) and the components of its tangent vector are \[\left[T_p(S_c)\right]_i = \left.\frac{\partial x_i(ct)}{\partial t}\right|_{\boldsymbol{x}_0} = c \left.\frac{\partial x_i(t)}{\partial t}\right|_{\boldsymbol{x}_0} = c \left[T_p(S)\right]_i.\] For any tangent vector, there is hence another one with components that are rescaled by a real number \(c\).
Given two tangent vectors at \(p\) associated to paths \(S\) (with coords \(\boldsymbol{x}(t)\)) and \(S'\) (with coords \(\boldsymbol{x}'(t)\)), we can form the following path \(S''\) (again in local coords) \[\boldsymbol{x}''(t) = \frac{1}{2}\left( \boldsymbol{x}(2t) + \boldsymbol{x}'(2t) \right) \, .\] As \(\boldsymbol{x}(t)\) and \(\boldsymbol{x}'(t)\) both pass through \(\boldsymbol{x}_0\), \(\boldsymbol{x}''(t)\) does so as well. At \(\boldsymbol{x}_0\) we can compute \[\left[T_p(S'')\right]_i = \left. \frac{\partial x_i''(t)}{\partial t}\right|_{\boldsymbol{x}_0} = \frac{1}{2}\left(\left. \frac{\partial x_i(2t)}{\partial t}\right|_{\boldsymbol{x}_0} + \left.\frac{\partial x_i'(2t)}{\partial t}\right|_{\boldsymbol{x}_0}\right) = \left[T_p(S)\right]_i + \left[T_p(S')\right]_i \, ,\] where the chain rule supplies a factor of \(2\) that cancels the \(\frac{1}{2}\).
Finally, to see that there is a basis with \(n\) elements, note that we can choose paths with local coordinates \(\boldsymbol{x}(t) = \boldsymbol{x}_0 + t\, \boldsymbol{e}_i\), where \(\boldsymbol{e}_i\) is the \(i\)th standard basis vector. For such paths \[\left. \frac{\partial\boldsymbol{x}(t)}{\partial t}\right|_{t=0} = (0,\cdots, 0,1,0,\cdots,0) \,,\] with an entry only at the \(i\)th component. There are hence \(n\) linearly independent elements. \(\square\)
1.12.
For a path \((x_1,x_2) = (\cos t , \sin t)\) on a circle, find the tangent vector at \(t=t_0\).
Let \(t \in [-1,1]\). Find the tangent vector of \(SO(3)\) associated to the path \[t \mapsto \begin{pmatrix} \cos t & \sin t & 0 \\ -\sin t & \cos t & 0 \\ 0 & 0 & 1 \end{pmatrix}\,\] at \(t=0\).
Let \(P\) be the surface given implicitly by \(x^2+y^2+z =0\) in \(\mathbb{R}^3\). Find the tangent space at the point \((x,y,z)=(0,0,0)\).
1.13. Consider the graph \(y=f(x)\) sitting inside \(\mathbb{R}^2\) with coordinates \(x,y\) and convince yourself that the notion of tangent introduced here is the same as the usual one.
1.14. \(O(1,1)\) are the real \(2 \times 2\) matrices \(O\) which leave the bilinear form \(x_1^2 - x_2^2\) invariant when acting on \(\boldsymbol{x} = (x_1,x_2)\) as \[\boldsymbol{x} \rightarrow O \boldsymbol{x}\, .\]
Show that \(O(1,1)\) is a group using matrix multiplication.
Find the general form of elements of \(O(1,1)\).
Explain why \(O(1,1)\) is a differentiable manifold and write down coordinate charts.
Find the tangent space of \(O(1,1)\) at the identity element.
We are now ready to formally welcome Lie groups to these lectures. The idea here is simple: Lie groups unite the structures of groups and differentiable manifolds in a compatible way:
Definition 1.22. A Lie group is a group that is also a differentiable manifold such that the group operations \[\begin{aligned} \circ :\,& G \times G \rightarrow G \hspace{1cm} &(x,y) \rightarrow x \circ y \\ ^{-1}:\,& G \rightarrow G \hspace{1cm} &x \rightarrow x^{-1} \end{aligned}\] are differentiable maps.
Example 1.6. The group \(\mathbb{C}^* \equiv \mathbb{C} \setminus \{0\}\) is a Lie group under multiplication. The map \[(x,y) \rightarrow xy\] is a differentiable map from \(\mathbb{C}^* \times \mathbb{C}^*\) to \(\mathbb{C}^*\), and \(x \rightarrow 1/x\) is a differentiable map from \(\mathbb{C}^*\) to \(\mathbb{C}^*\). This is an example of an abelian Lie group.
Proposition 1.11. The group \(GL(n,\mathbb{R})\) of real invertible \(n \times n\) matrices is a Lie group under matrix multiplication. Note that these naturally sit inside \(\mathbb{R}^{m}\) with \(m=n^2\).
: Recall the definition (from AMV II): a map is differentiable if it can locally be approximated by a linear map. Let us see if this is true for matrix multiplication. For two matrices \(P,Q \in GL(n,\mathbb{R})\), the group operation is the map \[(P,Q) \rightarrow PQ \, .\] To examine whether this can be approximated by a linear map, we change \(P\) to \(P+\epsilon\Delta_P\) and \(Q\) to \(Q+\epsilon\Delta_Q\) for small \(\epsilon\): \[\begin{aligned} (P+\epsilon\Delta_P,Q+\epsilon\Delta_Q) \rightarrow (P+\epsilon\Delta_P) (Q + \epsilon\Delta_Q) &= PQ + P \epsilon\Delta_Q + \epsilon\Delta_P Q + \epsilon^2 \Delta_P \Delta_Q \\ &\simeq PQ + \epsilon (P \Delta_Q + \Delta_P Q) \end{aligned}\] which is manifestly linear in both \(\Delta_P\) and \(\Delta_Q\). Cramer’s rule for constructing inverse matrices similarly shows that \(P \rightarrow P^{-1}\) is differentiable \(\square\)
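The linear approximation above is easy to see numerically: the error made by dropping the \(\epsilon^2 \Delta_P \Delta_Q\) term scales exactly like \(\epsilon^2\). A non-examinable sketch (Python with numpy; the random matrices are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
DP, DQ = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

for eps in [1e-1, 1e-2, 1e-3]:
    exact = (P + eps * DP) @ (Q + eps * DQ)
    linear = P @ Q + eps * (P @ DQ + DP @ Q)
    # the difference is exactly eps^2 * DP @ DQ, i.e. O(eps^2)
    print(eps, np.linalg.norm(exact - linear) / eps**2)
```

The printed ratio is the same for every \(\epsilon\), confirming that the remainder is quadratic in \(\epsilon\).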
Theorem 1.3. A closed subgroup \(H\) of \(GL(n,\mathbb{R})\) is again a Lie group.
REMARK: The word ‘closed’ does not refer to the group operation on elements being closed in \(H\) (this must be true for \(H\) to be a subgroup anyway) but is meant in the topological sense: \(H\) is a closed subset of \(GL(n,\mathbb{R})\). : An elementary proof of this result can be found in .
Definition 1.23. Lie groups that are closed subgroups of \(GL(n,\mathbb{R})\) are called matrix Lie groups.
REMARK: Not all Lie groups are of this type, but in this course we will only study those. For the Lie groups that are most interesting to us, this only excludes a handful of cases.
We can hence get our hands on a lot of examples by finding closed subgroups of \(GL(n,\mathbb{R})\). We start with
Definition 1.24. The orthogonal group \(\mathbf{O(n)}\) is the group of real \(n\times n\) matrices \(g\) such that \[g^T g = \mathds{1}\, .\] The special orthogonal group \(\mathbf{SO(n)}\) is the subgroup of matrices in \(O(n)\) that have determinant \(\det g = 1\).
REMARK: The group \(O(n)\) consists of those invertible maps acting on a real vector space \(\mathbb{R}^n\) such that the canonical inner form stays invariant: \[\boldsymbol{x} \cdot \boldsymbol{y} \rightarrow \boldsymbol{x}' \cdot \boldsymbol{y}' = \boldsymbol{x}^T \,g^T g\, \boldsymbol{y} = \boldsymbol{x} \cdot \boldsymbol{y} \, .\]
Corollary 1.1. \(O(n)\) and \(SO(n)\) are Lie groups.
: One can quickly check that these are indeed groups; they are obviously subgroups of \(GL(n,\mathbb{R})\). It remains to show that \(O(n)\), defined by the relation \(g^Tg=\mathds{1}\), is a closed subset of \(GL(n,\mathbb{R})\). To be more precise, for any matrix that does not satisfy this equation, we can find a little ball in \(GL(n,\mathbb{R})\) around it such that \(g^T g \neq \mathds{1}\) for every member \(g\) of this ball. The complement of \(O(n)\) in \(GL(n,\mathbb{R})\) is hence open, which means that \(O(n)\) is closed.
For \(SO(n)\) we can repeat a similar argument. \(\square\)
1.15. Make the argument above precise, i.e. show that for every \(g \in GL(n,\mathbb{R}) \setminus O(n)\), i.e. \(g \in GL(n,\mathbb{R})\) such that \(g^T g \neq \mathds{1}\), there is an open set \(U_g\) containing \(g\) such that \(U_g\) is entirely contained in \(GL(n,\mathbb{R}) \setminus O(n)\).
hint: \(GL(n,\mathbb{R})\) inherits its topology from the vector space \(V_{n \times n}\) of real \(n \times n\) matrices, which is isomorphic to \(\mathbb{R}^{n^2}\): the \(n^2\) entries of such a matrix are just the components of a vector in \(\mathbb{R}^{n^2}\) from this perspective. We can hence describe the open ball of radius \(r\) around a matrix \(M\) with components \(M_{ij}\) as \[B_r(M) = \left\{ \left. N \in V_{n \times n} \right| \sum_{ij} \left( N_{ij} -M_{ij} \right)^2 < r^2 \right\} \, .\]
REMARK: For \(g \in O(n)\) it follows that \(\det ( g^T g ) = (\det g)^2 = \det \mathds{1}= 1\). As \(g\) is a real matrix, we hence have \(\det g = \pm 1\). The space of such matrices hence has two disjoint components; the one that contains the identity (which has \(\det \mathds{1}=1\)) is called \(SO(n)\) and is a subgroup. The other component is not a subgroup.
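You can see the \(\det g = \pm 1\) dichotomy numerically. A non-examinable sketch (Python with numpy): the QR factorization of a random matrix produces an orthogonal matrix, and its determinant always comes out as \(+1\) or \(-1\), never anything in between.

```python
import numpy as np

rng = np.random.default_rng(1)
dets = []
for _ in range(10):
    # QR factorization of a random matrix yields an orthogonal g
    g, _ = np.linalg.qr(rng.normal(size=(4, 4)))
    assert np.allclose(g.T @ g, np.eye(4))
    dets.append(np.linalg.det(g))

print(np.round(dets, 6))  # each entry is +1 or -1
```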
REMARK: Conditions such as \(g^T g = \mathds{1}\) and \(\det g = 1\) are typically called ‘closed conditions’ as the sets they define are closed sets in the vector space of all matrices.
1.16. \(GL(n,\mathbb{C})\) is the group of invertible complex \(n \times n\) matrices. Show that \(GL(n,\mathbb{C})\) is a Lie group.
Definition 1.25. The unitary group \(\mathbf{U(n)}\) is the group of complex \(n\times n\) matrices \(g\) such that \(g^\dagger g = \mathds{1}\). The special unitary group \(\mathbf{SU(n)}\) is the subgroup of matrices in \(U(n)\) that have determinant \(\det g = 1\).
REMARK: The group \(U(n)\) consists of those invertible maps acting on a complex vector space \(\mathbb{C}^n\) such that the canonical inner form stays invariant: \[\bar{\boldsymbol{x}} \cdot \boldsymbol{y} \rightarrow \bar{\boldsymbol{x}}' \cdot \boldsymbol{y}' = \bar{\boldsymbol{x}} \,\bar{g}^T g\, \boldsymbol{y} = \bar{\boldsymbol{x}} g^\dagger g \boldsymbol{y} = \bar{\boldsymbol{x}} \cdot \boldsymbol{y} \, .\]
Corollary 1.2. The unitary and special unitary groups are Lie groups.
: These are both closed subgroups of \(GL(2n,\mathbb{R})\) upon identifying \(\mathbb{C}^n\) with \(\mathbb{R}^{2n}\). \(\square\)
The idea of a Lie algebra is to formalize the notion of infinitesimal transformation. We first define Lie algebras abstractly.
Definition 1.26. A Lie algebra \(\mathfrak{g}\) is a vector space together with a bilinear map (‘Lie bracket’) \[[\cdot,\cdot]: \mathfrak{g} \times \mathfrak{g} \rightarrow \mathfrak{g}\] that is antisymmetric, \([x,y] = -[y,x]\), and satisfies the Jacobi identity \[\begin{equation} \label{eq:Jacobi} [x,[y,z]] + [y,[z,x]] + [z,[x,y]] = 0 \end{equation}\] for all \(x,y,z \in \mathfrak{g}.\)
REMARK: This definition does not say if we should think of \(\mathfrak{g}\) as a real or complex vector space, so one and the same algebra can have different ‘real’ or ‘complex’ forms (or even forms over other fields).
Theorem 1.4. Every Lie group comes equipped with a Lie algebra which is equal to its tangent space at the identity element. \[\mathfrak{g} = T_{\mathds{1}}G \, .\]
: This is already a vector space by construction. For matrix Lie groups, we can simply take the bilinear map \([\cdot,\cdot]\) to be the commutator, which clearly satisfies the Jacobi identity. We will show in Lemma 1.1 below that the commutator of two Lie algebra elements indeed returns a Lie algebra element. The general case requires some more technology we have not introduced, see for details.
Corollary 1.3. The dimension of the Lie algebra (as a vector space) is equal to the dimension of its Lie group (as a differentiable manifold).
1.17. Find the dimension of the group \(SO(n)\) by finding the dimension of its Lie algebra.
Example 1.7. The Lie algebra \(\mathfrak{u}(1)\) of \(U(1)\) consists of the purely imaginary numbers, with \([\gamma,\gamma']= 0\) for all \(\gamma,\gamma' \in \mathfrak{u}(1)\).
Example 1.8. The Lie algebra of \(\mathbb{C}^*\) consists of all complex numbers, with \([\gamma,\gamma']= 0\) for all \(\gamma,\gamma'\) in this Lie algebra.
Example 1.9. The Lie algebra \(\mathfrak{su}(n)\) of \(SU(n)\) was found for the case \(n=2\) in Section 1.1.3. It is a real vector space consisting of the complex \(n \times n\) matrices \(\gamma\) such that \(\gamma^\dagger = - \gamma\) and \(\mathrm{tr}\,\gamma = 0\). We can use \(i\) times the three Pauli matrices \(\sigma_j\) as basis vectors for \(n=2\). Note that while these have complex entries, this is a real vector space!
Example 1.10. As we have seen before, the group \(SU(2)\) is a double cover of \(SO(3)\). This means that a small neighbourhood of the identity \(\in SU(2)\) is isomorphic to a small neighbourhood of the identity \(\in SO(3)\), so that these two groups have isomorphic Lie algebras. You can also check this explicitly by working out the Lie algebra of \(SO(3)\), see problem classes 2 and 3. We can hence have different groups that have the same Lie algebras.
Definition 1.27. For any Lie algebra, we can choose a basis \(\{t_a\}_{a=1}^{\dim \mathfrak{g}}\) of so called generators \(t_a\). In this basis the Lie bracket reads \[\begin{equation} \label{eq:structure_constants} [t_a, t_b] = f_{ab}{}^c t_c \qquad (a,b,c=1,\dots,\dim\mathfrak{g}) \end{equation}\] where the \(f_{ab}{}^c\) are called structure constants, which express the component of the Lie bracket \([t_a,t_b]\) along the generator \(t_c\). Repeated indices are summed over.
REMARK: While there are reasons for putting one index up in the expression \(f_{ab}{}^c\), you can completely ignore this for now. Just think about \(f_{ab}{}^c\) as producing a number for any \(a,b,c\) and think about the positioning of the indices as a pure convention.
The Jacobi identity \(\eqref{eq:Jacobi}\) implies that \[\begin{equation} \label{Jacobi2} f_{ab}{}^d f_{dc}{}^e + f_{bc}{}^d f_{da}{}^e + f_{ca}{}^d f_{db}{}^e =0 \end{equation}\] for the structure constants.
Example 1.11. A basis of the Lie algebra \(\mathfrak{su}(2)\) of \(SU(2)\) is given by \(t_a = i \sigma_a\) for \(\sigma_a\) the Pauli matrices. We can work out \[[t_a,t_b] = - [ \sigma_a, \sigma_b] = - 2 \epsilon_{abc} i \sigma_c = -2 \epsilon_{abc}\, t_c\] so that we conclude that \(f_{ab}{}^c = - 2\epsilon_{abc}\) for \(\mathfrak{su}(2)\).
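These structure constants are easy to verify by brute force. A non-examinable sketch (Python with numpy) that checks \([t_a,t_b] = -2\epsilon_{abc}\, t_c\) for all index pairs (indices run over \(0,1,2\) here instead of \(1,2,3\)):

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
t = [1j * s for s in sigma]                    # generators t_a = i sigma_a

def eps(a, b, c):
    # Levi-Civita symbol for indices in {0, 1, 2}
    return (b - a) * (c - b) * (c - a) / 2

for a in range(3):
    for b in range(3):
        lhs = t[a] @ t[b] - t[b] @ t[a]
        rhs = sum(-2 * eps(a, b, c) * t[c] for c in range(3))
        assert np.allclose(lhs, rhs)
print("[t_a, t_b] = -2 eps_abc t_c holds for all a, b")
```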
Definition 1.28. The exponential map is a map \(\exp: \mathfrak{g} \rightarrow G\) sending \(\gamma \in \mathfrak{g}\) to \[e^\gamma := \sum_{k=0}^\infty \frac{\gamma^k}{k!}\, \in G \, .\]
REMARK: For every matrix \(\gamma \in \mathfrak{g}\), one can show that the above series indeed converges and that its limit is indeed in \(G\) if \(\gamma\) is in \(\mathfrak{g}\); see the non-examinable box below for a sketch of a proof.
More on the exponential map\(^\ast\)
Let us explain how the exponential map comes about by taking a slightly more geometric perspective. Elements of the Lie algebra are associated with elements of the tangent space of \(G\) at the identity, and we can think of both Lie algebra elements and Lie group elements as matrices, which can be multiplied. For any \(\gamma \in T_\mathds{1}(G)\), it turns out that \[L(\gamma)|_g = g \gamma\] is a tangent vector at \(g\) for any point \(g\in G\). This defines what is called a vector field \(L(\gamma)\), i.e. something that attaches a tangent vector to every point on \(G\). The vector fields we have just defined are called left-invariant vector fields and have the nice property that \[g' L(\gamma)|_g = g'g \gamma = L(\gamma)|_{g'g} \, .\] Now, what is important about vector fields is that one can flow along them. E.g. flowing out from the identity is done by solving the differential equation \[\frac{\partial g(t)}{\partial t} = L(\gamma)|_{g(t)} = g(t) \gamma \, .\] The solution to this flow is a path \[g(t) = e^{t\gamma} \, ,\] and the fact that we constructed this as a flow shows that the exponential actually lands in the group.
You can also understand convergence of the series we used to define the exponential by observing that any power of \(\gamma\) will produce a matrix with entries that are polynomials in the components of \(\gamma\). As \(k \rightarrow \infty\), the factor \(k!\) in the denominator eventually grows faster than these entries, which makes the sum converge to a finite value.
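The convergence is very fast in practice. A non-examinable sketch (Python with numpy) that truncates the series at \(K\) terms and compares with the closed form \(e^{i\theta\sigma_2} = \cos\theta\, \mathds{1} + i \sin\theta\, \sigma_2\) (which follows from \(\sigma_2^2 = \mathds{1}\)):

```python
import numpy as np

sigma2 = np.array([[0, -1j], [1j, 0]])
theta = 1.2
gamma = 1j * theta * sigma2                    # an element of su(2)

def exp_series(A, K):
    # truncated exponential series sum_{k<K} A^k / k!
    out = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, K):
        term = term @ A / k
        out = out + term
    return out

# closed form: exp(i theta sigma_2) = cos(theta) 1 + i sin(theta) sigma_2
closed = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * sigma2
for K in [2, 5, 10, 20]:
    print(K, np.linalg.norm(exp_series(gamma, K) - closed))
```

The error shrinks factorially with the truncation order \(K\).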
1.18. Consider the set \(G\) of matrices \[G = \left\{\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}| a,b,c \in \mathbb{R}, ac \neq 0 \right\}\]
Show that \(G\) is a Lie group using matrix multiplication as the group composition.
Find the Lie algebra \(\mathfrak{g}\) of \(G\).
Compute the exponentials of the basis elements of the Lie algebra you have found.
Lemma 1.1. Let \(G\) be a Lie group and \(\mathfrak{g}\) be its Lie algebra. We then have
\(g \gamma g^{-1}\, \in \, \mathfrak{g}\) for all \(\gamma \in \mathfrak{g}\) and \(g \in G\).
\([\gamma, \delta] \in \mathfrak{g}\) for all \(\gamma,\delta \in \mathfrak{g}\)
: To see the first part, let’s try to construct a path that gives us \(g \gamma g^{-1}\) as a tangent vector upon differentiating. We could try \[e^{t g \gamma g^{-1}} = \sum_k \frac{(g t \gamma g^{-1})^k}{k!} = g \left( \sum_k \frac{(t \gamma)^k}{k!} \right) g^{-1} = g e^{t \gamma}g^{-1}\, .\] As all of the factors on the rhs are in \(G\), it follows that \(g e^{t \gamma}g^{-1} \in G\). As \(g e^{t \gamma}g^{-1}\) is a path in \(G\) that passes through \(\mathds{1}\) at \(t=0\) and \[\left.\frac{\partial}{\partial t} g e^{t \gamma}g^{-1}\right|_{t=0} = g \gamma g^{-1}\] it follows that \(g \gamma g^{-1} \in \mathfrak{g}\). Although we are talking about matrix Lie algebras in this course where we can just multiply elements \(g \in G\) with elements \(\gamma \in \mathfrak{g}\), you might feel a little uneasy about just multiplying them. In this case, you can read the above statement as a definition of what \(g \gamma g^{-1}\) is: it is the Lie algebra element you get from the path \(g e^{t \gamma}g^{-1}\).
For the second part, consider \(e^{t\gamma } \delta e^{-t \gamma}\) for \(\delta \in \mathfrak{g}\). It follows from i) that this is in \(\mathfrak{g}\) for all \(t\). As a tangent space, the Lie algebra is in particular an \(n\)-dimensional vector space which sits inside the vector space of \(n \times n\) matrices. As such it is closed under taking limits. Hence \[\lim_{t \rightarrow 0} ( e^{t\gamma } \delta e^{-t \gamma} - \delta) /t = \gamma \delta - \delta \gamma = [\gamma,\delta]\, .\] is in \(\mathfrak{g}\). \(\square\)
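Both parts of the lemma can be checked numerically for \(\mathfrak{su}(2)\). A non-examinable sketch (Python with numpy and scipy; the random elements are just for illustration) verifying that \(g\gamma g^{-1}\) and \([\gamma,\delta]\) are again anti-hermitian and traceless:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def random_su2():
    # random anti-hermitian traceless matrix, i.e. an element of su(2)
    a = rng.normal(size=3)
    return 1j * sum(ai * si for ai, si in zip(a, sigma))

def in_su2(m):
    return np.allclose(m.conj().T, -m) and np.isclose(np.trace(m), 0)

gamma, delta = random_su2(), random_su2()
g = expm(gamma)                                # a group element of SU(2)

print(in_su2(g @ delta @ np.linalg.inv(g)))    # part i)
print(in_su2(gamma @ delta - delta @ gamma))   # part ii)
```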
A natural question about the exponential map concerns its injectivity and surjectivity. Clearly, it cannot be injective for every group \(G\). We have already seen that it is not injective for \(U(1)\) and \(SU(2)\), and indeed it cannot be: this is precisely how these groups acquire their non-trivial topology.
It can also not always be surjective as the following simple counter-example shows.
Example 1.12. Elements \(\gamma\) of the Lie algebra \(\mathfrak{sl}(2,\mathbb{R})\) of \(SL(2,\mathbb{R})\) must obey \[e^{\gamma} = g\] for some \(g \in SL(2,\mathbb{R})\). Taking complex conjugates, this is consistent with \(g\) being real if \(\gamma\) is real. Furthermore \[\det e^{\gamma} = e^{\mathrm{tr}\,\gamma} = 1\] implies that \(\gamma\) is traceless. Conversely, \(e^{\gamma}\) always lands in \(SL(2,\mathbb{R})\) if the above conditions are met: the inverse is simply \(e^{-\gamma}\). The Lie algebra \(\mathfrak{sl}(2,\mathbb{R})\) hence consists of the traceless real \(2\times 2\) matrices. Now consider the matrix \[g = \begin{pmatrix} -4 & 0 \\ 0 & -1/4 \end{pmatrix} \in SL(2,\mathbb{R})\] We claim there is no element \(\gamma \in \mathfrak{sl}(2,\mathbb{R})\) s.t. \(e^\gamma=g\). : If such an element existed, we could immediately write down a square root of \(g\) as \(\sqrt{g} = e^{\tfrac12\gamma}\). But as we show now, no such square root (in \(SL(2,\mathbb{R})\)) exists. The eigenvalues of \(g\) are \(-4\) and \(-1/4\), so one eigenvalue of \(\sqrt{g}\) would be \(\pm 2i\) and the other \(\pm \tfrac12i\). However, for \(\sqrt{g}\) to be in \(SL(2,\mathbb{R})\) it must be a real matrix, so its eigenvalues are the roots of a characteristic equation of the form \(\lambda^2 + p \lambda + q =0\) with \(p,q\) real. Hence \[\lambda_\pm = - p/2 \pm \sqrt{(p/2)^2 -q}\] so the two eigenvalues are either both real or complex conjugates of each other. However \(\pm 2i\) is not real and never the complex conjugate of \(\pm \tfrac12i\). Hence there is no \(\sqrt{g}\) such that \((\sqrt{g})^2 = g\). But this implies that there cannot be a \(\gamma\) with \(g = e^\gamma\), as we could write down such a \(\sqrt{g}\) otherwise. \(\square\)
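The two ingredients of this argument can be checked numerically. A non-examinable sketch (Python with numpy and scipy) verifying that \(\det e^\gamma = e^{\mathrm{tr}\,\gamma} = 1\) for traceless real \(\gamma\), and that every matrix in the image of the exponential map does have a real square root, namely \(e^{\gamma/2}\):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
for _ in range(5):
    gamma = rng.normal(size=(2, 2))
    gamma = gamma - np.trace(gamma) / 2 * np.eye(2)   # real and traceless
    g = expm(gamma)
    # det(e^gamma) = e^(tr gamma) = 1, so e^gamma lies in SL(2,R) ...
    assert np.isclose(np.linalg.det(g), 1)
    # ... and it always has a real square root, namely e^(gamma / 2)
    assert np.allclose(expm(gamma / 2) @ expm(gamma / 2), g)
print("exponentials of sl(2,R) elements always admit square roots in SL(2,R)")
```

The matrix \(g = \mathrm{diag}(-4,-1/4)\) has no such square root, so it cannot lie in the image of the exponential map.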
Although it does not hold in general, there are favourable circumstances under which the exponential map is surjective. We have already seen this for \(SU(2)\) and \(SO(3)\) (see also the exercises). To spell out the general result clearly, we need one more
Definition 1.29. A subset of \(\mathbb{R}^n\) is called compact if it is closed and bounded, i.e. one can find a ball of finite size that entirely contains it.
Example 1.13. \(U(1)\) and \(SU(2)\) are both compact.
Fact 1.1. The orthogonal and unitary groups \(O(n), SO(n), U(n), SU(n)\) are all compact.
Theorem 1.5. If \(G\) is a connected, compact matrix Lie group, the exponential map for \(G\) is surjective.
: We only give a sketch, details are found in . The main idea is to observe that the exponential map is surjective for \(U(1)\) and then try to replicate this setting. The crucial step is to show for any compact matrix Lie group \(G\) that every element \(g \in G\) lies inside some \(U(1)\) subgroup of \(G\). Once this ‘torus theorem’ is established, we can simply use the generators of the \(U(1)\) to reach \(g\) by the exponential map. \(\square\)
REMARK: It is not true that the exponential map is surjective for compact groups only, e.g. \(\mathbb{C}^*\) is not compact but we can write every element in \(\mathbb{C}^*\) as \(e^{z}\) for some complex number \(z\).
Definition 1.30. Lie algebras of compact Lie groups are called compact Lie algebras.
Definition 1.31. An ideal of a Lie algebra is a subset \(I\subset \mathfrak{g}\) such that \([\iota,x] \subset I\) for all \(\iota \in I\) and all \(x \in \mathfrak{g}\).
Definition 1.32. A simple Lie algebra is a Lie algebra that has no non-trivial ideals.
Theorem 1.6. Any compact Lie algebra can be decomposed into the direct sum of \(u(1)\) Lie algebras and of simple Lie algebras: \[\begin{equation} \label{cpct_semisimple} \mathfrak{g}= u(1) \oplus \dots \oplus u(1) \oplus \mathfrak{g}_1 \oplus \dots \oplus \mathfrak{g}_l~. \end{equation}\] : .
Simple Lie algebras were in turn classified by Killing and Cartan, and this classification was put in its definitive form by Dynkin. They form four infinite series, the so-called classical Lie algebras \(A_n=su(n+1)\), \(B_n=so(2n+1)\), \(C_n=usp(2n)\), \(D_n=so(2n)\). But there are a few more, the exceptional Lie algebras \(E_6\), \(E_7\), \(E_8\), \(F_4\), \(G_2\), which do not belong to these series. See for a down-to-earth introduction to the subject and for a more advanced perspective. The structure of these algebras boils down to the following pictures, called Dynkin diagrams, which determine the structure constants in a suitable basis of \(\mathfrak{g}\).
Although we have introduced groups in a rather concrete form as subspaces of \(GL(n,\mathbb{R})\) or \(GL(n,\mathbb{C})\), we can also drop this association and just keep their abstract structure. Taking this point of view, we will explore in this section how groups can act on vector spaces, and how their structure prefers some vector spaces over others. Such questions belong to a subject called representation theory, which is a vast field that we will only scratch the surface of. There is a dedicated lecture course, MATH4241: Representation Theory IV, that gives a more detailed and general account of this subject. Here, we will take a practical approach and mostly only explore those aspects of direct use to us.
Definition 2.1. For a vector space \(V\), we will denote the group of invertible linear maps acting on \(V\) by \(GL(V)\). Hence for \(V = \mathbb{R}^n\), \(GL(V) = GL(n, \mathbb{R})\) and for \(V = \mathbb{C}^n\), \(GL(V) = GL(n, \mathbb{C})\).
Definition 2.2. A representation of a group \(G\) is a group homomorphism \(r: G \rightarrow GL(V)\), where \(V\) is a finite-dimensional (real or complex) vector space.
REMARK: What this means is that we ‘represent’ the group \(G\) by matrices in \(GL(V)\). Given \(r\), we can hence act with the group \(G\) on vectors in \(V\) using linear maps. This is often expressed as
‘\(G\) acts on (elements of) \(V\) in the representation \(r\)’
‘elements of \(V\) transform under \(G\) in the representation \(r\)’
‘elements of \(V\) live in the representation \(r\)’
Although a representation is defined as the map \(r\) which takes elements of \(G\) to elements of \(GL(V)\), it is quite common to speak about the elements in \(V\) that the image of \(G\) in \(GL(V)\) acts on as a ‘representation’ in the physics literature.
When we define a representation we have essentially two options, and we will see examples of both:
We explicitly write some matrices \(M(g)\) in \(GL(V)\) for any \(g \in G\).
We describe how \(M(g)\) acts on a vector in \(V\) without explicitly giving \(M(g)\).
REMARK: We only ask a representation to be a homomorphism, i.e. \(r\) need not be injective. This means some aspects of \(G\) can get lost in a representation.
Given that we defined \(SU(n)\) as a group of matrices acting on \(\mathbb{C}^n\), we can just use this as an example of a representation.
Definition 2.3. The defining representation (also called fundamental representation) of \(SU(n)\) is the representation in which the matrices defining this group act as themselves: \(r(g)=g\). The fundamental representation of \(SU(n)\) is complex \(n\)-dimensional and is denoted by ‘the \(\mathbf{n}\) of \(SU(n)\)’.
REMARK: We may construct another representation of \(SU(n)\) that is also \(n\)-dimensional and is called ‘the \(\mathbf{\bar{n}}\) of \(SU(n)\)’ by acting with \(\bar{g}\) instead of \(g\). For \(n=2\) this representation is isomorphic to the \(\mathbf{2}\), but for \(n>2\) the \(\mathbf{\bar{n}}\) and the \(\mathbf{n}\) are not isomorphic.
Definition 2.4. The defining representation of \(SO(n)\) is the representation in which the matrices defining this group act as themselves: \(r(g)=g\). The defining representation of \(SO(n)\) is real \(n\)-dimensional and is denoted by ‘the \(\mathbf{n}\) of \(SO(n)\)’.
Even though matrix groups sit inside vector spaces, they themselves are not vector spaces, so a group action on itself is not a representation (in general). Here is an example of a non-trivial representation that exists for all Lie groups. It exploits the fact that Lie groups come equipped with an intrinsic vector space: each one of them has its own Lie algebra, which is a vector space.
Definition 2.5. The adjoint representation is a map \(Ad: G \rightarrow GL(\mathfrak{g})\) which sends a group element \(g\) to the \(GL(\mathfrak{g})\) element \(Ad(g)\), which is defined by its action on \(\mathfrak{g}\): \[Ad(g): \gamma \mapsto g \gamma g^{-1} \, .\]
REMARK: Don’t get confused, there are two maps we need to distinguish. There is the map \(Ad\) sending \(g\) to the \(GL(\mathfrak{g})\) element \(Ad(g)\), but the latter is itself a map acting on the vector space \(\mathfrak{g}\). The definition above works by telling us how \(Ad(g)\) acts on any \(\gamma \in \mathfrak{g}\) given \(g\), i.e. \(\gamma \rightarrow g \gamma g^{-1}\) is the linear map in \(GL(\mathfrak{g})\) that is the image of \(g \in G\) under the representation \(Ad\). That this is well-defined follows directly from Lemma 1.1 part i), where we showed that \(g \gamma g^{-1} \in \mathfrak{g}\). As it is also a linear non-degenerate map, it is hence in \(GL(\mathfrak{g})\). Taken as a vector space, the Lie algebra \(\mathfrak{g}\) of a Lie group \(G\) is hence acted on by a representation of \(G\).
Example 2.1. As \(e^{i\phi}\, i\theta\, e^{-i\phi} = i \theta\) for all \(e^{i\phi} \in U(1)\) and all \(i\theta \in \mathfrak{u}(1)\), the adjoint representation of \(U(1)\) is trivial.
Example 2.2. The adjoint representation of \(SU(2)\) is precisely the map we used to map it to \(SO(3)\). Recall that it acts on its own Lie algebra here, which is a real three-dimensional vector space, so this makes perfect sense.
Example 2.3. The adjoint representation of \(SU(3)\) acts on a real eight-dimensional vector space: the matrices in \(\mathfrak{su}(3)\) are traceless anti-hermitian \(3\times 3\) matrices. Their number of real components is \(2\) from the diagonal plus \(3 \times 2\) from off-diagonal terms. This is the reason there are eight different gluons in strong interactions.
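The defining properties of the adjoint representation can be illustrated numerically for \(SU(3)\). A non-examinable sketch (Python with numpy and scipy; the random elements are just for illustration) checking that \(Ad(g)\) maps \(\mathfrak{su}(3)\) to \(\mathfrak{su}(3)\) and that \(Ad\) is a group homomorphism:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)

def random_su3():
    # random anti-hermitian traceless 3x3 matrix, an element of su(3)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    A = (A - A.conj().T) / 2
    return A - np.trace(A) / 3 * np.eye(3)

def Ad(g):
    # adjoint action of g on the Lie algebra
    return lambda gamma: g @ gamma @ np.linalg.inv(g)

g, h = expm(random_su3()), expm(random_su3())
gamma = random_su3()

im = Ad(g)(gamma)
# Ad(g) maps su(3) to su(3): the image is again anti-hermitian and traceless
print(np.allclose(im.conj().T, -im), np.isclose(np.trace(im), 0))
# Ad is a homomorphism: Ad(gh) = Ad(g) Ad(h)
print(np.allclose(Ad(g @ h)(gamma), Ad(g)(Ad(h)(gamma))))
```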
2.1. Writing \(\boldsymbol{v} \in \mathbb{R}^3\) as \[M_{\boldsymbol{v}} = \begin{pmatrix} v_3 & v_1 - i v_2 \\ v_1 + i v_2 & -v_3 \end{pmatrix}\, ,\] we considered the action of \(g \in SU(2)\) on \(\mathbb{R}^3\) defined by \[F(g): M_{\boldsymbol{v}} \mapsto g M_{\boldsymbol{v}}g^\dagger\] in the lectures. Show that this is a representation, and that this representation is the adjoint representation of \(SU(2)\).
2.2. Let \({\bf q} \in \mathbb{C}^n\) be acted on in the fundamental representation of \(SU(n)\) and \(\gamma\) in the adjoint representation of \(SU(n)\) (this is often expressed as \({\bf q}\) ‘lives’ in the fundamental and \(\gamma\) ‘lives’ in the adjoint of \(SU(n)\).)
By acting with \(SU(n)\) simultaneously on \(\gamma\) and \(\boldsymbol{q}\), describe the action of \(SU(n)\) on
\({\bf v}= \gamma {\bf q}\)
\(\bar{\bf q}\)
A matrix \(Q\) with components \(Q_{ij} = q_i q_j\)
and decide in each case if this defines a representation.
2.3. Let \(g \in SO(3)\) be given by \[g = \begin{pmatrix} \cos \phi & \sin{\phi}& 0 \\ - \sin(\phi) & \cos(\phi) & 0 \\ 0 & 0 &1 \end{pmatrix}\, .\] Find the action of \(g\) in the adjoint representation and describe it using a basis of the vector space \(\mathfrak{so}(3)\). As \(\mathfrak{so}(3)\) is the same as \(\mathbb{R}^3\), we can describe its elements as column vectors after having chosen a basis. Using the basis you have chosen, write the adjoint action as a \(3 \times 3\) matrix acting on a column vector.
Definition 2.6. A representation is called faithful if \(r\) is injective.
Example 2.4. We have seen examples of \(SU(2)\) acting (faithfully) on \(\mathbb{C}^2\) in Section 1.1.3 and (non-faithfully) on \(\mathbb{R}^3\) in example 1.1.4.
Example 2.5. We can act with \(SU(2)\) (faithfully) on \(\mathbb{C}^{4}\) by using the block-diagonal representation \[r:g \rightarrow \begin{pmatrix} g & 0\\ 0 & g \end{pmatrix}\]
Clearly this seems a bit redundant and we want to distinguish between such cases and those that truly give us something new. One way to phrase this is in terms of invariant subspaces.
Definition 2.7. A subspace \(W \subseteq V\) is called invariant if \(r(g)w \in W\) for all \(g \in G\) and all \(w \in W\).
Example 2.6. Coming back to Example 2.5, we can decompose \(V = \mathbb{C}^4 = \mathbb{C}^2 \oplus \mathbb{C}^2\), and each of the two summands is an invariant subspace.
Definition 2.8. A representation \(r:G\rightarrow GL(V)\) is irreducible if the only invariant subspaces are \(V\) and \(\{0\}\). Otherwise it is called reducible.
2.4. Let \(G\) be a Lie group and \(H\) be a subgroup of \(G\) that is also a Lie group.
Explain why any representation \(r(G)\) of \(G\) also gives us a representation \(r(H)\) of \(H\).
Let’s assume \(r(G)\) is irreducible. Can you think of an example where the representation \(r(H)\) is reducible? Can you think of an example where the representation \(r(H)\) is irreducible?
2.5. Let \(P\) be a homogeneous polynomial in two complex variables \(z_1\) and \(z_2\) of degree \(d\), i.e. we can write \[P(\boldsymbol{z}) = \sum_{k=0}^d \alpha_k z_1^k z_2^{d-k}\] for complex numbers \(\alpha_k\).
There is a natural action of \(SU(2)\) on \(\boldsymbol{z}= (z_1,z_2)\), which is just \[\boldsymbol{z} \mapsto g \boldsymbol{z}\, .\]
For a polynomial \(P(\boldsymbol{z})\), we can then define an action by \(SU(2)\) as \[r_d(g): P(\boldsymbol{z}) \mapsto P(g^{-1 }\boldsymbol{z}) \, .\] Show that this defines a representation of \(SU(2)\).
[remark: in the above formula, \(g^{-1 }\) does not act on the argument of \(P\) but on \(\boldsymbol{z}\), i.e. the action on \(P(A\boldsymbol{z})\) for a \(2 \times 2\) matrix \(A\) would be \(r_d(g): P(A\boldsymbol{z}) \mapsto P(Ag^{-1 }\boldsymbol{z})\). ]
Example 2.7. Mapping all \(g \in G\) to \(\mathds{1}\in GL(V)\) is a group homomorphism, so this trivial representation always exists. This is as un-faithful as possible and reducible: every subspace of \(V\) is an invariant subspace. Objects transforming in this representation are called scalars or singlets. They are often referred to as ‘living in the \(\mathbf{1}\) of \(G\)’.
Definition 2.9. A representation \(r:G\rightarrow GL(V)\) is unitary if \(V\) has an inner form \(\langle.,.\rangle\) and \(\langle x,y\rangle = \langle r(g)x,r(g)y\rangle\) for all \(g \in G\) and all \(x,y \in V\).
Example 2.8. The fundamental representation of \(SU(n)\) is faithful, irreducible, and unitary.
Given that we have introduced most matrix groups as preserving some inner form, this seems like a natural concept. Its power lies in the following
Theorem 2.1. Let \(r:G\rightarrow GL(V)\) be a finite-dimensional unitary representation. Then it can be completely decomposed into irreducible representations \(r_i(G)\): \[r(G) = \bigoplus_i r_i(G)\,\,, \hspace{1cm} V = \bigoplus_i V_i \,\,, \hspace{1cm} r_i(G) \in GL(V_i)\, .\]
If you like, you can think of \(r(G)\) as respecting the same block-diagonal form for all \(g \in G\) in an appropriate basis of \(V\). We have already seen a reducible representation that can be decomposed into irreducible ones, see Example 2.5.
Proof:
Let \(r(G)\) be a reducible representation (otherwise there is nothing to prove) and consider any of its invariant subspaces \(W\). The main step of the proof is to show that the orthogonal complement \[W^\perp := \{ v \in V | \langle v,w \rangle = 0\,\, \forall w \in W \}\] is an invariant subspace as well.
For any \(r(g)\) we can define its dual \(r^*(g)\) by \[\langle v, r(g) u \rangle = \langle r^*(g) v, u \rangle\, .\] for all \(v,u \in V\). It follows that \[\langle v, u \rangle = \langle r(g) v, r(g) u \rangle = \langle r^*(g)r(g) v, u \rangle\] so that \(r^*(g)r(g) =\mathds{1}\).
Now for all \(w \in W\), \(v \in W^\perp\) and all \(g \in G\) we have \[0 = \langle v,w \rangle = \langle v, r(g) w \rangle = \langle r^*(g) v, w \rangle = \langle (r(g))^{-1} v, w \rangle\] where \((r(g))^{-1}\) is the inverse of the matrix \(r(g) \in GL(V)\). As every element in \(G\) has an inverse and \((r(g))^{-1} = r(g^{-1})\) (as shown in the problems), we can just write \[0 = \langle r(g) v, w \rangle\] for all \(w \in W\), \(v \in W^\perp\) and all \(g \in G\). This means that no matter which \(g\) acts on \(v \in W^\perp\), we stay in \(W^\perp\). Hence \(W^\perp\) is an invariant subspace as well, which is what we wanted to show.
Now we can decompose \[r(G) = r_W(G) \oplus r_{W^\perp}(G)\,\,, \hspace{1cm} V = W \oplus W^\perp \,\,,\] as both \(W\) and \(W^\perp\) are invariant subspaces. If both \(r_W(G)\) and \(r_{W^\perp}(G)\) are irreducible we are done. Otherwise, we can simply run the same argument again to achieve a finer decomposition. This iteration must terminate as \(V\) is finite-dimensional. \(\square\)
For unitary representations all is hence nice and easy. But what can we do when we do not have an inner form that is respected by \(r(g)\) ? Using ‘Weyl’s unitarity trick’ we can just cook one up (if \(G\) is compact)!
Theorem 2.2. Let \(G\) be a compact Lie group and \(r(G)\) a finite-dimensional representation on a vector space with inner form \(\langle .,. \rangle\). Then there exists an inner form that is invariant under \(r(G)\) and hence the same statement as in Theorem 2.1 holds.
Proof: Let \(\langle . , .\rangle\) be some inner form on \(V\). As \(G\) is a compact group, \(\langle r(g) v, r(g) w \rangle\) is bounded for fixed \(v,w\): it is a continuous function of \(g\), and a continuous function on a compact set attains its maximum. Concretely, if this expression could grow without bound along a sequence of group elements \(g_i \in G\), then compactness (being closed and bounded) would give a convergent subsequence with limit \(\hat{g} \in G\), at which the expression would have to diverge; but \(\langle r(\hat{g}) v, r(\hat{g}) w \rangle\) is finite, a contradiction. Hence there is a maximal value of \(\langle r(g) v, r(g) w \rangle\) for fixed \(v\) and \(w\), and we can use that as the bound.
Furthermore, for the matrix Lie groups we are treating in these lectures, \(G\) is a bounded subspace of \(\mathbb{R}^m\) for some \(m\), and as such has a finite volume. We can then integrate a bounded function over it and obtain a finite answer. In particular, we can use any realization of \(G\) as a subset of \(\mathbb{R}^m\) to define \[\langle v,w \rangle_G := \int_G \langle r(g)v,r(g)w \rangle dV \, .\] What is happening here is that we are averaging \(\langle v,w \rangle\) over the action of the group. Let’s act with a group element \(h\) on \(v\) and \(w\): \[\langle r(h)v,r(h)w \rangle_G = \int_G \langle r(g)r(h)v,r(g)r(h)w \rangle dV = \int_G \langle r(gh)v,r(gh)w \rangle dV \, ,\] where we have used that \(r\) is a group homomorphism. Now if \(g\) sweeps out the whole group, so does \(gh\) for any fixed \(h \in G\). In particular, every group element \(g'\) can be uniquely written as \(g' = gh\) for some \(g\): just take \(g = g'h^{-1}\). Hence \[\langle r(h)v,r(h)w \rangle_G = \int_G \langle r(gh)v,r(gh)w \rangle dV = \int_G \langle r(g')v, r(g')w \rangle dV = \langle v,w \rangle_G \, ,\] and we are done. (Strictly speaking, the change of variables from \(g\) to \(g' = gh\) requires the volume measure \(dV\) to be invariant under right multiplication; such an invariant measure, the Haar measure, always exists for compact groups, and we gloss over this subtlety here.) \(\square\)
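The averaging in this proof can be carried out numerically. The sketch below (Python with numpy; a hypothetical example of ours, not taken from the lectures) takes a representation of \(U(1)\) on \(\mathbb{C}^2\) that is not unitary with respect to the standard inner form, averages the form over the group by a Riemann sum, and checks that the result is invariant:

```python
import numpy as np

# A representation of U(1) on C^2 that is NOT unitary w.r.t. the standard
# inner form: conjugate the diagonal (unitary) representation by a
# non-unitary matrix A. (Hypothetical example for illustration.)
A = np.array([[1.0, 2.0], [0.0, 1.0]])
Ainv = np.linalg.inv(A)

def f(phi):
    return A @ np.diag([np.exp(1j * phi), np.exp(-1j * phi)]) @ Ainv

g = f(0.7)
assert not np.allclose(g.conj().T @ g, np.eye(2))  # not unitary as-is

# Average the standard inner form over the group:
# <v,w>_G = v^dag M w with M = (1/2pi) int_0^{2pi} f(phi)^dag f(phi) dphi
phis = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
M = sum(f(p).conj().T @ f(p) for p in phis) / len(phis)

# The averaged form is invariant: f(h)^dag M f(h) = M for any h
for h in [0.3, 1.1, 4.0]:
    assert np.allclose(f(h).conj().T @ M @ f(h), M, atol=1e-8)
```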
An important feature of complex irreducible representations is Schur’s lemma:
Theorem 2.3. Let \(r\) be an irreducible representation of \(G\) on a finite-dimensional complex vector space \(V\), and let \(T:V \rightarrow V\) be a linear map such that \[r(g) T = T\,\, r(g)\] for all \(g \in G\). Then
a) \(T=0\), or

b) \(T= c \mathds{1}\) for some complex number \(c\).
Proof: First observe that \(\ker T\) is an invariant subspace: if \(v \in \ker T\) we have \[0 = Tv = r(g)Tv = T r(g) v\] so \(r(g) v \in \ker T\) as well. As we have assumed that \(r\) is irreducible, \(\ker T = V\) or \(\ker T =\{0\}\). If \(\ker T = V\) it follows that \(T=0\), so case a) is realized and we are done.
Let us hence assume from now on that \(\ker T =\{0\}\). As a complex matrix, \(T\) has at least one eigenvalue; let that eigenvalue be \(c\) and the associated eigenvector \(v_c\) (note \(c \neq 0\) as \(\ker T = \{0\}\)). Now consider the map \(\hat{T} : = T - c \mathds{1}\) for which \(v_c \in \ker \hat{T}\). We have \[r(g) \hat{T} = \hat{T} r(g)\] as the identity commutes with every matrix. Now we can again observe as above that \(\ker \hat{T}\) is an invariant subspace and hence must be \(\{0\}\) or \(V\). We already know that \(\ker \hat{T} \neq \{0\}\), so it must be that \(\ker \hat{T} = V\), which implies \(\hat{T}=0\), i.e. \(T = c \mathds{1}\). \(\square\)
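Schur’s lemma can be tested numerically: for the fundamental (irreducible) representation of \(SU(2)\), any matrix commuting with two generic group elements must already be a multiple of the identity. The sketch below (Python with numpy; helper names are ours) computes the commutant as the nullspace of stacked Sylvester operators:

```python
import numpy as np

def commutes_with_all(mats, dim):
    """Return a basis of {T : T g = g T for all g in mats} via the
    nullspace of the stacked Sylvester operators T -> gT - Tg."""
    rows = []
    I = np.eye(dim)
    for g in mats:
        # column-major vec: vec(gT - Tg) = (kron(I, g) - kron(g.T, I)) vec(T)
        rows.append(np.kron(I, g) - np.kron(g.T, I))
    K = np.vstack(rows)
    _, s, vh = np.linalg.svd(K)
    null = vh[np.sum(s > 1e-10):]          # rows spanning the nullspace
    return [v.reshape(dim, dim, order='F') for v in null]

def su2(a, b):
    """A generic element of SU(2) in the fundamental representation."""
    n = np.sqrt(abs(a)**2 + abs(b)**2)
    a, b = a / n, b / n
    return np.array([[a, -np.conj(b)], [b, np.conj(a)]])

basis = commutes_with_all([su2(0.3 + 0.4j, 0.5 - 0.2j),
                           su2(0.1 - 0.9j, 0.2 + 0.3j)], 2)
# Schur: the commutant of an irreducible representation is just c * identity
assert len(basis) == 1
T = basis[0]
assert np.allclose(T, T[0, 0] * np.eye(2))
```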
The complex representations of \(U(1)\) can be found by more or less elementary considerations.
Theorem 2.4. Complex irreducible representations of \(\,U(1)\) are all unitary, take \(U(1)\) to \(GL(1,\mathbb{C})\), i.e. act on \(\mathbb{C}\), and only depend on an integer \(n\). For \(g = e^{i \, \phi}\) we can write the homomorphism \(f_n:U(1) \rightarrow GL(1,\mathbb{C})\) as \[\begin{equation} \label{eq:u1repstartingpoint} f_n(g) = g^n = e^{in\, \phi} \, . \end{equation}\]
Proof: We first show that all complex irreducible representations of \(U(1)\) are one-dimensional. Consider such a representation \(f(g)\) of \(U(1)\) for which \(f(g) \in GL(m,\mathbb{C})\). As \(U(1)\) is an abelian group we have \[f(g)f(h) = f(gh) = f(hg) = f(h) f(g)\] for all \(g,h \in U(1)\). Now let us fix \(h\) for the time being. Then we can set \(r(g) = f(g)\) in Schur’s lemma, and use \(T = f(h)\) to conclude that \(f(h)\) must be proportional to the identity map. This is true for all \(h \in U(1)\) (different \(h\) might produce different \(c\) however), which implies that the representation \(f(g)\) is one-dimensional: any subspace of \(\mathbb{C}^m\) is an invariant subspace, and the only subspaces giving irreducible representations are the complex one-dimensional ones.
As \(f\) is a homomorphism we need \[f(hg) = f(h) f (g)\, .\] Differentiating both sides w.r.t. \(h\) and setting \(h=1\) we find \[g f'(g) = f'(1)f(g)\, .\] Let us denote the constant \(f'(1)\) by \(n\). The unique solution to the above differential equation that satisfies \(f(1) = 1\) (a consequence of \(f\) being a group homomorphism) is \[f(g) = g^n = e^{i\phi n} \, .\] Letting \(\phi = 2\pi\) yields \(g=1\), so that \(f(1) = 1\) additionally requires \(e^{2 \pi i n}=1\), i.e. \(n \in \mathbb{Z}\). For \(f(g)\) we have \(f(g)^\dagger = f(g)^{-1}\), so these are all unitary (using the standard inner form on \(\mathbb{C}\)). \(\square\)
REMARK: The integer \(n\) is often called the ‘charge’ in physics, and you will see later that it deserves this name when we study electromagnetism, but it also shows up in other contexts with a \(U(1)\) action. The realization that \(n\) is an integer has the profound consequence that charges are quantized, i.e. they are multiples of some fundamental charge (corresponding to \(n=1\)). This doesn’t explain why the charges of the proton and the electron have equal magnitude, but it already implies that their charges must satisfy \[\frac{q_{\mbox{\footnotesize electron}}}{q_{\mbox{\footnotesize proton}}} = \frac{n_{\mbox{\footnotesize electron}}}{n_{\mbox{\footnotesize proton}}}\,\] i.e. the ratio must be a rational number.
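As a quick sanity check of the classification (a Python sketch of ours): the assignment \(e^{i\phi} \mapsto e^{in\phi}\) is a well-defined homomorphism on \(U(1)\) only for integer \(n\), since for non-integer \(n\) the value depends on the choice of phase representative \(\phi\) vs \(\phi + 2\pi\):

```python
import numpy as np

def f(n, phi):
    """Candidate representation of U(1): e^{i phi} -> e^{i n phi}."""
    return np.exp(1j * n * phi)

phi1, phi2 = 2.0, 5.5           # note phi1 + phi2 > 2*pi
for n in [-2, 0, 3]:            # integer charges: well-defined homomorphism
    assert np.isclose(f(n, phi1) * f(n, phi2),
                      f(n, (phi1 + phi2) % (2 * np.pi)))

# a non-integer exponent is NOT well defined on U(1): it distinguishes
# phi and phi + 2*pi, which label the same group element
n = 0.5
assert not np.isclose(f(n, 1.0), f(n, 1.0 + 2 * np.pi))
```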
2.6.
Describe a \(U(1)\) subgroup of \(SU(2)\). Is \(U(1) \times U(1)\) a subgroup of \(SU(2)\) as well?
Let \(A\) be an element of the vector space that is acted on by the adjoint representation of \(SU(2)\). For the \(U(1)\) subgroup of \(SU(2)\) you identified above, find the action on \(A\) and use this to decompose the action of \(U(1)\) into irreducible representations.
2.7. Consider the map \(r_\kappa:U(1) \rightarrow GL(3,\mathbb{C})\) defined by \[r_\kappa(e^{i\phi}) = e^{\phi \lambda \kappa}\] where \(\kappa \in \mathbb{C}\) and \[\lambda = \begin{pmatrix} 0 & i & 0 \\ i & 0 & i \\ 0 & i & 0 \end{pmatrix}\]
For which values of \(\kappa\) is \(r_\kappa\) a representation of \(U(1)\)? [hint: think about what happens to eigenvectors of \(\lambda\) and use the classification theorem for complex representations of \(U(1)\).]
If we have found a representation of a Lie group \(G\) on some vector space \(V\), every element \(g \in G\) is assigned an element \(r(g) \in GL(V)\). We can think in the same way about representations of Lie algebras. The only difference is that we want to preserve the algebra structure.
Definition 2.10. A Lie algebra homomorphism is a linear map \(f: \mathfrak{g} \rightarrow \mathfrak{h}\) between Lie algebras \(\mathfrak{g}\) and \(\mathfrak{h}\) such that \([f(\gamma),f(\delta)] = f([\gamma,\delta])\).
Definition 2.11. A representation of a Lie algebra \(\mathfrak{g}\) is a Lie algebra homomorphism \(\rho: \mathfrak{g}\rightarrow \mathfrak{gl}(V)\) for a finite-dimensional vector space \(V\).
Definition 2.12. A representation of a Lie algebra \(\mathfrak{g}\) is called reducible if there exists an invariant subspace, i.e. there exists a \(W \subset V\) with \(W \neq \{0\}\) and \(W \neq V\) s.t. \[\rho(\gamma) w \in W\,\,\, \forall w \in W\,\, \forall \gamma \in \mathfrak{g} \, .\]
In the same way that a path \(g(t)\) passing through \(\mathds{1}\) determines an element \(\gamma\) of the Lie algebra, we can use \(r(g(t))\) to determine an associated representation \(\rho(\gamma)\): all we need to do is consider \(r(g(t))\) instead of \(g(t)\) and do the same computation.
Proposition 2.1. Given a finite-dimensional representation \(r\) of a Lie group \(G\), there is a unique associated representation \(\rho\) of its Lie algebra \(\mathfrak{g}\) such that \[\begin{equation} \label{eq:algrepfromgrouprep} r(e^{t\gamma}) = e^{t \rho(\gamma)}\, . \end{equation}\] It can be computed by working out \[\begin{equation} \label{eq:algrepfromgrouprep2} \rho(\gamma) = \left. \frac{\partial}{\partial t} r\left(e^{t \gamma}\right) \right|_{t=0}\, . \end{equation}\]
Proof: First of all, \(\eqref{eq:algrepfromgrouprep2}\) shows how to compute the map \(\rho(\gamma)\) from \(r(g)\), i.e. it is uniquely given once \(r(g)\) is fixed.
Next we check that we satisfy the definition. Consider the path \[g(t) :=e^{t\gamma} e^{t\delta}\, .\] On the one hand we have \[\left. \frac{\partial}{\partial t} r\left(g(t)\right) \right|_{t=0} = \left. \frac{\partial}{\partial t} r\left( e^{t\gamma} e^{t\delta} \right) \right|_{t=0} =\left. \frac{\partial}{\partial t} r\left( e^{t\gamma}\right) r \left( e^{t\delta} \right) \right|_{t=0} = \rho(\gamma) + \rho(\delta)\, .\] Now consider \[\begin{aligned} e^{t\gamma} e^{t\delta} &= \sum_{k} \frac{(t\gamma)^k}{k!}\sum_{l} \frac{(t\delta)^l}{l!} = (1 + t \gamma + t^2 \gamma^2/2 + \cdots) (1 + t \delta + t^2 \delta^2/2 + \cdots) \\ & = \left(\sum_k \frac{t^k(\gamma+\delta)^k}{k!}\right) + \frac{t^2}{2} (\gamma \delta - \delta \gamma) + t^3 (\cdots) + \cdots \, . \end{aligned}\] Such a relation can be given more concisely by what is called the ‘Baker-Campbell-Hausdorff formula’; see the literature for details.
Hence \[\begin{aligned} \left. \frac{\partial}{\partial t} r\left( e^{t\gamma} e^{t\delta} \right) \right|_{t=0} = \left. \frac{\partial}{\partial t} r\left( e^{t(\gamma +\delta)}+t^2(...) \right) \right|_{t=0} = \left. \frac{\partial}{\partial t} r\left( e^{t(\gamma +\delta)} \right) \right|_{t=0} = \rho(\gamma + \delta)\, . \end{aligned}\] In the first step, we have used the relation established above, the second step is just the chain rule, and the third step uses the definition of a Lie algebra representation.
Hence we have shown that \[\rho(\gamma) + \rho(\delta) = \rho(\gamma + \delta) \, .\] Furthermore, substituting \(s = ct\) and using the chain rule we have that \[\begin{aligned} \rho(c \gamma) &= \left. \frac{\partial}{\partial t} r\left( e^{tc\gamma} \right) \right|_{t=0} = c \left. \frac{\partial}{\partial s} r\left( e^{s\gamma} \right) \right|_{s=0} = c \rho(\gamma)\, . \end{aligned}\] Hence we have shown that \(\rho\) is a linear map on \(\mathfrak{g}\).
Now we check that it respects the algebra \([\cdot,\cdot]\), i.e. is a Lie algebra homomorphism. Recall that we showed earlier that \[e^{tg\gamma g^{-1}} = ge^{t\gamma}g^{-1}\] so we find \[r\left(e^{tg\gamma g^{-1}} \right)= r\left(ge^{t\gamma}g^{-1}\right) = r\left(g\right) r \left( e^{t\gamma}\right) r(g^{-1}) \, .\] Taking a derivative w.r.t. \(t\) on both sides and then setting \(t=0\) we get \[\rho(g \gamma g^{-1}) = r(g) \rho(\gamma) r(g^{-1})\,.\] Now comes the final trick: we set \(g=e^{t \delta}\) in the above equation, take another derivative w.r.t. \(t\) and set \(t=0\) again. Using \(r(e^{t\delta}) = e^{t\rho(\delta)}\), the rhs becomes \[\left. \frac{\partial}{\partial t} e^{t \rho(\delta)} \rho(\gamma) e^{-t \rho(\delta)} \right|_{t=0} = \rho(\delta)\rho(\gamma) - \rho(\gamma)\rho(\delta) = [\rho(\delta),\rho(\gamma)]\, .\] For the lhs, recall that \(\rho\) is a linear map between vector spaces, so that \[\frac{\partial}{\partial t} \rho\left(\kappa(t)\right) = \rho\left( \frac{\partial}{\partial t} \kappa(t)\right)\, .\] The lhs hence becomes \[\left. \frac{\partial}{\partial t}\rho(e^{t \delta} \gamma e^{-t \delta}) \right|_{t=0} = \left. \rho\left( \frac{\partial}{\partial t} e^{t \delta} \gamma e^{-t \delta}\right) \right|_{t=0} = \rho([\delta,\gamma])\, .\] Hence we have shown that \[\rho([\delta,\gamma]) = [\rho(\delta),\rho(\gamma)]\, ,\] i.e. we have defined a Lie algebra homomorphism. \(\square\)
REMARK: The converse to the above theorem is not always true, i.e. given a representation \(\rho\) of a Lie algebra, there does not need to be a group representation that relates to it via \(\eqref{eq:algrepfromgrouprep}\). We have actually encountered this before. Recall that \(SO(3)\) and \(SU(2)\) have isomorphic Lie algebras. Hence we can think of the defining representation of \(\mathfrak{su}(2)\) as a (Lie algebra) representation of \(\mathfrak{so}(3)\). However, if we exponentiate the Lie algebra \(\mathfrak{su}(2)\), we do not get a representation of \(SO(3)\), but just \(SU(2)\) in the defining representation. We will examine this in a little more detail later.
2.8. Consider the Lie group \(G\) of upper triangular \(2 \times 2\) matrices \[G = \left\{\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}| a,b,c \in \mathbb{R}, ac \neq 0 \right\}\]
Let \(\boldsymbol{v} \in \mathbb{R}^3\), \(\boldsymbol{v} = (v_1,v_2,v_3)\). Define an action of \(G\) on \(\boldsymbol{v}\) by writing \[v_m := \begin{pmatrix} v_1 & v_2 \\ 0 & v_3 \end{pmatrix}\] and letting \(g \in G\) act as \[r(g)v_m := g v_m g^{-1} \, .\] Convince yourself that this is a representation of \(G\). Write the action of \(g\) on \(\boldsymbol{v}\) defined above in terms of a \(3 \times 3\) matrix acting on \(\boldsymbol{v}\): \[r(g) \boldsymbol{v} = M(g) \boldsymbol{v}\] for a \(3 \times 3\) matrix \(M(g)\) acting on the vector \(\boldsymbol{v} \in\mathbb{R}^3\) in the usual way.
Writing elements of the representation \(r(G)\) in terms of the matrices \(M(g)\), work out the associated representation \(\rho\) of the Lie algebra \(\mathfrak{g}\) of \(G\).
Check that they obey the same Lie algebra as the Lie algebra \(\mathfrak{g}\) of \(G\) (see problem 20), i.e. find a bijective Lie algebra homomorphism between the Lie algebra \(\mathfrak{g}\) of \(G\) and the Lie algebra representation \(\rho(\mathfrak{g})\) associated with \(r(G)\).
Example 2.9. The adjoint representation of the Lie algebra is a map \(ad: \mathfrak{g} \rightarrow \mathfrak{gl}(\mathfrak{g})\) which maps \(\delta \in \mathfrak{g}\) to the linear map on \(\mathfrak{g}\) that acts on \(\gamma \in \mathfrak{g}\) as \[\begin{equation} \label{eq:adjoint_action} ad(\delta): \gamma \rightarrow [\delta,\gamma] \, , \end{equation}\] i.e. \(ad(\delta)\) is in \(\mathfrak{gl}(\mathfrak{g})\). We can check that this representation is the one associated to the usual adjoint representation of the group, Definition \(\eqref{def:adjoint}\), using \(\eqref{eq:algrepfromgrouprep2}\): \[ad(\delta) (\gamma) = \left. \frac{\partial}{\partial t} Ad(e^{t\delta}) (\gamma) \right|_{t=0} = \left. \frac{\partial}{\partial t} e^{t \delta} \gamma e^{-t\delta} \right|_{t=0} = [\delta,\gamma]\, .\]
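That \(ad\) is a Lie algebra homomorphism, \(ad([\delta,\gamma]) = [ad(\delta),ad(\gamma)]\) (which is equivalent to the Jacobi identity), can be checked numerically for \(\mathfrak{su}(2)\). A sketch in Python with numpy (helper names and the trace normalization are ours):

```python
import numpy as np

# basis of su(2): anti-hermitian, traceless (l_j = i/2 * sigma_j)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
basis = [0.5j * s for s in (s1, s2, s3)]

def bracket(a, b):
    return a @ b - b @ a

def coords(x):
    """Coordinates of x in the basis above, using tr(l_a l_b) = -delta_ab/2."""
    return np.array([-2 * np.trace(t @ x) for t in basis])

def ad(delta):
    """Matrix of ad(delta): gamma -> [delta, gamma] in the chosen basis."""
    return np.column_stack([coords(bracket(delta, t)) for t in basis])

d, g = basis[0] + 2 * basis[2], basis[1] - basis[0]
# ad is a Lie algebra homomorphism: ad([d, g]) = [ad(d), ad(g)]
assert np.allclose(ad(bracket(d, g)), bracket(ad(d), ad(g)))
```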
Earlier, we mentioned that we can characterise Lie algebras by their structure constants \(f_{ab}{}^c\) once a basis \(\{t_a\}\) is chosen, \(\eqref{eq:structure_constants}\).
Proposition 2.2. The structure constants define a representation by setting \[\begin{equation} \label{eq:adjoint_structure_c} \left(\rho_{adj}(t_a)\right)^{b}{}_c = f_{ac}{}^b \, . \end{equation}\] This representation is the adjoint representation written in the basis \(\{t_a\}\), i.e. the adjoint action in the basis \(\{t_a\}\) is given by matrices \(\rho_{adj}(t_a)\) with components \(\rho_{adj}(t_a)^b{}_c = f_{ac}{}^b\).
Proof: see the following exercise.
2.9.
Check that \(\eqref{eq:adjoint_structure_c}\) defines a representation of \(\mathfrak{g}\).
Show that the adjoint action in the basis \(\{t_a\}\) is given by the matrices \(\rho_{adj}(t_a)\) with components \(f_{ac}{}^b\) by showing that \[\begin{equation} \label{ad_vs_adj} ad(t_a)(\gamma^b t_b) = \left(\rho_{adj}(t_a)\right)^{b}{}_c \gamma^c t_b~. \end{equation}\]
Theorem 2.5. Let \(r\) be a complex representation of a compact Lie group \(G\) acting on \(V\), and let \(\rho\) be the associated Lie algebra representation. Writing the basis elements of the Lie algebra \(\mathfrak{g}\) of \(G\) as \(\{i t_a \}\), we can choose a basis of \(V\) such that \(\rho(t_a)\) are Hermitian matrices, \(\rho(t_a)^\dagger = \rho(t_a)\).
Proof:
As we have seen when using Weyl’s unitarity trick, we can choose an inner form \(\langle .,.\rangle\) on the complex vector space \(V\) such that \[\langle r(g) v,r(g) w\rangle =\langle v, w\rangle\, .\] As this is an inner form on a complex vector space, we have that for any \(c \in \mathbb{C}\) \[\langle v,c w\rangle = c \langle v, w\rangle \hspace{1cm} \langle c v,w\rangle = \bar{c} \langle v, w\rangle \,.\]
With this inner form, we can choose a basis \(e_i\) on \(V\) such that \[\langle e_i , e_j \rangle = \delta_{ij} \, .\] Using this basis, we can write \(r(g)\) as matrices \(r(g)_{ij}\).
For \(v = v_i e_i\) and \(w = w_i e_i\) we now work out \[\begin{aligned} \bar{v}_i w_i = \langle v , w \rangle &= \langle r(g) v_i e_i , r(g) w_j e_j \rangle = \langle r(g)_{ik} v_k e_i , r(g)_{jl} w_l e_j \rangle \\ &= \overline{r(g)_{ik} v_k} r(g)_{jl} w_l \langle e_i , e_j \rangle = \bar{v}_k r^\dagger(g)_{ki} r(g)_{jl} w_l \delta_{ij} = \bar{v}_k r^\dagger(g)_{ki} r(g)_{il} w_l \end{aligned}\] i.e. \(r^\dagger(g)_{ki} r(g)_{il} = \delta_{kl}\), so that \(r(g)^\dagger = r(g)^{-1}\).
As we have assumed that \(\{i t_a \}\) is a basis of \(\mathfrak{g}\), we can write \[g = e^{i t_a \gamma^a}\] for some real numbers \(\gamma^a\). By definition this implies that the \(\rho(t_a)\) satisfy \[r(g) = e^{i \rho(t_a) \gamma^a}\, .\] Now \(r(g)^\dagger = r(g)^{-1}\) for all such \(g\) gives us \(\rho(t_a)^\dagger = \rho(t_a)\). \(\square\)
Here is a neat way to explicitly construct representations of \(SU(2)\), and as we will see, it gives us all the irreducible ones. \(SU(2)\) naturally acts on \(\mathbb{C}^2\) in the fundamental representation. Given some complex polynomial \(P(\boldsymbol{z})\) in two variables \(\boldsymbol{z}= (z_1,z_2)\), we can then let \(SU(2)\) act on \(P(\boldsymbol{z})\) in this way. This is particularly nice if \(P(\boldsymbol{z})\) is a homogeneous polynomial of degree \(d\), i.e. we can write \[P(\boldsymbol{z}) = \sum_{k=0}^d a_k z_1^k z_2^{d-k} \, .\] The space of such polynomials is a vector space \(\Pi_d\) of dimension \(d+1\). You can think of the \(a_k \in \mathbb{C}\) as the components of the vector and the monomials as the basis vectors. Letting \(SU(2)\) act on \(\mathbb{C}^2\), we have a corresponding induced action on the vector space of polynomials.
Proposition 2.3. The map \[r_d(g) P := P(g^{-1}\boldsymbol{z} ) \, .\] where \(g^{-1} \in SU(2)\) acts on \(\boldsymbol{z} = (z_1,z_2)\) as \[\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \rightarrow g^{-1} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\] defines a representation \(r_{d}\) of \(SU(2)\) on the complex vector space \(\Pi_d\) of dimension \(d+1\).
Proof: exercises.
We can now figure out the representations \(\rho_d\) of \(su(2)\) that are associated with the \(r_d\) described in proposition Proposition 2.3. Let us choose \(\ell_j \equiv \frac{i}{2}\sigma_j\) as the generators of the Lie algebra \(su(2)\): \[\ell_1 = \tfrac12 \begin{pmatrix}
0 & i \\
i & 0
\end{pmatrix}
\,\,, \hspace{.3cm}
\ell_2 = \tfrac12 \begin{pmatrix}
0 & 1 \\
-1 & 0
\end{pmatrix}
\,\,, \hspace{.3cm}
\ell_3 = \tfrac12 \begin{pmatrix}
i & 0 \\
0 & -i
\end{pmatrix}\, .\] Their action on the monomials \(z_1^k z_2^{d-k}\) is then (see problem class 4): \[\label{eq:ellkactionrd}
\begin{align}
\ell_1 &: z_1^k z_2^{d-k} \rightarrow -\frac{i}{2}\left(k z_1^{k-1}z_2^{d-k+1} +(d-k)z_1^{k+1}z_2^{d-k-1} \right)\\
\ell_2 &: z_1^k z_2^{d-k} \rightarrow \frac{1}{2}\left(-k z_1^{k-1}z_2^{d-k+1} +(d-k)z_1^{k+1}z_2^{d-k-1}\right) \\
\ell_3 &: z_1^k z_2^{d-k} \rightarrow i(d/2-k) z_1^k z_2^{d-k}
\end{align}\]
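These formulas can be verified symbolically. The associated algebra action on a polynomial is the first-order term of \(P(e^{-t\ell}\boldsymbol{z})\), i.e. \(\rho_d(\ell)P = -(\partial_{z_1}P,\, \partial_{z_2}P)\cdot(\ell\, \boldsymbol{z})\); a sketch with sympy, assuming this formula (helper names ours):

```python
import sympy as sp

z1, z2 = sp.symbols('z1 z2')
d, k = 5, 2
P = z1**k * z2**(d - k)

# the generators l_1 and l_3 from above
ell1 = sp.Matrix([[0, sp.I], [sp.I, 0]]) / 2
ell3 = sp.Matrix([[sp.I, 0], [0, -sp.I]]) / 2

def rho(ell, poly):
    """First-order action on polynomials: -(grad P) . (ell z), i.e. the
    t-derivative of P(exp(-t ell) z) at t = 0."""
    v = ell @ sp.Matrix([z1, z2])
    return sp.expand(-(sp.diff(poly, z1) * v[0] + sp.diff(poly, z2) * v[1]))

# l_3 acts diagonally with eigenvalue i(d/2 - k)
assert sp.simplify(rho(ell3, P) - sp.I * (sp.Rational(d, 2) - k) * P) == 0
# l_1 shifts k by +-1 with the coefficients quoted above
target = -sp.I / 2 * (k * z1**(k - 1) * z2**(d - k + 1)
                      + (d - k) * z1**(k + 1) * z2**(d - k - 1))
assert sp.simplify(rho(ell1, P) - target) == 0
```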
Theorem 2.6. For every integer \(d \geq 0\), there is a unique finite-dimensional irreducible representation \(r_d\) of \(SU(2)\) on a complex vector space \(\Pi_d\) of dimension \(d+1\). These are all of the complex irreducible finite-dimensional representations of \(SU(2)\).
Before taking on the theorem, let me prove a little lemma that will be quite useful:
Lemma 2.1. Let \(r\) be a complex representation of \(SU(2)\) acting on \(V\). Then all eigenvalues of \(\rho(\sigma_i)\), for \(\rho\) the associated representation of \(\mathfrak{su}(2)\), are real.
Proof (of the lemma):
Let us denote \(\exp(i \rho(\sigma_j)) \equiv r_j\) and \(\rho(\sigma_j) \equiv \rho_j\), so \(\exp(i\rho_j) = r_j\). As \(SU(2)\) is compact, we can choose an inner form \(\langle .,.\rangle\) on the complex vector space \(V\) such that \[\langle r_j v,r_j v\rangle =\langle v, v\rangle\, .\] When using Weyl’s unitarity trick in Theorem Theorem 2.5, we further found that we can always choose a basis such that \(\rho_i^\dagger = \rho_i\). Now let \(v\) be an eigenvector of \(\rho_j\) with eigenvalue \(e_v\) and work out \[e_v \langle v, v \rangle = \langle v, \rho_j v \rangle = \langle \rho_j v, v \rangle = \bar{e}_v \langle v, v \rangle\, .\] so that \(e_v\) must be real. \(\square\)
Now we are ready to prove the theorem. Proof:
Let’s start by slightly enlarging the scope and study representations of the Lie algebra \(\mathfrak{sl}(2,\mathbb{C}) = \mathfrak{su}(2) \otimes \mathbb{C}= \mathfrak{su}_\mathbb{C}(2)\). As we have seen in problem class 4, irreducible representations of \(\mathfrak{sl}(2,\mathbb{C})\) are in one-to-one correspondence with irreducible representations of \(\mathfrak{su}(2)\). What this means is that we consider a complex instead of a real vector space over the Pauli matrices as the algebra under consideration. This allows us to define \[H \equiv \frac{1}{2}\rho_d(\sigma_3)\,\, , \hspace{2cm} L_\pm = \frac{1}{2} \left(\rho_d(\sigma_1) \pm i \rho_d(\sigma_2) \right)\, .\] These obey the algebra \[[H, L_\pm] = \pm L_\pm \,\,, \hspace{2cm} [L_+,L_-]=2H\, .\] Let us start by assuming that \(w_n\) is an eigenvector of \(H\) with eigenvalue \(n\), so that \(H w_n = n w_n\). Then \[H L_+ w_n = (L_+ H + [H,L_+] ) w_n = (L_+ H + L_+) w_n = (L_+ n + L_+ ) w_n = (n+1) (L_+ w_n)\, .\] This equation means that \(L_+ w_n\) is another eigenvector of \(H\), but now with eigenvalue \(n+1\). Hence \(L_+\) is a ‘raising operator’ that increases the eigenvalue \(n\) by one. A similar computation reveals that \(L_- w_n\) has eigenvalue \(n-1\), so \(L_-\) is a ‘lowering operator’.
As we only care about representations of \(SU(2)\), we can use the lemma above and conclude that \(H\) only has real eigenvalues. As we only consider finite-dimensional vector spaces, one of these eigenvalues must be the largest. Let us call this eigenvalue \(m\) and \(w_m\) the associated eigenvector. Then we must have \[L_+ w_m = 0 \, ,\] as otherwise \(L_+ w_m\) would be another eigenvector with eigenvalue \(m+1\), which violates the assumption that we have chosen the largest.
We can then repeatedly act with \(L_-\) to produce more eigenvectors with smaller eigenvalues. As we are looking for finite-dimensional representations, this must terminate at some point, i.e. for some integer \(d \geq 0\), \((L_-)^{d+1} w_m = 0\). A basis of our representation is hence given by the vectors \[w_{m-l} \equiv (L_-)^l w_m \,\,\,\, , l = 0 \cdots d\, ,\] and its dimension is \(d+1\). To find out which values of \(m\) can appear, we introduce \[\Delta \equiv \frac{1}{4}\left( \rho_d(\sigma_1)^2 + \rho_d(\sigma_2)^2 + \rho_d(\sigma_3)^2\right) = \frac{1}{2}\left(L_+ L_- + L_- L_+\right) + H^2\,\] We already know that \(\sigma_i^2 = \mathds{1}\) in the fundamental representation. That \(\Delta = c \mathds{1}\) here as well follows from Schur’s lemma after observing that \[[\Delta, H] = [\Delta, L_\pm] = 0 \, .\] This does not imply that \(c=3/4\) however, as we are not in the fundamental representation! To fix \(c\), observe that \[\Delta w_m = \left(\frac{1}{2}(L_+ L_- + L_- L_+) + H^2\right) w_m = \left(L_- L_+ + H(H+\mathds{1}) \right) w_m = m(m+1)w_m\, .\] Hence \(c= m(m+1)\). As \(L_- w_{m-d} = 0\) and furthermore \(L_+L_- = \Delta - H(H-\mathds{1})\) we have that \[\begin{aligned}
0 &= \left(\Delta - H (H-\mathds{1})\right) w_{m-d} = (m(m+1)-(m-d)(m-d-1)) w_{m-d} \\
&= (1 + d) (2 m - d) w_{m-d}
\end{aligned}\] which implies \(2m-d = 0\). As \(d\) is an integer, this implies that \(m\) takes half-integer values. By construction, these are finite-dimensional irreducible representations of \(\mathfrak{sl}(2,\mathbb{C})\).
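The highest-weight construction can be turned into explicit matrices. The sketch below (Python with numpy; a particular normalization of the basis vectors is our choice) builds \(H\) and \(L_\pm\) for \(m = d/2\) and checks the algebra and the value of the Casimir:

```python
import numpy as np

def spin_matrices(d):
    """Highest-weight construction for m = d/2: H is diagonal with
    eigenvalues m, m-1, ..., -m and L_- lowers by one step."""
    m = d / 2
    H = np.diag([m - l for l in range(d + 1)])
    Lm = np.zeros((d + 1, d + 1))
    Lp = np.zeros((d + 1, d + 1))
    for l in range(d):            # basis vectors labeled by l = 0..d
        Lm[l + 1, l] = 2 * m - l  # one convenient normalization choice
        Lp[l, l + 1] = l + 1
    return H, Lp, Lm

for d in range(1, 6):
    H, Lp, Lm = spin_matrices(d)
    m = d / 2
    # the algebra: [H, L_pm] = pm L_pm, [L_+, L_-] = 2 H
    assert np.allclose(H @ Lp - Lp @ H, Lp)
    assert np.allclose(H @ Lm - Lm @ H, -Lm)
    assert np.allclose(Lp @ Lm - Lm @ Lp, 2 * H)
    # the Casimir Delta = (L_+L_- + L_-L_+)/2 + H^2 equals m(m+1) * identity
    Delta = (Lp @ Lm + Lm @ Lp) / 2 + H @ H
    assert np.allclose(Delta, m * (m + 1) * np.eye(d + 1))
```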
We can restrict to the anti-hermitian matrices in this representation to find a representation of \(\mathfrak{su}(2)\): as the \(\rho(\sigma_j)\) can be chosen hermitian, we get a representation of \(\mathfrak{sl}(2,\mathbb{C})\) as \[\sum_j a_j \rho(\sigma_j)\,\,\, \mbox{for}\,\,\, a_j \in \mathbb{C}\] and a representation of \(\mathfrak{su}(2)\) as \[\sum_j i a_j \rho(\sigma_j)\,\,\, \mbox{for}\,\,\, a_j \in \mathbb{R}\, .\] In problem class 4 we have seen using the above that irreducible representations of \(\mathfrak{sl}(2,\mathbb{C})\) are in one-to-one correspondence with irreducible representations of \(\mathfrak{su}(2)\), so we can think of the representations just found as representations of \(\mathfrak{su}(2)\).
In order to compare this with the representations \(\rho_d\) of \(su(2)\) associated to the representations \(r_d\) of \(SU(2)\) that we already know exist, it is convenient to rescale the basis vectors \(w_j\) as follows. We define \(v_m \equiv w_m\) and \[L_- v_k = (m+k) v_{k-1}\] which implies \[\begin{aligned}
L_+ v_k &= \frac{1}{m+k+1} L_+ L_- v_{k+1} = \frac{1}{m+k+1} \left(\Delta - H(H-1) \right) v_{k+1} \\
&= \frac{1}{m+k+1} \left(m(m+1) - k (k+1) \right) v_{k+1} = (m-k) v_{k+1} \, .
\end{aligned}\] In this basis, the action of the \(\ell_i\) is \[\begin{aligned}
\ell_1 v_{m-k} &= \frac{i}{2} (L_+ + L_-) v_{m-k} = \frac{i}{2} \left(k v_{m-k+1} + (d-k)v_{m-k-1}\right)\\
\ell_2 v_{m-k} &= \frac{1}{2} (L_+ - L_-) v_{m-k} = \frac{1}{2} \left(k v_{m-k+1} - (d-k)v_{m-k-1}\right) \\
\ell_3 v_{m-k} &= iH v_{m-k} = i (m-k) v_{m-k}
\end{aligned}\] where \(m=d/2\) for an integer \(d\). Comparing with \(\eqref{eq:ellkactionrd}\) we see that these representations are identified if we associate \[z_1^k z_2^{d-k} \simeq (-1)^k v_{m-k}\, .\] This means all the representations of \(\mathfrak{su}(2)\) we have found are the associated representations of the group representations \(r_d\) we already know exist.
This representation \(\rho_d\) is irreducible, as can be seen as follows. Take any non-trivial invariant subspace \(V\) of \(\Pi_d\). By assumption the action of the \(\ell_k\) maps any vector of \(V\) to another vector of \(V\). As \(V\) is a complex vector space, complex linear combinations are again in \(V\). This implies that if \(P \in V\), we also have that any linear combination of \[\ell_+^n P := (z_2 \frac{\partial}{\partial z_1})^n P \,\, , \hspace{1cm} \ell_-^p P := (z_1 \frac{\partial}{\partial z_2})^p P\] is in \(V\) (these are just powers of the polynomial versions of the raising and lowering operators). We can hence apply a suitable power of \(\ell_-\) to map \(P\) to a non-zero multiple of the single monomial \(z_1^d\). This monomial is hence in \(V\), which implies that any complex multiple of it is in \(V\) as well. But now we can use \(\ell_+\) to conclude the same for any other monomial. As the monomials are a basis of \(\Pi_d\), it follows that \(V = \Pi_d\). The Lie algebra representations \(\rho_d\) are hence irreducible.
This implies that \(r_d\) is irreducible as well. If \(W \subseteq \Pi_d\) is an invariant subspace of \(r_d\), then it must be invariant under \(e^{t \rho_d(\gamma)}\) for all \(t\) and \(\gamma \in \mathfrak{su}(2)\), so in particular under \(\left.\frac{\partial}{\partial t} e^{t \rho_d(\gamma)}\right|_{t=0} = \rho_d(\gamma)\), and hence under \(\rho_d(\mathfrak{su}(2))\). But the Lie algebra representation \(\rho_d\) is irreducible, as we have just shown, so \(W\) must be \(\{0\}\) or \(\Pi_d\).
So now we know all irreducible representations of \(SU(2)\): if there were others, the associated Lie algebra representation would have had to show up in our analysis. \(\square\)
We are now ready to discuss representations of \(SO(3)\). As the Lie algebra of \(SO(3)\) is the same as the Lie algebra of \(SU(2)\), it has the same irreducible representations. Coming to the groups, recall that there is a \(2\) to \(1\) map from \(SU(2)\) to \(SO(3)\) that we investigated in problem class 1, which mapped both \(\mathds{1}\in SU(2)\) and \(-\mathds{1}\in SU(2)\) to \(\mathds{1}\in SO(3)\). We can hence construct representations of \(SO(3)\) from representations of \(SU(2)\) if \(r(-\mathds{1}) = \mathds{1}\). Let us look at the action of \(r_d(-\mathds{1})\) on a monomial \[r_d(-\mathds{1}): z_1^k z_2^{d-k} \rightarrow (-1)^d z_1^k z_2^{d-k} \, .\] This map is the identity only if \(d\) is an even integer, i.e. \(m=d/2\) is an integer. We have seen that every representation of a Lie group gives us an associated representation of its Lie algebra. The above shows that the converse is not true: the representations of \(\mathfrak{so}(3)\) where \(m\) is a half-odd-integer cannot come from any representation of \(SO(3)\). On the other hand, we can lift any finite-dimensional representation \(R\) of \(SO(3)\) to one of \(SU(2)\):
2.10. Show that any irreducible complex representation of \(SO(3)\) also defines an irreducible complex representation of \(SU(2)\).
Hence we have
Theorem 2.7. The \(r_d\) for \(d = 2m\), \(m \in \mathbb{Z}\) are all of the finite-dimensional complex irreducible representations of \(SO(3)\).
In physics in \(\mathbb{R}^3\), the half-integer \(m\) is called the spin: if there is a physical object that transforms in the representation \(r_d\), we say it has spin \(m=d/2\). This applies both to field theories, where \(SO(3)\) acts on the components of a field, and to quantum mechanics, where \(SO(3)\) acts on states. If \(d=0\) we have a one-dimensional representation, e.g. a scalar field, that does not transform at all; this is the spin \(0\) case. An ordinary vector in \(\mathbb{R}^3\) transforms in the three-dimensional representation \(r_2\), so you would call a field \(\boldsymbol{\phi} = (\phi_1,\phi_2,\phi_3)\) transforming like a vector in \(\mathbb{R}^3\) a ‘vector field’ as well. Here \(m=1\), so this is ‘spin 1’.
The representations of \(SO(3)\) show up in most courses on quantum mechanics when treating the hydrogen atom. Using wavefunctions gives us a very concrete version of these representations: the ‘spherical harmonics’.
It is a fact of nature that there are particles of ‘spin 1/2’, e.g. the electron or quarks. You might find this irritating as we might want to classify particles according to how they transform under space-time symmetries, i.e. \(SO(3)\) for rotations, and for \(m=1/2\) we do not get a representation of \(SO(3)\), but only one of its Lie algebra. One way to explain this is that in quantum mechanics, multiplying any state vector by a non-zero complex number does not change the state we are in. Taking this into account means studying projective representations, which for \(SO(n)\) are in one-to-one correspondence with ordinary representations of the associated ‘spin groups’: \(Spin(3) = SU(2)\).
Definition 2.13. The spinor representation is the \(\mathbf{2}\) of \(SU(2)\), and objects transforming in this representation are called spinors (of \(SO(3)\)). The covering group \(SU(2)\) of \(SO(3)\) is likewise called the ‘spin group’ \(Spin(3)\).
REMARK: We saw earlier how to map \(SU(2)\) to \(SO(3)\). For the element of \(SU(2)\) of the form \[g_{SU(2)} = \begin{pmatrix} e^{i \phi/2} & 0 \\ 0 & e^{-i\phi/2} \end{pmatrix}\] the corresponding element in \(SO(3)\) was \[g_{SO(3)} = \begin{pmatrix} \cos ( \phi) & \sin ( \phi) & 0 \\ -\sin ( \phi) & \cos ( \phi) & 0 \\ 0 & 0 & 1 \end{pmatrix}\] Let us assume we are performing a rotation by \(360^\circ\) using the (usual) rotation group \(SO(3)\) in \(\mathbb{R}^3\), i.e. we let \(\phi\) go from \(0\) to \(2 \pi\) in the above matrix \(g_{SO(3)}\). In the corresponding \(SU(2)\) matrix \(g_{SU(2)}\) this takes us from \(\mathds{1}\) to \(-\mathds{1}\), i.e. we do not come back to where we started from, and need to let \(\phi\) go from \(0\) to \(4 \pi\) to return to \(\mathds{1}\). In this sense, a spinor needs to be rotated by \(720^\circ\) for a full rotation!
For more general Lie groups such as \(SU(n)\), you will not be surprised to hear that there is a richer representation theory. We know two representations already: the fundamental and the adjoint. Instead of developing the general theory, we will only sketch how one might go about creating such representations. Given a representation of a group \(G\), we obtain representations of any subgroup \(H\) by simply restricting the homomorphism \(r:G \rightarrow GL(V)\) to \(H \subset G\). Every group \(SU(n)\) for \(n>2\) contains many copies of \(SU(2)\) as subgroups, and we have seen how we could construct representations of \(SU(2)\) using the operators \(H, L_+, L_-\). This motivates trying to lift the method used for \(SU(2)\) to \(SU(n)\) by writing the (complexified) Lie algebra \(\mathfrak{su}(n)\) in terms of number operators \(H_i\), lowering operators and raising operators. This is called a ‘Cartan-Weyl basis’ and leads to what are called ‘root systems’ which can in turn be used to classify certain classes of Lie algebras. Such a root system is shown in figure 2.
Definition 2.14. Given two vector spaces \(V\) and \(W\) we can form their tensor product \(V \otimes W\). Let \(\boldsymbol{e}_i\), \(i=1 \ldots \dim V\), be a basis of \(V\) and \(\boldsymbol{f}_j\), \(j=1 \ldots \dim W\), be a basis of \(W\). Then \(V \otimes W\) is a vector space with basis consisting of tuples \((\boldsymbol{e}_i,\boldsymbol{f}_j)\) (also written as \(\boldsymbol{e}_i \otimes \boldsymbol{f}_j\)).
REMARK: It follows from the definition that \(\dim V \otimes W = \dim V \cdot \dim W\). Computing with tensor products works almost the same as with usual products, we have \[\begin{aligned} \boldsymbol{v} \otimes \boldsymbol{w} + \boldsymbol{v}' \otimes \boldsymbol{w} &= (\boldsymbol{v} + \boldsymbol{v}') \otimes \boldsymbol{w} \\ \boldsymbol{v} \otimes \boldsymbol{w} + \boldsymbol{v} \otimes \boldsymbol{w}' &= \boldsymbol{v} \otimes (\boldsymbol{w}+ \boldsymbol{w}'), \end{aligned}\] and for \(c \in \mathbb{R}\) (or \(\mathbb{C}\)) \[c (\boldsymbol{v} \otimes \boldsymbol{w}) = (c\boldsymbol{v}) \otimes \boldsymbol{w} = \boldsymbol{v} \otimes (c\boldsymbol{w}) \, .\] However \(\boldsymbol{v} \otimes \boldsymbol{w} \neq \boldsymbol{w} \otimes \boldsymbol{v}\): the first slot is reserved for vectors from \(V\) and the second for vectors from \(W\), so writing \(\boldsymbol{w} \otimes \boldsymbol{v}\) does not even make sense if \(\boldsymbol{v} \in V\) and \(\boldsymbol{w} \in W\). Not every vector in \(V \otimes W\) can be written as a product: e.g. \(\boldsymbol{v} \otimes \boldsymbol{w} + \boldsymbol{v}' \otimes \boldsymbol{w}'\) cannot whenever \(\boldsymbol{v}, \boldsymbol{v}'\) and \(\boldsymbol{w}, \boldsymbol{w}'\) are pairs of linearly independent vectors.
Example 2.10. Consider \(\mathbb{R}^3 \otimes \mathbb{R}^3\) and let \(e_1,e_2,e_3\) be a basis of the first \(\mathbb{R}^3\) and \(f_1,f_2,f_3\) of the second. A basis of \(\mathbb{R}^3 \otimes \mathbb{R}^3\) is then \[\begin{aligned} e_1 \otimes f_1, e_1 \otimes f_2, e_1 \otimes f_3, \\ e_2 \otimes f_1, e_2 \otimes f_2, e_2 \otimes f_3, \\ e_3 \otimes f_1, e_3 \otimes f_2, e_3 \otimes f_3, \end{aligned}\] whereas \(\mathbb{R}^3 \oplus \mathbb{R}^3\) has a basis \[e_1,e_2,e_3, f_1,f_2,f_3\, .\] Note that whereas \(\mathbb{R}^3 \oplus \mathbb{R}^3\) is six-dimensional, \(\mathbb{R}^3 \otimes \mathbb{R}^3\) is 9-dimensional. Note that you can naturally think of \(\mathbb{R}^3 \otimes \mathbb{R}^3\) as the vector space of real \(3 \times 3\) matrices: we can write any element of \(\mathbb{R}^3 \otimes \mathbb{R}^3\) as \[\boldsymbol{v} = \sum_{ij} a_{ij} e_i \otimes f_j \, .\]
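The example above can be made concrete with NumPy (a sketch of mine; the vectors are arbitrary): the outer product realizes \(\boldsymbol{v} \otimes \boldsymbol{w}\) as the \(3 \times 3\) matrix with entries \(v_i w_j\), exhibiting the \(9\) versus \(6\) dimension count, and a sum of two products is generically not itself a product.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])        # a vector in the first R^3
w = np.array([4.0, 5.0, 6.0])        # a vector in the second R^3

tensor = np.outer(v, w)              # v ⊗ w: the 3x3 matrix with entries v_i w_j
direct_sum = np.concatenate([v, w])  # the corresponding element of R^3 ⊕ R^3

print(tensor.size, direct_sum.size)  # 9 versus 6

# a sum of two products with independent factors is NOT itself a product:
v2, w2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
mixed = np.outer(v, w) + np.outer(v2, w2)
print(np.linalg.matrix_rank(mixed))  # rank 2, while any pure product v ⊗ w has rank 1
```

The rank criterion is exactly the matrix version of the remark above: pure products correspond to rank-one matrices.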
What makes tensor products interesting in the present context is that we can form new representations out of old ones by tensoring the vector spaces they act on.
Example 2.11. \(\mathbf{2} \otimes \bar{\mathbf{2}}\)[ex:2x2] Let’s say you have a vector space \(\mathbb{C}^2\) that ‘lives’ in the fundamental representation of \(SU(2)\), and one \(\mathbb{C}^2\) that lives in the anti-fundamental, and you form their tensor product. The question we are asking is: how does \(SU(2)\) act on the tensor product? For a vector in \(\mathbb{C}^2\) in the fundamental we have \[\boldsymbol{z} \rightarrow g \boldsymbol{z}\] and in the anti-fundamental \[\boldsymbol{z} \rightarrow \bar{g} \boldsymbol{z} \, .\] This is how we would write things down using a chosen fixed basis, \(\boldsymbol{v} = (z_1,z_2)\), so we might also write this more abstractly as (for the first case): \[v = \sum_i z_i e_i \rightarrow \sum_{ij} g_{ij} z_j e_i \, .\] We can think of this as either acting with \(g\) on \(\boldsymbol{z}\) (this is called the active interpretation) or as acting with \(g^T\) on the tuple of basis vectors (this is called the passive interpretation): \[e_j \rightarrow g_{ij} e_i = g_{ji}^T e_i\] i.e. \[\begin{pmatrix} e_1 \\ e_2 \end{pmatrix}\rightarrow g^T \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \, .\] We can use whichever we like, and this will help us to figure out how to act on elements of \(\mathbb{C}^2 \otimes \mathbb{C}^2\). As the first copy transforms with \(g\) and the second with \(\bar{g}\) we have \[v = \sum_{ij} a_{ij}\, e_i \otimes f_j \rightarrow \sum_{ijkl} a_{ij}\,\, \left( g_{ki} e_k \right)\, \otimes \, \left( \bar{g}_{lj} f_l \right) = \sum_{ijkl} g_{ki} a_{ij} \bar{g}_{lj} \,\, e_k \otimes f_l\] so in summary the components of vectors in \(\mathbb{C}^2 \otimes \mathbb{C}^2\) behave as \[a_{ij} \rightarrow \sum_{kl} g_{ik} a_{kl} g^\dagger_{lj}\, ,\] i.e. if we collect the \(a_{ij}\) in a matrix \(A\) we get \[A \rightarrow g A g^\dagger \, .\]
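We can verify this computation numerically. The sketch below (my own, with an ad-hoc sampler for \(SU(2)\) elements) checks that the index form \(a_{ij} \to \sum_{kl} g_{ik} a_{kl} g^\dagger_{lj}\) agrees with the matrix form \(A \to g A g^\dagger\).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_su2(rng):
    # parametrize SU(2) as [[a, b], [-conj(b), conj(a)]] with |a|^2 + |b|^2 = 1
    x = rng.normal(size=4)
    x /= np.linalg.norm(x)
    a, b = x[0] + 1j * x[1], x[2] + 1j * x[3]
    return np.array([[a, b], [-b.conjugate(), a.conjugate()]])

g = random_su2(rng)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # components a_ij

# index form: a'_kl = sum_ij g_ki a_ij conj(g)_lj, as derived for e_k ⊗ f_l
A_index = np.einsum('ki,ij,lj->kl', g, A, g.conjugate())
# matrix form: A' = g A g^dagger
A_matrix = g @ A @ g.conjugate().T

assert np.allclose(A_index, A_matrix)
```

The `einsum` string is a literal transcription of the index computation in the example.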
We can repeat the same logic to find out how arbitrary representations acting on vector spaces \(V\) and \(W\) act on \(V \otimes W\):
Definition 2.15. Let \(r_V(G) \in GL(V)\) and \(r_W(G) \in GL(W)\), and let the components of these matrices be \(r_V(G)_{ij}\) and \(r_W(G)_{ab}\). Then the tensor product representation \(r_{V \otimes W}\) acts on a vector \(\boldsymbol{U} \in V \otimes W\) with components \(U_{ia}\) as \[U_{ia}' := r_V(G)_{ij} r_W(G)_{ab} \,\, U_{jb} \, .\]
Example 2.12. \(\mathbf{2} \otimes \bar{\mathbf{2}} = \mathbf{1} \oplus \mathbf{3}\)
Continuing example [ex:2x2] we know that we can decompose the representation acting on \(\mathbb{C}^2 \otimes \mathbb{C}^2\) into irreducible representations. But which ones? As we have seen, \(SU(2)\) acts on \(a_{ij}\) as \[A \rightarrow A'\, \hspace{1cm} a_{ij}' = g_{ik} a_{kl} (g^\dagger)_{lj}\] or in matrix notation \[A \rightarrow A' = g A g^\dagger\, .\]
The trace of \(A\) hence transforms as \[\begin{equation} \label{eq2t2bar_tr} \mathrm{tr}\, A \rightarrow \mathrm{tr} \left( g A g^\dagger \right) = \mathrm{tr} \left( g^\dagger g A \right) = \mathrm{tr}\, A \, , \end{equation}\] using cyclicity of the trace and \(g^\dagger g = \mathds{1}\). Now what this implies is that the representation \(\mathbf{2} \otimes \bar{\mathbf{2}}\) is reducible, as we can never map matrices with a vanishing trace to ones with a non-vanishing trace. Let’s try to understand this a bit more clearly. The matrices \(A\) have the form \[A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\] and we think of the four complex components \(a_{ij}\) as components of a vector in a vector space \(V\) isomorphic to \(\mathbb{C}^4\) that we chose to write as a matrix. Within this vector space there is a complex three-dimensional vector subspace \(W\) defined by \(a_{11} + a_{22} = 0\), and as \(\eqref{eq2t2bar_tr}\) shows, the group action on \(V\) maps vectors in \(W\) again to vectors in \(W\), i.e. \(W\) is an invariant subspace. More concretely, \(W\) is the subspace of matrices of the form \[W = \left \{ A \left| A = \begin{pmatrix} z_1 & z_2 \\ z_3 & -z_1 \end{pmatrix}\right. , (z_1,z_2,z_3) \in \mathbb{C}^3\right\}\, .\] You might want to convince yourself that this is indeed a vector subspace. Similarly \(W^\perp\) is the one-dimensional subspace containing matrices of the form \[W ^\perp= \left \{ A \left| A = \begin{pmatrix} z_4 & 0 \\ 0 & z_4 \end{pmatrix}\right. , z_4 \in \mathbb{C}\right\}\, ,\] which again forms an invariant subspace under the group action. The inner form under which this is \(^\perp\) is just the standard inner form on \(\mathbb{C}^4\), which we can write as \(\langle A, A' \rangle = \sum_{i,j} \bar{a}_{ij} a_{ij}'\) using two matrices \(A,A'\).
Also note that any \(A\) can be written as \[A = \begin{pmatrix} z_1 & z_2 \\ z_3 & -z_1 \end{pmatrix} + \begin{pmatrix} z_4 & 0 \\ 0 & z_4 \end{pmatrix} \, .\] The above shows that the representation \(\mathbf{2} \otimes \bar{\mathbf{2}}\) is not irreducible, but decomposes into a one-dimensional and a three-dimensional complex representation, i.e. we can write \(\mathbf{2} \otimes \bar{\mathbf{2}} = W \oplus W^\perp\). The only remaining thing to show is hence that \(W\) transforms in the \(\mathbf{3}\) of \(SU(2)\). The action here is the same as the adjoint representation of \(SU(2)\), except that we are acting on a complex vector space of dimension three instead of a real one. The irreducibility of the adjoint representation implies that there is no invariant complex subspace if we act on \(\mathbb{C}^3\) instead of \(\mathbb{R}^3\), so this is indeed the \(\mathbf{3}\) of \(SU(2)\).
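Here is a numerical check of this decomposition (my own sketch, reusing an ad-hoc \(SU(2)\) sampler): the traceless part of \(A\) stays traceless under \(A \to g A g^\dagger\), while the multiple of the identity is mapped to itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_su2(rng):
    # parametrize SU(2) as [[a, b], [-conj(b), conj(a)]] with |a|^2 + |b|^2 = 1
    x = rng.normal(size=4)
    x /= np.linalg.norm(x)
    a, b = x[0] + 1j * x[1], x[2] + 1j * x[3]
    return np.array([[a, b], [-b.conjugate(), a.conjugate()]])

g = random_su2(rng)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

# split A into its invariant pieces: traceless part (in W) and trace part (in W_perp)
trace_part = np.trace(A) / 2 * np.eye(2)
traceless_part = A - trace_part

Ap = g @ traceless_part @ g.conjugate().T
assert np.isclose(np.trace(Ap), 0)                    # W is mapped into W
assert np.allclose(g @ trace_part @ g.conjugate().T,  # W_perp is mapped to itself:
                   trace_part)                        # g (z 1) g^dagger = z 1
```

The second assertion is just unitarity: \(g (z \mathds{1}) g^\dagger = z g g^\dagger = z \mathds{1}\).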
2.11. Consider the representation \(\mathbf{n} \otimes \bar{\mathbf{n}}\) of \(SU(n)\). Explain why this is always reducible. Can you identify the irreducible representations and invariant subspaces?
Proposition 2.4. \(\mathbf{2} \otimes \mathbf{2} = \mathbf{1} \oplus \mathbf{3}\).
2.12.
Find the transformation of elements of \(\mathbf{2} \otimes \mathbf{2}\).
Show that the representations \(\mathbf{2}\) and \(\bar{\mathbf{2}}\) are isomorphic by showing they are related by a change of basis \[\boldsymbol{z}' = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\boldsymbol{z}\] [Note: of course, \(\bar{v}\) transforms also as \(\bar{v} \rightarrow \bar{g} \bar{v}\) if \(v \rightarrow g v\). In a complex vector space, complex conjugation is not a change of basis however!]
Use the above to argue that \(\mathbf{2} \otimes \mathbf{2} = \mathbf{1} \oplus \mathbf{3}\). Can you identify the invariant subspaces?
REMARK: More generally, tensor products can be decomposed into irreducible representations, i.e. we may write \[r_{V \otimes W}(G) = \oplus_i r_{V_i}(G)\] whenever we know any representation of \(G\) can be decomposed into irreducible representations. Finding the change of basis relating the natural basis of the tensor product to a basis exhibiting the decomposition on the right hand side of the above equation is a well-known problem, and the coefficients appearing in the change of basis are called ‘Clebsch-Gordan coefficients’. Details on this can be found in most books on quantum mechanics, as well as in more mathematically minded references.
REMARK: \(\ast\) There are a number of examples in physics in which \(\mathbf{2} \otimes \bar{\mathbf{2}} = \mathbf{1} \oplus \mathbf{3}\) and \(\mathbf{2} \otimes \mathbf{2} = \mathbf{1} \oplus \mathbf{3}\) play an important role in organizing the degrees of freedom of a theory. Two important ones are explained below:
spin \(\tfrac12\)
The rotation group in 3D is the group \(SO(3)\). As we have seen, \(SO(3) = SU(2)/ \mathbb{Z}_2\), and it turns out that the relevant group describing the behavior of rotations acting on fermions (such as electrons, protons, etc.) is in fact \(SU(2)\) and not \(SO(3)\). The relation between usual rotations and maps in \(SU(2)\) is exactly given by the homomorphism we constructed earlier; more on this topic will be discussed later. This means we can write down the wavefunction of a fermion as \[\psi = \begin{pmatrix}
\psi_+ \\
\psi_-
\end{pmatrix}\] which lives in the \(\mathbf{2}\) representation of \(SU(2)\) under rotations. If you have a system composed of two fermions \(\psi_1\) and \(\psi_2\), the total wavefunction \(\Psi\) is a tensor product of the two wavefunctions \[\Psi = \psi_1 \otimes \psi_2\] and hence lives in the \(\mathbf{2} \otimes \mathbf{2}\). Decomposing this into irreducible representations, we find a singlet and a triplet of wavefunctions. This is what physicists sometimes call ‘addition of angular momentum’ and it explains why a Helium atom or Positronium have a singlet (‘para-’) or triplet (‘ortho-’) behavior under rotations. Note that the triplet we found behaves just as the adjoint of \(SU(2)\), which corresponds to the usual defining (‘vector’) representation of \(SO(3)\).
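The singlet in \(\mathbf{2} \otimes \mathbf{2}\) can be exhibited concretely. In the basis \(e_i \otimes f_j\) (in the ordering used by NumPy's Kronecker product) the antisymmetric combination \((e_1 \otimes f_2 - e_2 \otimes f_1)/\sqrt{2}\) is invariant under \(g \otimes g\) precisely because \(\det g = 1\). A small sketch of mine (the \(SU(2)\) sampler is ad hoc):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_su2(rng):
    # parametrize SU(2) as [[a, b], [-conj(b), conj(a)]] with |a|^2 + |b|^2 = 1
    x = rng.normal(size=4)
    x /= np.linalg.norm(x)
    a, b = x[0] + 1j * x[1], x[2] + 1j * x[3]
    return np.array([[a, b], [-b.conjugate(), a.conjugate()]])

g = random_su2(rng)

# the antisymmetric ('para') combination, in the ordering e1f1, e1f2, e2f1, e2f2
singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2.0)

# the two-particle state transforms with g ⊗ g, realized by the Kronecker product
assert np.allclose(np.kron(g, g) @ singlet, singlet)  # invariant: this is the 1
```

Acting with \(g \otimes g\) on the antisymmetric tensor \(\epsilon_{ij}\) produces \(\det(g)\, \epsilon_{ij}\), which is why the check succeeds for any \(g \in SU(2)\).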
quarks
The two lightest quarks are the up and the down quark. They have nearly identical masses, but differ in their electric charges. They can form bound states, and many different ones have been found since the 1940s. Until the discovery of quarks, people were bewildered by how many there were and how to organize them into some sort of pattern. The simplest bound states, called mesons, contain only two quarks (a quark and an antiquark). The force binding them together is the strong nuclear force, which is a lot stronger than electromagnetism, and from the perspective of the strong force, the up and the down quark look identical if we forget the small mass difference. We can combine their wavefunctions into one \[\psi_q = \begin{pmatrix}
\psi_u \\
\psi_d
\end{pmatrix}\] and the statement that they are identical in strong interactions means there is an \(SU(2)\) symmetry acting in the \(\mathbf{2}\) on \(\psi_q\). Because this looks exactly like the \(SU(2)\) action of rotations on fermions, it was called ‘iso-spin’, which is a terrible name as there is no other relation to spin than this. But this means that bound states of e.g. a quark and an anti-quark transform in the \(\mathbf{2} \otimes \bar{\mathbf{2}} = \mathbf{1} \oplus \mathbf{3}\) representation. The states in the triplet are called ‘pions’ \((\pi^+, \pi^-, \pi^0)\).
In fact, there is a third quark called the strange quark which also has (almost) the same mass as up and down. This enhances the \(SU(2)\) to an \(SU(3)\) and we should be studying \(\mathbf{3} \otimes \bar{\mathbf{3}} = \mathbf{1} \oplus \mathbf{8}\) for mesons and \[\mathbf{3} \otimes \mathbf{3} \otimes \mathbf{3} = \mathbf{1} \oplus \mathbf{8} \oplus \mathbf{8} \oplus \mathbf{10}\] for Baryons, which are made up of three quarks.
Historically, this way of thinking was in fact used to motivate the existence of quarks by Murray Gell-Mann and Yuval Ne’eman in 1961, as they saw that the observed particles could be fit into this pattern. They called it the ‘Eightfold way’ in a nod to Buddhism and since the adjoint of \(SU(3)\) is eight-dimensional. However, one particle in the \(\mathbf{10}\) had not been seen in experiments yet, so they predicted it. It is now called the \(\Omega^-\) and was discovered in 1964, which among other things earned Gell-Mann a Nobel prize in 1969. If you want to read more about this story, popular accounts of the Eightfold way are a good starting point.
The Lorentz group is one of the most important examples of a Lie group appearing in physics. It arises in a very similar way to most of the groups we have discussed so far, as a symmetry group that respects some quadratic form, in this case the ‘invariant length’ of special relativity. A detailed account of many elementary aspects of the Lorentz group can be found in most textbooks on special relativity.
The fundamental postulate of relativity is that the speed of light is the same in all inertial frames. Let us take two points \(p\) and \(q\) in space-time through which a ray of light passes and assume that they have coordinates \(t_p,\boldsymbol{x}_p\) and \(t_q,\boldsymbol{x}_q\) in one inertial frame, and coordinates \(t_p',\boldsymbol{x}_p'\) and \(t_q',\boldsymbol{x}_q'\) in another. We hence need \[c^2 = (\boldsymbol{x}_p - \boldsymbol{x}_q)^2/(t_p-t_q)^2 = (\boldsymbol{x}_p' - \boldsymbol{x}_q')^2/(t_p'-t_q')^2 \, .\] In other words \[- c^2(t_p-t_q)^2 + (\boldsymbol{x}_p - \boldsymbol{x}_q)^2 = 0\] must be invariant under a change of frames. It is not hard to come up with coordinate transformations that satisfy this requirement, e.g. a rotation \(\in SO(3)\) acting purely on the coordinates \(\boldsymbol{x}\) works. If time is involved in our coordinate change, we need to take the relative minus sign into account. An example would be acting with the matrix \[\begin{equation} \label{eq:simp_boost} \Lambda_{01} = \begin{pmatrix} \cosh (\lambda) & -\sinh (\lambda) & 0 & 0 \\ -\sinh (\lambda) & \cosh (\lambda) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\, . \end{equation}\] This keeps \(-(ct)^2 + x_1^2\) invariant as \[\begin{aligned} -(ct)^2 + x_1^2 &\rightarrow -(ct')^2 + (x_1')^2 \\ &=- (\cosh (\lambda) \,\, ct - \sinh(\lambda)\,\, x_1 )^2 + (-\sinh (\lambda)\,\, c t + \cosh(\lambda)\,\, x_1 )^2 \\ & = -(ct)^2 (\cosh^2 (\lambda) - \sinh^2 (\lambda)) + x_1^2 (\cosh^2 (\lambda) - \sinh^2 (\lambda))\\ &= -(ct)^2 + x_1^2 \end{aligned}\] as \(\cosh^2 (\lambda) - \sinh^2 (\lambda)=1\) for any \(\lambda\) (this is the hyperbolic analogue of \(\cos^2\phi + \sin^2\phi =1\)).
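The \(\cosh^2 - \sinh^2\) computation above is easy to confirm numerically. The following sketch (mine; the rapidity and the event chosen are arbitrary) checks that \(\Lambda_{01}\) preserves the full quadratic form \(-(ct)^2 + x_1^2 + x_2^2 + x_3^2\).

```python
import numpy as np

lam = 0.7  # an arbitrary rapidity
ch, sh = np.cosh(lam), np.sinh(lam)
L01 = np.array([[ch, -sh, 0.0, 0.0],
                [-sh, ch, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])

eta = np.diag([-1.0, 1.0, 1.0, 1.0])      # the matrix encoding the quadratic form
x = np.array([2.0, 0.5, -1.0, 3.0])       # an arbitrary event (ct, x1, x2, x3)

xp = L01 @ x
assert np.isclose(xp @ eta @ xp, x @ eta @ x)  # -(ct)^2 + |x|^2 is unchanged
```

Equivalently, one can check the matrix identity \(\Lambda_{01}^T \eta \Lambda_{01} = \eta\), which is the form we will meet again below.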
Note that the origin of the primed system at \(x_1'=0\) satisfies \[-\sinh (\lambda)\,\, c t + \cosh(\lambda)\,\, x_1 = 0\] so that it moves in the unprimed system with a velocity \[v = x_1/t = c \, \frac{\sinh(\lambda)}{\cosh(\lambda)} = c \tanh(\lambda) = c \,\, \frac{e^\lambda - e^{-\lambda}}{e^\lambda + e^{-\lambda}} < c\, .\] For this reason \(\lambda\) is called the rapidity in the literature. Note that for every \(\lambda\), this speed is always less than the speed of light. Instead of using such transformations to figure out time dilation, length contraction, etc., we are going to examine the structure of
Definition 3.1. The Lorentz group \(L\) is the group of linear maps on \(\mathbb{R}^4\) (with coordinates \((x^0,x^1,x^2,x^3)\)) that preserve the quadratic form \[|x|_M^2 \equiv - (x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2\]
REMARK: \(\mathbb{R}^4\) with this quadratic form is also often called \(\mathbb{R}^{1,3}\) or ‘Minkowski space’. It is then appropriate to call the Lorentz group \(O(1,3)\). We have already learned that the principle of relativity is obeyed by (at least) two types of transformations: rotations in \(\mathbb{R}^3\) which leave time untouched, and boosts such as \(\eqref{eq:simp_boost}\) which mediate between relatively moving systems.
Note that \(|x|_M^2\) is not an inner form as it is not positive definite.
For two coordinate systems with relative velocity \(\boldsymbol{v}\) the coordinate change is
Definition 3.2. A boost associated with two relatively moving inertial frames with relative velocity \(\boldsymbol{v}\) is a Lorentz transformation \(B\) with \(B(\boldsymbol{v})^0_{\,\,0} = \cosh{\lambda}\), \(B(\boldsymbol{v})^i_{\,\,0} = B(\boldsymbol{v})^0_{\,\,i} = -v^i/c \,\,\cosh{\lambda}\), and \[\begin{equation} \label{eq:boostij} B(\boldsymbol{v})^i_{\,\,k} = \delta^{i}_{\,\,k} + \frac{(\cosh \lambda)^2}{1+ \cosh \lambda} \frac{v^i v^k}{c^2}\, , \end{equation}\] where \(\tanh \lambda = |\boldsymbol{v}|/c\).
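A quick sanity check of Definition 3.2 (a sketch of mine; the function name and the particular velocity are my own choices): build \(B(\boldsymbol{v})\) from the component formulas, using \(\cosh\lambda = 1/\sqrt{1 - |\boldsymbol{v}|^2/c^2}\), and verify that it preserves the Minkowski quadratic form.

```python
import numpy as np

def boost(beta):
    """Boost B(v) from Definition 3.2, for beta = v/c a 3-vector with |beta| < 1."""
    beta = np.asarray(beta, dtype=float)
    ch = 1.0 / np.sqrt(1.0 - beta @ beta)  # cosh(lambda), since tanh(lambda) = |beta|
    B = np.empty((4, 4))
    B[0, 0] = ch
    B[0, 1:] = B[1:, 0] = -ch * beta       # B^0_i = B^i_0 = -(v^i/c) cosh(lambda)
    B[1:, 1:] = np.eye(3) + ch**2 / (1.0 + ch) * np.outer(beta, beta)
    return B

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
B = boost([0.3, -0.2, 0.5])
assert np.allclose(B.T @ eta @ B, eta)     # B is indeed a Lorentz transformation
```

The prefactor in the spatial block works out because \(\cosh^2\lambda / (1 + \cosh\lambda) \cdot \tanh^2\lambda = \cosh\lambda - 1\), reproducing the familiar \(\gamma\)-factor form of a boost.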
In order to facilitate the book-keeping of the minus sign in this definition, the following notation is in widespread use. Define \((x^0,x^1,x^2,x^3) = (ct,x,y,z)\) as the ‘four-vector’ of coordinates combining spatial coordinates and time. Define \[x_\mu \equiv \eta_{\mu \nu} x^\nu\] where \(\eta_{\mu \nu}\) are the components of the diagonal matrix \[\eta = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix}\,\] and we are using the summation convention. The inverse of the matrix \(\eta\) is clearly \(\eta\) again; we put its indices up, as this version satisfies \[x^\mu = \eta^{\mu \nu} x_\nu\] where \(\eta^{\mu \nu} = \left(\mbox{diag}(-1,1,1,1)\right)^{\mu \nu}\). Note that \[\eta_{\mu \nu} \eta^{\nu \rho} = \delta_\mu{}^\rho \, ,\] where \(\delta_\mu{}^\rho\) is the usual Kronecker delta which is \(1\) if both indices are equal and zero otherwise. We can hence write the length \(|x|_M\) of a vector in Minkowski space as \[|x|_M^2 = x^\mu x^\nu \eta_{\mu \nu} = x_{\mu} x^{\mu} = x_\mu x_\nu \eta^{\mu \nu} \, .\]
Let \(\Lambda\) have components \(\Lambda^\mu_{\,\,\nu}\) and assume \(\Lambda\) linearly maps a 4-vector \(\boldsymbol{x}\) to a 4-vector \(\boldsymbol{x}'\) \[x^\mu{}' = \Lambda^\mu_{\,\,\sigma} x^\sigma \, .\] Now if \(\Lambda\) is in the Lorentz group we need \(|x'|_M^2 = |x|_M^2\), i.e. \[|x'|_M^2 = \Lambda^\mu_{\,\,\sigma} x^\sigma \Lambda^\nu_{\,\,\rho} x^\rho \eta_{\mu \nu} =
x^\sigma x^\rho \Lambda^\mu_{\,\,\sigma} \eta_{\mu \nu} \Lambda^\nu_{\,\,\rho}=
x^\mu x^\nu \eta_{\mu \nu} \, .\] In other words \[\Lambda^\mu_{\,\,\sigma} \Lambda^\nu_{\,\,\rho}\eta_{\mu \nu} = \eta_{\sigma \rho}\] or in matrix notation \[\Lambda^T \eta \Lambda = \eta \, \Rightarrow \eta \,\Lambda^T \eta = \Lambda^{-1}\] Up to the insertion of \(\eta\)s \(\Lambda^T\) is hence the same as \(\Lambda^{-1}\). Note that we have the transformation behaviour \[\begin{aligned}
x^\mu & \rightarrow x'^\mu = \Lambda^\mu_{\,\, \nu} x^\nu \\
x_\mu = \eta_{ \mu \rho }x^\rho & \rightarrow x'_\mu = \eta_{ \mu \rho } x'^\rho = \eta_{ \mu \rho } \Lambda^\rho_{\,\, \nu} x^\nu
= \eta_{ \mu \rho } \Lambda^\rho_{\,\, \nu} \eta^{\nu \sigma} x_\sigma = x_\sigma (\eta \Lambda^T \eta)^\sigma_{\,\,\mu} = x_\sigma (\Lambda^{-1})^\sigma_{\,\,\mu}
\end{aligned}\] That’s how it had to be, as we constructed Lorentz transformations in such a way that \(x_\mu x ^\mu\) is invariant!
Objects \(x^\mu\) transforming as above are called ‘Lorentz vectors’. Objects transforming like \(x_\mu\) are called ‘Lorentz covectors’. We can think of the matrix \(\eta\) as a map which sends every vector to a covector and vice-versa.
Whenever we contract upper and lower indices, we hence get something that is invariant under the Lorentz group. By extension, it is customary to put upper/lower indices on objects that have the same transformation behaviour as \(x^\mu\) and \(x_\mu\). The same rule for constructing invariants then applies there as well. The positioning of indices hence serves as a book-keeping device for the transformation behaviour and consequently for the construction of Lorentz scalars, i.e. invariant quantities.
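This book-keeping can be illustrated numerically (my own sketch, using the boost \(\eqref{eq:simp_boost}\) as the Lorentz transformation): lowered components transform with \((\Lambda^{-1})^T\), and contracting an upper with a lower index produces an invariant.

```python
import numpy as np

lam = 0.6
ch, sh = np.cosh(lam), np.sinh(lam)
L = np.array([[ch, -sh, 0.0, 0.0],
              [-sh, ch, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(3)
x, y = rng.normal(size=4), rng.normal(size=4)

x_low = eta @ x                # x_mu = eta_{mu nu} x^nu
xp, yp = L @ x, L @ y          # vectors transform with Lambda
xp_low = eta @ xp

# covectors transform with the inverse transpose, as derived above
assert np.allclose(xp_low, np.linalg.inv(L).T @ x_low)
# contracting an upper with a lower index gives a Lorentz scalar
assert np.isclose(xp_low @ yp, x_low @ y)
```

The two assertions are the numerical counterparts of \(x'_\mu = x_\sigma (\Lambda^{-1})^\sigma_{\,\,\mu}\) and of the invariance of \(x_\mu y^\mu\).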
3.1. Consider a Lorentz vector with components \(x^\mu\), which transforms under Lorentz transformations as \[x^\mu \rightarrow x'^{\mu} = \Lambda^\mu_{\,\,\, \nu} x^\nu \, .\] Note that throughout this problem we are using summation convention.
Let \(f^{\mu \nu} \equiv x^\mu x^\nu\). Find the transformation behavior of \(f^{\mu \nu}\), \(f^{\mu}_{\,\,\,\, \nu}=x^\mu x_\nu\) and \(f_{\mu \nu}= x_\mu x_\nu\) under Lorentz transformations.
For another Lorentz vector \(y^\mu\), find the transformation behavior of \(f^{\mu \nu} y_\mu\) under Lorentz transformations.
Compute \[\sum_\mu \frac{\partial}{\partial x^\mu} x^\mu \, .\]
Work out the transformation behavior of \[\frac{\partial}{\partial x^\mu}\] under Lorentz transformations. Use c) to argue for the same result.
Let us now examine the global structure of the Lorentz group \(L\). Clearly, the determinant of \(\Lambda\) is \(\pm 1\), so that we get two disconnected components \(L_\pm\), just as for \(O(3)\). The component \(L_+\) with \(\det \Lambda = +1\) is called the proper Lorentz group. Furthermore the \((0,0)\) component of \(\eta \Lambda^T \eta \Lambda = \mathds{1}\) implies \[\begin{equation} \label{eq:1strowoflambda} 1 = \left(\Lambda^{0}_{\,\, 0 }\right)^2 - \left(\Lambda^{0}_{\,\, 1 }\right)^2 - \left(\Lambda^{0}_{\,\, 2 }\right)^2 - \left(\Lambda^{0}_{\,\, 3 }\right)^2 \, , \end{equation}\] so that \(\left( \Lambda^{0}_{\,\, 0}\right) ^2 \geq 1\), which again gives two components:
\(L^\uparrow\), where \(\Lambda^{0}_{\,\, 0} \geq 1\): these are called the orthochronous Lorentz transformations.
\(L^\downarrow\), where \(\Lambda^{0}_{\,\, 0} \leq -1\): these are called the non-orthochronous Lorentz transformations.
The orthochronous transformations keep the arrow of time pointing in the same direction. Altogether we hence have four components. The maps \(\Lambda_T =\mbox{diag}(-1,1,1,1)\) (time reversal) and \(\Lambda_P =\mbox{diag}(1,-1,-1,-1)\) (parity) generate the whole group together with \(L_+^\uparrow\): we can use \(\Lambda_T\), \(\Lambda_P\) and \(\Lambda_T \Lambda_P\) to map any group element to \(L_+^\uparrow\), which implies we can write any group element in \(L\) as a product of \(\Lambda \in L_+^\uparrow\) with \(\Lambda_T^a \Lambda_P^b\) for \(a,b \in \{0,1\}\).
The component of \(L\) that is continuously connected to the identity is the proper orthochronous Lorentz group \(L_+^\uparrow\). \(L_+^\uparrow\) admits the following decomposition
Theorem 3.1. \(^\ast\) Every proper orthochronous Lorentz transformation \(\Lambda \in L_+^\uparrow\) has a unique decomposition as \[\Lambda = B(\boldsymbol{v}) \begin{pmatrix} 1 & \\ & R \end{pmatrix}\] where \(B(\boldsymbol{v})\) is a boost with parameter \[v^i/c = \Lambda^{i}_{\,\, 0} / \Lambda^{0}_{\,\, 0}\] and \(R\) is an element of \(SO(3)\) given by \[R^{ik} = \Lambda^i_{\,\, k} - \frac{1}{1+\Lambda^0_{\,\, 0}} \Lambda^i_{\,\, 0} \Lambda^0_{\,\, k}\, .\]
: First of all, it follows from \(\eqref{eq:1strowoflambda}\) that \(\sum_i (\Lambda^{i}_{\,\, 0} / \Lambda^{0}_{\,\, 0})^2 < 1\) as \[\sum_i (\Lambda^{i}_{\,\, 0} / \Lambda^{0}_{\,\, 0})^2 = \frac{(\Lambda^0_{\,\,0})^2-1}{(\Lambda^0_{\,\,0})^2} < 1 \, .\] A boost associated to the velocity \(\boldsymbol{v}\) hence makes sense. From Definition 3.2 above it follows that \(B^0_{\,\, 0} (\boldsymbol{v}) = \cosh \lambda = \Lambda^0_{\,\, 0}\) and \(B^0_{\,\, i} (\boldsymbol{v}) = -v^i/c \cosh \lambda = \Lambda^0_{\,\, i}\). Hence \[B^i_{\,\, j}(\boldsymbol{v}) = \delta^i_{\,\, j} + \frac{1}{1+\Lambda^0_{\,\,0}} \Lambda^0_{\,\,i}\Lambda^0_{\,\,j}\] using \(\eqref{eq:boostij}\). We now show that \[\mathcal{R} := B(-\boldsymbol{v}) \Lambda = B^{-1}(\boldsymbol{v}) \Lambda\] is indeed a rotation and \(\mathcal{R} = 1 \oplus R\), which finishes the proof. We work out \[\begin{aligned}
\mathcal{R}^{0}_{\,\, 0} &= (\Lambda^0_{\,\,0})^2 - \sum_i (\Lambda^i_{\,\,0})^2 = 1\\
\mathcal{R}^{0}_{\,\, i} &= \Lambda^0_{\,\,0}\Lambda^0_{\,\,i} - \sum_j \Lambda^j_{\,\,0}\Lambda^j_{\,\,i} = 0 \\
\mathcal{R}^{i}_{\,\, k} &= \Lambda^i_{\,\, k} - \frac{1}{1+\Lambda^0_{\,\, 0}} \Lambda^i_{\,\, 0} \Lambda^0_{\,\, k}
\end{aligned}\, .\] Here we used \(\Lambda^T \eta \Lambda = \eta\) repeatedly. This is a rotation with the right block-diagonal structure as claimed. \(\square\)
To understand the global structure of \(L_+^\uparrow = SO(1,3)_+\), we can repeat the trick we used when describing the relationship between \(SO(3)\) and \(SU(2)\). For a 4-vector \((x^0,x^1,x^2,x^3)\) we write it as a matrix \(M_x\) with \(M_x^\dagger = M_x\): \[M_x := \begin{pmatrix} x^0 + x^3 & x^1 - ix^2 \\ x^1 + i x^2 & x^0-x^3 \end{pmatrix}\, .\] We can now formulate a map \(SL(2,\mathbb{C}) \rightarrow L\) by sending \(g \in SL(2,\mathbb{C})\) \[g \rightarrow F(g) \hspace{1cm} F(g) M_x := g M_x g^\dagger \, .\]
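The key identity behind this construction is \(\det M_x = -|x|_M^2\), which any \(g \in SL(2,\mathbb{C})\) preserves since \(\det(g M_x g^\dagger) = |\det g|^2 \det M_x = \det M_x\). A numerical sketch of mine (the rescaling used to produce a determinant-one matrix is an ad-hoc trick):

```python
import numpy as np

def M(x):
    """The hermitian matrix M_x associated to the 4-vector x = (x0, x1, x2, x3)."""
    x0, x1, x2, x3 = x
    return np.array([[x0 + x3, x1 - 1j * x2],
                     [x1 + 1j * x2, x0 - x3]])

rng = np.random.default_rng(4)
x = rng.normal(size=4)

# det M_x = (x0)^2 - (x1)^2 - (x2)^2 - (x3)^2 = -|x|_M^2
assert np.isclose(np.linalg.det(M(x)).real,
                  x[0]**2 - x[1]**2 - x[2]**2 - x[3]**2)

g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
g = g / np.sqrt(np.linalg.det(g))        # rescale so that det g = 1
Mp = g @ M(x) @ g.conjugate().T

assert np.allclose(Mp, Mp.conjugate().T)             # still hermitian: some M_x'
assert np.isclose(np.linalg.det(Mp).real,
                  np.linalg.det(M(x)).real)          # |x'|_M^2 = |x|_M^2
```

Hermiticity guarantees that \(g M_x g^\dagger\) is again of the form \(M_{x'}\) for some real 4-vector \(x'\), so \(F(g)\) really is a map on Minkowski space.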
Proposition 3.1. \(F\) is a surjective group homomorphism from \(SL(2,\mathbb{C})\) to \(L_+^\uparrow\).
:
3.2.
Show that \(F\) is a surjective homomorphism from \(SL(2,\mathbb{C})\) to \(L_+^\uparrow\).
hint: Try to follow a similar logic as for the homomorphism from \(SU(2)\) to \(SO(3)\) studied before. You can take for granted that \(SL(2,\mathbb{C})\) is connected.
For a rotation in the \(x^1,x^2\)-plane, find the element \(g \in SL(2,\mathbb{C})\) that is mapped to it by \(F\). Repeat the same for a boost along the \(x^1\) direction.
Finally, we can work out the Lie algebra of the Lorentz group. As we have seen, a general Lorentz transformation is uniquely given in terms of an element of \(SO(3)\) (which is real three-dimensional) and a boost (which is parametrized by a real three-dimensional vector \(\boldsymbol{v}\)). We hence conclude that the Lorentz group is a real six-dimensional manifold. This fits with the fact that a real \(4 \times 4\) matrix has \(16\) components and \(\Lambda^T \eta \Lambda = \eta\) imposes \(10\) independent constraints. Using rotation and boost matrices like \(\eqref{eq:simp_boost}\) with parameters gives us paths in the group, and we find that the Lie algebra is generated by the six matrices \[\begin{aligned} l^{01} = \begin{pmatrix} 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix}\,\,\, l^{02} = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix} \,\,\, l^{03} = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ \end{pmatrix} \\ l^{12} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix}\,\,\, l^{13} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ \end{pmatrix} \,\,\, l^{23} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ \end{pmatrix} \end{aligned}\] These can be summarized by \[\begin{equation} \label{eq:ellcomponents} (l^{\mu\nu})^\alpha_{\,\, \beta} = \eta^{\mu \alpha} \delta^\nu_{\,\, \beta} - \eta^{\nu \alpha} \delta^\mu_{\,\, \beta} \, . \end{equation}\] Note that \(\mu\) and \(\nu\) in the equation above label different elements of the Lie algebra, and \(\alpha,\beta\) are the components of the corresponding matrix.
3.3. Verify that the matrices above are elements in the Lie algebra of the Lorentz group.
After a slightly tedious computation one finds that they obey the Lie algebra \[[l^{\mu\nu},l^{\rho\sigma}] = -\eta^{\mu \rho} l^{\nu \sigma} -\eta^{\nu \sigma} l^{\mu \rho} + \eta^{\mu \sigma} l^{\nu \rho} + \eta^{\nu \rho} l^{\mu \sigma}\, .\]
Let us now investigate representations of the Lorentz group. We have already seen the defining representation: \[x^\mu \rightarrow \Lambda^\mu_{\,\,\nu} x^\nu\] with \[\Lambda^T \eta \Lambda = \eta\] so that \[x^\mu x_\mu = x^\mu \eta_{\mu \nu} x^\nu = - (x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2\] stays invariant. Now we ask about other representations of this group. Note that \(SO(3)\) is a subgroup of \(L_+^\uparrow\), and that the fundamental representation of its spin group, \(SU(2)\), had physical significance as a spinor.
As \(SO_+(1,3)= L_+^\uparrow\) has \(SL(2,\mathbb{C})\) as a double covering group (Proposition 3.1), it will not be surprising that we make the
Definition 3.3. The group \(Spin(1,3)\) is equal to the group \(SL(2,\mathbb{C})\).
And it is again a fact of life that what matters for describing relativistic processes in the real world are representations of \(SL(2,\mathbb{C}) = Spin(1,3)\) rather than representations of \(L\).
For \(SO(3)\) we found irreducible representations by using the Lie algebra of \(SO(3)\), which is the same as the Lie algebra of \(SU(2)\). Not all representations of this algebra descended to representations of \(SO(3)\), but the extra representations we found were exactly the ‘spin 1/2’ spinorial representations of \(SU(2)\) of physical significance. We can use a similar strategy here, which leads us to what are called spinors of the Lorentz group. Our presentation of spinors mostly follows , see also . Note, however, that these books use somewhat different conventions.
Recall the Lorentz algebra \[\begin{equation} \label{eq:Lalgebra}
[l^{\mu\nu},l^{\rho\sigma}] = -\eta^{\mu \rho} l^{\nu \sigma} -\eta^{\nu \sigma} l^{\mu \rho}
+ \eta^{\mu \sigma} l^{\nu \rho} + \eta^{\nu \rho} l^{\mu \sigma}\, . \end{equation}\]
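The slightly tedious computation can also be delegated to a computer. Here is a numerical sketch (my addition, assuming numpy) that checks the Lorentz algebra for all \(4^4\) index combinations:

```python
import numpy as np
from itertools import product

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
delta = np.eye(4)

def l(mu, nu):
    # generator components from eq. (eq:ellcomponents)
    return np.outer(eta[mu], delta[nu]) - np.outer(eta[nu], delta[mu])

def comm(a, b):
    return a @ b - b @ a

# check [l^{mu nu}, l^{rho sigma}] against the Lorentz algebra (eq:Lalgebra)
for mu, nu, rho, sig in product(range(4), repeat=4):
    lhs = comm(l(mu, nu), l(rho, sig))
    rhs = (-eta[mu, rho] * l(nu, sig) - eta[nu, sig] * l(mu, rho)
           + eta[mu, sig] * l(nu, rho) + eta[nu, rho] * l(mu, sig))
    assert np.allclose(lhs, rhs)
```

Since \(\eta\) is diagonal, the metric factors on the right-hand side are just scalars \(\pm 1\) or \(0\), which is why most terms drop out for any concrete index choice.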
Proposition 3.2. Let \(\gamma^\mu\), \(\mu=0,1,2,3\) be matrices that obey the algebra \[\{\gamma^\mu,\gamma^\nu\} := \gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2 \eta^{\mu\nu} \mathds{1}\, .\] Then we can construct a representation of the Lorentz algebra, \(\eqref{eq:Lalgebra}\), using the matrices \[S^{\mu \nu} := \tfrac14 [\gamma^\mu,\gamma^\nu]\, .\]
: we need to check that the \(S^{\mu \nu}\) satisfy the Lorentz algebra. First note that the relation \(\{\gamma^\mu,\gamma^\nu\} = 2\eta^{\mu \nu}\) implies that \[\gamma^\mu \gamma^\nu = -\gamma^\nu \gamma^\mu \hspace{1cm} \mbox{for} \,\,\, \mu \neq \nu\] and \[(\gamma^\mu)^2 = \eta^{\mu \mu} \mathds{1}\hspace{1cm} \mbox{(no summation)}\, .\] We can now work out the commutator \([S^{\mu \nu},S^{\rho \sigma}]\). We may take \(\mu \neq \nu\) and \(\rho \neq \sigma\), as the \(S\) otherwise vanish (as do the corresponding \(l\)). Hence \(S^{\mu \nu} = \tfrac12 \gamma^\mu \gamma^\nu\) and \(S^{\rho \sigma} = \tfrac12 \gamma^\rho \gamma^\sigma\). Let us first assume that \(\mu,\nu,\rho,\sigma\) are all different. We get \[\begin{aligned} (\mu,\nu,\rho,\sigma \hspace{.2cm} \mbox{all different}):\\ [S^{\mu \nu},S^{\rho \sigma}] &= \frac{1}{4} \left( \gamma^\mu \gamma^\nu \gamma^\rho \gamma^\sigma - \gamma^\rho \gamma^\sigma {\color{red}\gamma^\mu} \gamma^\nu \right) \\ & \hspace{3.7cm} {\color{red}\swarrow} \\ &= \frac{1}{4} \left( \gamma^\mu \gamma^\nu \gamma^\rho \gamma^\sigma - {\color{red}\gamma^\mu}\gamma^\rho \gamma^\sigma {\color{blue}\gamma^\nu} \right) \\ & \hspace{4.1cm} {\color{blue}\swarrow} \\ &= \frac{1}{4} \left( \gamma^\mu \gamma^\nu \gamma^\rho \gamma^\sigma - \gamma^\mu {\color{blue}\gamma^\nu} \gamma^\rho \gamma^\sigma \right) = 0 \end{aligned}\] As the colours and arrows are supposed to show you, this looks more complicated than it is. All we have done in the second equality is swap \(\gamma^\mu\) past \(\gamma^\rho\) and \(\gamma^\sigma\), which produced two minus signs, hence no sign at all. In the third equality we did the same with \(\gamma^\nu\). This is the same as what \(\eqref{eq:Lalgebra}\) tells us: when all four indices are different, the right-hand side vanishes.
Now we assume that \(\mu = \rho\) (note that there is no summation over \(\mu\) in the expressions below): \[\begin{aligned}
(\mu = \rho): \hspace{.5cm} [S^{\mu \nu},S^{\rho \sigma}] &= \frac{1}{16} \left[[\gamma^\mu,\gamma^\nu],[\gamma^\rho,\gamma^\sigma] \right]
= \frac{1}{16} \left[[\gamma^\mu,\gamma^\nu],[\gamma^\mu,\gamma^\sigma] \right] \\
&= \frac{1}{16}\left[2 \gamma^\mu \gamma^\nu,2\gamma^\mu \gamma^\sigma \right] =
\frac{1}{4} \left(\gamma^\mu \gamma^\nu \gamma^\mu \gamma^\sigma - \gamma^\mu \gamma^\sigma \gamma^\mu \gamma^\nu \right)\\
&= \frac{1}{4} \left(-(\gamma^\mu)^2\gamma^\nu \gamma^\sigma + (\gamma^\mu)^2\gamma^\sigma\gamma^\nu \right) =
-\eta^{\mu \mu} S^{\nu \sigma} \, .
\end{aligned}\] Here we only had to swap \(\gamma^\mu\) with \(\gamma^\nu\) in the first term and with \(\gamma^\sigma\) in the second term, each giving a minus sign. The final result is exactly what we find from \(\eqref{eq:Lalgebra}\) when \(\mu = \rho\). The remaining cases can be worked out analogously. \(\square\)
REMARK: Algebras of the type \(\{\gamma^a,\gamma^b \}= 2 \eta^{ab}\) where \(\eta^{ab}\) is a symmetric diagonal matrix with entries \(\pm 1\) are called ‘Clifford algebras’. We have already seen an example when discussing the Pauli matrices: the Pauli matrices obey a Clifford algebra generated by three elements with \(\eta^{ab}= \mbox{diag}(1,1,1)\).
When trying to find explicit examples of the four \(\gamma^\mu\), \(\mu=0,1,2,3\), the above remark is a useful hint. It turns out we need at least \(4 \times 4\) matrices, and one possible choice is
Definition 3.4. The Dirac matrices are \[\gamma^0 = \begin{pmatrix} 0 & \mathds{1}_{2 \times 2} \\ -\mathds{1}_{2 \times 2} & 0 \end{pmatrix}\, , \hspace{1cm} \gamma^i = \begin{pmatrix} 0 & \sigma_i \\ \sigma_i & 0 \end{pmatrix}\,\,\, i = 1,2,3\] where \(\mathds{1}\) is the \(2\times 2\) identity matrix and \(\sigma_i\) are the Pauli matrices \[\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \,\,, \hspace{.3cm} \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \,\,, \hspace{.3cm} \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\, .\] Note that the \(\gamma^\mu\) are \(4 \times 4\) matrices which we have written in a \(2 \times 2\) block structure using the \(2 \times 2\) Pauli matrices.
Proposition 3.3. The Dirac matrices obey \(\{\gamma^\mu,\gamma^\nu\} = 2 \eta^{\mu \nu} \mathds{1}_{4 \times 4}\)
:
3.4. .
Show that the Dirac matrices obey \(\{\gamma^\mu,\gamma^\nu\} = 2 \eta^{\mu \nu} \mathds{1}_{4 \times 4}\).
Show the ‘freshers dream’: \[\left(a_\mu \gamma^\mu\right)^2 = a_\mu a^\mu \mathds{1}_{4 \times 4}\]
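If you want to double-check your pencil-and-paper computation, here is a numerical sketch (my addition, not a substitute for the exercise) verifying both parts in the chiral representation defined above:

```python
import numpy as np

I2 = np.eye(2)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
Z2 = np.zeros((2, 2))

# chiral (Weyl) representation of the Dirac matrices, in 2x2 blocks
gamma = [np.block([[Z2, I2], [-I2, Z2]]).astype(complex)]
gamma += [np.block([[Z2, s], [s, Z2]]) for s in (s1, s2, s3)]

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
I4 = np.eye(4)

# Clifford algebra: {gamma^mu, gamma^nu} = 2 eta^{mu nu} 1
for mu in range(4):
    for nu in range(4):
        anticomm = gamma[mu] @ gamma[nu] + gamma[nu] @ gamma[mu]
        assert np.allclose(anticomm, 2 * eta[mu, nu] * I4)

# 'freshers dream': (a_mu gamma^mu)^2 = a_mu a^mu 1, for a random covector a_mu
rng = np.random.default_rng(0)
a = rng.standard_normal(4)          # components a_mu
slash_a = sum(a[mu] * gamma[mu] for mu in range(4))
a_sq = a @ eta @ a                  # a_mu a^mu (the diagonal eta is its own inverse)
assert np.allclose(slash_a @ slash_a, a_sq * I4)
```

The second check is just the Clifford relation in disguise: \((a_\mu\gamma^\mu)^2 = \tfrac12 a_\mu a_\nu \{\gamma^\mu,\gamma^\nu\} = a_\mu a^\mu \mathds{1}\).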
REMARK: This is not the only realization one can write down (and not Dirac’s original matrices). The above version is often called the ‘Weyl’ or ‘chiral’ representation.
Proposition 3.4. Using the Dirac matrices, the algebra generators \(S^{\mu \nu}\) are \[S^{0i} = \tfrac12 \begin{pmatrix} \sigma_i & 0 \\ 0 & -\sigma_i \end{pmatrix}\,\,, \hspace{1cm} S^{jk} = \frac{i}{2} \epsilon_{jkl} \begin{pmatrix} \sigma_l & 0 \\ 0 & \sigma_l \end{pmatrix}\]
:
3.5. Using the Dirac matrices, check that the algebra generators \(S^{\mu \nu} = \tfrac14 [\gamma^\mu,\gamma^\nu]\) can be written as \[S^{0i} = \frac{1}{2} \begin{pmatrix} \sigma_i & 0 \\ 0 & -\sigma_i \end{pmatrix}\,\,, \hspace{1cm} S^{jk} = \frac{i}{2} \epsilon_{jkl} \begin{pmatrix} \sigma_l & 0 \\ 0 & \sigma_l \end{pmatrix}\, .\]
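Again, a numerical cross-check of the claimed block forms is easy to set up (my sketch; it does not replace the exercise):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = [s1, s2, s3]
I2 = np.eye(2)
Z2 = np.zeros((2, 2))

gamma = [np.block([[Z2, I2], [-I2, Z2]]).astype(complex)]
gamma += [np.block([[Z2, s], [s, Z2]]) for s in sigma]

def S(mu, nu):
    # S^{mu nu} = (1/4)[gamma^mu, gamma^nu]
    return (gamma[mu] @ gamma[nu] - gamma[nu] @ gamma[mu]) / 4

# totally antisymmetric epsilon symbol
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

# S^{0i} = (1/2) diag(sigma_i, -sigma_i)
for i in range(3):
    expected = 0.5 * np.block([[sigma[i], Z2], [Z2, -sigma[i]]])
    assert np.allclose(S(0, i + 1), expected)

# S^{jk} = (i/2) eps_{jkl} diag(sigma_l, sigma_l)
for j in range(3):
    for k in range(3):
        expected = sum(0.5j * eps[j, k, l] * np.block([[sigma[l], Z2], [Z2, sigma[l]]])
                       for l in range(3))
        assert np.allclose(S(j + 1, k + 1), expected)
```

The block-diagonal form of all the \(S^{\mu\nu}\) is precisely what makes the Dirac spinor representation reducible, as discussed below Definition 3.5.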
Definition 3.5. A vector \(\Psi \in \mathbb{C}^4\) transforming under \(Spin(1,3)\) as \[\Psi \rightarrow \Psi' = e^{S^{\mu\nu} \theta_{\mu \nu}} \Psi \equiv \Lambda_{\frac{1}{2}}\Psi\,\,, \hspace{.5cm} \theta_{\mu\nu} \in \mathbb{R}\] is called a Dirac spinor.
REMARK: Note that a Dirac spinor transforms in a reducible representation, as the matrices \(S^{\mu\nu}\) are block-diagonal. The irreducible representations we find by restricting to the blocks are called
Definition 3.6. Decomposing \(\Psi = (\psi_L,\psi_R)\), the objects \(\psi_L\) and \(\psi_R\) are called left-handed, and right-handed Weyl spinors, respectively.
3.6. .
For an element \(\Lambda(\theta) = e^{l^{12}\theta}\) of the Lorentz group (\(l^{12}\) is one of the generators of the Lorentz algebra introduced in the lectures) show that \(\Lambda(0) =\Lambda(2 \pi) = \mathds{1}\). Now compare this behavior to the corresponding element of the representation acting on a Dirac spinor: \(\Lambda_{1/2}(\theta) = e^{S^{12}\theta}\).
Let \(\gamma^5:= i \gamma^0\gamma^1\gamma^2\gamma^3\). What is \(\tfrac12 \left(\gamma^5 \pm \mathds{1}\right)\Psi\) for \(\Psi\) a Dirac spinor written in terms of Weyl spinors?
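The first part of this exercise makes the double cover very concrete, and can also be checked numerically. In the following sketch (my addition) the matrix exponential is implemented by truncating its power series:

```python
import numpy as np

def expm(M, terms=60):
    """Matrix exponential via its (truncated) power series."""
    out = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ M / k
        out += term
    return out

# vector representation: generator l^{12}
l12 = np.zeros((4, 4))
l12[1, 2], l12[2, 1] = 1, -1

# spinor representation: S^{12} = (i/2) diag(sigma_3, sigma_3) in the chiral basis
s3 = np.diag([1.0, -1.0])
Z2 = np.zeros((2, 2))
S12 = 0.5j * np.block([[s3, Z2], [Z2, s3]])

theta = 2 * np.pi
assert np.allclose(expm(theta * l12), np.eye(4), atol=1e-8)   # Lambda(2 pi) = 1
assert np.allclose(expm(theta * S12), -np.eye(4), atol=1e-8)  # Lambda_{1/2}(2 pi) = -1
```

A rotation by \(2\pi\) acts trivially on vectors but multiplies a Dirac spinor by \(-\mathds{1}\); only after \(4\pi\) does the spinor return to itself.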
Having defined the ‘Dirac spinor’ representation of the (spin group of the) Lorentz group, we may ask how we can construct Lorentz scalars out of it. Let us denote the complex conjugate of \(\Psi\) by \(\Psi^*\); an obvious guess might then be \[\Psi^* \cdot \Psi = \Psi^*_I \Psi_I\] where \(\Psi_I\) are the components of \(\Psi\) and summation over \(I\) is implied. It turns out it is not quite (but almost) this easy. The problem is that \[\Lambda_{1/2}^\dagger \neq \Lambda_{1/2}^{-1}\, ,\] so that \(\Psi^* \cdot \Psi\) fails to be invariant under Lorentz transformations.
Definition 3.7. For a Dirac spinor \(\Psi\) with components \(\Psi_I\) and \(\Psi^*\) its complex conjugate, we let \[\bar{\Psi} \equiv \Psi^* \gamma^0 \hspace{1cm} \mbox{i.e.} \hspace{1cm} \bar{\Psi}_J \equiv \Psi^*_I \gamma^0_{IJ}\, .\]
Note the slight break with the general convention that a bar signifies complex conjugation, but the above notation is almost universally used, so I will follow this as well.
Proposition 3.5. For a Dirac spinor \(\Psi\) with components \(\Psi_I\) \[\bar{\Psi} \Psi = \Psi^*_I \gamma^0_{IJ} \Psi_J\] is a Lorentz scalar.
: A direct computation (see problems class) shows that \[\Lambda_{1/2}^\dagger \gamma^0 = \gamma^0 \Lambda_{1/2}^{-1} \, .\] Now we can work out \[\begin{aligned} \bar{\Psi} \Psi& = \Psi^*\gamma^0 \Psi \\ &\rightarrow \Psi^* \Lambda_{1/2}^\dagger \gamma^0 \Lambda_{1/2} \Psi = \Psi^* \gamma^0 \Lambda_{1/2}^{-1} \Lambda_{1/2} \Psi = \bar{\Psi} \Psi \, . \end{aligned}\] \(\square\)
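The key identity \(\Lambda_{1/2}^\dagger \gamma^0 = \gamma^0 \Lambda_{1/2}^{-1}\) can be spot-checked numerically for a sample boost and a sample rotation (my sketch; the general proof is the direct computation referred to above):

```python
import numpy as np

def expm(M, terms=60):
    """Matrix exponential via its (truncated) power series."""
    out = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ M / k
        out += term
    return out

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
Z2 = np.zeros((2, 2))
I2 = np.eye(2)

gamma0 = np.block([[Z2, I2], [-I2, Z2]]).astype(complex)
S01 = 0.5 * np.block([[s1, Z2], [Z2, -s1]])   # boost generator (Hermitian)
S12 = 0.5j * np.block([[s3, Z2], [Z2, s3]])   # rotation generator (anti-Hermitian)

for S in (S01, S12):
    Lam = expm(0.7 * S)       # a sample transformation with theta = 0.7
    Lam_inv = expm(-0.7 * S)
    # the identity Lambda_{1/2}^dagger gamma^0 = gamma^0 Lambda_{1/2}^{-1}
    assert np.allclose(Lam.conj().T @ gamma0, gamma0 @ Lam_inv, atol=1e-8)
```

The underlying reason the check works is that rotation generators are anti-Hermitian and commute with \(\gamma^0\), while boost generators are Hermitian and anticommute with \(\gamma^0\).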
Theorem 3.2. For a Dirac spinor \(\Psi\) with components \(\Psi_I\) the expression \[\bar{\Psi} \gamma^\mu \Psi = \Psi^*_I \gamma^0_{IJ} \gamma^\mu_{JK} \Psi_K\] transforms as a Lorentz vector.
Note that this means we can effectively take the \(^\mu\) index we gave the Dirac matrices seriously, which is the reason for this notation. Before showing this, we need an important lemma:
Lemma 3.1. The matrices \(\Lambda_{\frac{1}{2}} = e^{S^{\mu\nu} \theta_{\mu \nu}}\) satisfy \[\Lambda_{\frac{1}{2}}^{-1} \gamma^\mu \Lambda_{\frac{1}{2}} = \Lambda^\mu_{\,\, \nu}\gamma^\nu = \left(e^{\,l^{\,\rho \sigma} \theta_{\rho \sigma}} \right)^\mu_{\,\,\,\, \nu}\gamma^\nu\, .\]
: First we show that \[[\gamma^\mu, S^{\rho \sigma}] = (l^{\rho \sigma})^{\mu}_{\,\,\,\nu} \gamma^\nu\, .\] Don’t get confused by the rhs of this equation: \(\rho\) and \(\sigma\) label the matrices \(l\), and we are talking about the \(\mu\) and \(\nu\) components of that matrix. As observed earlier in the lectures, these can be written as \[(l^{\rho \sigma})^\mu_{\,\, \nu} = \eta^{\rho \mu} \delta^\sigma_{\,\, \nu} - \eta^{\sigma \mu} \delta^\rho_{\,\, \nu} \, .\] As we may take \(\rho \neq \sigma\), we have \(S^{\rho\sigma} = \tfrac12 \gamma^\rho \gamma^\sigma\). Let’s first take \(\mu \neq \rho\) and \(\mu \neq \sigma\). The rhs then vanishes, and we can then work out the lhs as \[\tfrac12[\gamma^\mu,\gamma^\rho \gamma^\sigma] = \tfrac12 (\gamma^\mu \gamma^\rho \gamma^\sigma - \gamma^\rho \gamma^\sigma \gamma^\mu) = 0\, .\] Now we take \(\mu =\rho\neq \sigma\) and compute \[(\mu =\rho): \hspace{.5cm}[\gamma^\mu,S^{\rho\sigma}] = \tfrac12[\gamma^\mu,\gamma^\mu \gamma^\sigma] = \eta^{\mu \mu}\gamma^\sigma \,\,\, (\mbox{no summation})\] which equals the rhs of what we want to show for \(\mu =\rho\neq \sigma\). Finally, we take \(\mu =\sigma\neq \rho\) and find \[(\mu =\sigma): \hspace{.5cm}[\gamma^\mu,S^{\rho\sigma}] = \tfrac12[\gamma^\mu,\gamma^\rho \gamma^\mu] = -\eta^{\mu \mu}\gamma^\rho \,\,\, (\mbox{no summation})\] which equals the rhs of what we want to show for \(\mu =\sigma\neq \rho\).
The above is equivalent to the statement that, for very small \(\theta_{\mu\nu}\), \[\begin{equation} \label{eq:proof_wonderful_eq_Lambda}
(\mathds{1}- S^{\rho \sigma} \theta_{\rho \sigma}) \gamma^\mu (\mathds{1}+ S^{\rho \sigma} \theta_{\rho \sigma}) =
\left(\delta^\mu_{\,\, \nu} + \left(l^{\rho \sigma} \theta_{\rho \sigma}\right)^\mu_{\,\, \nu}\right) \gamma^\nu \, . \end{equation}\] Let’s look at this equation from the following perspective: consider the vector space of matrices spanned by the \(\gamma^\mu\). We can write any element of this vector space as \(A := a_\mu \gamma^\mu\). The right-hand side can be understood as a linear map acting on \(A\), mapping it to \[A' = a_\mu \left(\delta^\mu_{\,\, \nu} + \left( l^{\rho \sigma} \theta_{\rho \sigma}\right)^\mu_{\,\, \nu}\right) \gamma^\nu\, ,\] and \(\eqref{eq:proof_wonderful_eq_Lambda}\) says that (for \(\theta_{\rho \sigma}\) very small) we can also write this map as \[A' = (\mathds{1}- S^{\rho \sigma} \theta_{\rho \sigma}) A (\mathds{1}+ S^{\rho \sigma} \theta_{\rho \sigma})\, .\] We can apply the same map \(n\) times to find \[(\mathds{1}- S^{\rho \sigma} \theta_{\rho \sigma})^n \gamma^\mu (\mathds{1}+ S^{\rho \sigma} \theta_{\rho \sigma})^n =
\left(\left(\mathds{1}+ l^{\rho \sigma} \theta_{\rho \sigma}\right)^n\right)^\mu_{\,\, \nu} \gamma^\nu\, ,\] and hence, replacing \(\theta_{\rho\sigma}\) by \(\theta_{\rho\sigma}/n\), also \[\lim_{n\rightarrow \infty} (\mathds{1}- S^{\rho \sigma} \theta_{\rho \sigma}/n)^n \gamma^\mu (\mathds{1}+ S^{\rho \sigma} \theta_{\rho \sigma}/n)^n =
\lim_{n\rightarrow \infty} \left(\left(\mathds{1}+ l^{\rho \sigma} \theta_{\rho \sigma}/n\right)^n\right)^\mu_{\,\, \nu} \gamma^\nu\, ,\] which shows what we wanted, using the description of the matrix exponential established before. \(\square\)
(of the theorem): We can now work out \[\bar{\Psi} \gamma^\mu \Psi \rightarrow \Psi^\ast \gamma^0 \Lambda_{1/2}^{-1} \gamma^\mu \Lambda_{1/2} \Psi = \Psi^\ast \gamma^0 \Lambda^{\mu}_{\,\,\,\nu} \gamma^\nu \Psi = \Lambda^{\mu}_{\,\,\, \nu} \bar{\Psi} \gamma^\nu \Psi\] where we have used the identity \(\Lambda_{1/2}^{-1} \gamma^\mu \Lambda_{1/2} = \Lambda^{\mu}_{\,\,\,\nu} \gamma^\nu\) shown in the lemma above. \(\square\)
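Here is a numerical spot-check of the lemma for a boost generated by \(S^{01}\) (my sketch; the value \(\theta = 0.5\) is an arbitrary choice):

```python
import numpy as np

def expm(M, terms=60):
    """Matrix exponential via its (truncated) power series."""
    out = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ M / k
        out += term
    return out

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
Z2, I2 = np.zeros((2, 2)), np.eye(2)

gamma = [np.block([[Z2, I2], [-I2, Z2]]).astype(complex)]
gamma += [np.block([[Z2, s], [s, Z2]]) for s in (s1, s2, s3)]

S01 = (gamma[0] @ gamma[1] - gamma[1] @ gamma[0]) / 4   # spinor boost generator
l01 = np.zeros((4, 4))
l01[0, 1] = l01[1, 0] = -1                              # vector boost generator

theta = 0.5
Lam_half = expm(theta * S01)
Lam_half_inv = expm(-theta * S01)
Lam_vec = expm(theta * l01)                              # Lambda^mu_nu

# Lambda_{1/2}^{-1} gamma^mu Lambda_{1/2} = Lambda^mu_nu gamma^nu
for mu in range(4):
    lhs = Lam_half_inv @ gamma[mu] @ Lam_half
    rhs = sum(Lam_vec[mu, nu] * gamma[nu] for nu in range(4))
    assert np.allclose(lhs, rhs, atol=1e-8)
```

Conjugating \(\gamma^\mu\) in the spinor representation thus reproduces exactly the vector transformation \(\Lambda^\mu_{\,\,\nu}\), which is the content of the lemma.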
Corollary 3.1. For a Lorentz vector \(a^\mu\), \(a_\mu \bar{\Psi} \gamma^\mu \Psi \equiv \bar{\Psi} \slashed{a} \Psi\) transforms as a Lorentz scalar.
: We have already seen that \(a_\mu b^\mu\) for \(a^\mu\) and \(b^\mu\) any Lorentz vectors gives us a scalar. In the theorem above we saw that \(b^\mu = \bar{\Psi} \gamma^\mu \Psi\) is a Lorentz vector, so the statement follows. \(\square\)
3.7. How does \[B^{\mu \nu} \equiv \bar{\Psi} \gamma^\mu \gamma^\nu \Psi\] transform under Lorentz transformations for \(\Psi\) a Dirac spinor?
3.8. For a Dirac spinor \(\Psi\) write \[\bar{\Psi} \gamma^\mu \Psi\] in terms of Weyl spinors.
Working with the Lie algebra \(\mathfrak{so}(1,3)\) of \(L_+^\uparrow\) reveals the following. Taking this as a Lie algebra over \(\mathbb{C}\) instead of \(\mathbb{R}\), we can define \[\begin{aligned} A_1 = \tfrac{1}{2}(-l^{23} + il^{01}) \hspace{.5cm} A_2 = \tfrac{1}{2}(l^{13} + il^{02}) \hspace{.5cm} A_3 = \tfrac{1}{2}(-l^{12} + il^{03})\\ B_1 = \tfrac{1}{2}(-l^{23} - il^{01}) \hspace{.5cm} B_2 = \tfrac{1}{2}(l^{13} - il^{02}) \hspace{.5cm} B_3 = \tfrac{1}{2}(-l^{12} - il^{03}) \end{aligned}\] These satisfy the algebra \[\begin{aligned} \label{eq:sl2csl2c} [A_i,A_j] &=\epsilon_{ijk} A_k\,, & [B_i,B_j] &=\epsilon_{ijk} B_k\,, & [A_i,B_j] &= 0 \quad \forall\, i,j\,, \end{aligned}\] which is two copies of the Lie algebra \(\mathfrak{sl}(2,\mathbb{C})\). Hence
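The claimed commutation relations can again be confirmed by machine; this sketch (my addition) builds \(A_i\) and \(B_i\) from the generators \(l^{\mu\nu}\) and checks all commutators:

```python
import numpy as np
from itertools import product

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
delta = np.eye(4)

def l(mu, nu):
    # generator components from eq. (eq:ellcomponents)
    return np.outer(eta[mu], delta[nu]) - np.outer(eta[nu], delta[mu])

# complexified combinations A_i, B_i as defined above
A = [0.5 * (-l(2, 3) + 1j * l(0, 1)),
     0.5 * (l(1, 3) + 1j * l(0, 2)),
     0.5 * (-l(1, 2) + 1j * l(0, 3))]
B = [0.5 * (-l(2, 3) - 1j * l(0, 1)),
     0.5 * (l(1, 3) - 1j * l(0, 2)),
     0.5 * (-l(1, 2) - 1j * l(0, 3))]

eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

def comm(x, y):
    return x @ y - y @ x

for i, j in product(range(3), repeat=2):
    assert np.allclose(comm(A[i], B[j]), 0)                                    # [A_i, B_j] = 0
    assert np.allclose(comm(A[i], A[j]), sum(eps[i, j, k] * A[k] for k in range(3)))
    assert np.allclose(comm(B[i], B[j]), sum(eps[i, j, k] * B[k] for k in range(3)))
```

The vanishing mixed commutators make the two copies manifest: \(A_i\) and \(B_i\) each close into their own algebra, completely decoupled from one another.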
Proposition 3.6. The complexification of \(\mathfrak{so}(1,3)\) is equal to \(\mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C})\): \(\mathfrak{so}(1,3) \otimes \mathbb{C}= \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C})\).
: We can write \(\mathfrak{so}(1,3) \otimes \mathbb{C}\) as \(\eqref{eq:sl2csl2c}\). \(\square\)
We have studied representations of \(SU(2)\) in Michaelmas term, and found them to be complex, \((d+1)\)-dimensional and labelled by an integer \(d\). Furthermore, we have seen in exercise 15 that e.g. the complex conjugate representation \(\bar{\mathbf{2}}\) becomes the same as \(\mathbf{2}\) after a change of basis. This is not true for \(SL(2,\mathbb{C})\): a change of basis does not change the eigenvalues of a matrix, but \(g\) and \(\bar{g}\) in general have different eigenvalues for \(g \in SL(2,\mathbb{C})\). We hence get genuinely new representations by taking complex conjugates. At the level of the algebras we can repeat the classification of irreducible representations of \(\mathfrak{so}(1,3)\) by taking a detour via \(\mathfrak{so}(1,3) \otimes \mathbb{C}\) (just as we did for \(\mathfrak{su}(2)\otimes \mathbb{C}= \mathfrak{sl}(2,\mathbb{C})\)), and it turns out that (we will not prove this here)
Theorem 3.3. The complex irreducible representations of \(SL(2,\mathbb{C})\) are the tensor products \(r_{s_1} \otimes \bar{r}_{s_2}\) labelled by pairs \((s_1,s_2)\) where \(s_i\) take half-integer values. They act on a complex vector space of dimension \((2s_1+1)(2s_2+1)\).
For the first values of \((s_1,s_2)\) these representations have the following names
\((0,0)\) This does not transform at all, so this is a scalar.
\((\tfrac12,0)\) This is a Weyl spinor. For the same reasons as in our discussion of the representations of \(SU(2)\) vs. \(SO(3)\), this is only a representation of \(Spin(1,3)=SL(2,\mathbb{C})\) but not of \(SO_+(1,3)\).
\((0,\tfrac12)\) This is another Weyl spinor.
\((\tfrac12,\tfrac12)\) This has dimension four and is a vector. It is the representation we have used to define the Lorentz group. Its action is exactly the one written down in Proposition 3.1 when we studied the map from \(SL(2,\mathbb{C})\) to \(L_+^\uparrow\).
\((\tfrac12,0) \oplus (0,\tfrac12)\) This reducible representation is a Dirac spinor.
We will be more precise with these things later.↩︎
This implies that \(x \circ x^{-1}=e\) as well. Let \(y = x \circ x^{-1}\). Then \[y = y^{-1} \circ y \circ y = y^{-1} \circ x \circ x^{-1} \circ x \circ x^{-1} = y^{-1}\circ y = e\,.\] Note how we made use of associativity here.↩︎
It turns out a \(^\ast\)tiny\(^\ast\) little bit is okay, and that there are quantum effects that in fact do this. This is good news, as it is exactly what is needed for baryogenesis in the early universe, i.e. it is needed to explain why there is matter but (almost) no antimatter in the universe.↩︎
The theoretical physicist Richard Feynman was famous for his approach of ‘example based research’: find a good example of what you want to study and understand it really well. Then develop the general theory such that the main features that ‘make it work’ are kept. Although it is not often presented like that, a lot of mathematics came about in this way.↩︎
The deeper reason for this is that all of these can be defined without making reference to a basis.↩︎
A sphere of dimension \(n\) is the set of points in \(\mathbb{R}^{n+1}\) for which \(x_1^2 + x_2^2 + ... + x_{n+1}^2=1\).↩︎
We could also allow matrices with trace a multiple of \(2 \pi\) at this point. These can be ignored for the following reason: modulo the (traceless) Pauli matrices, any such matrix is of the form \(n \pi \mathds{1}\) for \(n \in \mathbb{Z}\). This commutes with any other matrix and exponentiates to \(\pm \mathds{1}\). We hence get nothing new this way.↩︎
Try to write down the most general complex \(2 \times 2\) matrix which obeys \(A^\dagger = A\) and \(trA = 0\) in terms of real numbers.↩︎
They are named after Wolfgang Pauli who introduced them in the 1920s to describe the spin of electrons. Why and how that works will be explained later.↩︎
For a group homomorphism, the kernel consists of those elements sent to the identity element.↩︎
Recall that \(\theta\) going from \(0\) to \(2 \pi\) was a full \(U(1)\) inside \(SU(2)\). Under this map this is mapped to a rotation that goes with double speed from \(0\) to \(4 \pi\). This is how it had to be such that \(-\mathds{1}\) in \(SU(2)\), which is \(\theta=\pi\), is mapped to \(\mathds{1}\) in \(SO(3)\) and is a consequence of the map \(F\) being two-to-one.↩︎
For a lively introduction to elementary topology I can recommend the book .↩︎
An inner product is a symmetric bilinear map \(\langle \cdot, \cdot \rangle: V \times V \rightarrow \mathbb{R}\) s.t. \(\langle v, v \rangle \geq 0\) for all \(v \in V\) and \(\langle v, v \rangle = 0\) if and only if \(v=0\).↩︎
Think of how you would differentiate \(A \boldsymbol{v}(t)\) for a constant matrix \(A\) and \(t\) dependent vector \(\boldsymbol{v}(t)\).↩︎
Here, I have made the tacit assumption that \(w_m\) is unique, i.e. there is only a single eigenvector with the maximal eigenvalue \(m\). You can try to see what will happen if you repeat the following argument without this assumption, and you will find that this results in a reducible representation.↩︎
Recall that the punchline here was that these representations and their irreducibility is both determined solely by representing the Pauli matrices, i.e. fixing \(\rho(\sigma_j)\). In the case of \(\mathfrak{sl}(2,\mathbb{C})\) its representation is a complex vector space with basis \(\{\rho(\sigma_j)\}\), in the case of \(\mathfrak{su}(2)\) you get a real vector space with basis \(\{i\rho(\sigma_j)\}\).↩︎
In fact, it is an antisymmetric version (under exchange) if you have twice the same particle. I will ignore this in the present discussion as it does not really alter the conclusions.↩︎
There is a deeper meaning which is that these are in the tangent (\(x^\mu\)) and cotangent spaces (\(x_\mu\)) of space-time.↩︎
For \(r_d(g)\), \(g \in SU(2)\), the eigenvalues are real or come in pairs of complex conjugates, so this does not happen.↩︎