I think it’s quite often that, when you are learning mathematics beyond linear algebra, you are stuck at some linear algebra problems, but you haven’t learnt that systematically before although you wish you had. In this blog post we will go through some content that is not universally taught but quite often used in further mathematics. But this blog post does not serve as a piece of textbook. If you find some interesting topics, you know what document you should read later, and study it later.

This post is still on progress, neither is it finished nor polished properly. For the coming days there will be new contents, untill this line is deleted. What I’m planning to add at this moment:

  • Transpose is not just about changing indices of its components.
  • Norm and topology in vector spaces
  • Representing groups using matrices

If you care more about algebra

Since the background of the reader varies a lot, I will try to organise contents depending on topic and required background. For the following section, you are assumed to be familiar with basic abstract algebra terminologies, for example, group, ring, fields.

Linear algebra is not just about real and complex numbers

When learning linear algebra, we were always thinking about real or complex vectors, matrices. This makes sense because $\mathbb{R}$ and $\mathbb{C}$ are the closest number fields to our real life. But we should not have the stereotype that linear algebra is all about real and complex spaces, or properties of $\mathbb{R}^n$ and $\mathbb{C}^n$. Never has there been such an restriction. In fact, $\mathbb{R}$ and $\mathbb{C}$ can be replaced with any field $\mathbb{F}$, and there are vast differences depending on the properties of $\mathbb{F}$.

There are already some differences about linear algebra over $\mathbb{R}$ and $\mathbb{C}$. Since $\mathbb{C}$ is algebraically closed, that is, all polynomials of order $n \geq 1$ have $n$ roots, dealing with eigen functions has been much ‘safer’. Besides, for example, we can diagnoalise the matrix

in $\mathbb{C}$ but not in $\mathbb{R}$.

When $\mathbb{F}$ above is finite, there are a lot more interesting things. It’s not just saying, $\mathbb{F}$ is a field, and is finite. For example, if $\mathbb{F}=\mathbb{R}$, we have

There shouldn’t be any problem. However, on the other hand, if $\mathbb{F}=\mathbb{Z}_5$, we have

In application, when working on applied algebra, it’s quite often to meet finite fields. What if we want to solve linear equation over a finite field? That’s when linear algebra over finite fields comes in. Realise this before it’s late! By the way, we are working on rings in lieu of fields, we find ourselves in module theory.

The general linear group

The set of all invertible $n \times n$ matrices forms a multiplicative group (and you should have no problem verifying this). The notation won’t go further than $GL(n)$, $GL(n,\mathbb{F})$, $GL_n(\mathbb{F})$ or simply $GL_n$. The set of all orthomormal matrices, which is also a multiplicative group and written as $O(n)$, is obviously subgroup of $GL(n)$ since for all $A \in O(n)$, we have $\det{A} = \pm 1 \neq 0$ all the time. $O(n)$ contains $SO(n)$ as a subgroup, whose elements have determinant $1$. One should not mess up with $SO(n)$ and $SL(n)$ which is the group of all matrices of determinant $1$. In fact $SO(n)$ is a proper subset of $SL(n)$ and $SL(n) \cap O(n) = SO(n)$. In general we have

Now we consider a more detailed group structure between $GL(n)$ and $O(n)$. I met the following problem on a differential topology book and was about fibre and structure group. But for now it’s simply a linear algebra problem. The crux is finding the ‘square root’ of a positive defined matrix.

There is a direct product decomposition

This decomposition is pretty intuitive. For example if a matrix $A \subset GL(n,\mathbb{R})$ has determinant $a$, we may be looking for a positive definite matrix of determinant $|a|$, and another matrix of determinant $\frac{a}{|a|}$, which is expected to be orthonormal as well. We can consider $O(n)$ as a rotation of basis (change the direction), and the positive definite symmetric matrix as scaling (change the size). Similar result hold if we change the order of multipication. It worth mentioning that by direct product we mean it’s up to the order of eigenvalues.

Proof. For any invertible matrix $A$, we see $AA^T$ is positive definite and symmetric. Therefore there exists some $P \in O(n)$ such that

We assume that $\lambda_1\leq \lambda_2 \leq \cdots \leq \lambda_n$ to preserve uniqueness. Note $\lambda_k>0$ for all $1 \leq k \leq n$ since $AA^T$ is positive definite. We write $\Lambda=\operatorname{diag}(\sqrt\lambda_1,\sqrt\lambda_2,\cdots,\sqrt\lambda_n)$ which gives

Define the square root $B=\sqrt{AA^T}=\sqrt{A^TA}$ by

Then $B^2=P\Lambda P^T P \Lambda P^T = AA^T$. Note $B$ is also a positive definite symmetric matrix and is unique for given $A$. Let $v_1,v_2,\cdots,v_n$ be the orthonormal and linear independent eigenvectors of $B$ with respect to $\sqrt\lambda_1, \sqrt\lambda_2, \cdots, \sqrt\lambda_n$. We first take a look at the following basis:

Note

So if the value above is $1$ if $i = j$ and $0$ if $i \neq j$. ${e_1,e_2,\cdots,e_n}$ is a basis since $A$ is invertible, and later we know it is orthonormal.

Then we take

We see

since both ${e_1,e_2,\cdots,e_n}$ and ${v_1,v_2,\cdots,v_n}$ are orthonormal. On the other hand, we need to prove that $A=UB$. First of all,

(Note we used the fact that ${v_k}$ are orthonormal.) This yields

Therefore $A=UB$ holds on a set of basis, therefore holds on $\mathbb{R}^n$. This gives the desired conclusion. For any invertible $n \times n$ matrix $A$ we have a unique decomposition

where $U \in O(n)$ and $B$ is a positive definitive symmetric matrix. $\square$

Basis of vector spaces

Basis of a vector space is not coming from nowhere. The statement that all vector spaces have a basis is derived from axiom of choice and the fact that all non-zero elements in a field is invertible. I have written an article proving this already, see here (this is relatively advanced). On the other hand, since elements of a ring are not necessarily invertible, modules over a ring are not equipped with basis in general.

It is also worth mentioning that, a vector space of finite dimension is not necessarily of finite dimension. Infinite dimensional vector space is not some fancy thing. It’s quite simple: the set of basis is not finite. It can be countable or uncountable. And there is a pretty straightforward example: the set of all continuous functions $f:\mathbb{R} \to \mathbb{R}$.

Tensor product

One of the most important concepts developed in 20th century is, when studying a set, one can study functions defined on it. For example, let’s consider $[0,1]$ and $(0,1)$. If we consider the set of all continuous functions on $[0,1]$, which is written as $C([0,1])$, we see everything is fine. It’s fine to define norm on it, to define distance on it, and the norm and distance are complete. However, things are messy on $C((0,1))$. Defining a norm on it results in abnormal behaviour. If you are interested you can check here.

Now let’s consider the unit circle $S^1$ on the plane. The real continuous functions defined on $S^1$ can be considered as periodic functions defined on $\mathbb{R}$. So we may have a lot to do with it. If we are interested in the torus (the picture below is from wikipedia),

Source via wiki

which is homeomorphic to $S^1 \times S^1$, how can we study the functions on it? We may consider $C(S^1) \times C(S^1)$, but as we will show later, there are some problems about that. Anyways, it makes sense to define ‘product’ from two vector spaces, which can ‘expand’ it.

Direct sum and direct product

Let’s review direct sum and direct product first. For the direct product of $A$ and $B$, we ask for a algebraic structure on the Cartesian product $A \times B$. For example, $(a,b)+(a’,b’)=(a+a’,b+b’)$. That is, the operation is defined componentwise. This works fine for groups since for each group there is only one binary operation. But at this point we don’t care about scalar multiplication.

There are two types of direct sum, inner and outer. For a vector space $V$ over a field $\mathbb{F}$, we consider two (or even more) subspaces $W$ and $W’$. We have a ‘bigger’ subspace generated by adding $W$ and $W’$ together, namely $W+W’$, which contains all elements of the form $w+w’$ where $w \in W$ and $w’ \in W’$. The representation is not guaranteed to be unique. That is, for $z=w+w’$, we may have $w_1 \in W$ and $w_1’ \in W’$ such that $z=w_1+w_1’$ but $w \neq w_1’$. This would be weird. Fortunately, the representation is unique if and only if $W \cap W’$ is trivial. In this case we say the sum of $W$ and $W’$ is direct, and write $W \bigoplus W’$. This is inner direct sum.

Can we represent the direct sum using an ordered pair? Of course we can. Elements in $W \bigoplus W’$ can be written in the form $(w,w’) \in W \times W’$, and the addition is defined componentwise. That is, $(w,w’)+(w_1,w_1’)=(w+w_1,w’+w_1’)$ (which is in fact $(w+w’)+(w_1+w_1’)=(w+w_1)+(w’+w_1’)$). It seems that we don’t go further than direct product. However we need to consider the scalar product. For $\alpha \in \mathbb{F}$, we have $\alpha(w,w’) = (\alpha{w},\alpha{w’})$ this is because $\alpha(w+w’)=\alpha{w}+\alpha{w’}$. We call this inner direct sum because $W$ and $W’$ are inside $V$. One may ask, since $w+w’=w’+w$, why the pair is ordered? For $w+w’$ we have the first one to be an element of $W$ and the second one to be $W’$ but for $w’+w$ we can’t.

Outer direct sum is different. To define this one considers two arbitrary vector spaces $W$ and $V$ over $\mathbb{F}$. It is not guaranteed that $W$ and $V$ are both subspaces of a bigger vector space. For example it’s legit to take $W$ to be $\mathbb{R}$ over itself and $V$ to be all real functions. $W \bigoplus V$ is defined to be the set of all ordered pairs $(w,v)$ with $w \in W$ and $v \in V$. The addition is defined componentwise, and scalar multiplication is defined to be $\alpha(w,v)=(\alpha{w},\alpha{v})$. One may also write $w+v$ if context is clear.

When the number of vector spaces is finite, we don’t distinguish between direct product and direct sum. When the index is infinite, for example when we consider $\prod_{i=1}^{\infty}X_i$ and $\bigoplus_{i=1}^{\infty}X_i$, things are different. To be precise, in the language of category theory, direct product is the product, and direct sum is the coproduct.

Tensor product as the essential multiplication

We are not touching the definition but first of all let’s imagine what we have for multiplication. Let $W$ and $V$ be two vector spaces over $\mathbb{F}$ and we use $\cdot$ to be the multiplication for the time being. Law of distribution should hold, that is, we have $w \cdot v + w’ \cdot v = (w+w’) \cdot v$ and $w \cdot v + w \cdot v’ = w \cdot (v+v’)$. On the other hand, scalar multiplication should be operated on a single component, that is, $\alpha(w \cdot v)=(\alpha w) \cdot v = w \cdot (\alpha v)$.

It seems illegal to use $\cdot$ so let’s use ordered pair. Under these laws, we have

It makes sense to call it ‘bilinear’. Fixing one component, we have a linear transform. However, direct product and direct product do not work here at all. If it would work, we have $(w,v)+(w’,v)=(w+w’,v+v)$. This gives rise to the tensor product: we need a legit multiplication works on vector and vector.

The absolute definition of tensor product

We have got the spirit of tensor product. A direct product is not OK. There has to be bilinear operation on itself no matter what. For two vector spaces $V$ and $W$, we write the tensor product by $V \bigotimes W$, for $v \in V$ and $w \in W$, we denote its tensor product by $v \otimes w$, which can be considered as a image or value of a bilinear function $\varphi(\cdot,\cdot):V \times W \to V \bigotimes W$. There are many bilinear map with domain $V \times W$. We ask the tensor product to be the essential one.

The tensor product $V \bigotimes W$ of $V$ and $W$, is the vector space having the following properties.

  1. There exists the canonical bilinear map $\varphi(\cdot,\cdot):V \times W \to V \otimes W$, and we write $\varphi(v,w) = v \otimes w \in V \bigotimes W$.

  2. For any bilinear map $h(\cdot,\cdot):V \times W \to U$, there exists a unique linear map

    such that $\lambda(\varphi(v,w)) = h(v,w)$ for all $(v,w) \in V \times W$. This is called the universal property of $V \bigotimes W$.

It can be easily verified that, if $V$ and $W$ have two tensor products, then they are isomorphic (hint: use the universal property). So all tensor products of $V$ and $W$ are isomorphic, we only need to pick the obvious one (as long as it exists). But we don’t have too much space for it. For further study I recommend the following documents:

  1. Definition and properties of tensor products. This one involves a considerable amount of explicit calculation and is of elementary approach.
  2. Tensor products and bases. This one proves the existence in an abstract way.
  3. Tensor Product as a Universal Object (Category Theory & Module Theory). One of my recent blog posts. The topics here are relatively advanced, and I don’t think it’s a good idea to use the language of category theory at this early point.

Duality and transpose

Let $\mathbb{F}$ be any field (it can be replaced with a commutative ring if you want to), and $E,F$ be two modules over $\mathbb{F}$. We will have a glance at the definition of dual space and more importantly, we see what is a transpose. In general we study the bilinear form

Sometimes for simplicity we also write $f(x,y)=\langle x,y \rangle$. The set of all bilinear forms of $E \times F$ into $\mathbb{F}$ will be denoted by $L^2(E,F;\mathbb{F})$ and you may have seen it earlier.

We define the kernel of $f$ on the left to be $F^\perp$ and on the right to be $E^\perp$. Recall that for $S \subset E$, $S^\perp$ consists all $y$ such that $f(x,y)=0$ whenever $x \in S$; similarly, for $T \subset F$, $T^\perp$ consists all $x$ such that $f(x,y)=0$ whenever $y \in T$. Respectively, we say $f$ is non-degenerate on the left/right if the kernel on the left/right is trivial.

One of the simplest example is the case when $E=\mathbb{F}^m$ and $F=\mathbb{F}^n$. We take a $m \times n$ matrix $A$ over $\mathbb{F}$. Define $f(x,y) = x^T A y$. This is a classic bilinear form. Whether it is non-degenerate on the left or on the right depends on the linear independency of row vectors and column vectors. $\def\opn{\operatorname}$

The bilinear form $f$ gives rise to a homomorphism of $E$ to a ‘space of essential arrows’:

given by

$\opn{Hom}_\mathbb{F}(F,\mathbb{F})$ contains all linear maps of $F$ into $\mathbb{F}$. One can imagine $\opn{Hom}_\mathbb{F}(F,\mathbb{F})$ to be a set of ‘arrows’ from $F$ to $\mathbb{F}$.

If you care more about analysis and topology

Now let’s see what we can do in analysis and topology.

Differentiation

Let’s consider all complex polynomials of order $\leq 5$. This is a complex vector space and is in fact isomorphic to $\mathbb{C}^6$ since we have a bijection mapping $a_0+a_1z+a_2z^2+a_3z^3+a_4z^4+a_5z^5$ to $(a_0,a_1,a_2,a_3,a_4,a_5)^T$. Therefore we can simply use matrix and vectors. We represent differentiation via matrices. This is a straightforward work. We pick the natural basis $\{1,z,z^2,z^3,z^4,z^5\}$ to begin with and write the differentiation as $\mathscr{D}$. Since $\def\ms{\mathscr}$

We get a matrix corresponding to $\ms{D}$ by

Next we try to obtain the Jordan normal form of $D$. Since the minimal polynomial of $D$ is merely $m(\lambda)=\lambda^6$, we cannot diagonalise it. After some computation we get

where the matrix $J$ in the square bracket is our Jordan normal form. This makes sense since if we consider the basis $\{1,z,\frac{1}{2}z^2,\frac{1}{6}z^3,\frac{1}{24}z^4,\frac{1}{120}z^5\}$, we see under this basis,

which coincides with $J$.

We already know $\ms{D}^6=0$ but we can also get this by considering $D^6=SJ^6S^{-1}=0$ since $J^6=0$. Further, the format of $S$ should have you realise that we have a hidden $e$, that is

and the basis is in fact first $6$ terms of the expansion of $\exp{z}$.

If this cannot fansinate you I don’t know what can!


Next we consider an example on infinite dimensional vector spaces. Consider $E=C_c^\infty(\mathbb{R})$, the infinite dimensional vectror space of $C^\infty$ functions on $\mathbb{R}$ with compact support, namely, for $f \in C_c^\infty(\mathbb{R})$, we have $f \in C^\infty$ and there exists some $0<K<\infty$ such that $f(x)=0$ outside $[-K,K]$. Next consider the bilinear form $E \times E \to \mathbb{R}$ defined by the following inner product:

Note the differential operator $\ms{D}:E \to E$ is a linear map of $E$ into $E$, so let’s find its transpose $\ms{D}^T$. That is, we need to find the unique linear map $\ms{D}^T:E \to E$ such that

This is a simple application of integration by parts:

Hence the transpose of differentiation $\ms{D}$ is $-\ms{D}$. So we can say it’s skew-symmetric for some obvious reason. But the matrix of $\ms{D}$ in $n$-polynomial space is not.

An eigenvalue problem proved with fixed point theorem

(Perron’s theorem) Let $A$ be a $n \times n$ matrix having all components $a_{ij}>0$, then it must have a positive eigenvalue $\lambda_0$, and a unique corresponding positive eigenvector, i.e., $x=(x_1,x_2,\cdots,x_n)^T$ such that $x_i>0$ for all $i = 1,2,\cdots,n$.

In fact, the positive eigenvalue is the spectral radius of $A$, which is often written as $\rho(A)$. I recommend reading the following documents:

But here we are using Brouwer’s fixed point theorem (you may find an elementary proof on project Euclid). In the following proof, we write $D_n$ to denote $n$-disk and $\Delta^n$ to denote $n$-simplex. That is,

Note $D_n$ is homeomorphic to $\Delta^n$. Further we have a lemma:

(Lemma) If $f:X \to X$ is a continuous function and $X$ is homeomorphic to $D_n$, then $f$ has a fixed point as well.

Proof of the lemma. Let $\varphi$ be the homeomorphism from $X$ to $D_n$. Then $\varphi \circ f \circ \varphi^{-1}:D_n \to D_n$ has a fixed point, according to Brouwer’s fixed point theorem, suppose we have

Then

and hence $\varphi^{-1}(y) \in X$ is our fixed point. $\square$


Now we are ready to prove Perron’s theorem using Brouwer’s fixed point theorem.

Proof of Perron’s theorem. Define $\sigma(x)=\sum_{i=1}^{n}x_i$ where $x = (x_1,x_2,\cdots,x_n)^T$, we see since it’s linear, it’s continuous (it’s not generally true for infinite dimensional spaces, but it’s safe now, and you can see this question on mathstackexchange for a proof). Similarly $A$ is continuous as well. Also, by definition, $x \in \Delta^{n-1}$ if and only if $\sigma(x)=1$. We see Define a function $g:\Delta^{n-1} \to \Delta^{n-1}$ by

We will show that this function is well-defined. Since $x \in \Delta^{n-1}$, not all components of $x$ are equal to $0$ since if so, we get $x_1+x_2+\cdots+x=0$, contradicting the assumption that $x \in \Delta^{n-1}$. Note we can write down $Ax$ explicitly (this is an elementary linear algebra thing):

Since $A$ has all components greater than $0$, we see all components of $Ax$ are greater than $0$ as well. Hence $\sigma(Ax)>0$. On the other hand, $g(x) \in \Delta^{n-1}$ since $\sigma(g(x))=\frac{\sigma(Ax)}{\sigma(Ax)}=1$. Since $A$, $\sigma$, $y=\frac{1}{x}$ are continuous, being a composition of continuous functions, $g$ is continuous.

However, since $\Delta^{n-1}$ is homeomorphic to $D_{n-1}$, $g$ has a fixed point according to the lemma. Hence there exists some $y \in \Delta^{n-1}$ such that

But as we have already proved, $\lambda_0=\sigma(Ay)$ is continuous. On the other hand, all components of $y$ are positive since all components of $Ay$ are positive. The proof is completed. $\square$

Exterior differentiation and Jacobian determinant

You are assumed to be familiar with multivariable calculus when reading this subsection since we are discussing it right now. But in general this section is much beyond elementary linear algebra. First of all we are presenting the ultimate abstract extension of the usual gradient, curl, and divergence. We simply consider the $C^\infty$ functions $\mathbb{R}^3 \to \mathbb{R}^3$. When working on gradient, we consider something like $\def\pf[#1]{\frac{\partial f}{\partial #1}}$

When working on curl, we consider

Finally for divergence we consider

They were connected by Green’s theorem, Gauss’s theorem, Stokes’ theorem. But are they abruptly connected for no reason but numerical equality? Fortunately, no. Let’s see why.

First of all for convenience we write $(x_1,x_2,x_3)$ instead of $(x,y,z)$. Define $dx_idx_j=-dx_jdx_i$ for all $i,j = 1,2,3$. Note this implies that $dx_idx_i=0$. For $d$ we have the definition as follows:

  1. If $f$ is a $C^\infty$ function, then $df = \sum_{i=1}^{3}\pf[x_i]dx_i$.
  2. If $\omega$ is of the form $\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$, then $d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$.

Then gradient, curl and divergence follows in the nature of things. You can verify that the second one is actually equal to $d(f_1dx+f_2dy+f_3dz)$ and the third one is equal to $d(f_1dydz-f_2dxdz+f_3dxdy)$. We call $d$ the exterior differentiation.

Linear algebra is not just for $\mathbb{R}^3$ space, so is exterior differentiation. Let $\Omega^\ast$ be the algebra over $\mathbb{R}$ (for algebra over a field, see this), generated by $dx_1,\dots,dx_n$ with the multiplication defined by an anti-commutative multiplication $dx_idx_j=-dx_jdx_i$ for all $i,j=1,2,\cdots,n$. As a vector space over $\mathbb{R}$, $\Omega^\ast$ is of dimension $2^n$ with a basis

where $i<j<k$. Let $C^\infty$ itself be the vector space of $C^\infty$ functions on $\mathbb{R}$, and we define the $C^\infty$ differential forms on $\mathbb{R}^n$ by

For simplicity we omit the tensor product symbol $\otimes$. As a result, for any $\omega \in \Omega^\ast(\mathbb{R})$, we have $\omega$ to be a simple $C^\infty$ function (why don’t we call it a $0$-form? ) or we have $\omega = \sum f_{i_1\cdots i_q}dx_{i_1}\dots dx_{i_q}$, and we call it a $q$-form since the maximal degree of $dx_j$ is $q$. Also we can define $\Omega^q(\mathbb{R}^n)$ to be the vector space of $q$-forms. Consider the differential defined $d$ defined by

  1. If $f$ is a $C^\infty$ function, then $df = \sum_{i=1}^{n}\pf[x_i]dx_i$.
  2. If $\omega$ is of the form $\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$, then $d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$.

This is what we call the exterior differentiation. It’s the ultimate abstract extension of gradient, curl and divergence. Your calculus teacher may have warned you, that you cannot deal with $dx$ independently. So is it safe to work like this? Yes, there is nothing to worry about. We are doing abstraction algebraically.

There are so many concepts can be understood in a linear algebra way. For example we also have

In fact Green’s theorem, Gauss’ theorem and Stokes’ theorem have a ultimate abstract extension as well, which is called the general Stokes’ theorem:

If $\omega$ is an $(n-1)$-form with compact support on an oriented manifold $M$ of dimension $n$ and if $\partial M$ is given the induced orientation, then

We are not diving into this theorem but we will conclude this subsection by a glimpse on integration. Recall that the Riemann integral of a differentiable function $f:\mathbb{R}^n \to \mathbb{R}$ can be written as

Here we add the absolute value function to $dx_1 \dots dx_n$ is to emphasise the distinction between the Riemann integral of a function and the integral of differential form, since order only matters in the latter case. For the latter case, if $\pi$ is a permutation of $1,2,\cdots,n$ or we simply say $\pi \in S_n$, then

This definition is natural and obvious. Since $\operatorname{sgn} \pi$ is equal to the determinant of the matrix representing $\pi$ (see here), it’s natural to consider the determinant. Consider the function

Then $J(\Pi)=\operatorname{sgn}\pi$. This is quite similar to what we expect from Jacobian determinant in general, which describes change-of-variable essentially. Let $x_1,x_2,\cdots,x_n$ be a basis of $\mathbb{R}^n$ and $T:\mathbb{R}^n \to \mathbb{R}^n$ be a diffeomorphism. We have a new basis $y_1,y_2,\cdots,y_n$ given by

where $\pi_i:(a_1,a_2,\cdots,a_n) \mapsto a_i$ is the $i$th projection. Namely

written in column vectors. We now show that

First we recall that $J(T)$ is the determinant of $(\partial T_i / \partial x_j)$, and the determinant of a matrix $(a_{ij})$ is defined by

where $\epsilon(\sigma)$ is actually $\operatorname{sgn}\sigma$ and $\sigma$ ranges through all permutation of $1,2,\cdots,n$. We need something to coincide. First of all, we compute $dy_i$. Note

Hence

We get, as a result,

After cancelling out so many zeros, we get $J(T)$. You don’t have to expand the identity. Pick a component $\frac{\partial T_1}{\partial x_{j_1}}dx_{j_1}$ from $dy_1$. Then when we pick another component from $dy_2$ to get it multiplied with the first one, say $\frac{\partial T_2}{\partial x_{j_2}}dx_{j_2}$, then we must have $j_1 \neq j_2$ since if not, then $dx_{j_1}dx_{j_2}=0$, and we cancel that. The rule remains the same (but even stricter) when we pick components from $dy_3$, $dy_4$, and until $dy_n$. In the end, $j_1,j_2,\cdots,j_n$ are pairwise unequal. This corresponds exactly a permutation of $1,2,\cdots,n$. Hence we get

On the other hand, $dx_{\sigma(1)}dx_{\sigma(2)}\cdots dx_{\sigma(n)}=\epsilon(\sigma)dx_1dx_2\cdots dx_n$, and if we put this inside the expansion of $dy_1dy_2\cdots dy_n$, we get

We answered a calculus question in an algebraic way (and more than that if you review more related concepts in calculus).