Tensors

The example below explains view(): it only changes the shape for viewing, while the layout in memory is retained. A value of -1 means that dimension is inferred from the tensor.

In [21]: a
Out[21]:
tensor([[ 1,  0,  0],
        [ 1,  1,  1],
        [-1,  1,  1],
        [ 2,  3,  4]])

In [22]: a.sum(dim=1).view(-1, 1)
Out[22]:
tensor([[1],
        [3],
        [1],
        [9]])

In [23]: a/a.sum(dim=1).view(-1, 1)
Out[23]:
tensor([[ 1.0000,  0.0000,  0.0000],
        [ 0.3333,  0.3333,  0.3333],
        [-1.0000,  1.0000,  1.0000],
        [ 0.2222,  0.3333,  0.4444]])


NOTE: a tensor and a view of it share the same memory, so if any value is changed in-place in one object then the other changes as well.
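A minimal sketch of this behaviour (the tensor values here are only for illustration):

import torch

a = torch.tensor([[1, 0, 0],
                  [1, 1, 1]])
b = a.view(-1)    # a flat view: b shares a's storage, no copy is made
b[0] = 100        # in-place change through the view
print(a[0, 0])    # tensor(100) -- the change is visible in the original tensor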

Neural Networks

We will use the MNIST dataset of handwritten digits, which comes pre-processed and formatted. This dataset can be loaded with torchvision in PyTorch.
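A sketch of how such a loader might be set up; the batch size, normalization constants and download path below are assumptions, not values from the original notebook:

import torch
from torchvision import datasets, transforms

# Convert images to tensors and normalize with the commonly used MNIST statistics.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_set = datasets.MNIST(root="data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape)    # torch.Size([64, 1, 28, 28])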

A tensor of size (64, 1, 28, 28) is a batch of 64 images, each with 1 channel of 28 * 28 pixels. Note that a (d1, d2, ..., dn)-dimensional array/tensor is represented as follows.

Each $d_i$ is the length of the list at depth $i$: we have a list of $d_1$ lists, each containing $d_2$ lists, each of which contains $d_3$ lists, and so on.

Plotting the class probabilities of the above network, which has not been trained yet.
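A minimal sketch of an untrained classifier producing such probabilities (the architecture and layer sizes here are assumptions):

import torch
from torch import nn

# A small fully connected classifier for 28x28 images; weights are random, i.e. untrained.
model = nn.Sequential(
    nn.Flatten(),                 # (64, 1, 28, 28) -> (64, 784)
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

images = torch.randn(64, 1, 28, 28)           # stand-in for one MNIST batch
probs = torch.softmax(model(images), dim=1)   # class probabilities for each image
print(probs[0])                               # roughly uniform, since nothing is trained yet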

Autograd

Autograd is a reverse-mode automatic differentiation system, and a good introduction is provided on the PyTorch documentation page.

We need the error associated with the output of each neuron, and to correct it we require certain partial derivatives; autograd simplifies this task by keeping track of the operations performed on each tensor. In simple terms, autograd works with the Jacobian $J$, which is simply the derivative (a linear transformation) of a multivariable function: given any vector $v$ (the direction of the derivative), it produces the vector-Jacobian product $J^\top v$ (for the diagonal Jacobians in the examples below this is the same as $Jv$).

Mechanics

Note that partial derivatives can be represented by a tree:

[Figure: chain_rule.png, the chain rule represented as a tree]

So if we want to calculate $\partial z/\partial s$, we obtain it by summing the contributions along each path in the above tree. So it would be \begin{equation} \frac{\partial z}{\partial s}= \frac{\partial z}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial s} \end{equation} In a similar manner autograd calculates the partial derivatives for the required function.
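This can be checked directly with autograd; in the small sketch below the particular functions $x(s, t)$, $y(s, t)$ and $z(x, y)$ are made up for illustration:

import torch

s = torch.tensor(2.0, requires_grad=True)
t = torch.tensor(3.0, requires_grad=True)

x = s * t      # x(s, t)
y = s + t      # y(s, t)
z = x * y      # z(x, y)

z.backward()   # applies the chain rule along the tree above
print(s.grad)  # dz/dx * dx/ds + dz/dy * dy/ds = y*t + x*1 = 15 + 6 = tensor(21.)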

From the documentation:

Autograd relies on the user to write thread safe C++ hooks. If you want the hook to be correctly applied in multithreading environment, you will need to write proper thread locking code to ensure the hooks are thread safe.

In Python we don't need to worry about this because of the GIL.

Example

In the following example we use the flag requires_grad=True to keep track of the operations on the tensor.

In the following we see that <PowBackward0 at 0x7f8dadefa7d0> is the grad_fn of the result, since the original tensor was raised to a power.

The above shows that it is the PowBackward function.

The above shows that it is the exp function.
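A sketch of this kind of inspection; the tensor values and the exact operations are assumptions, not the original notebook cells:

import torch

x = torch.tensor([1.0, 4.0], requires_grad=True)

y = x ** 2
print(y.grad_fn)   # a PowBackward0 node: the power operation was recorded

z = torch.exp(y)
print(z.grad_fn)   # an ExpBackward node: the exp operation was recorded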

Example 2

The above error was thrown because y is not a scalar, so in this case we need to pass backward() a direction for the gradient evaluation. Suppose we want to evaluate it in the direction $v = (1, 4)$. The result is then $Jv$, where $J$ is the Jacobian (here $J$ is diagonal, so $Jv$ and $J^\top v$ coincide). Hence, we have the following

\begin{equation} Jv = \begin{pmatrix} 2 & 0\\ 0 & 8 \end{pmatrix} \begin{pmatrix} 1\\ 4 \end{pmatrix} = \begin{pmatrix} 2\\ 32 \end{pmatrix} \end{equation}
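A sketch of how this looks in code, assuming (consistently with the Jacobian above) that the input is x = (1, 4) and y = x ** 2:

import torch

x = torch.tensor([1.0, 4.0], requires_grad=True)
y = x ** 2                    # non-scalar output

# Calling y.backward() with no argument raises:
# "grad can be implicitly created only for scalar outputs"
v = torch.tensor([1.0, 4.0])  # direction of the derivative
y.backward(gradient=v)        # vector-Jacobian product

print(x.grad)                 # tensor([ 2., 32.])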

Example 3