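C.1 Introduction to einops
Tensor operations are used throughout machine learning and deep learning, and frameworks such as NumPy, TensorFlow, PyTorch, MXNet, and Paddle all provide functions for them. PyTorch, for example, has view, transpose, and permute.
einops is a Python package that provides the common tensor operations. It supports NumPy, TensorFlow, PyTorch, and other frameworks, and works directly on their native tensors. It covers operations such as reshape, view, transpose, and permute. Its main strengths are readability and ease of maintenance, as the following axis-reordering example shows.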
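    from einops import rearrange

    # x is a tensor with axes (batch, channel, height, width)
    # The traditional way
    y = x.transpose(0, 2, 3, 1)
    # The same operation with einops
    y = rearrange(x, 'b c h w -> b h w c')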
einops can be installed with pip:
    pip install einops
The most commonly used einops functions are rearrange, reduce, and repeat.
C.1.1 rearrange
rearrange changes only the shape of a tensor, never the total number of elements. It covers transpose, reshape, stack, concatenate, squeeze, and expand_dims.
1. Import the required libraries
    import numpy as np
    from einops import rearrange, reduce, repeat
    from PIL.Image import fromarray
    from IPython import get_ipython

    # Define a helper that makes the notebook render NumPy arrays as images
    def display_np_arrays_as_images():
        def np_to_png(a):
            if 2 <= len(a.shape) <= 3:
                return fromarray(np.array(np.clip(a, 0, 1) * 255, dtype='uint8'))._repr_png_()
            else:
                return fromarray(np.zeros([1, 1], dtype='uint8'))._repr_png_()

        def np_to_text(obj, p, cycle):
            if len(obj.shape) < 2:
                print(repr(obj))
            if 2 <= len(obj.shape) <= 3:
                pass
            else:
                print('<array of shape {}>'.format(obj.shape))

        get_ipython().display_formatter.formatters['image/png'].for_type(np.ndarray, np_to_png)
        get_ipython().display_formatter.formatters['text/plain'].for_type(np.ndarray, np_to_text)
2. Enable automatic visualization of arrays
    # Display arrays as images
    display_np_arrays_as_images()
3. Load the test data
The data file can be downloaded from: https://github.com/arogozhnikov/einops/tree/master/docs/resources
    ims = np.load('../data/test_images.npy', allow_pickle=False)
    # There are 6 images, each of shape 96x96x3
    print(ims.shape, ims.dtype)
    # (6, 96, 96, 3) float64
(1) Display the first image
    # Show the first image
    ims[0]
(2) Display the second image
    # Show the second image
    ims[1]
4. Swapping dimensions
    # Swap the height and width axes
    rearrange(ims[0], 'h w c -> w h c')
5. Concatenating axes
This covers what stack and concatenate do.
    # Lay the images side by side along the width axis, giving a single 3-D tensor
    rearrange(ims, 'b h w c -> h (b w) c')
6. Splitting axes
(1) Split the batch axis
    # Decompose batch=6 into b1=2 and b2=3, giving a 5-D tensor
    rearrange(ims, '(b1 b2) h w c -> b1 b2 h w c', b1=2).shape
(2) Split and concatenate
    # Use splitting and concatenation in the same pattern
    rearrange(ims, '(b1 b2) h w c -> (b1 h) (b2 w) c', b1=2)
(3) Split the width axis
    # Move part of the width axis into the height axis
    rearrange(ims, 'b h (w w2) c -> (h w2) (b w) c', w2=2)
7. Re-concatenating axes
    rearrange(ims, 'b h w c -> h (b w) c')
8. Adding or removing a dimension
This covers what squeeze and unsqueeze (expand_dims) do.
    x = rearrange(ims, 'b h w c -> b 1 h w 1 c')  # equivalent to numpy.expand_dims
    print(x.shape)
    print(rearrange(x, 'b 1 h w 1 c -> b h w c').shape)  # equivalent to numpy.squeeze
(6, 1, 96, 96, 1, 3)
(6, 96, 96, 3)
C.1.2 reduce
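reduce computes the mean, maximum, minimum, and similar reductions along the chosen axes.
    # Average over the batch axis; equivalent to ims.mean(axis=0), but the pattern is easier to read
    reduce(ims, 'b h w c -> h w c', 'mean')
    # Split each image into 2x2 blocks and average each block; the output shape is 48 x (6*48) x 3
    # 'mean' can be replaced with 'max' or 'min'
    reduce(ims, 'b (h h2) (w w2) c -> h (b w) c', 'mean', h2=2, w2=2)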
C.1.3 repeat
repeat repeats a tensor n times along an axis.
    # Repeat the image 3 times along the width axis
    repeat(ims[0], 'h w c -> h (repeat w) c', repeat=3)
    # Tile the image twice along both the height and width axes
    repeat(ims[0], 'h w c -> (2 h) (2 w) c')
C.2 Using einops as PyTorch layers
Rearrange is a subclass of nn.Module, so it can be placed directly in a model as a PyTorch layer.
C.2.1 Flattening
    from torch.nn import Sequential, Conv2d, MaxPool2d, Linear, ReLU
    from einops.layers.torch import Rearrange
    from einops.layers.torch import Reduce

    model = Sequential(
        Conv2d(3, 6, kernel_size=5),
        MaxPool2d(kernel_size=2),
        Conv2d(6, 16, kernel_size=5),
        MaxPool2d(kernel_size=2),
        # Flatten
        Rearrange('b c h w -> b (c h w)'),
        Linear(16*5*5, 120),
        ReLU(),
        Linear(120, 10),
    )
This model is equivalent to the following one:
    model01 = Sequential(
        Conv2d(3, 6, kernel_size=5),
        MaxPool2d(kernel_size=2),
        Conv2d(6, 16, kernel_size=5),
        # Max pooling and flattening in one step
        Reduce('b c (h 2) (w 2) -> b (c h w)', 'max'),
        Linear(16*5*5, 120),
        ReLU(),
        Linear(120, 10),
    )
C.2.2 Simplifying PyTorch code with einops
1. Build the model
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import numpy as np
    import math
    from einops import rearrange, reduce, asnumpy, parse_shape
    from einops.layers.torch import Rearrange, Reduce

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
            self.conv2_drop = nn.Dropout2d()
            self.fc1 = nn.Linear(320, 50)
            self.fc2 = nn.Linear(50, 10)

        def forward(self, x):
            x = F.relu(F.max_pool2d(self.conv1(x), 2))
            x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
            x = x.view(-1, 320)
            x = F.relu(self.fc1(x))
            x = F.dropout(x, training=self.training)
            x = self.fc2(x)
            return F.log_softmax(x, dim=1)

    conv_net_old = Net()
2. The model from step 1 is equivalent to the following
    conv_net_new = nn.Sequential(
        nn.Conv2d(1, 10, kernel_size=5),
        nn.MaxPool2d(kernel_size=2),
        nn.ReLU(),
        nn.Conv2d(10, 20, kernel_size=5),
        nn.MaxPool2d(kernel_size=2),
        nn.ReLU(),
        nn.Dropout2d(),
        Rearrange('b c h w -> b (c h w)'),
        nn.Linear(320, 50),
        nn.ReLU(),
        nn.Dropout(),
        nn.Linear(50, 10),
        nn.LogSoftmax(dim=1)
    )
C.2.3 Building an attention model
    class Attention(nn.Module):
        def __init__(self):
            super(Attention, self).__init__()

        def forward(self, K, V, Q):
            # K, V: (batch, channels, t); Q: (batch, channels, l)
            A = torch.bmm(K.transpose(1, 2), Q) / np.sqrt(Q.shape[1])
            A = F.softmax(A, 1)
            R = torch.bmm(V, A)
            return torch.cat((R, Q), dim=1)
This code is equivalent to the following:
    def attention(K, V, Q):
        _, n_channels, _ = K.shape
        A = torch.einsum('bct,bcl->btl', [K, Q])
        A = F.softmax(A * n_channels ** (-0.5), 1)
        R = torch.einsum('bct,btl->bcl', [V, A])
        return torch.cat((R, Q), dim=1)
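The equivalence of the two attention implementations can be checked numerically. The sketch below restates both as plain functions so that it is self-contained; the names `attention_bmm` and `attention_einsum` and the tensor sizes are chosen for illustration only.

```python
import math

import torch
import torch.nn.functional as F


def attention_bmm(K, V, Q):
    # K, V: (batch, channels, t); Q: (batch, channels, l)
    A = torch.bmm(K.transpose(1, 2), Q) / math.sqrt(Q.shape[1])
    A = F.softmax(A, dim=1)
    R = torch.bmm(V, A)
    return torch.cat((R, Q), dim=1)


def attention_einsum(K, V, Q):
    _, n_channels, _ = K.shape
    A = torch.einsum('bct,bcl->btl', K, Q)       # contract over channels
    A = F.softmax(A * n_channels ** (-0.5), dim=1)
    R = torch.einsum('bct,btl->bcl', V, A)       # contract over t
    return torch.cat((R, Q), dim=1)


torch.manual_seed(0)
K = torch.randn(2, 8, 5)
V = torch.randn(2, 8, 5)
Q = torch.randn(2, 8, 7)

out_bmm = attention_bmm(K, V, Q)
out_einsum = attention_einsum(K, V, Q)
print(out_bmm.shape, torch.allclose(out_bmm, out_einsum, atol=1e-5))
```

The einsum version names every axis in the subscript string, so no explicit transpose is needed and the contracted axis is visible at a glance.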