TensorFlow 2.0 (Part 1): MNIST Handwritten Digit Recognition

私人物语 2020-04-23

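Preface

I first came across TensorFlow back in 2017. I never dug into it at the time, and because training was resource-hungry and made my machine run hot, I eventually let it drop.

Recently I noticed that it has already reached version 2.2. Meanwhile, citing Python's poor performance for large-scale parallel computation, the TensorFlow team is also pushing a Swift version of TensorFlow, known as S4TF.

So I decided to walk the road from getting started to giving up once more. This time I begin with MNIST handwritten digit recognition, the first lesson of any TensorFlow introduction and the hello world of the field.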

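Background

A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a limited receptive field; it performs very well on large-scale image processing.

A convolutional neural network consists of one or more convolutional layers topped by fully connected layers (corresponding to a classic neural network), together with associated weights and pooling layers. This structure lets the network exploit the two-dimensional structure of its input. Compared with other deep learning architectures, convolutional neural networks give better results in image and speech recognition. The model can also be trained with backpropagation. Compared with other deep feedforward networks, a convolutional network has fewer parameters to consider, which makes it an attractive deep learning architecture.

Source: Wikipedia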
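
To make the two building blocks concrete, here is a minimal pure-Python sketch of a single-channel convolution (stride 1, no padding, no kernel flip, as CNN layers compute it) followed by 2x2 max pooling. The toy image, the kernel, and the function names are illustrative assumptions, not part of the project code.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    out_h, out_w = len(feature_map) // 2, len(feature_map[0]) // 2
    return [
        [
            max(feature_map[2 * r + i][2 * c + j]
                for i in range(2) for j in range(2))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 6x6 toy image whose right half is bright.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# A 3x3 kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 0, 1]] * 3

features = conv2d_valid(image, kernel)   # 4x4 feature map
pooled = max_pool_2x2(features)          # 2x2 map after pooling

print(features[0])   # [0, 3, 3, 0]
print(pooled)        # [[3, 3], [3, 3]]
```

The feature map is large only where the window straddles the edge, and pooling halves each spatial dimension while keeping that response.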

Project structure

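├── 0.png    # image used for prediction, digit 0
├── 1.png    # image used for prediction, digit 1
├── 4.png    # image used for prediction, digit 4
├── checkpoint    # checkpoint state file
├── cp-0005.ckpt.data-00000-of-00001    # trained model weights (data)
├── cp-0005.ckpt.index    # trained model weights (index)
├── mnist.npz    # MNIST dataset
├── predict.py    # prediction code
└── train.py    # training code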

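CNN model code (train.py)

The first half of the model definition mainly uses Conv2D (convolution) and MaxPooling2D (pooling) from tensorflow.keras.layers.

A CNN takes as input a tensor of shape (image_height, image_width, color_channels). The MNIST dataset is grayscale, so it has a single color channel. Ordinary color images have three (R, G, B), and, as front-end developers may know, some images have four (R, G, B, A), where A is the alpha (opacity) channel. For MNIST the input tensor shape is therefore (28, 28, 1), which is passed to the first layer of the network through the input_shape argument.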
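
As a small illustration of these shapes, the following sketch adds the channel axis to a blank 28x28 array and then the batch axis that Keras prepends when it feeds data to the model. The array is a made-up placeholder, not real MNIST data.

```python
import numpy as np

gray = np.zeros((28, 28), dtype=np.uint8)   # one MNIST-sized grayscale image
with_channel = gray[..., np.newaxis]         # append the channel axis
batch = with_channel[np.newaxis, ...]        # prepend the batch axis

print(with_channel.shape)   # (28, 28, 1)
print(batch.shape)          # (1, 28, 28, 1)
```

input_shape describes a single sample, so it is (28, 28, 1); the leading batch dimension is the None that appears in the model summary.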

import os
import tensorflow as tf
from tensorflow.keras import datasets, layers, models




class CNN(object):
    def __init__(self):
        model = models.Sequential()
        # Conv layer 1: 32 kernels of size 3x3; 28x28 is the size of the input images
        model.add(layers.Conv2D(
            32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
        model.add(layers.MaxPooling2D((2, 2)))
        # Conv layer 2: 64 kernels of size 3x3
        model.add(layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))
        # Conv layer 3: 64 kernels of size 3x3
        model.add(layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(layers.Flatten())
        model.add(layers.Dense(64, activation='relu'))
        model.add(layers.Dense(10, activation='softmax'))
        model.summary()
        self.model = model

model.summary() prints the structure of the model we defined.

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                36928     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650       
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________

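As we can see, the output of every Conv2D and MaxPooling2D layer is a three-dimensional tensor (height, width, channels). The height and width shrink progressively. The number of output channels is set by the first argument (for example, 32 or 64); as height and width shrink, the channel count can afford to grow (from a compute-cost standpoint).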
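
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used in the model (stride 1 and no padding for Conv2D, non-overlapping windows for MaxPooling2D); the helper names are made up for illustration.

```python
def conv_out(size, kernel=3):
    """Output size of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Output size of non-overlapping pooling; leftover rows/columns are dropped."""
    return size // window


def conv_params(kernel, in_channels, filters):
    """Weights per filter plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


size = 28
sizes = []
for layer in ("conv", "pool", "conv", "pool", "conv"):
    size = conv_out(size) if layer == "conv" else pool_out(size)
    sizes.append(size)

print(sizes)   # [26, 13, 11, 5, 3]
print(conv_params(3, 1, 32), conv_params(3, 32, 64), conv_params(3, 64, 64))
# 320 18496 36928
```

These match the Output Shape and Param # columns for the three Conv2D layers.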

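The second half of the model defines the output tensor. layers.Flatten turns the three-dimensional tensor into a one-dimensional vector. The tensor has shape (3, 3, 64) before flattening and becomes a vector of length 576; two fully connected layers built with layers.Dense then reduce that vector from 576 to 64 and finally to 10.

The second half is effectively an ordinary neural network with a 576-unit input layer, a 64-unit hidden layer, and a 10-unit output layer. The activation of the last layer is softmax, and its 10 outputs map exactly onto the ten digits 0-9.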
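
For intuition, here is a pure-Python sketch of what softmax does to the 10 raw outputs, along with the arithmetic behind the two Dense parameter counts in the summary. The logits are made-up numbers and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Exponentiate and normalize so the outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]   # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


probs = softmax([0.0, 2.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

print(probs.index(max(probs)))                       # 1
print(dense_params(576, 64), dense_params(64, 10))   # 36928 650
```

Because the outputs sum to 1, they can be read as a probability for each of the ten digits.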

The index of the largest value is the predicted digit, which is easy to compute with numpy:

import numpy as np


y1 = [0, 0.8, 0.1, 0.1, 0, 0, 0, 0, 0, 0]
y2 = [0, 0.1, 0.1, 0.1, 0.5, 0, 0.2, 0, 0, 0]
np.argmax(y1)    # 1
np.argmax(y2)   # 4

MNIST dataset preprocessing (train.py)

class DataSource(object):
    def __init__(self):
        # Cache file for the MNIST dataset (Keras resolves a relative path
        # under ~/.keras/datasets); downloaded automatically if it is missing
        data_path = 'mnist.npz'
        (train_images, train_labels), (test_images,
                                       test_labels) = datasets.mnist.load_data(path=data_path)
        # 60,000 training images and 10,000 test images
        train_images = train_images.reshape((60000, 28, 28, 1))
        test_images = test_images.reshape((10000, 28, 28, 1))
        # Scale pixel values into the range 0 to 1
        train_images, test_images = train_images / 255.0, test_images / 255.0


        self.train_images, self.train_labels = train_images, train_labels
        self.test_images, self.test_labels = test_images, test_labels
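
The same two preprocessing steps can be tried without downloading anything. This sketch uses random arrays as a hypothetical stand-in for MNIST and passes -1 to reshape so the number of images is inferred rather than hard-coded.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random 28x28 images with 8-bit pixel values.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Add the channel axis, then scale the 0-255 pixel values into [0, 1].
prepared = fake_images.reshape((-1, 28, 28, 1)) / 255.0

print(prepared.shape)   # (5, 28, 28, 1)
print(prepared.dtype)   # float64
```

Dividing a uint8 array by 255.0 yields floating-point values, which is what the network trains on.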

Training and saving the results (train.py)

class Train:
    def __init__(self):
        self.cnn = CNN()
        self.data = DataSource()


    def train(self):
        check_path = 'cp-{epoch:04d}.ckpt'
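        # An integer save_freq counts samples seen (in TF 2.1), not epochs;
        # pass save_freq='epoch' to save once at the end of every epoch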
        save_model_cb = tf.keras.callbacks.ModelCheckpoint(
            check_path, save_weights_only=True, verbose=1, save_freq=5000)


        self.cnn.model.compile(optimizer='adam',
                               loss='sparse_categorical_crossentropy',
                               metrics=['accuracy'])
        self.cnn.model.fit(self.data.train_images, self.data.train_labels,
                           epochs=5, callbacks=[save_model_cb])


        test_loss, test_acc = self.cnn.model.evaluate(
            self.data.test_images, self.data.test_labels)
        # Prints the test accuracy and the number of test images
        print('准确率: %.4f，共测试了%d张图片 ' % (test_acc, len(self.data.test_labels)))




if __name__ == '__main__':
    app = Train()
    app.train()
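
For reference, the loss named in compile(), sparse_categorical_crossentropy, takes integer labels and per-class probabilities and averages -log(p[label]) over the batch. The pure-Python sketch below illustrates that formula on the two example vectors from the argmax snippet; it is not the Keras implementation and omits numerical safeguards.

```python
import math


def sparse_categorical_crossentropy(labels, probabilities):
    """Mean of -log(p[label]) over the batch."""
    losses = [-math.log(p[y]) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)


y1 = [0, 0.8, 0.1, 0.1, 0, 0, 0, 0, 0, 0]
y2 = [0, 0.1, 0.1, 0.1, 0.5, 0, 0.2, 0, 0, 0]

# True labels are 1 and 4; the model assigns them 0.8 and 0.5.
loss = sparse_categorical_crossentropy([1, 4], [y1, y2])
print(round(loss, 4))   # 0.4581
```

The more probability the model puts on the correct digit, the closer the loss gets to 0.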

Running python3 train.py produces the following output:

Train on 60000 samples
Epoch 1/5
49952/60000 [=======================>......] - ETA: 6s - loss: 0.1653 - accuracy: 0.9489       
Epoch 00001: saving model to cp-0001.ckpt
60000/60000 [==============================] - 36s 608us/sample - loss: 0.1470 - accuracy: 0.9544
Epoch 2/5
39968/60000 [==================>...........] - ETA: 13s - loss: 0.0473 - accuracy: 0.9847
Epoch 00002: saving model to cp-0002.ckpt
60000/60000 [==============================] - 40s 668us/sample - loss: 0.0461 - accuracy: 0.9852
Epoch 3/5
29984/60000 [=============>................] - ETA: 19s - loss: 0.0319 - accuracy: 0.9895
Epoch 00003: saving model to cp-0003.ckpt
60000/60000 [==============================] - 40s 660us/sample - loss: 0.0318 - accuracy: 0.9900
Epoch 4/5
20032/60000 [=========>....................] - ETA: 25s - loss: 0.0236 - accuracy: 0.9921
Epoch 00004: saving model to cp-0004.ckpt
60000/60000 [==============================] - 38s 627us/sample - loss: 0.0248 - accuracy: 0.9919
Epoch 5/5
 9984/60000 [===>..........................] - ETA: 28s - loss: 0.0166 - accuracy: 0.9943
Epoch 00005: saving model to cp-0005.ckpt
60000/60000 [==============================] - 38s 628us/sample - loss: 0.0201 - accuracy: 0.9932
10000/10000 [==============================] - 2s 155us/sample - loss: 0.0275 - accuracy: 0.9918
准确率: 0.9918，共测试了10000张图片

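As the log shows, training accuracy reached 0.9544 by the end of the first epoch (0.9489 at the first checkpoint, partway through it), and after 5 epochs the accuracy on the test set reached 0.9918. The last log line reads "accuracy: 0.9918, 10000 images tested".

During the fifth epoch the model weights were saved to cp-0005.ckpt. We can now load the saved weights, restore the whole convolutional network, and run predictions on real images.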
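
The checkpoint names in the log come from the 'cp-{epoch:04d}.ckpt' template in train.py, whose placeholder is filled in with the epoch number. A quick sketch of how that template expands, using plain str.format:

```python
check_path = 'cp-{epoch:04d}.ckpt'

# {epoch:04d} zero-pads the epoch number to four digits.
names = [check_path.format(epoch=e) for e in range(1, 6)]

print(names[0])    # cp-0001.ckpt
print(names[-1])   # cp-0005.ckpt
```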

Image prediction (predict.py)

To keep training and loading separate, the prediction code lives in predict.py.

#!/usr/local/bin/python3


import tensorflow as tf
from PIL import Image
import numpy as np


from train import CNN


'''
python 3.7.6
tensorflow 2.1.0
pillow 7.1.1
'''


class Predict(object):
    def __init__(self):
        latest = tf.train.latest_checkpoint('./')
        self.cnn = CNN()
        # Restore the network weights
        self.cnn.model.load_weights(latest)


    def predict(self, image_path):
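        # Read the image as grayscale
        img = Image.open(image_path).convert('L')
        # Scale to [0, 1] like the training data; assumes a 28x28 image
        flatten_img = np.reshape(img, (28, 28, 1)) / 255.0
        # Invert: assumes a dark digit on a light background (MNIST is light on dark)
        x = np.array([1 - flatten_img])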


        # API refer: https://keras.io/models/model/
        y = self.cnn.model.predict(x)


        # x holds a single image, so y[0] is all we need
        # np.argmax() returns the index of the largest value, i.e. the predicted digit
        print(image_path)
        print(y[0])
        print('        -> Predict digit', np.argmax(y[0]))




if __name__ == '__main__':
    app = Predict()
    app.predict('0.png')
    app.predict('1.png')
    app.predict('4.png')

Finally, running python3 predict.py prints:

0.png
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
        -> Predict digit 0
1.png
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
        -> Predict digit 1
4.png
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
        -> Predict digit 4
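
Predictions are only meaningful when the input is prepared the same way as the training data: floats in [0, 1], with a bright digit on a dark background as in MNIST. Here is a minimal sketch of that preparation on a synthetic array, assuming an 8-bit image with a dark stroke on a white background; the array is a made-up placeholder for a real PNG.

```python
import numpy as np

# Hypothetical 28x28 8-bit image: white background (255) with a dark stroke (0).
pixels = np.full((28, 28), 255, dtype=np.uint8)
pixels[4:24, 13:15] = 0   # a vertical bar, roughly like the digit 1

# Convert to float and scale to [0, 1] first, then invert so the stroke is bright.
scaled = pixels.astype(np.float32) / 255.0
inverted = 1.0 - scaled
x = inverted.reshape((1, 28, 28, 1))   # batch of one, single channel

print(x.shape)                          # (1, 28, 28, 1)
print(float(x.min()), float(x.max()))   # 0.0 1.0
```

Converting to float before subtracting matters because arithmetic on uint8 arrays wraps around instead of going negative.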

