Description
In this homework, please read and understand the paper “Interpretable Convolutional Neural Networks
via Feedforward Design” [1] introduced by Professor Kuo.
1) Understanding of feedforward-designed convolutional neural networks (FF-CNNs) (15%)
An FF-CNN consists of two modules in cascade: 1) the construction of the convolutional (conv) layers using Saab
(subspace approximation with adjusted bias) transforms; and 2) the construction of the fully-connected (FC)
layers using multi-stage linear least-squares regression (LSR). A minimal code sketch of a single Saab stage is
given at the end of this part.
• Summarize FF-CNNs with a flow chart and explain it in your own words.
• Explain the similarities and differences between FF-CNNs and backpropagation-designed CNNs
(BP-CNNs).
Do not copy sentences directly from [1] or other papers; doing so is plagiarism. Your score will depend on
how well you demonstrate your understanding.
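To make the Saab module concrete, here is a minimal single-stage Saab sketch in Python, assuming numpy and
scikit-learn are available. It is a simplified illustration, not the reference implementation in [3]: the DC
kernel is the constant unit vector, the AC kernels come from PCA on the DC-removed patches, and the bias is a
single constant chosen large enough to keep the AC responses non-negative (the paper's bias selection is more
detailed). The function names, kernel size, and stride are illustrative.

import numpy as np
from sklearn.decomposition import PCA

def extract_patches(images, k=4, stride=4):
    """images: (n, H, W, C) -> non-overlapping patches of shape (n, H', W', k*k*C)."""
    n, H, W, C = images.shape
    hp, wp = (H - k) // stride + 1, (W - k) // stride + 1
    patches = np.zeros((n, hp, wp, k * k * C))
    for i in range(hp):
        for j in range(wp):
            block = images[:, i*stride:i*stride+k, j*stride:j*stride+k, :]
            patches[:, i, j, :] = block.reshape(n, -1)
    return patches

def fit_saab(train_patches, num_ac_kernels):
    """train_patches: (m, D) flattened patches -> (kernels, bias) of one Saab stage."""
    D = train_patches.shape[1]
    dc_kernel = np.ones((1, D)) / np.sqrt(D)              # DC kernel: constant unit vector
    dc = train_patches @ dc_kernel.T                      # DC coefficients
    residual = train_patches - dc @ dc_kernel             # remove the DC component
    pca = PCA(n_components=num_ac_kernels).fit(residual)  # AC kernels from PCA
    kernels = np.vstack([dc_kernel, pca.components_])     # (1 + num_ac_kernels, D)
    bias = np.max(np.linalg.norm(train_patches, axis=1))  # keeps AC responses non-negative
    return kernels, bias

def saab_transform(patches, kernels, bias):
    """Project flattened patches onto the Saab kernels and add the bias to AC channels."""
    coeffs = patches @ kernels.T
    coeffs[..., 1:] += bias
    return coeffs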
2) Image reconstructions from Saab coefficients (35%)
Apply Saab transforms to images in the MNIST dataset [2].
• Compute the Saab coefficients of the four handwritten digit images shown in Figure 1 (you may use the
online source code [3] or implement the transform yourself), and implement the reconstruction algorithm
(write your own code) to transform the Saab coefficients back to images.
• To evaluate the reconstruction results, show the reconstructed images and compute the PSNR between each
original image and its reconstruction (a minimal PSNR sketch is given after this list).
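PSNR can be computed directly from the mean squared error. A minimal sketch, assuming 8-bit grayscale images
stored as numpy arrays with values in [0, 255]; the function name is illustrative:

import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)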
Architecture setting:
In this problem, use two-stage Saab transforms in which the spatial size of the transform kernels is 4×4 and
the stride of each stage is 4 (non-overlapping). Thus, at the output, the Saab coefficients of an image should
have dimension 2×2×N, where N is the number of transform kernels in the second stage. Evaluate four different
settings (different kernel numbers in each stage) and discuss your results.
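Note that a 2×2 spatial output from two 4×4, stride-4 stages implies a 32×32 input; a common choice (an
assumption here, not stated in the prompt) is to zero-pad the 28×28 MNIST images to 32×32. A minimal shape
check, with illustrative kernel numbers K1 and K2:

def saab_output_size(size, k=4, stride=4):
    """Spatial output size of one non-overlapping Saab stage."""
    return (size - k) // stride + 1

H = 32                          # 28x28 MNIST zero-padded to 32x32 (assumption)
K1, K2 = 4, 10                  # illustrative kernel numbers for stages 1 and 2
h1 = saab_output_size(H)        # 8 -> stage-1 output: 8 x 8 x K1
h2 = saab_output_size(h1)       # 2 -> stage-2 output: 2 x 2 x K2 (stage-2 patch dim = 4*4*K1)
print(f"stage 1: {h1}x{h1}x{K1}, stage 2: {h2}x{h2}x{K2}")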
Figure 1: Four handwritten digit images from the MNIST dataset.
3) Handwritten digit recognition using ensembles of feedforward design (50%)
In this problem, you will apply an FF-CNN to handwritten digit recognition. Train an FF-CNN using the 60,000
training images from the MNIST dataset. Adopt a LeNet-5-like architecture in which the filter numbers of the
first and second conv layers and the sizes of the first and second FC layers are 6, 16, 120, and 80,
respectively. The spatial size of the transform kernels is 5×5 and the stride is 1 for each conv layer. To
reduce the spatial dimension, max-pooling layers are adopted.
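For reference, a hedged configuration sketch of this LeNet-5-like FF-CNN. The layer names and dictionary
layout are illustrative, and the 2×2 max-pooling after each conv layer and the 10-class output layer are
assumptions taken from the standard LeNet-5 layout rather than from the prompt:

ffcnn_config = {
    "conv1": {"num_kernels": 6,  "kernel_size": 5, "stride": 1},   # Saab stage 1
    "pool1": {"type": "max", "size": 2, "stride": 2},              # assumed 2x2 max-pooling
    "conv2": {"num_kernels": 16, "kernel_size": 5, "stride": 1},   # Saab stage 2
    "pool2": {"type": "max", "size": 2, "stride": 2},              # assumed 2x2 max-pooling
    "fc1":   {"num_units": 120},   # multi-stage LSR layer
    "fc2":   {"num_units": 80},    # multi-stage LSR layer
    "fc3":   {"num_units": 10},    # assumed output layer: one unit per digit class
}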
• Report the training and testing classification accuracy of a single FF-CNN on the MNIST dataset.
• One way to improve performance is to build an ensemble of FF-CNNs. Train ten different FF-CNNs and fuse
their results following the method in [4]. Diversity is key to a successful ensemble, and [4] introduces
three strategies for increasing diversity in an ensemble of FF-CNNs that you can refer to. Explain and
justify the strategies you use to generate the different FF-CNNs in your ensemble, and report the training
and testing classification accuracy of your ensemble system (a minimal fusion sketch is given after this
list).
• Error analysis: Compare the classification error cases arising from BP-CNNs (use the best result from your
HW #5) and FF-CNNs. What percentage of the errors is shared, and what percentage differs? Explain your
observations. Also, propose ideas to improve BP-CNNs, FF-CNNs, or both, and justify your proposal. There is
no need to implement the proposed ideas.
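As one simple fusion rule for combining the ten FF-CNNs, the per-class decision scores can be averaged and
the class with the highest fused score chosen (soft voting). This is only an illustration; follow the fusion
strategy described in [4] for the assignment. A minimal sketch, assuming each FF-CNN produces an
(num_samples, 10) array of class scores:

import numpy as np

def ensemble_predict(score_list):
    """score_list: list of (n, 10) score arrays, one per FF-CNN -> (n,) predicted labels."""
    fused = np.mean(np.stack(score_list, axis=0), axis=0)  # average the class scores
    return np.argmax(fused, axis=1)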
References
[1] Kuo, C. C. J., Zhang, M., Li, S., Duan, J., & Chen, Y. (2019). Interpretable convolutional neural networks via
feedforward design. Journal of Visual Communication and Image Representation.
[2] MNIST dataset: http://yann.lecun.com/exdb/mnist/
[3] https://github.com/davidsonic/Interpretable_CNN
[4] Chen, Y., Yang, Y., Wang, W., & Kuo, C. C. J. (2019). Ensembles of feedforward-designed convolutional
neural networks. arXiv preprint arXiv:1901.02154.