Description
Self-organizing maps (SOMs) offer an alternative way to perform clustering, one with some
additional valuable properties. Like other clustering methods, a SOM compresses data: each
input is represented by a cluster (here, a grid point) and its cluster center (a prototype
vector associated with that grid point). In addition, a self-organizing map can discover a
useful dimensionality reduction and preserve topological connectivity, so that neighboring
grid points come to represent similar inputs.
The accompanying zip file contains a set of training patterns and a set of test patterns.
These patterns correspond to binary 8×8 scenes. The scenes consist of coherent blobs:
2×2 and 3×3 squares of “on” pixels, with the remaining pixels “off”.
However, the individual pixels are scrambled between the original receptors and the
terminals at the self-organizing map. Although the signals are scrambled in transit, every
pattern is scrambled by the same fixed remapping, as though the axons had become
tangled on their way from inputs to outputs.
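To make the “consistent remapping” concrete, here is a minimal Python sketch (the assignment itself uses Matlab, and this is not part of the starter code). It builds one 3×3 blob, scrambles it with a single fixed permutation, and inverts the scrambling. The helper names make_blob, flatten, scramble and unscramble, and the seed, are hypothetical; the real data in scrambled_blobs.mat are already scrambled and the permutation is not given to you.

```python
import random

SIZE = 8  # scenes are 8x8 binary images

def make_blob(top, left, side):
    """Return an 8x8 scene (list of rows) with one side x side square of 'on' pixels."""
    scene = [[0] * SIZE for _ in range(SIZE)]
    for r in range(top, top + side):
        for c in range(left, left + side):
            scene[r][c] = 1
    return scene

def flatten(scene):
    """Concatenate the rows of a scene into a 64-element receptor vector."""
    return [px for row in scene for px in row]

# One fixed permutation shared by every pattern (the "tangled axons").
# Terminal k receives the signal from receptor PERM[k]; the seed is arbitrary.
PERM = list(range(SIZE * SIZE))
random.Random(0).shuffle(PERM)

def scramble(vec, perm=PERM):
    """Reorder receptor signals into terminal order."""
    return [vec[p] for p in perm]

def unscramble(svec, perm=PERM):
    """Invert scramble(): send each terminal's signal back to its receptor."""
    vec = [0] * len(svec)
    for k, p in enumerate(perm):
        vec[p] = svec[k]
    return vec

blob = flatten(make_blob(2, 3, 3))   # a 3x3 blob
scrambled = scramble(blob)           # same pixel count, positions shuffled
```

The SOM never sees PERM. The hope is that training on many scrambled blobs leads neighboring grid clusters to prefer receptors that were neighbors before scrambling, which is what lets the grid act as an unscrambler.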
A collection of clusters is arranged in a grid that receives the scrambled axon terminals.
The grid of clusters may have the same dimensions as the input patterns (8×8), or it may
be smaller, larger, or non-square. The Matlab starter code posted on the web site reads
and unpacks the data. The data files are scrambled_blobs.mat and
scrambled_testpats.mat. Use the scrambled blobs to train the grid of clusters, and use the
scrambled test patterns to evaluate how well the trained grid decodes (unscrambles) its
input. The test patterns are the symbols I, H and X, scrambled with the same mapping as
the training blobs.
The main Matlab file is som_clustering_main.m, which uses the functions
find_closest_cluster(), alphafnc(), vec2pat(), pat2vec() and
view_all_pattern_responses(), as well as the script eval_test_patterns.m. Some of these
must be edited, as noted in the program comments.
The main routine loads the training images and converts each 8×8 image to a
corresponding 64×1 vector, the form that SOM training expects. The main routine also
establishes a set of clusters arranged in a grid; for example, try a 6×6 grid. Each
cluster has an associated feature vector with the same dimensions as the pattern vectors
(64×1). Initialize every cluster feature vector to random values and normalize it to unit
magnitude.
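As a rough illustration of this setup, the Python sketch below mirrors the roles of pat2vec() and vec2pat() and initializes a 6×6 grid of random unit vectors. It is not the Matlab starter code: the signatures are assumptions, patterns are lists of rows flattened row by row, and random values are drawn uniformly from [0, 1), as Matlab's rand does. The ordering used by pat2vec (Matlab's reshape is column-major) does not matter as long as pat2vec and vec2pat are exact inverses of each other.

```python
import math
import random

def pat2vec(pat):
    """Flatten an 8x8 pattern (list of rows) into a 64-element vector, row by row."""
    return [x for row in pat for x in row]

def vec2pat(vec, nrows=8, ncols=8):
    """Inverse of pat2vec: rebuild an nrows x ncols pattern from a flat vector."""
    return [list(vec[r * ncols:(r + 1) * ncols]) for r in range(nrows)]

def normalize(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def init_clusters(grid_rows=6, grid_cols=6, dim=64, seed=None):
    """Grid of cluster feature vectors, each random in [0, 1) and scaled to unit length."""
    rng = random.Random(seed)
    return [[normalize([rng.random() for _ in range(dim)])
             for _ in range(grid_cols)]
            for _ in range(grid_rows)]

W = init_clusters(6, 6, 64, seed=1)   # W[i][j] is the feature vector of cluster (i, j)
```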
After initializing the cluster vectors, the main routine should cycle through many
iterations, each of which chooses a training pattern at random and updates the cluster grid
according to the SOM training policy. Updating the cluster grid first requires finding the
single cluster, designated by its grid coordinates ibest,jbest, that is most similar to the
current pattern, as measured by the dot product between the training pattern and the
cluster’s feature vector. Each selected training pattern (potentially) influences every
cluster in the grid. However, clusters that are farther from the most-similar cluster
(measured in grid coordinates) should be influenced less by the training pattern than
clusters that are closer to ibest,jbest. The influence of a pattern on a cluster is
modulated by the function alphafnc(), which returns a coefficient “alpha” that scales the
influence of the training pattern on a given cluster as a function of grid distance and
time. The training pattern should alter each cluster’s feature vector to be more similar to
the input pattern vector, with the strength of the change set by alpha. Note that every
such modification should conclude with renormalizing the cluster feature vector to unit
length.
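One way to realize this training policy is sketched below in Python, under stated assumptions. Because every cluster vector has unit length, picking the largest dot product with a given pattern is the same as picking the largest cosine similarity. Each cluster is then moved toward the pattern with the common Kohonen-style rule w ← w + alpha·(x − w) and renormalized. The alphafnc shown here (a Gaussian neighborhood whose radius and peak rate shrink linearly with time) and its parameters alpha0 and radius0 are hypothetical choices, one of many the assignment asks you to experiment with; the real alphafnc() signature in the starter code may differ. The optional on_eval callback stands in for the periodic evaluation every 1000 updates.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def find_closest_cluster(W, x):
    """Return (ibest, jbest): grid coordinates of the cluster with the largest dot product with x."""
    best, ibest, jbest = -math.inf, 0, 0
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            s = dot(w, x)
            if s > best:
                best, ibest, jbest = s, i, j
    return ibest, jbest

def alphafnc(dist, t, t_max, alpha0=0.5, radius0=3.0):
    """Hypothetical schedule: Gaussian neighborhood whose radius and peak rate shrink over time."""
    frac = t / t_max
    radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood radius, in grid units
    rate = alpha0 * (1.0 - frac) + 0.01          # peak learning rate at the winning cluster
    return rate * math.exp(-dist * dist / (2.0 * radius * radius))

def som_step(W, x, t, t_max):
    """One SOM update: pull every cluster toward x by its alpha, then renormalize it."""
    ibest, jbest = find_closest_cluster(W, x)
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            d = math.hypot(i - ibest, j - jbest)   # distance measured on the grid
            a = alphafnc(d, t, t_max)
            row[j] = normalize([wk + a * (xk - wk) for wk, xk in zip(w, x)])
    return ibest, jbest

def train(W, patterns, t_max, seed=None, eval_every=1000, on_eval=None):
    """Run t_max random-pattern updates; call on_eval(W, t) every eval_every updates."""
    rng = random.Random(seed)
    for t in range(t_max):
        som_step(W, rng.choice(patterns), t, t_max)
        if on_eval is not None and (t + 1) % eval_every == 0:
            on_eval(W, t + 1)
    return W
```

A wide, slowly shrinking neighborhood early in training tends to order the grid globally, while the small late radius and rate fine-tune individual clusters; trying other decay shapes (for example exponential) is part of the experiment.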
The trained grid of clusters is used as follows. For a given stimulus vector (which need not
be part of the training set), compute its similarity to each cluster feature vector and
assign that value to the corresponding grid location i,j. The resulting map can be
visualized to interpret the entire cluster grid’s response to the stimulus vector. For
coherent patterns that have been scrambled with the same mapping as the training blobs,
the hope is that the cluster response will unscramble the test pattern and recover its
original image.
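The response computation can be sketched in Python as below. This is an illustration, not view_all_pattern_responses() or eval_test_patterns.m: response_map returns one dot-product similarity per grid location, and the hypothetical render helper prints it as text, with brighter characters marking stronger responses, so a symbol such as H or X can be checked by eye without plotting.

```python
def response_map(W, x):
    """Similarity (dot product) of stimulus x with every cluster, arranged on the grid."""
    return [[sum(wk * xk for wk, xk in zip(w, x)) for w in row] for row in W]

def render(resp, levels=" .:*#"):
    """Map each grid response to a character; later characters in levels mean higher similarity."""
    flat = [v for row in resp for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0          # avoid dividing by zero for a flat response
    n = len(levels) - 1
    return "\n".join("".join(levels[round((v - lo) / span * n)] for v in row)
                     for row in resp)
```

Because the map’s orientation is set only by training, the recovered symbol may appear rotated or mirrored relative to the original scene.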
In the example code, after every 1000 training-pattern updates, the cluster grid’s
responses to every training pattern and to all three test patterns are displayed
graphically.
The test patterns, which are stored in the file “scrambled_testpats.mat”, correspond in the
original receptor space to the symbols “X”, “H” and “1”. They have been scrambled with
the same reordering as the training blobs. If your cluster training is successful, you
should be able to recognize these symbols in the grid’s response when they are used to
stimulate the cluster grid. This evaluation is performed by the script “eval_test_patterns.m”.
Your assignment is:
1) Edit the starter Matlab code to implement the SOM algorithm (include your code with
your solution).
2) Experiment with different training parameters (e.g., number of iterations, radius of
influence as a function of time, and value of alpha as a function of grid distance from the
most similar cluster) to train a 6×6 cluster grid (and, optionally, grids of other sizes)
on the provided data.
3) Show your results for the three test patterns.