A look at four basic neural network architectures in one article
http://blackblog.tech/2018/02/23/Eight-Neural-Network/
More on my personal blog, http://blackblog.tech. Welcome to follow.
If you are just getting started with neural networks, the variety of architectures can be confusing: neural networks look complex and diverse, but the many architectures fall into just three types, namely feed-forward neural networks, recurrent networks, and symmetrically connected networks. This article introduces four common neural networks: CNN, RNN, DBN, and GAN. Through these four basic architectures, we can build a working understanding of neural networks.
A neural network is a model in machine learning: an algorithmic, mathematical model that mimics the behavioral characteristics of animal neural networks to perform distributed, parallel information processing. Such a network relies on the complexity of the system, adjusting the relationships between the large number of interconnected internal nodes, to process information.
In general, the architecture of neural networks can be divided into three categories:
Feed-forward neural networks:
This is the most common type of neural network in practice. The first layer is the input and the last layer is the output. If there are multiple hidden layers, we call them "deep" neural networks. They compute a series of transformations that change the similarity of the samples. The activity of the neurons in each layer is a nonlinear function of the activity of the previous layer.
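To make "each layer's activity is a nonlinear function of the previous layer's activity" concrete, here is a minimal sketch of a forward pass; the layer sizes and the ReLU nonlinearity are illustrative assumptions, not something fixed by the architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# A minimal feed-forward pass with made-up sizes (4 -> 8 -> 8 -> 3):
# each layer's activity is a nonlinear function of the previous layer's.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)      # input layer
h1 = relu(W1 @ x + b1)      # first hidden layer
h2 = relu(W2 @ h1 + b2)     # second hidden layer
y = W3 @ h2 + b3            # output layer
```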
Recurrent networks:
Recurrent networks have directed loops in their connectivity graphs, which means you can follow the arrows back to where you started. They can have complex dynamics that make them hard to train. They are more biologically realistic.
Recurrent networks are intended to process sequential data. In a traditional neural network model, data flows from the input layer to the hidden layer to the output layer; adjacent layers are fully connected, but the nodes within each layer are unconnected. This ordinary neural network is inadequate for many problems. For example, to predict the next word in a sentence, you generally need the previous words, because the words in a sentence are not independent of one another.
In a recurrent neural network, the current output of a sequence also depends on the previous outputs. Concretely, the network remembers earlier information and applies it to the computation of the current output: the nodes in the hidden layer are no longer unconnected but connected to each other, and the input to the hidden layer includes not only the output of the input layer but also the hidden layer's own output from the previous time step.
Symmetrically connected networks:
Symmetrically connected networks are a bit like recurrent networks, but the connections between units are symmetric (they have the same weight in both directions). Symmetric networks are easier to analyze than recurrent networks. They are also more restricted, because they obey an energy function. Symmetrically connected networks without hidden units are called "Hopfield networks"; symmetrically connected networks with hidden units are called Boltzmann machines.
Actually, I've covered perceptrons in previous posts, so I'll just recap here.
First of all, there's this picture:
This is an M-P neuron.
A neuron has n inputs, each with a corresponding weight w. The neuron multiplies each input by its weight, sums the products, adds the bias, and passes the result through an activation function to produce the final output. The output is often binary, with the 0 state representing inhibition and the 1 state representing activation.
The perceptron can be thought of as a hyperplane decision surface in an n-dimensional instance space: the perceptron outputs 1 for samples on one side of the hyperplane and 0 for instances on the other side. The equation of this decision hyperplane is w·x = 0. Sets of positive and negative samples that can be separated by some hyperplane are called linearly separable sample sets; they can be represented by the perceptron in the figure.
The AND, OR, and NOT problems are all linearly separable problems that a perceptron with two inputs can represent easily, while XOR is not linearly separable, so a single-layer perceptron will not work, and a multilayer perceptron is needed to solve the XOR problem.
What should we do if we want to train a perceptron?
We start with random weights and apply the perceptron iteratively to each training sample, modifying the perceptron's weights whenever it misclassifies a sample. This process is repeated until the perceptron correctly classifies all samples. At each step, the weights are modified according to the perceptron training rule, which revises the weight w_i associated with input x_i as follows:

w_i ← w_i + Δw_i, where Δw_i = η(t − o)x_i
Here t is the target output of the current training sample, o is the output of the perceptron, and η is a positive constant called the learning rate. The learning rate moderates how much the weights are adjusted at each step; it is usually set to a small value (e.g., 0.1) and is sometimes made to decay as the number of weight adjustments grows.
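Putting the training rule together, here is a minimal sketch in Python; the threshold activation and the AND data at the bottom are illustrative choices:

```python
import numpy as np

def train_perceptron(X, T, eta=0.1, epochs=100):
    """A minimal sketch of the perceptron training rule described above.

    X: samples as rows; T: target outputs (0 or 1); eta: learning rate.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, T):
            o = 1 if w @ x + b > 0 else 0   # threshold activation
            if o != t:
                w += eta * (t - o) * x      # w_i <- w_i + eta * (t - o) * x_i
                b += eta * (t - o)
                errors += 1
        if errors == 0:                     # all samples classified correctly
            break
    return w, b

# Usage: the AND problem, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, T)
```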
A multilayer perceptron, or multilayer neural network, is nothing more than multiple hidden layers between the input and output layers; later neural networks such as CNNs and DBNs simply redesign what each layer does. The perceptron can be said to be the foundation of neural networks: the more complex networks that followed are all built on this simplest model.
When it comes to machine learning, we often encounter the term pattern recognition, but pattern recognition in real environments runs into all kinds of problems. For example:
Image segmentation: real scenes are always mixed with other objects. It is difficult to determine which parts belong to the same object. Some parts of an object can be hidden behind other objects.
Object illumination: the intensity of pixels is strongly influenced by light.
Image distortion: objects can be distorted in various non-affine ways. For example, a handwritten character can have a large loop or just a cusp.
Contextual support: the category to which objects belong is usually defined by how they are used. For example, chairs are designed for people to sit on, so they come in a variety of physical shapes.
The difference between a convolutional neural network and an ordinary neural network is that a convolutional neural network contains a feature extractor made up of convolutional layers and subsampling layers. In a convolutional layer, each neuron is connected only to some of the neurons in the previous layer (its local receptive field). A convolutional layer usually contains several feature maps (featureMap); each feature map consists of neurons arranged in a rectangle, and neurons in the same feature map share weights, where the shared weights are the convolution kernel. The convolution kernel is usually initialized as a matrix of small random numbers, and during training it learns reasonable weights. The immediate benefit of shared weights (the convolution kernel) is fewer connections between the layers of the network and, at the same time, a lower risk of overfitting. Subsampling, also called pooling, usually takes the form of mean pooling or max pooling. Subsampling can be viewed as a special kind of convolution. Convolution and subsampling greatly simplify the model and reduce its parameters.
A convolutional neural network consists of three parts. The first part is the input layer. The second part consists of a combination of n convolutional and pooling layers. The third part consists of a fully connected multilayer perceptron classifier.
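Here is a minimal numpy sketch of the two core operations, convolution with a shared kernel and max pooling; the image and kernel sizes are made-up examples:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling (subsampling)."""
    H, W = fmap.shape
    out = fmap[:H - H % size, :W - W % size]
    out = out.reshape(H // size, size, W // size, size)
    return out.max(axis=(1, 3))

image = np.random.rand(8, 8)
kernel = np.random.rand(3, 3)   # the shared weights (convolution kernel)
fmap = conv2d(image, kernel)    # 6x6 feature map
pooled = max_pool(fmap)         # 3x3 after 2x2 max pooling
```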
Here's an example of AlexNet (a rough code sketch follows the list):
- Input: 224×224 image, 3 channels.
- First convolutional layer: 96 convolution kernels of size 11×11, 48 on each of the two GPUs.
- First max-pooling layer: 3×3 kernels, stride 2.
- Second convolutional layer: 256 convolution kernels of size 5×5, 128 on each GPU.
- Second max-pooling layer: 3×3 kernels, stride 2.
- Third convolutional layer: fully connected to the previous layer, 384 convolution kernels of size 3×3, split as 192 on each of the two GPUs.
- Fourth convolutional layer: 384 convolution kernels of size 3×3, 192 on each of the two GPUs. This layer is connected to the previous layer without an intervening pooling layer.
- Fifth convolutional layer: 256 convolution kernels of size 3×3, 128 on each of the two GPUs.
- Fifth-layer max-pooling: 3×3 kernels, stride 2.
- First fully connected layer: 4096 dimensions; the output of the fifth-layer max-pooling is flattened into a one-dimensional vector as the input to this layer.
- Second fully connected layer: 4096 dimensions.
- Softmax layer: 1000 outputs; each dimension of the output is the probability that the image belongs to that category.
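As a rough single-stream sketch of the stack above, ignoring the two-GPU split; the strides and paddings are my assumptions, since the list does not give them, and they determine the exact spatial sizes:

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),  # flatten the pooled maps into a 1-D vector
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),  # softmax applied at inference
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Usage: a batch of one 224x224 RGB image
logits = AlexNetSketch()(torch.randn(1, 3, 224, 224))
```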
Convolutional neural networks have important applications in pattern recognition. Of course, this is only the simplest explanation of convolutional neural networks; there is much more to them, such as local receptive fields, weight sharing, and multiple convolution kernels, which I will explain when there is an opportunity later.
Traditional neural networks struggle with many problems. For example, to predict the next word in a sentence you generally need the previous words, because the words in a sentence are not independent. The RNN is called a recurrent neural network because the current output of a sequence also depends on the previous outputs. Concretely, the network memorizes earlier information and applies it to the computation of the current output: the nodes in the hidden layer are no longer unconnected but connected to each other, and the input to the hidden layer includes not only the output of the input layer but also the hidden layer's output at the previous time step. In theory, RNNs can process sequential data of any length.
This is the structure of a simple RNN; you can see that the hidden layer is connected to itself.
So why can the RNN's hidden layer see the hidden layer's output from the previous moment? This becomes clear once we unroll the network.
In the unrolled network, the output layer and the hidden layer compute

o_t = g(V·s_t)    (Eq. 1)
s_t = f(U·x_t + W·s_{t-1})    (Eq. 2)

where x_t is the input at time t, s_t the hidden state, and o_t the output. As we can see from these equations, the difference between a recurrent layer and a fully connected layer is that the recurrent layer has the additional weight matrix W.
If we repeatedly substitute Eq. 2 into Eq. 1, we get:

o_t = g(V·f(U·x_t + W·f(U·x_{t-1} + W·f(U·x_{t-2} + ...))))

so the output at time t is, in principle, influenced by the input at every previous time step.
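Here is a minimal numpy sketch of that unrolled computation, with tanh standing in for both f and g and made-up layer sizes:

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Unrolled recurrence: s_t = tanh(U x_t + W s_{t-1}), o_t = V s_t."""
    s = s0
    outputs = []
    for x in xs:                    # one step per element of the sequence
        s = np.tanh(U @ x + W @ s)  # hidden state sees the previous hidden state
        outputs.append(V @ s)
    return outputs

# Usage with made-up sizes: input dim 3, hidden dim 5, output dim 2
rng = np.random.default_rng(0)
U = rng.normal(size=(5, 3))
W = rng.normal(size=(5, 5))
V = rng.normal(size=(2, 5))
xs = [rng.normal(size=3) for _ in range(4)]   # a length-4 sequence
os = rnn_forward(xs, U, W, V, s0=np.zeros(5))
```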
Before we get to DBNs, we need some idea of their basic building block: the RBM, the Restricted Boltzmann Machine.
What is a Boltzmann machine in the first place?
A Boltzmann machine is shown in the figure, with blue nodes forming the hidden layer and white nodes forming the input layer.
The difference between a Boltzmann machine and a recurrent neural network is the following:
1. A recurrent neural network essentially learns a function and therefore has the concepts of an input layer and an output layer, whereas a Boltzmann machine is used to learn the "intrinsic representation" of a set of data and therefore has no output layer.
2. The nodes of a recurrent neural network are linked in a directed ring, while the nodes of a Boltzmann machine are linked in an undirected complete graph.
And what is a restricted Boltzmann machine?
In the simplest terms, it adds a restriction, and this restriction turns the complete graph into a bipartite graph: the network consists of a visible layer and a hidden layer, with bidirectional full connections between the neurons of the visible layer and those of the hidden layer, and no connections within a layer.
h denotes the hidden layer, and v denotes the visible layer.
In an RBM, every pair of connected neurons has a weight w between them indicating the strength of their connection, and each neuron has its own bias coefficient, b for visible neurons and c for hidden neurons.
The exact derivation of the formulas is not shown here
A DBN is a probabilistic generative model. In contrast to the traditional discriminative modeling of neural networks, a generative model builds a joint distribution between observations and labels, evaluating both P(Observation|Label) and P(Label|Observation), while a discriminative model only evaluates the latter, P(Label|Observation).
DBNs consist of multiple layers of Restricted Boltzmann Machines, a typical kind of neural network shown in the figure. These networks are "restricted" to a visible layer and a hidden layer, with connections between the layers but no connections between the units within a layer. The hidden-layer units are trained to capture the correlations of the higher-order data expressed in the visible layer.
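As a sketch of how a single RBM layer of a DBN is typically trained greedily, here is one step of contrastive divergence (CD-1) in numpy; the notation follows the v/h/w/b/c convention above, and the binary units, sizes, and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, eta=0.1, rng=np.random.default_rng(0)):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v = visible layer, h = hidden layer, W = connection weights,
    b / c = visible / hidden biases.
    """
    # Upward pass: sample the hidden units given the visible units
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Downward pass: reconstruct the visible units, then re-infer the hidden
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Approximate gradient: data statistics minus reconstruction statistics
    W += eta * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b += eta * (v0 - pv1)
    c += eta * (ph0 - ph1)
    return W, b, c

# Usage: one update on a toy 6-dimensional binary sample with 3 hidden units
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(6, 3))
b, c = np.zeros(6), np.zeros(3)
v0 = np.array([1., 0., 1., 1., 0., 0.])
W, b, c = cd1_step(v0, W, b, c)
```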
Generative adversarial networks were actually explained in a previous post, so I'll only recap them here.
The goal of a generative adversarial network is generation. Our traditional network structures are mostly discriminative models, i.e., they judge whether a sample is genuine. Generative models, on the other hand, can generate new samples similar to the ones provided; note that these new samples are learned by the computer, not copied.
GANs generally consist of two networks: a generative model network and a discriminative model network.
The generative model G captures the distribution of the sample data and, from a noise z obeying some distribution (uniform, Gaussian, etc.), generates a sample resembling the real training data, aiming for samples as close to the real ones as possible. The discriminative model D is a binary classifier that estimates the probability that a sample comes from the training data (rather than from the generated data): if the sample comes from the real training data, D should output a large probability; otherwise, D should output a small probability.
An example: the generative network G is like a gang of counterfeiters, specializing in manufacturing counterfeit currency, and the discriminative network D is like the police, specializing in detecting whether the currency in circulation is real or counterfeit. G's goal is to produce currency so similar to real currency that D cannot tell the difference, and D's goal is to detect the counterfeit currency that G produces.
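Here is a minimal sketch of that adversarial game in PyTorch, on toy vector data rather than images; all sizes, the stand-in "real" distribution, and the fully connected G and D are illustrative assumptions (a real cDCGAN uses convolutional layers and label conditioning):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))   # noise z -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 2) * 0.5 + 3.0      # stand-in "real" data
    z = torch.randn(64, 16)

    # Train D: output a large probability on real data, a small one on fakes
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(G(z).detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train G: make D believe the generated samples are real
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```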
Traditional discriminative network:
Generative Adversarial Network:
Below is an example of a cDCGAN (from an earlier post).
Generative Network
Discriminative Network
The final result: using MNIST as the initial samples, the digits below were generated by the trained network, and you can see that the learning works fairly well.
In this article, we briefly introduced four neural network architectures: CNN, RNN, DBN, and GAN. Of course, we did not go into them in depth; these four architectures are simply very common and widely used. The subject of neural networks cannot be exhausted in a few posts; what is explained here are some of the basics, to help you get through the door quickly (and maybe show off a little). Later posts will cover autoencoders, Hopfield networks, and long short-term memory networks (LSTM) in depth.