Image processing apparatus
1. An image processing apparatus comprising:
a first processor to which image data is input;
a buffer provided in the first processor; and
a second processor that performs a recurrent neural network operation using at least one of a plurality of pixel data of the image data and an operation result of the recurrent neural network operation stored in the buffer.
2. The image processing apparatus according to claim 1,
wherein the operation result of the recurrent neural network operation is a hidden state.
3. The image processing apparatus according to claim 1,
wherein the plurality of pixel data are sequentially input to the second processor; and
the second processor sequentially performs the recurrent neural network operation on the plurality of input pixel data, and stores the operation result in the buffer.
4. The image processing apparatus according to claim 3,
wherein the second processor is capable of executing a plurality of layers, which are processing units for executing the recurrent neural network operation a plurality of times.
5. The image processing apparatus according to claim 4,
wherein the plurality of layers include a first processing unit that receives the plurality of pixel data as input and executes the recurrent neural network operation, and a second processing unit that receives the operation result obtained in the first processing unit as input and executes the recurrent neural network operation.
6. The image processing apparatus according to claim 5,
wherein, in the second processing unit, the operation result obtained in the first processing unit is supplied from the buffer to the second processor at a step delayed by a set offset amount.
7. The image processing apparatus according to claim 3,
wherein the image data is composed of n rows and m columns of pixel data; and
the second processor performs a predetermined operation on the operation result between two adjacent rows.
Background
There is a technique of realizing recognition processing and the like of image data by a neural network. For example, a kernel operation in a Convolutional Neural Network (CNN) is performed by holding the entire image data of an image in a frame buffer in an off-chip memory such as a DRAM, and then sliding a window of a predetermined size over the held image data.
Therefore, since it takes time to store the entire image data in the off-chip memory and to access the off-chip memory for writing and reading the feature map for each kernel operation, the latency of the CNN operation is large. In a device such as an Image Signal Processor, this latency is preferably short.
In order to reduce the latency of the CNN operation, a line buffer smaller than the frame buffer may be used, but since accesses to the line buffer for the kernel operation occur frequently, a memory that can be accessed at high speed must be used for the line buffer, which increases the cost of the image processing apparatus.
Disclosure of Invention
An object of the present invention is to provide an image processing apparatus that has low latency and can be realized at low cost.
An image processing apparatus according to the present invention includes: a first processor to which image data is input; a buffer provided in the first processor; and a second processor that performs a recurrent neural network operation using at least one of a plurality of pixel data of the image data and an operation result of the recurrent neural network operation stored in the buffer.
Drawings
Fig. 1 is a block diagram of an image processing apparatus according to an embodiment.
Fig. 2 is a diagram for explaining the processing contents of the image processing processor according to the embodiment.
Fig. 3 is a block diagram showing a configuration of an image processing processor according to the embodiment.
Fig. 4 is a block diagram of the recurrent neural network unit processor according to the embodiment.
Fig. 5 is a diagram for explaining conversion from input image data to stream data according to the embodiment.
Fig. 6 is a diagram for explaining a processing procedure of the recurrent neural network unit for a plurality of pixel values included in input image data according to the embodiment.
Fig. 7 is a diagram for explaining the processing procedure of the line end unit for the output value of the final column of each row according to modification 1.
Fig. 8 is a diagram for explaining a processing procedure of the recurrent neural network unit for a plurality of pixel values included in input image data according to modification 2.
Fig. 9 is a diagram illustrating a receptive field of the convolutional neural network.
Fig. 10 is a diagram for explaining a receptive field of the embodiment.
Fig. 11 is a diagram illustrating a difference in the range of the receptive field between the convolutional neural network and the recurrent neural network.
Fig. 12 is a diagram for explaining the input step of the recurrent neural network unit according to modification 2.
Fig. 13 is a diagram for explaining the setting range of the receptive field according to modification 2.
Detailed Description
Hereinafter, embodiments will be described with reference to the drawings.
(Structure)
Fig. 1 is a block diagram of an image processing apparatus according to the present embodiment. The image processing system 1 using the image processing apparatus of the present embodiment processes image data from the camera apparatus, performs processing such as image recognition, and outputs information of the processing result.
The image processing system 1 includes an image signal processor (ISP) 11, an off-chip memory 12, and a processor 13.
The ISP11 is connected to a camera device (not shown) via an interface conforming to the MIPI (Mobile Industry Processor Interface) CSI (Camera Serial Interface) standard or the like. The ISP11 receives an image pickup signal from the image sensor 14 of the camera device, performs predetermined processing on the image pickup signal, and outputs the result data of the predetermined processing. That is, a plurality of pixel data of image data are sequentially input to the ISP11 as the first processor. Here, the ISP11 receives an image pickup signal (hereinafter referred to as input image data) IG from the image sensor 14 as an image pickup device, and outputs image data (hereinafter referred to as output image data) OG as the result data. For example, the ISP11 removes noise from the input image data IG and outputs output image data OG from which noise and the like have been removed.
Note that all of the input image data IG from the image sensor 14 may be input to the ISP11 and the RNN operation described later may be performed on all of it, or the RNN operation may be performed on only a part of the input image data IG.
ISP11 includes a state buffer 21 and an RNN unit processor 22 that repeatedly performs a predetermined operation based on a recurrent neural network (hereinafter, RNN). The structure of ISP11 will be described later.
The off-chip memory 12 is a memory such as a DRAM. The output image data OG generated in the ISP11 and output from the ISP11 is stored in the off-chip memory 12.
The processor 13 performs recognition processing and the like based on the output image data OG stored in the off-chip memory 12. The processor 13 outputs result data RD obtained by the recognition processing or the like. Thus, the ISP11, the off-chip memory 12, and the processor 13 constitute, for example, an image recognition device (indicated by a broken line in fig. 1) 2 that performs image recognition processing and the like on an image.
Fig. 2 is a diagram for explaining the processing contents of the ISP 11. As shown in fig. 2, ISP11 performs predetermined processing such as noise removal on input image data IG from image sensor 14 using RNN unit processor 22 (described later), and generates output image data OG.
For example, when the image recognition device 2 performs recognition processing or the like based on the output image data OG by the processor 13, since the output image data OG is data from which noise is removed, improvement in accuracy of the recognition processing or the like in the processor 13 can be expected.
Fig. 3 is a block diagram showing the structure of ISP11. Fig. 4 is a block diagram of RNN unit processor 22. ISP11 includes the state buffer 21, the RNN unit processor 22, and a pixel stream decoder 23. The pixel stream decoder 23 is a circuit that converts input image data IG into stream data SD and outputs the stream data SD to the RNN unit processor 22.
Fig. 5 is a diagram for explaining conversion from input image data IG to stream data SD. Here, in fig. 5, the image of the input image data IG is composed of 6 lines of image data for the sake of simplicity of explanation. Each row includes a plurality of pixel data. That is, an image is composed of pixel data of a plurality of rows (here, 6 rows) and a plurality of columns.
Upon receiving the input image data IG from the image sensor 14, the pixel stream decoder 23 converts a plurality of pixel data of the received input image data IG into stream data SD in a predetermined order.
From the input image data IG, the pixel stream decoder 23 generates and outputs stream data SD in which the plurality of pixel data are arranged in order: row data L1 from the pixel at the 1st column of the 1st row (i.e., the left-end pixel of the uppermost row) to the pixel at the final column of the 1st row (i.e., the right-end pixel of the uppermost row), followed by row data L2 from the pixel at the 1st column of the 2nd row to the pixel at the final column of the 2nd row, and so on, down to the row data LL from the pixel at the 1st column of the 6th row (i.e., the left-end pixel of the lowermost row) to the pixel at the final column of the 6th row (i.e., the right-end pixel of the lowermost row).
In this way, the pixel stream decoder 23 converts the input image data IG into the stream data SD and outputs the stream data SD to the RNN unit processor 22.
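To make this ordering concrete, the following is a minimal sketch of the row-major conversion, assuming NumPy; the names (to_stream, image, stream_sd) and shapes are illustrative and not part of the apparatus.

```python
# Hypothetical sketch of the pixel stream decoder 23's ordering: pixels
# are emitted row by row, left to right, as one flat stream (arrow A in
# fig. 5). Names and shapes here are illustrative only.
import numpy as np

def to_stream(image: np.ndarray):
    """Yield the pixel data of an H x W (x C) image in row-major order."""
    height, width = image.shape[:2]
    for y in range(height):        # row data L1, L2, ..., LL
        for x in range(width):     # left-end pixel to right-end pixel
            yield image[y, x]

image = np.arange(6 * 8 * 3).reshape(6, 8, 3)  # 6 rows, 8 columns, RGB
stream_sd = list(to_stream(image))             # one entry per step
```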
As shown in fig. 4, RNN unit processor 22 is a processor that includes one RNN unit 31. RNN unit 31 is a simple RNN cell: a hardware circuit that outputs the hidden state obtained by performing a predetermined operation on two input values IN1 and IN2, as two output values OUT1 and OUT2.
Here, RNN unit processor 22 includes one RNN unit 31, but it may include two or more RNN units 31. Alternatively, the number of RNN units 31 may be the same as the number of layers described later.
The input value IN1 of RNN unit 31 is i_{l,t}, where l denotes a layer and t denotes a step. The input value IN2 of RNN unit 31 is the hidden state h_{l,t-1}. The output value OUT1 of RNN unit 31 is the hidden state h_{l,t}, which becomes the input value IN1 at step t of the next layer (l+1) (i.e., i_{l+1,t}). The output value OUT2 of RNN unit 31 is the hidden state h_{l,t}, which becomes the input value IN2 of RNN unit 31 at the next step (t+1) of the same layer.
The step t is also called a time step. It is a number that is incremented every time one piece of sequence data is input to the RNN and the hidden state is updated, and it is used as an index of the hidden state and of the input/output; it is a virtual unit that does not necessarily correspond to real time.
As shown in fig. 3, RNN unit 31 can read various parameters (indicated by dotted lines) used in the RNN operation from the off-chip memory 12 and hold them inside RNN unit 31. The parameters include, for each layer described later, a weight parameter w and a bias value b learned by the RNN.
In addition, the RNN unit 31 may also be implemented by software executed by a Central Processing Unit (CPU).
RNN unit 31 performs the corresponding operation for each layer described later; in the first layer (layer 1), the stream data SD is sequentially input as the input value IN1 of RNN unit 31. RNN unit 31 performs the predetermined operation to obtain the hidden state h_{l,t} as the operation result, and generates the output values OUT1 and OUT2, which are output to the state buffer 21.
The output values OUT1 and OUT2 obtained in the respective layers are stored in predetermined storage areas in the state buffer 21. The state buffer 21 is, for example, a line buffer.
Since the state buffer 21 is provided in ISP11, RNN unit 31 can write and read data to and from the state buffer 21 at high speed. RNN unit 31 stores the hidden state h obtained by performing the predetermined operation in the state buffer 21. The state buffer 21 is an SRAM constituting a line buffer, and is a buffer that stores at least an amount of data corresponding to the stream data.
RNN unit 31 is capable of performing the operations of a plurality of layers. Here, RNN unit 31 can execute a layer 1 operation that performs the predetermined operation with the stream data SD as input, a layer 2 operation that performs the predetermined operation with the hidden state h, the operation result of layer 1, as input, a layer 3 operation that performs the predetermined operation with the hidden state h, the operation result of layer 2, as input, and so on.
A predetermined operation in RNN unit 31 will be described. In the operation of layer l (the letter "l"), RNN unit 31 receives pixel data i as the input value IN1 at a certain step t, and outputs the output values OUT1 and OUT2 using the activation function tanh, a nonlinear function, as the predetermined operation. The output values OUT1 and OUT2 are the hidden state h_{l,t}. Here, as shown in fig. 4, the hidden state h_{l,t} is calculated by the following formula (1).
h_{l,t} = tanh(w_{l,ih} i_{l,t} + w_{l,hh} h_{l,t-1} + b_l) … (1)
Here, w_{l,ih} and w_{l,hh} are weight parameters expressed by the following expressions (2) and (3).
w_{l,ih} ∈ R^{e×d} … (2)
w_{l,hh} ∈ R^{e×e} … (3)
Here, R^{e×d} and R^{e×e} are the spaces of real matrices with e rows and d columns and with e rows and e columns, respectively; both represent matrices of real numbers.
Further, the input value (pixel data i_{l,t}) and the output value (hidden state h_{l,t}) are represented by the following expressions (4) and (5).
i_{l,t} ∈ R^d … (4)
h_{l,t} ∈ R^e … (5)
Here, R^d represents the d-dimensional real space and R^e represents the e-dimensional real space; both denote vectors of real numbers.
The values of the respective weight parameters of the above-described nonlinear function are optimized by the learning of the RNN.
The pixel data i_{l,t} is the input vector: for example, a three-dimensional vector when an RGB image is input, or a vector whose dimension equals the number of channels in the case of an intermediate feature map. The hidden state h_{l,t} is the output vector. d and e denote the dimensions of the input vector and the output vector, respectively. l is the layer number, and t is the index of the sequence data. b_l is the bias value.
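As a concrete illustration of formula (1), the following is a minimal sketch of one RNN unit in NumPy. The dimensions d and e, the random initial weights, and all names are assumptions for illustration; in the apparatus, w_{l,ih}, w_{l,hh}, and b_l are learned parameters read from the off-chip memory 12.

```python
# Minimal sketch of the predetermined operation of formula (1).
# d, e, the random weights, and all names are illustrative assumptions;
# the real weights come from RNN learning, per the description above.
import numpy as np

d, e = 3, 8                          # input (e.g. RGB) and hidden dims
rng = np.random.default_rng(0)
w_ih = rng.standard_normal((e, d))   # w_{l,ih} in R^{e x d}
w_hh = rng.standard_normal((e, e))   # w_{l,hh} in R^{e x e}
b = np.zeros(e)                      # bias b_l

def rnn_cell(i_t, h_prev):
    """h_{l,t} = tanh(w_ih @ i_t + w_hh @ h_{l,t-1} + b_l).
    OUT1 and OUT2 are this same hidden state."""
    return np.tanh(w_ih @ i_t + w_hh @ h_prev + b)

h = np.zeros(e)                      # default value for the first IN2
for i_t in rng.standard_normal((5, d)):   # five steps of stream data
    h = rnn_cell(i_t, h)             # OUT2 carried to step t + 1
```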
In fig. 4, RNN unit 31 generates and outputs two output values OUT1 and OUT2 of the same value from the input value IN1 and from the output value for the previous pixel used as the input value IN2; however, RNN unit 31 may output two output values OUT1 and OUT2 that differ from each other.
In the layer 2 operation, RNN unit 31 takes as the input value IN1 the output value OUT1 of layer 1, and outputs the output values OUT1 and OUT2 using the activation function tanh, a nonlinear function, as the predetermined operation.
When performing the operations of layer 3, layer 4, and subsequent layers following the layer 2 operation, as in the layer 2 operation, RNN unit 31 sets the input value IN1 to the output value OUT1 of the previous layer, and outputs the output values OUT1 and OUT2 using the activation function tanh as the predetermined operation.
(Operation)
Next, the operation of ISP11 will be described. Here, an example having 3 layers is explained. As described above, the pixel stream decoder 23 outputs the stream data SD (fig. 5) in which the input image data IG is arranged in order (the order indicated by arrow A): the plurality of pixel data from the left-end pixel to the right-end pixel of the 1st row L1, the plurality of pixel data from the left-end pixel to the right-end pixel of the 2nd row L2, ..., and the plurality of pixel data from the left-end pixel to the right-end pixel of the final row LL (i.e., L6).
In layer 1, the first input value IN1 to RNN unit 31 is the first data of the stream data SD (i.e., the pixel at the 1st column of the 1st row of the input image data IG), and the input value IN2 is a predetermined default value.
In layer 1, when the two input values IN1 and IN2 are input to RNN unit 31 at the first step t1, the predetermined operation is performed and the output values OUT1 and OUT2 are output. The output values OUT1 and OUT2 are stored in a predetermined storage area in the state buffer 21. The output value OUT1 of step t1 in layer 1 is read from the state buffer 21 at the first step t1 of the next layer 2, and is used as the input value IN1 of RNN unit 31. In layer 1, the output value OUT2 of step t1 is used as the input value IN2 at the next step t2.
Similarly, in layer 1 thereafter, the output value OUT1 of each subsequent step is read from the state buffer 21 at the corresponding step of layer 2 and used as the input value IN1 of RNN unit 31. In layer 1, the output value OUT2 of each subsequent step is read from the state buffer 21 at the next step and used as the input value IN2 of RNN unit 31.
When the predetermined operation in layer 1 has been performed on each pixel data of the stream data SD, the processing of layer 2 is executed. In fact, once the predetermined operation for the 1st pixel data in layer 1 is finished, the processing corresponding to the 1st pixel in layer 2 can be executed.
In layer 2, the plurality of output values OUT1 obtained from the first to the last step of layer 1 are sequentially input to RNN unit 31 as the input value IN1. As in the processing of layer 1, RNN unit 31 executes the predetermined operation in layer 2 in order from the first step to the last step of layer 1.
When the predetermined operation in layer 2 has been performed on each output value OUT1 of layer 1, the processing of layer 3 is executed. In fact, once the predetermined operation for the 1st pixel data in layer 2 is finished, the processing corresponding to the 1st pixel in layer 3 can be executed.
In layer 3, the plurality of output values OUT1 obtained from the first to the last step of layer 2 are sequentially input to RNN unit 31 as the input value IN1. As in the processing of layer 2, RNN unit 31 executes the predetermined operation in layer 3 in order from the first step to the last step of layer 2.
Fig. 6 is a diagram for explaining the processing procedure of RNN unit 31 for a plurality of pixel values included in the input image data IG. Fig. 6 shows the flow of the input values IN1 and IN2 input to RNN unit 31 and the output values OUT1 and OUT2 output from RNN unit 31 over a plurality of steps. In layer 1, RNN unit 31 is denoted as RNN unit (RNNCell) 1; in layer 2, as RNN unit 2; and in layer 3, as RNN unit 3.
Fig. 6 shows only the flow of processing for pixel data of the column x of the row y and the columns (x-1) and (x-2) preceding the column x in the input image data IG.
As shown in fig. 6, the input value IN1 of RNN unit 1 of column (x-2) in layer 1 is the pixel data input at step t_k. The input value IN2 of RNN unit 1 of column (x-2) in layer 1 is the output value OUT2 of RNN unit 1 of column (x-3) in layer 1. The output value OUT1 of RNN unit 1 of column (x-2) in layer 1 is the input value IN1 of RNN unit 2 of column (x-2) in layer 2. The output value OUT2 of RNN unit 1 of column (x-2) in layer 1 is the input value IN2 of RNN unit 1 of column (x-1) in layer 1.
Similarly, the input value IN1 of RNN unit 1 of column (x-1) in layer 1 is the pixel data input at step t_(k+1). The input value IN2 of RNN unit 1 of column (x-1) in layer 1 is the output value OUT2 of RNN unit 1 of column (x-2) in layer 1. The output value OUT1 of RNN unit 1 of column (x-1) in layer 1 is the input value IN1 of RNN unit 2 of column (x-1) in layer 2. The output value OUT2 of RNN unit 1 of column (x-1) in layer 1 is the input value IN2 of RNN unit 1 of column (x) in layer 1.
The input value IN1 of RNN unit 1 of column (x) in layer 1 is the pixel data input at step t_(k+2). The input value IN2 of RNN unit 1 of column (x) in layer 1 is the output value OUT2 of RNN unit 1 of column (x-1) in layer 1. The output value OUT1 of RNN unit 1 of column (x) in layer 1 is the input value IN1 of RNN unit 2 of column (x) in layer 2. The output value OUT2 of RNN unit 1 of column (x) in layer 1 is used as the input value IN2 of RNN unit 1 at the next step.
As described above, RNN unit 31 of RNN unit processor 22 sequentially performs the RNN operation on the plurality of input pixel data and stores the hidden state in the state buffer 21. The hidden state is the output of RNN unit 31.
The input value IN1 of RNN unit 2 of column (x-2) in layer 2 is the output value OUT1 of RNN unit 1 of column (x-2) in layer 1. The input value IN2 of RNN unit 2 of column (x-2) in layer 2 is the output value OUT2 of RNN unit 2 of column (x-3) in layer 2. The output value OUT1 of RNN unit 2 of column (x-2) in layer 2 is the input value IN1 of RNN unit 3 of column (x-2) in layer 3. The output value OUT2 of RNN unit 2 of column (x-2) in layer 2 is the input value IN2 of RNN unit 2 of column (x-1) in layer 2.
Similarly, the input value IN1 of RNN unit 2 of column (x-1) in layer 2 is the output value OUT1 of RNN unit 1 of column (x-1) in layer 1. The input value IN2 of RNN unit 2 of column (x-1) in layer 2 is the output value OUT2 of RNN unit 2 of column (x-2) in layer 2. The output value OUT1 of RNN unit 2 of column (x-1) in layer 2 is the input value IN1 of RNN unit 3 of column (x-1) in layer 3. The output value OUT2 of RNN unit 2 of column (x-1) in layer 2 is the input value IN2 of RNN unit 2 of column (x) in layer 2.
The input value IN1 of RNN unit 2 of column (x) in layer 2 is the output value OUT1 of RNN unit 1 of column (x) in layer 1. The input value IN2 of RNN unit 2 of column (x) in layer 2 is the output value OUT2 of RNN unit 2 of column (x-1) in layer 2. The output value OUT1 of RNN unit 2 of column (x) in layer 2 is the input value IN1 of RNN unit 3 of column (x) in layer 3. The output value OUT2 of RNN unit 2 of column (x) in layer 2 is used as the input value IN2 of RNN unit 2 at the next step.
The input value IN1 of RNN unit 3 of column (x-2) in layer 3 is the output value OUT1 of RNN unit 2 of column (x-2) in layer 2. The input value IN2 of RNN unit 3 of column (x-2) in layer 3 is the output value OUT2 of RNN unit 3 of column (x-3) in layer 3. The output value OUT1 of RNN unit 3 of column (x-2) in layer 3 is input to a softmax layer here, and the output image data OG is output from the softmax layer. The output value OUT2 of RNN unit 3 of column (x-2) in layer 3 is the input value IN2 of RNN unit 3 of column (x-1) in layer 3.
Similarly, the input value IN1 of RNN unit 3 of column (x-1) in layer 3 is the output value OUT1 of RNN unit 2 of column (x-1) in layer 2. The input value IN2 of RNN unit 3 of column (x-1) in layer 3 is the output value OUT2 of RNN unit 3 of column (x-2) in layer 3. The output value OUT1 of RNN unit 3 of column (x-1) in layer 3 is input to the softmax layer, and the output image data OG is output from the softmax layer. The output value OUT2 of RNN unit 3 of column (x-1) in layer 3 is the input value IN2 of RNN unit 3 of column (x) in layer 3.
The input value IN1 of RNN unit 3 of column (x) in layer 3 is the output value OUT1 of RNN unit 2 of column (x) in layer 2. The input value IN2 of RNN unit 3 of column (x) in layer 3 is the output value OUT2 of RNN unit 3 of column (x-1) in layer 3. The output value OUT1 of RNN unit 3 of column (x) in layer 3 is input to the softmax layer, and the output image data OG is output from the softmax layer. The output value OUT2 of RNN unit 3 of column (x) in layer 3 is used as the input value IN2 of RNN unit 3 at the next step.
Thus, the output of layer 3 is the set of output values OUT1 obtained over the plurality of steps. The output of layer 3 is input to the softmax layer. The output of the softmax layer is converted into image data of y rows and x columns and stored in the off-chip memory 12 as the output image data OG.
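The dataflow of fig. 6 can be summarized in a short sketch, under the same illustrative assumptions as the cell sketch above: at each step, OUT1 of each layer feeds the next layer, and OUT2 is held in the state buffer for the next step of the same layer. The softmax head and all names are assumptions, not the apparatus's literal implementation.

```python
# Sketch of the three-layer dataflow of fig. 6 (illustrative only).
import numpy as np

def make_cell(d_in, e, seed):
    """One RNN unit per layer; random weights stand in for learned ones."""
    rng = np.random.default_rng(seed)
    w_ih = rng.standard_normal((e, d_in))
    w_hh = rng.standard_normal((e, e))
    b = np.zeros(e)
    return lambda i_t, h: np.tanh(w_ih @ i_t + w_hh @ h + b)

def softmax(v):
    z = np.exp(v - v.max())
    return z / z.sum()

def process_stream(stream, cells, e=8):
    state = [np.zeros(e) for _ in cells]   # state buffer: one h per layer
    outputs = []
    for pixel in stream:                   # one step per pixel of SD
        x = pixel
        for l, cell in enumerate(cells):
            h = cell(x, state[l])          # IN1 = x, IN2 = stored hidden
            state[l] = h                   # OUT2 back to the state buffer
            x = h                          # OUT1 becomes next layer's IN1
        outputs.append(softmax(x))         # layer-3 OUT1 into softmax
    return outputs                         # reshaped into rows and columns

cells = [make_cell(3, 8, 0), make_cell(8, 8, 1), make_cell(8, 8, 2)]
rng = np.random.default_rng(4)
og = process_stream(rng.standard_normal((48, 3)), cells)  # 6x8 RGB stream
```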
As described above, RNN unit processor 22 performs the recurrent neural network operation using at least one of the plurality of pixel data of the image data and the hidden state, which is the operation result of the RNN operation stored in the state buffer 21. RNN unit processor 22 is capable of executing a plurality of layers, which are processing units for executing the RNN operation a plurality of times. The plurality of layers include a first processing unit (layer 1) that performs the RNN operation with the plurality of pixel data as input, and a second processing unit (layer 2) that performs the RNN operation with the hidden-state data obtained in the first processing unit (layer 1) as input.
As described above, the values of the weight parameters of the nonlinear function in the RNN operation are optimized by the RNN learning.
As described above, according to the above-described embodiment, predetermined processing is performed on image data using RNN instead of CNN.
Thus, unlike the method of holding the entire image data in the off-chip memory 12 and then performing the kernel operation while sliding a window of a predetermined size over it, the image processing apparatus of the present embodiment converts the image data into the stream data SD and sequentially executes the RNN operations; it can therefore perform neural network operation processing with low latency and at low cost.
(Modification 1)
In the above-described embodiment, image data including a plurality of pixels in a plurality of rows and a plurality of columns is converted into the stream data SD, and the pixel values from the 1st row, 1st column to the final column of the final row are sequentially input as the input value IN1 of the single RNN unit 31.
However, in image data, the tendency of the feature amount differs between the pixel value at the 1st column of each row and the pixel value at the final column of the immediately preceding row.
Therefore, in modification 1, instead of using the output value OUT2 of the final column of each row directly as the first input value IN2 of the next row, a line end unit is added that changes this value to a predetermined value, which is then used as the first input value IN2 of RNN unit 31 in the next row.
As the line end unit, RNN unit 31 may be used with its execution content changed so as to perform the operation of a nonlinear function different from the above-described nonlinear function; alternatively, a line end unit 31a, an operation unit separate from RNN unit 31 and provided in RNN unit processor 22, may be used, as indicated by the broken line in fig. 3.
The values of the respective weight parameters of the nonlinear function of the line end unit are also optimized by the learning of RNN.
Fig. 7 is a diagram for explaining the processing procedure of the line end unit 31a for the output value OUT2 of the final column of each row. Here, each line of the image data has W pixel values. That is, the image data has W columns.
As shown in fig. 7, after RNN unit 31 performs the predetermined operation on the pixel data of the final column (W-1), where the 1st column is numbered 0, the output value OUT2 is input to the line end unit 31a.
As shown in fig. 7, the line end unit 31a processes, for each layer, the output value OUT2 of RNN unit 31 at the final column (W-1) of each row. In fig. 7, the line end unit 31a in layer 1 is denoted as line end unit (LineEndCell) 1, the line end unit 31a in layer 2 as line end unit 2, and the line end unit 31a in layer 3 as line end unit 3.
In layer 1, the line end unit 31a of the y-th row takes as input the output value OUT2 (h_{1,(W-1,y)}) of RNN unit 1 of the final column of the y-th row of layer 1, and the hidden state h_{1(line)}, the output value resulting from its operation, is used as the input value IN2 of RNN unit 1 in the next, (y+1)-th, row.
Similarly, in layer 2, the line end unit 31a of the y-th row takes as input the output value OUT2 (h_{2,(W-1,y)}) of RNN unit 2 of the final column of the y-th row of layer 2, and the hidden state h_{2(line)}, the output value resulting from its operation, is used as the input value IN2 of RNN unit 2 in the next, (y+1)-th, row.
Similarly, in layer 3, the line end unit 31a of the y-th row takes as input the output value OUT2 (h_{3,(W-1,y)}) of RNN unit 3 of the final column of the y-th row of layer 3, and the hidden state h_{3(line)}, the output value resulting from its operation, is used as the input value IN2 of RNN unit 3 in the next, (y+1)-th, row.
As described above, when the image data is composed of pixel data of n rows and m columns, RNN unit processor 22 includes the line end unit 31a, which performs a predetermined operation on the hidden state between two adjacent rows.
Thus, the line end unit 31a is provided at the changing point between rows in each layer. The line end unit 31a changes the input output value OUT2, and the changed value is used as the input value IN2 of RNN unit 31 when the next row is processed.
As described above, by changing the output value OUT2 of the final column of each row with the line end unit 31a, the influence of the difference in the tendency of the feature amount between the final pixel value of one row and the first pixel value of the next row can be eliminated, and improvement in accuracy of noise removal and the like can be expected.
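A minimal sketch of this row-boundary handling, continuing the illustrative pipeline sketch above; the line end unit's weights here are random stand-ins for parameters that, per the description, would be obtained by RNN learning.

```python
# Sketch of modification 1: at the final column (W - 1) of each row,
# a line end unit transforms OUT2 before it seeds the next row.
# Weights and names are illustrative stand-ins for learned parameters.
import numpy as np

def make_line_end(e, seed):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((e, e))
    b = np.zeros(e)
    return lambda h: np.tanh(w @ h + b)   # h_{l(line)} from h_{l,(W-1,y)}

line_end = [make_line_end(8, s) for s in (3, 4, 5)]  # one per layer

# Inside the per-pixel loop of process_stream, after column x of row y:
#     if x == W - 1:                        # final column of the row
#         state[l] = line_end[l](state[l])  # first IN2 of row y + 1
```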
(Modification 2)
In the above-described embodiment, the input value IN1 of RNN unit 31 is obtained at a uniform step across all layers. In contrast, in modification 2, the input value IN1 of RNN unit 31 is obtained not at a step that matches between layers, but with an offset delay, so that the RNN operation has a receptive field comparable to that of a CNN. In other words, the image processing apparatus according to modification 2 executes the RNN operation with an offset between layers.
Fig. 8 is a diagram for explaining the processing procedure of RNN unit 31 for a plurality of pixel values included in the input image data IG according to modification 2.
As shown in fig. 8, the pixel data i of the stream data SD are processed sequentially in layer 1. In layer 2, however, the output value OUT1 of RNN unit 1 is used as the input value IN1 of RNN unit 2 with a delay of the shift amount u1 in the x direction of the image and of the shift amount v1 in the y direction of the image. The offset information is written to the off-chip memory 12 and is read from the off-chip memory 12 into RNN unit processor 22 as a parameter.
In fig. 8, the input value IN1 of RNN unit 2 is represented by the following expression (6).
i_{2,(x-u1, y-v1)} = h_{1,(x-u1, y-v1)} … (6)
Further, in layer 3, the output value OUT1 of RNN unit 2 is used as the input value IN1 of RNN unit 3 with a delay of the shift amount (u1+u2) in the x direction of the image and of the shift amount (v1+v2) in the y direction of the image. That is, in fig. 8, the input value IN1 of RNN unit 3 is represented by the following expression (7).
i_{3,(x-u1-u2, y-v1-v2)} = h_{2,(x-u1-u2, y-v1-v2)} … (7)
The output value OUT1 of each RNN cell 3 of layer 3 is represented by the following expression (8).
Fig. 9 is a diagram illustrating the receptive field of a CNN. The receptive field is the range of input values that affect the kernel operation. The output image data OG is generated by the layer LY1, which performs the CNN operation on the input image data IG. In this case, the range R2 of the layer LY1, which is wider than the kernel size R1, influences the output value P1 of the output image data. Thus, in the case of a CNN, as the CNN operation is repeated, the receptive field, i.e., the range of input values directly or indirectly referred to in obtaining an output value, becomes large.
In contrast, in the above-described embodiment, since the RNN operation is performed, the receptive field in each layer is the range of RNN operation results computed at steps earlier than the current operation step.
Fig. 10 is a diagram for explaining the receptive field of the above embodiment. Fig. 11 is a diagram illustrating the difference in the range of the receptive field between a CNN and an RNN. RNN unit 31 performs the RNN operation on the stream data SD of the input image data IG in the layer LY11, and the range R12 indicated by the broken line in the input image data IG in fig. 10 becomes the receptive field. The range R11, within the receptive field of the output value P1 of the layer LY11, consists of the operation results of steps earlier than the step of the output value P1.
Therefore, in the above-described embodiment, the results of RNN operations on the pixel values surrounding the output value P1, as in the CNN of fig. 9, are not used. As shown in fig. 11, the receptive field RNNR of the RNN differs from the receptive field CNNR of the CNN.
Therefore, in modification 2, in order to perform the RNN operation with a receptive field comparable to that of a CNN, RNN unit 31 shifts the range of the input value IN1 read from the state buffer 21, so that the input value IN1 of RNN unit 31 used at a certain step of a certain layer becomes the hidden state h (output value) of RNN unit 31 at a different step in the previous layer. That is, the hidden-state data obtained in layer 1 as the first processing unit is supplied from the state buffer 21 to RNN unit processor 22 in layer 2 as the second processing unit at a step delayed by a set offset amount.
As shown in fig. 8, in layer 2, the input value IN1 of RNN unit 2 is the output value OUT1 at the pixel position shifted by u1 in the x direction and by v1 in the y direction. That is, the output value OUT1 of the layer 1 RNN operation at the pixel position shifted by the predetermined values (u1, v1) in the horizontal and vertical directions of the image data becomes the input value IN1 of RNN unit 2 of layer 2.
Further, in layer 3, the input value IN1 of RNN unit 3 is the output value OUT1 shifted by (u1+u2) in the x direction and by (v1+v2) in the y direction in the output image of layer 2.
Likewise, the output value OUT1 of RNN unit 3 becomes an output value shifted by (u1+u2+u3) in the x direction and by (v1+v2+v3) in the y direction in the output image of layer 2.
Fig. 12 is a diagram for explaining the input step of RNN unit 31. As shown in fig. 12, the output value OUT1 of RNN unit 1, which takes the first pixel data i_{1,(0,0)} as the input value IN1, is used as the input value IN1 in layer 2 at the step t_a corresponding to the offset value. The offset value in layer 2 is a step difference with respect to the acquisition step of the pixel data of the stream data SD in layer 1. Here, the offset value corresponds to the step difference from the position (0, 0) of the pixel at the 1st column of the 1st row to the pixel position (u1, v1).
Thus, at the first step t_a of layer 2, the input value IN1 of RNN unit 2 is the output value OUT1 shifted by the offset value, in steps, from the first step t_b of layer 1.
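For example, under the illustrative assumption of W = 8 columns per row and an offset (u1, v1) = (1, 1), the step difference is v1 × W + u1 = 1 × 8 + 1 = 9 steps, so the first step of layer 2 satisfies t_a = t_b + 9; this is an assumed numerical example, not a value from the embodiment.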
Further, the offset value may be the same between layers, but here it differs for each layer. As shown in fig. 12, at step t_a in layer 3, the output value OUT1 of RNN unit 31 shifted by the pixel position (u11, v11) becomes the input value IN1 of RNN unit 31 in layer 3.
Fig. 13 is a diagram for explaining the setting range of the receptive field in modification 2. When the offset value of the input value IN of the layer LY21 is set, a predetermined area AA is added to the input image data IG by padding. As shown in fig. 13, the output value P1 is influenced by the input values P2 within the receptive field RNNR. Thus, the output value P1 is influenced by the output values in the receptive field RNNR of the layer LY21, and the receptive field RNNR of the layer LY21 is influenced by the input values in the receptive field RNNR of the input image data IG. The output value PE is affected by the input value P3 in the added area AA.
As described above, by setting, for each layer, the offset amount of the input step of the input value IN1 in each RNN operation, a receptive field equivalent to that of a CNN can be set even in image processing using an RNN.
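A minimal sketch of the offset read in modification 2, under illustrative assumptions (dictionary-based buffer indexing, offset values u1 and v1, zero padding for the area AA); in the apparatus, the offsets are parameters read from the off-chip memory 12.

```python
# Sketch of modification 2: layer 2 reads layer 1's hidden state not at
# the current step but at the step of the pixel position shifted by
# (u1, v1), per formula (6). Indexing and values are illustrative.
import numpy as np

W, H, e = 8, 6, 8                       # columns, rows, hidden dimension
u1, v1 = 1, 1                           # offset between layer 1 and 2

def step_of(x, y):                      # stream step of pixel (x, y)
    return y * W + x

# layer-1 hidden states indexed by step, as held in the state buffer 21
h1 = {step_of(x, y): np.zeros(e) for y in range(H) for x in range(W)}

def layer2_input(x, y):
    """IN1 of RNN unit 2 at pixel (x, y): h_1 at (x - u1, y - v1),
    i.e. delayed by the offset's step difference."""
    xs, ys = x - u1, y - v1
    if xs < 0 or ys < 0:                # padding area AA
        return np.zeros(e)
    return h1[step_of(xs, ys)]
```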
As described above, according to the above-described embodiment and its modifications, it is possible to provide an image processing apparatus that has low latency and can be realized at low cost.
RNN unit 31 here is a simple RNN, but it may instead have the structure of an LSTM (Long Short-Term Memory) network, a GRU (Gated Recurrent Unit), or the like.
While the embodiments of the present invention have been described above, these embodiments are merely examples and are not intended to limit the scope of the invention. These novel embodiments may be implemented in various other forms, and various omissions, substitutions, and changes may be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and their equivalents.