Hardware implementation method, module, chip and system of softmax function
1. A hardware implementation method of a softmax function, characterized in that the softmax function is subjected to a function transformation in which the base e of the softmax function is transformed into 2, forming an E-to-2 softmax function;
the maximum value of the data is obtained through a comparator module, differences of the data are computed through a subtractor module, the data is split through a data segmentation module, the corresponding values of the data are found in a lookup table through a mirror lookup module, the multiplication and addition operations on the data are realized through a shift summation module, and the data is converted through a data conversion module.
2. The hardware implementation method of the softmax function according to claim 1, comprising the following steps:
step 1: inputting all input data into a comparator module for comparison, and selecting the maximum value of the input data;
step 2: inputting the input data and the maximum value of the input data into a subtractor module for difference operation;
step 3: sending the data calculated by the subtractor module to the data segmentation module, and splitting the data into a first part and a second part;
step 4: inputting the second part of the data split by the data segmentation module into the mirror lookup module for lookup, inputting the looked-up value together with the first part of the data into the shift summation module, realizing the multiplication operation through shifting, and summing the results of all the shifted data;
step 5: converting, by the data conversion module, the data output by the shift summation module into an expression form, and looking up the converted data in the mirror lookup module to obtain a corresponding value;
step 6: splitting the calculation result of step 5 by the data segmentation module, carrying out the shift operation by the mirror lookup module and the shift summation module, and calculating the value of the E-to-2 softmax function.
3. The method as claimed in claim 2, wherein the data segmentation module in step 3 splits the data into two multiplied parts: the first part of the data is expressed in base 2 with a negative integer exponent, denoted $2^{\text{negative integer}}$; the second part of the data is expressed in base 2 with a positive fractional exponent, denoted $2^{\text{fraction}}$, such that $2^{\text{negative integer} + \text{fraction}} = 2^{\text{negative integer}} \times 2^{\text{fraction}}$.
4. The method of claim 3, wherein the multiplication of the first part of the data and the second part of the data is implemented in the shift summation module by shifting the table-lookup value of the second part of the data to the right by the magnitude of the negative integer.
5. The hardware implementation method of the softmax function according to claim 2, wherein the mirror lookup module looks up the two functions $2^x$ and $\log_2 x$ in the lookup table in a mirrored manner.
6. The hardware implementation method of the softmax function as claimed in claim 2, wherein the data conversion module converts the data into the form $1.M \times 2^E$, where E is an integer, and $\log_2 1.M$ is computed by table lookup through the mirror lookup module.
7. A computation module of a softmax function, characterized in that it uses the hardware implementation method of a softmax function according to any one of claims 1 to 6, said computation module comprising a comparator module, a subtractor module, a data segmentation module, a mirror lookup module, a shift summation module and a data conversion module;
wherein the comparator module is used for calculating the maximum value of the input data, the subtractor module is used for calculating the difference between the input data and the maximum value of the input data, the data segmentation module is used for splitting the data, the mirror lookup module is used for finding the corresponding value of the data in a lookup table, the shift summation module is used for multiplying and adding the data, and the data conversion module is used for converting the data;
the output end of the comparator module is connected with the input end of the subtractor module, the output end of the subtractor module is connected with the input end of the data segmentation module, the output end of the data segmentation module is connected respectively with the input end of the mirror lookup module and the input end of the shift summation module, the output end of the mirror lookup module is also connected with the input end of the shift summation module, the output end of the shift summation module is connected with the input end of the data conversion module, and the output end of the data conversion module is also connected with the input end of the mirror lookup module.
8. The computation module of a softmax function as claimed in claim 7, wherein the data segmentation module splits the data into two multiplied parts: the first part of the data is expressed in base 2 with a negative integer exponent, denoted $2^{\text{negative integer}}$; the second part of the data is expressed in base 2 with a positive fractional exponent, denoted $2^{\text{fraction}}$, such that $2^{\text{negative integer} + \text{fraction}} = 2^{\text{negative integer}} \times 2^{\text{fraction}}$.
9. A chip, characterized by comprising a computation module of a softmax function according to any one of claims 7 to 8.
10. A softmax function hardware system comprising a chip as claimed in claim 9, the system further comprising a host controller, a data input module and a data output module, the host controller being connected to the data input module, the data output module and the chip; the main controller controls data to be input into the chip through the data input module for calculation, and the chip returns the calculation result to the main controller through the data output module.
Background
Deep learning is a subset of machine learning that effectively extracts and transforms features in data using multi-layer mathematical functions. One research hotspot of deep learning algorithms is the activation function: it introduces a nonlinear factor to the neurons, so that the neural network can approximate arbitrary nonlinear functions and can therefore be applied to many nonlinear models.
The softmax function (normalized exponential function) is usually used as the output-layer activation function in multi-classification tasks, serving as the last layer of a deep learning classifier. Because the softmax function contains e-based exponential operations, its hardware implementation in a DNN has considerable complexity; at the same time, in a high-speed neural network, the calculation of the softmax function must be completed in as few cycles, i.e. as little time, as possible. Currently, many researchers are studying methods that reduce the complexity of the hardware implementation while balancing hardware resource usage and minimizing computation time.
The softmax function can be expressed by the following formula:

$$\mathrm{softmax}(i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

where $x_i$ is the i-th input and n is the number of inputs.
as can be seen from the above formula, the calculation of the softmax function includes e-exponential operation and division, so that it cannot be directly calculated in a hardware system. In the prior art, a method based on an LUT lookup table is often used for calculating when a function is realized, and a lookup table method needs to consume a lot of resources when being realized; compared with the method of looking up the table, the implementation method based on the CORDIC consumes much resources but consumes time.
In summary, because the function involves e-based exponentiation and division, prior-art hardware implementations of the softmax function suffer from high hardware consumption and high complexity and are hardware-unfriendly; a hardware-friendly, high-performance implementation method is needed to support the multi-classification tasks of neural networks.
Disclosure of Invention
1. Technical problem to be solved
Aiming at the problems that the softmax function in the prior art involves e-based exponentiation and division, that its hardware implementation consumes excessive resources or time, and that its complexity is high, the invention provides a hardware implementation method, module, chip and system for the softmax function: the softmax function is transformed so that exponentiation and division are avoided in the hardware implementation, realizing a high-performance, low-complexity softmax hardware implementation.
2. Technical scheme
The purpose of the invention is realized by the following technical scheme.
The invention relates to a method for implementing the activation function of multi-classification tasks in deep learning, in particular to a hardware implementation method of the softmax function.
The first technical scheme disclosed by the invention is a hardware implementation method of the softmax function. Before the hardware implementation, the softmax function is first subjected to a function transformation in which the base e of the softmax function is converted into 2, forming the E-to-2 softmax function, whose expression is:

$$\text{E-to-2 softmax}(i) = \frac{2^{x_i}}{\sum_{j=1}^{n} 2^{x_j}}$$
in order to avoid the problem of overlarge intermediate result data operation, converting input data and performing xi-xmaxOperates as follows:
the method can effectively reduce the problem of overlarge intermediate data operation and can also reduce the operation based on 2xAnd log2The lookup table of x needs to store the range of data, thereby saving hardware resources.
The transformed function is then subjected to a base-2 exponential-logarithmic transformation to avoid the hardware-unfriendly division operation; the transformation formula is:

$$\log_2\bigl(\text{E-to-2 softmax}(i)\bigr) = (x_i - x_{max}) - \log_2 \sum_{j=1}^{n} 2^{x_j - x_{max}}$$
the formula is arranged to obtain:
the transformed function is subjected to data maximum value obtaining through a comparator module, data are subjected to subtraction through a subtractor module, data are segmented through a data segmentation module, data corresponding values are searched in a lookup table through a mirror image data search module, data multiplication and addition operation is achieved through a shift summation module, and data are converted through a data conversion module.
Further, the method comprises the following steps:
Step 1: all input data $x_i$ are input into the comparator module for comparison, and the maximum value $x_{max}$ of the input data is selected;
Step 2: the input data $x_i$ and the maximum value $x_{max}$ are input into the subtractor module, which performs the difference operation to obtain $x_i - x_{max}$;
Step 3: the data calculated by the subtractor module is sent to the data segmentation module and split into a first part and a second part, i.e. two multiplied factors: the first part is expressed in base 2 with a negative integer exponent, $2^{Z'}$; the second part is expressed in base 2 with a positive fractional exponent, $2^{D}$, where D lies in [0, 1). Namely,

$$x_i - x_{max} = Z' + D, \qquad 2^{x_i - x_{max}} = 2^{Z'} \times 2^{D}$$
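A minimal sketch of this segmentation, assuming the input is the non-positive difference from step 2 and using the Z'/D notation above, might look like:

```python
import math

def split_exponent(v):
    """Split v = x_i - x_max (v <= 0) into a negative integer Z' and a
    fraction D in [0, 1), so that 2^v = 2^Z' * 2^D."""
    z = math.floor(v)   # negative integer part (0 only when v == 0)
    d = v - z           # fractional part in [0, 1)
    return z, d

z, d = split_exponent(-2.375)
print(z, d)                                    # -3 0.625
print(2.0 ** -2.375, (2.0 ** z) * (2.0 ** d))  # both 0.1928...
```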
Step 4: the second part of the data split by the data segmentation module is input into the mirror lookup module for lookup, and the looked-up value is input together with the first part of the data into the shift summation module, where the multiplication is realized by shifting: the table-lookup value of the second part is shifted right by the magnitude of the negative integer, realizing the multiplication of the first part and the second part. Realizing the multiplication by shifting avoids power and multiplication operations and reduces complexity.
All shifted results are then summed. Since $x_i - x_{max} \le 0$, each $2^{x_i - x_{max}}$ lies in (0, 1], so the value range of the sum of the shifted results of all input data is (1, N], where N denotes the number of input data;
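The shift-based multiplication can be sketched in fixed point as follows; the Q6 format (6 fractional bits) is an assumed illustration, not a requirement of the method:

```python
FRAC_BITS = 6                      # assumed Q6 fixed-point format

def shift_multiply(lut_value_fx, z):
    """Multiply the looked-up 2^D (fixed point) by 2^Z' (Z' <= 0) via a
    right shift of |Z'| bits, with no hardware multiplier."""
    return lut_value_fx >> (-z)

# example: 2^0.625 ~ 1.542 quantizes to 99 in Q6; with Z' = -3 the
# product approximates 2^(-2.375)
lut_value_fx = round((2 ** 0.625) * (1 << FRAC_BITS))   # 99
product_fx = shift_multiply(lut_value_fx, -3)           # 99 >> 3 = 12
print(product_fx / (1 << FRAC_BITS))   # 0.1875, vs exact 2^-2.375 ~ 0.1928
```

The small deviation is quantization error from the assumed Q format; the summation of all such terms is then plain addition.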
Step 5: the data conversion module converts the data output by the shift summation module into an expression form, and the converted data is looked up in the mirror lookup module to obtain the corresponding value. Specifically, the data conversion module converts the data into the form $1.M \times 2^E$, where E is an integer:

$$\sum_{j=1}^{n} 2^{x_j - x_{max}} = 1.M \times 2^E$$

With the data in this form, the base-2 logarithm reduces to $E + \log_2 1.M$, where $\log_2 1.M$ is computed by table lookup through the mirror lookup module (based on the mirrored $\log_2 x$ search), reducing computational complexity. At this point $\varepsilon_i$ is calculated, i.e.

$$\varepsilon_i = (x_i - x_{max}) - E - \log_2 1.M$$
Step 6: the values calculated in step 5 are used to obtain the value of the E-to-2 softmax function. The $\varepsilon_i$ calculated in step 5 is less than zero and is sent to the data segmentation module to be changed into the form of a negative integer plus a fraction; at this point,

$$\text{E-to-2 softmax}(i) = 2^{\varepsilon_i} = 2^{Z''} \times 2^{D''}$$

Step 4 is then repeated: $2^{D''}$ is found by the mirror lookup module, multiplication by $2^{Z''}$ is realized by right-shifting the looked-up value by $|Z''|$ bits, the value of the E-to-2 softmax function is calculated, and the whole output is completed.
Further, the mirror lookup module looks up the two functions $2^x$ and $\log_2 x$ in the lookup table in a mirrored manner. Because $2^x$ and $\log_2 x$ are symmetric with respect to y = x, and the intervals to be searched in the present invention are also symmetric (for $2^x$ the search interval of x is [0, 1) and the corresponding function values lie in [1, 2), while for $\log_2 x$ the search interval of x is [1, 2) and the corresponding function values lie in [0, 1)), the invention constructs a mirror lookup table in order to save hardware resources and reduce area; by means of interval division, the lookup of both functions can be completed on the basis of this symmetry.
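A software sketch of such a mirror lookup table follows; the 64-entry depth is an assumption for illustration, and the reverse binary search stands in for the hardware's interval-division logic:

```python
import math

K = 64                                   # assumed table depth
# one shared table: index k maps x = k/K in [0, 1) to 2^x in [1, 2)
TABLE = [2 ** (k / K) for k in range(K)]

def pow2_frac(x):
    """Forward use: 2^x for x in [0, 1)."""
    return TABLE[int(x * K)]

def log2_mirror(y):
    """Mirrored use: log2(y) for y in [1, 2), by searching the same
    monotone table for the index whose entry matches y."""
    lo, hi = 0, K - 1
    while lo < hi:                        # binary search
        mid = (lo + hi + 1) // 2
        if TABLE[mid] <= y:
            lo = mid
        else:
            hi = mid - 1
    return lo / K

print(pow2_frac(0.625), 2 ** 0.625)      # table value vs exact
print(log2_mirror(1.5), math.log2(1.5))  # 0.578125 vs 0.58496...
```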
The invention adopts a lookup table multiplexing method to search two function values in a mirror image mode, thereby reducing the complexity of hardware and the consumption of resources.
The second technical scheme disclosed by the invention is a computation module of the softmax function. The computation module uses the above hardware implementation method of the softmax function and comprises a comparator module, a subtractor module, a data segmentation module, a mirror lookup module, a shift summation module and a data conversion module;
the comparator module is used for calculating the maximum value of the input data, the subtractor module is used for calculating the difference between the input data and the maximum value of the input data, the data segmentation module is used for splitting the data, the mirror lookup module is used for finding the corresponding value of the data in the lookup table, the shift summation module is used for multiplying and adding the data, and the data conversion module is used for converting the data;
the output end of the comparator module is connected with the input end of the subtractor module, the output end of the subtractor module is connected with the input end of the data segmentation module, the output end of the data segmentation module is connected respectively with the input end of the mirror lookup module and the input end of the shift summation module, the output end of the mirror lookup module is also connected with the input end of the shift summation module, the output end of the shift summation module is connected with the input end of the data conversion module, and the output end of the data conversion module is also connected with the input end of the mirror lookup module.
The invention utilizes mathematical transformation and mirror lookup, with all parts of the computation module cooperating to complete the implementation of the E-to-2 softmax function, thereby avoiding the hardware-unfriendly power and division operations, and featuring low complexity and high performance. For the hardware implementation of the multi-task classification activation function in a neural network, the hardware implementation method of the E-to-2 softmax function does not change the characteristics of the softmax function, and the reliability of the E-to-2 softmax function has been proved through software simulation.
Further, the data segmentation module splits the data into two multiplied parts: the first part of the data is expressed in base 2 with a negative integer exponent, denoted $2^{\text{negative integer}}$; the second part of the data is expressed in base 2 with a positive fractional exponent, denoted $2^{\text{fraction}}$, such that $2^{\text{negative integer} + \text{fraction}} = 2^{\text{negative integer}} \times 2^{\text{fraction}}$.
The third technical scheme disclosed by the invention is a chip, and the chip comprises the calculation module of the softmax function.
The fourth technical scheme disclosed by the invention is a hardware system of the softmax function. The system comprises the above chip and further comprises a main controller, a data input module and a data output module; the main controller controls the distribution of configuration information, the data input module and the data output module are responsible for transferring data, and the computation module contained in the chip performs the comparison, subtraction, segmentation, lookup, shift, summation and conversion operations on the data.
The main controller is connected with the data input module, the data output module and the chip; the main controller controls data to be input into the chip through the data input module for calculation, and the chip returns the calculation result to the main controller through the data output module.
3. Advantageous effects
Compared with the prior art, the invention has the advantages that:
The new E-to-2 softmax function provided by the invention does not change the characteristics of the softmax function; at the same time, the mathematical transformability of the function is used to transform the E-to-2 softmax function mathematically, and the hardware implementation is carried out through data conversion, data segmentation, shifting and mirror lookup.
In the implementation of the activation function of a neural network multi-classification task, because softmax involves e-based exponentiation and division, its hardware implementation suffers from high hardware consumption and high complexity and is very hardware-unfriendly. The invention optimizes the softmax activation function, provides the E-to-2 softmax activation function to replace it, and provides a high-performance, low-complexity hardware implementation method based on the E-to-2 softmax function that avoids exponentiation and division, thereby largely solving the difficulty of softmax hardware implementation.
Drawings
FIG. 1 is a graph of the experimental results of the softmax function as the output-layer activation function in the reuters experiment;
FIG. 2 is a graph of the experimental results of the E-to-2 softmax function as the output-layer activation function in the reuters experiment;
FIG. 3 is a graph of the experimental results of the softmax function as the output-layer activation function in the MNIST experiment of the present invention;
FIG. 4 is a graph of the experimental results of the E-to-2 softmax function as the output-layer activation function in the MNIST experiment of the present invention;
FIG. 5 is a schematic diagram of the mirror lookup principle for $2^x$ and $\log_2 x$;
FIG. 6 is a diagram of the E-to-2 softmax function hardware architecture of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and specific examples.
Examples
This embodiment transforms the traditional softmax function and discloses an E-to-2 softmax function. Compared with the softmax function, the E-to-2 softmax function avoids the e-exponential and division operations in hardware implementation and can be implemented with only some data conversion, data segmentation, shift and addition operations, giving high performance and low complexity.
The softmax function is expressed by the formula:

$$\mathrm{softmax}(i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} \tag{1}$$
where $x_i$ denotes the i-th element in the array x, softmax(i) denotes the softmax value of this element, and the array x has n elements in total.
In this embodiment, the base e is replaced by 2 according to the E-to-2 softmax function, whose expression is:

$$\text{E-to-2 softmax}(i) = \frac{2^{x_i}}{\sum_{j=1}^{n} 2^{x_j}} \tag{2}$$
first, theoretical analysis was performed on the feasibility of the E-to-2softmax function.
The softmax function is widely used as the output-layer function of multi-classification tasks for two main reasons: the first is normalization; the second is the simplicity of its cross-entropy loss function.
The E-to-2 softmax function is analyzed in detail with respect to these two characteristics.
First, the normalization property of the E-to-2 softmax function is analyzed theoretically.
Take a softmax function with n inputs as an example:

$$\mathrm{softmax}(i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} \tag{3}$$
equation (3) represents the softmax function value corresponding to the ith element, and the sum of the softmax function values of n inputs can obtain:
summing the E-to-2softmax functions for n inputs yields:
from equation (5), the E-to-2softmax function retains the normalized properties of the softmax function.
Next, theoretical analysis was performed on the simplicity of the cross-entropy form of the E-to-2softmax function.
An objective function must be specified for gradient optimization. Because the mean square error of the softmax function is relatively complex, cross entropy is generally used as the objective function. Since $2^x$ is equivalent to $e^{x \ln 2}$, the E-to-2 softmax function inherits the compact cross-entropy form of the softmax function: the gradient is simply multiplied by an ln 2 factor. Thus, as the activation function of a multi-task output layer, the E-to-2 softmax function has mathematically the same properties as the softmax function.
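As an added numerical check of this point (not part of the original text), the finite-difference gradient of the cross-entropy loss under the base-2 softmax matches ln 2 times the familiar softmax gradient p - y:

```python
import numpy as np

def e_to_2_softmax(z):
    p = np.exp2(z - z.max())
    return p / p.sum()

z = np.array([1.0, 0.2, -0.7])
y = np.array([0.0, 1.0, 0.0])                     # one-hot target
loss = lambda v: -np.sum(y * np.log(e_to_2_softmax(v)))

analytic = np.log(2) * (e_to_2_softmax(z) - y)    # ln(2) * (p - y)

eps, numeric = 1e-6, np.zeros_like(z)
for k in range(len(z)):                           # central differences
    zp, zm = z.copy(), z.copy()
    zp[k] += eps
    zm[k] -= eps
    numeric[k] = (loss(zp) - loss(zm)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```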
Finally, the software feasibility of the E-to-2 softmax function is analyzed through two groups of experiments.
Experiment one: multi-classification based on the reuters corpus of Reuters news reports provided with the Keras package.
The experiment distinguishes 46 news topics. When classifying news topics with the reuters data set, data preprocessing is performed first: the keywords of each piece of news are encoded numerically, where 1 indicates that a keyword is present and 0 that it is absent, so each piece of news is represented by a one-dimensional vector of 0s and 1s.
In this embodiment, a first neural network is designed, and the properties of the softmax function and the E-to-2 softmax function are analyzed by taking it as an example; the first neural network is only one realizable form and its structure can be transformed according to requirements such as calculation accuracy. The first neural network comprises four layers: the first layer is the input layer; the second layer is a hidden layer with 64 neurons and the ReLU function as activation; the third layer is another hidden layer with 64 neurons and the ReLU function as activation; the fourth layer is an output layer with 46 neurons and the softmax function as activation.
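A Keras sketch of this first network is given below; it is a plausible reconstruction rather than the exact training script, since the text does not fix the input width (10000 matches the usual reuters multi-hot encoding) or the training configuration, and e_to_2_softmax is a hypothetical helper for the FIG. 2 variant:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def e_to_2_softmax(z):
    # base-2 softmax, a drop-in replacement for the output activation
    p = tf.pow(2.0, z - tf.reduce_max(z, axis=-1, keepdims=True))
    return p / tf.reduce_sum(p, axis=-1, keepdims=True)

model = keras.Sequential([
    keras.Input(shape=(10000,)),             # assumed multi-hot input width
    layers.Dense(64, activation="relu"),     # hidden layer 1
    layers.Dense(64, activation="relu"),     # hidden layer 2
    layers.Dense(46, activation="softmax"),  # swap in e_to_2_softmax for FIG. 2
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])
```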
FIG. 1 shows the results of the reuters experiment with the softmax function as the output-layer activation function; each iteration continuously improves the prediction accuracy of the neural network, and the training time for 20 periods (Epochs) is about 8.57 seconds.
The fourth layer of the experiment-one neural network was then redesigned as an output layer of 46 neurons with the E-to-2 softmax function as the activation function.
FIG. 2 shows the results of the reuters experiment with the E-to-2 softmax function as the output-layer activation function; each iteration continuously improves the prediction accuracy of the neural network, and the training time for 20 periods (Epochs) is about 8.80 seconds.
According to the calculation results of the experiment, the E-to-2 softmax function introduced into neural network training achieves essentially the same performance as the softmax function in terms of accuracy and speed. Further experiments show that the training time for 20 periods (Epochs) rises from 8.57 s to 8.80 s after substituting the E-to-2 softmax function, because the E-to-2 softmax function is not specially adapted in software and extra operation cost is paid on each call, which increases the training time.
Experiment two: multi-classification based on the classic open-source MNIST data set in Python.
MNIST is a handwritten digit data set, and the training targets of the experiment are the digits 0-9. In the experiment, a second neural network is designed for analyzing the softmax and E-to-2 softmax functions by example; the second neural network is only one realizable form and its structure can be transformed according to requirements such as calculation accuracy. The second neural network comprises seven layers: the first layer is the input layer; the second layer is a convolutional layer with 32 filters of shape (5, 5) and ReLU activation; the third layer is a (2, 2) pooling layer; the fourth layer is a Dropout layer; the fifth layer is a Flatten layer; the sixth layer is a fully-connected layer of 240 neurons with ReLU activation; the seventh layer is the output layer, consisting of ten neurons with the softmax activation function.
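A corresponding Keras sketch of this seven-layer network follows; the dropout rate is an assumption, as the text does not state it:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),               # MNIST images
    layers.Conv2D(32, (5, 5), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),                         # rate assumed, not stated
    layers.Flatten(),
    layers.Dense(240, activation="relu"),
    layers.Dense(10, activation="softmax"),       # or e_to_2_softmax, as in FIG. 4
])
```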
FIG. 3 shows the results of the MNIST experiment with the softmax function as the output-layer activation function; as can be seen from FIG. 3, ten iterations are sufficient for substantial convergence, and the total time for 20 periods (Epochs) is about 158.5 s.
The seventh layer (output layer) of the neural network in experiment two was then adjusted to use the E-to-2 softmax function as the activation function.
FIG. 4 shows the results of the MNIST experiment with the E-to-2 softmax function as the output-layer activation function; as can be seen from FIG. 4, ten iterations are sufficient for substantial convergence, and the total time for 20 periods (Epochs) is about 152.6 s, i.e. training with the E-to-2 softmax function as the output-layer activation is actually faster.
As can be seen from FIG. 3 and FIG. 4, when the E-to-2 softmax function is used for MNIST recognition, the accuracy is essentially the same as when the softmax function is used as the activation function, the loss function shows the same descending trend, and its function is the same as that of the softmax activation function.
The hardware implementation method of the E-to-2 softmax function is described in detail below and includes the following steps.
First, a mirror lookup table is constructed; its mirror symmetry is shown in FIG. 5.
Dividing both the numerator and the denominator by $2^{x_{max}}$ transforms the E-to-2 softmax function as follows, without changing the function value:

$$\text{E-to-2 softmax}(i) = \frac{2^{x_i - x_{max}}}{\sum_{j=1}^{n} 2^{x_j - x_{max}}} \tag{6}$$
the logarithm at base 2 operation is performed on the E-to-2softmax function after deformation, and the E-to-2softmax function becomes the following form:
the following form is obtained by collating equation (7):
for epsilon in formula (8)iIs obtained byiAs can be seen from the above formula, we need to select the maximum value x of the maximum input datamaxThe maximum value can be selected by a comparison operation, where xi-xmaxAnd xj-xmaxCan be achieved by a subtraction operation, for
we turn each exponent $x_j - x_{max}$ into the form of a negative integer plus a positive fraction, i.e.

$$x_j - x_{max} = Z' + D, \qquad 2^{x_j - x_{max}} = 2^{Z'} \times 2^{D} \tag{10}$$
where the fraction D lies in the range [0, 1), and D = 0 when $x_j$ equals $x_{max}$. From the range of D, the search range of x in the $2^x$ mirror table is also [0, 1), so $2^{D}$ can be found by lookup on this basis; the multiplication by $2^{Z'}$ is performed by shifting the found value, and the shifted values are then summed by addition. Since $x_j - x_{max} \le 0$, the sum $\sum_{j=1}^{n} 2^{x_j - x_{max}}$ has a value range of (1, N], and can therefore be converted into the form $1.M \times 2^E$, where E stores the converted weight. For example, if the summed data is defined as a 13-bit signed number with 6 fractional bits and the binary value after summation is 0100100.100000, then the converted binary number is 1.001001 with E = 5, as follows:

$$\sum_{j=1}^{n} 2^{x_j - x_{max}} = 1.M \times 2^E \tag{11}$$
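In software, this conversion amounts to leading-one detection plus a shift; the sketch below reproduces the worked example under the assumed 13-bit, 6-fractional-bit format:

```python
def normalize_1m(sum_fx, frac_bits):
    """Convert a fixed-point sum (>= 1) into 1.M * 2^E by locating the
    leading one; in hardware this is a priority encoder plus a shifter."""
    e = sum_fx.bit_length() - 1 - frac_bits   # exponent E
    mantissa_fx = sum_fx >> e                 # 1.M, still with frac_bits fraction
    return mantissa_fx, e

# worked example from the text: 0100100.100000 -> 1.001001 with E = 5
sum_fx = int("0100100100000", 2)              # 13-bit value, 6 fractional bits
m_fx, e = normalize_1m(sum_fx, 6)
print(bin(m_fx), e)                           # 0b1001001 (= 1.001001), 5
```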
$\log_2 1.M$ lies in the range [0, 1), so it can be found through the mirror-symmetric search of the $2^x$ table; with its value, the calculation of $\varepsilon_i$ can be completed.
The method then obtains

$$\varepsilon_i = (x_i - x_{max}) - E - \log_2 1.M \tag{12}$$
As can be seen from formula (12), $\varepsilon_i$ is less than 0, so $\varepsilon_i$ can be processed in the same way as equation (10), into a negative integer plus a fraction, at which point the above equation becomes:

$$\text{E-to-2 softmax}(i) = 2^{\varepsilon_i} = 2^{Z''} \times 2^{D''} \tag{13}$$
based on mirror look-up table 2xTo look for 2Decimal fractionValue of (2)Negative integerThe method can be realized by performing right shift operation on the found value, and then obtaining E-to-2softmax (i), thereby completing the whole realization. The mirror lookup table is implemented because 2 pairs are requiredxAnd log2And searching the x two function values by using a mirror image searching mode. Because 2xAnd log2The two functions x are symmetrical about y ═ x, and the interval to be searched for in this embodiment is also symmetrical, for example 2xFor x the search interval is [0,1), the corresponding function value is [1,2), and log2The interval in which x needs to be searched for x is [1,2), and the interval of the corresponding function value is [0,1), so in order to save hardware resources and reduce area, the embodiment constructs a lookup table, and the search for two functions can be completed on the basis of symmetry by means of interval division.
According to the above analysis, the hardware implementation of the E-to-2 softmax function comprises a main controller, a data input module, a data output module and a computing module, the main controller being connected with the data input module, the data output module and the computing module; the main controller controls data to be input into the computing module through the data input module for calculation, and the computing module returns the calculation result to the main controller through the data output module.
The computing module comprises a comparator module, a subtractor module, a data segmentation module, a mirror lookup module, a shift summation module and a data conversion module. The main controller controls the input data into the comparator and subtractor modules, where the data is compared and differenced; the data is split into a fraction part and a negative-integer part by the data segmentation module; the corresponding values of the split data are found by the mirror lookup module; the data is then input into the shift summation module for shifting and summing; data conversion is performed by the data conversion module; and finally the calculation result is output. The E-to-2 softmax function is thus successfully realized in hardware through comparison, subtraction, data segmentation, shift summation and mirror lookup.
In specific application, the comparator module compares the input data to find the maximum value; the subtractor module subtracts the maximum value found by the comparator module from each input datum to obtain $x_i - x_{max}$; the data segmentation module separates each value calculated by the subtractor into a first part, a negative integer, and a second part, a positive fraction; the mirror lookup module takes the data output by the data segmentation module and finds the corresponding value in the mirrored $2^x$ and $\log_2 x$ lookup table; the shift summation module shifts the values found by the mirror lookup module and sums them; and the data conversion module converts the summed data, which lies in (1, N], into the form $1.M \times 2^E$.
FIG. 6 is the hardware architecture diagram of the present invention; the hardware implementation is described below with reference to FIG. 6, and the numbers of input and output data can be flexibly adjusted in application according to specific requirements and hardware resource conditions.
As shown in the figure, taking 8 inputs as an example, the specific implementation is as follows:
step 1) all input data $x_i$ are input into the comparator module, which compares them and selects the maximum value $x_{max}$, $i \le n$, where n denotes the number of input data and n = 8 in this embodiment;
step 2) the input data $x_i$ and the maximum value $x_{max}$ are input into the subtractor module, which performs the difference operation and calculates $x_i - x_{max}$;
step 3) the output values calculated by the subtractor module are sent to the data segmentation module and split into the form of a negative integer Z' plus a fraction D;
step 4) the fractional part D, as the exponent of $2^{D}$, is input into the mirror lookup module for lookup; after the lookup is finished, the looked-up values and the negative integers Z' are all input into the shift summation module to calculate $2^{Z'} \times 2^{D}$, where the multiplication by $2^{Z'}$ is achieved by right-shifting by |Z'| bits, followed by the summation operation. Since $x_i - x_{max} \le 0$, each $2^{x_i - x_{max}}$ is greater than 0 and at most 1, and equals 1 when $x_i = x_{max}$, so the value range of the sum of 8 such numbers is (1, 8];
step 5) the data calculated by the shift summation module is sent to the data conversion module, which converts it into the form $1.M \times 2^E$, where E represents the converted weight and the base-2 logarithm of the factor $2^E$ is simply E; the mantissa 1.M output by the data conversion module is sent to the mirror lookup module, and $\log_2 1.M$ is obtained from the mirror lookup table shown in FIG. 5;
step 6) $\varepsilon_i$ has now been calculated; for $2^{\varepsilon_i}$, the exponent is segmented again into the expression form of a negative integer plus a fraction, and step 4 is repeated: the fractional part is sent into the mirror lookup table for lookup, the looked-up value is right-shifted by the negative integer number of bits in the shift summation module to realize the multiplication, the value of the E-to-2 softmax function is calculated, and the whole calculation is completed.
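Bringing the steps together, the following end-to-end Python sketch emulates the data path of FIG. 6 under assumed parameters (Q6 fixed point and a 64-entry mirror table); it is a behavioral illustration, not a bit-exact model of the hardware:

```python
import math

FRAC_BITS = 6                                   # assumed Q6 fixed point
K = 64                                          # assumed mirror-table depth
TABLE = [2 ** (k / K) for k in range(K)]

def pow2_lut(d):
    """2^d for d in [0, 1), via the mirror table, quantized to Q6."""
    return round(TABLE[int(d * K)] * (1 << FRAC_BITS))

def log2_lut(m_fx):
    """log2(1.M) for 1.M in [1, 2), via the mirrored search."""
    y = m_fx / (1 << FRAC_BITS)
    return max(k for k in range(K) if TABLE[k] <= y) / K

def e_to_2_softmax_hw(xs):
    x_max = max(xs)                             # comparator module
    diffs = [x - x_max for x in xs]             # subtractor module
    terms = []
    for v in diffs:                             # segmentation + lookup + shift
        z, d = math.floor(v), v - math.floor(v)
        terms.append(pow2_lut(d) >> (-z))
    total = sum(terms)                          # shift summation module
    e = total.bit_length() - 1 - FRAC_BITS      # conversion to 1.M * 2^E
    log_sum = e + log2_lut(total >> e)
    out = []
    for v in diffs:                             # epsilon_i = v - log_sum, split again
        eps = v - log_sum
        z, d = math.floor(eps), eps - math.floor(eps)
        out.append((pow2_lut(d) >> (-z)) / (1 << FRAC_BITS))
    return out

vals = e_to_2_softmax_hw([3.0, 1.0, 0.5, 2.0, -1.0, 0.0, 1.5, 2.5])
print([round(v, 3) for v in vals], round(sum(vals), 3))  # sums to ~1
```

Quantization from the assumed table depth and Q format accounts for the printed sum deviating slightly from exactly 1.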
The invention transforms the traditional softmax function into a new E-to-2 softmax function that does not change the characteristics of the softmax function, exploits the mathematical transformability of the function to transform the E-to-2 softmax function mathematically, and realizes it in hardware through data conversion, data segmentation, shifting and mirror lookup, thereby avoiding the exponential and division operations of a direct softmax implementation and solving the difficulty of softmax hardware realization.
The function hardware implementation method has been verified as the last layer of a neural network in the reuters and MNIST experiments, processing the multi-task classification problem in deep learning. Using this method, the recognition accuracy of the hardware system is not degraded; compared with existing methods, it also avoids division and e-exponential operations and offers low complexity, low hardware resource consumption and high performance. Compared with the softmax hardware implementation of "A High-Speed and Low-Complexity Architecture for Softmax Function in Deep Learning" by Meiqi Wang et al., although both can support a maximum frequency of 2.8 GHz under the TSMC 28 nm process, the area of the present method under that process is only 4515 μm² with a power consumption of 9.98 mW, a reduction of more than 50% in both area and power, effectively improving the performance of the hardware system.
The method has great advantages in solving the problem of high hardware complexity of the softmax activation function, and can be widely applied to the activation function calculation of neural network multi-task classification in fields such as image processing and recognition, improving hardware performance and reducing the complexity of function implementation.
The invention and its embodiments have been described above schematically, and the description is non-limiting; the invention can be embodied in other specific forms without departing from its spirit or essential characteristics. What is shown in the drawings is only one embodiment of the invention, the actual construction is not limited thereto, and any reference signs in the claims shall not limit the claims concerned. Therefore, if a person skilled in the art, having received the teachings of the present invention, designs without inventive effort a structure or embodiment similar to the above technical solution, it should fall within the protection scope of this patent. Furthermore, the word "comprising" does not exclude other elements or steps, and several elements recited in the product claims may be implemented by a single element in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.