Automatic surrogate model selection method for black-box attacks, storage medium, and terminal
1. An automatic surrogate model selection method for black-box attacks, characterized in that the method comprises the following steps:
selecting a surrogate model from among existing neural network models according to attribute information of an original sample; and/or
updating the currently used surrogate model according to attack feedback information.
2. The automatic surrogate model selection method for black-box attacks according to claim 1, wherein the step of selecting the surrogate model further comprises:
grading the neural network models according to their network complexity information and classification accuracy information, wherein the higher the grade of a neural network model, the higher its classification accuracy on complex samples.
3. The automatic surrogate model selection method for black-box attacks according to claim 2, wherein selecting a surrogate model from among the neural network models according to the attribute information of the original sample comprises:
calculating a score for the original sample according to the attribute information of the original sample;
determining a neural network model of the corresponding grade according to the original sample score, thereby determining the surrogate model.
4. The automatic surrogate model selection method for black-box attacks according to claim 3, wherein calculating the original sample score according to the original sample attribute information comprises:
setting level thresholds for different attributes according to the influence of different attribute values of the original sample on classification accuracy, and scoring the corresponding attributes of the original sample against the level thresholds to obtain a score for each attribute of the original sample;
performing a weighted calculation over the attribute scores of the original sample to obtain the original sample score.
5. The automatic surrogate model selection method for black-box attacks according to claim 3, wherein determining a neural network model of the corresponding grade according to the original sample score and thereby determining the surrogate model comprises:
establishing a first mapping relation between neural network models of different grades and original sample scores;
querying the first mapping relation with the original sample score to determine the neural network model grade corresponding to the current original sample, and selecting any model of the corresponding grade as the surrogate model.
6. The automatic surrogate model selection method for black-box attacks according to claim 2, wherein the attack feedback information is an attack success rate, and updating the currently used surrogate model according to the attack feedback information comprises:
obtaining the attack success rate, against the black-box model, of the adversarial samples generated from the currently used surrogate model;
if the attack success rate falls within a preset attack success rate range, selecting a neural network model one grade higher than the currently used surrogate model as the new surrogate model.
7. The automatic surrogate model selection method for black-box attacks according to claim 6, further comprising: if the attack success rate is below the preset attack success rate range, selecting a neural network model of the highest grade as the new surrogate model.
8. The automatic surrogate model selection method for black-box attacks according to claim 6 or 7, further comprising:
if the currently used model is already a surrogate model of the highest grade, selecting a neural network model of the highest grade that has not yet been used as a surrogate model as the new surrogate model.
9. A storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed, perform the steps of the automatic surrogate model selection method for black-box attacks according to any one of claims 1 to 8.
10. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, characterized in that the processor, when running the computer instructions, performs the steps of the automatic surrogate model selection method for black-box attacks according to any one of claims 1 to 8.
Background
In recent years, deep learning models have been widely applied in many fields. Although their accuracy keeps improving, they remain susceptible to malicious attacks, among which adversarial attacks are the most common: by adding tiny perturbations imperceptible to the human eye to an original sample, an attacker causes the target model to output a wrong prediction, or even the prediction the attacker desires. Since prior work showed that adding tiny perturbations to input data can change a model's classification result, many researchers have turned to the study of adversarial attacks on models, and numerous attack algorithms have emerged.
Research on attack algorithms can effectively improve the defensive capability of neural network models: the black-box model is retrained on adversarial samples generated by attack algorithms with high attack success rates, so that it learns the characteristics of those samples and improves its classification accuracy. This strengthens the defense of the black-box model and safeguards the security of neural network models, which is why research on attack algorithms is necessary.
According to how much information the attacker has about the target model, adversarial attacks are divided into white-box attacks and black-box attacks; in a white-box attack the attacker can obtain all relevant information about the target model. Depending on whether a specific attack target is designated, adversarial attacks are further divided into untargeted attacks and targeted attacks; an attack without a designated target belongs to the untargeted attacks. In real attack scenarios, the target model is usually not published, and most target models output only the class with the highest predicted probability, so it is difficult for the attacker to obtain relevant information about the target model; research on black-box attacks therefore has important practical value. However, in a black-box attack scenario, manually selected surrogate models substitute poorly for the target model, and the adversarial samples generated by current attack algorithms transfer poorly, so the success rate of black-box attacks is low, which hinders current security research on neural network models. How to quickly and accurately select a surrogate model capable of generating highly offensive adversarial samples is therefore an urgent problem in the field.
Disclosure of Invention
The invention aims to solve the problem that adversarial samples generated from surrogate models selected in the prior art have a low attack success rate, which hinders current security research on neural network models, and provides an automatic surrogate model selection method for black-box attacks, a storage medium, and a terminal.
The purpose of the invention is achieved by the following technical solution. An automatic surrogate model selection method for black-box attacks comprises the following steps:
selecting a surrogate model from among existing neural network models according to attribute information of the original sample; and/or
updating the currently used surrogate model according to attack feedback information. The surrogate model is a neural network model whose classification behavior is similar to that of the target model to be attacked.
In one example, the step of selecting the surrogate model further includes:
grading the neural network models according to their network complexity information and classification accuracy information, wherein the higher the grade of a neural network model, the higher its classification accuracy on complex samples.
In one example, selecting a surrogate model from among the neural network models according to the original sample attribute information includes:
calculating a score for the original sample according to its attribute information;
determining a neural network model of the corresponding grade according to the original sample score, thereby determining the surrogate model.
In one example, calculating the original sample score according to the original sample attribute information includes:
setting level thresholds for different attributes according to the influence of different attribute values on classification accuracy, and scoring the corresponding attributes of the original sample against these thresholds to obtain a score for each attribute;
performing a weighted calculation over the attribute scores to obtain the original sample score.
In one example, determining the neural network model of the corresponding grade according to the original sample score and thereby determining the surrogate model includes:
establishing a first mapping relation between neural network models of different grades and original sample scores;
querying the first mapping relation with the original sample score to determine the neural network model grade corresponding to the current original sample, and selecting any model of that grade as the surrogate model.
In one example, the attack feedback information is the attack success rate, and updating the currently used surrogate model according to the attack feedback information includes:
obtaining the attack success rate, against the black-box model, of the adversarial samples generated from the currently used surrogate model;
if the attack success rate falls within a preset attack success rate range, selecting a neural network model one grade higher than the currently used surrogate model as the new surrogate model.
In one example, the method further comprises: if the attack success rate is below the preset attack success rate range, selecting a neural network model of the highest grade as the new surrogate model.
In one example, the method further comprises:
if the currently used model is already a surrogate model of the highest grade, selecting a neural network model of the highest grade that has not yet been used as a surrogate model as the new surrogate model.
It should be further noted that the technical features of the above options may be combined with or substituted for one another to form new technical solutions.
Based on the same inventive concept, the invention also provides a black-box model training method based on the automatic surrogate model selection method for black-box attacks of any one or more of the above examples, which specifically comprises the following steps:
selecting a surrogate model from among the neural network models according to the original sample attribute information, and/or updating the currently used surrogate model according to the attack feedback information;
attacking the surrogate model with an attack algorithm to generate adversarial samples;
training the black-box model on the adversarial samples, so that the black-box model learns their characteristics and classifies them correctly. Learning the characteristics of the adversarial samples means the black-box model learns the perturbation features that distinguish adversarial samples from original samples, so that its classification results are corrected, accurate classification is achieved, and the security of the neural network model is improved.
The invention further includes a storage medium having computer instructions stored thereon, the computer instructions, when executed, performing the steps of the automatic surrogate model selection method for black-box attacks formed by any one or a combination of the above examples.
The invention also includes a terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when running the computer instructions, performs the steps of the automatic surrogate model selection method for black-box attacks formed by any one or a combination of the above examples.
Compared with the prior art, the invention has the following beneficial effects:
the method selects a surrogate model that closely matches the current black-box model according to the original sample attribute information, which correlates with the complexity of the black-box model, and/or updates the currently used surrogate model according to attack feedback information. This ensures that the selected or updated surrogate model provides excellent substitution performance, so that the generated adversarial samples achieve a high attack success rate in black-box attacks, which facilitates security research on current neural network models.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention.
FIG. 1 is a flow chart of a method in an example of the invention;
FIG. 2 is a flow chart of a method in an example of the invention;
FIG. 3 is a simulation diagram of the attack success rate of the ASMSM-based attack algorithms when attacking VGG13_BN in one example of the invention;
FIG. 4 is a comparison of adversarial samples generated before and after each attack algorithm is combined with the ASMSM algorithm on the MNIST dataset in an example of the present invention;
FIG. 5 is a simulation diagram of the attack success rate of the ASMSM-based attack algorithms when attacking DenseNet-161 in an example of the present invention;
FIG. 6 is a comparison of adversarial samples generated before and after each attack algorithm is combined with the ASMSM algorithm on the CIFAR10 dataset in an example of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that directions or positional relationships indicated by "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like are directions or positional relationships based on the drawings, and are only for convenience of description and simplification of description, and do not indicate or imply that the device or element referred to must have a specific orientation, be configured and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In any of the following exemplary embodiments, the surrogate model obtained by the surrogate model selection method can be used for black-box attacks, and the adversarial samples it generates are used to further train the black-box model. The black-box model thus learns the characteristics of the adversarial samples, i.e., the perturbation features that distinguish them from the original samples, corrects its classification results, achieves accurate classification, and improves the security of the neural network model.
First, black-box models can be divided into soft-label black boxes and hard-label black boxes: a soft-label black box returns the predicted probability of every class for the input sample (soft label), while a hard-label black box returns only the class with the highest predicted probability (hard label). Let X be a training set in sample space R and Y the true labels of the samples in the training set; a black-box model B is trained to learn the mapping X → Y. For another sample set X' in the sample space R, the black-box model B can still give a mapping X' → Y. Let x be a sample in the sample set X' with true label y. The attacker's task is to generate an adversarial sample x_adv by adding a small, visually imperceptible perturbation to x, so that the black-box model B misclassifies x_adv. An untargeted attack can be expressed as:

B(x_adv) ≠ y,  x_adv = x + ε,  ε < p

A targeted attack can be expressed as:

B(x_adv) = t,  x_adv = x + ε,  ε < p

where t denotes the designated attack target, and ε and p denote the perturbation added to the sample and its upper bound, respectively; the smaller ε is, the smaller the difference between the adversarial sample and the original sample, and the harder it is to distinguish them with the naked eye.
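As a brief illustration of the two conditions above (not part of the claimed method), the following Python sketch checks them for a batch of adversarial samples. The PyTorch-style black_box callable returning class logits is an assumption of the example; a hard-label black box only needs the argmax, which is all that is used here.

import torch

def untargeted_success(black_box, x_adv, y):
    # Untargeted condition B(x_adv) != y, evaluated per sample.
    with torch.no_grad():
        pred = black_box(x_adv).argmax(dim=1)
    return pred != y  # Boolean mask: True where the attack succeeded

def targeted_success(black_box, x_adv, t):
    # Targeted condition B(x_adv) == t, evaluated per sample.
    with torch.no_grad():
        pred = black_box(x_adv).argmax(dim=1)
    return pred == t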
As shown in FIGS. 1-2, in this example an automatic surrogate model selection method for black-box attacks (hereinafter ASMSM) specifically includes:
S11: selecting a surrogate model from among the neural network models according to the original sample attribute information; the surrogate model is a neural network model whose classification behavior is similar to that of the target model to be attacked (the black-box model).
S12: updating the currently used surrogate model according to the attack feedback information.
Specifically, the original sample attribute information includes the channel count, pixel count, class count, resolution, color, and luminance information of the original sample. Because the attributes of the original samples correlate with the complexity of the black-box model, the method uses this attribute information to select a surrogate model that is highly similar to the black-box model. During model training, for a simple dataset such as MNIST (1 image channel, 28 × 28 pixels, 10 classes), simple models such as a three-layer fully connected network or a three-layer convolutional neural network can learn the image features well and therefore predict unseen images well. For a slightly more complex dataset such as CIFAR10 (3 channels, 32 × 32 pixels, 10 classes), such simple models are no longer adequate, and neural network models with more complex structures such as VGG16 and ResNet50 are needed to learn the image features reasonably well. Likewise, for a large, complex dataset such as ImageNet (3 channels, 1000 classes, variable pixel counts, with images usually scaled to 224 × 224 or 299 × 299 during training, the scaling depending on the model), medium models are not adequate, and only large models such as VGG19 and DenseNet-201 can learn the original samples.
In this example, the attack feedback information in step S12 is specifically the attack success rate; if step S11 is skipped and only step S12 is executed, the attack feedback information consists of the attack success rate and the currently used surrogate model. In this example, a surrogate model whose feature-learning and classification behavior is closer to that of the black-box model is selected according to the attack success rate and the currently used surrogate model, thereby improving the success rate of the black-box attack and ensuring the transferability of the adversarial samples.
In one example, before the surrogate model selection step in steps S11 and S21, the method further includes:
grading the neural network models according to their network complexity information and classification accuracy information; the higher the grade of a neural network model, the higher its classification accuracy on complex samples (complex datasets) and the higher its network complexity, i.e., network complexity is positively correlated with classification accuracy. Specifically, this example divides the neural network models into a small model set, a medium model set, and a large model set according to network complexity and classification accuracy, the large model set being the highest-grade neural network models. More specifically, on a simple dataset such as MNIST, all three kinds of models reach a classification accuracy above 98%, so accuracy no longer increases with grade. On a complex dataset such as ImageNet, small models have very low classification accuracy, medium models somewhat higher, and large models the highest, so accuracy increases with grade; medium and large models thus have a wider range of application. However, if the attacked target model is itself a simple network, a small model is well suited as the surrogate for the attack task, whereas using a medium or large model would increase computational difficulty and time overhead.
In one example, selecting a surrogate model from among the neural network models according to the original sample attribute information in step S11 includes:
S111: calculating the score of the original sample according to its attribute information; the channel count, pixel count, and class count of a sample are the main characteristics affecting deep-learning classification, and the original sample score is calculated from these three attributes.
S112: determining a neural network model of the corresponding grade according to the original sample score, thereby determining the surrogate model. Specifically, the complexity of the original sample is judged from its score, and it is then decided whether a small, medium, or large model is currently adopted as the surrogate for the black-box model.
Further, calculating the original sample score according to the original sample attribute information in step S111 includes:
S111A: setting level thresholds for different attributes according to the influence of different attribute values on classification accuracy, and scoring the corresponding attributes of the original sample against these thresholds to obtain a score for each attribute. In this example the level thresholds are chosen empirically: two thresholds are set for the class count, dividing it into three levels (at most 10; more than 10 and at most 100; more than 100); two thresholds are set for the channel count, dividing it into three levels (single channel; three channels; more than three channels); and two thresholds are set for the pixel count, dividing it into three levels (at most 28 × 28; more than 28 × 28 and at most 96 × 96; more than 96 × 96).
S111B: performing a weighted calculation over the attribute scores of the original sample to obtain the original sample score. Specifically, the weighted calculation is:

S_Feature = α·S_Label + β·S_Channel + γ·S_Pixel

where S_Feature is the (original) sample score; S_Label, S_Channel, and S_Pixel are the class score, channel score, and pixel score of the sample, respectively; and α, β, γ are the class, channel, and pixel weights, respectively, all of which default to 1 in this application.
As a specific embodiment, the original sample score is calculated as follows (an illustrative code sketch of this scoring procedure is given after the steps):
a. Input the original sample into the function get_Num(X) to obtain the class, channel, and pixel counts of the original sample:
N_Label, N_Channel, N_Pixel = get_Num(X)
b. Input the class, channel, and pixel counts into the function get_Score(N_Label, N_Channel, N_Pixel) to obtain the class score, channel score, and pixel score:
S_Label, S_Channel, S_Pixel = get_Score(N_Label, N_Channel, N_Pixel)
c. Calculate the sample score:
S_Feature = α·S_Label + β·S_Channel + γ·S_Pixel
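A minimal Python sketch of steps S111A/S111B and the embodiment above. The function names get_Num and get_Score follow the text; the concrete per-level score values 0/1/2 are an assumption chosen to be consistent with the score ranges used in S112B below, since the text fixes only the thresholds.

import numpy as np

def get_Num(X, Y):
    # X: array of shape (N, C, H, W); Y: integer labels of the N samples.
    _, channels, height, width = X.shape
    return len(np.unique(Y)), channels, height * width

def _level(value, low, high):
    # Score a value against its two level thresholds: 0, 1 or 2.
    return 0 if value <= low else (1 if value <= high else 2)

def get_Score(n_label, n_channel, n_pixel):
    s_label = _level(n_label, 10, 100)            # <=10 / <=100 / >100 classes
    s_channel = _level(n_channel, 1, 3)           # 1 / 3 / >3 channels
    s_pixel = _level(n_pixel, 28 * 28, 96 * 96)   # <=28x28 / <=96x96 / >96x96 pixels
    return s_label, s_channel, s_pixel

def sample_score(X, Y, alpha=1.0, beta=1.0, gamma=1.0):
    s_label, s_channel, s_pixel = get_Score(*get_Num(X, Y))
    return alpha * s_label + beta * s_channel + gamma * s_pixel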
In one example, determining the neural network model of the corresponding grade according to the original sample score and then determining the surrogate model includes:
S112A: establishing a first mapping relation between neural network models of different grades and original sample scores. Specifically, the mapping between model grades and original sample scores is expressed as:

M = sel_Random(M_Q)

where M is the finally selected surrogate model; M_Q is the model set of the grade corresponding to the (original) sample score; the function sel_Random(M_Q) randomly picks one model from the model set M_Q; M_S, M_M, and M_L denote the small, medium, and large model sets, respectively; and δ and η are the grading parameters for the sample score, which default to 1/3 and 2/3 in the present invention.
S112B: querying the first mapping relation with the original sample score to determine the neural network model grade corresponding to the current original sample, and selecting any model of that grade as the surrogate model. Specifically, if the original sample score S_Feature ∈ [0, 2δ(α+β+γ)], any neural network model in the small model set is selected as the surrogate model; if S_Feature ∈ (2δ(α+β+γ), 2η(α+β+γ)], any neural network model in the medium model set is selected as the surrogate model; otherwise, i.e., S_Feature ∈ (2η(α+β+γ), 2(α+β+γ)], any neural network model in the large model set is selected as the surrogate model.
In one example, after the surrogate model is selected from among the neural network models according to the original sample attribute information, the surrogate model can automatically adjust its network structure and parameters according to the channel count, pixel count, and class count of the original sample, further increasing its similarity to the black-box model and hence the success rate of the black-box attack. Specifically, when a sample has a channels, b labels, and p × p pixels, the input of the surrogate model is set to a, the output of the last fully connected layer is set to b, and a padding parameter is set in the max-pooling layers of the model according to the pixel size p. For example, the VGG model has 5 pooling layers. When processing a sample of 28 × 28 pixels, the sample height and width change as 28 -> 14 -> 7 -> 3 -> 1, i.e., they already reach 1 at the 4th pooling layer, so the sample can no longer propagate through the model, and the padding parameter of one of the first 4 max-pooling layers needs to be modified. If the 1st layer is modified, the height and width change as 30 -> 15 -> 7 -> 3 -> 1, the same as before, so modifying the 1st pooling layer is abandoned in favour of the 2nd pooling layer, after which the height and width change as 28 -> 16 -> 8 -> 4 -> 2 -> 1, which meets the model's requirements. In other words, going from 28 down to 1 only 4 divisions by 2 fit (division here means integer division, i.e., divide and then round down by dropping the decimal part), and 30 likewise fits only 4 divisions by 2, so modifying the padding of the 1st layer does not help; the 14 after the 2nd layer fits only 3 further divisions by 2, whereas 16 fits 4, so it is the padding parameter of the 2nd layer that must be modified. That is, one looks for the position where modifying the padding parameter increases the number of times the size can be divided by 2. Following the same procedure, whenever modifying the padding of a given layer still does not allow the sample to propagate through the model, the next position where modifying the padding increases the number of divisions by 2 is sought, and so on.
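A minimal sketch of the padding-adjustment rule described above, under the assumptions that each pooling layer halves the spatial size with integer division and that setting padding to 1 adds 2 to the size entering that layer; the function name and return convention are illustrative, not part of the claimed method.

def choose_padding_layer(size, num_pools=5):
    # Find which max-pooling layer needs padding=1 so that a (size x size)
    # input survives all num_pools halvings (the size must stay >= 1).
    # Returns the 1-based layer index, or None if no change is needed or
    # a single padding change is not enough.
    def survives(pad_layer):
        s = size
        for layer in range(1, num_pools + 1):
            if layer == pad_layer:
                s += 2          # padding=1 on both sides before this pooling layer
            s //= 2             # stride-2 max-pool, rounding down
            if s < 1:
                return False    # collapsed before passing all pooling layers
        return True

    if survives(pad_layer=0):   # 0 means: modify nothing
        return None
    for k in range(1, num_pools + 1):
        if survives(k):
            return k
    return None

# For a 28x28 MNIST-style input and the 5 pooling layers of VGG this returns 2,
# matching the example in the text (modify the 2nd pooling layer).
print(choose_padding_layer(28))   # -> 2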
As a specific embodiment, the automatic selection of the surrogate model based on the sample score is executed as follows (an illustrative code sketch follows the procedure):
Input: sample score S_Feature; small, medium, and large model sets M_S, M_M, M_L; sample grading parameters δ and η; sample class, channel, and pixel weights α, β, γ.
Output: surrogate model M.
The execution body runs the following procedure:
if 0 ≤ S_Feature ≤ 2δ(α+β+γ) then
    M_Q ← M_S
else if 2δ(α+β+γ) < S_Feature ≤ 2η(α+β+γ) then
    M_Q ← M_M
else
    M_Q ← M_L
end if
M ← Random(M_Q)
return M
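A minimal Python equivalent of the procedure above, assuming the model sets are plain Python lists of already-constructed models; the default weights and grading parameters follow the values stated in the text.

import random

def select_surrogate(s_feature, m_small, m_medium, m_large,
                     delta=1/3, eta=2/3, alpha=1.0, beta=1.0, gamma=1.0):
    # Map the sample score to a model set and pick one model at random.
    w = alpha + beta + gamma
    if s_feature <= 2 * delta * w:
        candidates = m_small
    elif s_feature <= 2 * eta * w:
        candidates = m_medium
    else:
        candidates = m_large
    return random.choice(candidates)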
Further, the attack feedback information is the attack success rate, and updating the currently used surrogate model according to the attack feedback information includes:
S121: obtaining the attack success rate, against the black-box model, of the adversarial samples generated from the currently used surrogate model. Specifically, the attack success rate is the ratio of the number of adversarial samples that successfully attack the black-box model to the total number of adversarial samples, and reflects the effect of the adversarial samples generated by the surrogate model on the black-box model; the higher the attack success rate, the better the attack effect. The attack success rate is calculated as:

ASR = N_Success / N_Total

where N_Success denotes the number of adversarial samples that successfully attack the black-box model and N_Total denotes the total number of adversarial samples. The higher the attack success rate, the better the transferability of the adversarial samples and the more effective the attack method.
S122: if the attack success rate falls within the preset attack success rate range, selecting a neural network model one grade higher than the currently used surrogate model as the new surrogate model. Specifically, the preset attack success rate range runs from the product of the expected attack success rate and a boundary parameter up to the expected attack success rate, the boundary parameter being set empirically, so that after subsequent training a surrogate model that meets this range can generate adversarial samples whose attack success rate in the black-box attack reaches the expected value. Setting the expected attack success rate and the boundary parameter serves two purposes: on the one hand, they allow the attack effect on the black-box model to be pre-judged, since when the actual attack success rate (the one fed back in the feedback information) exceeds the expected attack success rate the attack is considered successful and the currently used surrogate model need not be updated; on the other hand, the currently used surrogate model can be updated in real time according to the actual attack success rate, improving its similarity to the black-box model. In this embodiment, if the currently used surrogate model belongs to the small model set, it is updated to a model from the medium model set; if it belongs to the medium model set, it is updated to a model from the large model set, thereby improving the classification similarity between the surrogate model and the black-box model and hence the attack success rate of the adversarial samples against the target model.
In one example, the method for updating the currently used surrogate model further comprises:
S123: if the attack success rate is below the preset attack success rate range, selecting a neural network model of the highest grade as the new surrogate model. Specifically, if the currently used surrogate model belongs to the small or medium model set and the attack success rate of its adversarial samples in the black-box attack cannot reach the product of the expected attack success rate and the boundary parameter, any model from the large model set is selected as the new surrogate model.
In one example, the method for updating the currently used surrogate model further comprises:
S124: if the currently used model is already a surrogate model of the highest grade, selecting a neural network model of the highest grade that has not yet been used as a surrogate model as the new surrogate model. Specifically, if the currently used surrogate model is already a model from the large model set, a large model that has not yet served as the surrogate model is selected as the new surrogate model, thereby improving the surrogate performance.
Optionally, any one of the above examples for updating the currently used surrogate model, or a combination of several of them, further comprises:
S125: continuing to update the currently used surrogate model according to the attack success rate until the attack success rate of the new surrogate model exceeds the expected attack success rate. Specifically, once the attack success rate of the surrogate model exceeds the expected attack success rate, its adversarial samples can successfully attack the black-box model and the currently used surrogate model no longer needs to be updated.
As a specific embodiment, the surrogate model update method formed by combining the above examples can be expressed as follows.
When the attack success rate lies between ζ·X% and X%, the surrogate model selection strategy is:

M_New = sel_Random(M_Q),  M_Old ∈ S_Old,  ASR ∈ [ζ·X%, X%)

where M_Q is the medium model set M_M if M_Old belongs to the small model set, the large model set M_L if M_Old belongs to the medium model set, and the set of large models not yet used as surrogates, M_L − (M_L ∩ S_Old), if M_Old already belongs to the large model set.
When the attack success rate is less than ζ·X%, the surrogate model selection strategy is:

M_New = sel_Random(M_Q),  M_Old ∈ S_Old,  ASR ∈ [0, ζ·X%)

where M_Q is the large model set M_L if M_Old belongs to the small or medium model set, and M_L − (M_L ∩ S_Old) otherwise.
Here X% and ζ denote the expected attack success rate and the boundary parameter, respectively; M_New is the new surrogate model; M_Old is the old surrogate model; S_Old is the set of surrogate models already used; and ASR is the attack success rate.
As a specific embodiment, the feedback-based update of the currently used surrogate model in the present invention is executed as follows (an illustrative code sketch follows the procedure):
Input: attack success rate ASR; old surrogate model M_Old; set of used surrogate models S_Old; small, medium, and large model sets M_S, M_M, M_L; expected attack success rate X%; boundary parameter ζ.
Output: new surrogate model M_New; updated set of used surrogate models S_Old.
The execution body runs the following procedure:
if ζ·X% ≤ ASR ≤ X% then
    if M_Old ∈ M_S then
        M_Q ← M_M
    else if M_Old ∈ M_M then
        M_Q ← M_L
    else
        M_Q ← M_L − (M_L ∩ S_Old)
    end if
else
    if M_Old ∈ (M_S ∪ M_M) then
        M_Q ← M_L
    else
        M_Q ← M_L − (M_L ∩ S_Old)
    end if
end if
S_Old ← S_Old ∪ {M_Old}
M ← Random(M_Q)
return M, S_Old
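A minimal Python equivalent of the update procedure above. The expected attack success rate and boundary parameter defaults are illustrative (the text leaves them to be set empirically), the model sets are plain lists, s_old is a set of already-used models, and handling of the case where every large model has already been used is omitted.

import random

def update_surrogate(asr, m_old, s_old, m_small, m_medium, m_large,
                     expected=0.9, zeta=0.8):
    # Feedback-based surrogate model update following the procedure above.
    unused_large = [m for m in m_large if m not in s_old]
    if zeta * expected <= asr <= expected:
        if m_old in m_small:
            candidates = m_medium
        elif m_old in m_medium:
            candidates = m_large
        else:
            candidates = unused_large
    else:  # attack success rate below the preset range
        candidates = m_large if (m_old in m_small or m_old in m_medium) else unused_large
    s_old = s_old | {m_old}
    return random.choice(candidates), s_old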
In one example, the method further comprises, after selecting the surrogate model (a sketch of this loop follows the steps below):
S131: training the surrogate model with the original samples;
S132: attacking the surrogate model with an attack algorithm to generate adversarial samples;
S133: attacking the black-box model with the adversarial samples to obtain attack feedback information. Specifically, the current surrogate model and the adversarial samples can then be further updated according to the attack feedback information.
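A high-level sketch of the S131-S133 loop combined with the ASMSM sketches above. It reuses sample_score, select_surrogate, update_surrogate, and untargeted_success from the earlier sketches; train() and generate_adversarial() are placeholders for an ordinary training routine and for any of the attack algorithms named in the text, not APIs defined by this document.

def asmsm_attack_loop(x, y, black_box, m_small, m_medium, m_large,
                      expected=0.9, zeta=0.8, max_rounds=5):
    score = sample_score(x.cpu().numpy(), y.cpu().numpy())
    surrogate = select_surrogate(score, m_small, m_medium, m_large)
    s_old = set()
    x_adv = x
    for _ in range(max_rounds):
        train(surrogate, x, y)                          # S131: fit the surrogate on the original samples
        x_adv = generate_adversarial(surrogate, x, y)   # S132: e.g. FGSM/BIM/PGD against the surrogate
        asr = untargeted_success(black_box, x_adv, y).float().mean().item()  # S133: feedback
        if asr > expected:                              # S125: expected success rate reached, stop updating
            break
        surrogate, s_old = update_surrogate(asr, surrogate, s_old,
                                            m_small, m_medium, m_large,
                                            expected, zeta)
    return x_adv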
To illustrate the effect of the ASMSM of the present application on adversarial attack algorithms over different types of datasets, the MNIST and CIFAR10 datasets were used to train the black-box models and run the comparative tests, thereby verifying the performance of surrogate models selected by the ASMSM method. The statistics of the two datasets are shown in Table 1.
TABLE 1. Experimental dataset statistics

Dataset    Pixels     Classes    Training samples    Test samples
MNIST      28 × 28    10         60000               10000
CIFAR10    32 × 32    10         50000               10000
In the MNIST and CIFAR10 experiments, the black-box models were trained using the full training set and the test set. In the comparison experiments, because attacking samples that the black box cannot recognize is meaningless, and because in a black-box attack scenario the attacker cannot know the training set of the black-box model, the samples in the test set that the black box classifies correctly were used as the original samples for surrogate model training and adversarial sample generation.
In the MNIST comparison experiments, a three-layer convolutional neural network, a five-layer convolutional neural network, and VGG13_BN were chosen as black-box models; in the CIFAR10 comparison experiments, VGG19_BN, ResNet-101, and DenseNet-161 were chosen as black-box models.
In the ASMSM-based black-box attacks, the models included in each ASMSM model set are shown in Table 2.
TABLE 2. Models contained in each model set
Furthermore, the performance verification mainly compares the black-box attack effect of the FGSM, BIM, PGD, MI-FGSM, DI-2-FGSM, and SI-NI-FGSM attack algorithms before and after ASMSM is used. In the MNIST experiments, the maximum perturbation of the attack algorithms is set to 64 on the image pixel value range [0, 255]; in the CIFAR10 experiments, it is set to 16 on the range [0, 255]. The number of iterations of BIM, PGD, MI-FGSM, DI-2-FGSM, and SI-NI-FGSM is set to 20, the attack step size to the ratio of the maximum perturbation to the number of iterations, the decay coefficient of MI-FGSM, DI-2-FGSM, and SI-NI-FGSM to 1, the transformation probability in DI-2-FGSM to 0.5, and the number of scale copies in SI-NI-FGSM to 5.
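The experimental hyperparameters above, restated as a configuration sketch for clarity; the key names are illustrative only and are not parameters of any specific attack library.

attack_config = {
    "mnist":   {"max_perturbation": 64, "pixel_range": (0, 255)},
    "cifar10": {"max_perturbation": 16, "pixel_range": (0, 255)},
    "iterative": {                       # BIM, PGD, MI-FGSM, DI-2-FGSM, SI-NI-FGSM
        "iterations": 20,
        "step_size": "max_perturbation / iterations",
        "decay": 1.0,                    # MI-FGSM, DI-2-FGSM, SI-NI-FGSM
        "di_transform_prob": 0.5,        # DI-2-FGSM
        "si_scale_copies": 5,            # SI-NI-FGSM
    },
}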
Further, the evaluation method in the performance verification process includes an Attack Success Rate (ASR) -based evaluation, a peak signal-to-noise ratio (PSNR) -based evaluation, and a Structural Similarity (SSIM) -based evaluation.
Specifically, the evaluation based on the peak signal-to-noise ratio (PSNR) is an objective criterion of image quality; its value ranges from 0 to 100, and the higher the PSNR, the higher the image quality of the adversarial sample. The PSNR is calculated as:

MSE = (1/(m·n)) · Σ_i Σ_j (X(i, j) − X_Adv(i, j))²
PSNR = 10 · log10(MAX_X² / MSE)

where X and X_Adv denote the original sample and the adversarial sample, both of size m × n; X(i, j) and X_Adv(i, j) denote the pixel values of the original and adversarial samples at position (i, j); MSE denotes the mean square error between corresponding pixels of the original and adversarial samples; and MAX_X denotes the maximum pixel value of the sample, with MAX_X = 2^B − 1 if pixel values are represented with B bits.
Specifically, the structural similarity (SSIM) reflects the similarity of two images; its value ranges from 0 to 1, and the higher the structural similarity, the more similar the adversarial sample is to the original sample, the harder it is to distinguish, and the higher the attack success rate of the adversarial sample in the black-box attack. The structural similarity is calculated as:

SSIM(x, y) = ((2·μ_x·μ_y + c_1)·(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)·(σ_x² + σ_y² + c_2))

where x and y denote the two samples whose similarity is to be calculated; μ_x and μ_y denote the means of x and y, respectively; σ_x² and σ_y² denote the variances of x and y, respectively; σ_xy denotes the covariance of x and y; and c_1 and c_2 are two small constants used to avoid division by zero, with c_1 = (k_1·MAX_X)², c_2 = (k_2·MAX_X)², and k_1 and k_2 equal to 0.01 and 0.03, respectively.
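A small numpy sketch of the two metrics above. The PSNR function follows the formula directly; the SSIM function computes a single global window purely to illustrate the formula, whereas standard SSIM is usually computed over local windows and averaged.

import numpy as np

def psnr(x, x_adv, max_x=255.0):
    # PSNR between an original sample and an adversarial sample (equal shapes).
    mse = np.mean((x.astype(np.float64) - x_adv.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_x ** 2 / mse)

def ssim_global(x, y, max_x=255.0, k1=0.01, k2=0.03):
    # Single-window (global) SSIM following the formula above.
    c1, c2 = (k1 * max_x) ** 2, (k2 * max_x) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))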
Based on the above verification approach, in the MNIST comparison experiments a three-layer convolutional neural network (3-layer CNN), a five-layer convolutional neural network (5-layer CNN), and VGG13_BN were first trained on the MNIST dataset as black-box models, with recorded recognition accuracies of 98.90%, 98.87%, and 99.63%, respectively. VGG11_BN was trained on the test set as the surrogate model to be attacked by the FGSM, BIM, PGD, MI-FGSM, DI-2-FGSM, and SI-NI-FGSM attack algorithms. Samples in the test set that the black-box models classify correctly were selected as the original samples, the different attack algorithms and their ASMSM-based counterparts were used to attack VGG11_BN and generate adversarial samples, and these adversarial samples were used to attack the black-box models. The experimental results are shown in Table 3.
Table 3 comparison table of black box attack results on MNIST dataset
From Table 3 it can be seen that, on the one hand, the ASMSM-based attack algorithms greatly improve the success rate of the black-box attack: except for the ASMSM-based FGSM, whose success rate when attacking the 5-layer CNN is 89.81%, all other ASMSM-based attack algorithms exceed 90%. On the other hand, the PSNR and SSIM results show that the ASMSM-based attack algorithms also improve, to some extent, the image quality of the adversarial samples and their similarity to the original samples. This demonstrates that in black-box attacks the ASMSM proposed by the invention is effective in improving both the attack success rate and the image quality. The large improvement in attack success rate is mainly because the ASMSM-based attack algorithms bring the surrogate model close to the black-box model and increase the similarity of the two models, so the surrogate model plays its substitute role better; adversarial samples that can successfully attack the surrogate model can therefore also attack the black-box model well.
The attack success rate of the ASMSM-based attack algorithms when attacking the black-box model VGG13_BN is shown in FIG. 3. All ASMSM-based attack algorithms achieve a good attack effect, reaching a high attack success rate, above 90%, within a small number of attack iterations.
Furthermore, in the MNIST comparison experiments, the invention also compares, in an intuitive visual manner, the adversarial samples generated before and after each attack algorithm is combined with ASMSM, as shown in FIG. 4. FIG. 4(a) is the original image; each of FIGS. 4(b)-4(g) contains an upper and a lower image, where the upper images of FIGS. 4(b)-4(g) correspond, in order, to the FGSM, BIM, PGD, MI-FGSM, DI-2-FGSM, and SI-NI-FGSM attack algorithms, and the lower images of FIGS. 4(b)-4(g) correspond, in order, to the same algorithms combined with ASMSM. It can be seen that, for most adversarial samples, the ASMSM-based attack algorithms do not greatly improve image quality compared with the original attack algorithms; however, before and after BIM, PGD, MI-FGSM, and DI-2-FGSM are combined with ASMSM, the perturbation in the adversarial sample of the digit "4" is clearly reduced and more concentrated, which indicates that ASMSM has the effect of reducing the perturbation of some adversarial samples, improving their image quality and stealthiness.
Based on the above verification approach, in the CIFAR10 comparison experiments VGG19_BN, ResNet-101, and DenseNet-161 were first trained on the CIFAR10 dataset as black-box models, with recorded recognition accuracies of 93.27%, 93.05%, and 94.38%, respectively. VGG13_BN was trained on the test set as the surrogate model to be attacked by the FGSM, BIM, PGD, MI-FGSM, DI-2-FGSM, and SI-NI-FGSM attack algorithms. Samples in the test set that the black-box models classify correctly were selected as the original samples, the different attack algorithms and their ASMSM-based counterparts were used to attack VGG13_BN and generate adversarial samples, and these adversarial samples were used to attack the black-box models. The experimental results are shown in Table 4.
Table 4 comparison of black box attack results on CIFAR10 data set
As can be seen from Table 4, similar to the comparison experiments on the MNIST dataset, on the CIFAR10 dataset the adversarial samples generated by the ASMSM-based attack algorithms also greatly improve the success rate of the black-box attack. After FGSM is combined with ASMSM, the attack success rate on the three black-box models increases by more than 20%; after BIM, PGD, MI-FGSM, and DI-2-FGSM are combined with ASMSM, the attack success rate on the three black-box models increases by more than 30%, and the black-box attack success rate of most ASMSM-based attack algorithms exceeds 90%. In terms of image quality, except that the PSNR of ASMSM-based BIM decreases when attacking VGG19_BN, all other ASMSM-based attack algorithms improve the PSNR and SSIM values to different degrees, further verifying the effectiveness of the ASMSM proposed by the invention in black-box attacks.
The attack success rate of the ASMSM-based attack algorithms when attacking the black-box model DenseNet-161, as a function of the number of attack iterations, is shown in FIG. 5. All ASMSM-based attack algorithms achieve a good attack effect, reaching a high attack success rate, above 85%, within a small number of attack iterations.
Further, in the CIFAR10 comparison experiments, the invention also compares, in an intuitive visual manner, the adversarial samples generated before and after each attack algorithm is combined with ASMSM, as shown in FIG. 6. FIG. 6(a) is the original image; each of FIGS. 6(b)-6(g) contains an upper and a lower image, where the upper images of FIGS. 6(b)-6(g) correspond, in order, to the FGSM, BIM, PGD, MI-FGSM, DI-2-FGSM, and SI-NI-FGSM attack algorithms, and the lower images of FIGS. 6(b)-6(g) correspond, in order, to the same algorithms combined with ASMSM. It can be seen that, similar to the comparison on the MNIST dataset, the ASMSM-based attack algorithms do not greatly improve the image quality of most adversarial samples compared with the original attack algorithms, but they do reduce the perturbation and improve the image quality of individual adversarial samples.
Based on the same inventive concept, the invention also provides a black-box model training method based on the automatic surrogate model selection method for black-box attacks of any one or more of the above examples, which specifically comprises the following steps (a sketch of this training loop follows the steps):
S01: selecting a surrogate model from among the neural network models according to the original sample attribute information, and/or updating the currently used surrogate model according to the attack feedback information;
S02: attacking the surrogate model with an attack algorithm to generate adversarial samples;
S03: training the black-box model on the adversarial samples, so that the black-box model learns their characteristics and classifies them correctly, thereby obtaining a black-box model with strong defense. Learning the characteristics of the adversarial samples means the black-box model learns the perturbation features that distinguish adversarial samples from original samples, so that its classification results are corrected, accurate classification is achieved, and the security of the neural network model is improved.
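A minimal sketch of one S02/S03 training step under the following assumptions: a PyTorch-style black-box model and optimizer, and generate_adversarial() standing in for any of the attack algorithms above run against the selected surrogate model; none of these names are defined by this document.

import torch
import torch.nn.functional as F

def adversarial_training_step(black_box, optimizer, surrogate, x, y):
    # S02: generate adversarial samples from the surrogate model.
    x_adv = generate_adversarial(surrogate, x, y).detach()
    # S03: train the black-box model to classify them with their true labels.
    optimizer.zero_grad()
    loss = F.cross_entropy(black_box(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()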
Further, the invention also provides a storage medium, based on the same inventive concept as any one or a combination of the above examples, on which computer instructions are stored; when executed, the computer instructions perform the steps of the automatic surrogate model selection method for black-box attacks formed by any one or a combination of the above examples.
Based on this understanding, the technical solution of this embodiment, or part of it, may essentially be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Further, the invention also provides a terminal, based on the same inventive concept as any one or a combination of the above examples, comprising a memory and a processor, the memory storing computer instructions executable on the processor; when running the computer instructions, the processor performs the steps of the automatic surrogate model selection method for black-box attacks formed by any one or a combination of the above examples. The processor may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the invention.
Each functional unit in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above detailed description describes the invention in detail, but the invention should not be construed as being limited to this description; it will be apparent to those skilled in the art that various modifications and substitutions can be made without departing from the spirit of the invention.