Music neural network model pre-training method, electronic device and storage medium
1. A music neural network model pre-training method is characterized by comprising the following steps:
acquiring an original audio feature vector, performing mask processing on a feature sub-vector to be masked in the original audio feature vector to obtain a mask feature sub-vector, and replacing the feature sub-vector to be masked in the original audio feature vector with the mask feature sub-vector;
inputting the mask feature vector into a neural network to be trained to predict a predicted audio feature sub-vector corresponding to the mask feature sub-vector;
discretizing the original audio features in the feature sub-vector to be masked to obtain a discretized feature sub-vector to be masked, and splicing the discretized feature sub-vector to be masked, as a positive sample, with a plurality of negative samples to obtain a spliced audio feature vector;
and constructing a loss function based on the predicted audio feature sub-vector and the spliced audio feature vector, and adjusting parameters in the neural network to be trained based on the current value of the loss function until the loss function converges, to obtain a pre-trained music neural network model.
2. The music neural network model pre-training method of claim 1, wherein masking the feature sub-vectors to be masked in the original audio feature vectors to obtain masked feature sub-vectors comprises:
determining a mask feature quantity based on a preset probability distribution and a preset length, and determining the feature sub-vector to be masked in the original audio feature vector, wherein the number of original audio features in the feature sub-vector to be masked equals the mask feature quantity;
and performing mask processing on the feature sub-vector to be masked to obtain the mask feature sub-vector.
3. The music neural network model pre-training method of claim 1, wherein discretizing the original audio features in the feature sub-vectors to be masked to obtain discretized feature sub-vectors to be masked comprises:
generating a discretization vector table of a target dimension based on a preset classification rule; wherein the target dimension is a dimension of an original audio feature in the feature sub-vector to be masked;
determining a category corresponding to each original audio feature in the feature sub-vectors to be masked based on the preset classification rule to generate category vectors corresponding to the feature sub-vectors to be masked;
and determining the product of the category vector and the discretization vector table as the discretized feature sub-vector to be masked.
4. The music neural network model pre-training method according to claim 3, wherein the determining, based on the preset classification rule, a category corresponding to each original audio feature in the feature sub-vectors to be masked to generate category vectors corresponding to the feature sub-vectors to be masked includes:
and determining, by using a fully connected layer and based on the preset classification rule, the category corresponding to each original audio feature in the feature sub-vector to be masked, so as to generate one-hot vectors corresponding to the feature sub-vector to be masked.
5. The music neural network model pre-training method of claim 4, wherein, after determining the categories of the original audio features by using the fully connected layer, the method further comprises:
calculating a category entropy based on the probabilities corresponding to the categories output by the fully connected layer;
correspondingly, the adjusting the parameters in the neural network to be trained based on the current value of the loss function includes:
adjusting the parameters in the neural network to be trained based on the current value of the loss function and the category entropy.
6. The music neural network model pre-training method according to claim 1, wherein, before splicing the discretized feature sub-vector to be masked, as a positive sample, with a plurality of negative samples to obtain the spliced audio feature vector, the method further comprises:
and selecting negative samples from the original audio features which are not masked in the original audio feature vector.
7. The music neural network model pre-training method of claim 3, wherein the generating a discretization vector table of the target dimension based on the preset classification rule comprises:
generating a plurality of groups of discretization vector tables of the target dimension based on the preset classification rule;
correspondingly, the determining the product of the category vector and the discretization vector table as the discretized feature sub-vector to be masked comprises:
determining a target discretization vector table among the plurality of groups of discretization vector tables of the target dimension, and determining the product of the category vector and the target discretization vector table as the discretized feature sub-vector to be masked;
correspondingly, before splicing the discretized feature sub-vector to be masked, as a positive sample, with the plurality of negative samples to obtain the spliced audio feature vector, the method further comprises:
and selecting the negative samples from the discretization vector tables other than the target discretization vector table.
8. The music neural network model pre-training method of claim 1, wherein the constructing a loss function based on the predicted audio feature sub-vector and the spliced audio feature vector comprises:
constructing the loss function based on a cosine distance between the predicted audio feature sub-vector and the spliced audio feature vector.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the music neural network model pre-training method of any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the music neural network model pre-training method according to any one of claims 1 to 8.
Background
For a neural network model in the music field, pre-training in the related art is performed on the original audio: the original audio is converted into a spectrum, part of the spectrum is masked, and the masked audio is then predicted by an encoder-decoder, so that the neural network model is pre-trained. The purpose of the encoder-decoder is to reconstruct the masked-out audio; however, in downstream tasks such as classification, the model usually does not require all of the information in the audio to make a prediction. Pre-training a neural network model in the above manner therefore slows down training, and the performance of the resulting neural network model is poor.
Therefore, how to improve the speed of neural network model training in downstream tasks and the performance of the trained neural network model is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a music neural network model pre-training method, an electronic device and a computer-readable storage medium, which improve the speed of neural network model training in downstream tasks and the performance of the trained neural network model.
To achieve the above object, a first aspect of the present application provides a music neural network model pre-training method, including:
acquiring an original audio feature vector, performing mask processing on a feature sub-vector to be masked in the original audio feature vector to obtain a mask feature sub-vector, and replacing the feature sub-vector to be masked in the original audio feature vector with the mask feature sub-vector;
inputting the mask feature vector into a neural network to be trained to predict a predicted audio feature sub-vector corresponding to the mask feature sub-vector;
discretizing the original audio features in the feature sub-vector to be masked to obtain a discretized feature sub-vector to be masked, and splicing the discretized feature sub-vector to be masked, as a positive sample, with a plurality of negative samples to obtain a spliced audio feature vector;
and constructing a loss function based on the predicted audio feature sub-vector and the spliced audio feature vector, and adjusting parameters in the neural network to be trained based on the current value of the loss function until the loss function converges, to obtain a pre-trained music neural network model.
To achieve the above object, a second aspect of the present application provides an electronic device comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the music neural network model pre-training method described above when executing the computer program.
To achieve the above object, a third aspect of the present application provides a computer-readable storage medium, having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the music neural network model pre-training method as described above.
According to the above scheme, the music neural network model pre-training method comprises: acquiring an original audio feature vector, performing mask processing on a feature sub-vector to be masked in the original audio feature vector to obtain a mask feature sub-vector, and replacing the feature sub-vector to be masked in the original audio feature vector with the mask feature sub-vector; inputting the mask feature vector into a neural network to be trained to predict a predicted audio feature sub-vector corresponding to the mask feature sub-vector; discretizing the original audio features in the feature sub-vector to be masked to obtain a discretized feature sub-vector to be masked, and splicing the discretized feature sub-vector to be masked, as a positive sample, with a plurality of negative samples to obtain a spliced audio feature vector; and constructing a loss function based on the predicted audio feature sub-vector and the spliced audio feature vector, and adjusting parameters in the neural network to be trained based on the current value of the loss function until the loss function converges, to obtain a pre-trained music neural network model.
For a neural network model in the music field, downstream tasks need audio features for model training. Therefore, in the pre-training stage, the audio features themselves are masked directly, and pre-training proceeds by having the neural network to be trained predict the masked features. When model training for downstream tasks is performed on the basis of the pre-trained music neural network model, the convergence of the neural network model can be accelerated, improving training speed. In addition, the original audio features are encoded through discretization, which improves the performance of the neural network model when downstream-task training is based on the pre-trained music neural network model. Therefore, the music neural network model pre-training method provided by the application improves both the speed of neural network model training in downstream tasks and the performance of the trained neural network model. The application also discloses an electronic device and a computer-readable storage medium, which achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort. The accompanying drawings are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application, and together with the description serve to explain the application without limiting it. In the drawings:
FIG. 1 is a flowchart of a music neural network model pre-training method according to an embodiment of the present application;
FIG. 2 is a flowchart of another music neural network model pre-training method according to an embodiment of the present application;
FIG. 3 is a structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application discloses a music neural network model pre-training method, which improves the speed of neural network model training in downstream tasks and the performance of a trained neural network model.
Referring to FIG. 1, which is a flowchart of a music neural network model pre-training method provided in an embodiment of the present application, the method includes:
S101: acquiring an original audio feature vector, performing mask processing on a feature sub-vector to be masked in the original audio feature vector to obtain a mask feature sub-vector, and replacing the feature sub-vector to be masked in the original audio feature vector with the mask feature sub-vector;
The purpose of this embodiment is to pre-train a neural network to be trained so as to obtain a trained music neural network model; the neural network to be trained may include BERT and the like, which is not specifically limited herein. In this step, the original audio feature vector [v_1, v_2, …, v_n]^T is first obtained. The original audio feature vector may be the output of some layers of the music neural network model to be trained; that is, the original music is input into the music neural network model to be trained, and the original audio feature vector is obtained through feature extraction by those layers. Secondly, a subset of features is selected from the original audio feature vector as the feature sub-vector to be masked, which may be expressed as [v_k, v_{k+1}, v_{k+2}, …]^T; this original sub-vector and its position are stored, and it is masked to obtain the mask feature sub-vector, which may be expressed as [m_k, m_{k+1}, m_{k+2}, …]^T. Finally, the feature sub-vector to be masked in the original audio feature vector is replaced with the mask feature sub-vector, giving [v_1, v_2, …, m_k, m_{k+1}, m_{k+2}, …, v_n]^T.
As a possible implementation, performing mask processing on the feature sub-vector to be masked in the original audio feature vector to obtain the mask feature sub-vector includes: determining a mask feature quantity based on a preset probability distribution and a preset length, and determining the feature sub-vector to be masked in the original audio feature vector, wherein the number of original audio features in the feature sub-vector to be masked equals the mask feature quantity; and performing mask processing on the feature sub-vector to be masked to obtain the mask feature sub-vector. In a specific implementation, the number of features to be masked, that is, the number of original audio features contained in the feature sub-vector to be masked, is calculated based on the preset probability distribution and the preset length. The preset probability distribution and the preset length are hyper-parameters and can be set according to the actual network conditions.
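Purely as an illustration of this step, a minimal PyTorch sketch might look as follows; the function name, the mask probability of 0.065, the span length of 10, and the use of a zero vector as the mask embedding are assumptions made for the example, not values fixed by this application:

```python
import torch

def mask_features(features: torch.Tensor, mask_prob: float = 0.065,
                  span_length: int = 10):
    # features: (n, d) original audio feature vector; mask_prob and span_length
    # stand in for the "preset probability distribution" and "preset length".
    n, _ = features.shape
    num_starts = max(1, int(n * mask_prob))     # how many span starts to draw
    starts = torch.randperm(n)[:num_starts]     # random start positions
    mask = torch.zeros(n, dtype=torch.bool)
    for s in starts.tolist():
        mask[s:min(s + span_length, n)] = True  # expand each start into a span
    masked = features.clone()
    masked[mask] = 0.0                          # mask embedding (zeros, by assumption)
    return masked, mask                         # mask marks the feature sub-vector to be masked
```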
S102: inputting the mask feature vector into a neural network to be trained to predict a predicted audio feature sub-vector corresponding to the mask feature sub-vector;
In this step, the mask feature vector is input into the neural network to be trained. The neural network to be trained determines the masked features in the mask feature vector, i.e., the mask feature sub-vector, and predicts the corresponding original feature sub-vector, i.e., the feature sub-vector to be masked. The prediction result of the neural network to be trained is the predicted audio feature sub-vector, which may be expressed as [p_k, p_{k+1}, p_{k+2}, …]^T.
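The text leaves the architecture of the neural network to be trained open (BERT-like encoders are mentioned as one option). A hypothetical stand-in using a small Transformer encoder, with illustrative layer sizes only, could be:

```python
import torch
import torch.nn as nn

# Illustrative sizes; the actual architecture is not fixed by the text.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)

def predict_masked(masked_features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # masked_features: (n, 512) mask feature vector; mask: (n,) boolean mask positions
    hidden = encoder(masked_features.unsqueeze(0)).squeeze(0)  # (n, 512)
    return hidden[mask]  # predicted audio feature sub-vector [p_k, p_{k+1}, ...]
```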
S103: discretizing the original audio features in the feature sub-vector to be masked to obtain a discretized feature sub-vector to be masked, and splicing the discretized feature sub-vector to be masked, as a positive sample, with a plurality of negative samples to obtain a spliced audio feature vector;
In this step, the original audio features in the feature sub-vector to be masked are first discretized to obtain the discretized feature sub-vector to be masked, which may be expressed as [vq_k, vq_{k+1}, vq_{k+2}, …]^T; in this way, the original audio features are encoded in a discretized manner. Secondly, a plurality of negative samples is obtained; the number of sampled negative samples is not specifically limited here and can be set according to the actual network conditions. It will be appreciated that the number of features contained in each negative sample needs to be consistent with the number of original audio features contained in the feature sub-vector to be masked. As a possible implementation, negative samples may be selected from the original audio features that are not masked in the original audio feature vector. Taking 100 as the number of sampled negative samples, each negative sample is a sequence of unmasked original audio features of the same length as the feature sub-vector to be masked.
Finally, the discretized feature sub-vector to be masked, as the positive sample, is spliced with the obtained negative samples to obtain the spliced audio feature vector, in which the first column is the positive sample and the remaining columns are the negative samples.
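As a sketch of this sampling-and-splicing step under the same assumptions (100 negatives drawn from the unmasked original audio features; function and variable names are illustrative):

```python
import torch

def splice_candidates(quantized_pos: torch.Tensor, features: torch.Tensor,
                      mask: torch.Tensor, num_negatives: int = 100) -> torch.Tensor:
    # quantized_pos: (m, d) discretized feature sub-vector to be masked (positive sample)
    unmasked = features[~mask]                   # pool of negative candidates
    m = quantized_pos.shape[0]
    idx = torch.randint(0, unmasked.shape[0], (num_negatives, m))
    negatives = unmasked[idx]                    # (100, m, d)
    # first row = positive sample, remaining rows = negative samples
    return torch.cat([quantized_pos.unsqueeze(0), negatives], dim=0)  # (101, m, d)
```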
S104: constructing a loss function based on the predicted audio feature sub-vector and the spliced audio feature vector, and adjusting parameters in the neural network to be trained based on the current value of the loss function until the loss function converges, to obtain a pre-trained music neural network model.
In this step, a loss function is first constructed based on the predicted audio feature sub-vector produced by the neural network to be trained and the spliced audio feature vector. As a possible implementation, the loss function may be constructed based on the cosine distance between the predicted audio feature and the spliced audio feature, where the cosine similarity between two vectors a and b may be expressed as cos(a, b) = (a · b) / (‖a‖ ‖b‖).
The loss function is constructed by comparing the cosine distances with a standard vector result. Since the first column of the spliced audio feature vector is the positive sample and the remaining columns are negative samples, the cosine distance between the positive sample and the predicted audio feature sub-vector is optimized towards 1, and the cosine distance between each negative sample and the predicted audio feature sub-vector is optimized towards 0; the standard vector result may therefore be expressed as [1, 0, 0, …, 0]^T.
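One concrete reading of this construction, matching the 1-for-positive / 0-for-negative targets above (comparing against the standard vector with a mean-squared error is an assumption; the text fixes only the cosine comparison and the targets):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(predicted: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
    # predicted: (m, d); candidates: (K+1, m, d) with row 0 being the positive sample
    sims = F.cosine_similarity(predicted.unsqueeze(0), candidates, dim=-1)  # (K+1, m)
    target = torch.zeros_like(sims)
    target[0] = 1.0                   # the standard vector: 1 for positive, 0 for negatives
    return F.mse_loss(sims, target)   # penalize deviation from the standard vector
```

An InfoNCE-style cross-entropy over the similarities would be an equally plausible reading of the same comparison.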
Secondly, the neural network to be trained is optimized through back-propagation based on the current value of the loss function, i.e., its parameters are adjusted until the loss function converges, yielding the pre-trained music neural network model. When a deep learning model related to music is subsequently trained, training can start from the pre-trained music neural network model.
For a neural network model in the music field, downstream tasks need audio features for model training. Therefore, in the pre-training stage, the embodiment of the application masks the audio features directly, and pre-training proceeds by having the neural network to be trained predict the masked features. When model training for downstream tasks is performed on the basis of the pre-trained music neural network model, the convergence of the neural network model can be accelerated, improving training speed. In addition, the original audio features are encoded through discretization, which improves the performance of the neural network model when downstream-task training is based on the pre-trained music neural network model. Therefore, the music neural network model pre-training method provided by the application improves both the speed of neural network model training in downstream tasks and the performance of the trained neural network model.
The embodiment of the application discloses a music neural network model pre-training method; compared with the previous embodiment, this embodiment further explains and optimizes the technical solution. Specifically:
referring to fig. 2, a flowchart of another music neural network model pre-training method provided in the embodiment of the present application is shown in fig. 2, and includes:
S201: acquiring an original audio feature vector, performing mask processing on a feature sub-vector to be masked in the original audio feature vector to obtain a mask feature sub-vector, and replacing the feature sub-vector to be masked in the original audio feature vector with the mask feature sub-vector;
S202: inputting the mask feature vector into a neural network to be trained to predict a predicted audio feature sub-vector corresponding to the mask feature sub-vector;
S203: generating a discretization vector table of a target dimension based on a preset classification rule, wherein the target dimension is the dimension of an original audio feature in the feature sub-vector to be masked;
In this step, a discretization vector table is generated based on the preset classification rule; the preset classification rule at least comprises the number of categories and the specific representation of each category. The discretization vector table may be expressed as [vq_1, vq_2, …, vq_w]^T, where w is the number of categories and the dimension of each entry vq_i is consistent with the dimension of an original audio feature v_n. For example, if the number of categories is 10 and the dimension of the original audio feature is 512, the discretization vector table is 10 × 512.
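Using the 10 × 512 example, the discretization vector table can be held as a matrix with one entry per category. Whether the table is learnable or fixed is not stated, so a learnable parameter is assumed here:

```python
import torch
import torch.nn as nn

num_classes, feat_dim = 10, 512   # w and the feature dimension from the example
# the discretization vector table [vq_1, ..., vq_w]^T, 10 x 512
codebook = nn.Parameter(torch.randn(num_classes, feat_dim))
```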
S204: determining a category corresponding to each original audio feature in the feature sub-vectors to be masked based on the preset classification rule to generate category vectors corresponding to the feature sub-vectors to be masked;
in this step, a category corresponding to each original audio feature in the feature sub-vectors to be masked is determined based on a preset classification rule, and a category vector is generated. For example, if the size of the feature sub-vector to be masked is 3 × 512 and the number of categories is 10, the size of the generated category vector is 3 × 10.
As a possible implementation, this step may include: determining, by using a fully connected layer and based on the preset classification rule, the category corresponding to each original audio feature in the feature sub-vector to be masked, so as to generate one-hot vectors corresponding to the feature sub-vector to be masked. In a specific implementation, a fully connected layer can be used to predict the category corresponding to each original audio feature in the feature sub-vector to be masked; the prediction is then converted into a one-hot vector using hard Gumbel-Softmax, which may be expressed as [vo_k, vo_{k+1}, vo_{k+2}, …]^T.
S205: determining the product of the category vector and the discretization vector table as the discretized feature sub-vector to be masked;
In this step, the product of the category vector and the discretization vector table is determined as the discretized feature sub-vector to be masked. The size of the discretized feature sub-vector to be masked coincides with the size of the feature sub-vector to be masked; for the example given in the above step, it is 3 × 512.
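Putting S204 and S205 together, a minimal sketch (restating the codebook from the sketch above; sizes follow the text's 3 × 512 / 10-category example, and the use of `F.gumbel_softmax` matches the hard Gumbel-Softmax mentioned earlier):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, feat_dim = 10, 512
codebook = nn.Parameter(torch.randn(num_classes, feat_dim))  # as in the sketch above
classifier = nn.Linear(feat_dim, num_classes)                # the fully connected layer

def discretize(to_mask: torch.Tensor) -> torch.Tensor:
    # to_mask: (3, 512) feature sub-vector to be masked, per the text's example
    logits = classifier(to_mask)                   # (3, 10) category scores
    one_hot = F.gumbel_softmax(logits, hard=True)  # hard Gumbel-Softmax -> one-hot (3, 10)
    return one_hot @ codebook                      # (3, 10) x (10, 512) -> (3, 512)
```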
S206: splicing the discretized feature sub-vector to be masked as a positive sample with a plurality of negative samples to obtain a spliced audio feature vector;
In this step, the discretized feature sub-vector to be masked is taken as the positive sample and spliced with a plurality of negative samples to obtain the spliced audio feature vector. As one possible implementation, negative samples may be selected from the original audio features that are not masked in the original audio feature vector. As another possible implementation, multiple groups of discretization vector tables of the target dimension may be generated based on the preset classification rule; one of them, the target discretization vector table, is multiplied by the category vector to obtain the discretized feature sub-vector to be masked, and negative samples are selected from the remaining discretization vector tables, as sketched below.
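For the second implementation, a sketch under the same assumptions (the number of tables and all names are illustrative):

```python
import torch
import torch.nn as nn

num_tables, num_classes, feat_dim = 4, 10, 512  # illustrative sizes
codebooks = nn.Parameter(torch.randn(num_tables, num_classes, feat_dim))

def negatives_from_other_tables(target: int, m: int, num_negatives: int = 100) -> torch.Tensor:
    # draw negatives only from the tables other than the target discretization vector table
    others = torch.cat([codebooks[i] for i in range(num_tables) if i != target], dim=0)
    idx = torch.randint(0, others.shape[0], (num_negatives, m))
    return others[idx]  # (num_negatives, m, feat_dim)
```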
S207: constructing a loss function based on the predicted audio feature sub-vector and the spliced audio feature vector, and adjusting parameters in the neural network to be trained based on the current value of the loss function until the loss function converges, to obtain a pre-trained music neural network model.
As a preferred embodiment, after determining the categories of the original audio features by using the fully connected layer, the method further includes: calculating a category entropy based on the probabilities corresponding to the categories output by the fully connected layer. Correspondingly, adjusting the parameters in the neural network to be trained based on the current value of the loss function includes: adjusting the parameters in the neural network to be trained based on the current value of the loss function and the category entropy. In a specific implementation, in order to prevent the neural network from producing overly uniform outputs during optimization, a category entropy term is added so that the categories output by the fully connected layer are as diverse as possible.
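A hedged sketch of such a diversity term; averaging the class distribution over the masked positions and maximizing its entropy follows common practice, and the weighting is an assumed hyper-parameter rather than something the text specifies:

```python
import torch
import torch.nn.functional as F

def class_entropy(logits: torch.Tensor) -> torch.Tensor:
    # logits: (m, num_classes) outputs of the fully connected layer
    probs = F.softmax(logits, dim=-1).mean(dim=0)        # average class distribution
    return -(probs * probs.clamp_min(1e-8).log()).sum()  # its entropy

# Hypothetical combination; entropy_weight is an assumed hyper-parameter:
# total_loss = contrastive_loss(...) - entropy_weight * class_entropy(logits)
```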
Therefore, in this embodiment, the original audio features in the feature sub-vector to be masked are first classified and then discretized by multiplication with the discretization vector table, so that each category always maps to the same fixed feature vector. This improves the performance of the neural network model when model training for downstream tasks is performed on the basis of the pre-trained music neural network model.
The following introduces a music neural network model pre-training device provided in an embodiment of the present application; the device described below and the music neural network model pre-training method described above may be referred to in correspondence with each other. The device specifically includes:
the mask module is used for acquiring an original audio feature vector, performing mask processing on a to-be-masked feature sub-vector in the original audio feature vector to obtain a mask feature sub-vector, and replacing the to-be-masked feature sub-vector in the original audio feature vector with the mask feature sub-vector;
the prediction module is used for inputting the mask feature vector into a neural network to be trained so as to predict a prediction audio feature sub-vector corresponding to the mask feature sub-vector;
the discretization processing module is used for discretizing the original audio features in the sub-vectors of the features to be masked to obtain discretized sub-vectors of the features to be masked, and splicing the discretized sub-vectors of the features to be masked as positive samples with a plurality of negative samples to obtain spliced audio feature vectors;
and the adjusting module is used for constructing a loss function based on the predicted audio characteristic sub-vector and the spliced audio characteristic vector, and adjusting parameters in the neural network to be trained based on the current value of the loss function until the loss function is converged to obtain a music neural network model which is trained in advance.
For a neural network model in the music field, downstream tasks need audio features for model training. Therefore, in the pre-training stage, the embodiment of the application masks the audio features directly, and pre-training proceeds by having the neural network to be trained predict the masked features. When model training for downstream tasks is performed on the basis of the pre-trained music neural network model, the convergence of the neural network model can be accelerated, improving training speed. In addition, the original audio features are encoded through discretization, which improves the performance of the neural network model when downstream-task training is based on the pre-trained music neural network model. Therefore, the music neural network model pre-training device provided by the application improves both the speed of neural network model training in downstream tasks and the performance of the trained neural network model.
On the basis of the above embodiment, as a preferred implementation, the mask module includes:
the acquisition unit is used for acquiring an original audio feature vector;
the determining unit is used for determining the mask feature quantity based on a preset probability distribution and a preset length, and determining the feature sub-vector to be masked in the original audio feature vector, wherein the number of original audio features in the feature sub-vector to be masked equals the mask feature quantity;
the mask unit is used for performing mask processing on the feature sub-vector to be masked to obtain the mask feature sub-vector;
and the replacing unit is used for replacing the feature sub-vector to be masked in the original audio feature vector with the mask feature sub-vector.
On the basis of the foregoing embodiment, as a preferred implementation, the discretization processing module includes:
the first generation unit is used for generating a discretization vector table of the target dimension based on a preset classification rule; wherein the target dimension is a dimension of an original audio feature in the feature sub-vector to be masked;
the second generating unit is used for determining a category corresponding to each original audio feature in the to-be-masked feature sub-vectors based on the preset classification rule so as to generate category vectors corresponding to the to-be-masked feature sub-vectors;
the discretization processing unit is used for determining the product of the category vector and the discretization vector table as the discretized feature sub-vector to be masked;
and the splicing unit is used for splicing the discretized feature sub-vector to be masked serving as a positive sample with a plurality of negative samples to obtain a spliced audio feature vector.
On the basis of the foregoing embodiment, as a preferred implementation manner, the second generating unit is specifically a unit that determines, by using a fully connected layer and based on the preset classification rule, the category corresponding to each original audio feature in the feature sub-vector to be masked, so as to generate one-hot vectors corresponding to the feature sub-vector to be masked.
On the basis of the foregoing embodiment, as a preferred implementation, the discretization processing module further includes:
the computing unit is used for computing the category entropy based on the probabilities corresponding to the categories output by the fully connected layer;
correspondingly, the adjusting module is specifically a module that constructs a loss function based on the predicted audio feature sub-vector and the spliced audio feature vector, and adjusts the parameters in the neural network to be trained based on the current value of the loss function and the category entropy to obtain a pre-trained music neural network model.
On the basis of the foregoing embodiment, as a preferred implementation, the discretization processing module further includes:
a first selecting unit, configured to select negative samples from the unmasked original audio features in the original audio feature vector.
On the basis of the foregoing embodiment, as a preferred implementation manner, the first generating unit is specifically a unit that generates a plurality of groups of discretization vector tables of the target dimension based on the preset classification rule;
correspondingly, the discretization processing unit is specifically a unit that determines a target discretization vector table among the plurality of groups of discretization vector tables of the target dimension, and determines the product of the category vector and the target discretization vector table as the discretized feature sub-vector to be masked;
correspondingly, the discretization processing module further comprises:
and the second selecting unit is used for selecting negative samples from the discretization vector tables other than the target discretization vector table.
On the basis of the foregoing embodiment, as a preferred implementation manner, the adjusting module is specifically a module that constructs a loss function based on a cosine distance between the predicted audio feature and the spliced audio feature, and adjusts the parameters in the neural network to be trained based on the current value of the loss function until the loss function converges, to obtain a pre-trained music neural network model.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present application further provides an electronic device. Referring to FIG. 3, which is a structural diagram of an electronic device 30 provided in an embodiment of the present application, the electronic device 30 may include a processor 31 and a memory 32.
The processor 31 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 31 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 31 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 31 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 31 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 32 may include one or more computer-readable storage media, which may be non-transitory. Memory 32 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 32 is at least used for storing the following computer program 321, wherein after being loaded and executed by the processor 31, the computer program can implement relevant steps in the music neural network model pre-training method executed by the electronic device side disclosed in any of the foregoing embodiments. In addition, the resources stored by the memory 32 may also include an operating system 322, data 323, and the like, which may be stored in a transient or persistent manner. Operating system 322 may include Windows, Unix, Linux, etc.
In some embodiments, the electronic device 30 may further include a display 33, an input/output interface 34, a communication interface 35, a sensor 36, a power source 37, and a communication bus 38.
Of course, the structure shown in FIG. 3 does not constitute a limitation on the electronic device in the embodiment of the present application; in practical applications, the electronic device may include more or fewer components than those shown in FIG. 3, or some components may be combined.
In another exemplary embodiment, a computer readable storage medium including program instructions is further provided, which when executed by a processor, implement the steps of the music neural network model pre-training method performed by the electronic device according to any of the above embodiments.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.