Method, system, device and readable storage medium for predicting the future value-retention rate of an automobile

Document No.: 8782. Publication date: 2021-09-17.

1. A method for predicting the future value-retention rate of an automobile, characterized by comprising the following steps:

acquiring automobile text data to be predicted, and performing word segmentation on the automobile text data to be predicted to obtain an automobile text sequence to be predicted;

performing word vector mapping on each word in the automobile text sequence to be predicted by adopting a word vector model to obtain a word vector matrix of the text sequence;

taking the word vector matrix of the text sequence as the input of an encoder-decoder model, and acquiring the output of the encoder-decoder model to obtain a prediction of the future value-retention rate of the automobile to be predicted; wherein the encoder-decoder model is an encoder-decoder model based on a gated recurrent unit (GRU) variant.

2. The method for predicting the future value-retention rate of an automobile according to claim 1, characterized in that the jieba tool is used to perform word segmentation on the automobile text data to be predicted to obtain the automobile text sequence to be predicted.

3. The method for predicting the future value-retention rate of an automobile according to claim 1, characterized in that a word2vec word vector model is used to perform word vector mapping on each word in the automobile text sequence to be predicted to obtain the word vector matrix of the text sequence.

4. The method according to claim 1, characterized in that the encoder-decoder model based on the GRU variant comprises a double-layer encoder and a double-layer decoder, with a soft attention mechanism added between the double-layer encoder and the double-layer decoder;

the double-layer encoder and the double-layer decoder both adopt the GRU variant; the GRU variant is constructed on the basis of the gated recurrent unit by removing the reset gate, retaining the update gate, and modifying the filtered data flow from the update gate to the candidate hidden state; the initial hidden state of the double-layer encoder is a zero vector, and the initial hidden state of the double-layer decoder is the hidden state of the double-layer encoder at its last time step; the initial input to the double-layer decoder is 1.0.

5. The method according to claim 4, characterized in that the forward propagation formula of the GRU variant is as follows:

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

wherein $z_t$ is the output of the update gate of the GRU variant at time $t$, $\sigma(\cdot)$ is the sigmoid function, $W_z$ and $W_h$ are weight parameter matrices, $x_t$ is the word vector input at time $t$, $h_{t-1}$ is the hidden state at time $t-1$, $\tilde{h}_t$ is the candidate hidden state at time $t$, $h_t$ is the hidden state at time $t$, and $\tanh(\cdot)$ is the hyperbolic tangent function.

6. The method according to claim 4, characterized in that the attention of the soft attention mechanism is calculated as follows:

wherein $a_{ij}$ is the attention distribution coefficient of the decoder at time $i$ over the encoder at time $j$, obtained with a similarity function; $h_{i-1}$ is the hidden state of the double-layer decoder at time $i-1$; $h_j$ is the hidden state of the double-layer encoder at time $j$; $y_i$ is the output of the double-layer encoder at time $i$; $C_i$ is the attention of the double-layer encoder at time $i$; $l_x$ is the length of the input text sequence of the double-layer encoder; and $h_k$ is the hidden state of the double-layer encoder at time $k$.

7. The method according to claim 4, characterized in that the encoder-decoder model based on the GRU variant is constructed and trained as follows:

acquiring a plurality of known automobile text data, and analyzing the known automobile text data to obtain a known automobile text sequence; performing word vector mapping on each word in the known automobile text sequence by adopting a word vector model to obtain a word vector matrix of the known automobile text sequence;

feeding the word vector matrix of the known automobile text sequence into the encoder-decoder model based on the GRU variant for training; the training process uses back propagation;

constructing a loss function according to the sample prediction output value of the known automobile text data and the sample real output value of the known automobile text data, and calculating a loss function value;

judging whether the loss function value converges to a preset value;

if the loss function value has not converged to the preset value, updating the model parameters of the encoder-decoder model based on the GRU variant until the loss function value converges to the preset value, and saving the corresponding model parameters to obtain the encoder-decoder model;

wherein the expression of the loss function is:

wherein $L$ is the loss function, $m$ is the number of samples of known automobile text data, $y_i$ is the true output value of a sample of known automobile text data, $\hat{y}_i$ is the predicted output value of a sample of known automobile text data, and $i$ is the sample index of the known automobile text data.

8. A system for predicting the future value-retention rate of an automobile, characterized by comprising a text sequence module, a word vector module and a result output module;

the text sequence module is used for acquiring automobile text data to be predicted, and performing word segmentation on the automobile text data to be predicted to obtain an automobile text sequence to be predicted;

the word vector module is used for carrying out word vector mapping on each word in the automobile text sequence to be predicted by adopting a word vector model to obtain a word vector matrix of the text sequence;

the result output module is used for taking the word vector matrix of the text sequence as the input of the encoder-decoder model, and acquiring the output of the encoder-decoder model to obtain a prediction of the future value-retention rate of the automobile to be predicted; wherein the encoder-decoder model is an encoder-decoder model based on a gated recurrent unit (GRU) variant.

9. A device for predicting the future value-retention rate of an automobile, comprising a memory, a processor, and executable instructions stored in the memory and executable on the processor; the processor, when executing the executable instructions, implements the method of any one of claims 1-7.

10. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the method of any one of claims 1-7.

Background

At present, the automobile market consists mainly of the new-car market and the used-car market, and the trading volume of the new-car market is twice that of the used-car market. The rise of the new-car market drives the development of the used-car market and gives it more vehicle sources. A customer's purchase decision is influenced by factors such as the performance, appearance, interior trim and value-retention rate of a new vehicle, among which the future value-retention rate of the new vehicle is the customer's primary consideration.

At present, the future value-retention rate of a new vehicle is usually obtained by statistical analysis of historical data. For example, to calculate the value-retention rate of a certain brand of new vehicle over the next 5 years, the used-car transaction data of that brand for vehicle ages of 1-5 years are grouped by vehicle age, the transaction prices in each group are averaged, and the average used-car transaction price for each vehicle age is used as the value-retention rate of that brand of new vehicle for the corresponding future year (1-5). Some researchers predict the value-retention rate of a vehicle with a cluster-then-regress algorithm: clustering first finds other vehicles of the same type, and a regression model is then trained on the data of those vehicles to predict the value-retention rate; this method can only predict the current value-retention rate of a particular used vehicle. Meanwhile, as big data technology grows in popularity, many companies have begun to use it to analyze the state of the domestic automobile market and to publish the value-retention rate of new vehicles for the coming years, broken down by vehicle series, model and the like.

Investigation shows that existing methods for predicting the future value-retention rate of a new vehicle have the following three defects. First, the prediction requires collecting and processing a large amount of historical transaction data, which consumes manpower and has a long data collection cycle. Second, a value-retention rate prediction model based on structured data must convert discrete data to numerical form; the conversion can distort the meaning the discrete data express, and some discrete features take so many values that they are difficult to convert, so that continuous data make up a high proportion of the structured data and discrete data a low proportion. Finally, the future value-retention rate of a new vehicle is essentially a time series: the rate for the next 1 year influences the rate for the next 2 years, and so on; a conventional regression prediction model is therefore not suitable for predicting the future value-retention rate of a new vehicle.

Disclosure of Invention

In view of the technical problems in the prior art, the invention provides a method, a system, a device and a readable storage medium for predicting the future value-retention rate of an automobile, so as to solve the technical problems that, when the future value-retention rate of a new vehicle is predicted in the prior art, historical data are insufficient, prediction can only be based on structured data, discrete data cannot be fully exploited, and the prediction result cannot reflect the temporal ordering among the future value-retention rates.

In order to achieve the purpose, the invention adopts the technical scheme that:

the invention provides a method for predicting future value-preserving rate of an automobile, which comprises the following steps:

acquiring automobile text data to be predicted, and performing word segmentation on the automobile text data to be predicted to obtain an automobile text sequence to be predicted;

performing word vector mapping on each word in the automobile text sequence to be predicted by adopting a word vector model to obtain a word vector matrix of the text sequence;

taking the word vector matrix of the text sequence as the input of an encoder-decoder model, and acquiring the output of the encoder-decoder model to obtain a prediction of the future value-retention rate of the automobile to be predicted; wherein the encoder-decoder model is an encoder-decoder model based on a gated recurrent unit (GRU) variant.

Further, a jieba tool is adopted to perform word segmentation on the automobile text data to be predicted, and an automobile text sequence to be predicted is obtained.

Further, word vector mapping is carried out on each word in the automobile text sequence to be predicted by adopting a word2vec word vector model, and a word vector matrix of the text sequence is obtained.

Further, the encoder-decoder model based on the GRU variant comprises a double-layer encoder and a double-layer decoder, with a soft attention mechanism added between the double-layer encoder and the double-layer decoder;

the double-layer encoder and the double-layer decoder both adopt the GRU variant; the GRU variant is constructed on the basis of the gated recurrent unit by removing the reset gate, retaining the update gate, and modifying the filtered data flow from the update gate to the candidate hidden state; the initial hidden state of the double-layer encoder is a zero vector, and the initial hidden state of the double-layer decoder is the hidden state of the double-layer encoder at its last time step; the initial input to the double-layer decoder is 1.0.

Further, the forward propagation formula of the GRU variant is as follows:

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

wherein $z_t$ is the output of the update gate of the GRU variant at time $t$, $\sigma(\cdot)$ is the sigmoid function, $W_z$ and $W_h$ are weight parameter matrices, $x_t$ is the word vector input at time $t$, $h_{t-1}$ is the hidden state at time $t-1$, $\tilde{h}_t$ is the candidate hidden state at time $t$, $h_t$ is the hidden state at time $t$, and $\tanh(\cdot)$ is the hyperbolic tangent function.

Further, the attention calculation formula of the soft attention mechanism is as follows:

wherein $a_{ij}$ is the attention distribution coefficient of the decoder at time $i$ over the encoder at time $j$, obtained with a similarity function; $h_{i-1}$ is the hidden state of the double-layer decoder at time $i-1$; $h_j$ is the hidden state of the double-layer encoder at time $j$; $y_i$ is the output of the double-layer encoder at time $i$; $C_i$ is the attention of the double-layer encoder at time $i$; $l_x$ is the length of the input text sequence of the double-layer encoder; and $h_k$ is the hidden state of the double-layer encoder at time $k$.

Further, the encoder-decoder model based on the GRU variant is constructed and trained as follows:

acquiring a plurality of known automobile text data, and analyzing the known automobile text data to obtain a known automobile text sequence; performing word vector mapping on each word in the known automobile text sequence by adopting a word vector model to obtain a word vector matrix of the known automobile text sequence;

feeding the word vector matrix of the known automobile text sequence into the encoder-decoder model based on the GRU variant for training; the training process uses back propagation;

constructing a loss function according to the sample prediction output value of the known automobile text data and the sample real output value of the known automobile text data, and calculating a loss function value;

judging whether the loss function value converges to a preset value;

if the loss function value has not converged to the preset value, updating the model parameters of the encoder-decoder model based on the GRU variant until the loss function value converges to the preset value, and saving the corresponding model parameters to obtain the encoder-decoder model;

wherein the expression of the loss function is:

wherein $L$ is the loss function, $m$ is the number of samples of known automobile text data, $y_i$ is the true output value of a sample of known automobile text data, $\hat{y}_i$ is the predicted output value of a sample of known automobile text data, and $i$ is the sample index of the known automobile text data.

The invention also provides a system for predicting the future value-retention rate of an automobile, which comprises a text sequence module, a word vector module and a result output module;

the text sequence module is used for acquiring automobile text data to be predicted, and performing word segmentation on the automobile text data to be predicted to obtain an automobile text sequence to be predicted;

the word vector module is used for carrying out word vector mapping on each word in the automobile text sequence to be predicted by adopting a word vector model to obtain a word vector matrix of the text sequence;

the result output module is used for taking the word vector matrix of the text sequence as the input of the encoder-decoder model, and acquiring the output of the encoder-decoder model to obtain a prediction of the future value-retention rate of the automobile to be predicted; wherein the encoder-decoder model is an encoder-decoder model based on a gated recurrent unit (GRU) variant.

The invention also provides a device for predicting the future value-retention rate of an automobile, which comprises a memory, a processor and executable instructions stored in the memory and executable on the processor; the processor, when executing the executable instructions, implements the above method for predicting the future value-retention rate of an automobile.

The invention also provides a computer-readable storage medium, which stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the method for predicting the future value-preserving rate of the automobile is realized.

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a method and a system for predicting the future value-retention rate of an automobile, which use the text data of the automobile to be predicted to predict the future value-retention rate of a new vehicle. Collecting and processing such text data is simpler than collecting and processing structured data; it also lets discrete data play a fuller role and effectively reduces the cost of manually collecting and processing structured historical data. With an encoder-decoder model based on the GRU variant, the input length of the model adapts to the length of the text, giving good flexibility; during decoding, the recurrent neural network's grasp of sequential order is fully exploited, so the temporal ordering among the future value-retention rates of the new vehicle is fully reflected.

Furthermore, the double-layer encoder and the double-layer decoder both adopt the GRU variant, which, on the basis of the gated recurrent unit, removes the reset gate, retains the update gate, and modifies the filtered data flow from the update gate to the candidate hidden state. This effectively improves the accuracy and training speed of the gated recurrent unit: the training speed per sample is improved by at least 22.8%, and the prediction accuracy is higher than that of an ordinary gated recurrent unit. The initial input of the double-layer decoder is set to 1.0, which matches the fact that the value-retention rate of a new vehicle at the initial moment is 100%.

Drawings

FIG. 1 is a flowchart of the method for predicting the future value-retention rate of an automobile according to an embodiment;

FIG. 2 is a structural diagram of the encoder-decoder model in an embodiment.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects of the present invention more apparent, the following embodiments further describe the present invention in detail. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides a method for predicting future value-preserving rate of an automobile, which comprises the following steps:

Step 1, obtaining the automobile text data to be predicted, and performing word segmentation on it with the jieba tool to obtain the automobile text sequence to be predicted; the automobile text data to be predicted comprise text data on the brand, vehicle series and configuration parameters of the automobile to be predicted, and these text data are collected from used-car websites.

Step 2, performing word vector mapping on each word in the automobile text sequence to be predicted with a word2vec word vector model to obtain the word vector matrix of the text sequence; in the word vector mapping process, each word in the automobile text sequence to be predicted is mapped to its corresponding word vector, and the word vectors are arranged from top to bottom in the order of the words in the sequence to obtain the word vector matrix of the text sequence; in the invention, the word2vec word vector model is a CBOW model based on negative sampling.
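The row-by-row arrangement described in Step 2 can be illustrated with a minimal sketch. The embedding table, its values, the token names and the zero vector for out-of-vocabulary words are all hypothetical stand-ins for a trained word2vec model, and the tokens are assumed to have been segmented already.

```python
# Toy embedding table standing in for a trained word2vec model (hypothetical values).
EMBEDDINGS = {
    "brand_a": [0.1, 0.3, -0.2],
    "sedan": [0.0, 0.5, 0.4],
    "2.0t": [-0.3, 0.2, 0.1],
}
# ASSUMPTION: words missing from the table map to a zero vector.
UNKNOWN = [0.0, 0.0, 0.0]


def word_vector_matrix(tokens, table=EMBEDDINGS, unknown=UNKNOWN):
    """Stack one word vector per token, top to bottom, in sequence order."""
    return [list(table.get(tok, unknown)) for tok in tokens]


matrix = word_vector_matrix(["brand_a", "sedan", "2.0t", "sunroof"])
for row in matrix:
    print(row)
```

The matrix has one row per word, so its height follows the length of the text, which is what lets the encoder accept inputs of varying length.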

Step 3, taking the word vector matrix of the text sequence as the input of the encoder-decoder model, and acquiring the output of the encoder-decoder model to obtain a prediction of the future value-retention rate of the automobile to be predicted; wherein the encoder-decoder model is an encoder-decoder model based on a gated recurrent unit (GRU) variant.

In the invention, the encoder-decoder model based on the GRU variant comprises a double-layer encoder and a double-layer decoder, with a soft attention mechanism added between them. The base model of both the double-layer encoder and the double-layer decoder is the GRU variant, which is constructed on the basis of the gated recurrent unit by removing the reset gate, retaining the update gate, and modifying the filtered data flow from the update gate to the candidate hidden state. With this structure the input length of the model adapts to the length of the text, giving good flexibility; during decoding, the recurrent neural network's grasp of sequential order is fully exploited, so the temporal ordering among the future value-retention rates of the new vehicle is fully reflected.

The forward propagation formula of the GRU variant is as follows:

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

wherein $z_t$ is the output of the update gate of the GRU variant at time $t$, $\sigma(\cdot)$ is the sigmoid function, $W_z$ and $W_h$ are weight parameter matrices, $x_t$ is the word vector input at time $t$, $h_{t-1}$ is the hidden state at time $t-1$, $\tilde{h}_t$ is the candidate hidden state at time $t$, $h_t$ is the hidden state at time $t$, and $\tanh(\cdot)$ is the hyperbolic tangent function.
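One possible reading of a reset-gate-free cell is sketched here for illustration. Only the update-gate formula is taken from the text. The candidate-state and hidden-state updates are assumptions: the sketch supposes that the update gate filters the previous hidden state on its way into the candidate state, and that the new hidden state is the usual GRU interpolation between the previous state and the candidate. The function names and the toy weights are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_variant_step(W_z, W_h, h_prev, x_t):
    """One forward step of a GRU cell without a reset gate.

    Only the update-gate formula comes from the text; the candidate and
    hidden-state updates below are ASSUMED.
    """
    # Update gate: z_t = sigmoid(W_z . [h_{t-1}, x_t])  (from the text)
    z_t = [sigmoid(v) for v in matvec(W_z, h_prev + x_t)]
    # ASSUMPTION: the update gate filters h_{t-1} before it enters the candidate.
    gated = [z * h for z, h in zip(z_t, h_prev)]
    h_cand = [math.tanh(v) for v in matvec(W_h, gated + x_t)]
    # ASSUMPTION: standard GRU interpolation between old state and candidate.
    return [(1.0 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]


hidden, inputs = 2, 3
# All-zero weights make the arithmetic easy to follow by hand.
W_zero = [[0.0] * (hidden + inputs) for _ in range(hidden)]
h_next = gru_variant_step(W_zero, W_zero, [0.4, -0.8], [1.0, 2.0, 3.0])
print(h_next)
```

With all-zero weights the update gate is 0.5 and the candidate is 0, so the new hidden state is half of the previous one. Dropping the reset gate removes one weight matrix and one gate evaluation per step, which is consistent with the reported gain in training speed.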

The attention calculation formula of the soft attention mechanism is as follows:

wherein $a_{ij}$ is the attention distribution coefficient of the decoder at time $i$ over the encoder at time $j$, obtained with a similarity function; $h_{i-1}$ is the hidden state of the double-layer decoder at time $i-1$; $h_j$ is the hidden state of the double-layer encoder at time $j$; $y_i$ is the output of the double-layer encoder at time $i$; $C_i$ is the attention of the double-layer encoder at time $i$; $l_x$ is the length of the input text sequence of the double-layer encoder; and $h_k$ is the hidden state of the double-layer encoder at time $k$.
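A common form of soft attention that is consistent with the symbols defined above is sketched here. It is an illustration under assumptions rather than the formula of the invention: the similarity function is taken to be a dot product, the coefficients are normalized with a softmax over the encoder time steps, and the attention vector is the weighted sum of the encoder hidden states. All names in the code are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_attention(dec_hidden, enc_hiddens, score=dot):
    """Return (weights, context) for one decoder step.

    dec_hidden  : decoder hidden state h_{i-1}
    enc_hiddens : encoder hidden states h_1 .. h_lx
    score       : similarity function (dot product here is an ASSUMPTION)
    """
    scores = [score(dec_hidden, h_j) for h_j in enc_hiddens]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # a_ij, sums to 1 over j
    dim = len(enc_hiddens[0])
    # Context vector: weighted sum of the encoder hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, enc_hiddens)) for d in range(dim)]
    return weights, context


weights, context = soft_attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0], [0.0, 5.0]])
print(weights, context)
```

In this toy call the first two encoder states score equally against the decoder state and therefore receive equal weights, while the third state, which is orthogonal to the decoder state, receives a smaller weight.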

The encoder-decoder model based on the GRU variant is constructed and trained as follows:

constructing the encoder-decoder model based on the GRU variant, and adding an attention mechanism to the encoder-decoder model; the GRU variant is a structure derived from the gated recurrent unit: on the basis of the gated recurrent unit, the reset gate is removed, the update gate is retained, and the filtered data flow from the update gate to the candidate hidden state is modified.

Acquiring a plurality of known automobile text data, and analyzing the known automobile text data to obtain a known automobile text sequence; performing word vector mapping on each word in the known automobile text sequence by adopting a word vector model to obtain a word vector matrix of the known automobile text sequence;

feeding the word vector matrix of the known automobile text sequence into the encoder-decoder model based on the GRU variant for training; the training process uses back propagation. The known automobile text data are obtained by collecting, from used-car websites, text data describing the brand, vehicle series and configuration parameters of used vehicles. In the invention, the known automobile text data are preprocessed by grouping first by vehicle and then by vehicle age, which yields the value-retention rate of each vehicle over its first several years. The text data of each vehicle, after word segmentation and word vector mapping, serve as the input sequence of the encoder-decoder model, and the value-retention rates of each vehicle over its first several years serve as the true output values of the model; the model is trained on the processed data. Back propagation does not itself produce a predicted output value; it is a method for computing gradients. Specifically, the gradient of the loss function with respect to the weights of the last network layer is computed first, the gradient with respect to the weights of the second-to-last layer is then computed by the chain rule, and so on until the gradients with respect to the weights of all layers are obtained.
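The two-level grouping used to build the training targets can be sketched as follows. The record layout, the example values and the definition of the value-retention rate as used-car price divided by new-car price are assumptions made for illustration; the text states only that the data are grouped by vehicle and then by vehicle age.

```python
from collections import defaultdict

# Hypothetical records: (vehicle model, vehicle age in years, used price, new-car price)
records = [
    ("model_x", 1, 90.0, 100.0),
    ("model_x", 1, 80.0, 100.0),
    ("model_x", 2, 70.0, 100.0),
    ("model_y", 1, 45.0, 50.0),
]


def retention_targets(rows):
    """Group by model, then by age, and average the used/new price ratio per group.

    ASSUMPTION: value-retention rate = used price / new-car price.
    The returned list is ordered by age; gaps in the ages are not filled in.
    """
    ratios = defaultdict(lambda: defaultdict(list))
    for model, age, used_price, new_price in rows:
        ratios[model][age].append(used_price / new_price)
    return {
        model: [sum(by_age[a]) / len(by_age[a]) for a in sorted(by_age)]
        for model, by_age in ratios.items()
    }


targets = retention_targets(records)
print(targets)
```

Each resulting list is the target output sequence for one vehicle, paired during training with the word vector matrix built from that vehicle's text.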

Constructing a loss function according to the sample prediction output value of the known automobile text data and the sample real output value of the known automobile text data, and calculating a loss function value;

judging whether the loss function value converges to a preset value;

if the loss function value has not converged to the preset value, updating the model parameters of the encoder-decoder model based on the GRU variant until the loss function value converges to the preset value, and saving the corresponding model parameters to obtain the encoder-decoder model;

wherein the expression of the loss function is:

wherein $L$ is the loss function, $m$ is the number of samples of known automobile text data, $y_i$ is the true output value of a sample of known automobile text data, $\hat{y}_i$ is the predicted output value of a sample of known automobile text data, and $i$ is the sample index of the known automobile text data.
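The loop of computing the loss, checking it against the preset value and updating the parameters can be sketched on a toy model. The mean squared error used here is an assumed form of the loss that is merely consistent with the symbols $L$, $m$, $y_i$ and $\hat{y}_i$; the one-parameter model, the learning rate, the threshold and the step limit are hypothetical and stand in for the encoder-decoder model and its settings.

```python
def mse(y_true, y_pred):
    """Mean squared error over m samples (ASSUMED form of the loss L)."""
    m = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / m


def train(xs, ys, threshold=1e-6, lr=0.1, max_steps=10_000):
    """Fit y ~ w * x by gradient descent until the loss reaches the preset value."""
    w = 0.0
    for _ in range(max_steps):
        preds = [w * x for x in xs]
        loss = mse(ys, preds)
        if loss <= threshold:  # converged to the preset value: stop and keep w
            break
        # Gradient of the MSE with respect to w; in the full model this is
        # what back propagation supplies for every weight.
        grad = sum(2 * (p - y) * x for x, y, p in zip(xs, ys, preds)) / len(xs)
        w -= lr * grad
    return w, loss


w, final_loss = train([1.0, 2.0, 3.0], [0.9, 1.8, 2.7])
print(w, final_loss)
```

The step limit is a safeguard added for the sketch so that the loop terminates even if the preset value is never reached.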

Principle of operation

In the method for predicting the future value-retention rate of an automobile, during construction and training of the encoder-decoder model, each known automobile text datum is subjected to word segmentation and word vector mapping and then used as an input sequence of the encoder-decoder model based on the GRU variant; the value-retention rates of each vehicle over its first several years are used as the true output of the model; back propagation is used in training, and the model parameters are adjusted according to the loss function to obtain the encoder-decoder model. During prediction, the text data of the automobile to be predicted are subjected to word segmentation and word vector mapping and used as the input sequence of the encoder-decoder model; the double-layer encoder encodes the input sequence into a semantic vector, and the double-layer decoder, under the action of the soft attention mechanism, uses the information in the semantic vector to predict the output values.
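The decoding stage can be pictured as a loop that starts from an input of 1.0 and from the last hidden state of the encoder, as stated earlier for the double-layer decoder. The sketch below assumes, in addition, that each later step takes the previous prediction as its input; the step interface and the toy step function are hypothetical placeholders for the double-layer decoder with attention.

```python
def decode(step, h_enc_last, n_years):
    """Roll the decoder forward, producing one predicted rate per future year.

    step       : callable (y_prev, h_prev) -> (y_next, h_next); stands in for
                 the double-layer decoder with attention (hypothetical interface)
    h_enc_last : final encoder hidden state, used as the decoder's initial state
    """
    # Initial input 1.0: a new vehicle retains 100% of its value at time zero.
    y, h = 1.0, h_enc_last
    rates = []
    for _ in range(n_years):
        # ASSUMPTION: the previous prediction is fed back as the next input.
        y, h = step(y, h)
        rates.append(y)
    return rates


def toy_step(y_prev, h_prev):
    """Placeholder decoder step: lose a fixed 10% per year (illustration only)."""
    return y_prev * 0.9, h_prev


rates = decode(toy_step, [0.0, 0.0], 3)
print(rates)
```

Because each prediction depends on the one before it, the output sequence carries the year-to-year dependence that a regression model fitted separately for each year would not capture.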

The method for predicting the future value-retention rate of an automobile uses the text data of the automobile to be predicted to predict the future value-retention rate of a new vehicle. Collecting and processing such text data is simpler than collecting and processing structured data; it also lets discrete data play a fuller role and effectively reduces the cost of manually collecting and processing structured historical data. With an encoder-decoder model based on the GRU variant, the input length of the model adapts to the length of the text, giving good flexibility; during decoding, the recurrent neural network's grasp of sequential order is fully exploited, so the temporal ordering among the future value-retention rates of the new vehicle is fully reflected.

The invention also provides a system for predicting the future value-retention rate of an automobile, which comprises a text sequence module, a word vector module and a result output module; the text sequence module is used for acquiring automobile text data to be predicted, and performing word segmentation on the automobile text data to be predicted to obtain an automobile text sequence to be predicted; the word vector module is used for performing word vector mapping on each word in the automobile text sequence to be predicted with a word vector model to obtain a word vector matrix of the text sequence; the result output module is used for taking the word vector matrix of the text sequence as the input of the encoder-decoder model, and acquiring the output of the encoder-decoder model to obtain a prediction of the future value-retention rate of the automobile to be predicted; wherein the encoder-decoder model is an encoder-decoder model based on a gated recurrent unit (GRU) variant.

The invention also provides a device for predicting the future value-retention rate of an automobile, which comprises a processor, a memory, and a computer program stored in the memory and executable on the processor, such as a program for predicting the future value-retention rate of an automobile; the processor, when executing the computer program, implements the steps of the above method for predicting the future value-retention rate of an automobile; alternatively, the processor, when executing the computer program, implements the functions of the modules in the above system embodiment.

Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which describe the execution process of the computer program in the device for predicting the future value-retention rate of an automobile. For example, the computer program may be partitioned into a text sequence module, a word vector module and a result output module, whose specific functions are as follows: the text sequence module is used for acquiring automobile text data to be predicted, and performing word segmentation on the automobile text data to be predicted to obtain an automobile text sequence to be predicted; the word vector module is used for performing word vector mapping on each word in the automobile text sequence to be predicted with a word vector model to obtain a word vector matrix of the text sequence; the result output module is used for taking the word vector matrix of the text sequence as the input of the encoder-decoder model, and acquiring the output of the encoder-decoder model to obtain a prediction of the future value-retention rate of the automobile to be predicted; wherein the encoder-decoder model is an encoder-decoder model based on a gated recurrent unit (GRU) variant.

The device for predicting the future value-retention rate of an automobile may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the above is merely an example of the device and does not limit it; the device may include more or fewer components, combine certain components, or use different components; for example, it may further include input/output devices, a network access device, a bus, and the like.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the device for predicting the future value-retention rate of an automobile, and connects the various parts of the whole device through various interfaces and lines.

The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the automobile future value-retention rate prediction device by running or executing the computer programs and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created through use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one disk storage device, a flash memory device, or other volatile solid state storage device.

The modules integrated in the automobile future value-retention rate prediction device may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.

Based on such understanding, the present invention may implement all or part of the processes of the above method for predicting the future value-retention rate of an automobile, and these processes may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when it is executed by a processor, the steps of the above method embodiment are realized. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.

It should be noted that the content covered by the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, legislation and patent practice provide that computer-readable media do not include electrical carrier signals and telecommunications signals.

Examples

As shown in Figs. 1-2, in order to estimate the value-retention rate of a new vehicle, this embodiment provides a method for predicting the future value-retention rate of an automobile, which includes the following steps:

Step 1, collating all information about the automobile to be predicted to obtain the automobile text data to be predicted; the automobile text data to be predicted are collected from a used-car website and comprise text descriptions of the brand, car series and configuration parameters of the automobile.

Step 2, performing word segmentation on the automobile text data to be predicted with the jieba tool to obtain the automobile text sequence to be predicted. Word segmentation divides the automobile text data to be predicted into a number of separate words without changing the amount of text. For example, segmenting the sentence "today's weather is really good" yields the same sentence split into individual words such as "today", "weather", "really" and "good".
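As a minimal sketch of Step 2, the snippet below segments a vehicle description with jieba's `lcut` function. The sample description and the whitespace fallback used when jieba is not installed are illustrative assumptions and are not part of the embodiment.

```python
def segment(text):
    """Split vehicle text into a list of words (Step 2)."""
    try:
        import jieba  # third-party segmenter named in the embodiment
    except ImportError:
        # Assumed fallback for illustration only: split on whitespace.
        return text.split()
    # jieba.lcut returns a list of tokens; drop whitespace-only tokens.
    return [w for w in jieba.lcut(text) if w.strip()]


# Hypothetical description: brand, car series, configuration.
sample = "宝马 3系 2.0T 自动"
words = segment(sample)
print(words)
```

Joining the returned words reproduces the original text apart from whitespace, which matches the statement that segmentation only divides the text.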

Step 3, collecting a plurality of known automobile text data from the automobile market as a corpus, and training the word vector model on it to obtain a trained word vector model; in this embodiment, the word vector model is Google's word2vec word vector model, which uses the CBOW model with negative sampling.

Step 4, performing word vector mapping on each word in the automobile text sequence to be predicted with the trained word vector model to obtain the word vector corresponding to each word; and arranging these word vectors from top to bottom in the order in which the words appear in the automobile text sequence to be predicted, to obtain the word vector matrix of the text sequence.
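The stacking in Step 4 can be sketched as follows. The 4-dimensional embedding table is a hypothetical stand-in for a trained word2vec model, and mapping unknown words to a zero vector is an assumption made only for this illustration.

```python
import numpy as np

# Hypothetical embeddings standing in for a trained word2vec model.
embeddings = {
    "BMW": np.array([0.1, 0.2, 0.3, 0.4]),
    "3-Series": np.array([0.5, 0.1, 0.0, 0.2]),
    "2.0T": np.array([0.3, 0.3, 0.1, 0.9]),
}


def to_matrix(words, table, dim):
    """Stack one word vector per row, in the order the words appear."""
    # Assumption: words missing from the table map to a zero vector.
    rows = [table.get(w, np.zeros(dim)) for w in words]
    return np.vstack(rows)


words = ["BMW", "3-Series", "2.0T", "sunroof"]
matrix = to_matrix(words, embeddings, dim=4)
print(matrix.shape)
```

Each row of `matrix` is then fed to the encoder as the input for one moment.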

Step 5, constructing an encoder-decoder model based on the gated recurrent unit (GRU) variant, and adding a soft attention mechanism between the encoder and the decoder. The encoder-decoder model comprises a double-layer encoder and a double-layer decoder, with the soft attention mechanism added between them; the double-layer encoder and the double-layer decoder both use the GRU variant. The GRU variant is constructed from the GRU by removing the reset gate, retaining the update gate, and redirecting the filtering data flow of the update gate to the candidate hidden state. The initial hidden state of the double-layer encoder is a zero vector, and the initial hidden state of the double-layer decoder is the hidden state of the double-layer encoder at the last moment.

In this embodiment, the base model of both the double-layer encoder and the double-layer decoder is the GRU variant CURG; compared with the GRU, the variant CURG retains only the update gate and redirects the filtering data flow of the update gate to the candidate hidden state.

In this embodiment, the forward propagation formula of the gate cycle unit variant is as follows:

$$z_t = \sigma\left(W_z \times [h_{t-1}, x_t]\right)$$

wherein $z_t$ is the output of the update gate of the GRU variant at time $t$, $\sigma(\cdot)$ is the sigmoid function, $W_z$ and $W_h$ are weight parameter matrices, $x_t$ is the word vector input at time $t$, $h_{t-1}$ is the hidden state at time $t-1$, $\tilde{h}_t$ is the candidate hidden state at time $t$, $h_t$ is the hidden state at time $t$, and $\tanh(\cdot)$ is the hyperbolic tangent function.
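A single forward step of such a unit might look like the sketch below. Only the update-gate line follows the formula given above; the forms used for the candidate hidden state and for the new hidden state are assumptions chosen to be consistent with the textual description (no reset gate, update gate filtering the data flow into the candidate state), and the sizes and random weights are purely illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def curg_step(x_t, h_prev, W_z, W_h):
    """One forward step of a GRU variant without a reset gate (sketch)."""
    # Update gate, as in the formula above: z_t = sigmoid(W_z [h_{t-1}, x_t]).
    z_t = sigmoid(W_z @ np.concatenate([h_prev, x_t]))
    # ASSUMPTION: the update gate filters h_{t-1} on its way into the
    # candidate hidden state, taking over the role of the reset gate.
    h_tilde = np.tanh(W_h @ np.concatenate([z_t * h_prev, x_t]))
    # ASSUMPTION: GRU-style interpolation between old and candidate state.
    return (1.0 - z_t) * h_prev + z_t * h_tilde


rng = np.random.default_rng(0)
hidden, embed = 3, 4  # illustrative sizes
W_z = rng.normal(size=(hidden, hidden + embed))
W_h = rng.normal(size=(hidden, hidden + embed))

h = np.zeros(hidden)  # zero initial hidden state, as for the encoder
for x_t in rng.normal(size=(5, embed)):
    h = curg_step(x_t, h, W_z, W_h)
print(h)
```

Because there is no reset gate, the unit needs two weight matrices instead of the three used by a standard GRU.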

The attention calculation formula for the soft attention mechanism is as follows:

wherein $a_{ij}$ is the attention distribution coefficient of the decoder at the $i$-th moment with respect to the encoder at the $j$-th moment, obtained with a similarity function; $h_{i-1}$ is the hidden state of the double-layer decoder at moment $i-1$; $h_j$ is the hidden state of the double-layer encoder at moment $j$; $y_i$ is the output of the double-layer encoder at moment $i$; $C_i$ is the attention of the double-layer encoder at the $i$-th moment; $l_x$ is the length of the input text sequence of the double-layer encoder; and $h_k$ is the hidden state of the double-layer encoder at the $k$-th moment.
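For illustration, the quantities defined above can be computed as in the following sketch. It assumes the usual soft attention form, in which the similarity scores are normalized with a softmax over all encoder moments and the attention is the weighted sum of encoder hidden states; the dot product used as the similarity function is likewise an assumption.

```python
import numpy as np


def soft_attention(dec_prev, enc_states):
    """Return attention coefficients a_ij and the attention vector C_i.

    dec_prev:   decoder hidden state h_{i-1}, shape (d,)
    enc_states: encoder hidden states h_1..h_lx, shape (lx, d)
    """
    # ASSUMPTION: dot product as the similarity function.
    scores = enc_states @ dec_prev
    scores = scores - scores.max()  # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over j
    context = weights @ enc_states  # weighted sum of encoder states
    return weights, context


enc_states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dec_prev = np.array([1.0, 0.0])
weights, context = soft_attention(dec_prev, enc_states)
print(weights, context)
```

The coefficients sum to 1 over the encoder moments, so encoder states that are more similar to the decoder state contribute more to the attention vector.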

In this embodiment, the process of constructing and training the encoder-decoder model specifically includes the following steps:

Step 51, constructing an encoder-decoder model based on the GRU variant, and adding an attention mechanism to it;

Step 52, acquiring a plurality of known automobile text data and performing word segmentation on them to obtain known automobile text sequences; performing word vector mapping on each word in a known automobile text sequence with the word vector model to obtain the word vector matrix of that sequence; and training the GRU-variant-based encoder-decoder model on these word vector matrices, the training process using back propagation. The known automobile text data are collected from a used-car website and comprise text describing the brand, car series and configuration parameters of used cars. In the invention, the known automobile text data are preprocessed by grouping first by vehicle and then by vehicle age, which yields the 5-year value-retention rates of each vehicle. The text data of each vehicle, after word segmentation and word vector mapping, serve as the input sequence of the encoder-decoder model, and the 5-year value-retention rates of each vehicle serve as its real output values; the model is trained on the data processed in this way. Back propagation does not itself produce a predicted output value; it is a method for computing gradients. Specifically, the gradient of the loss function with respect to the weights of the last network layer is computed first, the gradient with respect to the weights of the second-to-last layer is then computed by the chain rule, and so on until the gradients for the weights of all layers are obtained.

Step 53, constructing a loss function according to the predicted output value and the real output value, and calculating a loss function value; wherein the expression of the loss function is:

wherein $L$ is the loss function, $m$ is the number of samples of known automobile text data, $y_i$ is the real output value of a sample of the known automobile text data, $\hat{y}_i$ is the predicted output value of that sample, and $i$ is the sample index of the known automobile text data.

Step 54, judging whether the loss function value converges to a preset value;

Step 55, if the loss function value has not converged to the preset value, updating the model parameters of the encoder-decoder model until the loss function value converges to the preset value, and storing the corresponding model parameters to obtain the trained encoder-decoder model.
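The loop formed by steps 53 to 55 can be sketched on a toy problem as below. The one-parameter model, the mean squared error loss, the learning rate and the preset threshold are all assumptions made for illustration; they stand in for the encoder-decoder model and its loss function.

```python
def mse(y_true, y_pred):
    """ASSUMPTION: mean squared error as the loss function."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def train(xs, ys, lr=0.1, preset=1e-8, max_epochs=1000):
    w = 0.0  # single model parameter of the toy model y = w * x
    loss = float("inf")
    for _ in range(max_epochs):
        preds = [w * x for x in xs]
        loss = mse(ys, preds)  # step 53: loss value
        if loss <= preset:  # step 54: convergence check
            break
        # Step 55: gradient of the loss with respect to w, then update.
        grad = sum(2 * (p - t) * x for p, t, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad
    return w, loss


w, loss = train([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
print(w, loss)
```

The parameters held when the loop exits are the ones that would be stored as the trained model.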

Like most deep learning algorithms, the encoder-decoder model in this embodiment needs to be trained with a large amount of labeled automobile text data. Training uses back propagation, which adjusts the model parameters according to the loss, so a loss function needs to be constructed.
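The labels used for training come from the grouping described in step 52 (first by vehicle, then by vehicle age). A minimal sketch of that grouping follows; the record layout, the definition of the value-retention rate as used price divided by new price, the averaging of several listings of the same age, and the skipping of vehicles without all five ages are assumptions made for illustration.

```python
from collections import defaultdict


def five_year_rates(listings):
    """Group listings by vehicle, then by age, into 5-year rate sequences.

    listings: iterable of (vehicle, age_in_years, used_price, new_price).
    """
    by_vehicle = defaultdict(lambda: defaultdict(list))
    for vehicle, age, used_price, new_price in listings:
        # ASSUMPTION: value-retention rate = used price / new price.
        by_vehicle[vehicle][age].append(used_price / new_price)

    result = {}
    for vehicle, by_age in by_vehicle.items():
        # ASSUMPTION: keep only vehicles observed at every age from 1 to 5.
        if all(age in by_age for age in range(1, 6)):
            result[vehicle] = [
                sum(by_age[age]) / len(by_age[age]) for age in range(1, 6)
            ]
    return result


listings = [  # hypothetical records
    ("A", 1, 80, 100), ("A", 1, 90, 100), ("A", 2, 70, 100),
    ("A", 3, 60, 100), ("A", 4, 50, 100), ("A", 5, 40, 100),
    ("B", 1, 75, 100),
]
rates = five_year_rates(listings)
print(rates)
```

Each resulting five-element sequence would serve as the real output value paired with the text of the corresponding vehicle.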

In this embodiment, the double-layer encoder is built from the GRU variant CURG. The first layer of the double-layer encoder encodes the word vector input at each moment into an intermediate hidden state, which serves as the input to the second layer; the second layer then derives the final hidden state of the double-layer encoder from the intermediate hidden state. The initial hidden states of the first and second layers of the double-layer encoder are both zero vectors. The double-layer decoder is likewise built from the GRU variant CURG, and a soft attention mechanism is added between the double-layer encoder and the double-layer decoder. Under the soft attention mechanism, the double-layer decoder calculates the attention at the current moment from its hidden state at the previous moment and the hidden states of the double-layer encoder at all moments, uses that attention to obtain its hidden state at the current moment, and calculates its output at each moment from the attention and the input at that moment. The output of the double-layer decoder at each moment is the prediction result of the future value-retention rate of the automobile.

In this embodiment, the initial hidden states of the first and second layers of the double-layer decoder are set to the hidden states of the first and second layers of the double-layer encoder at the last moment, respectively; since the value-retention rate of a new car at the initial moment is 100%, the initial input of the double-layer decoder is set to 1.0.
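The role of the initial input 1.0 can be illustrated with the decoding loop below. It assumes that the output at each moment is fed back as the input for the next moment, which the text does not state explicitly; `toy_step` is a hypothetical stand-in for the CURG layers and attention, and the number of steps follows the 5-year horizon used for training.

```python
def decode(step_fn, h0, steps=5, first_input=1.0):
    """Run the decoder for `steps` moments, starting from input 1.0 (100%)."""
    outputs, h, y = [], h0, first_input
    for _ in range(steps):
        # ASSUMPTION: the previous output is the input at the next moment.
        y, h = step_fn(y, h)
        outputs.append(y)
    return outputs


def toy_step(y_prev, h):
    """Hypothetical step: the rate decays by a factor held in the state."""
    return y_prev * h, h


rates = decode(toy_step, h0=0.9)
print(rates)
```

With this toy step the sequence starts below 1.0 and decreases year by year, which mirrors how a value-retention curve begins at 100% for a new car.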

Step 6, taking the word vector matrix of the text sequence as the input of the encoder-decoder model, and acquiring the output of the encoder-decoder model to obtain the future value-retention rate prediction result of the automobile to be predicted.

In this embodiment, when the word vector matrix of the text sequence is used as the input of the encoder-decoder model, the word vector in each row of the matrix is used as the input of the double-layer encoder at the corresponding moment. Under the soft attention mechanism, the double-layer decoder calculates the attention at the current moment from its hidden state at the previous moment and the hidden states of the double-layer encoder at all moments; the double-layer decoder then calculates the output at each moment from the attention and the input at that moment, which gives the future value-retention rate prediction result of the automobile.

The automobile future value-retention rate prediction method uses an encoder-decoder model based on the GRU variant and adds a soft attention mechanism to it. This reduces the cost of manually collecting and processing structured historical data; predicting the future value-retention rate of a new automobile from text data can amplify the effect of discrete data and agrees better with how people perceive brand effects. The GRU variant removes the reset gate of the GRU and redirects the filtering data flow of the update gate to the candidate hidden state, which improves on the GRU in both accuracy and training speed: the training speed per sample is improved by 22.8%, and the accuracy is slightly higher than that of the GRU.

In the invention, the encoder-decoder model based on the GRU variant belongs to the family of recurrent neural networks. The double-layer decoder uses the GRU variant as its base model, so during decoding it can make full use of the ability of recurrent neural networks to model time series, thereby capturing the temporal relationship among the future value-retention rates of a new vehicle.

The above-described embodiment is only one of the embodiments that can implement the technical solution of the present invention. The scope of the present invention is not limited to this embodiment, but includes any variations, substitutions and other embodiments that those skilled in the art can readily conceive within the technical scope disclosed by the present invention.
