Suspicious transaction identification model construction method
1. A suspicious transaction identification model construction method is characterized by comprising the following steps:
s1, constructing a training set, wherein the training set comprises transaction accounts with preset labels;
s2, obtaining all transaction records of the transaction account in a certain time period, dividing all records into different transaction record subsequences according to a certain time interval, calculating the total transaction amount of the transaction records contained in each subsequence, taking the calculation result as an element of the time sequence of the transaction account, and enabling all elements to form the time sequence representation of the transaction account;
s3, selecting a fixed number of subsequences from subsequences of all time sequences in the training set as candidate Shapelets to form a Shapelet candidate set, then calculating the information gain of each candidate Shapelet in the Shapelet candidate set, and finally extracting the first K Shapelets with the largest information gain;
s4, constructing a Shapelet relational graph, wherein the Shapelet relational graph is an undirected weighted graph and consists of K nodes, each node represents a Shapelet, and the weight of each edge represents the probability that two different connected Shapelets can be matched with the same time sequence at the same time;
s5, after the Shapelet relational graph is obtained, embedding the Shapelet relational graph by adopting a graph embedding method to obtain a representation vector of any embedded node;
s6, for the time series, all Shapelets matched with it, and the corresponding matching degrees, multiplying the embedded representation vector of each Shapelet by its matching degree, and accumulating all multiplied results as the representation vector of the current time series;
and S7, inputting the expression vector of each time sequence in the training set as the characteristic of the time sequence into a multi-layer perceptron neural network for training, and performing classified prediction on the transaction account to be predicted by using the trained multi-layer perceptron neural network.
2. The method according to claim 1, wherein the acquiring of all transaction records of a transaction account within a certain time period, dividing all records into different transaction record subsequences according to a certain time interval, calculating a transaction amount total of transaction records included in each subsequence, and taking a calculation result as an element of the time sequence of the transaction account, wherein all elements constitute a time sequence representation of the transaction account, specifically:
all transaction records A_{i,*} of the ith transaction account in the data set are first converted into a time series representation;
A_{i,*} is divided into different transaction record subsequences according to a certain time interval; if there are n transaction records in the mth subsequence, the mth subsequence is expressed as:
Seg_{i,m} = {A_{i,x}, A_{i,x+1}, …, A_{i,x+n-1}};
wherein A_{i,x}, A_{i,x+1}, …, A_{i,x+n-1} respectively correspond to the 1st to nth transaction records in the mth subsequence;
the total transaction amount of the mth subsequence is:
Value(Seg_{i,m}) = Σ_{A_{i,j} ∈ Seg_{i,m}} Value(A_{i,j});
wherein A_{i,j} is the jth transaction record of the ith account, and Value(A_{i,j}) is the transaction amount corresponding to the transaction record A_{i,j};
the time series of the ith transaction account is represented as T_i = {Value(Seg_{i,1}), …, Value(Seg_{i,L_i})}, wherein L_i is the number of transaction record subsequences divided for the ith transaction account and is also the length of the time series T_i; Value(Seg_{i,L_i}) is the representation of the L_i-th subsequence of the ith account.
3. The method according to claim 1 or 2, wherein the selecting of a fixed number of subsequences from the subsequences of all time series in the training set specifically comprises: selecting the subsequences by adopting a greedy strategy so that the Euclidean distance between the selected subsequences is maximized.
4. The method of claim 1 or 2, wherein the constructed Shapelet relation graph is an undirected weighted graph composed of K nodes, each node represents a Shapelet, and the weight of each edge represents the probability that two different connected Shapelets can simultaneously match the same time series, specifically:
for the training set D of time series, the K Shapelets are denoted {S_1, …, S_K}, and the distance threshold is set as δ;
the constructed Shapelet relation graph is represented as G, wherein the node set is V = {v_1, …, v_K}, and v_1, …, v_K are the 1st to Kth nodes corresponding in sequence to the K Shapelets;
for each time series T_i in the training set D, the set {S_{i,*}} of all Shapelets matching the time series T_i under the distance threshold δ is found;
for any two Shapelets (S_j, S_k) in the set {S_{i,*}}, an edge connecting v_j and v_k is constructed, and the weight of the edge is set to p_{i,j} × p_{i,k}, wherein p_{i,j} is the probability that the time series T_i matches Shapelet S_j under the distance threshold δ, and p_{i,k} is the probability that the time series T_i matches Shapelet S_k under the distance threshold δ;
all the repeated edges are combined into one edge, and the weights of the repeated edges are added.
5. The method of claim 4, wherein the Shapelet relation graph is embedded by using a graph embedding method to obtain a representation vector of any embedded node, specifically:
the representation vector u_i ∈ R^B of any node v_i in the Shapelet relation graph is obtained by adopting the graph embedding method DeepWalk, wherein B is the set embedding dimension, and B is taken to be 64.
6. The method of claim 4, wherein for the time series, all Shapelets matched with it, and the corresponding matching degrees, multiplying the embedded representation vector of each Shapelet by its matching degree, and accumulating the results of all the multiplications as the representation vector of the current time series, specifically:
for the time series T_i, the set {S_{i,*}} of all Shapelets matching it, and the corresponding matching degrees {p_{i,*}}, the embedded representation vector μ(S_{i,j}) of Shapelet S_{i,j}, S_{i,j} ∈ {S_{i,*}}, is multiplied by the matching degree p_{i,j}, and the results of all the multiplications over the set {S_{i,*}} are accumulated as the representation vector Φ_i of the time series T_i.
7. The method of claim 6, wherein the matching degree p_{i,j} represents the matching degree between Shapelet S_j and T_i, specifically:
p_{i,j} = (max(dist(S_{i,*}, T_i)) − dist(S_j, T_i)) / (max(dist(S_{i,*}, T_i)) − min(dist(S_{i,*}, T_i)));
wherein max(dist(S_{i,*}, T_i)) represents the maximum value of the Euclidean distances between all the Shapelets in the set S_{i,*} and the time series T_i; min(dist(S_{i,*}, T_i)) represents the minimum value of the Euclidean distances between all the Shapelets in the set S_{i,*} and the time series T_i; dist(S_j, T_i) refers to the Euclidean distance between Shapelet S_j and T_i.
8. The method according to any one of claims 1, 2, 5, 6 or 7, wherein the classification prediction of the transaction account to be predicted is performed by using a trained multi-layer perceptron neural network, specifically:
for the transaction account to be predicted, the representation vector of its time series is obtained in the manner of steps S2-S6, and the representation vector is input, as the feature of the time series, into the trained multi-layer perceptron neural network to obtain the classification prediction result of the transaction account to be predicted.
Background
The crime of money laundering is a criminal activity that endangers China's financial security and social stability, so research on the anti-money laundering problem is of great practical significance. The core of anti-money laundering work is the recording and analysis of large-amount and suspicious transaction data. Although the suspicious transaction identification problem has been studied for decades, no mature and effective technology can yet be applied and implemented on a large scale. Suspicious transaction identification models based on traditional machine learning methods require users to extract features themselves, usually focus only on the static attributes of an account, and do not consider the dynamic change of transaction behavior over time, so they cannot meet the requirements of current anti-money laundering work.
Shapelets are a popular research direction for the time series classification problem: a Shapelet can capture local features of a time series and achieves good classification performance even when the data are noisy and distorted. In addition, Shapelets can provide the user with interpretable classification results, which is important for the suspicious transaction identification problem, because the analysis results can serve as an important basis for subsequent investigation, evidence collection, and resolution of money laundering criminal cases. However, the existing sequence pattern classification model uses only a single Shapelet as the sole criterion for time-series classification, so it is difficult for it to identify money laundering behaviors with different transaction patterns and different transaction rules.
Disclosure of Invention
In view of the above, the invention provides a suspicious transaction identification model construction method, which can identify suspicious transaction behaviors in different modes and has good identification capability for different money laundering means and different money laundering behavior characteristics.
In order to achieve the above purpose, the method for constructing the suspicious transaction identification model of the invention comprises the following steps:
s1, constructing a training set, wherein the training set comprises transaction accounts with preset labels.
S2, obtaining all transaction records of the transaction account in a certain time period, dividing all records into different transaction record subsequences according to a certain time interval, calculating the total transaction amount of the transaction records contained in each subsequence, taking the calculation result as an element of the time sequence of the transaction account, and enabling all elements to form the time sequence representation of the transaction account.
And S3, selecting a fixed number of subsequences from the subsequences of all time series in the training set as candidate Shapelets to form a Shapelet candidate set, then calculating the information gain of each candidate Shapelet in the Shapelet candidate set, and finally extracting the first K Shapelets with the largest information gain.
S4, constructing a Shapelet relational graph, wherein the Shapelet relational graph is an undirected weighted graph and consists of K nodes, each node represents one Shapelet, and the weight of each edge represents the probability that two different connected Shapelets can be matched with the same time sequence at the same time;
and S5, after the Shapelet relational graph is obtained, embedding the Shapelet relational graph by adopting a graph embedding method to obtain a representation vector of any embedded node.
And S6, for the time series, all Shapelets matched with it, and the corresponding matching degrees, multiplying the embedded representation vector of each Shapelet by its matching degree, and accumulating all multiplied results as the representation vector of the current time series.
And S7, inputting the expression vector of each time sequence in the training set as the characteristic of the time sequence into a multi-layer perceptron neural network for training, and performing classified prediction on the transaction account to be predicted by using the trained multi-layer perceptron neural network.
Further, acquiring all transaction records of the transaction account within a certain time period, dividing all records into different transaction record subsequences according to a certain time interval, calculating the total transaction amount of the transaction records contained in each subsequence, taking the calculation result as an element of the time sequence of the transaction account, wherein all elements form the time sequence representation of the transaction account, and specifically comprises the following steps:
All transaction records A_{i,*} of the ith transaction account in the data set are first converted into a time series representation.
A_{i,*} is divided into different transaction record subsequences according to a certain time interval; if there are n transaction records in the mth subsequence, the mth subsequence is expressed as:
Seg_{i,m} = {A_{i,x}, A_{i,x+1}, …, A_{i,x+n-1}};
wherein A_{i,x}, A_{i,x+1}, …, A_{i,x+n-1} respectively correspond to the 1st to nth transaction records in the mth subsequence.
The total transaction amount of the mth subsequence is:
Value(Seg_{i,m}) = Σ_{A_{i,j} ∈ Seg_{i,m}} Value(A_{i,j});
wherein A_{i,j} is the jth transaction record of the ith account, and Value(A_{i,j}) is the transaction amount corresponding to the transaction record A_{i,j}.
The time series of the ith transaction account is represented as T_i = {Value(Seg_{i,1}), …, Value(Seg_{i,L_i})}, wherein L_i is the number of transaction record subsequences divided for the ith transaction account and is also the length of the time series T_i; Value(Seg_{i,L_i}) is the representation of the L_i-th subsequence of the ith account.
Further, selecting a fixed number of subsequences from the subsequences of all time series in the training set specifically comprises: selecting the subsequences by adopting a greedy strategy so that the Euclidean distance between the selected subsequences is maximized.
further, a Shapelet relational graph is constructed, the Shapelet relational graph is an undirected weighted graph, the Shapelet relational graph is composed of K nodes, each node represents one Shapelet, the weight of each edge represents the probability that two connected different Shapelets can be matched with the same time sequence at the same time, and the specific steps are as follows:
For the training set D of time series, the K Shapelets are denoted {S_1, …, S_K}, and the distance threshold is set as δ.
The constructed Shapelet relation graph is represented as G, wherein the node set is V = {v_1, …, v_K}, and v_1, …, v_K are the 1st to Kth nodes corresponding in sequence to the K Shapelets.
For each time series T_i in the training set D, the set {S_{i,*}} of all Shapelets matching the time series T_i under the distance threshold δ is found.
For any two Shapelets (S_j, S_k) in the set {S_{i,*}}, an edge connecting v_j and v_k is constructed, and the weight of the edge is set to p_{i,j} × p_{i,k}, wherein p_{i,j} is the probability that the time series T_i matches Shapelet S_j under the distance threshold δ, and p_{i,k} is the probability that the time series T_i matches Shapelet S_k under the distance threshold δ.
All the repeated edges are combined into one edge, and the weights of the repeated edges are added.
Further, the Shapelet relation graph is embedded by adopting a graph embedding method to obtain a representation vector of any embedded node: the representation vector u_i ∈ R^B of any node v_i in the Shapelet relation graph is obtained by adopting the graph embedding method DeepWalk, wherein B is the set embedding dimension, and B is taken to be 64.
Further, for the time series, all Shapelets matched with it, and the corresponding matching degrees, the embedded representation vector of each Shapelet is multiplied by its matching degree, and the results of all the multiplications are accumulated as the representation vector of the current time series, specifically:
for the time series T_i, the set {S_{i,*}} of all Shapelets matching it, and the corresponding matching degrees {p_{i,*}}, the embedded representation vector μ(S_{i,j}) of Shapelet S_{i,j}, S_{i,j} ∈ {S_{i,*}}, is multiplied by the matching degree p_{i,j}, and the results of all the multiplications over the set {S_{i,*}} are accumulated as the representation vector Φ_i of the time series T_i.
Further, the matching degree p_{i,j} represents the matching degree between Shapelet S_j and T_i, specifically:
p_{i,j} = (max(dist(S_{i,*}, T_i)) − dist(S_j, T_i)) / (max(dist(S_{i,*}, T_i)) − min(dist(S_{i,*}, T_i)));
wherein max(dist(S_{i,*}, T_i)) represents the maximum value of the Euclidean distances between all the Shapelets in the set S_{i,*} and the time series T_i; min(dist(S_{i,*}, T_i)) represents the minimum value of the Euclidean distances between all the Shapelets in the set S_{i,*} and the time series T_i; dist(S_j, T_i) refers to the Euclidean distance between Shapelet S_j and T_i.
Further, classification prediction of the transaction account to be predicted is carried out by using the trained multi-layer perceptron neural network, specifically: for the transaction account to be predicted, the representation vector of its time series is obtained in the manner of steps S2-S6, and the representation vector is input, as the feature of the time series, into the trained multi-layer perceptron neural network to obtain the classification prediction result of the transaction account to be predicted.
The invention has the following beneficial effects:
1. The invention provides a suspicious transaction identification model. First, the first K Shapelets with the largest information gain are extracted; then a Shapelet relation graph is constructed according to how the Shapelets match the time series, with each node in the graph representing one Shapelet. The main idea is that, for two different Shapelets, if they can simultaneously match more time series, the correlation between them is stronger, which is reflected in the Shapelet relation graph as a larger edge weight between the two nodes. The graph is then embedded by the DeepWalk algorithm, representation learning is performed on the time series according to the embedding result to obtain a representation vector of each series, and finally the representation vectors are used to train a multi-layer perceptron classifier. The model has a good recognition effect on money laundering behaviors with different patterns and different rules in massive transaction record data, and has good practical application value.
2. The method proposes the concept of a Shapelet relation graph: multiple Shapelets are organized into a graph structure according to the degree of association among them. Each Shapelet represents a certain transaction pattern, and the relationships among different transaction patterns are mined through the Shapelet relation graph, so that the whole Shapelet Graph model is able to identify money laundering behaviors with different transaction rules. This is the most central part of the model.
3. A method of representation learning for time series is presented. For any time series, the Shapelet relation graph and the matching degree between the time series and each Shapelet are used to represent the time series as a vector, and the vector contains the degree of association between the time series and each Shapelet (i.e., each transaction pattern), so that in the downstream classification task the representation vector can be classified by a classifier to obtain the classification result of the corresponding time series.
Drawings
FIG. 1 is a block diagram of the suspicious transaction identification model based on sequence pattern graph characterization;
FIG. 2 is a Shapelet relation graph.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention provides a construction method of a transaction identification model, which comprises the following steps as shown in figure 1:
s1, constructing a training set, wherein the training set comprises transaction accounts with preset labels; as shown in fig. 1, a training data set and a testing data set may also be constructed simultaneously from historical transaction record data.
And S2, extracting the time sequence aiming at the transaction account.
And considering all transaction records of a transaction account in a certain time period, dividing the transaction records into different transaction record subsequences according to a certain time interval, calculating the total transaction amount of all transaction records of each subsequence, taking the calculation result as an element of the time sequence of the account, and forming the time sequence representation of the account by all elements.
A complete transaction record A_{i,j} contains at least 6 fields: transaction card number, transaction time, transaction amount, transaction balance, counterparty card number, and counterparty balance, as shown in Table 1.
TABLE 1 transaction record field
Here A_{i,*} denotes all transaction records of the ith account in the data set, A_{i,*} = {A_{i,1}, …, A_{i,c_i}}, where c_i indicates the number of transaction records of the ith account. The records are converted into a time series representation, specifically: A_{i,*} is divided into different transaction record subsequences at a certain time interval (for example, 10 days); if the mth subsequence contains n transaction records in total, the mth subsequence is expressed as:
Seg_{i,m} = {A_{i,x}, A_{i,x+1}, …, A_{i,x+n-1}};
wherein A_{i,x}, A_{i,x+1}, …, A_{i,x+n-1} respectively correspond to the 1st to nth transaction records in the mth subsequence.
The total transaction amount of the mth subsequence is:
Value(Seg_{i,m}) = Σ_{A_{i,j} ∈ Seg_{i,m}} Value(A_{i,j});
wherein A_{i,j} is the jth transaction record of the ith account, and Value(A_{i,j}) is the transaction amount corresponding to the transaction record A_{i,j}.
The time series of this account is represented as:
T_i = {Value(Seg_{i,1}), …, Value(Seg_{i,L_i})}
T_i is the result of the time-series extraction for the account and is also the time-series representation of the account; in the subsequent classification task, the account is classified as a normal transaction account (marked as N) or a suspicious money laundering account (marked as P) according to the time-series representation of each account. L_i is the number of transaction record subsequences divided for the account and is also the length of the time series T_i; A_{i,j} is the jth transaction record of the ith account, and Value(A_{i,j}) is the transaction amount of the transaction record A_{i,j}; Value(Seg_{i,L_i}) is the representation of the L_i-th subsequence of the ith account.
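As an illustration of step S2, the following Python sketch converts one account's transaction records into the time series T_i by binning the records into fixed-length time intervals and summing the transaction amounts per bin. The record field names ("time", "amount") and the handling of empty intervals (kept as zero elements) are assumptions made for illustration, not details specified above.

from datetime import datetime, timedelta

def extract_time_series(records, interval_days=10):
    # records: list of dicts with "time" (datetime) and "amount" (float), sorted by time.
    # Returns T_i = [Value(Seg_{i,1}), Value(Seg_{i,2}), ...].
    if not records:
        return []
    start, end = records[0]["time"], records[-1]["time"]
    n_bins = int((end - start) / timedelta(days=interval_days)) + 1
    series = [0.0] * n_bins
    for rec in records:
        idx = int((rec["time"] - start) / timedelta(days=interval_days))
        series[idx] += rec["amount"]      # total transaction amount of the subsequence
    return series

# Example: the first two records fall into the same 10-day interval.
recs = [
    {"time": datetime(2020, 1, 1), "amount": 5000.0},
    {"time": datetime(2020, 1, 4), "amount": 12000.0},
    {"time": datetime(2020, 1, 15), "amount": 3000.0},
]
print(extract_time_series(recs))   # [17000.0, 3000.0]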
S3, extracting Shapelet
A Shapelet candidate set is extracted from all time series subsequences in the training set, the information gain of each candidate Shapelet in the candidate set is then calculated, and finally the first K Shapelets with the largest information gain are extracted. To reduce the size of the Shapelet candidate set and improve efficiency, only a fixed number of subsequences are selected, and a greedy strategy is adopted to maximize the Euclidean distance between all subsequences in the candidate set.
A fixed number of subsequences are selected from the subsequences of all time series in the training set, and a greedy strategy is adopted so that the Euclidean distance between the subsequences in the candidate set is maximized. This yields a near-optimal solution rather than the absolute optimum, but it greatly reduces the size of the Shapelet candidate set and therefore the time needed to extract Shapelets.
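The greedy candidate selection can be sketched as follows, assuming all candidate subsequences share the same length so that pairwise Euclidean distance is defined. This is a farthest-point-style selection under that assumption, a minimal illustration rather than an exact reproduction of the procedure above.

import numpy as np

def greedy_select(subsequences, num_candidates):
    # subsequences: (N, L) array of equal-length subsequences.
    # Returns the indices of the selected candidate Shapelets.
    X = np.asarray(subsequences, dtype=float)
    chosen = [0]                                   # start from an arbitrary subsequence
    while len(chosen) < min(num_candidates, len(X)):
        # distance from every subsequence to its nearest already-chosen one
        d = np.min(np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2), axis=1)
        d[chosen] = -np.inf                        # never re-pick a chosen subsequence
        chosen.append(int(np.argmax(d)))           # pick the farthest remaining subsequence
    return chosen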
After a Shapelet candidate set is obtained, for each subsequence in the candidate set, the best classification point of the subsequence is found, the information gain of the classification strategy is calculated, and then the first K subsequences with the largest information gain are selected as extracted Shapelet.
The calculation formulas of the information gain are as follows:
I(D) = −p(P)log(p(P)) − p(N)log(p(N))
Î(D) = f(D_1)·I(D_1) + f(D_2)·I(D_2)
Gain = I(D) − Î(D)
wherein p(P) and p(N) respectively refer to the proportions of class-P and class-N time series in the set D; P and N are the class labels of the time series, where P represents a suspicious transaction account and N represents a normal transaction account; I(D) represents the entropy of the training set D; Î(D) represents the entropy of the training set D after the split; Gain is the information gain; D_1 and D_2 are the two subsets obtained by splitting the set D; f(D_1) and f(D_2) respectively refer to the proportions of the numbers of elements in subsets D_1 and D_2.
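A minimal sketch of the information-gain computation for one candidate Shapelet follows: the training series are split into D_1 and D_2 by their distance to the candidate at a split threshold, and the gain is I(D) minus the weighted entropy after the split. The helper names and the way candidate thresholds are enumerated are illustrative choices, not taken from the disclosure.

import math

def entropy(labels):
    # I(D) = -p(P)log(p(P)) - p(N)log(p(N)) for the binary labels "P" / "N".
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log(labels.count(c) / n) for c in set(labels))

def information_gain(distances, labels, threshold):
    # distances[i]: distance from time series i to the candidate Shapelet.
    d1 = [y for d, y in zip(distances, labels) if d < threshold]
    d2 = [y for d, y in zip(distances, labels) if d >= threshold]
    if not d1 or not d2:
        return 0.0
    n = len(labels)
    return entropy(labels) - (len(d1) / n) * entropy(d1) - (len(d2) / n) * entropy(d2)

def best_gain(distances, labels):
    # best classification point: maximise the gain over candidate split thresholds
    return max(information_gain(distances, labels, t) for t in sorted(set(distances)))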
S4, constructing a Shapelet relational graph
The Shapelet relation graph is an undirected weighted graph consisting of K vertices, where each node represents a Shapelet and the weight of each edge represents the probability that two different Shapelets can match the same time series at the same time. Intuitively, the Shapelet relation graph reflects the correlation between different Shapelets: the larger the weight of an edge, the closer the relationship between the two Shapelets, and the higher the probability that they can both match the same time series.
After the first K Shapelets with the largest information gain are extracted, they are used to construct the Shapelet relation graph, which is an undirected weighted graph G = (V, E) consisting of K vertices. Each node v_i in the graph represents a Shapelet S_i, and the weight w_{i,j} of each edge denotes the probability that, for the same time series, S_i and S_j both match that time series under the δ condition. In other words, the Shapelet relation graph reflects the correlation between different Shapelets: the larger the weight w_{i,j}, the tighter the relationship between S_i and S_j, and the greater the probability that they can be matched to the same time series at the same time.
Given a distance threshold δ, for a time series T and a Shapelet S, if dist(T, S) < δ, that is, if the Euclidean distance between T and S is less than δ, then T and S are said to match under the δ condition. For each time series T_i in the training set, all Shapelets that can match T_i under the δ condition are found. For convenience, S_{i,*} denotes the set of all Shapelets matching T_i under the δ condition:
S_{i,*} = {S_j | dist(S_j, T_i) < δ}
wherein dist(S_j, T_i) refers to the Euclidean distance between S_j and T_i.
To measure the degree to which S_j and T_i match, the distance from S_j to T_i is normalized, and the result, denoted p_{i,j}, is the matching degree between S_j and T_i:
p_{i,j} = (max(dist(S_{i,*}, T_i)) − dist(S_j, T_i)) / (max(dist(S_{i,*}, T_i)) − min(dist(S_{i,*}, T_i)))
For the time series T_i, the larger p_{i,j} is, the higher the matching degree between S_j and T_i. For S_j, S_k ∈ S_{i,*}, the term p_{i,j}·p_{i,k} is the correlation of S_j and S_k on the time series T_i: the larger p_{i,j}·p_{i,k} is, the higher the probability that S_j and S_k simultaneously match T_i, that is, the greater the correlation between S_j and S_k.
The detailed process for constructing the Shapelet relationship graph is as follows:
first, a Shapelet relationship graph G is initialized to (V, E), and each node V in GiRepresenting the corresponding Shapelet Si,V(G)={v1,…,vK}. Then, for each time sequence T in the training set DiFind and TiAll Shapelet S matching under delta conditionsi,*. In practical applications, δ may be determined by experimental statistics on a training data set. If Si,*If | ≧ 1, then for Si,*Each pair of Shapelets (S)j,Sk) Establishing a vjAnd vkThe weight of the edge between is pi,j*pi,k. And finally, combining all repeated edges into one edge, and adding the weights of the repeated edges. Thus, any two nodes v in Shapelet relational graph Gj,vkThe weights of the edges in between are:
and S5, after obtaining the Shapelet relational graph, embedding the graph by adopting a Deepwalk algorithm. The invention adopts the existing graph embedding method Deepwalk to obtain any node v in the relational graphiIs represented by vector ui∈RBWhere B is the embedding dimension, where B is taken to be 64.
S6, representation learning of time series
For the time series, all Shapelets matched with it, and the corresponding matching degrees, the embedded representation vector of each Shapelet is multiplied by its matching degree, and the results of all the multiplications are accumulated as the representation vector of the time series.
For the time series T_i, all Shapelets S_{i,*} matched with it, and the corresponding matching degrees p_{i,*}, the embedded representation vector μ(S_{i,j}) of Shapelet S_{i,j} is multiplied by the matching degree p_{i,j}, and the results of all the multiplications are accumulated as the representation vector of T_i. If a certain time series T_i cannot match any Shapelet, that is, S_{i,*} = ∅, then T_i is represented by a zero vector of dimension B. This is reasonable because all Shapelet embedding results are non-zero vectors (as guaranteed by the DeepWalk algorithm), so a time series that does not match any Shapelet can be clearly distinguished from other time series by the zero vector. Thus, from all the Shapelet embeddings, the representation vector Φ_i of any time series T_i is obtained:
Φ_i = Σ_{S_{i,j} ∈ S_{i,*}} p_{i,j}·μ(S_{i,j})
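Step S6 can be sketched directly from the formula above: Φ_i is the matching-degree-weighted sum of the embeddings of all Shapelets matched by T_i, and a zero vector of dimension B when nothing matches. The helpers reused here are the illustrative ones sketched earlier, not part of the original disclosure.

import numpy as np

def series_representation(series, shapelets, embeddings, delta, dim=64):
    # Phi_i = sum_j p_{i,j} * mu(S_{i,j}); zero vector of dimension B if nothing matches.
    p = matching_degrees(shapelets, series, delta)          # {j: p_{i,j}}, sketched earlier
    phi = np.zeros(dim)
    for j, degree in p.items():
        phi += degree * np.asarray(embeddings[j])
    return phi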
S7, classification training and prediction of multilayer perceptron
The representation vector of each time series is input, as the feature of the time series, into a multi-layer perceptron neural network for training, and classification prediction is then performed on the time series.
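A minimal sketch of step S7 with scikit-learn's MLPClassifier is shown below; the hidden-layer sizes and other hyper-parameters are illustrative and are not taken from the disclosure.

import numpy as np
from sklearn.neural_network import MLPClassifier

def train_and_predict(train_vectors, train_labels, test_vectors):
    # train_vectors: representation vectors Phi_i; train_labels: "P" / "N".
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    clf.fit(np.asarray(train_vectors), np.asarray(train_labels))
    return clf.predict(np.asarray(test_vectors))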
Example 1
The model and algorithm provided by the invention are evaluated on a real data set and compared with three traditional machine learning algorithms and the original Shapelet model. Owing to the sensitivity of transaction record data, no public data set is currently available in the anti-money laundering field, which is a problem commonly faced in anti-money laundering research. The data set used in the experimental part comes from a public security department rather than a public data set, so there may be some bias in the data.
The raw data contain a total of about 25 million transaction records for 29934 transaction accounts, and the time span of each account ranges from several days to several years. Since many accounts have only a few transaction records and offer little analytical value, accounts with fewer than 50 transaction records or a time span of less than 6 months are filtered out of the raw data. The transaction record data of the 18545 eligible accounts are finally retained as the experimental research data herein; 723 of these accounts are bank accounts involved in money laundering criminal cases that the public security department has definitively investigated, and they are marked as positive samples (suspicious money laundering accounts), while the other 17822 accounts are marked as negative samples (normal transaction accounts).
The format of the transaction record data is shown in table 2.
TABLE 2 transaction record data Format
Because the time span of each account in the raw data is not uniform, in order to eliminate the influence of the time span and facilitate the research, the experiment divides the data by year. Specifically: if the transaction record time span of an account is greater than one year, the account is segmented into multiple account samples by year, and the labels of the divided samples are all the same as that of the original account sample; if the transaction record time span of an account is not greater than one year, the account is treated as one sample. In this way, the 18545 account samples in the original sample set are divided into 81742 samples, with 2785 positive samples and 78957 negative samples. Then 500 positive samples and 500 negative samples are randomly selected from the sample set as the training set, and the remaining samples are used as the test set. The proportion of positive samples in the test set is only 2.83%, and the numbers of positive and negative samples are highly unbalanced; this is reasonable because money laundering transactions account for a very low proportion of all transaction records in real-world situations. The training and test set sample distributions are shown in Table 3, and a sketch of the year-based splitting is given after the table.
TABLE 3 training set and test set sample distribution
Sample set      Number of positive samples      Number of negative samples      Total number of samples
Training set    500                             500                             1000
Test set        2285                            78457                           80742
In total        2785                            78957                           81742
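The year-based sample splitting described above can be sketched as follows; the record fields are the same assumed ones as in the earlier sketches, and the records of an account are assumed to be sorted by transaction time.

from collections import defaultdict

def split_by_year(records, label):
    # records: one account's transaction records sorted by "time"; label: "P" or "N".
    span_days = (records[-1]["time"] - records[0]["time"]).days
    if span_days <= 365:
        return [(records, label)]                 # span not greater than one year: one sample
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec["time"].year].append(rec)     # one sample per calendar year, same label
    return [(recs, label) for recs in by_year.values()]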
All 80742 account samples of the test set were tested with three different machine learning algorithms, One-Class SVM (OCSVM), Isolation Forest, and DBSCAN, and the results were compared with those of the Shapelet Graph model. Traditional machine learning algorithms usually require feature extraction from the data; here, the following features were selected for the experiments with the three machine learning algorithms.
1. the transaction amount; 2. the monthly transaction amount; 3. the transaction amount dispersion coefficient; 4. the number of transactions; 5. the charge-in/charge-out frequency; 6. the number of counterparties; 7. the charge-out/charge-in amount; 8. the transaction amount of ten thousand per thousand; 9. the number of large-amount transactions.
The experimental results are shown in Table 4 and show that the suspicious transaction identification algorithm based on time series analysis performs better than the traditional machine learning algorithms. Compared with the sequence pattern classification model, the sequence pattern graph characterization model provided by the invention improves F1-score by 11.1 percentage points.
TABLE 4 results of the experiment
As shown in FIG. 2, each node in the Shapelet graph represents a Shapelet, and the weight of each edge represents the correlation between the two connected Shapelets, i.e., the probability that they match the same time series at the same time.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.