Credit risk identification method, apparatus, device, and program
1. A credit risk identification method, applied to a community network containing overlapping communities, wherein the overlapping communities comprise at least reference sample points and sample points to be identified, the method comprising the following steps:
acquiring the overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified;
combining the overlapping community structure, the self-features of the reference sample point, and the self-features of the sample point to be identified to generate community features of the community in which the sample point to be identified is located;
performing a variable derivation operation based on the community features to generate new sample features; and
inputting the new sample features into a risk identification model to obtain the default probability of the sample point to be identified.
2. The credit risk identification method of claim 1, further comprising, before the step of inputting the new sample features into the risk identification model to obtain the default probability of the sample point to be identified:
constructing the risk identification model, which specifically comprises the following steps:
acquiring historical sample data of an overlapping training community, and parsing the historical sample data to obtain training sample points, wherein the training sample points are sample points having default labels, each default label indicating whether the sample point is known to have defaulted;
acquiring the overlapping community structure of the training sample points in the overlapping training community and the self-training features of the training sample points;
combining the overlapping community structure of the training sample points and the self-training features of the training sample points to generate community training features of the community in which the training sample points are located;
performing a variable derivation operation based on the community training features to generate new sample training features; and
inputting the new sample training features and the default labels of the training sample points into a classification model for training, to generate the risk identification model.
3. The credit risk identification method of claim 1, wherein obtaining the self-features of the sample point to be identified comprises:
acquiring self data of the sample point to be identified; and
performing a preset operation on the self data to generate the self-features of the sample point to be identified.
4. The credit risk identification method of claim 3, wherein performing the preset operation on the self data to generate the self-features of the sample point to be identified comprises:
performing a preprocessing operation on the self data to generate preprocessed self data;
performing a feature extraction operation on the preprocessed self data to generate initial features; and
performing variable derivation on the initial features to generate the self-features of the sample point to be identified.
5. The credit risk identification method of claim 3, wherein the self data comprises at least one of: basic information of the sample point to be identified, historical credit information of the sample point to be identified, and transaction information of the sample point to be identified.
6. The credit risk identification method of claim 1, further comprising, before the step of acquiring the overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified:
acquiring the overlapping communities of the sample point to be identified based on an overlapping community discovery algorithm.
7. The credit risk identification method of claim 1, further comprising:
ranking the sample points to be identified by their default probabilities.
8. A credit risk identification apparatus, comprising:
a feature acquisition module, configured to acquire the overlapping community structure of a reference sample point and a sample point to be identified in an overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified;
a feature combination module, configured to combine the overlapping community structure, the self-features of the reference sample point, and the self-features of the sample point to be identified to generate community features of the community in which the sample point to be identified is located;
an optimized feature generation module, configured to perform a variable derivation operation based on the community features to generate new sample features; and
a risk identification module, configured to input the new sample features into a risk identification model to obtain the default probability of the sample point to be identified.
9. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the steps of the credit risk identification method according to any one of claims 1 to 7.
10. A credit risk identification device, comprising a memory, a processor, and a credit risk identification program stored on the memory and executable on the processor, wherein the processor, when executing the credit risk identification program, implements the steps of the credit risk identification method of any one of claims 1 to 7.
Background
Existing credit risk identification generally considers only information about the borrower: the borrower's repayment capacity and willingness to repay are evaluated comprehensively from basic information, historical credit records, transaction information and the like, and a credit score is finally output. The most common approach is to select modeling samples, construct derived variables, build a credit risk model by logistic regression, tree models, ensemble learning or similar methods, and obtain a ranking of customers by default probability.
However, credit risk is latent: a borrower can be affected by factors such as income, consumption and the market environment, so repayment is uncertain. In addition, because of information asymmetry, the historical credit information that a bank can collect about a borrower is incomplete and lagging. It is therefore difficult to determine a borrower's long-term future repayment accurately from only the self-related information and historical credit records provided when the borrower applies for a loan.
Disclosure of Invention
In view of this, embodiments of the present application provide a credit risk identification method, apparatus, device, and program, which aim to improve the accuracy of credit risk identification.
An embodiment of the application provides a credit risk identification method, applied to a community network containing overlapping communities, wherein the overlapping communities comprise at least reference sample points and sample points to be identified; the method comprises the following steps:
acquiring the overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified;
combining the overlapping community structure, the self-features of the reference sample point, and the self-features of the sample point to be identified to generate community features of the community in which the sample point to be identified is located;
performing a variable derivation operation based on the community features to generate new sample features; and
inputting the new sample features into a risk identification model to obtain the default probability of the sample point to be identified.
In an embodiment, before the step of inputting the new sample features into the risk identification model to obtain the default probability of the sample point to be identified, the method further comprises:
constructing the risk identification model, which specifically comprises the following steps:
acquiring historical sample data of an overlapping training community, and parsing the historical sample data to obtain training sample points, wherein the training sample points are sample points having default labels, each default label indicating whether the sample point is known to have defaulted;
acquiring the overlapping community structure of the training sample points in the overlapping training community and the self-training features of the training sample points;
combining the overlapping community structure of the training sample points and the self-training features of the training sample points to generate community training features of the community in which the training sample points are located;
performing a variable derivation operation based on the community training features to generate new sample training features; and
inputting the new sample training features and the default labels of the training sample points into a classification model for training, to generate the risk identification model.
In an embodiment, obtaining the self-features of the sample point to be identified comprises:
acquiring self data of the sample point to be identified; and
performing a preset operation on the self data to generate the self-features of the sample point to be identified.
In an embodiment, performing the preset operation on the self data to generate the self-features of the sample point to be identified comprises:
performing a preprocessing operation on the self data to generate preprocessed self data;
performing a feature extraction operation on the preprocessed self data to generate initial features; and
performing variable derivation on the initial features to generate the self-features of the sample point to be identified.
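The three sub-steps (preprocessing, feature extraction, variable derivation) can be sketched in Python as follows. The field names (`income`, `transactions`), the cleaning rules, and the derived `spend_to_income` ratio are illustrative assumptions, not operations specified in this application.

```python
from statistics import mean


def preprocess(raw: dict) -> dict:
    """Preprocessing: fill a missing income and drop missing or negative transaction amounts."""
    return {
        "income": raw.get("income") or 0.0,
        "transactions": [t for t in raw.get("transactions", []) if t is not None and t >= 0],
    }


def extract_features(data: dict) -> dict:
    """Feature extraction: turn the cleaned self data into initial numeric features."""
    tx = data["transactions"]
    return {
        "income": data["income"],
        "tx_count": len(tx),
        "tx_total": sum(tx),
        "tx_mean": mean(tx) if tx else 0.0,
    }


def derive_variables(initial: dict) -> dict:
    """Variable derivation: add a ratio built from two initial features (assumed example)."""
    derived = dict(initial)
    income = initial["income"]
    derived["spend_to_income"] = initial["tx_total"] / income if income > 0 else 0.0
    return derived


def self_features(raw: dict) -> dict:
    """Preprocess -> extract -> derive, in the order described above."""
    return derive_variables(extract_features(preprocess(raw)))


print(self_features({"income": 5000.0, "transactions": [1200.0, None, 800.0, -50.0]}))
```

Keeping each sub-step as its own function mirrors the separation in the text, so the cleaning rules or the derived variables can be changed independently.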
In one embodiment, the self data comprises at least one of: basic information of the sample point to be identified, historical credit information of the sample point to be identified, and transaction information of the sample point to be identified.
In an embodiment, before the step of acquiring the overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified, the method further comprises:
acquiring the overlapping communities of the sample point to be identified based on an overlapping community discovery algorithm.
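The application does not name a specific overlapping community discovery algorithm. As one assumed example, the sketch below implements the clique percolation method (CPM) with k = 3 on a small undirected graph: k-cliques that share k - 1 nodes are merged into the same community, so a node can end up in several communities. The brute-force clique enumeration is only suitable for tiny illustrative graphs.

```python
from itertools import combinations


def clique_percolation(edges, k=3):
    """Overlapping communities via the clique percolation method (brute-force k-cliques)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # 1. Enumerate every k-clique (exponential cost; only for tiny illustrative graphs).
    cliques = [
        frozenset(combo)
        for combo in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(combo, 2))
    ]

    # 2. Union-find over cliques: two k-cliques are adjacent if they share k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # 3. Each connected group of cliques forms one community; nodes may appear in several.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e")]
communities = clique_percolation(edges)
print(sorted(sorted(c) for c in communities))  # node "c" belongs to both communities
```

In this toy graph the two triangles share only node `c`, so they stay separate communities and `c` is the overlapping sample point.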
In an embodiment, the method further comprises:
ranking the sample points to be identified by their default probabilities.
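Ranking can be as simple as sorting by the predicted default probability, as in this hypothetical sketch (the sample point ids and probabilities are made up for illustration):

```python
def rank_by_default_probability(probabilities: dict[str, float]) -> list[tuple[str, float]]:
    """Return (sample point, default probability) pairs, highest probability first."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_default_probability({"u1": 0.12, "u2": 0.67, "u3": 0.31})
print(ranking)  # [('u2', 0.67), ('u3', 0.31), ('u1', 0.12)]
```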
To achieve the above object, a credit risk identification apparatus is also provided, comprising:
a feature acquisition module, configured to acquire the overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified;
a feature combination module, configured to combine the overlapping community structure, the self-features of the reference sample point, and the self-features of the sample point to be identified to generate community features of the community in which the sample point to be identified is located;
an optimized feature generation module, configured to perform a variable derivation operation based on the community features to generate new sample features; and
a risk identification module, configured to input the new sample features into a risk identification model to obtain the default probability of the sample point to be identified.
To achieve the above object, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the credit risk identification method of any one of the above.
To achieve the above object, a credit risk identification device is also provided, comprising a memory, a processor, and a credit risk identification program stored on the memory and executable on the processor, wherein the processor, when executing the credit risk identification program, implements the steps of any one of the credit risk identification methods described above.
One or more of the technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages. The overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified are acquired and combined to generate the community features of the community in which the sample point to be identified is located. Because these community features aggregate the overlapping community structure together with the self-features of both the reference sample point and the sample point to be identified, they describe that community more comprehensively and accurately.
A variable derivation operation then converts the community features into new sample features, which are input into the risk identification model; the model computes the default probability of the sample point to be identified from these features. The invention thereby improves the accuracy of credit risk identification.
Drawings
FIG. 1 shows a first embodiment of the credit risk identification method of the present application;
FIG. 2 shows a second embodiment of the credit risk identification method of the present application;
FIG. 3 shows the detailed steps of step S240 in the second embodiment of the credit risk identification method of the present application;
FIG. 4 shows the detailed steps of step S110 in the first embodiment of the credit risk identification method of the present application;
FIG. 5 shows the detailed steps of step S112 of the credit risk identification method of the present application;
FIG. 6 shows a third embodiment of the credit risk identification method of the present application;
FIG. 7 shows a fourth embodiment of the credit risk identification method of the present application;
FIG. 8 is a schematic diagram of the credit risk identification apparatus of the present application;
FIG. 9 is a schematic diagram of the credit risk identification device of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiments of the invention is as follows: acquiring the overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified; combining them to generate the community features of the community in which the sample point to be identified is located; performing a variable derivation operation based on the community features to generate new sample features; and inputting the new sample features into a risk identification model to obtain the default probability of the sample point to be identified. The invention thereby improves the accuracy of credit risk identification.
To better understand the above technical solution, it is described in detail below with reference to the drawings and specific embodiments.
Before the embodiments of the present invention are described in further detail, the terms and expressions used in them are explained; the following explanations apply to these terms and expressions.
Complex network community: a complex network is an abstraction of a complex system, in which the nodes are individuals in the system and the edges are relationships between individuals, either formed naturally or constructed according to certain rules.
Overlapping communities: an overlapping community is a set of nodes in a network in which a node may belong to several different communities at the same time; nodes within a community are densely connected, while nodes belonging to different communities are sparsely connected.
Credit risk: credit risk refers to the risk that a counterparty fails to meet a debt obligation when it falls due. Credit risk is also called default risk, that is, the possibility of loss arising because a borrower, securities issuer, or transaction counterparty is unwilling or unable, for various reasons, to fulfil the terms of a contract.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Referring to FIG. 1, FIG. 1 shows a first embodiment of the credit risk identification method, which is applied to a community network containing overlapping communities, where the overlapping communities comprise at least reference sample points and sample points to be identified; the method comprises the following steps:
Step S110: acquiring the overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified.
Specifically, the overlapping community structure of the reference sample point and the sample point to be identified may be the relationships and structural features between them within the overlapping community; the self-features of the reference sample point may be the data features contained in the reference sample point itself; and the self-features of the sample point to be identified may be the data features contained in the sample point to be identified itself. The reference sample points may be all sample points in the overlapping community other than the sample point to be identified, or only some of them; this is not limited herein. In addition, the overlapping community structure, the self-features of the reference sample point, and the self-features of the sample point to be identified are all obtained from the current state of the overlapping community under test.
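As an illustration of step S110, the sketch below represents overlapping communities as a mapping from a community id to its member sample points, takes as reference sample points all other members of the communities containing the target (the first of the two options above), and computes a few simple structural descriptors. The data and the choice of descriptors (membership count, degree, number of reference points, number of points in the overlap) are hypothetical.

```python
from collections import Counter

# Hypothetical overlapping communities (community id -> member sample points) and edges.
COMMUNITIES = {
    "A": {"t", "r1", "r2"},
    "B": {"t", "r2", "r3", "r4"},
    "C": {"r5", "r6"},
}
EDGES = [("t", "r1"), ("t", "r2"), ("r1", "r2"), ("t", "r3"), ("r3", "r4"), ("r2", "r4")]


def memberships(node, communities):
    """Ids of all communities that contain the node."""
    return sorted(cid for cid, members in communities.items() if node in members)


def reference_points(target, communities):
    """All other sample points that share at least one community with the target."""
    refs = set()
    for cid in memberships(target, communities):
        refs |= communities[cid]
    refs.discard(target)
    return refs


def structure_features(target, communities, edges):
    """Simple descriptors of the target's position in its overlapping communities."""
    comms = memberships(target, communities)
    neighbours = {v for u, v in edges if u == target} | {u for u, v in edges if v == target}
    # Count how many of the target's communities each other point belongs to.
    counts = Counter(p for cid in comms for p in communities[cid] if p != target)
    return {
        "n_communities": len(comms),
        "degree": len(neighbours),
        "n_reference_points": len(reference_points(target, communities)),
        "n_points_in_overlap": sum(1 for c in counts.values() if c > 1),
    }


print(structure_features("t", COMMUNITIES, EDGES))
```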
Step S120: combining the overlapping community structure, the self-features of the reference sample point, and the self-features of the sample point to be identified to generate the community features of the community in which the sample point to be identified is located.
Specifically, the overlapping community structure, the self-features of the reference sample point, and the self-features of the sample point to be identified may be combined to generate a description of the features of the community to which the sample point to be identified belongs. A sample point to be identified may belong to several communities at once; for example, it may be contained in both community A and community B, in which case A ∩ B is an overlapping community. In addition, an overlapping community may contain more than one sample point to be identified.
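A minimal sketch of step S120, assuming each sample point's self-features are a fixed-length numeric vector: for every community containing the target, the community size (a structural feature) and the mean self-features of its reference sample points are computed; these per-community vectors are then aggregated across the overlapping communities with element-wise mean and max, and the target's own self-features are appended. The aggregation functions, the feature layout, and the data are assumptions, not the combination method prescribed here.

```python
from statistics import mean

# Hypothetical data: communities containing the target "t", and each point's
# self-features as [monthly_income_thousands, overdue_count].
COMMUNITIES = {"A": {"t", "r1", "r2"}, "B": {"t", "r2", "r3", "r4"}}
SELF_FEATURES = {
    "t": [4.0, 1.0],
    "r1": [6.0, 0.0],
    "r2": [3.0, 2.0],
    "r3": [5.0, 0.0],
    "r4": [2.0, 3.0],
}


def community_features(target, communities, self_features):
    """Combine structure, reference-point self-features and target self-features."""
    per_community = []
    for cid in sorted(communities):
        members = communities[cid]
        if target not in members:
            continue
        refs = sorted(members - {target})
        if refs:
            ref_mean = [mean(col) for col in zip(*(self_features[r] for r in refs))]
        else:
            ref_mean = [0.0] * len(self_features[target])
        # Structural part (community size) followed by the reference points' mean self-features.
        per_community.append([float(len(members))] + ref_mean)
    if not per_community:
        raise ValueError("target does not belong to any community")
    # Aggregate across the overlapping communities the target belongs to.
    agg_mean = [mean(col) for col in zip(*per_community)]
    agg_max = [max(col) for col in zip(*per_community)]
    return [float(len(per_community))] + agg_mean + agg_max + list(self_features[target])


print(community_features("t", COMMUNITIES, SELF_FEATURES))
```

The output layout is `[number of communities, mean over communities, max over communities, target self-features]`; a fixed-length vector like this can be fed to the variable derivation step regardless of how many communities the target belongs to.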
Step S130: performing a variable derivation operation based on the community features to generate new sample features.
Specifically, variable derivation extracts as many features as possible from raw data, looking for features that may significantly affect the decision target, for use by algorithms and models. It combines domain knowledge, intuition and algorithmic logic to derive additional variables from the original data, so that the characteristics or behavior of the target can be described in finer detail. In this embodiment, variable derivation is performed to extract as many features as possible from the community features, generating new sample features.
When the variable derivation operation is performed on the community features, the community features are first parsed into standard tables, which are divided according to the recorded features into a static-information standard table and a dynamic-information standard table. Next, the data items of the standard tables are processed with both business logic and algorithmic logic in mind, so as to cover as many derived variables as possible. During derivation, category variables may be reduced: for example, an original education variable with the values primary school, junior high school, senior high school, junior college, undergraduate, master's, and doctorate may be reduced to three categories, namely senior high school and below, junior college/undergraduate, and postgraduate and above. Numerical variables may also be binned: for example, an original continuous age variable ranging from 18 to 60 may be binned into 18-25, 26-35, 36-45, and 46 and above. In this embodiment, numerical variables and category variables may further be cross-derived to generate more derived variables.
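The category reduction, binning, and cross-derivation examples above can be expressed as the following sketch. The group labels and bin edges follow the text; the function names, the `46+` label, and the string encoding of the cross feature are hypothetical choices.

```python
# Category reduction: seven education levels collapsed into three groups (from the text).
EDUCATION_GROUPS = {
    "primary school": "senior high school and below",
    "junior high school": "senior high school and below",
    "senior high school": "senior high school and below",
    "junior college": "junior college/undergraduate",
    "undergraduate": "junior college/undergraduate",
    "master's": "postgraduate and above",
    "doctorate": "postgraduate and above",
}

# Numerical binning: inclusive upper bound and label for each closed age bin.
AGE_BINS = [(25, "18-25"), (35, "26-35"), (45, "36-45")]


def bin_age(age: float) -> str:
    if age < 18:
        raise ValueError("the example age variable starts at 18")
    for upper, label in AGE_BINS:
        if age <= upper:
            return label
    return "46+"


def derive(record: dict) -> dict:
    education_group = EDUCATION_GROUPS[record["education"]]
    age_bin = bin_age(record["age"])
    return {
        "education_group": education_group,
        "age_bin": age_bin,
        # Cross derivation: one new category per (age bin, education group) pair.
        "age_bin_x_education": f"{age_bin}|{education_group}",
    }


print(derive({"education": "undergraduate", "age": 30}))
```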
Step S140: inputting the new sample features into the risk identification model to obtain the default probability of the sample point to be identified.
Specifically, the risk identification model may be a classification model, for example a neural-network-based classification model. The new sample features are input into the risk identification model to obtain the default probability of the sample point to be identified. The higher this default probability, the higher the default risk of the corresponding user, and the actual lending process should then be monitored continuously to reduce financial risk.
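The sketch below shows how a neural-network-based classifier might map new sample features to a default probability: one hidden ReLU layer followed by a sigmoid output. The weights are hypothetical placeholders standing in for parameters learned during training, and the 0.5 monitoring threshold is an assumed value.

```python
import math

# Hypothetical parameters of a tiny trained network: 3 inputs -> 2 ReLU units -> 1 sigmoid output.
W1 = [[0.8, -0.5, 0.3], [-0.2, 0.9, 0.4]]
B1 = [0.1, -0.1]
W2 = [1.2, -0.7]
B2 = -0.3


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def default_probability(features: list[float]) -> float:
    """Forward pass: new sample features -> default probability in (0, 1)."""
    hidden = [relu(sum(w * f for w, f in zip(row, features)) + b) for row, b in zip(W1, B1)]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + B2)


def needs_monitoring(probability: float, threshold: float = 0.5) -> bool:
    """Flag sample points whose default probability reaches the (assumed) threshold."""
    return probability >= threshold


p = default_probability([1.0, 0.0, 0.5])
print(round(p, 4), needs_monitoring(p))
```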
This embodiment has the following beneficial effects. The overlapping community structure of each sample point in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified are acquired and combined to generate the community features of the community in which the sample point to be identified is located. Because these community features aggregate the overlapping community structure with the self-features of both the reference sample point and the sample point to be identified, they describe the community more comprehensively and accurately.
The community features are then converted into new sample features by the variable derivation operation and input into the risk identification model, which computes the default probability of the sample point to be identified. The invention thereby improves the accuracy of credit risk identification.
Referring to FIG. 2, FIG. 2 shows a second embodiment of the credit risk identification method of the present application, in which the method comprises:
Step S210: acquiring the overlapping community structure of the reference sample point and the sample point to be identified in the overlapping community, the self-features of the reference sample point, and the self-features of the sample point to be identified.
Step S220: combining the overlapping community structure, the self-features of the reference sample point, and the self-features of the sample point to be identified to generate the community features of the community in which the sample point to be identified is located.
Step S230: performing a variable derivation operation based on the community features to generate new sample features.
Step S240: constructing the risk identification model.
Specifically, the risk identification model may be a classification model, and the classification algorithm it uses may be, for example, NBC (Naive Bayesian Classification), LR (Logistic Regression), the ID3 (Iterative Dichotomiser 3) decision tree algorithm, the C4.5 decision tree algorithm, the C5.0 decision tree algorithm, SVM (Support Vector Machine), KNN (K-Nearest Neighbor), or ANN (Artificial Neural Network).
Step S250: inputting the new sample features into the risk identification model to obtain the default probability of the sample point to be identified.
Compared with the first embodiment, the second embodiment additionally includes step S240; the other steps have been described in the first embodiment and are not repeated here.
This embodiment has the following beneficial effects: constructing an accurate risk identification model further improves the accuracy of the default probability computed for the sample points to be identified, so that targeted risk control can be applied in advance to users with a high default probability.
Referring to FIG. 3, FIG. 3 shows the detailed steps of step S240 in the second embodiment of the credit risk identification method of the present application, where constructing the risk identification model specifically comprises:
Step S241: acquiring historical sample data of an overlapping training community, and parsing the historical sample data to obtain training sample points, wherein the training sample points are sample points having default labels, each default label indicating whether the sample point is known to have defaulted.
Specifically, the historical sample data may be the historical data of the overlapping training community, from which sample points with default labels are screened out to serve as training sample points. The overlapping training community is not limited to the overlapping community used for prediction in the present invention and may be a different overlapping community.
Specifically, the default label may be encoded as follows: if a training sample point has defaulted, its default label is 1; if it has not defaulted, its default label is 0.
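The screening and 0/1 labelling described above can be sketched as follows, assuming each historical record carries an `outcome` field that is `"default"`, `"repaid"`, or `None` when the outcome is not yet known (these field names and values are hypothetical):

```python
def training_sample_points(records: list[dict]) -> dict[str, int]:
    """Keep sample points whose outcome is known and encode the default label as 1 or 0."""
    label_map = {"default": 1, "repaid": 0}
    return {r["id"]: label_map[r["outcome"]] for r in records if r.get("outcome") in label_map}


history = [
    {"id": "p1", "outcome": "default"},
    {"id": "p2", "outcome": "repaid"},
    {"id": "p3", "outcome": None},  # outcome not yet known: excluded from training
]
print(training_sample_points(history))  # {'p1': 1, 'p2': 0}
```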
Step S242: acquiring an overlapping community structure of the training sample points in the overlapping training community and self-training characteristics of the training sample points;
Specifically, the overlapping community structure of the training sample points may be the feature relationships and structural relationships among the training sample points in the overlapping training community.
Specifically, the training data is input into the classification model for training, and the trained risk identification model is generated through operations such as error back propagation and parameter tuning.
It should further be noted that, in one embodiment, the risk identification model may also be generated by inputting the overlapping community structure of the reference sample points and the training sample points in the overlapping community, the self-training features of the training sample points, and the default labels of the training sample points directly into a classification model for training.
Step S243: and generating the community training characteristics of the community in which the overlapping community is located by combining the overlapping community structure of the training sample points and the self training characteristics of the training sample points.
Specifically, the overlapping community structure of the training sample points and the training characteristics of the training sample points are combined to generate the community training characteristics of the community in which the overlapping community belongs, wherein the community to which the overlapping community belongs may include a plurality of communities.
Step S244: executing a variable derivation operation based on the community training characteristics to generate new sample training characteristics.
Specifically, a further variable derivation operation is performed on the community training features, that is, additional useful features are mined from the community training features to serve as new sample training features. This increases the number of training features available, so that the generated risk identification model is more accurate.
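The specific derived variables are not listed in the application. As a hypothetical illustration, the sketch below derives new variables by contrasting a sample point's own features with its community features (difference and ratio) and by summarizing its overlap (number of communities, strongest membership coefficient). All names and derived variables are assumptions.

```python
from typing import Dict, List


def derive_variables(self_feats: List[float], comm_feats: List[float],
                     membership: Dict[int, float]) -> List[float]:
    """Hypothetical derivation: originals plus contrasts and overlap statistics."""
    # Difference between the sample point and its community, per feature.
    diffs = [s - c for s, c in zip(self_feats, comm_feats)]
    # Ratio to the community value; 0.0 when the community value is zero.
    ratios = [s / c if c != 0 else 0.0 for s, c in zip(self_feats, comm_feats)]
    # Overlap statistics: number of communities and strongest membership.
    n_communities = float(len(membership))
    strongest = max(membership.values(), default=0.0)
    return [*self_feats, *comm_feats, *diffs, *ratios, n_communities, strongest]
```

Two input features become ten output variables here, which is the "more features from the same data" effect the paragraph describes.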
Step S245: inputting the new sample training features and the default labels of the training sample points into a classification model for training, and generating the risk identification model.
Specifically, the new sample training features are input into the classification model as training features, and the default labels of the training sample points are input as supervision data, thereby generating the risk identification model. When the new sample features of a sample point to be identified are then input into the generated risk identification model, the model outputs the default probability of that sample point.
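The classification model is left open in the application. Purely as an assumption, the sketch below uses logistic regression trained by batch gradient descent (a swapped-in simple choice; tree ensembles or neural networks would fit the same description), with the 0/1 default labels as supervision and the sigmoid output read as the default probability. Function names are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.1,
                   epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss.

    X holds the new sample training features; y holds the 0/1 default labels,
    which act as the supervision signal.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def default_probability(x: List[float], w: List[float], b: float) -> float:
    """Model output for one sample point, read as its default probability."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Training on a toy set where high feature values default, a sample near the defaulting end then receives a probability above 0.5 and one near the other end below 0.5.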
In the above embodiment, the advantageous effects are as follows: the specific steps of constructing the risk identification model are given, and new sample training features are added while the default labels of the training sample points serve as supervision data, which ensures the prediction performance of the risk identification model and improves the accuracy of the calculated default probability of the sample points to be identified.
Referring to fig. 4, fig. 4 shows the specific implementation of step S110 in the first embodiment of the credit risk identification method of the present application, in which obtaining the self-characteristics of the sample point to be identified comprises:
Step S111: acquiring the self data of the sample point to be identified.
Specifically, in one embodiment, the self data includes at least one of: basic information, historical credit information, and transaction information of the sample point to be identified. Usable self data includes, but is not limited to, basic user information, the user's borrowing and repayment behavior with the institution itself, and data from third-party institutions. It should be noted that the more information the self data contains, the higher the accuracy of the risk identification model generated by training.
Step S112: performing a preset operation on the self data to generate the self-characteristics of the sample point to be identified.
Specifically, the preset operation may comprise preprocessing, feature extraction, and variable derivation, where the preprocessing may include data cleansing and normalization operations.
In the above embodiment, the advantageous effect is that the self data of the sample points to be identified are converted into self-characteristics correctly and comprehensively, which ensures the effectiveness of the risk identification model.
Referring to fig. 5, fig. 5 shows the specific implementation of step S112 of the credit risk identification method of the present application, in which performing the preset operation on the self data to generate the self-characteristics of the sample point to be identified comprises:
Step S1121: performing a preprocessing operation on the self data to generate preprocessed self data.
Specifically, preprocessing may include data cleansing and normalization. Data cleansing is the process of re-examining and verifying data in order to delete duplicate information, correct existing errors, and ensure data consistency. Normalization may format the data and standardize it according to the input format of the risk identification model, so that the generated features can be input into the risk identification model for prediction or training.
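A minimal sketch of this preprocessing, assuming each record is a dict of numeric fields: duplicate records are removed, missing values are filled with the column mean (one common cleansing choice, assumed here), and each column is min-max scaled to [0, 1] as a stand-in for "the model's input format". Field names and the imputation and scaling choices are illustrative assumptions.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def preprocess(records: List[Record], columns: List[str]) -> List[Dict[str, float]]:
    """Cleanse (deduplicate, impute) and min-max normalize numeric columns."""
    # Cleansing 1: drop records that duplicate an earlier one on all columns.
    seen, unique = set(), []
    for r in records:
        key = tuple(r.get(c) for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    if not unique:
        return []
    # Cleansing 2: fill missing values with the mean of the observed values.
    means = {}
    for c in columns:
        observed = [r[c] for r in unique if r.get(c) is not None]
        means[c] = sum(observed) / len(observed) if observed else 0.0
    filled = [{c: r[c] if r.get(c) is not None else means[c] for c in columns} for r in unique]
    # Normalization: rescale each column to [0, 1] (constant columns map to 0).
    for c in columns:
        lo = min(r[c] for r in filled)
        hi = max(r[c] for r in filled)
        for r in filled:
            r[c] = (r[c] - lo) / (hi - lo) if hi > lo else 0.0
    return filled
```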
Step S1122: performing a feature extraction operation on the preprocessed self data to generate initial features.
Specifically, the feature extraction operation may extract the key data from the preprocessed self data to generate the initial features.
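As a hypothetical illustration of key-data extraction, the sketch below reduces a preprocessed record, assumed to hold a few profile fields and a list of transaction amounts, to a small set of initial features. The field names (`age`, `credit_limit`, `transactions`) are invented for the example.

```python
from typing import Any, Dict


def extract_initial_features(record: Dict[str, Any]) -> Dict[str, float]:
    """Keep key profile fields and summarize the transaction history."""
    tx = [float(a) for a in record.get("transactions", [])]
    return {
        "age": float(record["age"]),
        "credit_limit": float(record["credit_limit"]),
        "tx_count": float(len(tx)),
        "tx_total": float(sum(tx)),
        "tx_max": max(tx, default=0.0),  # 0.0 when there are no transactions
    }
```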
Step S1123: performing variable derivation on the initial features to generate the self-characteristics of the sample point to be identified.
Specifically, variable derivation on the initial features expands them in number and range, so that the generated self-characteristics of the sample point to be identified are more comprehensive and specific.
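A hypothetical sketch of variable derivation on initial features: ratio, log-transformed and binned variables are added alongside the originals, expanding the features in both number and range. The input keys, derived variables and age bucket edges are assumptions, not part of the application.

```python
import math
from bisect import bisect_right
from typing import Dict, Sequence


def derive_self_features(initial: Dict[str, float],
                         age_edges: Sequence[float] = (25.0, 35.0, 50.0)) -> Dict[str, float]:
    """Expand initial features with ratio, log and binned variables."""
    derived = dict(initial)  # keep the original variables
    limit, count, total = initial["credit_limit"], initial["tx_count"], initial["tx_total"]
    # Ratios relate existing variables to each other.
    derived["spend_to_limit"] = total / limit if limit > 0 else 0.0
    derived["avg_tx_amount"] = total / count if count > 0 else 0.0
    # A log transform compresses heavy-tailed amounts (assumes non-negative totals).
    derived["log_tx_total"] = math.log1p(total)
    # Binning turns a continuous variable into an ordinal bucket index.
    derived["age_bucket"] = float(bisect_right(age_edges, initial["age"]))
    return derived
```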
In the above embodiment, the advantageous effect is that the self-characteristics of the sample points to be identified, generated through the variable derivation operation, are more comprehensive and specific, which ensures the training effect of the risk identification model and improves the accuracy of the default probability of the sample points to be identified.
Referring to fig. 6, fig. 6 shows a third embodiment of the credit risk identification method of the present application, in which the method comprises:
Step S310: acquiring the overlapping communities of the sample point to be identified based on an overlapping community discovery algorithm.
Specifically, the overlapping community discovery algorithm may be a clique-percolation-based method, a link-partitioning-based method, a seed-node-based local expansion method, a label-propagation-based method, or a fuzzy-clustering-based method. This embodiment uses a label-propagation-based method, the COPRA algorithm. In COPRA, each node carries a set of labels, each pairing a community number to which the node may belong with the corresponding membership coefficient. In each iteration, every node recomputes its labels from those of its neighbor nodes, deletes the communities whose membership coefficient is smaller than a preset threshold, and normalizes the remaining coefficients so that the membership coefficients of each node sum to 1.
In each iteration, the membership coefficients of node x are updated as:
$$b_t(c, x) = \frac{\sum_{y \in N(x)} b_{t-1}(c, y)}{|N(x)|}$$
where b_t(c, x) represents the membership coefficient of node x to community c at the t-th iteration, and N(x) is the set of neighbor nodes of node x. Finally, when the iteration terminates, the community labels of all nodes are counted to find the communities to which each node belongs and the overlapping nodes.
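The label-propagation procedure described here can be sketched in a few dozen lines. This is a simplified COPRA-style sketch, not the reference COPRA implementation. Assumptions: the threshold is 1/v as in the original COPRA (v, the maximum number of communities per node, defaults to 2 here); a node whose labels all fall below the threshold keeps only its strongest label; iteration stops when no node's labels change or after `max_iter` rounds; and nodes are updated in place in a fixed order (asynchronously) rather than strictly from the previous iteration's labels, a deliberate deviation that avoids the oscillation synchronous propagation can show on small cliques. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]            # node -> neighbour set N(x)
Labels = Dict[Hashable, Dict[Hashable, float]]   # node -> {community c: b(c, x)}


def copra(graph: Graph, v: int = 2, max_iter: int = 100) -> Labels:
    """Simplified COPRA-style overlapping label propagation."""
    threshold = 1.0 / v  # labels with a coefficient below 1/v are deleted
    labels: Labels = {x: {x: 1.0} for x in graph}  # each node starts in its own community
    order = sorted(graph, key=repr)  # fixed visiting order for reproducible results
    for _ in range(max_iter):
        changed = False
        for x in order:
            nbrs = sorted(graph[x], key=repr)
            if not nbrs:
                continue  # isolated node keeps its own label
            # Sum neighbours' coefficients per community and divide by |N(x)|.
            acc: Dict[Hashable, float] = defaultdict(float)
            for y in nbrs:
                for c, b in labels[y].items():
                    acc[c] += b / len(nbrs)
            # Delete weak communities; the tiny tolerance absorbs round-off at the threshold.
            kept = {c: b for c, b in acc.items() if b >= threshold - 1e-12}
            if not kept:  # every label is weak: keep only the strongest one
                best = max(acc, key=lambda c: (acc[c], repr(c)))
                kept = {best: acc[best]}
            # Normalize so that the node's coefficients sum to 1.
            total = sum(kept.values())
            new = {c: b / total for c, b in kept.items()}
            old = labels[x]
            if new.keys() != old.keys() or any(abs(new[c] - old[c]) > 1e-9 for c in new):
                changed = True
            labels[x] = new  # in-place (asynchronous) update
        if not changed:
            break
    return labels


def communities(labels: Labels) -> Tuple[Dict[Hashable, Set[Hashable]], Set[Hashable]]:
    """Group nodes by community label and list nodes carrying several labels."""
    groups: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for x, lab in labels.items():
        for c in lab:
            groups[c].add(x)
    overlapping = {x for x, lab in labels.items() if len(lab) > 1}
    return dict(groups), overlapping
```

On two disconnected triangles, for example, each triangle settles on a single label and no node is reported as overlapping; nodes that bridge dense groups are the ones that can keep two labels and appear in the overlapping set.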
Step S320: acquiring an overlapping community structure of the reference sample point and the sample point to be identified in an overlapping community, the self-characteristics of the reference sample point, and the self-characteristics of the sample point to be identified.
Step S330: generating the community characteristics of the community where the sample point to be identified is located by combining the overlapping community structure, the self-characteristics of the reference sample point, and the self-characteristics of the sample point to be identified.
Step S340: executing a variable derivation operation based on the community characteristics to generate new sample characteristics.
Step S350: inputting the new sample characteristics into a risk identification model to obtain the default probability of the sample point to be identified.
Compared with the first embodiment, the third embodiment additionally includes step S310; the other steps have been described in the first embodiment and are not repeated here.
In the above embodiment, the advantageous effect is that the overlapping community discovery algorithm accurately identifies the overlapping communities in which the sample point to be identified is located, which ensures the accuracy of the acquired overlapping community structure and hence the accuracy of the calculated default probability of the sample point to be identified.
Referring to fig. 7, fig. 7 shows a fourth embodiment of the credit risk identification method of the present application, which comprises:
Step S410: acquiring an overlapping community structure of the reference sample point and the sample point to be identified in an overlapping community, the self-characteristics of the reference sample point, and the self-characteristics of the sample point to be identified.
Step S420: generating the community characteristics of the community where the sample point to be identified is located by combining the overlapping community structure, the self-characteristics of the reference sample point, and the self-characteristics of the sample point to be identified.
Step S430: executing a variable derivation operation based on the community characteristics to generate new sample characteristics.
Step S440: inputting the new sample characteristics into a risk identification model to obtain the default probability of the sample point to be identified.
Step S450: ranking the default probabilities of the sample points to be identified.
Specifically, the default probabilities of the sample points to be identified are sorted in descending order, and a preset number of top-ranked sample points to be identified are tracked and monitored, which provides early warning and helps reduce credit risk.
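Step S450 amounts to a descending sort followed by a cut at a preset count. A minimal sketch, assuming the default probabilities are held in a dict keyed by a hypothetical sample id and that the preset number is passed as `n`:

```python
from typing import Dict, List, Tuple


def top_risk(probabilities: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Sort by default probability, highest first, and keep the first n sample points.

    Ties are broken by sample id so the monitoring list is reproducible.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```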
Compared with the first embodiment, the fourth embodiment additionally includes step S450; the other steps have been described in the first embodiment and are not repeated here.
In the above embodiment, the advantageous effect is that, by ranking the default probabilities of the sample points to be identified, high-risk sample points can be monitored in a targeted, dynamic, and real-time manner, which reduces credit risk and safeguards financial security.
The application also provides a credit risk identification apparatus, which comprises:
a feature acquisition module, configured to obtain the overlapping community structure of the reference sample point and the sample point to be identified in an overlapping community, the self-characteristics of the reference sample point, and the self-characteristics of the sample point to be identified;
a feature combination module, configured to generate the community characteristics of the community where the sample point to be identified is located by combining the overlapping community structure, the self-characteristics of the reference sample point, and the self-characteristics of the sample point to be identified;
an optimized feature generation module, configured to execute a variable derivation operation based on the community characteristics to generate new sample characteristics;
and a risk identification module, configured to input the new sample characteristics into a risk identification model to obtain the default probability of the sample point to be identified.
As shown in fig. 8, the apparatus 20 includes a feature acquisition module 21, a feature combination module 22, an optimized feature generation module 23, and a risk identification module 24. The apparatus can perform the methods of the embodiments shown in fig. 1 to fig. 7; for parts not described in detail in this embodiment, and for the implementation process and technical effects of this technical solution, reference may be made to the related descriptions of the embodiments shown in fig. 1 to fig. 7, which are not repeated here.
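Structurally, apparatus 20 chains its four modules in the order of the method steps. The sketch below mirrors that arrangement with plain callables; the class name, field names and the toy stand-in functions used in the test are hypothetical and only show how each module hands its output to the next.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class CreditRiskApparatus:
    acquire_features: Callable[[str], Any]                 # feature acquisition module 21
    combine_features: Callable[[Any], List[float]]         # feature combination module 22
    derive_features: Callable[[List[float]], List[float]]  # optimized feature generation module 23
    identify_risk: Callable[[List[float]], float]          # risk identification module 24

    def default_probability(self, sample_id: str) -> float:
        """Run the four modules in sequence for one sample point to be identified."""
        acquired = self.acquire_features(sample_id)
        community = self.combine_features(acquired)
        new_features = self.derive_features(community)
        return self.identify_risk(new_features)
```

Keeping each module as a separate callable makes it possible to swap, for example, the community discovery or the classification model without touching the rest of the pipeline.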
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the credit risk identification method of any one of the above embodiments.
The present application further provides a computer storage medium storing a credit risk identification program which, when executed by a processor, implements the steps of any of the above credit risk identification methods.
The application also provides credit risk identification equipment, comprising a memory, a processor, and a credit risk identification program stored in the memory and executable on the processor, wherein the processor implements the steps of any of the above credit risk identification methods when executing the program.
As shown in fig. 9, the credit risk identification device 010 of the present application comprises at least one processor 012 and a memory 011.
The processor 012 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the method may be completed by integrated hardware logic circuits in the processor 012 or by instructions in the form of software. The processor 012 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or any conventional processor. A software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or registers. The storage medium is located in the memory 011, and the processor 012 reads the information in the memory 011 and completes the steps of the method in combination with its hardware.
It is to be understood that the memory 011 in embodiments of the present invention may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 011 of the systems and methods described in the embodiments of the invention is intended to include, without being limited to, these and any other suitable types of memory.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.