Method for mining the maximum signed θ-clique in a signed network

Document No.: 9163. Publication date: 2021-09-17.

1. A method for mining a maximum signed θ-clique in a signed network, characterized in that the maximum signed θ-clique satisfies four conditions: for every vertex, the number of positive neighbors minus the number of negative neighbors is greater than or equal to θ; it is maximal, i.e., no proper supergraph of it is a signed θ-clique; it has the largest number of vertices; and it is a clique, i.e., every pair of its vertices is adjacent; the method comprising:

filtering unnecessary vertices and edges out of the graph G by means of three pruning strategies, including:

Lemma 1: vertex-based pruning rule: for a vertex u in a signed θ-clique, the number of positive neighbors of u is greater than or equal to θ;

Lemma 2: edge-based pruning rule: for two vertices u and v in a signed θ-clique S, if the edge (u, v) connecting them is a positive edge, then this edge is contained in at least (θ − 1) positive triangles, a positive triangle being a triangle whose three edges are all positive;

Lemma 3: edge-based pruning rule: for two vertices u and v in a signed θ-clique S, if the edge (u, v) connecting them is a negative edge, then this edge is contained in at least (θ + 1) negative triangles, a negative triangle being a triangle with two positive edges and one negative edge;

rapidly mining the maximum signed θ-clique in a large signed network by means of a greedy heuristic algorithm for the maximum signed θ-clique, comprising:

step one, filtering out of the graph G, according to Lemma 1, the vertices that do not satisfy the condition, to obtain a new graph G′;

step two, counting, according to Lemma 2, the positive triangles that contain each positive edge of G′ and, according to Lemma 3, the negative triangles that contain each negative edge of G′, and deleting from G′ the edges that do not satisfy the conditions, thereby obtaining a new graph A;

step three, removing the isolated vertices from graph A;

step four, greedily coloring the vertices of graph A with a coloring algorithm;

step five, finding, by a recursive algorithm on the graph A colored in step four, the maximum signed θ-clique MaxC, i.e., the subgraph of the signed network to be mined.

2. The method for mining a maximum signed θ-clique in a signed network according to claim 1, wherein step four comprises: iteratively selecting from graph A the vertex with the largest number of neighbors and assigning it a color, and, if the currently selected vertex is adjacent to already colored vertices, assigning it a color different from those of its neighboring vertices.

3. The method for mining a maximum signed θ-clique in a signed network according to claim 1, wherein in step five a recursive algorithm is called to determine whether a signed θ-clique exists in a branch; the recursive algorithm receives two input parameters {S, C}, where S is a temporary result set, initially empty, and C is the set of candidate vertices that may be added to S, initially the vertices of graph A; the final result MaxC is initialized to the empty set; and the recursive algorithm comprises the following sub-steps:

(a) if the number of vertices in S plus the number of distinct colors among the colored vertices in C is less than the larger of |MaxC| + 1 and θ + 1, return 0 and proceed to step (d); otherwise proceed to step (b), where |MaxC| denotes the number of vertices in MaxC;

(b) if C is empty: if every vertex u in S has at least θ more positive neighbors than negative neighbors, set MaxC to S and return 1; otherwise return 0 and proceed to step (d);

(c) if C is not empty: if some vertex u in S is such that the number of positive neighbors of u plus the number of distinct colors among the colored positive neighbors of u is less than the number of negative neighbors of u plus θ, return 0 and proceed to step (d);

(d) set flag to 0 and sort the vertices of C in ascending order of their number of neighbors; then traverse the sorted vertices in order and, for each traversed vertex v: add v to S to form the new input parameter S, take the neighbors of v within C as the new input parameter C, execute steps (a) to (c) again on these inputs, add the returned value to flag, remove v from C, and execute steps (e) and (f);

(e) if flag equals 0 and some vertex u in S has a difference between its numbers of positive and negative neighbors smaller than θ, return 1 and proceed to step (d); otherwise proceed to step (f);

(f) if the number of vertices in S is greater than |MaxC|, set MaxC to S and return 1; otherwise return 0 and proceed to step (d).

4. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-3.

5. A storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of any one of claims 1-3.

Background

Social networks such as microblogs, Facebook and Twitter play a vital role in everyday life. With the rapid development of Internet and World Wide Web technologies, interest in social network research has grown, and this has greatly advanced the mining of dense subgraphs in social networks. Many dense subgraph models have been proposed, such as the k-core, the k-truss and the clique. These models are useful in protein structure prediction, error-correcting codes, network topology analysis, computer vision and combinatorial auctions. However, most existing studies consider only unsigned social networks, i.e., they treat every relationship as positive and all users as friends. In practice, interactions between users include both positive relationships (e.g., friends) and negative relationships (e.g., enemies). Both kinds of relationship exist in large social networks, and if negative relationships are ignored, dense subgraphs in a signed network may not be mined correctly. The signed θ-clique constrains the numbers of positive and negative neighbors of every vertex in the dense subgraph, ensuring that each vertex has more positive neighbors than negative neighbors. Mining the maximum signed θ-clique in a signed network makes the most important characteristics of the social network clearer and also helps greatly in maintaining the signed network. However, existing methods for mining the maximum signed θ-clique in a signed network are inefficient.

Disclosure of Invention

In a signed network, users may be related as friends or as enemies, and these relationships can significantly affect the stability of the network. The invention proposes a new model for signed networks, the maximum signed θ-clique, which satisfies four conditions: for every vertex, the number of positive neighbors minus the number of negative neighbors is greater than or equal to θ; it is maximal, i.e., no proper supergraph of it is a signed θ-clique; it has the largest number of vertices; and it is a clique, i.e., every pair of its vertices is adjacent.
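The clique condition and the neighbor-difference condition can be checked directly for a given vertex set. The following is a minimal sketch, not the patented procedure: the adjacency representation (`adj[u][v]` equal to +1 or −1), the function names and the toy graph are all assumptions made for illustration, and the maximality and maximum-size conditions are deliberately not checked.

```python
from itertools import combinations


def make_graph(edges):
    """Build a symmetric adjacency map from (u, v, sign) triples.

    Assumed representation: adj[u][v] is +1 for a positive edge
    and -1 for a negative edge.
    """
    adj = {}
    for u, v, sign in edges:
        adj.setdefault(u, {})[v] = sign
        adj.setdefault(v, {})[u] = sign
    return adj


def is_signed_theta_clique(adj, S, theta):
    """Check the clique condition and the neighbor-difference condition on S."""
    S = set(S)
    # Clique condition: every pair of vertices in S must be adjacent.
    for u, v in combinations(S, 2):
        if v not in adj.get(u, {}):
            return False
    # Difference condition: (#positive - #negative neighbors inside S) >= theta.
    # Summing the +1/-1 signs gives exactly that difference.
    for u in S:
        if sum(adj[u][v] for v in S if v != u) < theta:
            return False
    return True


# Hypothetical toy graph: positive triangle {1, 2, 3}; vertex 4 is joined
# to 1 and 2 by negative edges and to 3 by a positive edge.
toy = make_graph([(1, 2, 1), (1, 3, 1), (2, 3, 1),
                  (1, 4, -1), (2, 4, -1), (3, 4, 1)])
print(is_signed_theta_clique(toy, {1, 2, 3}, 2))
print(is_signed_theta_clique(toy, {1, 2, 3, 4}, 1))
```

In the toy graph the full vertex set is a clique, but vertex 4 has one positive and two negative neighbors, so it violates the difference condition for θ = 1.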

Based on the properties of the signed θ-clique, the invention proposes new pruning strategies that reduce the size of the candidate set more effectively. Combining these pruning strategies, the invention further develops an efficient algorithm, the MSCD algorithm, which can rapidly mine the maximum signed θ-clique in a large-scale signed network.

The object of the invention is achieved by the following technical solution: a method for mining a maximum signed θ-clique in a signed network, the method comprising:

filtering unnecessary vertices and edges out of the graph G by means of three pruning strategies, including:

Lemma 1: vertex-based pruning rule: for a vertex u in a signed θ-clique, the number of positive neighbors of u is greater than or equal to θ;

Lemma 2: edge-based pruning rule: for two vertices u and v in a signed θ-clique S, if the edge (u, v) connecting them is a positive edge, then this edge is contained in at least (θ − 1) positive triangles, a positive triangle being a triangle whose three edges are all positive;

Lemma 3: edge-based pruning rule: for two vertices u and v in a signed θ-clique S, if the edge (u, v) connecting them is a negative edge, then this edge is contained in at least (θ + 1) negative triangles, a negative triangle being a triangle with two positive edges and one negative edge;

rapidly mining the maximum signed θ-clique in a large signed network by means of a greedy heuristic algorithm for the maximum signed θ-clique, comprising:

step one, filtering out of the graph G, according to Lemma 1, the vertices that do not satisfy the condition, to obtain a new graph G′;

step two, counting, according to Lemma 2, the positive triangles that contain each positive edge of G′ and, according to Lemma 3, the negative triangles that contain each negative edge of G′, and deleting from G′ the edges that do not satisfy the conditions, thereby obtaining a new graph A;

step three, removing the isolated vertices from graph A;

step four, greedily coloring the vertices of graph A with a coloring algorithm;

step five, finding, by a recursive algorithm on the graph A colored in step four, the maximum signed θ-clique MaxC, i.e., the subgraph of the signed network to be mined.

Further, step four comprises: iteratively selecting from graph A the vertex with the largest number of neighbors and assigning it a color, and, if the currently selected vertex is adjacent to already colored vertices, assigning it a color different from those of its neighboring vertices.

Further, in step five a recursive algorithm is called to determine whether a signed θ-clique exists in a branch; the recursive algorithm receives two input parameters {S, C}, where S is a temporary result set, initially empty, and C is the set of candidate vertices that may be added to S, initially the vertices of graph A; the final result MaxC is initialized to the empty set; and the recursive algorithm comprises the following sub-steps:

(a) if the number of vertices in S plus the number of distinct colors among the colored vertices in C is less than the larger of |MaxC| + 1 and θ + 1, return 0 and proceed to step (d); otherwise proceed to step (b), where |MaxC| denotes the number of vertices in MaxC;

(b) if C is empty: if every vertex u in S has at least θ more positive neighbors than negative neighbors, set MaxC to S and return 1; otherwise return 0 and proceed to step (d);

(c) if C is not empty: if some vertex u in S is such that the number of positive neighbors of u plus the number of distinct colors among the colored positive neighbors of u is less than the number of negative neighbors of u plus θ, return 0 and proceed to step (d);

(d) set flag to 0 and sort the vertices of C in ascending order of their number of neighbors; then traverse the sorted vertices in order and, for each traversed vertex v: add v to S to form the new input parameter S, take the neighbors of v within C as the new input parameter C, execute steps (a) to (c) again on these inputs, add the returned value to flag, remove v from C, and execute steps (e) and (f);

(e) if flag equals 0 and some vertex u in S has a difference between its numbers of positive and negative neighbors smaller than θ, return 1 and proceed to step (d); otherwise proceed to step (f);

(f) if the number of vertices in S is greater than |MaxC|, set MaxC to S and return 1; otherwise return 0 and proceed to step (d).

The invention also provides a computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the above method for mining a maximum signed θ-clique in a signed network.

The invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above method for mining a maximum signed θ-clique in a signed network.

The beneficial effects of the invention are as follows. In a signed network, users may be related as friends or as enemies, and these relationships can significantly affect the stability of the network. The invention proposes a new model for signed networks, the maximum signed θ-clique, which satisfies four conditions: for every vertex, the number of positive neighbors minus the number of negative neighbors is greater than or equal to θ; it is maximal, i.e., no proper supergraph of it is a signed θ-clique; it has the largest number of vertices; and it is a clique, i.e., every pair of its vertices is adjacent. Based on the properties of the signed θ-clique, the invention proposes new pruning strategies that reduce the size of the candidate set more effectively, and combines them into an efficient MSCD algorithm that can rapidly mine the maximum signed θ-clique in a large-scale signed network. The method is therefore of great benefit both for mining dense subgraphs and for predicting the stability of relationships in signed networks.

Drawings

Fig. 1 is a flowchart of a method for mining a maximum signed θ-clique in a signed network according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an original signed network according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the maximum signed θ-clique mined from the original signed network according to an embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.

The method for mining the maximum signed θ-clique in a signed network comprises three novel pruning strategies and an efficient algorithm (the MSCD algorithm) for mining the maximum signed θ-clique in a signed network. Each part is described in detail below.

The three novel pruning strategies filter unnecessary vertices and edges out of the graph G and thereby significantly reduce the search space. They are as follows:

Lemma 1: vertex-based pruning rule: for a vertex u in a signed θ-clique, the number of positive neighbors of u is greater than or equal to θ.

Proof: by the definition of the maximum signed θ-clique, the number of positive neighbors of any vertex minus its number of negative neighbors is greater than or equal to θ. Since the number of negative neighbors is non-negative, the number of positive neighbors of any vertex is greater than or equal to θ. This proves the lemma.
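The vertex-based pruning rule above can be sketched as a peeling loop. This is an illustrative sketch under assumptions: the adjacency map `adj[u][v]` with values +1/−1 and the name `prune_vertices` are hypothetical, and the source does not say whether the rule is applied once or repeatedly. The sketch repeats it until no vertex changes, which remains safe because a vertex of a signed θ-clique keeps at least θ positive neighbors in any subgraph that still contains the clique.

```python
from collections import deque


def prune_vertices(adj, theta):
    """Repeatedly delete vertices with fewer than theta positive neighbors.

    adj[u][v] is +1 (positive edge) or -1 (negative edge); this layout is
    an assumption of the sketch. Returns a pruned copy of the graph.
    """
    g = {u: dict(nbrs) for u, nbrs in adj.items()}  # leave the input intact
    pos_deg = {u: sum(1 for s in nbrs.values() if s > 0)
               for u, nbrs in g.items()}
    queue = deque(u for u, d in pos_deg.items() if d < theta)
    removed = set(queue)
    while queue:
        u = queue.popleft()
        for v, sign in g[u].items():
            if v in removed:
                continue  # v is already scheduled for deletion
            del g[v][u]
            if sign > 0:
                pos_deg[v] -= 1
                if pos_deg[v] < theta:
                    removed.add(v)
                    queue.append(v)
        del g[u]
    return g


# Hypothetical example: positive triangle {1, 2, 3} with a pendant vertex 4.
example = {1: {2: 1, 3: 1, 4: 1}, 2: {1: 1, 3: 1}, 3: {1: 1, 2: 1}, 4: {1: 1}}
print(sorted(prune_vertices(example, 2)))
```

With θ = 2 only the pendant vertex is removed; with θ = 3 the removal cascades and the whole graph is deleted.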

Lemma 2: edge-based pruning rule: for two vertices u and v in a signed θ-clique S, if the edge (u, v) connecting them is a positive edge, then this edge is contained in at least (θ − 1) positive triangles, a positive triangle being a triangle whose three edges are all positive.

For convenience, we denote the number of positive neighbors of a vertex u within the signed θ-clique S by $d_S^+(u)$, the number of its negative neighbors within S by $d_S^-(u)$, and the number of vertices in S by $|S|$.

Proof: since S is a signed θ-clique, $d_S^+(u) + d_S^-(u) + 1 = |S|$ and $d_S^+(u) - d_S^-(u) \ge \theta$. Hence $d_S^+(u) \ge (|S| + \theta - 1)/2$. Let $S'$ be S with the vertices u and v removed. For vertex u, the number of positive neighbors in $S'$ is $d_{S'}^+(u) = d_S^+(u) - 1$; for vertex v, the number of negative neighbors in $S'$ is $d_{S'}^-(v) = |S| - 1 - d_S^+(v)$. The number of positive triangles containing the edge (u, v) is smallest in the extreme case in which all negative neighbors of v in $S'$ are positive neighbors of u in $S'$. Every positive neighbor of u that is not a negative neighbor of v is a common positive neighbor of u and v and therefore forms a positive triangle with the edge (u, v). Consequently, the number of positive triangles in S containing the edge (u, v) is at least $d_{S'}^+(u) - d_{S'}^-(v)$, and

$$d_{S'}^+(u) - d_{S'}^-(v) = \bigl(d_S^+(u) - 1\bigr) - \bigl(|S| - 1 - d_S^+(v)\bigr) = d_S^+(u) + d_S^+(v) - |S| \ge 2 \cdot \frac{|S| + \theta - 1}{2} - |S| = \theta - 1.$$

This proves the lemma.
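As a quick numeric check of the bound just derived, the following worked instance plugs in $|S| = 5$ and $\theta = 2$; these values are chosen here purely for illustration and do not come from the source.

```latex
\begin{align*}
d_S^+(u) + d_S^-(u) &= |S| - 1 = 4, \qquad d_S^+(u) - d_S^-(u) \ge \theta = 2
  \;\Longrightarrow\; d_S^+(u) \ge 3,\quad d_S^-(u) \le 1, \\
d_{S'}^+(u) &= d_S^+(u) - 1 \ge 2, \qquad d_{S'}^-(v) = d_S^-(v) \le 1, \\
\#\{\text{positive triangles containing } (u,v)\}
  &\ge d_{S'}^+(u) - d_{S'}^-(v) \ge 2 - 1 = 1 = \theta - 1 .
\end{align*}
```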

Lemma 3: edge-based pruning rule: for two vertices u and v in a signed θ-clique S, if the edge (u, v) connecting them is a negative edge, then this edge is contained in at least (θ + 1) negative triangles, a negative triangle being a triangle with two positive edges and one negative edge.

Proof: the argument is similar to that of Lemma 2. Since S is a signed θ-clique, $d_S^+(u) \ge (|S| + \theta - 1)/2$ and $d_S^+(v) \ge (|S| + \theta - 1)/2$. Let $S'$ be S with the vertices u and v removed. For vertex u, the number of positive neighbors in $S'$ is $d_{S'}^+(u) = d_S^+(u)$, because v is a negative neighbor of u. For vertex v, the number of negative neighbors in $S'$ is $d_{S'}^-(v) = d_S^-(v) - 1 = |S| - d_S^+(v) - 2$. The number of negative triangles in S containing the edge (u, v) is at least $d_{S'}^+(u) - d_{S'}^-(v)$, and

$$d_{S'}^+(u) - d_{S'}^-(v) = d_S^+(u) - \bigl(|S| - 2 - d_S^+(v)\bigr) \ge 2 \cdot \frac{|S| + \theta - 1}{2} - |S| + 2 = \theta + 1.$$

This proves the lemma.
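Both edge-based pruning rules count the same quantity: the common positive neighbors of the two endpoints. Only the sign of the edge (u, v) and the threshold differ. The sketch below applies both rules in a single pass; the adjacency map `adj[u][v]` with values +1/−1, the function names and the example graph are assumptions for illustration, vertex labels are assumed to be comparable, and the source does not state whether the pass is repeated.

```python
def common_positive_neighbors(adj, u, v):
    """Number of vertices w joined to both u and v by positive edges."""
    return sum(1 for w, sign in adj[u].items()
               if sign > 0 and adj[v].get(w, 0) > 0)


def prune_edges(adj, theta):
    """One pass of the two edge-based rules; returns a pruned copy.

    A positive edge needs at least theta - 1 positive triangles,
    a negative edge at least theta + 1 negative triangles.
    """
    g = {u: dict(nbrs) for u, nbrs in adj.items()}
    for u in adj:
        for v, sign in adj[u].items():
            if not u < v:
                continue  # visit each undirected edge once
            need = theta - 1 if sign > 0 else theta + 1
            # Supports are counted on the input graph, not the partial copy.
            if common_positive_neighbors(adj, u, v) < need:
                del g[u][v]
                del g[v][u]
    return g


# Hypothetical example graph.
example = {
    1: {2: 1, 3: 1, 4: -1, 5: -1},
    2: {1: 1, 3: 1, 4: 1},
    3: {1: 1, 2: 1, 4: 1},
    4: {1: -1, 2: 1, 3: 1},
    5: {1: -1},
}
pruned = prune_edges(example, 1)
print(4 in pruned[1], 5 in pruned[1])
```

For θ = 1 the negative edge (1, 4) survives because vertices 2 and 3 are common positive neighbors of its endpoints, whereas the negative edge (1, 5) has no such support and is deleted.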

The algorithm for efficiently mining the maximum signed θ-clique in a signed network (the MSCD algorithm) is based on the three novel pruning strategies above. As shown in Fig. 1, it comprises the following steps:

Step one: according to Lemma 1, filter out of the graph G the vertices that do not satisfy the condition, to obtain a new graph G′.

Step two: according to Lemma 2, count the positive triangles that contain each positive edge of G′; according to Lemma 3, count the negative triangles that contain each negative edge of G′; delete from G′ the edges that do not satisfy the conditions, thereby obtaining a new graph A.

Step three: remove the isolated vertices from graph A.

Step four: greedily color the vertices of graph A with a coloring algorithm. Specifically, iteratively select from graph A the vertex with the largest number of neighbors and assign it a color; if the currently selected vertex is adjacent to already colored vertices, assign it a color different from those of its neighboring vertices.
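The greedy coloring step can be sketched as follows. The sketch assumes the same hypothetical adjacency map as before and picks the smallest color index not used by an already colored neighbor; the source only requires that the color differ from those of the neighbors, so the "smallest free index" choice is an assumption. Edge signs are ignored, since coloring depends only on adjacency.

```python
def greedy_coloring(adj):
    """Color vertices in descending order of degree.

    Each vertex receives the smallest color index not already used by one
    of its colored neighbors. Returns a dict mapping vertex -> color index.
    """
    color = {}
    for u in sorted(adj, key=lambda x: len(adj[x]), reverse=True):
        used = {color[v] for v in adj[u] if v in color}
        c = 0
        while c in used:
            c += 1
        color[u] = c
    return color


# Hypothetical example: a triangle {1, 2, 3} plus a pendant vertex 4.
graph = {1: {2: 1, 3: 1, 4: -1}, 2: {1: 1, 3: 1}, 3: {1: 1, 2: 1}, 4: {1: -1}}
colors = greedy_coloring(graph)
print(len(set(colors.values())))
```

The vertices of a clique are pairwise adjacent and therefore receive pairwise different colors, so the number of distinct colors in a candidate set is an upper bound on the size of any clique inside it. This is why sub-step (a) of the recursion compares a color count against |MaxC| + 1.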

Step five: on the graph A colored in step four, find the maximum signed θ-clique MaxC, i.e., the subgraph of the signed network to be mined, by a recursive algorithm. Specifically, a recursive algorithm is called to determine whether a signed θ-clique exists in a branch. The recursive algorithm receives two input parameters {S, C}, where S is a temporary result set, initially empty, and C is the set of candidate vertices that may be added to S, initially the vertices of graph A. The final result MaxC is initialized to the empty set. The recursive algorithm proceeds as follows:

(a) if the number of vertices in S plus the number of distinct colors among the colored vertices in C is less than the larger of |MaxC| + 1 and θ + 1, return 0 and proceed to step (d); otherwise proceed to step (b), where |MaxC| denotes the number of vertices in MaxC;

(b) if C is empty: if every vertex u in S has at least θ more positive neighbors than negative neighbors, set MaxC to S and return 1; otherwise return 0 and proceed to step (d);

(c) if C is not empty: if some vertex u in S is such that the number of positive neighbors of u plus the number of distinct colors among the colored positive neighbors of u is less than the number of negative neighbors of u plus θ, return 0 and proceed to step (d);

(d) set flag to 0 and sort the vertices of C in ascending order of their number of neighbors; then traverse the sorted vertices in order and, for each traversed vertex v: add v to S to form the new input parameter S, take the neighbors of v within C as the new input parameter C, execute steps (a) to (c) again on these inputs, add the returned value to flag, remove v from C, and execute steps (e) and (f);

(e) if flag equals 0 and some vertex u in S has a difference between its numbers of positive and negative neighbors smaller than θ, return 1 and proceed to step (d); otherwise proceed to step (f);

(f) if the number of vertices in S is greater than |MaxC|, set MaxC to S and return 1; otherwise return 0 and proceed to step (d).
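The recursion above can be approximated by a compact branch-and-bound search. The sketch below is a simplification and not the exact procedure of the source: it keeps the color bound of sub-step (a), the per-vertex bound of sub-step (c) and the ascending-degree traversal of sub-step (d), but replaces the flag and return-value bookkeeping of sub-steps (b), (e) and (f) with a direct check of every partial set S. The adjacency map `adj[u][v]` with values +1/−1, the function names, the toy graph, and the choice to count colors in sub-step (c) only over candidates in C are assumptions of this sketch.

```python
def greedy_coloring(adj):
    """Degree-descending greedy coloring (smallest free color index)."""
    color = {}
    for u in sorted(adj, key=lambda x: len(adj[x]), reverse=True):
        used = {color[v] for v in adj[u] if v in color}
        color[u] = next(c for c in range(len(adj)) if c not in used)
    return color


def max_signed_theta_clique(adj, theta):
    """Return a largest vertex set that is a signed theta-clique (sketch)."""
    color = greedy_coloring(adj)
    best = set()

    def search(S, C):
        nonlocal best
        # Bound from sub-step (a): |S| + #colors in C limits any extension.
        if len(S) + len({color[v] for v in C}) < max(len(best) + 1, theta + 1):
            return
        feasible = True
        for u in S:
            pos = sum(1 for v in S if v != u and adj[u][v] > 0)
            neg = len(S) - 1 - pos
            # Bound from sub-step (c): positive gain is limited by the number
            # of colors among u's positive neighbors in C (assumed scope).
            reach = len({color[v] for v in C if adj[u][v] > 0})
            if pos + reach < neg + theta:
                return
            if pos - neg < theta:
                feasible = False
        if S and feasible and len(S) > len(best):
            best = set(S)
        # Traversal from sub-step (d): ascending number of neighbors.
        C = set(C)
        for v in sorted(C, key=lambda x: len(adj[x])):
            search(S | {v}, C.intersection(adj[v]))
            C.discard(v)

    search(set(), set(adj))
    return best


# Hypothetical toy graph: a 4-clique in which vertex 4 has two negative edges.
toy = {
    1: {2: 1, 3: 1, 4: -1},
    2: {1: 1, 3: 1, 4: -1},
    3: {1: 1, 2: 1, 4: 1},
    4: {1: -1, 2: -1, 3: 1},
}
print(sorted(max_signed_theta_clique(toy, 1)))
```

In the toy graph the whole vertex set is a clique, but vertex 4 fails the difference condition, so for θ = 1 the search settles on the positive triangle {1, 2, 3}. Because a subset of a clique can satisfy the difference condition when the full clique does not, the sketch evaluates every partial set S rather than only the leaves of the search tree.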

Fig. 2 is a schematic diagram of an original signed network according to an embodiment of the present invention, in which a solid line indicates a friend relationship between two vertices and a dashed line indicates an enemy relationship between two vertices. With input θ = 2, the maximum signed θ-clique mined from this original signed network is {vertex 1, vertex 2, vertex 3, vertex 4, vertex 5}, as shown in Fig. 3.

In addition, extensive experiments were performed on eight real-world social networks to evaluate the effectiveness and efficiency of the proposed method. The experiments were carried out for varying values of the parameter θ, and efficiency was measured by the running time of the algorithm. For each setting, the method was run 10 times and the results were averaged. All programs were implemented in standard C++, and all experiments were performed on a PC equipped with an Intel i5-9600KF 3.7 GHz CPU and 64 GB of RAM. The experiments show that the proposed method is 118 times faster than the basic greedy algorithm.

In one embodiment, a computer device is provided, which comprises a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for mining a maximum signed θ-clique in a signed network according to the above embodiments.

In one embodiment, a storage medium storing computer-readable instructions is provided; when executed by one or more processors, the instructions cause the one or more processors to perform the steps of the method for mining a maximum signed θ-clique in a signed network according to the above embodiments. The storage medium may be a non-volatile storage medium.

Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be carried out by hardware under the control of a program, and that the program may be stored in a computer-readable storage medium such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

The foregoing is only a preferred embodiment of the present invention. Although the invention has been disclosed by way of preferred embodiments, these are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the invention, or adapt it into equivalent embodiments, without departing from its scope. Therefore, any simple modification, equivalent change or adaptation made to the above embodiments in accordance with the technical essence of the invention, and without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the invention.
