Distributed ADMM machine learning method of self-adaptive network topology


1. A distributed ADMM machine learning method of self-adaptive network topology is characterized by comprising the following steps:

dividing the nodes into 1 management node and a plurality of working nodes, and abstracting the working nodes into upper-layer nodes and lower-layer nodes; for a connected network, decomposing a global convex optimization problem into a plurality of local convex optimization problems, solving the local convex optimization problems, and obtaining the global optimal solution by coordinating the local optimal solutions; the machine learning method comprises two parts, node detection and iterative computation; the node detection part comprises upper- and lower-layer node attribution updating, upper- and lower-layer node communication, and communication between the management node and the upper-layer nodes; the iterative computation part carries out data communication between related upper- and lower-layer nodes and a single iterative computation; during node detection, the working nodes run the updates of the iterative computation part, and in addition each upper-layer node reports the completion of a single iteration to the management node whenever an iteration finishes; when the positions of the upper-layer nodes are selected, a greedy strategy is used to avoid traversing all possibilities and the selection is made dynamically, so that the influence of link delay in the network is kept as small as possible.

2. The distributed ADMM machine learning method of adaptive network topology of claim 1, used for solving a regularized linear regression problem, namely minimize (1/2)||Ax − b||₂² + λ||x||₁, where A is an m × n matrix, b is an m-dimensional vector, λ is a constant, and x is an n-dimensional vector.

3. The distributed ADMM machine learning method of adaptive network topology as claimed in claim 1, wherein the node probing part comprises the following steps:

1) the management node issues a detection-start (start) instruction to the upper-layer nodes and records the current time t_s;

2) the upper-layer node i, after receiving the clustering-start instruction, sends data to the nodes in its corresponding lower-layer node list L_i;

3) the lower-layer node j receives and stores the data until data has been received from all nodes in its corresponding upper-layer node list U_j, then performs its calculation and returns the result to all the corresponding upper-layer nodes in U_j;

4) the upper-layer node i receives and stores the data until data has been received from all nodes in its corresponding lower-layer node list L_i, then performs its calculation and returns the result to all the corresponding lower-layer nodes in L_i;

5) the upper-layer node sends an iteration-over (iter over) instruction for one iteration to the management node;

6) the management node waits to receive and record the over instructions of all upper-layer nodes; when only the message of one upper-layer node has not been received, that upper-layer node is cancelled and the current time t_c is obtained; t_l = t_c − t_s gives the time required for a single complete system iteration, which is stored in the iteration-time set T; if t_l is the minimum value of the iteration-time set T, i.e. t_l = min T, the upper-layer node set U of the current system is saved, otherwise U is not updated;

7) the management node issues a node-attribution update (update) instruction to all upper-layer and lower-layer nodes, and a node performs the corresponding node-attribution update operation after receiving the instruction;

8) steps 3) to 7) are repeated until only one upper-layer node is left in the network; the upper-layer node set U stored by the management node is then the final upper-layer node set;

9) the management node sends a detection-completion instruction to all upper-layer and lower-layer nodes, and the nodes perform the node-attribution update operation after receiving the instruction.

4. The distributed ADMM machine learning method according to claim 3, wherein the upper-layer and lower-layer nodes carry a relative node-attribution relationship, that is: a lower-layer node is related to the upper-layer nodes with which it has a neighbor relationship and to the upper-layer node closest to it; an upper-layer node is related to the lower-layer nodes with which it has a neighbor relationship, and if a lower-layer node's distance to this upper-layer node is the shortest among all upper-layer nodes, the upper-layer node is also related to that lower-layer node; each lower-layer node may be related to multiple upper-layer nodes, and each upper-layer node may also be related to multiple lower-layer nodes.

5. The distributed ADMM machine learning method of adaptive network topology according to claim 3, wherein each lower-layer node stores a local variable x and a variable u_i for each related upper-layer node i, and each upper-layer node stores a local variable z; all variables x, u and z are n-dimensional vectors whose initial state is the n-dimensional zero vector, n being the same order as the variable x in claim 2.

6. The distributed ADMM machine learning method of adaptive network topology according to claim 3,

in the k-th iterative computation, the upper-layer node issues z^k to its related lower-layer nodes, and each lower-layer node applies the formulas

x^{k+1} = (A^T A + ρI)^{-1}(A^T b + ρ(z^k − u^k)),

u^{k+1} = u^k + x^{k+1} − z^{k+1}

to update its local x and u, obtaining the results x^{k+1}, z^{k+1} and u^{k+1} of the k-th iterative computation, where A and b are the matrix and vector of the regression problem in claim 2, I is the n × n identity matrix and ρ > 0 is the penalty parameter; the updated x and u are returned to the related upper-layer nodes, and each upper-layer node updates its local z through the formula

z^{k+1} = S_{λ/(ρN)}(x̄^{k+1} + ū^k),

where x̄^{k+1} and ū^k are the averages of the x and u received from its N related lower-layer nodes and S_κ denotes the soft-threshold operator, of the form S_κ(a) = sign(a)·max(|a| − κ, 0).

7. The distributed ADMM machine learning method of adaptive network topology as claimed in claim 6, wherein the management node stores a list of the nodes that have completed the current iteration; with the number of upper-layer nodes in the current network denoted N, if the length of the completed-iteration node list reaches N − 1, the management node notifies the entire network to remove the only upper-layer node not in the completed-iteration node list, clears the completed-iteration node list, and stores the current system's upper-layer node list S_{N−1} and the time interval t_{N−1} between two list clearings; this is repeated until only one upper-layer node is left in the network.

8. The distributed ADMM machine learning method of adaptive network topology as claimed in claim 7, wherein, for i ∈ {1, ..., N−1}, the smallest t_i is selected and the upper-layer nodes in its corresponding upper-layer node list S_i serve as the upper-layer nodes of the final system, and all working nodes are notified to carry out the formal iterative computation part; after receiving the notification, a working node initializes its local x, u and z variables, updates its related-node attribution, and then starts iterative computation communication with its related nodes; the iterative computation stops when the number of iterations reaches the maximum number of iterations preset by the system.

Background

In recent years, with the rapid development of the information industry, the scale of the Internet keeps growing, and big data and machine learning are used more and more frequently in business. In the field of machine learning, large amounts of high-dimensional data come from different nodes and place high demands on computing capacity; in this situation a single node can hardly solve the problem, whereas a distributed machine learning algorithm adapts to it much better.

The Alternating Direction Method of Multipliers (ADMM) is a constrained-optimization method widely used in machine learning. By decomposing a global problem into local problems it greatly reduces the cost of any single problem, and the local solutions can be coordinated to obtain the final solution of the global problem. The method leaves considerable room for extension and optimization: from the early dual ascent, dual decomposition and augmented Lagrangian multiplier methods, to the ADMM popularized by Stephen Boyd, to the many ADMM variants later proposed for various specific situations, its advantages in handling convex optimization problems have been applied in many fields.

According to the iterative communication pattern in the ADMM machine learning process, methods can be broadly divided into centralized and distributed ones. "Centralized" here is distinct from ordinary single-machine centralization: the system is still spread over several nodes, but all nodes communicate with one specific node. The network contains one central node and several ordinary nodes; no two ordinary nodes can communicate with each other, while any ordinary node can exchange data with the central node. After communicating with all ordinary nodes, the central node obtains the optimal solutions of their local subproblems, coordinates all of these solutions to produce the result of one completed iteration, and then issues the result to all nodes for recomputation. The distributed mode, by contrast, has no single designated central node: the network may contain several intermediate nodes, ordinary nodes select a nearby intermediate node, and data aggregation and coordination are finally performed by the intermediate nodes. The pressure on one node is thus spread over several nodes, which better matches the direction in which the Internet is developing. This mode can avoid poor-quality links by choosing suitable intermediate nodes, thereby speeding up iteration, and because there are several intermediate nodes, the failure of any one central node will not crash the whole system.

In distributed computation, nodes must communicate continuously to exchange data and thereby speed up convergence. Since this communication goes over the network, the overall operation is affected by network conditions. If improper node interactions exist and are affected by network delay, the whole computation process is slowed down considerably; if unsuitable partners are chosen, the data of the corresponding communication group can even be polluted, slowing convergence and increasing the convergence error. How to select a suitable group of communication partners for each node is therefore an unavoidable problem. Addressing the influence of link delay in the network on a distributed system, the present invention provides a distributed ADMM machine learning method based on an adaptive network topology.

Disclosure of Invention

The present invention is directed to solving the above problems of the prior art. It provides a distributed ADMM machine learning method of adaptive network topology which, while improving system robustness, adapts to the network conditions so that the node distribution is more reasonable and the influence of network link delay is reduced. The technical scheme of the invention is as follows:

a distributed ADMM machine learning method of adaptive network topology, comprising the steps of:

dividing the nodes into 1 management node and a plurality of working nodes, and abstracting the working nodes into upper-layer nodes and lower-layer nodes; for a connected network, decomposing a global convex optimization problem into a plurality of local convex optimization problems, solving the local convex optimization problems, and obtaining the global optimal solution by coordinating the local optimal solutions; the machine learning method comprises two parts, node detection and iterative computation; the node detection part comprises upper- and lower-layer node attribution updating, upper- and lower-layer node communication, and communication between the management node and the upper-layer nodes; the iterative computation part carries out data communication between related upper- and lower-layer nodes and a single iterative computation. During node detection, the working nodes run the updates of the iterative computation part, and in addition each upper-layer node reports the completion of a single iteration to the management node whenever an iteration finishes; when the positions of the upper-layer nodes are selected, a greedy strategy is used to avoid traversing all possibilities and the selection is made dynamically, so that the influence of link delay in the network is kept as small as possible.

Further, the method is used to solve a regularized linear regression problem, namely minimize (1/2)||Ax − b||₂² + λ||x||₁, where A is an m × n matrix, b is an m-dimensional vector, λ is a constant, and x is an n-dimensional vector.
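To make the optimization target concrete, the following minimal numpy sketch evaluates this objective together with the soft-threshold operator that appears later in the z-update; the function names are illustrative only and not part of the claimed method.

```python
import numpy as np

def lasso_objective(A, b, x, lam):
    """(1/2)*||A x - b||_2^2 + lam*||x||_1, the regularized linear regression objective."""
    residual = A @ x - b
    return 0.5 * residual @ residual + lam * np.sum(np.abs(x))

def soft_threshold(a, kappa):
    """Element-wise soft-threshold operator S_kappa(a) = sign(a) * max(|a| - kappa, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)
```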

Further, the node detection part specifically includes the following steps:

1) the management node issues a detection-start (start) instruction to the upper-layer nodes and records the current time t_s;

2) the upper-layer node i, after receiving the clustering-start instruction, sends data to the nodes in its corresponding lower-layer node list L_i;

3) the lower-layer node j receives and stores the data until data has been received from all nodes in its corresponding upper-layer node list U_j, then performs its calculation and returns the result to all the corresponding upper-layer nodes in U_j;

4) the upper-layer node i receives and stores the data until data has been received from all nodes in its corresponding lower-layer node list L_i, then performs its calculation and returns the result to all the corresponding lower-layer nodes in L_i;

5) the upper-layer node sends an iteration-over (iter over) instruction for one iteration to the management node;

6) the management node waits to receive and record the over instructions of all upper-layer nodes; when only the message of one upper-layer node has not been received, that upper-layer node is cancelled and the current time t_c is obtained; t_l = t_c − t_s gives the time required for a single complete system iteration, which is stored in the iteration-time set T; if t_l is the minimum value of the iteration-time set T, i.e. t_l = min T, the upper-layer node set U of the current system is saved, otherwise U is not updated;

7) the management node issues a node-attribution update (update) instruction to all upper-layer and lower-layer nodes, and a node performs the corresponding node-attribution update operation after receiving the instruction;

8) steps 3) to 7) are repeated until only one upper-layer node is left in the network; the upper-layer node set U stored by the management node is then the final upper-layer node set;

9) the management node sends a detection-completion instruction to all upper-layer and lower-layer nodes, and the nodes perform the node-attribution update operation after receiving the instruction;

furthermore, the upper-layer and lower-layer nodes carry a relative node-attribution relationship, namely: a lower-layer node is related to the upper-layer nodes with which it has a neighbor relationship and to the upper-layer node closest to it; an upper-layer node is related to the lower-layer nodes with which it has a neighbor relationship, and if a lower-layer node's distance to this upper-layer node is the shortest among all upper-layer nodes, the upper-layer node is also related to that lower-layer node; each lower-layer node may be related to multiple upper-layer nodes, and each upper-layer node may also be related to multiple lower-layer nodes.

Furthermore, each lower-layer node stores a local variable x and a variable u_i for each related upper-layer node i, and each upper-layer node stores a local variable z; all variables x, u and z are n-dimensional vectors whose initial state is the n-dimensional zero vector.
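As a rough illustration of the state each working node keeps (the class names are hypothetical), a lower-layer node might hold x together with one u_i per related upper-layer node, and an upper-layer node holds z, all starting as n-dimensional zero vectors:

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class LowerNodeState:
    n: int                                   # dimension of the optimization variable
    u: dict = field(default_factory=dict)    # one dual variable u_i per related upper-layer node id

    def __post_init__(self):
        self.x = np.zeros(self.n)            # local primal variable, initialized to the zero vector

    def register_upper(self, upper_id):
        self.u[upper_id] = np.zeros(self.n)  # u_i starts as the n-dimensional zero vector

@dataclass
class UpperNodeState:
    n: int

    def __post_init__(self):
        self.z = np.zeros(self.n)            # consensus variable, initialized to the zero vector
```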

Further, the iterative computation section includes the steps of:

the upper-layer node in the k-th iterative computation issues z^k to its related lower-layer nodes, and each lower-layer node applies the formulas

x^{k+1} = (A^T A + ρI)^{-1}(A^T b + ρ(z^k − u^k)),

u^{k+1} = u^k + x^{k+1} − z^{k+1}

to update its local x and u, obtaining the results x^{k+1}, z^{k+1} and u^{k+1} of the k-th iterative computation, where A and b are the matrix and vector of the regression problem above, I is the n × n identity matrix and ρ > 0 is the penalty parameter; the updated x and u are returned to the related upper-layer nodes, and each upper-layer node updates its local z through the formula

z^{k+1} = S_{λ/(ρN)}(x̄^{k+1} + ū^k),

where x̄^{k+1} and ū^k are the averages of the x and u received from its N related lower-layer nodes and S_κ denotes the soft-threshold operator, of the form S_κ(a) = sign(a)·max(|a| − κ, 0).
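The per-node updates can be sketched as follows. This is a minimal single-machine illustration, not the claimed distributed implementation; the λ/(ρN) threshold in the z-update follows the standard consensus-lasso form and is an assumption here, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(a, kappa):
    """S_kappa(a) = sign(a) * max(|a| - kappa, 0), applied element-wise."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

def x_update(A, b, z, u, rho):
    """Lower-layer node: x^{k+1} = (A^T A + rho I)^{-1} (A^T b + rho (z^k - u^k))."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ b + rho * (z - u))

def z_update(x_list, u_list, lam, rho):
    """Upper-layer node: soft-threshold of the average of x^{k+1} + u^k over related lower nodes."""
    avg = np.mean([x + u for x, u in zip(x_list, u_list)], axis=0)
    return soft_threshold(avg, lam / (rho * len(x_list)))

def u_update(u, x_new, z_new):
    """Lower-layer node: u^{k+1} = u^k + x^{k+1} - z^{k+1}, applied once z^{k+1} is available."""
    return u + x_new - z_new
```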

Further, the management node stores a list of the nodes that have completed the current iteration; with the number of upper-layer nodes in the current network denoted N, if the length of the completed-iteration node list reaches N − 1, the management node notifies the entire network to remove the only upper-layer node not in the completed-iteration node list, clears the completed-iteration node list, and stores the current system's upper-layer node list S_{N−1} and the time interval t_{N−1} between two list clearings; this is repeated until only one upper-layer node is left in the network.
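A rough sketch of this management-node bookkeeping (class, method and message names are hypothetical): it collects "iteration over" reports and, once all but one upper-layer node have reported, removes the missing node and records the surviving upper-layer set together with the elapsed interval.

```python
import time

class ManagementNode:
    def __init__(self, upper_node_ids):
        self.upper = set(upper_node_ids)   # ids of the current upper-layer nodes
        self.completed = set()             # upper-layer nodes that reported 'iteration over'
        self.t_start = time.time()
        self.history = []                  # list of (t_i, S_i) pairs

    def on_iteration_over(self, node_id):
        """Called when an upper-layer node reports completion of one iteration."""
        self.completed.add(node_id)
        if len(self.completed) == len(self.upper) - 1:
            slowest = (self.upper - self.completed).pop()  # the only node that has not reported
            self.upper.discard(slowest)                    # remove the slowest upper-layer node
            interval = time.time() - self.t_start          # t_{N-1}: time since the last clearing
            self.history.append((interval, set(self.upper)))
            self.completed.clear()
            self.t_start = time.time()
            return slowest   # the caller would broadcast the corresponding 'update' instruction
        return None
```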

Further, for i ∈ {1, ..., N−1}, the smallest t_i is selected and the upper-layer nodes in its corresponding upper-layer node list S_i serve as the upper-layer nodes of the final system, and all working nodes are notified to carry out the formal iterative computation part; after receiving the notification, a working node initializes its local x, u and z variables, updates its related-node attribution, and then starts iterative computation communication with its related nodes; the iterative computation stops when the number of iterations reaches the maximum number of iterations preset by the system.

The invention has the following advantages and beneficial effects:

according to the invention, the computation pressure is spread from a single node to the nodes of the whole system through the distributed method, so the overall computation speed is no longer limited by the hardware processing capacity of a single node; the high-dimensional data are split, and after the dimensionality is reduced the iterative computation pressure on a single node decreases and the speed increases. The number of upper-layer nodes in the system is at least 2, so a centralized star topology cannot appear, which guarantees the reliability of the system. Finally, the number and positions of the upper-layer nodes are determined by the algorithm: in a manner similar to hierarchical clustering, all nodes are initially set as upper-layer nodes, the slowest upper-layer node is removed repeatedly, and the number of clusters is reduced until comparison yields the upper-layer node set for which a single system iteration converges fastest. A greedy strategy is adopted to avoid enumerating all possibilities. The determination of the upper-layer node set takes the computing power of each device into account, rather than considering only network delay and ignoring the differences between nodes; and because the devices' computation time and the network delay are both included in the single-iteration time measured in the node detection part, the computing capability of each node does not need to be known in advance. The influence of network properties such as network delay and topology changes on the distributed computation is thereby reduced to a certain extent.

Drawings

FIG. 1 shows the node correspondence in a simple topology embodiment;

FIG. 2 is a diagram of a preferred embodiment of a small world simulation network topology provided by the present invention;

FIG. 3 is a flow chart of a node probe section;

FIG. 4 is a diagram of a formal iterative computation process.

Detailed Description

The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.

The technical scheme for solving the technical problems is as follows:

a distributed ADMM machine learning method based on self-adaptive network topology is used for decomposing a global convex optimization problem into a plurality of local convex optimization problems aiming at a connected network and solving the local convex optimization problems, and obtaining a global optimal solution by coordinating the local optimal solution.

Furthermore, the whole machine learning method is decomposed into two parts of node detection and iterative computation.

Further, the nodes in the system are divided into one management node and other working nodes, and each working node is abstracted with two attributes, upper-layer node and lower-layer node; every working node is necessarily a lower-layer node but not necessarily an upper-layer node, which is determined by the algorithm.

Furthermore, a related-node attribution method is provided; since the nodes are divided into upper-layer and lower-layer nodes, the attribution update is likewise divided into an upper-layer part and a lower-layer part.

The lower-layer node's own attribution-update process is as follows:

1) traverse all neighbor nodes; if a neighbor node is an upper-layer node, add it to the corresponding upper-layer node list U_self;

2) traverse all upper-layer nodes to obtain the delay T_u from this lower-layer node to each upper-layer node u; select the shortest one, T_x, and add the corresponding upper-layer node x to the upper-layer node list U_self of this lower-layer node.

Because the upper-layer and lower-layer node relationships correspond to each other, that is, once all lower-layer nodes have determined their related upper-layer nodes, the related lower-layer nodes of each upper-layer node are also determined, the related lower-layer node list L_u of an upper-layer node u can be derived in reverse from the lower-layer attribution-update process, as follows:

1) traverse all neighbor nodes and add them to the corresponding lower-layer node list L_u;

2) traverse all lower-layer nodes; among the delays {T_1, ..., T_n} from a lower-layer node to all upper-layer nodes, if the delay T_i to the current upper-layer node satisfies T_i = min{T_1, ..., T_n}, add that lower-layer node to the corresponding lower-layer node list L_u.

Further, each lower level node may be associated with a plurality of upper level nodes, and each upper level node may be associated with a plurality of lower level nodes.
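The two attribution-update procedures above could be sketched as follows (illustrative only; neighbors, lower_set, upper_set and the link-delay function delay(i, j) are assumed to be available to the node):

```python
def upper_list_for_lower(j, neighbors, upper_set, delay):
    """U_j for lower-layer node j: neighboring upper-layer nodes plus the closest upper-layer node."""
    related = {u for u in neighbors[j] if u in upper_set}
    related.add(min(upper_set, key=lambda u: delay(j, u)))   # upper node with the shortest delay
    return related

def lower_list_for_upper(u, neighbors, lower_set, upper_set, delay):
    """L_u for upper-layer node u, derived in reverse: all neighbors, plus every lower-layer node
    whose shortest delay among all upper-layer nodes is the delay to u."""
    related = set(neighbors[u]) & set(lower_set)
    for j in lower_set:
        if min(upper_set, key=lambda x: delay(j, x)) == u:
            related.add(j)
    return related
```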

Furthermore, each lower-layer node stores a local variable x and a variable u_i for each related upper-layer node i, and each upper-layer node holds a local variable z.

Further, in the k-th iterative computation the upper-layer node issues z^k to its related lower-layer nodes, and each lower-layer node applies the formulas

x^{k+1} := (A^T A + ρI)^{-1}(A^T b + ρ(z^k − u^k)),

u^{k+1} := u^k + x^{k+1} − z^{k+1}

to update its local x and u, where A and b are the matrix and vector of the regression problem and I is the n × n identity matrix; the updated x and u are returned to the related upper-layer nodes, and each upper-layer node updates its local z through the soft-threshold formula

z^{k+1} := S_{λ/(ρN)}(x̄^{k+1} + ū^k).

Furthermore, in the node detection process, the working node executes the updating of the iterative computation part, and in addition, the upper node feeds back the completion of a single iteration to the management node when each iteration is completed.

Further, the management node stores a list of the nodes that have completed the current iteration; with the number of upper-layer nodes in the current network denoted N, if the length of the completed-iteration node list reaches N − 1, the management node notifies the entire network to remove the only upper-layer node not in the completed-iteration node list, clears the completed-iteration node list, and stores the current system's upper-layer node list S_{N−1} and the time interval t_{N−1} between two list clearings; this is repeated until only one upper-layer node is left in the network.

Further, after a working node receives the upper-layer node update instruction issued by the management node, it performs a new related-node attribution update; this is repeated until only one upper-layer node remains in the network.

Further, for i ∈ {1, ..., N−1}, the smallest t_i is selected and the upper-layer nodes in its corresponding upper-layer node list S_i serve as the upper-layer nodes of the final system, and all working nodes are notified to carry out the formal iterative computation part.

Further, after receiving the notification, the working node initializes its local x, u and z variables, performs the related-node attribution update, and then starts iterative computation communication with its related nodes; the iterative computation stops when the number of iterations reaches the maximum number preset by the system.

The invention abstracts the working nodes into upper-layer and lower-layer nodes and solves the convex optimization problem with a distributed ADMM algorithm, while taking into account the influence of link delay in the network on the communication speed of iterative computation between different nodes; when the positions of the upper-layer nodes are selected, a greedy strategy avoids traversing all possibilities, and dynamic selection keeps the influence of link delay in the network as small as possible. As shown in Fig. 1, a simple topology network consists of 5 working nodes and 1 management node, where an upper-layer node and a lower-layer node with the same serial number represent the same physical device; since the related nodes of each working node are not limited to one, many-to-many relationships can occur, such as upper-layer nodes 1, 3, 5 and lower-layer nodes 2, 3, 4 in Fig. 1.

For the small-world simulation network, 1 management node and 16 working nodes are selected; every working node has 4 neighboring working nodes, and each link has a 90% probability of being rewired between two other nodes. The approximate topology is shown in Fig. 2 (the management node is not included).
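Such a topology can be reproduced approximately with networkx; the sketch below uses the stated parameters (16 working nodes, 4 neighbors each, 0.9 rewiring probability). Connecting the management node to every working node is an assumption made only so that control messages can be modeled.

```python
import networkx as nx

# Watts-Strogatz small-world graph: 16 working nodes, 4 neighbors each, rewiring probability 0.9
G = nx.watts_strogatz_graph(n=16, k=4, p=0.9, seed=1)

# The management node is not part of the small-world topology in Fig. 2; model it here as an
# extra node reachable from every working node for control traffic.
G.add_node("mgmt")
G.add_edges_from(("mgmt", v) for v in range(16))

print(G.number_of_nodes(), G.number_of_edges())
```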

Initially, all working nodes are set as both upper-layer and lower-layer nodes, and the system first enters the node detection part:

1) the management node issues a detection-start (clustering start) instruction to the upper-layer nodes and records the current time t_s;

2) the upper-layer node i, after receiving the clustering-start instruction, sends data to the nodes in its corresponding lower-layer node list L_i;

3) the lower-layer node j receives and stores the data until data has been received from all nodes in its corresponding upper-layer node list U_j, then performs its calculation and returns the result to all the corresponding upper-layer nodes in U_j;

4) the upper-layer node i receives and stores the data until data has been received from all nodes in its corresponding lower-layer node list L_i, then performs its calculation and returns the result to all the corresponding lower-layer nodes in L_i;

5) the upper-layer node sends an iteration-over instruction to the management node;

6) the management node waits to receive and record the over instructions of all upper-layer nodes; when only the message of one upper-layer node has not been received, that upper-layer node is cancelled and the current time t_c is obtained; t_l = t_c − t_s gives the time required for a single complete system iteration, which is stored in the iteration-time set T; if t_l is the minimum value of the iteration-time set T, i.e. t_l = min T, the upper-layer node set U of the current system is saved, otherwise U is not updated;

7) the management node issues a node-attribution update (update) instruction to all upper-layer and lower-layer nodes, and a node performs the corresponding node-attribution update operation after receiving the instruction;

8) steps 3) to 7) are repeated until only one upper-layer node is left in the network; the upper-layer node set U stored by the management node is then the final upper-layer node set;

9) the management node sends a detection-completion instruction to all upper-layer and lower-layer nodes, and the nodes perform the node-attribution update operation after receiving the instruction;

Fig. 3 shows the communication process among the different nodes during a single clustering detection in the above steps; the detection process continues until only one upper-layer node is left in the network, after which the upper-layer nodes of the final system are determined.

Once the attribution of all nodes is completed, the final upper-layer nodes in the network are determined and stored in U, i.e. the number and positions of the node clustering centers are fixed. After the upper-layer nodes are determined, the attribution relationships of the lower-layer nodes follow easily from the attribution-update part. It should be noted that the selected upper-layer node set is only the current optimal solution chosen in each detection iteration, as determined by the greedy strategy; compared with traversing all possibilities and spending most of the time in the clustering detection part before the formal computation, selecting this local optimum better meets practical requirements.
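Putting the probing part together, the greedy removal loop run by the management node might look like the following simplified, synchronous sketch. Here run_probe_iteration is a hypothetical callback that performs one complete system iteration with the given upper-layer node set and returns the measured iteration time together with the slowest upper-layer node; the real method is asynchronous and message-driven, so this only illustrates the selection logic.

```python
def probe_upper_nodes(all_node_ids, run_probe_iteration):
    """Greedy, hierarchical-clustering-like selection of the final upper-layer node set."""
    upper = set(all_node_ids)              # initially every working node is an upper-layer node
    best_set, best_time = set(upper), float("inf")
    while len(upper) > 1:
        iteration_time, slowest = run_probe_iteration(upper)
        if iteration_time < best_time:     # keep the configuration with the fastest iteration
            best_time, best_set = iteration_time, set(upper)
        upper.discard(slowest)             # cancel the slowest upper-layer node and continue
    return best_set, best_time
```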

After the upper-layer nodes are determined, in order to solve

minimize (1/2)||Ax − b||₂² + λ||x||₁, # (1)

the updates of x, z and u are as follows:

x^{k+1} = (A^T A + ρI)^{-1}(A^T b + ρ(z^k − u^k)), # (2)

z^{k+1} = S_{λ/(ρN)}(x̄^{k+1} + ū^k), # (3)

u^{k+1} = u^k + x^{k+1} − z^{k+1}. # (4)

As shown in FIG. 4, each upper-layer node stores its own z_i, and each lower-layer node stores its own x_i together with as many u_i as it has related upper-layer nodes, i.e. every related upper-layer node has a corresponding u_i at the lower-layer node. Initially, each upper-layer node sends its z_i to all of its related lower-layer nodes; after receiving it, a lower-layer node updates its own u_i and x_i through formulas (2) and (4) and returns both to its related upper-layer nodes; after an upper-layer node has received the u_i and x_i of all its related lower-layer nodes, it updates its own z_i through formula (3), which completes one iteration.

After each iteration is completed, an upper-layer node evaluates the objective of problem (1) with the current x, u and z to obtain the result L_i and stores this iteration result L_i locally; after the specified number of iterations has finally been completed, all upper-layer nodes average their per-iteration results L_i to obtain L̄, and the convergence time of the result and the number of iterations i required for convergence are finally determined from L̄.
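The convergence bookkeeping in the last paragraph might be sketched as follows (hypothetical names; objective_history[k][i] holds L_i, the value of the objective of problem (1) at upper-layer node i after iteration k):

```python
import numpy as np

def convergence_from_history(objective_history, tol=1e-4):
    """Average the per-node objectives L_i into L-bar per iteration and report the first
    iteration at which the averaged objective changes by less than `tol`."""
    l_bar = [float(np.mean(per_node)) for per_node in objective_history]
    for k in range(1, len(l_bar)):
        if abs(l_bar[k] - l_bar[k - 1]) < tol:
            return k, l_bar          # iterations needed for convergence, averaged curve
    return len(l_bar), l_bar
```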

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.
