API recommendation method based on knowledge graph and collaborative filtering
1. An API recommendation method based on knowledge graph and collaborative filtering is characterized by comprising the following steps:
(1) constructing a service knowledge graph according to Mashup and the existing API, embedding the API in the knowledge graph into a low-dimensional vector by using a representation learning algorithm TransH, and calculating the similarity between API entities;
(2) acquiring the APIs used by the target Mashup, and obtaining the APIs similar to them according to step (1) to form a recommendation list RS1;
(3) extracting the functions of the target Mashup and of the other Mashups from their text description documents by using natural language processing, and calculating the similarity between the target Mashup and the other Mashups from these functions;
(4) obtaining the Mashups similar to the target Mashup according to step (3), and forming the APIs used by the similar Mashups into a recommendation list RS2;
(5) constructing a Mashup usage matrix and an API usage matrix from the API usage records of the Mashups, and then calculating the similarity between Mashups and the similarity between APIs respectively;
(6) obtaining the Mashups similar to the target Mashup according to step (5), and forming the APIs used by the similar Mashups into a recommendation list RS3;
(7) acquiring the APIs used by the target Mashup, and obtaining the APIs similar to them according to step (5) to form a recommendation list RS4;
(8) obtaining an API-based recommendation set AS according to RS1 and RS4, specifically:
normalizing the similarities of the recommended APIs in RS1 and RS4;
denoting the set of recommended APIs in RS1 as set1, the set of recommended APIs in RS4 as set2, and the union of set1 and set2 as set; first, scores are assigned with respect to RS1 by traversing each API in set: if the API exists in RS1, its score in RS1 is its normalized similarity; if the API does not exist in RS1, its score in RS1 is 0; scores are then assigned with respect to RS4 in the same way; the processed RS1 and RS4 are merged into the API-based recommendation set AS, wherein the score $Sr_i$ of the i-th recommended API in AS is calculated by the following formula:
wherein $s_1$ and $s_2$ are the scores of the i-th recommended API in RS1 and RS4 respectively;
(9) obtaining a recommendation set MS based on Mashup according to RS2 and RS3 by using the method in the step (8);
(10) combining the AS and the MS by using the method in step (8) to obtain the final API recommendation result.
2. The API recommendation method based on knowledge graph and collaborative filtering according to claim 1, wherein the service knowledge graph is specifically constructed by: taking the APIs, the Mashups, and their Categories and Tags as service entities, wherein the relationship between an API and a Mashup is defined as "used"; the relationships between API and Category, and between Mashup and Category, are both defined as "belong_to"; the relationships between API and Tag, and between Mashup and Tag, are defined as "Tag"; and calculating the text description similarity among the APIs, setting, for each API, the top 20% of APIs with the highest similarity as being in a competition relationship with it, and adding these relationships to the knowledge graph.
3. The API recommendation method based on knowledge graph and collaborative filtering according to claim 1, wherein in the step (1), the similarity calculation between API entities is specifically:
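extracting entities and relations from the knowledge graph to form triples, and then embedding the triples into low-dimensional vectors by using the representation learning method TransH; after obtaining the low-dimensional vectors of the API entities, calculating the similarity between the API entities by using the cosine similarity, wherein the specific calculation formula is as follows:
$$\mathrm{sim}(\mathrm{API}_i, \mathrm{API}_{i'}) = \frac{\sum_{k=1}^{d} \mathrm{API}_{i,k}\,\mathrm{API}_{i',k}}{\sqrt{\sum_{k=1}^{d} \mathrm{API}_{i,k}^{2}}\;\sqrt{\sum_{k=1}^{d} \mathrm{API}_{i',k}^{2}}}$$
wherein $\mathrm{API}_i$ and $\mathrm{API}_{i'}$ are the low-dimensional vectors of the two APIs, $\mathrm{API}_{i,k}$ is the k-th component of $\mathrm{API}_i$, and d is the dimension of the API vectors.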
4. The API recommendation method based on knowledge graph and collaborative filtering according to claim 1, wherein in the step (3), the similarity calculation between mashups is specifically as follows:
firstly, using the natural language processing tool Stanford Parser to tokenize the description information of each Mashup, and then forming the corresponding dependency relations from predicates and objects; the similarity between the target Mashup and the other Mashups is calculated from the probability that the same function occurs, and the specific calculation formula is as follows:
wherein P and Q are the functional sets of two mashups, respectively.
5. The API recommendation method based on knowledge graph and collaborative filtering according to claim 1, wherein the similarity calculation in the step (5) is specifically as follows:
the Mashup usage matrix is constructed as follows:
$$\mathrm{Mashup}_i = (\mathrm{API}_1, \mathrm{API}_2, \ldots, \mathrm{API}_j, \ldots)$$
wherein the value of $\mathrm{API}_j$ depends on whether $\mathrm{API}_j$ is used by $\mathrm{Mashup}_i$: it is 1 if used and 0 if not used;
the API usage matrix is constructed as follows:
$$\mathrm{API}_i = (\mathrm{Mashup}_1, \mathrm{Mashup}_2, \ldots, \mathrm{Mashup}_j, \ldots)$$
wherein the value of $\mathrm{Mashup}_j$ is 1 if $\mathrm{API}_i$ is used by $\mathrm{Mashup}_j$, and 0 if $\mathrm{API}_i$ is not used by $\mathrm{Mashup}_j$;
the similarity between Mashups and the similarity between APIs are calculated by using the co-occurrence similarity, and the specific calculation formula is as follows:
wherein $\mathrm{Mashup}_x$ and $\mathrm{Mashup}_y$ are the usage vectors of two Mashups, and $\mathrm{API}_x$ and $\mathrm{API}_y$ are the usage vectors of two APIs.
Background
With the rapid development of the Internet, Web services have become a mainstream technology. Applications depend more and more on APIs, and a single Web API often cannot meet a requirement on its own, so several APIs usually have to work together to achieve the final goal.
The Mashup concept has become popular in recent years. A Mashup is a Web application that combines existing Web resources, using data and Web APIs. A Mashup often combines two or more Web APIs; for example, the Mashup named sportlogger is an application that provides sports blogging functionality and integrates three APIs: google maps, twitter and janrain engage. Mashups readily meet the needs of end users and ease the work of developers. The emergence and development of Mashups therefore not only overcomes the limited functionality of a single Web API but also makes Web APIs reusable. However, the number of Web APIs is now huge: according to statistics, ProgrammableWeb already lists more than 23000 APIs, which makes it difficult for developers to choose among them. Under these circumstances, how to recommend suitable APIs for a Mashup has become a hot and difficult topic in the service field, and a well-performing service recommendation system will greatly improve development efficiency.
Current Mashup-oriented service recommendation mainly uses collaborative filtering, which cannot mine the deep relations between Mashups and APIs well. Moreover, when the target Mashup has no API usage record there is a cold-start problem: it is difficult to recommend Web APIs to a Mashup that has not yet used any API. In addition, many studies rely only on the description information of APIs and Mashups, but the description information of many services is incomplete or inaccurate, which greatly affects the final recommendation.
Disclosure of Invention
In order to efficiently and accurately recommend APIs for corresponding mashups, the invention provides an API recommendation method based on a knowledge graph and collaborative filtering.
The purpose of the invention is realized by the following technical scheme: an API recommendation method based on knowledge graph and collaborative filtering comprises the following steps:
(1) constructing a service knowledge graph according to Mashup and the existing API, embedding the API in the knowledge graph into a low-dimensional vector by using a representation learning algorithm TransH, and calculating the similarity between API entities;
(2) acquiring the APIs used by the target Mashup, and obtaining the APIs similar to them according to step (1) to form a recommendation list RS1;
(3) extracting the functions of the target Mashup and of the other Mashups from their text description documents by using natural language processing, and calculating the similarity between the target Mashup and the other Mashups from these functions;
(4) obtaining the Mashups similar to the target Mashup according to step (3), and forming the APIs used by the similar Mashups into a recommendation list RS2;
(5) constructing a Mashup usage matrix and an API usage matrix from the API usage records of the Mashups, and then calculating the similarity between Mashups and the similarity between APIs respectively;
(6) obtaining the Mashups similar to the target Mashup according to step (5), and forming the APIs used by the similar Mashups into a recommendation list RS3;
(7) acquiring the APIs used by the target Mashup, and obtaining the APIs similar to them according to step (5) to form a recommendation list RS4;
(8) obtaining an API-based recommendation set AS according to RS1 and RS4, specifically:
normalizing the similarities of the recommended APIs in RS1 and RS4;
denoting the set of recommended APIs in RS1 as set1, the set of recommended APIs in RS4 as set2, and the union of set1 and set2 as set; first, scores are assigned with respect to RS1 by traversing each API in set: if the API exists in RS1, its score in RS1 is its normalized similarity; if the API does not exist in RS1, its score in RS1 is 0; scores are then assigned with respect to RS4 in the same way; after these operations, RS1 and RS4 have the same length and contain the same recommended APIs; the processed RS1 and RS4 are merged into the API-based recommendation set AS, wherein the score $Sr_i$ of the i-th recommended API in AS is calculated by the following formula:
wherein $s_1$ and $s_2$ are the scores of the i-th recommended API in RS1 and RS4 respectively;
(9) obtaining a recommendation set MS based on Mashup according to RS2 and RS3 by using the method in the step (8);
normalizing the similarities of the recommended APIs in RS2 and RS3;
denoting the set of recommended APIs in RS2 as set3, the set of recommended APIs in RS3 as set4, and the union of set3 and set4 as set'; first, scores are assigned with respect to RS2 by traversing each API in set': if the API exists in RS2, its score in RS2 is its normalized similarity; if the API does not exist in RS2, its score in RS2 is 0; scores are then assigned with respect to RS3 in the same way; after these operations, RS2 and RS3 have the same length and contain the same recommended APIs; the processed RS2 and RS3 are merged into the Mashup-based recommendation set MS, wherein the score $Sr_i$ of the i-th recommended API in MS is calculated by the following formula:
wherein $s_1$ and $s_2$ are the scores of the i-th recommended API in RS2 and RS3 respectively;
(10) combining the AS and the MS by using the method in the step (8) to obtain a final API recommendation result;
denoting the set of recommended APIs in AS as set5, the set of recommended APIs in MS as set6, and the union of set5 and set6 as set''; first, scores are assigned with respect to AS by traversing each API in set'': if the API exists in AS, its score in AS is its normalized score; if the API does not exist in AS, its score in AS is 0; scores are then assigned with respect to MS in the same way; after these operations, AS and MS have the same length and contain the same recommended APIs; the processed AS and MS are merged into the final API recommendation set RS, wherein the score $Sr_i$ of the i-th recommended API in RS is calculated by the following formula:
wherein $s_1$ and $s_2$ are the scores of the i-th recommended API in AS and MS respectively.
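As an illustration only, the merging procedure of steps (8) to (10) can be sketched as follows. The combining formula is not reproduced in this text, so the sketch takes the combining rule as a parameter and uses the average of $s_1$ and $s_2$ purely as a placeholder assumption; dividing by the largest similarity is likewise an assumed normalization, and the API names and similarity values are made up.

```python
from typing import Callable, Dict

Scores = Dict[str, float]

def normalize(scores: Scores) -> Scores:
    """Scale similarities into [0, 1] by dividing by the largest value (assumed scheme)."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {api: 0.0 for api in scores}
    return {api: s / top for api, s in scores.items()}

def merge_lists(rs_a: Scores, rs_b: Scores,
                combine: Callable[[float, float], float] = lambda s1, s2: (s1 + s2) / 2) -> Scores:
    """Normalize both lists, zero-fill APIs missing from one list, then combine scores."""
    a, b = normalize(rs_a), normalize(rs_b)
    union = set(a) | set(b)  # every API recommended by either list
    # An API absent from a list gets score 0 in that list.
    return {api: combine(a.get(api, 0.0), b.get(api, 0.0)) for api in union}

# Made-up candidate lists standing in for RS1 and RS4.
rs1 = {"flickr": 0.8, "youtube": 0.4}
rs4 = {"youtube": 0.9, "facebook": 0.3}
as_set = merge_lists(rs1, rs4)
```

The same function could be applied to RS2 and RS3 to obtain MS, and then to AS and MS to obtain the final set, which mirrors the way steps (9) and (10) reuse the method of step (8).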
Further, the construction of the service knowledge graph specifically comprises: taking the APIs, the Mashups, and their Categories and Tags as service entities, wherein the relationship between an API and a Mashup is defined as "used"; the relationships between API and Category, and between Mashup and Category, are defined as "belong_to"; the relationships between API and Tag, and between Mashup and Tag, are defined as "Tag". The higher the text description similarity between two APIs, the more similar their functions are, which indicates that the two APIs may be in a competitive relationship. The text description similarity between APIs is therefore calculated, and for each API the top 20% of APIs with the highest text description similarity to it are set as being in a competitive relationship with it and added to the knowledge graph.
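As an illustration only, the triples described above might be assembled as in the following sketch. The relation label "compete", the rounding-up of the 20% cut-off, and the text similarity values are assumptions made for the example; the Mashup, API, Category and Tag names are those of the sportlogger example in the detailed description.

```python
import math
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

def service_triples(name: str, used_apis: Iterable[str],
                    category: str, tags: Iterable[str]) -> List[Triple]:
    """Triples for one Mashup: the APIs it uses, its Category and its Tags."""
    triples = [(name, "used", api) for api in used_apis]
    triples.append((name, "belong_to", category))
    triples += [(name, "Tag", tag) for tag in tags]
    return triples

def competition_triples(text_sim: Dict[str, Dict[str, float]],
                        ratio: float = 0.2) -> List[Triple]:
    """For each API, link it to the top `ratio` share of its most similar APIs."""
    triples = []
    for api, row in text_sim.items():
        ranked = sorted(row, key=lambda other: row[other], reverse=True)
        keep = math.ceil(len(ranked) * ratio)  # rounding up is an assumption
        triples += [(api, "compete", other) for other in ranked[:keep]]
    return triples

triples = service_triples("sportlogger",
                          ["google maps", "twitter", "janrain engage"],
                          "Travel", ["Social"])
# Made-up text similarities, for illustration only.
text_sim = {"twitter": {"google maps": 0.1, "janrain engage": 0.3}}
triples += competition_triples(text_sim)
```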
Further, in the step (1), the similarity calculation between API entities specifically includes:
entities and relations are first extracted from the knowledge graph to form triples, and the triples are then embedded into low-dimensional vectors by using the representation learning method TransH. TransH is an improvement on TransE. The TransE algorithm makes the sum of the head vector and the relation (translation) vector of a triple as close as possible to the tail vector. If the head vector, the relation vector and the tail vector of a triple are h, r and t respectively, the embedding should satisfy the following formula as far as possible:
$$h + r \approx t$$
Like TransE, TransH also requires that the sum of the head vector and the translation vector of a triple be as close as possible to the tail vector. But unlike TransE, which embeds all triples in one shared space, TransH gives each relation its own hyperplane and projects the head and tail vectors onto that hyperplane before applying the translation.
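As an illustration only, the TransH scoring of a single triple can be sketched as follows. The names `w_r` (normal vector of the relation hyperplane) and `d_r` (translation vector lying in that hyperplane) follow the usual TransH notation and do not appear in this text; the sketch covers scoring only, not training, and the vectors are made up.

```python
import math
from typing import List, Sequence

def project(v: Sequence[float], w: Sequence[float]) -> List[float]:
    """Project v onto the hyperplane whose unit normal is w: v - (w . v) w."""
    dot = sum(a * b for a, b in zip(v, w))
    return [a - dot * b for a, b in zip(v, w)]

def transh_score(h: Sequence[float], t: Sequence[float],
                 w_r: Sequence[float], d_r: Sequence[float]) -> float:
    """Squared distance between projected head plus translation and projected tail.

    A lower score means the triple fits the embedding better.
    """
    norm = math.sqrt(sum(x * x for x in w_r))
    w = [x / norm for x in w_r]  # the normal vector must have unit length
    h_p, t_p = project(h, w), project(t, w)
    return sum((a + r - b) ** 2 for a, r, b in zip(h_p, d_r, t_p))

# Made-up 3-dimensional example: the hyperplane is the x-y plane.
score = transh_score(h=[1.0, 0.0, 5.0], t=[2.0, 1.0, -3.0],
                     w_r=[0.0, 0.0, 1.0], d_r=[1.0, 1.0, 0.0])
```

In this example the head and tail differ greatly along the normal direction, yet the score is zero because only their projections onto the relation's hyperplane are compared.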
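After the low-dimensional vectors of the API entities are obtained, the similarity between the API entities is calculated by using cosine similarity; the larger the value, the stronger the association between the two API entities. The specific calculation formula is as follows:
$$\mathrm{sim}(\mathrm{API}_i, \mathrm{API}_{i'}) = \frac{\sum_{k=1}^{d} \mathrm{API}_{i,k}\,\mathrm{API}_{i',k}}{\sqrt{\sum_{k=1}^{d} \mathrm{API}_{i,k}^{2}}\;\sqrt{\sum_{k=1}^{d} \mathrm{API}_{i',k}^{2}}}$$
wherein $\mathrm{API}_i$ and $\mathrm{API}_{i'}$ are the low-dimensional vectors of the two APIs, $\mathrm{API}_{i,k}$ is the k-th component of $\mathrm{API}_i$, and d is the dimension of the API vectors; preferably, d is taken to be 100.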
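As an illustration only, the cosine similarity between two API vectors can be computed as in the following sketch. Returning 0 for an all-zero vector is a choice made for the example, and the short vectors are made up; a real embedding would have d components.

```python
import math
from typing import Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension d."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([1.0, 2.0, 2.0], [2.0, 4.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```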
Further, in the step (3), the similarity calculation between mashups specifically includes:
In order to mine the similarity between Mashups, the functions of all Mashups are extracted. First, the description information of each Mashup is tokenized by using the widely used natural language processing tool Stanford Parser. The corresponding dependency relations are then formed from predicates and objects. For example, the Mashup named "ordataeur" is described as "ordataurelps company and buy computers", and the extracted objects are "computers" and "bugs", which are the two functions of the Mashup "ordataeur".
The similarity between the target Mashup and other mashups in the step (3) is calculated according to the probability of the occurrence of the same function, so that the closer the functions among the mashups are, the higher the similarity is. The specific similarity calculation formula is as follows:
wherein P and Q are the functional sets of two mashups, respectively.
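As an illustration only, function extraction and the comparison of two function sets can be sketched as follows. The similarity formula is not reproduced in this text, so the sketch uses the Jaccard overlap of P and Q as a stand-in assumption. The dependency tuples are written by hand in the form (governor, relation, dependent) with the Stanford typed-dependency label "dobj" for a direct object; they are not actual parser output.

```python
from typing import Iterable, Set, Tuple

Function = Tuple[str, str]  # (predicate, object)

def extract_functions(dependencies: Iterable[Tuple[str, str, str]]) -> Set[Function]:
    """Keep the (predicate, object) pairs among (governor, relation, dependent) tuples."""
    return {(gov.lower(), dep.lower())
            for gov, rel, dep in dependencies if rel == "dobj"}

def function_similarity(p: Set[Function], q: Set[Function]) -> float:
    """Share of functions common to both sets (Jaccard overlap, an assumed formula)."""
    if not p or not q:
        return 0.0
    return len(p & q) / len(p | q)

# Hand-written dependency tuples, for illustration only.
target = extract_functions([("get", "nsubj", "users"),
                            ("get", "dobj", "activities"),
                            ("tag", "dobj", "tracks")])
other = extract_functions([("get", "dobj", "activities"),
                           ("share", "dobj", "photos")])
similarity = function_similarity(target, other)
```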
Further, the similarity calculation in the step (5) is specifically as follows:
the Mashup usage matrix is constructed as follows:
$$\mathrm{Mashup}_i = (\mathrm{API}_1, \mathrm{API}_2, \ldots, \mathrm{API}_j, \ldots)$$
wherein the value of $\mathrm{API}_j$ depends on whether $\mathrm{API}_j$ is used by $\mathrm{Mashup}_i$: it is 1 if used and 0 if not used.
The API usage matrix is constructed as follows:
$$\mathrm{API}_i = (\mathrm{Mashup}_1, \mathrm{Mashup}_2, \ldots, \mathrm{Mashup}_j, \ldots)$$
wherein the value of $\mathrm{Mashup}_j$ is 1 if $\mathrm{API}_i$ is used by $\mathrm{Mashup}_j$, and 0 if $\mathrm{API}_i$ is not used by $\mathrm{Mashup}_j$.
The similarity between Mashups and the similarity between APIs are then calculated from the Mashup usage matrix and the API usage matrix respectively, by using the co-occurrence similarity, and the specific calculation formula is as follows:
wherein $\mathrm{Mashup}_x$ and $\mathrm{Mashup}_y$ are the usage vectors of two Mashups, and $\mathrm{API}_x$ and $\mathrm{API}_y$ are the usage vectors of two APIs.
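As an illustration only, the usage matrices and a co-occurrence similarity can be sketched as follows. The co-occurrence formula is not reproduced in this text, so the sketch assumes a commonly used form, the number of shared entries divided by the square root of the product of the two usage counts. The Mashup named "demo" is made up; the sportlogger record follows the example in the detailed description.

```python
import math
from typing import Dict, List, Set

def mashup_vectors(records: Dict[str, Set[str]], api_names: List[str]) -> Dict[str, List[int]]:
    """One 0/1 row per Mashup: 1 where the Mashup uses the API in that column."""
    return {m: [1 if api in used else 0 for api in api_names]
            for m, used in records.items()}

def api_vectors(records: Dict[str, Set[str]], api_names: List[str]) -> Dict[str, List[int]]:
    """One 0/1 row per API: 1 where the Mashup in that column uses the API."""
    return {api: [1 if api in used else 0 for used in records.values()]
            for api in api_names}

def cooccurrence(x: List[int], y: List[int]) -> float:
    """Shared 1-entries over the geometric mean of the two counts (assumed formula)."""
    both = sum(a & b for a, b in zip(x, y))
    nx, ny = sum(x), sum(y)
    return both / math.sqrt(nx * ny) if nx and ny else 0.0

api_names = ["google maps", "twitter", "janrain engage"]
records = {
    "sportlogger": {"google maps", "twitter", "janrain engage"},
    "demo": {"google maps", "twitter"},  # made-up Mashup
}
m_vecs = mashup_vectors(records, api_names)
a_vecs = api_vectors(records, api_names)
mashup_sim = cooccurrence(m_vecs["sportlogger"], m_vecs["demo"])
api_sim = cooccurrence(a_vecs["google maps"], a_vecs["janrain engage"])
```

Because the usage vectors contain only 0 and 1, this assumed form coincides with the cosine similarity of the two vectors.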
The invention has the following beneficial effects: by using knowledge graph technology, the invention reduces the influence of data sparsity on the recommendation result. Meanwhile, the cold-start problem is addressed by extracting the functions of Mashups from their descriptions. Compared with existing service recommendation algorithms, the method provided by the invention improves the accuracy of service recommendation.
Drawings
FIG. 1 is a flowchart of an API recommendation method based on knowledge graph and collaborative filtering provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the basic information of the target Mashup and the APIs it uses, provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a partial service knowledge graph constructed by an embodiment of the present invention;
FIG. 4 shows the final recommendation result of the recommendation method provided by an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the following embodiments with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides an API recommendation method based on a knowledge graph and collaborative filtering, which includes the following specific steps:
(1) Construct a service knowledge graph from the Mashups and the existing APIs. For example, FIG. 3 shows part of the knowledge graph constructed from the information of the target Mashup named sportlogger shown in FIG. 2. The relationship between a Mashup and an API it uses is "used"; for example, sportlogger and twitter form the triple (sportlogger, used, twitter). The relationship between the target Mashup and its Category is "belong_to"; for example, sportlogger and Travel form the triple (sportlogger, belong_to, Travel). The relationship between a Mashup and a Tag is "Tag"; for example, sportlogger and Social form the triple (sportlogger, Tag, Social). The API entities are then embedded into low-dimensional vectors, and the similarity between APIs is calculated.
(2) Acquire the APIs used by the target Mashup named sportlogger, namely google maps, twitter and janrain engage, and, based on step (1), form the APIs most similar to these three APIs into the candidate recommendation list RS1.
(3) Extract the functions of the target Mashup and of the other Mashups by using natural language processing. The text description of the target Mashup sportlogger is "By inputting and tagging GPS tracks, users get a clear view of the third sports activities within space (on a map) and time (on a caption)"; after tokenization, the open-source tool Stanford Parser yields the two-tuple SD (get, sports activities). The similarity between Mashups is then calculated from the functions extracted from the description texts.
(4) Based on step (3), the Mashups similar to sportlogger are found to be fonfiner, sneaker store finer, twitterwho, ukoel social jukebox and breaking news headlines; the APIs used by these five Mashups are then combined into the recommendation list RS2.
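(5) Obtain the usage matrices of all Mashups and APIs, wherein each element of the matrix is given by:
$$m_{ij} = \begin{cases} 1, & \text{if } \mathrm{Mashup}_i \text{ uses } \mathrm{API}_j \\ 0, & \text{otherwise} \end{cases}$$
The similarity between Mashups and the similarity between APIs are then calculated from these matrices.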
(6) According to step (5), the Mashups similar to sportlogger are destinations, 100travel destinations, 200towns, cool bars reserves and club and darwin bus map; the APIs used by these similar Mashups are formed into the recommendation list RS3.
(7) Acquire the APIs google maps, twitter and janrain engage used by sportlogger, and then obtain the APIs similar to them according to step (5) to form the recommendation list RS4.
(8) Normalize the similarities of the recommended APIs in RS1 and RS4;
denote the set of recommended APIs in RS1 as set1, the set of recommended APIs in RS4 as set2, and the union of set1 and set2 as set; first, assign scores with respect to RS1 by traversing each API in set: if the API exists in RS1, its score in RS1 is its normalized similarity; if the API does not exist in RS1, its score in RS1 is 0; then assign scores with respect to RS4 in the same way; merge the processed RS1 and RS4 into the API-based recommendation set AS, wherein the score $Sr_i$ of the i-th recommended API in AS is calculated by the following formula:
wherein $s_1$ and $s_2$ are the scores of the i-th recommended API in RS1 and RS4 respectively.
(9) Normalize the similarities of the recommended APIs in RS2 and RS3;
denote the set of recommended APIs in RS2 as set3, the set of recommended APIs in RS3 as set4, and the union of set3 and set4 as set'; first, assign scores with respect to RS2 by traversing each API in set': if the API exists in RS2, its score in RS2 is its normalized similarity; if the API does not exist in RS2, its score in RS2 is 0; then assign scores with respect to RS3 in the same way; after these operations, RS2 and RS3 have the same length and contain the same recommended APIs; merge the processed RS2 and RS3 into the Mashup-based recommendation set MS, wherein the score $Sr_i$ of the i-th recommended API in MS is calculated by the following formula:
wherein $s_1$ and $s_2$ are the scores of the i-th recommended API in RS2 and RS3 respectively.
(10) Normalize the scores of the recommended APIs in AS and MS, then combine AS and MS and take the 20 APIs with the highest scores to form the final recommendation set RS. The final API recommendation set for sportlogger is shown in FIG. 4.
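As an illustration only, forming a candidate list from similar Mashups (as in steps (4) and (6)) and keeping the highest-scoring APIs (as in step (10)) can be sketched as follows. How an API inherits a score from the Mashups that use it is not stated in this text; the sketch assumes it takes the highest similarity among those Mashups. All names and values are made up.

```python
from typing import Dict, List, Set, Tuple

def candidates_from_similar_mashups(similar: Dict[str, float],
                                    usage: Dict[str, Set[str]]) -> Dict[str, float]:
    """Collect the APIs used by similar Mashups, scored by the best Mashup similarity."""
    rs: Dict[str, float] = {}
    for mashup, sim in similar.items():
        for api in usage.get(mashup, ()):
            rs[api] = max(rs.get(api, 0.0), sim)  # assumed scoring rule
    return rs

def top_n(scores: Dict[str, float], n: int = 20) -> List[Tuple[str, float]]:
    """The n best-scored APIs; ties are broken alphabetically for a stable order."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]

# Made-up similar Mashups and usage records.
similar = {"m1": 0.8, "m2": 0.5}
usage = {"m1": {"flickr", "twitter"}, "m2": {"twitter", "youtube"}}
rs = candidates_from_similar_mashups(similar, usage)
best_two = top_n(rs, n=2)
```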
Further, the similarity calculations in step (1), step (3) and step (5) are specifically as follows:
A) In step (1), the similarity between the low-dimensional vectors of API entities is calculated by using cosine similarity; the larger the value, the more related the two APIs are. The specific calculation formula is as follows:
$$\mathrm{sim}(\mathrm{API}_i, \mathrm{API}_{i'}) = \frac{\sum_{k=1}^{d} \mathrm{API}_{i,k}\,\mathrm{API}_{i',k}}{\sqrt{\sum_{k=1}^{d} \mathrm{API}_{i,k}^{2}}\;\sqrt{\sum_{k=1}^{d} \mathrm{API}_{i',k}^{2}}}$$
wherein $\mathrm{API}_i$ and $\mathrm{API}_{i'}$ are the low-dimensional vectors of the two APIs, $\mathrm{API}_{i,k}$ is the k-th component of $\mathrm{API}_i$, and d is the dimension of the API vectors.
B) In step (3), the similarity between the target Mashup and the other Mashups is the probability that the same function occurs, so that the more similar the functions of two Mashups are, the higher their similarity is. The specific calculation formula is as follows:
wherein P and Q are the function sets of the two Mashups, respectively.
C) In step (5), the similarity between Mashups and the similarity between APIs are calculated from the Mashup usage matrix and the API usage matrix respectively. Because the usage matrices contain only the values 0 and 1, the co-occurrence similarity is used, and the specific calculation formula is as follows:
wherein $\mathrm{Mashup}_x$ and $\mathrm{Mashup}_y$ are the usage vectors of two Mashups, and $\mathrm{API}_x$ and $\mathrm{API}_y$ are the usage vectors of two APIs.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.