Data consistency determining method, device, equipment and storage medium
1. A method for determining data consistency, comprising:
obtaining source data from a first data source;
acquiring target data from a second data source based on the data identification of the source data, wherein the target data and the source data have the same data identification;
and determining whether the fields to be compared in the source data and the target data are consistent or not according to a field mapping relation between the source data and the target data which is configured in advance.
2. The method of claim 1, wherein determining whether the fields to be compared in the source data and the target data are consistent comprises:
according to the first data source and the second data source, acquiring the field mapping relation between the source data and the target data from a plurality of field mapping relations configured in advance;
comparing the values of the fields to be compared in the source data and the target data based on the field mapping relation;
if the values of the fields to be compared of the source data and the target data are consistent, determining that the fields to be compared in the source data and the target data are consistent.
3. The method of claim 2, further comprising:
if the values of the fields to be compared in the source data and the target data are not consistent, determining whether a delayed retry condition is met, wherein the delayed retry condition comprises that a retry time is within a preset time range or that a number of retries is within a preset threshold;
if the delayed retry condition is met, re-acquiring the source data from the first data source, and re-acquiring the target data from the second data source;
and determining whether the values of the fields to be compared in the source data and the target data which are acquired again are consistent or not based on the field mapping relation.
4. The method of claim 3, further comprising:
generating a delayed retry message prior to re-acquiring the source data from the first data source, the delayed retry message including a data identification of the source data,
wherein the re-acquiring the source data from the first data source and re-acquiring the target data from the second data source comprises:
if the delayed retry message is monitored, acquiring the data identification from the delayed retry message;
based on the data identification, re-acquiring the source data from the first data source and re-acquiring the target data from the second data source.
5. The method of claim 4, further comprising:
placing the delayed retry message in a retry message queue;
if the source data is monitored to be changed, determining whether the source data exists in the retry message queue based on the data identification;
if the source data already exists, updating the source data already existing in the retry message queue with the changed source data.
6. The method of any of claims 1 to 5, wherein the obtaining source data from a first data source comprises:
monitoring a binary log binlog message of the first data source;
obtaining the source data from the first data source based on the monitored data identification in the binlog message;
the obtaining target data from a second data source based on the data identification of the source data comprises:
determining a data extraction task corresponding to the first data source based on the monitored data identification in the binlog message;
and acquiring target data from the second data source based on the data identification and the data extraction task.
7. The method of any one of claims 1 to 5, wherein the first data source and the second data source are the same type of data source, and wherein the obtaining source data from the first data source comprises:
and obtaining a plurality of pieces of source data from the first data source in batches.
8. The method according to any one of claims 1 to 5, wherein the first data source is a data file, and the obtaining source data from the first data source comprises:
reading line data from the data file;
and packaging the read line data, and taking the packaged line data as the source data.
9. The method of claim 8, further comprising:
if the size of the data file is larger than a preset threshold value, fragmenting the data file;
the reading of line data from the data file comprises:
and reading the data of each fragment of the data file at fragment granularity.
10. The method according to any one of claims 1 to 5, further comprising:
performing aggregation processing on the source data and the target data to generate line data, wherein the line data comprises the source data and the target data;
the determining whether the fields to be compared in the source data and the target data are consistent includes:
performing streaming data comparison on the fields to be compared of the source data and the target data in the line data;
and determining whether the fields to be compared in the source data and the target data are consistent or not based on the comparison result.
11. The method of claim 10, wherein the aggregating the source data and the target data to generate line data comprises:
acquiring a pre-configured data aggregation template corresponding to the first data source and the second data source;
and performing aggregation processing on the source data and the target data based on the data aggregation template to generate line data.
12. A data consistency determining apparatus, characterized by comprising:
the first data acquisition module is used for acquiring source data from a first data source;
a second data obtaining module, configured to obtain target data from the second data source based on a data identifier of the source data, where the target data and the source data have a same data identifier;
and the data comparison module is used for determining whether the fields to be compared in the source data and the target data are consistent or not according to a field mapping relation between the source data and the target data which is configured in advance.
13. The apparatus of claim 12, wherein the data comparison module comprises:
a mapping relationship determining unit, configured to obtain, according to the first data source and the second data source, the field mapping relationship between the source data and the target data from a plurality of field mapping relationships configured in advance;
a comparison unit, configured to compare values of fields to be compared in the source data and the target data based on the field mapping relationship;
and the consistency determining unit is used for determining that the fields to be compared in the source data and the target data are consistent if the values of the fields to be compared in the source data and the target data are consistent.
14. An electronic device, comprising: a memory and a processor;
the memory is configured to store instructions executable by the processor;
the processor is configured to implement the data consistency determination method of any of claims 1 to 11.
15. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the data consistency determination method according to any one of claims 1 to 11.
Background
With the development of enterprises, the enterprise business becomes more and more complex, a single business system is difficult to support a complex business scene, and a new system architecture comprising a plurality of business systems appears.
Taking a micro-service system architecture as an example, each service system in the architecture is deployed in a distributed manner, with micro-services divided according to functional modules. Therefore, in the micro-service system architecture, one service may generate a plurality of pieces of data to be processed that are structurally scattered, that is, a plurality of pieces of data to be processed of the same service are distributed across a plurality of micro-service modules. However, the inventors found that the data processed independently by each micro-service module may change, which causes data inconsistency among the micro-service modules in the architecture.
Therefore, how to determine whether data among modules in the same system architecture are consistent becomes a technical problem to be solved urgently.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for determining data consistency, so as to solve the problem of how to determine whether data among modules in the same system architecture are consistent.
According to a first aspect of the embodiments of the present invention, there is provided a data consistency determining method, including:
obtaining source data from a first data source;
acquiring target data from a second data source based on the data identification of the source data, wherein the target data and the source data have the same data identification;
and determining whether the fields to be compared in the source data and the target data are consistent or not according to a field mapping relation between the source data and the target data which is configured in advance.
In a specific embodiment, the determining whether the fields to be compared in the source data and the target data are consistent includes:
according to the first data source and the second data source, acquiring the field mapping relation between the source data and the target data from a plurality of field mapping relations configured in advance;
comparing the values of the fields to be compared in the source data and the target data based on the field mapping relation;
if the values of the fields to be compared of the source data and the target data are consistent, determining that the fields to be compared in the source data and the target data are consistent.
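As a non-limiting illustration, the selection of one field mapping relation from several pre-configured ones, followed by the field-by-field comparison, can be sketched as below. The registry `FIELD_MAPPINGS`, the data source names, and the field names are hypothetical and are not taken from the embodiments.

```python
from typing import Any, Dict, Tuple

# Hypothetical registry of pre-configured field mapping relations:
# (first data source, second data source) -> {source field: target field}.
FIELD_MAPPINGS: Dict[Tuple[str, str], Dict[str, str]] = {
    ("order_db", "payment_db"): {"order_id": "biz_no", "amount": "pay_amount"},
    ("order_db", "ledger_db"): {"order_id": "ref_id", "amount": "total"},
}


def select_mapping(first_source: str, second_source: str) -> Dict[str, str]:
    """Pick the mapping relation configured for this pair of data sources."""
    return FIELD_MAPPINGS[(first_source, second_source)]


def fields_consistent(source: Dict[str, Any], target: Dict[str, Any],
                      mapping: Dict[str, str]) -> bool:
    """Return True only if every mapped field exists on both sides with equal values."""
    return all(
        s in source and t in target and source[s] == target[t]
        for s, t in mapping.items()
    )


mapping = select_mapping("order_db", "payment_db")
source_row = {"order_id": "A1", "amount": 100}
target_row = {"biz_no": "A1", "pay_amount": 100}
print(fields_consistent(source_row, target_row, mapping))
```

Because the comparison iterates over the mapping rather than over either record, the two records may use different field names and may carry additional fields that are not compared.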
In one embodiment, the method further comprises:
if the values of the fields to be compared in the source data and the target data are not consistent, determining whether a delayed retry condition is met, wherein the delayed retry condition comprises that a retry time is within a preset time range or that a number of retries is within a preset threshold;
if the delayed retry condition is met, re-acquiring the source data from the first data source, and re-acquiring the target data from the second data source;
and determining whether the values of the fields to be compared in the source data and the target data which are acquired again are consistent or not based on the field mapping relation.
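As a non-limiting illustration, the delayed retry can be sketched as a loop that repeats the comparison while a retry condition holds. The function name, the parameters `max_retries`, `max_elapsed_s`, and `delay_s`, and the choice to enforce both limits at once are assumptions made for this sketch; an implementation may apply only one of the two limits.

```python
import time
from typing import Callable, Dict


def compare_with_delayed_retry(compare_once: Callable[[], bool],
                               max_retries: int = 3,
                               max_elapsed_s: float = 5.0,
                               delay_s: float = 0.01) -> bool:
    """Repeat compare_once until it passes or the retry condition no longer holds.

    compare_once is expected to re-acquire the source data and the target data
    on every call and to return True when the compared fields are consistent.
    """
    start = time.monotonic()
    retries = 0
    while True:
        if compare_once():
            return True
        # Assumed retry condition: both the retry count and the elapsed time
        # must still be inside their preset limits.
        within_count = retries < max_retries
        within_time = (time.monotonic() - start) < max_elapsed_s
        if not (within_count and within_time):
            return False
        time.sleep(delay_s)  # the "delay" before the next attempt
        retries += 1


# Simulated comparison: the target data catches up on the third attempt.
attempts: Dict[str, int] = {"n": 0}


def compare_once() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3


consistent = compare_with_delayed_retry(compare_once)
print(consistent, attempts["n"])
```

Retrying after a delay avoids reporting an inconsistency that exists only because the second data source has not yet received an update that is still in flight.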
In one embodiment, the method further comprises:
generating a delayed retry message prior to re-acquiring the source data from the first data source, the delayed retry message including a data identification of the source data,
wherein the re-acquiring the source data from the first data source and re-acquiring the target data from the second data source comprises:
if the delayed retry message is monitored, acquiring the data identification from the delayed retry message;
based on the data identification, re-acquiring the source data from the first data source and re-acquiring the target data from the second data source.
In one embodiment, the method further comprises:
placing the delayed retry message in a retry message queue;
if the source data is monitored to be changed, determining whether the source data exists in the retry message queue based on the data identification;
if the source data already exists, updating the source data already existing in the retry message queue with the changed source data.
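As a non-limiting illustration, a retry message queue keyed by the data identification can be sketched as below. The class `RetryQueue` and its method names are hypothetical, and storing a snapshot of the source data inside each message is an assumption of this sketch.

```python
from collections import OrderedDict
from typing import Any, Dict, Optional, Tuple


class RetryQueue:
    """First-in, first-out queue of delayed retry messages keyed by data identification."""

    def __init__(self) -> None:
        self._messages: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()

    def put(self, data_id: str, source: Dict[str, Any]) -> None:
        """Enqueue a delayed retry message carrying a copy of the source data."""
        self._messages[data_id] = dict(source)

    def on_source_changed(self, data_id: str, changed: Dict[str, Any]) -> bool:
        """If a message for data_id is queued, replace its source data in place."""
        if data_id in self._messages:
            self._messages[data_id] = dict(changed)  # queue position is kept
            return True
        return False

    def pop(self) -> Optional[Tuple[str, Dict[str, Any]]]:
        """Remove and return the oldest message, or None if the queue is empty."""
        if not self._messages:
            return None
        return self._messages.popitem(last=False)


queue = RetryQueue()
queue.put("A1", {"amount": 100})
queue.on_source_changed("A1", {"amount": 120})  # updates the queued entry
queue.on_source_changed("B2", {"amount": 5})    # not queued, so nothing happens
print(queue.pop())
```

Updating the queued entry instead of appending a second message keeps at most one pending retry per data identification, so the retry compares against the latest source data.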
In one embodiment, the obtaining source data from a first data source includes:
monitoring a binary log binlog message of the first data source;
obtaining the source data from the first data source based on the monitored data identification in the binlog message;
the obtaining target data from a second data source based on the data identification of the source data comprises:
determining a data extraction task corresponding to the first data source based on the monitored data identification in the binlog message;
and acquiring target data from the second data source based on the data identification and the data extraction task.
In a specific embodiment, the first data source and the second data source are data sources of the same type, and the obtaining source data from the first data source includes:
and obtaining a plurality of pieces of source data from the first data source in batches.
In a specific embodiment, the first data source is a data file, and the obtaining source data from the first data source includes:
reading line data from the data file;
and packaging the read line data, and taking the packaged line data as the source data.
According to some embodiments of the invention, based on the above scheme, the method further comprises:
if the size of the data file is larger than a preset threshold value, fragmenting the data file;
the reading of line data from the data file comprises:
and reading the data of each fragment of the data file at fragment granularity.
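As a non-limiting illustration, fragmenting a large data file and reading it fragment by fragment can be sketched as below. The comma-separated two-column layout, the function names, and the byte-range representation of a fragment are assumptions of this sketch.

```python
import os
import tempfile
from typing import Dict, Iterator, List, Tuple


def split_into_fragments(path: str, fragment_bytes: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges whose boundaries fall on line ends."""
    size = os.path.getsize(path)
    ranges: List[Tuple[int, int]] = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            f.seek(min(start + fragment_bytes, size))
            f.readline()  # extend the fragment to the end of the current line
            end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def read_fragment(path: str, start: int, end: int) -> Iterator[Dict[str, str]]:
    """Read the lines of one fragment and package each line as source data."""
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline().decode("utf-8").strip()
            if line:
                order_id, amount = line.split(",")
                yield {"order_id": order_id, "amount": amount}


def read_source_rows(path: str, size_threshold: int,
                     fragment_bytes: int) -> List[Dict[str, str]]:
    """Fragment the file only when it is larger than the preset threshold."""
    size = os.path.getsize(path)
    if size > size_threshold:
        fragments = split_into_fragments(path, fragment_bytes)
    else:
        fragments = [(0, size)]
    rows: List[Dict[str, str]] = []
    for start, end in fragments:
        rows.extend(read_fragment(path, start, end))
    return rows


# Demonstration on a small temporary file with five lines.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"".join(f"A{i},{i}\n".encode("utf-8") for i in range(5)))
rows = read_source_rows(tmp.name, size_threshold=10, fragment_bytes=8)
os.remove(tmp.name)
print(len(rows))
```

Since each fragment is an independent byte range that ends on a line boundary, the fragments could also be handed to separate workers; the sketch reads them sequentially for simplicity.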
In one embodiment, the method further comprises:
performing aggregation processing on the source data and the target data to generate line data, wherein the line data comprises the source data and the target data;
the determining whether the fields to be compared in the source data and the target data are consistent includes:
performing streaming data comparison on the fields to be compared of the source data and the target data in the line data;
and determining whether the fields to be compared in the source data and the target data are consistent or not based on the comparison result.
In a specific embodiment, the aggregating the source data and the target data to generate line data includes:
acquiring a pre-configured data aggregation template corresponding to the first data source and the second data source;
and performing aggregation processing on the source data and the target data based on the data aggregation template to generate line data.
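As a non-limiting illustration, aggregating source data and target data into line data according to a template, and then comparing the line data one record at a time, can be sketched as below. The template layout, the field names, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical aggregation template: which fields to carry over from each side.
TEMPLATE = {
    "source_fields": ["order_id", "amount"],
    "target_fields": ["biz_no", "pay_amount"],
}
# Hypothetical field mapping relation: source field -> target field.
MAPPING = {"amount": "pay_amount"}

LineData = Dict[str, Dict[str, Any]]


def aggregate(source: Dict[str, Any], target: Dict[str, Any],
              template: Dict[str, List[str]]) -> LineData:
    """Build one piece of line data that contains both the source and the target data."""
    return {
        "source": {f: source.get(f) for f in template["source_fields"]},
        "target": {f: target.get(f) for f in template["target_fields"]},
    }


def stream_compare(lines: Iterable[LineData], mapping: Dict[str, str],
                   id_field: str) -> Iterator[Tuple[Any, bool]]:
    """Compare each piece of line data as it arrives and yield (identification, result)."""
    for line in lines:
        ok = all(line["source"][s] == line["target"][t] for s, t in mapping.items())
        yield line["source"][id_field], ok


pairs = [
    ({"order_id": "A1", "amount": 100}, {"biz_no": "A1", "pay_amount": 100}),
    ({"order_id": "A2", "amount": 50}, {"biz_no": "A2", "pay_amount": 45}),
]
lines = (aggregate(s, t, TEMPLATE) for s, t in pairs)
results = list(stream_compare(lines, MAPPING, id_field="order_id"))
print(results)
```

Both functions work on iterators, so a record can be aggregated and compared as soon as it is available, without holding the full data set in memory.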
In a second aspect of the embodiments of the present invention, there is provided a data consistency determining apparatus, including:
the first data acquisition module is used for acquiring source data from a first data source;
the second data acquisition module is used for acquiring target data from a second data source based on the data identifier of the source data, and the target data and the source data have the same data identifier;
and the data comparison module is used for determining whether the fields to be compared in the source data and the target data are consistent or not according to a field mapping relation between the source data and the target data which is configured in advance.
In one embodiment, the data comparison module includes:
a mapping relationship determining unit, configured to obtain, according to the first data source and the second data source, the field mapping relationship between the source data and the target data from a plurality of field mapping relationships configured in advance;
a comparison unit, configured to compare values of fields to be compared in the source data and the target data based on the field mapping relationship;
and the consistency determining unit is used for determining that the fields to be compared in the source data and the target data are consistent if the values of the fields to be compared in the source data and the target data are consistent.
In one embodiment, the apparatus further comprises:
a retry determining module, configured to determine whether a delayed retry condition is met if values of fields to be compared in the source data and the target data are inconsistent, where the delayed retry condition includes that a retry time is within a predetermined time range or that a number of retries is within a predetermined threshold;
a re-acquisition module, configured to re-acquire the source data from the first data source and re-acquire the target data from the second data source if the delayed retry condition is met;
and the re-comparison module is used for determining whether the values of the fields to be compared in the source data and the target data which are obtained again are consistent or not based on the field mapping relationship.
In one embodiment, the apparatus further comprises:
a retry message generation module, configured to generate a delayed retry message before the source data is re-acquired from the first data source, the delayed retry message including a data identification of the source data,
the re-acquisition module is configured to:
if the delayed retry message is monitored, acquiring the data identification from the delayed retry message;
based on the data identification, re-acquiring the source data from the first data source and re-acquiring the target data from the second data source.
In one embodiment, the apparatus further comprises:
a queue generating module, configured to put the delayed retry message into a retry message queue;
a source data determining module, configured to determine, based on the data identifier, whether the source data already exists in the retry message queue if it is monitored that the source data has changed;
and the data updating module is used for updating the existing source data in the retry message queue through the changed source data if the source data exists.
In a specific embodiment, the first data obtaining module is further configured to:
monitoring a binary log binlog message of the first data source;
obtaining the source data from the first data source based on the monitored data identification in the binlog message;
the second data acquisition module is further configured to:
determining a data extraction task corresponding to the first data source based on the monitored data identification in the binlog message;
and acquiring target data from the second data source based on the data identification and the data extraction task.
In a specific embodiment, the first data source and the second data source are data sources of the same type, and the first data obtaining module is further configured to:
and obtaining a plurality of pieces of source data from the first data source in batches.
In a specific embodiment, the first data source is a data file, and the first data obtaining module is further configured to:
reading line data from the data file;
and packaging the read line data, and taking the packaged line data as the source data.
In one embodiment, the apparatus further comprises:
the file fragmentation module is used for fragmenting the data file if the size of the data file is larger than a preset threshold value;
the first data acquisition module is further configured to:
and reading the data of each fragment of the data file at fragment granularity.
In one embodiment, the apparatus further comprises:
the data aggregation module is used for performing aggregation processing on the source data and the target data to generate line data, and the line data comprises the source data and the target data;
the data comparison module comprises:
the streaming comparison unit is used for performing streaming data comparison on the fields to be compared of the source data and the target data in the line data;
and the result determining unit is used for determining whether the fields to be compared in the source data and the target data are consistent or not based on the comparison result.
In a specific embodiment, the data aggregation module is specifically configured to:
acquiring a pre-configured data aggregation template corresponding to the first data source and the second data source;
and performing aggregation processing on the source data and the target data based on the data aggregation template to generate line data.
In a third aspect of the embodiments of the present invention, there is provided an electronic device, including: a memory and a processor, wherein
the memory is configured to store instructions executable by the processor;
the processor is configured to implement the data consistency determination method of the first aspect.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the computer-readable storage medium is configured to implement the data consistency determination method according to the first aspect.
According to the data consistency determining method, the data consistency determining device, the data consistency determining equipment and the data consistency determining storage medium, on one hand, the field mapping relation between the fields of the two data sources needing consistency comparison is configured in advance, so that the data consistency comparison between the data sources with different structures can be supported; on the other hand, according to the field mapping relationship, consistency comparison is performed on the data to be compared acquired from the two data sources, so that whether the service data under different modules under the same system architecture are consistent or not can be determined, and the consistency of the data under the same system architecture can be further ensured.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic block diagram of an application scenario of a data consistency determination method of some embodiments of the present invention;
FIG. 2 is a schematic flow chart diagram of a data consistency determination method according to some embodiments of the present invention;
FIG. 3 is a flowchart illustrating a data consistency determination method according to further embodiments of the present invention;
FIG. 4 is a schematic flow chart diagram of a data consistency determination method according to another embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram of a data consistency determination method according to further embodiments of the present invention;
FIG. 6 is a flowchart illustrating a data consistency determination method according to still other embodiments of the present invention;
FIG. 7 is a schematic flow chart of a data comparison process provided in some embodiments of the present invention;
FIG. 8 is a schematic block diagram of a first embodiment of a data consistency determination apparatus provided in the present invention;
FIG. 9 is a schematic block diagram of a data comparison module provided in some embodiments of the present invention;
FIG. 10 is a schematic block diagram of a second embodiment of a data consistency determination apparatus provided in the present invention;
FIG. 11 is a schematic block diagram of a third embodiment of a data consistency determination apparatus provided in the present invention;
FIG. 12 is a schematic block diagram of a fourth embodiment of a data consistency determination apparatus provided in the present invention;
FIG. 13 is a schematic block diagram of a fifth embodiment of a data consistency determining apparatus provided in the present invention;
FIG. 14 is a schematic block diagram of a sixth embodiment of a data consistency determining apparatus provided in the present invention;
FIG. 15 is a schematic block diagram of an electronic device provided by some embodiments of the invention.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms to which the present invention relates will be explained first:
Micro-service: the micro-service approach is a way of developing a single application as a set of small services. Unlike a system, each service is built around a business capability, can be deployed independently, runs in its own process, and communicates through a lightweight mechanism, such as an Application Programming Interface (API) over the HyperText Transfer Protocol (HTTP).
Binary log (binlog): a binary log that records table structure changes (e.g., CREATE TABLE, ALTER TABLE …) and table data modifications (INSERT, UPDATE, DELETE …) of the data tables in a database.
Data source: a database, a data table, a file, or the like that stores source data or target data.
Streaming data comparison: regardless of the form in which the data exists, the original data is converted into individual records for processing, and the individual records are combined into a single piece of composite data, that is, composite data containing both the source data and the target data, such as line data; the data in the composite data, such as the line data, is then verified.
Batch data comparison: the process of verifying a large volume of data, either executed periodically on a schedule by a scheduling center or triggered manually as a one-off; for example, verification of the previous day's deposit data is executed across the business database, the ledger database, and the asset database at 3 a.m. every day.
Currently, in a micro-service system architecture, each service system is deployed in a distributed manner, with micro-services divided according to functional modules. Therefore, in the micro-service system architecture, one service may generate a plurality of pieces of data to be processed that are structurally scattered, that is, a plurality of pieces of data to be processed of the same service are distributed across a plurality of micro-service modules. However, the inventors found that the data processed independently by each micro-service module may change, which causes data inconsistency among the micro-service modules in the architecture.
In order to solve the problem of data inconsistency among modules in the same system architecture, in one technical scheme, consistency comparison is performed on data among modules with the same table structure under the system architecture. However, in this technical solution, consistency comparison of data of different structures having different table structures cannot be supported.
Based on the above, the basic idea of the invention is to configure, in advance, a field mapping relation between the fields of the two data sources that need consistency comparison, and to perform consistency comparison on the data to be compared acquired from the two data sources according to the field mapping relation. According to the technical scheme of the embodiment of the invention, on one hand, because the field mapping relation between the fields of the two data sources is configured in advance, consistency comparison of data between data sources with different structures can be supported; on the other hand, because the data to be compared acquired from the two data sources are compared according to the field mapping relation, whether the service data of different modules under the same system architecture are consistent can be determined, and the consistency of the data under the same system architecture can be further ensured.
Fig. 1 is a schematic block diagram of an application scenario of a data consistency determination method according to some embodiments of the present invention. Referring to fig. 1, the application scenario includes a data consistency determining apparatus 110, a micro service module 120, a micro service module 130, and a configuration library 140, where the data consistency determining apparatus 110 includes a data obtaining module 112 and a data comparing module 114, and the data obtaining module 112 is configured to obtain source data from a first data source 122 and obtain target data from a second data source 132; the data comparison module 114 is configured to perform consistency comparison on the to-be-compared fields in the source data and the target data acquired by the data acquisition module 112; the microservice module 120 includes a first data source 122, the microservice module 130 includes a second data source 132, and the first data source 122 and the second data source 132 may be databases or data files.
Further, the configuration library 140 stores a field mapping relationship file between the data in the pre-configured micro service module 120 and the data in the micro service module 130. The data comparison module 114 is configured to perform consistency comparison on the fields to be compared between the source data and the target data according to the field mapping relationship obtained in the configuration library 140.
It should be noted that the data consistency determining apparatus 110 may be a desktop computer or a laptop computer, or may be other suitable general-purpose computing devices such as a notebook computer or a cloud computing device, which is not limited in this respect.
Further, the first data source and the second data source may be data sources in a microservice system, or may be data sources in other suitable business systems, such as a distributed business system, which also falls within the protection scope of the present invention.
A data consistency determination method according to an exemplary embodiment of the present invention is described below with reference to the accompanying drawings in conjunction with an application scenario of fig. 1. It should be noted that the above application scenarios are merely illustrative for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
Fig. 2 is a flow chart illustrating a data consistency determination method according to some embodiments of the present invention. The data consistency determination method can be applied to the data consistency determination apparatus 110 in fig. 1, and the data consistency determination method in the example embodiment is described in detail below with reference to the drawings by taking the micro service system as a commodity transaction system as an example.
Referring to FIG. 2, in step S210, source data is obtained from a first data source.
In an example embodiment, the first data source may be a data source in the microservice module A in the microservice system, and the first data source may be a database or a data file. If the first data source is a database, data changes in the first data source can be monitored, and if a data change in the first data source is detected, the changed data is acquired from the first data source and serves as the source data. For example, if the micro service system is a commodity transaction system, the micro service module A may be an order generation module, the first data source is the order database corresponding to the order generation module A, and the source data is a piece of new order data.
Further, in an example embodiment, data may be read in batches from a database or data file, with the read data as source data. For example, data may be read from the first data source in a single batch or a timed batch, the single batch read being performed only once, the timed batch read performing the batch read in a timed task mechanism.
In step S220, target data is obtained from the second data source based on the data identifier of the source data, and the target data and the source data have the same data identifier.
In an example embodiment, the second data source may be a data source in another microservice module B in the microservice system, and the second data source may be a database or a data file. Further, the data identification of the source data may be a unique identification of the source data, such as a primary key. If the first data source and the second data source are databases, the target data is acquired from the second data source according to the unique identification of the source data, such as the primary key, where the target data and the source data have the same data identification. For example, let the microservice module B be a banking module and the second data source be a receipt and payment detail database; the target data and the source data then have the same transaction bank card number, and the transaction bank card number is the data identification of the source data.
It should be noted that the database may include a MySQL database, an Oracle database, or another suitable database such as an HBase database. The data file may include an Excel file, a txt file, or another suitable data file such as a sequence file or an Avro file.
In step S230, it is determined whether the fields to be compared in the source data and the target data are consistent according to the field mapping relationship between the pre-configured source data and the target data.
In an example embodiment, a field mapping relationship between the fields to be compared in the first data source and the second data source is preconfigured. The values of the fields to be compared in the source data and the target data are compared according to the preconfigured field mapping relationship, and it is thereby determined whether the fields to be compared in the source data and the target data are consistent. For example, the preconfigured field mapping relationship between the fields to be compared in the first data source and the second data source may be stored in a database. After the source data and the target data are obtained, the field mapping relationship is obtained from the database according to identification information of the first data source and identification information of the second data source, and the values of the fields to be compared in the source data and the target data are compared according to a predetermined comparison rule based on the field mapping relationship. If the values of the fields to be compared in the source data and the target data are consistent, it is determined that the fields to be compared in the source data and the target data are consistent; if the values are inconsistent, alarm information may be sent.
It should be noted that the predetermined comparison rule may include an equal-to, greater-than, or less-than comparison rule, and the comparison rule may differ for different fields in the source data and the target data. For example, if the source data includes a field a and a field c and the target data includes a field b and a field d, the predetermined comparison rule may be: if field a = field b and field c = field d, the source data is determined to be consistent with the target data.
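The field-level comparison described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the names `FieldRule`, `RULES`, and `compare_records`, the rule labels "eq", "gt", and "lt", and the sample field values are all assumptions introduced for the example.

```python
import operator
from typing import Any, Callable, Dict, List, NamedTuple


class FieldRule(NamedTuple):
    """One entry of a (hypothetical) field mapping relationship."""
    source_field: str
    target_field: str
    rule: str  # "eq", "gt", or "lt" (labels assumed for this sketch)


# Comparison rules keyed by label: equal to, greater than, less than.
RULES: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
}


def compare_records(source: Dict[str, Any],
                    target: Dict[str, Any],
                    mapping: List[FieldRule]) -> List[FieldRule]:
    """Return the mapping entries whose rule is violated.

    An empty list means every field to be compared is consistent.
    """
    failed = []
    for entry in mapping:
        check = RULES[entry.rule]
        if not check(source[entry.source_field], target[entry.target_field]):
            failed.append(entry)
    return failed


# Field a is compared with field b, and field c with field d.
mapping = [FieldRule("a", "b", "eq"), FieldRule("c", "d", "eq")]
source = {"a": 100, "c": "PAID"}
target = {"b": 100, "d": "PAID"}
print(compare_records(source, target, mapping))  # empty list: consistent
```

Keeping the rule per mapping entry allows different fields of the same record to use different comparison rules, as the paragraph above permits.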
According to the data consistency determination method in the example embodiment of fig. 2, on one hand, since the field mapping relationship between the fields of the two data sources that need consistency comparison is configured in advance, the data consistency comparison between the data sources of different structures can be supported; on the other hand, according to the field mapping relationship, consistency comparison is performed on the data to be compared acquired from the two data sources, so that whether the service data under different modules under the same system architecture are consistent or not can be determined, and the consistency of the data under the same system architecture can be further ensured.
Further, for a data source that changes with time, since the source data may change dynamically, after a comparison failure, the data comparison may be performed again within a predetermined time range or within a predetermined threshold number of times, and therefore, in an example embodiment, if the values of the fields to be compared in the source data and the target data are not consistent, it is determined whether a delayed retry condition is met, where the delayed retry condition includes that the retry time is within the predetermined time range or the retry number is within the predetermined threshold number of times; if the delay retry condition is met, re-acquiring the source data from the first data source, and re-acquiring the target data from the second data source; and determining whether the values of the fields to be compared in the newly acquired source data and the target data are consistent or not based on the field mapping relation.
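A minimal sketch of the delayed retry condition follows. The names `RetryState` and `meets_delayed_retry_condition`, and the parameters `max_window_seconds` and `max_attempts`, are assumptions for illustration. The sketch reads the condition literally, so that either limit being satisfied permits a retry; a deployment might instead require both.

```python
from dataclasses import dataclass


@dataclass
class RetryState:
    first_failure_at: float  # time of the first failed comparison, in seconds
    attempts: int            # number of retries already performed


def meets_delayed_retry_condition(state: RetryState,
                                  now: float,
                                  max_window_seconds: float,
                                  max_attempts: int) -> bool:
    """Return True if a delayed retry is still allowed.

    The retry time must be within the predetermined time range, or the
    retry count must be within the predetermined count threshold.
    """
    within_time_range = (now - state.first_failure_at) <= max_window_seconds
    within_count_threshold = state.attempts < max_attempts
    return within_time_range or within_count_threshold


# 30 seconds after the first failure, with one retry already made.
state = RetryState(first_failure_at=100.0, attempts=1)
print(meets_delayed_retry_condition(state, now=130.0,
                                    max_window_seconds=60.0, max_attempts=3))
```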
Fig. 3 is a flowchart illustrating a data consistency determination method according to still other embodiments of the present invention.
Referring to fig. 3, in step S310, a binary log (binlog) message of the first data source is monitored.
In an example embodiment, the first data source is the order database corresponding to the order generation module A. When a new commodity transaction is generated in the commodity transaction system, new order data is generated, the binlog of the order database changes, and a binlog message describing the change is generated. The first data source is monitored so that, once the first data source generates the binlog message, the binlog message can be captured.
In step S320, source data is obtained from a first data source and target data is obtained from a second data source based on the data identifications in the monitored binlog message.
In an example embodiment, a data identification of the source data is extracted from the binlog message. The data identification may be a primary key and/or a sharding key (the sharding key applies when the database is split into multiple databases and tables). The source data is obtained from the first data source based on the data identification, and the target data is obtained from the second data source, which may be a database or a data file. For example, after the monitored binlog message is received, the data source to which the binlog belongs, such as a database or a data table, is determined, and the corresponding data extraction tasks, such as task 1, task 2, and task 3, are determined according to the preconfigured data aggregation configuration information corresponding to that data source. Task 1, task 2, and task 3 are tasks for extracting data from microservice module 1, microservice module 2, and microservice module 3 in the microservice system, respectively, and data is extracted from the corresponding microservice module according to the data extraction tasks and the data identification.
Further, in order to improve data processing efficiency, in an example embodiment, the source data and the target data are aggregated to generate row data, where the row data includes the source data and the target data, the source data columns are marked with the prefix src_, and the target data columns are marked with the prefix tgt_. Specifically, a preconfigured data aggregation template corresponding to the first data source and the second data source is obtained, and the source data and the target data are aggregated based on the data aggregation template to generate the row data.
By aggregating the source data and the target data into row data, a batch data comparison can be decomposed into row-by-row streaming data for processing, which reduces task complexity and can improve data processing efficiency.
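The aggregation into row data can be sketched as follows. The template layout (`source_columns` and `target_columns`) and the function name `aggregate_row` are assumptions made for this example; the text only specifies that source columns carry the src_ prefix and target columns the tgt_ prefix.

```python
from typing import Any, Dict, List


def aggregate_row(source: Dict[str, Any],
                  target: Dict[str, Any],
                  template: Dict[str, List[str]]) -> Dict[str, Any]:
    """Merge one source record and one target record into a single row.

    Source columns are prefixed with "src_" and target columns with
    "tgt_", so both sides can coexist in one row without name clashes.
    Columns absent from a record are filled with None.
    """
    row: Dict[str, Any] = {}
    for column in template["source_columns"]:
        row["src_" + column] = source.get(column)
    for column in template["target_columns"]:
        row["tgt_" + column] = target.get(column)
    return row


# Hypothetical aggregation template for an order table and a payment table.
template = {
    "source_columns": ["order_id", "amount"],
    "target_columns": ["order_id", "paid_amount"],
}
row = aggregate_row({"order_id": 7, "amount": 250},
                    {"order_id": 7, "paid_amount": 250},
                    template)
print(row)
```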
In step S330, it is determined whether the fields to be compared in the source data and the target data are consistent.
In an example embodiment, streaming data comparison is performed on to-be-compared fields in source data and target data in row data, for example, Flink may be used as a processing platform for processing streaming data, and streaming data comparison is performed on to-be-compared fields in source data and target data in row data; and determining whether the fields to be compared in the source data and the target data are consistent or not based on the comparison result. Specifically, after the generated row data is received, values of all fields to be compared related in the source data and the target data are matched and compared one by one, if all the fields to be compared are verified successfully, a comparison success message is generated, and the comparison success message is put into a comparison success queue; and if the fields to be compared are inconsistent, generating a delayed retry message, and putting the delayed retry message into a retry message queue.
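The per-row comparison and queue routing can be sketched as below. The sketch uses plain in-memory deques in place of a streaming platform and a message queue, and the names `process_row`, `id_column`, and the message fields are assumptions for illustration only.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple


def process_row(row: Dict[str, Any],
                mapping: List[Tuple[str, str]],
                success_queue: Deque[Dict[str, Any]],
                retry_queue: Deque[Dict[str, Any]],
                id_column: str = "src_order_id") -> bool:
    """Compare the mapped fields of one row and route the outcome.

    `mapping` pairs a source column name with a target column name; the
    row is expected to hold them under the src_ and tgt_ prefixes.
    """
    mismatched = [(s, t) for s, t in mapping
                  if row.get("src_" + s) != row.get("tgt_" + t)]
    if not mismatched:
        # All fields verified: emit a comparison success message.
        success_queue.append({"data_id": row[id_column], "status": "ok"})
        return True
    # At least one field differs: emit a delayed retry message.
    retry_queue.append({"data_id": row[id_column],
                        "mismatched": mismatched,
                        "attempts": 0})
    return False


success_queue: Deque[Dict[str, Any]] = deque()
retry_queue: Deque[Dict[str, Any]] = deque()
mapping = [("amount", "paid_amount")]

process_row({"src_order_id": 1, "src_amount": 10, "tgt_paid_amount": 10},
            mapping, success_queue, retry_queue)
process_row({"src_order_id": 2, "src_amount": 10, "tgt_paid_amount": 9},
            mapping, success_queue, retry_queue)
print(len(success_queue), len(retry_queue))
```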
The first data source is monitored in a binlog monitoring mode, data change of the first data source can be found in real time, and source data and target data which change are compared in real time, so that the consistency of the service data of the micro-service system can be guaranteed within a certain time range.
Further, it may be determined whether the delayed retry message meets a delayed retry condition, the delayed retry condition including a retry time within a predetermined time range or a retry number within a predetermined number threshold; if the delay retry condition is met, re-acquiring the source data from the first data source, and re-acquiring the target data from the second data source; and determining whether the values of the fields to be compared in the newly acquired source data and the target data are consistent or not based on the field mapping relation. And if the condition is not met, generating alarm information.
Fig. 4 is a flowchart illustrating a data consistency determination method according to another embodiment of the present invention.
Referring to FIG. 4, in step S410, a batch data read task is triggered.
In an example embodiment, a data read task may be provided that reads data from the first data source in a single batch or a timed batch, the single batch read being performed only once, the timed batch read performing a batch read in a timed task mechanism. For example, a data reading task for acquiring data from a first data source in batches can be triggered regularly by a scheduling center of the microservice system in a cron timing scheduling mode.
In step S420, in response to the batch data reading task, a plurality of pieces of source data are obtained in batch from a first data source, and corresponding target data are obtained from a second data source, where the first data source and the second data source are data sources of the same type.
In an example embodiment, the same type of data source may include a MySQL single-database single-table data source, an Elasticsearch database, an HBase database table, or a MongoDB database, and may also include other suitable data sources, such as a sharded data source in a MySQL database (that is, one split into multiple databases and tables), where each database shard and table shard of the MySQL database is configured separately so that full polling can be performed conveniently. Further, in order to improve the query efficiency for large tables, the maximum primary key value reached by each batch data reading task is recorded as a cursor, and the data is queried and read page by page in ascending primary key order. After the plurality of pieces of source data are acquired, the corresponding target data is acquired from the second data source according to the data identification of the source data.
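Cursor-based paging by primary key can be sketched as follows, using an in-memory SQLite database as a stand-in for the first data source. The table name `orders`, its columns, and the function `read_in_pages` are assumptions for the example, which also assumes a positive integer primary key.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_in_pages(conn: sqlite3.Connection,
                  page_size: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield pages of rows in ascending primary key order.

    The largest primary key of the previous page is kept as a cursor, so
    each query starts right after it instead of scanning skipped rows.
    """
    cursor_id = 0  # assumes primary keys are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, amount FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (cursor_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        cursor_id = rows[-1][0]  # record the maximum primary key read so far


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 8)])

pages = list(read_in_pages(conn, page_size=3))
print([len(page) for page in pages])  # [3, 3, 1]
```

Compared with offset-based paging, filtering on the recorded primary key lets each page be served by an index range scan, which is why the approach suits large tables.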
Further, in order to improve data processing efficiency, in an example embodiment, the source data and the target data are aggregated to generate row data, where the row data includes the source data and the target data, the source data columns are marked with the prefix src_, and the target data columns are marked with the prefix tgt_. Specifically, a preconfigured data aggregation template corresponding to the first data source and the second data source is obtained, and the source data and the target data are aggregated based on the data aggregation template to generate the row data.
By aggregating the source data and the target data into row data, a batch data comparison can be decomposed into row-by-row streaming data for processing, which reduces task complexity, reduces hardware resource consumption, and can improve data processing efficiency.
In step S430, it is determined whether the to-be-compared fields in the source data and the target data are consistent.
Since the implementation principle and effect of step S430 are substantially the same as those of step S330 or step S230, no further description is provided herein.
Fig. 5 is a flowchart illustrating a data consistency determination method according to yet another embodiment of the present invention.
Referring to fig. 5, in step S510, source data is acquired from a first data source, which is a data file.
In an example embodiment, row data is read from the data file, the read row data is encapsulated, and the encapsulated row data is used as the source data. Further, if the size of the data file is larger than a predetermined threshold, the data file is split into fragments, and the row data of the fragments is read concurrently, with each fragment as the unit of concurrency.
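One possible way to fragment a large file and read the fragments concurrently is sketched below. The byte-range splitting strategy, the function names, the thread pool, and the tiny threshold used in the demonstration are all assumptions for this example; the embodiment does not prescribe how the fragments are formed.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fragment_ranges(path: str, fragment_size: int) -> List[Tuple[int, int]]:
    """Split a file into byte ranges that always end on a line boundary."""
    size = os.path.getsize(path)
    ranges = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            f.seek(min(start + fragment_size, size))
            f.readline()          # advance to the end of the current line
            end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def read_fragment(path: str, byte_range: Tuple[int, int]) -> List[str]:
    """Read the rows contained in one byte range of the file."""
    start, end = byte_range
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start).decode("utf-8").splitlines()


def read_file(path: str, threshold: int, fragment_size: int) -> List[List[str]]:
    """Read a file whole if small, otherwise fragment by fragment in parallel."""
    size = os.path.getsize(path)
    ranges = [(0, size)] if size <= threshold else fragment_ranges(path, fragment_size)
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda r: read_fragment(path, r), ranges))


# Demonstration with a small temporary file and an artificially low threshold.
lines = [f"{i},{i * 10}" for i in range(10)]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "orders.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + "\n")
    fragments = read_file(path, threshold=16, fragment_size=16)

print(len(fragments))
```

Ending every range on a line boundary ensures that no row is split between two fragments, so the fragments can be processed independently.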
Further, in an example embodiment, the source data is encapsulated, e.g., into a binlog-like message format, and the encapsulated data is sent to the next processing module in the form of a message.
In step S520, target data is obtained from the second data source based on the data identification of the source data.
In an example embodiment, the second data source is a database, and the target data is obtained directly from the second data source based on the data identification of the received source data. Since the data in a data file generally does not change within a short period of time, there is no need to query the first data source again for the source data.
In step S530, it is determined whether the to-be-compared fields in the source data and the target data are consistent.
Since the implementation principle and effect of step S530 are substantially the same as those of step S330 or step S230, no further description is given here.
According to the data consistency determination method in the example embodiment of fig. 5, consistency comparison between a file and data in a database can be supported, comparison after fragmentation of a large file is supported, and comparison after fragmentation is performed in parallel can further improve data comparison efficiency.
Fig. 6 is a flowchart illustrating a data consistency determination method according to another embodiment of the present invention. Referring to fig. 6, in step S610, data input is performed.
In an example embodiment, data input may include the following modes: (1) real-time streaming data input, which is suitable for comparing or verifying dynamic data between databases; (2) dynamic batch data input, which is suitable for comparing or verifying one database against another; or (3) fixed batch data input, which is suitable for comparing or verifying fixed data, such as files, against a database.
The several input modes are described separately with reference to the drawings.
In step S612, real-time streaming data is input.
In this step, a data change in the data source causes a binlog change in the data source, and the binlog change triggers the generation of a binlog message.
In step S614, dynamic batch data is input.
In this step, the batch tasks to be executed are divided into single batch comparison and timed batch comparison. The single batch comparison is executed only once; the timed batch comparison is managed by a timed task mechanism, in which the scheduling center triggers and executes dynamic batch data reading through cron timed scheduling.
In step S616, fixed batch data is input.
In this step, row data is read from the data file, the read row data is encapsulated, and the encapsulated row data is used as the source data. Further, if the size of the data file is larger than a predetermined threshold, the data file is split into fragments, and the row data of the fragments is read concurrently, with each fragment as the unit of concurrency.
In step S620, the source data and the target data are encapsulated to generate a corresponding encapsulated message.
In this step, step S622 is executed for (1) real-time streaming data input. In step S622, the raw data is read mainly by monitoring the binlog of the data source; if a row of data in the data source changes, the data identifier of that row, such as the primary key and the sharding key (used, for example, when the database is split into multiple databases and tables), is used as the basis for uniquely identifying the data to be checked.
Step S624 is performed for (2) dynamic batch data input or (3) fixed batch data input. In step S624, a row of data in the read database or file is encapsulated, and a virtual data message is generated.
In addition, in the exemplary embodiment, delayed retry messages are also received in this step. The delayed retry messages generated after a comparison failure are monitored, and after a delayed retry message is received, the data identifier in the delayed retry message, such as the primary key and the sharding key (used, for example, when the database is split into multiple databases and tables), is extracted to construct an aggregation task.
In step S630, data aggregation processing is performed. This step may include steps S632 to S638.
In step S632, after the binlog message or the virtual data message is read in, the binlog message or the virtual data message is parsed, the data aggregation configuration associated with the data source corresponding to the binlog message or the virtual data message is determined, and a corresponding aggregation task is generated.
In step S634, the aggregation task corresponding to the binlog message is placed in the real-time task queue, and the aggregation task corresponding to the virtual data message is placed in the offline task queue. For example, Kafka may be used as the real-time task queue for comparison.
In step S636, corresponding execution plans are determined according to the data aggregation configuration, for example execution plans 1, 2, and 3, which respectively correspond to tasks 1, 2, and 3 for querying data from data source 1, data source 2, and data source 3. Data is then extracted from the data sources associated with the source data according to the execution plans and the data identification of the source data.
In step S638, the extracted data is encapsulated using the template matched with the data source to generate a row of data including the source data and the target data, where the source data columns are marked with the prefix src_ and the target data columns are marked with the prefix tgt_. The assembled composite data is then sent to the data comparison module for processing.
In step S640, the values of the fields to be compared in the source data and the target data are compared according to the pre-configured field mapping relationship, and it is determined whether the fields to be compared in the source data and the target data are consistent.
According to the technical solution in the example embodiment of fig. 6, on the one hand, data comparison between various data carriers can be performed, and data comparison between files and databases and between databases is supported; on the other hand, the extracted source data and the target data are packaged into row data, the row data are compared in a streaming mode, a comparison task with high requirements on hardware resources can be converted into row data with low requirements on the hardware resources, and therefore hardware resource consumption can be reduced.
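The routing of aggregation tasks in steps S632 and S634 can be sketched as follows. In-memory deques stand in for the real-time and offline task queues, and the message fields (`kind`, `data_source`, `data_id`), the configuration layout, and the function name `route_message` are assumptions introduced for this illustration.

```python
from collections import deque
from typing import Any, Deque, Dict


def route_message(message: Dict[str, Any],
                  aggregation_configs: Dict[str, Dict[str, Any]],
                  realtime_queue: Deque[Dict[str, Any]],
                  offline_queue: Deque[Dict[str, Any]]) -> Dict[str, Any]:
    """Build an aggregation task from a message and enqueue it.

    Tasks derived from binlog messages go to the real-time queue; tasks
    derived from virtual data messages go to the offline queue.
    """
    # Look up the aggregation configuration of the originating data source.
    config = aggregation_configs[message["data_source"]]
    task = {"data_id": message["data_id"],
            "extract_from": list(config["extract_from"])}
    if message["kind"] == "binlog":
        realtime_queue.append(task)
    else:
        offline_queue.append(task)
    return task


# Hypothetical configuration: order data is aggregated with payment data.
configs = {"order_db": {"extract_from": ["order_db", "payment_db"]}}
realtime_queue: Deque[Dict[str, Any]] = deque()
offline_queue: Deque[Dict[str, Any]] = deque()

route_message({"kind": "binlog", "data_source": "order_db", "data_id": 1},
              configs, realtime_queue, offline_queue)
route_message({"kind": "virtual", "data_source": "order_db", "data_id": 2},
              configs, realtime_queue, offline_queue)
print(len(realtime_queue), len(offline_queue))
```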
Fig. 7 is a flowchart illustrating the data comparison process in step S640 in fig. 6. Referring to fig. 7, in step S710, after the generated row data is received, the field mapping relationship corresponding to the row data is determined. For example, the corresponding field mapping relationship is determined according to the data sources to which the source data and the target data in the row data belong.
In step S710, based on the field mapping relationship, the corresponding fields to be compared of the source data and the target data are extracted from the row data and compared according to a predetermined comparison rule. It should be noted that the predetermined comparison rule may include an equal-to, greater-than, or less-than comparison rule, and the comparison rule may differ for different fields in the source data and the target data. For example, if the source data includes a field a and a field c and the target data includes a field b and a field d, the predetermined comparison rule may be: if field a = field b and field c = field d, the source data is determined to be consistent with the target data.
In step S720, if the comparison is successful, go to step S740; if the comparison is not successful, the process proceeds to step S725.
In step S725, it is determined whether a delayed retry condition is met, the delayed retry condition including a retry time within a predetermined time range or a retry number within a predetermined number threshold; if the delay retry condition is met, the process proceeds to step S730, and if the delay retry condition is not met, the process proceeds to step S740.
In step S730, a delayed retry message is generated and placed in a delay queue.
In step S735, the retry is triggered automatically; that is, the delayed retry message is automatically taken from the delay queue and sent to the data aggregation processing module.
In step S740, the data comparison result is written into a database or a file.
In step S745, a data comparison success or data comparison failure notification message is generated.
In step S750, the generated data comparison result notification message is transmitted. For example, when the comparison fails, a comparison failure notification message is sent as an alarm.
According to the technical solution in the example embodiment of fig. 7, on one hand, by configuring a plurality of comparison rules in advance, data comparison can be flexibly performed according to the data structure of the data source; on the other hand, by setting the delayed retry condition, the real-time data comparison can be carried out within the preset time threshold or the preset times threshold, and the condition that the data are inconsistent can be found in time.
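The delay queue used in steps S730 and S735 can be sketched with a binary heap ordered by due time. The class `DelayQueue`, its methods, and the explicit `now` parameter are assumptions for this example; a production system would more likely rely on the delayed delivery feature of a message queue.

```python
import heapq
from typing import Any, Dict, List, Tuple


class DelayQueue:
    """Holds delayed retry messages until their due time is reached."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Dict[str, Any]]] = []
        self._sequence = 0  # tie-breaker so messages are never compared

    def put(self, message: Dict[str, Any], due_at: float) -> None:
        """Store a message that becomes deliverable at time `due_at`."""
        heapq.heappush(self._heap, (due_at, self._sequence, message))
        self._sequence += 1

    def pop_due(self, now: float) -> List[Dict[str, Any]]:
        """Remove and return every message whose due time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


delay_queue = DelayQueue()
delay_queue.put({"data_id": 1}, due_at=30.0)
delay_queue.put({"data_id": 2}, due_at=10.0)

first_batch = delay_queue.pop_due(now=15.0)   # only data_id 2 is due
second_batch = delay_queue.pop_due(now=40.0)  # data_id 1 is due later
print(first_batch, second_batch)
```

Each message returned by `pop_due` would then be forwarded to the data aggregation processing module, which re-acquires the source data and the target data by data identification.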
Fig. 8 is a schematic block diagram of a first embodiment of a data consistency determination apparatus provided in the present invention. Referring to fig. 8, the data consistency determining apparatus 800 includes:
a first data acquisition module 810 for acquiring source data from a first data source;
a second data obtaining module 820, configured to obtain target data from the second data source based on the data identifier of the source data, where the target data and the source data have the same data identifier;
a data comparing module 830, configured to determine whether fields to be compared in the source data and the target data are consistent according to a preconfigured field mapping relationship between the source data and the target data.
Fig. 9 is a schematic block diagram of a data comparison module provided in some embodiments of the invention. Referring to fig. 9, the data comparison module 830 includes:
a mapping relationship determining unit 910, configured to obtain, according to the first data source and the second data source, the field mapping relationship between the source data and the target data from a plurality of field mapping relationships configured in advance;
a comparing unit 920, configured to compare values of fields to be compared in the source data and the target data based on the field mapping relationship;
a consistency determining unit 930, configured to determine that fields to be compared in the source data and the target data are consistent if the values of the fields to be compared in the source data and the target data are consistent.
Fig. 10 is a schematic block diagram of a second embodiment of the data consistency determination apparatus according to the present invention. Referring to fig. 10, the apparatus 800 further includes:
a retry determining module 1010, configured to determine whether a delay retry condition is met if the values of the fields to be compared in the source data and the target data are inconsistent, where the delay retry condition includes that a retry time is within a predetermined time range or a retry number is within a predetermined threshold;
a re-acquiring module 1020, configured to re-acquire the source data from the first data source and re-acquire the target data from the second data source if the delayed retry condition is met;
a re-comparison module 1030, configured to determine whether values of the fields to be compared in the source data and the target data that are re-acquired are consistent based on the field mapping relationship.
Fig. 11 is a schematic block diagram of a third embodiment of the data consistency determination apparatus according to the present invention. Referring to fig. 11, the apparatus 800 further includes:
a retry message generation module 1110 configured to generate a delayed retry message before re-acquiring the source data from the first data source, the delayed retry message including a data identification of the source data,
the reacquisition module 1020 is configured to:
if the delayed retry message is monitored, acquiring the data identifier from the delayed retry message;
based on the data identification, the source data is retrieved from the first data source and the target data is retrieved from the second data source.
Fig. 12 is a schematic block diagram of a fourth embodiment of the data consistency determination apparatus according to the present invention. Referring to fig. 12, the apparatus 800 further includes:
a queue generating module 1210, configured to put the delayed retry message into a retry message queue;
a source data determining module 1220, configured to determine whether the source data already exists in the retry message queue based on the data identifier if it is monitored that the source data has changed;
a data updating module 1230, configured to update the existing source data in the retry message queue with the changed source data if the source data already exists.
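The update of pending retry entries described for modules 1210 to 1230 can be sketched as follows. The class `RetryMessageQueue` and its method names are assumptions for illustration; the sketch keys pending entries by data identification so that a later change to the source data replaces the stale copy instead of adding a duplicate.

```python
from typing import Any, Dict, List, Tuple


class RetryMessageQueue:
    """Retry message queue keyed by data identification."""

    def __init__(self) -> None:
        # Dictionaries preserve insertion order, so queue order is kept.
        self._pending: Dict[Any, Dict[str, Any]] = {}

    def put(self, data_id: Any, source_data: Dict[str, Any]) -> None:
        """Place a delayed retry entry for the given data identification."""
        self._pending[data_id] = source_data

    def on_source_changed(self, data_id: Any,
                          changed_data: Dict[str, Any]) -> bool:
        """Replace the queued source data if an entry already exists.

        Returns True if an existing entry was updated, False otherwise.
        """
        if data_id in self._pending:
            self._pending[data_id] = changed_data
            return True
        return False

    def items(self) -> List[Tuple[Any, Dict[str, Any]]]:
        """Return the pending entries in queue order."""
        return list(self._pending.items())


queue = RetryMessageQueue()
queue.put(1, {"amount": 10})
updated = queue.on_source_changed(1, {"amount": 12})   # entry exists
ignored = queue.on_source_changed(2, {"amount": 99})   # no entry for id 2
print(queue.items())
```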
In a specific embodiment, the first data obtaining module 810 is further configured to:
monitoring a binary log binlog message of the first data source;
obtaining the source data from the first data source based on the monitored data identification in the binlog message;
the second data obtaining module 820 is further configured to:
determining a data extraction task corresponding to the first data source based on the monitored data identification in the binlog message;
and acquiring target data from the second data source based on the data identification and the data extraction task.
In a specific embodiment, the first data source and the second data source are data sources of the same type, and the first data obtaining module 810 is further configured to:
and obtaining a plurality of pieces of source data from the first data source in batches.
In a specific embodiment, the first data source is a data file, and the first data obtaining module 810 is further configured to:
reading row data from the data file;
and encapsulating the read row data, and taking the encapsulated row data as the source data.
Fig. 13 is a schematic block diagram of a fifth embodiment of the data consistency determining apparatus provided in the present invention, where the apparatus 800 further includes:
a file fragmenting module 1320, configured to fragment the data file if the size of the data file is greater than a predetermined threshold;
the first data acquisition module 810 is further configured to:
and reading the data of the fragments of the data file, with each fragment as the unit of reading.
Fig. 14 is a schematic block diagram of a sixth embodiment of the data consistency determining apparatus according to the present invention, where the apparatus 800 further includes:
a data aggregation module 1410, configured to aggregate the source data and the target data to generate row data, where the row data includes the source data and the target data;
the data comparison module 830 is further configured to:
performing streaming data comparison on the fields to be compared of the source data and the target data in the row data;
and determining whether the fields to be compared in the source data and the target data are consistent based on the comparison result.
In a specific embodiment, the data aggregation module 1410 is specifically configured to:
acquiring a pre-configured data aggregation template corresponding to the first data source and the second data source;
and performing aggregation processing on the source data and the target data based on the data aggregation template to generate the row data.
The data consistency determining device provided by the embodiment of the invention can realize each process in the method embodiment and achieve the same function and effect, and the process is not repeated here.
In addition, an embodiment of the present application further provides an electronic device, configured to execute the data consistency determination method described in the foregoing embodiment. Fig. 15 is a schematic block diagram of an electronic device provided by some embodiments of the invention. As shown in fig. 15, the electronic apparatus 1500 includes: at least one processor 1502, memory 1504, bus 1506, and communication interface 1508.
Wherein: the processor 1502, communication interface 1508, and memory 1504 communicate with one another via a communication bus 1506.
A communication interface 1508 for communicating with other devices.
The processor 1502, configured to execute the program 1510, may specifically perform relevant steps in the methods described in the above embodiments. For example, the processor 1502 may perform the following steps: obtaining source data from a first data source; acquiring target data from a second data source based on the data identification of the source data, wherein the target data and the source data have the same data identification; and determining whether the fields to be compared in the source data and the target data are consistent or not according to a field mapping relation between the source data and the target data which is configured in advance.
In particular, the program 1510 may include program code that includes computer operating instructions.
The processor 1502 may be a central processing unit, or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement an embodiment of the present invention. The electronic device comprises one or more processors, which can be the same type of processor, such as one or more CPUs; or may be different types of processors such as one or more CPUs and one or more ASICs.
The memory 1504 stores a program 1510. The memory 1504 may include high-speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 1504 comprising instructions that are executable by the processor 1502 of the electronic device 1500 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a read-only memory (ROM), a random-access memory, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.