Data synchronization method, device, equipment and computer readable storage medium
1. A data synchronization method is applied to a data synchronization system, and the method comprises the following steps:
acquiring parameters required for acquiring data to be synchronized based on a pre-configured data source;
acquiring the data to be synchronized from a data source by using the parameters and a data acquirer in the data synchronization system, wherein the data acquirer is defined based on an acquirer interface of the data synchronization system;
and performing synchronization processing on each piece of data included in the data to be synchronized by using a data processor in the data synchronization system to obtain a synchronization processing result of each piece of data, wherein the data processor is defined based on a processor interface of the data synchronization system.
2. The method of claim 1, further comprising:
initializing the data synchronization system, and acquiring a data acquirer and a data processor in the data synchronization system;
initializing the data acquirer to enable the data acquirer to be associated with a corresponding data source;
and initializing the data processor to enable the data processor to be associated with a corresponding data source.
3. The method according to claim 1, wherein the obtaining parameters required for obtaining the data to be synchronized based on the pre-configured data source comprises:
determining the type of a data source based on a pre-configured data source, wherein the type of the data source comprises a database, a Redis cache and an ES;
when the type of the data source is a database or ES, acquiring a search field and a search space required for acquiring data to be synchronized;
the acquiring the data to be synchronized from a data source by using the parameter and a data acquirer in the data synchronization system includes:
based on the search field, performing ES search in the search space by using the data acquirer to obtain search data meeting the search field condition;
and determining the search data as data to be synchronized.
4. The method according to claim 3, wherein the obtaining parameters required for obtaining the data to be synchronized based on the pre-configured data source further comprises:
when the type of the data source is Redis cache, acquiring a search field, a matching rule and a search space which are required for acquiring data to be synchronized;
the acquiring the data to be synchronized from a data source by using the parameter and a data acquirer in the data synchronization system includes:
based on the search field, performing ES search in the search space by using the data acquirer to obtain search data meeting the search field condition;
fusing the matching rule with the search data to obtain a cache keyword;
and based on the cache keywords, performing ES search in the search space by using the data acquirer to obtain data to be synchronized.
5. The method of claim 1, further comprising:
generating log data of each piece of data based on a synchronization processing result of each piece of data;
and recording the log data of each piece of data into an ES log file.
6. The method according to claim 5, wherein the generating log data of each piece of data based on the synchronization processing result of each piece of data comprises:
generating a synchronous identifier of each piece of data based on a preset rule;
acquiring the state information of each piece of data based on the synchronous processing result of each piece of data;
and determining the data identifier, the synchronous processing result and the state information of each piece of data in a data source as the log data of each piece of data.
7. The method according to claim 6, wherein the synchronization processing result comprises a first identifier for indicating that the data synchronization processing is successful and a second identifier for indicating that the data synchronization processing is failed;
the method further comprises the following steps:
performing, based on the synchronization identifier and the second identifier, an ES search in the ES log file to obtain a target data identifier, wherein the synchronization processing of the data corresponding to the target data identifier has failed;
performing secondary synchronization processing on the data corresponding to the target data identifier by using the data processor to obtain a secondary synchronization processing result of each piece of data with failed synchronization processing;
generating log data of each piece of data failed in the synchronous processing based on a secondary synchronous processing result of each piece of data failed in the synchronous processing;
and recording the log data of each piece of data failed in the synchronization processing into an ES log file.
8. The method according to claim 1, wherein before the data processor in the data synchronization system is used to perform synchronization processing on each piece of data included in the data to be synchronized, and a synchronization processing result of each piece of data is obtained, the method further includes:
judging whether an operation instruction for starting synchronous processing is received or not;
when the operation instruction is received, determining a data processor corresponding to the parameter from a plurality of data processors in the data synchronization system.
9. A data synchronization apparatus, applied to a data synchronization system, the apparatus comprising:
a first acquisition module, used for acquiring parameters required for acquiring data to be synchronized based on a pre-configured data source;
the second acquisition module is used for acquiring the data to be synchronized from a data source by using the parameters and a data acquirer in the data synchronization system, wherein the data acquirer is defined based on an acquirer interface of the data synchronization system;
and the synchronous processing module is used for carrying out synchronous processing on each piece of data included in the data to be synchronized by utilizing a data processor in the data synchronization system to obtain a synchronous processing result of each piece of data, and the data processor is defined based on a processor interface of the data synchronization system.
10. A data synchronization apparatus, comprising:
a processor; and
a memory for storing a computer program operable on the processor;
wherein the computer program realizes the steps of the method of any one of claims 1 to 8 when executed by a processor.
11. A computer-readable storage medium having stored thereon computer-executable instructions configured to perform the steps of the method of any one of claims 1 to 8.
Background
In the era of the electronic information explosion, ever-growing volumes of information are transmitted rapidly among internet systems. Although maintaining a high degree of data consistency is a fundamental requirement of today's systems, many factors make this difficult in practice, such as data loss caused by network problems and data falling out of sync because of code logic errors.
The solutions to data inconsistency in the related art mainly include two schemes: re-triggering database changes to synchronize data, and developing dedicated tools that refresh or delete data to synchronize data. In both schemes the data synchronization tools are too narrowly targeted, their reusability is low, and the development workload is large, which greatly increases the development pressure and time cost of research and development.
Disclosure of Invention
In view of this, embodiments of the present application provide a data synchronization method, apparatus, device, and computer-readable storage medium.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a data synchronization method, which is applied to a data synchronization system and comprises the following steps:
acquiring parameters required for acquiring data to be synchronized based on a pre-configured data source;
acquiring the data to be synchronized from a data source by using the parameters and a data acquirer in the data synchronization system, wherein the data acquirer is defined based on an acquirer interface of the data synchronization system;
and performing synchronization processing on each piece of data included in the data to be synchronized by using a data processor in the data synchronization system to obtain a synchronization processing result of each piece of data, wherein the data processor is defined based on a processor interface of the data synchronization system.
In some embodiments, the method further comprises:
initializing the data synchronization system, and acquiring a data acquirer and a data processor in the data synchronization system;
initializing the data acquirer to enable the data acquirer to be associated with a corresponding data source;
and initializing the data processor to enable the data processor to be associated with a corresponding data source.
In some embodiments, the obtaining, based on the preconfigured data source, parameters required for obtaining data to be synchronized includes:
determining the type of a data source based on a pre-configured data source, wherein the type of the data source comprises a database, a Redis cache and an ES;
when the type of the data source is a database or ES, acquiring a search field and a search space required for acquiring data to be synchronized;
the acquiring the data to be synchronized from a data source by using the parameter and a data acquirer in the data synchronization system includes:
based on the search field, performing ES search in the search space by using the data acquirer to obtain search data meeting the search field condition;
and determining the search data as data to be synchronized.
In some embodiments, the obtaining parameters required for obtaining the data to be synchronized based on the preconfigured data source further includes:
when the type of the data source is Redis cache, acquiring a search field, a matching rule and a search space which are required for acquiring data to be synchronized;
the acquiring the data to be synchronized from a data source by using the parameter and a data acquirer in the data synchronization system includes:
based on the search field, performing ES search in the search space by using the data acquirer to obtain search data meeting the search field condition;
fusing the matching rule with the search data to obtain a cache keyword;
and based on the cache keywords, performing ES search in the search space by using the data acquirer to obtain data to be synchronized.
In some embodiments, the method further comprises:
generating log data of each piece of data based on a synchronization processing result of each piece of data;
and recording the log data of each piece of data into an ES log file.
In some embodiments, the generating log data of each piece of data based on the synchronization processing result of each piece of data includes:
generating a synchronous identifier of each piece of data based on a preset rule;
acquiring the state information of each piece of data based on the synchronous processing result of each piece of data;
and determining the data identifier, the synchronous processing result and the state information of each piece of data in a data source as the log data of each piece of data.
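As a minimal sketch of how such log data might be assembled, the following Python fragment builds one log record per piece of data. The field names, the use of a random UUID as the "preset rule", and the status strings are all assumptions made for illustration; the synchronization identifier is carried in each record here only so that records from one run can be found together.

```python
import uuid

def build_log(sync_id, data_id, result):
    """Build one log record; field names are hypothetical."""
    # State information is derived from the processing result (assumed mapping).
    status = "done" if result == "success" else "pending_retry"
    return {"sync_id": sync_id, "data_id": data_id,
            "result": result, "status": status}

# Assumed preset rule: one random identifier per synchronization run.
sync_id = uuid.uuid4().hex

# Hypothetical processing results keyed by the data identifier in the data source.
results = {"01": "success", "03": "failure"}
logs = [build_log(sync_id, data_id, result) for data_id, result in results.items()]
```

In a real deployment the records would be written to the ES log file rather than kept in a list.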
In some embodiments, the synchronization processing result comprises a first identifier for representing that the data synchronization processing is successful and a second identifier for representing that the data synchronization processing is failed;
the method further comprises the following steps:
performing, based on the synchronization identifier and the second identifier, an ES search in the ES log file to obtain a target data identifier, wherein the synchronization processing of the data corresponding to the target data identifier has failed;
performing secondary synchronization processing on the data corresponding to the target data identifier by using the data processor to obtain a secondary synchronization processing result of each piece of data with failed synchronization processing;
generating log data of each piece of data failed in the synchronous processing based on a secondary synchronous processing result of each piece of data failed in the synchronous processing;
and recording the log data of each piece of data failed in the synchronization processing into an ES log file.
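The secondary synchronization described above can be sketched as follows. This is only an illustration under stated assumptions: the ES log file is replaced by an in-memory list, the strings "success" and "failure" stand in for the first and second identifiers, and the names `find_failed`, `retry_failed` and the `attempt` field are hypothetical.

```python
def find_failed(log_store, sync_id):
    """Stand-in for the ES search: data ids of one run whose result is failure."""
    return [entry["data_id"] for entry in log_store
            if entry["sync_id"] == sync_id and entry["result"] == "failure"]

def retry_failed(log_store, sync_id, data_by_id, process_one):
    """Re-process each failed item and record a new log entry for it."""
    retried = find_failed(log_store, sync_id)
    for data_id in retried:
        ok = process_one(data_by_id[data_id])
        log_store.append({"sync_id": sync_id, "data_id": data_id,
                          "result": "success" if ok else "failure",
                          "attempt": 2})
    return retried

log_store = [
    {"sync_id": "s1", "data_id": "01", "result": "success"},
    {"sync_id": "s1", "data_id": "03", "result": "failure"},
    {"sync_id": "s2", "data_id": "07", "result": "failure"},
]
data_by_id = {"01": {"id": "01"}, "03": {"id": "03"}, "07": {"id": "07"}}

# The lambda is a placeholder processor that always succeeds.
retried = retry_failed(log_store, "s1", data_by_id, lambda item: True)
```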
In some embodiments, before the performing, by using a data processor in the data synchronization system, synchronization processing on each piece of data included in the to-be-synchronized data to obtain a synchronization processing result of each piece of data, the method further includes:
judging whether an operation instruction for starting synchronous processing is received or not;
when the operation instruction is received, determining a data processor corresponding to the parameter from a plurality of data processors in the data synchronization system.
The embodiment of the application provides a data synchronization device, which is applied to a data synchronization system, and the device comprises:
a first acquisition module, used for acquiring parameters required for acquiring data to be synchronized based on a pre-configured data source;
the second acquisition module is used for acquiring the data to be synchronized from a data source by using the parameters and a data acquirer in the data synchronization system, wherein the data acquirer is defined based on an acquirer interface of the data synchronization system;
and the synchronous processing module is used for carrying out synchronous processing on each piece of data included in the data to be synchronized by utilizing a data processor in the data synchronization system to obtain a synchronous processing result of each piece of data, and the data processor is defined based on a processor interface of the data synchronization system.
An embodiment of the present application provides a data synchronization apparatus, including:
a processor; and
a memory for storing a computer program operable on the processor;
wherein the computer program realizes the steps of the above data synchronization method when executed by a processor.
Embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions configured to perform the steps of the data synchronization method.
The embodiment of the application provides a data synchronization method, a data synchronization device, data synchronization equipment and a computer readable storage medium, which are applied to a data synchronization system, wherein the method comprises the following steps: acquiring parameters required for acquiring data to be synchronized based on a pre-configured data source; acquiring the data to be synchronized from a data source by using the parameters and a data acquirer in the data synchronization system, wherein the data acquirer is defined based on an acquirer interface of the data synchronization system; and performing synchronization processing on each piece of data included in the data to be synchronized by using a data processor in the data synchronization system to obtain a synchronization processing result of each piece of data, wherein the data processor is defined based on a processor interface of the data synchronization system. Therefore, when a new data source is added or new data acquisition logic and processing logic are needed, only the corresponding acquirer class is added based on the acquirer interface of the data synchronization system and the corresponding processor class is added based on the processor interface of the data synchronization system, so that data synchronization is performed according to the defined data acquirer and the defined data processor, high reusability of the data synchronization system is realized, code development amount can be reduced, and development cost can be reduced.
Drawings
In the drawings, which are not necessarily drawn to scale, like reference numerals may describe similar components in different views. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed herein.
Fig. 1 is a schematic flow chart of an implementation of a data synchronization method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another implementation of a data synchronization method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of another implementation of the data synchronization method according to the embodiment of the present application;
Fig. 4 is a schematic overall structure diagram of a data synchronization system according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of an implementation of a data synchronization method according to an embodiment of the present application;
Fig. 6 is a schematic diagram of an implementation flow for performing cleanup and record operations;
Fig. 7 is a flow chart illustrating an implementation of a cleaning operation being performed again;
Fig. 8 is a schematic diagram of an implementation flow for performing the seek and record operation again;
Fig. 9 is a schematic structural diagram of a data synchronization apparatus according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a data synchronization apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second" and "third" are used only to distinguish similar objects and do not denote a particular order. Where permitted, the specific order or sequence of "first", "second" and "third" may be interchanged, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms and expressions referred to in the embodiments of the present application are explained as follows.
1) Remote Dictionary Server (Redis): an open-source, in-memory data structure server that may be used as a database, a cache, and a message broker. It supports data types such as strings, hashes, lists, sets, sorted sets, bitmaps, and HyperLogLog.
2) Elasticsearch (ES): a distributed, highly scalable search and data analysis engine with near-real-time search. It makes it convenient to search, analyze, and explore large amounts of data.
The embodiment of the application provides a data synchronization method applied to a data synchronization system. The method provided by the embodiment of the present application can be implemented by a computer program, and when the computer program is executed, each step in the data synchronization method provided by the embodiment of the present application is completed. In some embodiments, the computer program may be executed by a processor in a data synchronization device. Fig. 1 is a schematic flow chart of an implementation of a data synchronization method provided in an embodiment of the present application, and as shown in fig. 1, the data synchronization method includes the following steps:
Step S101, acquiring parameters required for acquiring data to be synchronized based on a pre-configured data source.
In the embodiment of the application, the data synchronization system may include a data acquirer, a data processor, a custom parameter Map, a data source, and an ES log file. The parameters required for data synchronization generally differ between data sources. The parameters required for acquiring the data to be synchronized can be obtained through the custom parameter Map in the data synchronization system. The data synchronization system may be an application system on a device such as a computer or a mobile terminal, and the parameters are usually input by a user. For example, when the data synchronization system runs on a mobile terminal and the user inputs the parameters "value 1" and "slave" in the data synchronization system of the mobile terminal, the parameters obtained through the custom parameter Map may be represented as attribute-value pairs,
wherein "search field 1" and "cacheSource" are attributes that refer to the search field and the search space respectively, and "value 1" and "slave" are the corresponding attribute values.
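As an illustration only, the attributes and attribute values named above could be held in a plain dictionary. The literal layout and the helper `split_params` are assumptions and are not taken from the application.

```python
# Hypothetical layout of the custom parameter Map (an assumption).
custom_params = {
    "search field 1": "value 1",  # search field attribute and the value typed by the user
    "cacheSource": "slave",       # search space attribute and the cluster chosen by the user
}

def split_params(params):
    """Separate the search space from the search fields."""
    search_space = params["cacheSource"]
    search_fields = {k: v for k, v in params.items() if k != "cacheSource"}
    return search_fields, search_space

search_fields, search_space = split_params(custom_params)
```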
In one implementation, the user may type the attribute value of each search field into the edit box corresponding to that search field attribute by using a keyboard. There may be a plurality of search fields; for example, the user inputs "value 1" in the edit box of the attribute "search field 1" and "value 2" in the edit box of the attribute "search field 2". In one implementation, the attribute value of the search space may be selected by the user from a preset candidate box. For example, the preset search space includes three candidate attribute values, namely a primary cluster master, a secondary cluster slave, and a secondary cluster slave1; there may be multiple secondary clusters.
The preconfigured data source may be a database, a Redis cache, an ES, or another storage structure. When the data source is a database or an ES, the acquired parameters may include a search field and a search space. When the data source is a Redis cache, the acquired parameters may further include a "match" attribute in addition to the search field and the search space; this attribute selects, in batch, the data whose prefix is the attribute value. For example, if the attribute value of "match" is "prodBM_", all data whose prefix is "prodBM_" is selected in batch.
Step S102, acquiring the data to be synchronized from a data source by using the parameters and a data acquirer in the data synchronization system.
The data acquirer is defined based on an acquirer interface of the data synchronization system. Defining a data acquirer does not require redeveloping the model of the data synchronization system; it is only necessary to inherit the acquirer interface of the data synchronization system and add a new implementation class. In other words, a data acquirer is an implementation class in the data synchronization system that is used to acquire the correct data to be synchronized. Every implementation class inherits the acquirer interface. When a new data source needs to be configured or a new data acquisition mode needs to be added, a corresponding data acquirer is defined for the newly configured data source or the newly added acquisition mode, the logic code that inherits the acquirer interface is added to the data synchronization system, and the data acquirer is initialized so that it is associated with the corresponding data source; during initialization the data acquirer is assembled into a Map. Because the data to be synchronized is acquired from the data source by the data acquirer, both the selection of the data source and the acquisition of the data to be synchronized are configurable, which gives the model of the data synchronization system higher reusability.
When acquiring the data to be synchronized, the data acquirer obtains, from the data source, the data that conforms to the acquisition logic defined by the data acquirer and to the parameters acquired in step S101, and takes that data as the data to be synchronized. For example, when the attribute values of the attributes "search field 1" and "cacheSource" are "value 1" and "slave" respectively, the data acquirer performs a keyword search for "value 1" in the "slave" search space of the associated data source, and the data found is the data to be synchronized. Here, the data to be synchronized may be acquired by ES search.
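The acquirer interface and one implementation class can be sketched as below. This is a hedged illustration rather than the application's implementation: the class names `DataAcquirer` and `InMemoryAcquirer`, the `acquire` method, the `source_name` key and the in-memory data source are all hypothetical, and a list filter stands in for the actual ES search.

```python
from abc import ABC, abstractmethod

class DataAcquirer(ABC):
    """Hypothetical acquirer interface; each data source gets one subclass."""
    source_name = ""

    def __init__(self, data_source):
        # The data source is injected when the acquirer is initialized.
        self.data_source = data_source

    @abstractmethod
    def acquire(self, params):
        """Return the data to be synchronized for the given parameters."""

class InMemoryAcquirer(DataAcquirer):
    """Example implementation class; a list filter replaces the real search."""
    source_name = "demo_db"

    def acquire(self, params):
        space = self.data_source[params["cacheSource"]]
        fields = {k: v for k, v in params.items() if k != "cacheSource"}
        return [row for row in space
                if all(row.get(k) == v for k, v in fields.items())]

demo_source = {
    "master": [],
    "slave": [
        {"id": "01", "search field 1": "value 1"},
        {"id": "02", "search field 1": "value 2"},
        {"id": "03", "search field 1": "value 1"},
    ],
}

# Acquirers are assembled into a Map keyed by their data source.
acquirer_map = {InMemoryAcquirer.source_name: InMemoryAcquirer(demo_source)}
to_sync = acquirer_map["demo_db"].acquire(
    {"search field 1": "value 1", "cacheSource": "slave"})
```

Supporting a new data source in this sketch would mean adding one more subclass and registering it in the Map, without touching the existing classes.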
Step S103, each piece of data included in the data to be synchronized is synchronized by using a data processor in the data synchronization system, and a synchronization processing result of each piece of data is obtained.
The data processor is defined based on a processor interface of the data synchronization system. Defining a data processor does not require redeveloping the model of the data synchronization system; it is only necessary to inherit the processor interface of the data synchronization system and add a new implementation class. In other words, a data processor is an implementation class in the data synchronization system that is used to process data and implement data synchronization. Every implementation class of the data processor inherits the processor interface. When a new data synchronization processing mode needs to be added, a corresponding data processor is defined, the logic code that inherits the processor interface is added to the data synchronization system, and the data processor is initialized so that it is associated with the corresponding data source; during initialization the data processor is assembled into a Map. Each data source may correspond to one data acquirer and one data processor, and the data processing logic for each data source is implemented in its data processor. Because the data to be synchronized is processed by the data processor, the processing of synchronization information is configurable, which gives the model of the data synchronization system high reusability.
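A processor interface in the same spirit might look like the sketch below. The names `DataProcessor`, `process_one`, `process` and `CopyToTargetProcessor` are hypothetical, and the strings "success" and "failure" are assumed stand-ins for the synchronization processing result of each piece of data.

```python
from abc import ABC, abstractmethod

class DataProcessor(ABC):
    """Hypothetical processor interface."""

    @abstractmethod
    def process_one(self, item):
        """Synchronize a single piece of data; return True on success."""

    def process(self, items):
        # One result per piece of data, keyed by its identifier.
        results = {}
        for item in items:
            try:
                ok = self.process_one(item)
            except Exception:
                ok = False
            results[item["id"]] = "success" if ok else "failure"
        return results

class CopyToTargetProcessor(DataProcessor):
    """Example implementation class that copies values into a target store."""

    def __init__(self, target):
        self.target = target

    def process_one(self, item):
        if "value" not in item:
            raise ValueError("nothing to synchronize")
        self.target[item["id"]] = item["value"]
        return True

target = {}
results = CopyToTargetProcessor(target).process(
    [{"id": "01", "value": "a"}, {"id": "02"}])
```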
In some embodiments, before data synchronization it is determined whether the data synchronization system has been initialized; if not, the data synchronization system is initialized first. During initialization, the implementation classes of all data acquirers and all data processors included in the data synchronization system are obtained through a reflection mechanism, and each data acquirer and each data processor is initialized and assembled into a Map, so that the corresponding data acquirer and data processor can be matched from subsequent input parameters and the data of each data source is synchronized according to the data processing logic defined by the corresponding data processor. When the data acquirers are initialized, each acquirer is associated with a data source, and the data source is injected during initialization. Taking commodities as an example, the associated data sources may be: commodity basic information library-table, commodity special attribute library-table, commodity picture library-table, Redis cache, and ElasticSearch index-type. The query-information enumeration in the input parameters is then converted, so that the required query information can be obtained from the corresponding data source.
It should be noted that, in the embodiment of the present application, when a data acquirer is defined based on the acquirer interface and a data processor is defined based on the processor interface, both support user-defined logic code. When a user needs to synchronize data from another data source, acquire the data to be synchronized with different acquisition logic, or process the data to be synchronized with different processing logic, the data synchronization system can be configured accordingly. The data synchronization system thus provides configurable data source selection, configurable acquisition of synchronization information, and configurable processing of synchronization information.
The data synchronization method provided by the embodiment of the application is applied to a data synchronization system and comprises: acquiring parameters required for acquiring data to be synchronized based on a pre-configured data source; acquiring the data to be synchronized from a data source by using the parameters and a data acquirer in the data synchronization system, wherein the data acquirer is defined based on an acquirer interface of the data synchronization system; and performing synchronization processing on each piece of data included in the data to be synchronized by using a data processor in the data synchronization system, to obtain a synchronization processing result of each piece of data.
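The initialization step can be approximated in Python with class introspection, which plays the role of the reflection mechanism here. Everything in this sketch is an assumption made for illustration: the base classes, the `source_type` key, and the use of `__subclasses__()` to discover implementation classes.

```python
class Acquirer:
    source_type = None

    def __init__(self, data_source):
        self.data_source = data_source  # data source injected at initialization

class DatabaseAcquirer(Acquirer):
    source_type = "database"

class RedisAcquirer(Acquirer):
    source_type = "redis"

class Processor:
    source_type = None

class DatabaseProcessor(Processor):
    source_type = "database"

class RedisProcessor(Processor):
    source_type = "redis"

def initialize(data_sources):
    """Discover every direct implementation class and assemble two Maps."""
    acquirers = {cls.source_type: cls(data_sources[cls.source_type])
                 for cls in Acquirer.__subclasses__()}
    processors = {cls.source_type: cls()
                  for cls in Processor.__subclasses__()}
    return acquirers, processors

acquirers, processors = initialize({
    "database": "commodity basic information library-table",
    "redis": "Redis cache",
})

# A later request can look up the matching pair by its source type.
acquirer, processor = acquirers["redis"], processors["redis"]
```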
Parameters required for acquiring data to be synchronized are generally different for different data sources, and common types of data sources include a database, a Redis cache, and an ES.
In some embodiments, when the type of the data source is a database or ES, the parameters required to obtain the data to be synchronized may include a search field and a search space. In this case, step S102 "acquiring the data to be synchronized from a data source by using the parameters and a data acquirer in the data synchronization system" in the embodiment shown in fig. 1 can be implemented by the following steps:
Step S102a1, based on the search field, performing ES search in the search space by using the data acquirer to obtain search data that meets the search field condition.
In the actual search, because the performance of the redis. In the embodiment of the application, when the data volume to be searched in the search space is less than a preset data volume threshold, the amount of data to be synchronized in the search space is considered small, and the data to be synchronized can be obtained by searching through redis. When the data volume to be searched in the search space is not less than the preset data volume threshold, the amount of data to be synchronized in the search space is considered large, and an ES search is performed in the search space by using the data acquirer. For example, the search data that meets the "search field 1" condition, that is, the search data whose search field has the attribute value "value 1", is retrieved through the ES.
Step S102a2, the search data is determined as the data to be synchronized.
When the data source is a database or ES, both the database and ES use exact matching for search, so the search data obtained according to the search field is the data to be synchronized.
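Steps S102a1 and S102a2 can be illustrated with a minimal sketch. It assumes an in-memory list of records in place of the real search space; the function names `choose_search_backend` and `acquire_exact` and the threshold values are hypothetical and are not part of the embodiment.

```python
from typing import Dict, List


def choose_search_backend(pending_count: int, threshold: int) -> str:
    """Pick the search path: redis below the threshold, ES otherwise (hypothetical helper)."""
    return "redis" if pending_count < threshold else "es"


def acquire_exact(records: List[Dict[str, str]], field: str, value: str) -> List[Dict[str, str]]:
    """Exact match on the search field; the hits are the data to be synchronized."""
    return [record for record in records if record.get(field) == value]


# Stand-in for the search space (assumption: real data would live in a database or ES).
records = [
    {"id": "01", "search field 1": "value 1"},
    {"id": "02", "search field 1": "value 2"},
    {"id": "03", "search field 1": "value 1"},
]
to_sync = acquire_exact(records, "search field 1", "value 1")
```

Because the match is exact, no further filtering step is needed between the search result and the data to be synchronized.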
In some embodiments, when the type of the data source is a Redis cache, the parameters required to acquire the data to be synchronized may include a search field, a matching rule, and a search space. In this case, step S102 in the embodiment shown in fig. 1, "acquiring the data to be synchronized from the data source by using the parameters and a data acquirer in the data synchronization system", can be implemented by the following steps:
Step S102b1, based on the search field, performing ES search in the search space by using the data acquirer to obtain search data that meets the search field condition.
Similar to step S102a1, in this embodiment of the application, when the data volume to be searched in the search space is less than the preset data volume threshold, the data to be synchronized can be obtained by searching through redis. When the data volume to be searched in the search space is not less than the preset data volume threshold, ES search is performed in the search space by using the data acquirer. For example, search data that meets the "search field 1" condition, that is, search data whose "search field 1" attribute value is "value 1", is retrieved through ES. If the data with ids 01, 03, and 06 in the search result have the "search field 1" attribute value "value 1", the search result can be expressed as {01, 03, 06}.
Step S102b2, fusing the matching rule with the search data to obtain cache keywords.
For example, the matching rule "match" may match data that has the attribute value as a prefix, or data that has the attribute value as a suffix. For example, when the matching prefix is "prodBM_", all data whose prefix is "prodBM_" can be determined in batches.
The matching rule and the search data may be fused by splicing. For example, splicing "prodBM_" with {01, 03, 06} gives the cache keywords prodBM_01, prodBM_03, and prodBM_06.
Step S102b3, based on the cache keywords, performing ES search in the search space by using the data acquirer to obtain the data to be synchronized.
According to the cache keywords prodBM_01, prodBM_03, and prodBM_06, ES search is performed in the search space, and the data found is the data to be synchronized.
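The splicing in step S102b2 can be sketched as follows, assuming the matching rule is a key prefix as in the example above. The function name `build_cache_keys` is hypothetical.

```python
from typing import Iterable, List


def build_cache_keys(match_prefix: str, ids: Iterable[str]) -> List[str]:
    """Splice the matching-rule prefix onto each id found in step S102b1."""
    return [f"{match_prefix}{data_id}" for data_id in ids]


cache_keys = build_cache_keys("prodBM_", ["01", "03", "06"])
```

A suffix rule would splice in the opposite order; only the prefix case is shown here.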
In some embodiments, when the data volume is large, a batch threshold can be set through a configuration management center included in the data synchronization system, and the data is processed in batches that each contain no more than the batch threshold. This prevents system abnormalities caused by processing too much data and avoids the low processing efficiency that results from processing too much data at one time.
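As an illustration of the batch threshold, the following sketch splits the data into batches no larger than a configured size. The name `in_batches` is hypothetical, and how the threshold is read from the configuration management center is not shown.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def in_batches(items: Sequence[T], batch_threshold: int) -> Iterator[List[T]]:
    """Yield consecutive batches containing at most batch_threshold items each."""
    if batch_threshold <= 0:
        raise ValueError("batch_threshold must be positive")
    for start in range(0, len(items), batch_threshold):
        yield list(items[start:start + batch_threshold])


batches = list(in_batches(["k1", "k2", "k3", "k4", "k5"], 2))
```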
According to the method provided by the embodiment of the application, when the data source is a Redis cache, the search of the Redis cache uses fuzzy matching. The matching rule parameter "match" therefore needs to be acquired and fused with the search data to obtain the cache keywords, and the search data obtained according to the cache keywords is determined as the data to be synchronized.
In some embodiments, based on the embodiment shown in fig. 1, the data synchronization method may further include the following steps:
Step S104, generating log data of each piece of data based on the synchronization processing result of each piece of data.
In this embodiment of the application, generating the log data of each piece of data may be implemented as: generating a synchronization identifier of each piece of data based on a preset rule; acquiring state information of each piece of data based on the synchronization processing result of each piece of data; and determining the data identifier, the synchronization processing result, and the state information of each piece of data in the data source as the log data of each piece of data.
The synchronization identifier uuid is unique, and the log data of a piece of synchronized data can be found in the ES log file according to the synchronization identifier. The synchronization processing result isSuccess represents the outcome of the synchronization processing, which is either synchronization success or synchronization failure. The state information message records information from the synchronization processing so that a user can quickly check the reason for a synchronization failure. The data identifier of each piece of data in the data source is denoted as id, and the log data of each piece of data can be denoted as (id, uuid, isSuccess, message).
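The log tuple (id, uuid, isSuccess, message) can be sketched as a small record type. The preset rule for generating the synchronization identifier is not specified in the text, so a random UUID is used here purely as an assumption; the names `SyncLogRecord`, `new_sync_uuid` and `make_log_record` are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional
from uuid import uuid4


@dataclass(frozen=True)
class SyncLogRecord:
    id: str          # data identifier in the data source
    uuid: str        # synchronization identifier
    isSuccess: bool  # synchronization processing result
    message: str     # state information, e.g. the failure reason


def new_sync_uuid() -> str:
    """Assumed rule: a random UUID rendered as 32 hex characters."""
    return uuid4().hex


def make_log_record(data_id: str, sync_uuid: str, error: Optional[str] = None) -> SyncLogRecord:
    """Build the log record; a missing error means the synchronization succeeded."""
    return SyncLogRecord(
        id=data_id,
        uuid=sync_uuid,
        isSuccess=error is None,
        message="ok" if error is None else error,
    )


batch_uuid = new_sync_uuid()
ok_record = make_log_record("03", batch_uuid)
failed_record = make_log_record("01", batch_uuid, error="timeout")
```

Calling `asdict(ok_record)` gives a plain dictionary with the four fields, which is a convenient shape for writing to a document store.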
Step S105, recording the log data of each piece of data into an ES log file.
In the embodiment of the application, during data synchronization processing, the log data of each piece of data is recorded into the ES log file log_es, so that a user can conveniently search for information about the data synchronization processing.
When the log data is stored, all data from the synchronization processing can be generated as log data and recorded, so that a user can view detailed log data. Alternatively, as described above, only the key information (the data identifier, the synchronization processing result, and the state information) can be generated as log data and recorded, so that a user can quickly find a given piece of log data and the space occupied by the log file is reduced.
Because log recording capacity has an upper limit, the log data can be recorded in Redis when the amount of data to be synchronized is small. In practice, however, the data to be synchronized often amounts to millions of records. With a traditional log printing approach, the records of the synchronization processing would be incomplete, and batch search and acquisition of key information would be difficult. In the embodiment of the application, the log data can therefore be recorded into an ES log file, which enables comprehensive recording and rapid search of the processing logs.
The data synchronization method provided by the embodiment of the application considers that the redis.
The synchronization processing result may include a first identifier and a second identifier, where the first identifier indicates that the data synchronization processing succeeded and the second identifier indicates that the data synchronization processing failed. That is, when the current piece of data is synchronized successfully, its synchronization processing result is set to the first identifier; when the synchronization of the current piece of data fails, its synchronization processing result is set to the second identifier. The synchronization processing result of each piece of data is thereby recorded in the ES log file. In some embodiments, after step S105, the data synchronization method may further perform the following steps:
Step S106, based on the synchronization identifier and the second identifier, performing ES search in the ES log file to obtain a target data identifier.
Here, the synchronization processing of the data corresponding to the target data identifier failed.
In the embodiment of the application, after the data to be synchronized has been processed, the log data in the ES log file can be used to check whether the synchronization is complete. When the synchronization processing is finished, ES search is performed in the ES log file according to the synchronization identifier uuid and the second identifier "no" to obtain the data identifiers of the data whose synchronization processing failed, for example 01. In this step, all data in the ES log file whose synchronization processing failed can be scanned for further synchronization processing.
Here, by performing ES search through the synchronization identifier uuid and the second identifier, the result of the data synchronization processing of the current batch can be quickly obtained, so that the performance of data processing can be improved.
Step S107, performing secondary synchronization processing on the data corresponding to the target data identifier by using the data processor to obtain a secondary synchronization processing result of each piece of data whose synchronization processing failed.
After all the data whose synchronization processing failed is obtained from the ES log file, the data processor is used to perform synchronization processing on this data again, and a secondary synchronization processing result is obtained.
Step S108, generating log data of each piece of data whose synchronization processing failed based on the secondary synchronization processing result of each such piece of data.
Here, the log data of each piece of data whose synchronization processing failed is generated in the same manner as the log data of each piece of data is generated from its synchronization processing result in step S104.
Step S109, recording the log data of each piece of data whose synchronization processing failed into the ES log file.
Secondary synchronization processing can greatly reduce the amount of data whose synchronization processing failed, improve the performance and accuracy of data processing, and meet the precision requirements of most scenarios.
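Steps S106 to S108 can be sketched with an in-memory list standing in for the ES log file. The names `find_failed_ids` and `resync_failed`, the sample entries, and the reuse of the same uuid for the retry entries are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

LogEntry = Dict[str, object]


def find_failed_ids(log: List[LogEntry], sync_uuid: str) -> List[str]:
    """Step S106: ids whose entry carries this uuid and a failed result."""
    return [
        str(entry["id"])
        for entry in log
        if entry["uuid"] == sync_uuid and entry["isSuccess"] is False
    ]


def resync_failed(log: List[LogEntry], sync_uuid: str,
                  process: Callable[[str], bool]) -> List[LogEntry]:
    """Steps S107 and S108: reprocess the failed ids and build their new log entries."""
    retry_entries: List[LogEntry] = []
    for data_id in find_failed_ids(log, sync_uuid):
        succeeded = process(data_id)
        retry_entries.append({
            "id": data_id,
            "uuid": sync_uuid,
            "isSuccess": succeeded,
            "message": "retry ok" if succeeded else "retry failed",
        })
    return retry_entries


# Stand-in for log_es (assumption: real entries would be ES documents).
log_es: List[LogEntry] = [
    {"id": "01", "uuid": "u1", "isSuccess": False, "message": "timeout"},
    {"id": "03", "uuid": "u1", "isSuccess": True, "message": "ok"},
    {"id": "06", "uuid": "u2", "isSuccess": False, "message": "timeout"},
]
retry_entries = resync_failed(log_es, "u1", lambda data_id: True)
```

Only the entry with id 01 is retried, because the entry with id 06 belongs to a different synchronization identifier.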
During data synchronization, when a batch operation is performed on redis and a certain cache key is not processed successfully, that key is automatically skipped and the next key is executed. When redis does not throw an exception, no synchronization failure is recorded in the ES log file log_es even though the synchronization actually failed. Therefore, in some embodiments, in scenarios with higher precision requirements, the cache needs to be scanned precisely after the secondary synchronization processing is finished. The whole synchronization processing is executed again, and the two processing results are used to check whether the data synchronization processing is thorough. If data that has not been synchronized still exists, that data is stored in the ES log file log_es and processed further according to a redis operation instruction or a manual operation by a user. When a redis operation instruction is used for further processing, although redis performs poorly on large data volumes, two rounds of synchronization processing have already been performed, so the amount of remaining data that meets the "search field 1" condition is small and the redis performance bottleneck is within a tolerable range.
In some embodiments, after the data synchronization processing in the search space "slave" is completed, the data synchronization method may further include the following steps:
Step S110, determining whether an operation instruction for performing synchronization processing on the master cluster data is received.
A user can control, through the configuration management center of the data synchronization system, whether to perform synchronization processing on the master cluster data. When a preset trigger condition is met, the configuration management center triggers an operation instruction for performing synchronization processing on the master cluster data, and the process proceeds to step S111. When the trigger condition is not met, the process returns to the corresponding step according to a preset policy: for example, when the preset policy is cyclic synchronization processing, the process proceeds to step S106, and when the preset policy is to end synchronization processing, the data synchronization ends.
In other embodiments, during data synchronization processing, the configuration management center may interrupt the synchronization processing task according to a configured interruption condition, or a user may interrupt the synchronization processing task in real time through manual operation, for example, by configuring the configuration management center to turn off the data synchronization processing switch through a preset sub-event (subevent).
Step S111, performing data synchronization processing on the master cluster data.
Performing data synchronization processing on the master cluster data means changing the search space parameter from slave to master in steps S101 to S109 while keeping the other parameters unchanged. After step S111 is executed, the synchronization processing of both the master cluster and the slave cluster is finished, and data synchronization is achieved.
The same data processor is used to synchronize all data of the master cluster and the slave cluster, which gives the data processor high reusability.
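Switching from the slave cluster to the master cluster changes only the search space parameter. A minimal sketch, assuming the parameters are held in a dictionary keyed as in the "cacheSource" example of this application (the function name `for_master_cluster` is hypothetical):

```python
from typing import Dict


def for_master_cluster(params: Dict[str, str]) -> Dict[str, str]:
    """Copy the parameters and change only the search space to the master cluster."""
    master_params = dict(params)
    master_params["cacheSource"] = "master"
    return master_params


slave_params = {"search field 1": "value 1", "match": "prodBM_", "cacheSource": "slave"}
master_params = for_master_cluster(slave_params)
```

Copying rather than mutating keeps the slave-cluster parameters available in case that round has to be repeated.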
On the basis of the embodiment shown in fig. 1, the embodiment of the present application further provides a data synchronization method applied to a data synchronization system. In order to describe the data synchronization method more clearly, it is described below by taking cache cleaning, that is, deleting inconsistent data, as an example. Fig. 2 is a schematic flow chart of another implementation of the data synchronization method according to the embodiment of the present application, and as shown in fig. 2, the data synchronization method includes the following steps:
Step S201, initializing the data synchronization system, and acquiring a data acquirer and a data processor in the data synchronization system.
Here, when the application is initialized, all acquirer classes and processor classes may be obtained through the reflection mechanism. A data acquirer is an implementation class for acquiring the correct data to be synchronized, and each implementation class inherits the acquirer interface. A data processor is an implementation class for processing data to achieve data synchronization, and each implementation class inherits the processor interface.
Step S202, initializing the data acquirer and enabling the data acquirer to be associated with the corresponding data source.
The data acquirer is initialized and associated with its data source, and the data source is assembled into a custom parameter Map so that the corresponding processor can later be matched from the input parameters of the data processor.
Step S203, initializing the data processor and enabling the data processor to be associated with the corresponding data source.
The data processor is initialized and associated with its data source, and the data source is assembled into a custom parameter Map so that the processor can be matched according to the input parameters of the data processor. The data processing logic for each data source is implemented in the data processor corresponding to that data source.
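The interface-based design of steps S201 to S203 can be sketched as follows. This is an illustration only: the embodiment refers to a reflection mechanism, and Python's `__subclasses__()` is used here as a loose analogue. The class names, the `data_source` attribute and the `discover` helper are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Acquirer(ABC):
    """Acquirer interface: every data acquirer implements acquire()."""
    data_source: str = ""

    @abstractmethod
    def acquire(self, params: Dict[str, str]) -> List[str]:
        ...


class Processor(ABC):
    """Processor interface: every data processor implements process()."""
    data_source: str = ""

    @abstractmethod
    def process(self, item: str) -> bool:
        ...


class RedisCacheAcquirer(Acquirer):
    data_source = "redis"

    def acquire(self, params: Dict[str, str]) -> List[str]:
        # Placeholder logic: a real acquirer would search the data source.
        return [params["match"] + "01"]


class RedisCacheProcessor(Processor):
    data_source = "redis"

    def process(self, item: str) -> bool:
        # Placeholder logic: a real processor would delete or refresh the entry.
        return True


def discover(interface: type) -> Dict[str, type]:
    """Map each direct implementation class to the data source it is associated with."""
    return {cls.data_source: cls for cls in interface.__subclasses__()}


acquirers = discover(Acquirer)
processors = discover(Processor)
```

Under this sketch, supporting a new data source only requires defining one more subclass of each interface; the discovery code does not change.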
Step S204, based on the pre-configured data source, acquiring the parameters needed for acquiring the data to be synchronized.
The parameters required for querying or processing are all configured in the custom parameter Map corresponding to each data source, for example, the selection of the master cluster or the standby cluster, and the setting of the "match" field in a scan operation.
For example, a user inputs the parameters "value 1", "prodBM_" and "slave" on the data synchronization system, and the parameters obtained from the custom parameter Map may be expressed as:
Here, "search field 1", "match" and "cacheSource" are attributes that refer to the search field, the matching rule and the search space respectively, and "value 1", "prodBM_" and "slave" are the attribute values.
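The expression of the custom parameter Map is not reproduced in this text. Based only on the attributes and attribute values named above, one plausible shape is the following dictionary; it is an illustrative assumption, not the original listing.

```python
# Hypothetical rendering of the custom parameter Map described above.
custom_params = {
    "search field 1": "value 1",  # search field and the value to match
    "match": "prodBM_",           # matching rule (key prefix)
    "cacheSource": "slave",       # search space
}
```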
Step S205, acquiring the data to be synchronized from the data source by using the parameters and a data acquirer in the data synchronization system.
The data acquirer is defined based on an acquirer interface of the data synchronization system. For example, the data acquirer searches the "slave" search space of the associated data source for data whose search field has the value "value 1" to obtain the search data, the matching rule "prodBM_" is spliced with the search data to obtain the cache keywords, and the data acquirer then searches for the cache keywords in the "slave" search space to obtain the data to be synchronized. Here, the data to be synchronized may be acquired by ES search.
Step S206, determining whether an operation instruction for starting the synchronization processing is received.
When the operation instruction is received, the process proceeds to step S207; when the operation instruction is not received, step S206 continues to be executed to wait for the operation instruction.
Step S207, determining a data processor corresponding to the parameters from the plurality of data processors in the data synchronization system.
When it is determined that the data to be synchronized needs to be synchronized, the data processor corresponding to the parameters is determined from the plurality of data processors according to the custom parameter Map, wherein the data processor is defined based on a processor interface of the data synchronization system.
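Step S207 can be sketched as a lookup keyed by the data source recorded in the custom parameter Map. The key name "dataSource", the registry `PROCESSORS` and the two placeholder processors are hypothetical; the text does not state how the Map identifies the data source.

```python
from typing import Callable, Dict

ProcessorFn = Callable[[str], bool]


def delete_cache_entry(key: str) -> bool:
    """Placeholder processor for a Redis cache source."""
    return True


def refresh_db_row(key: str) -> bool:
    """Placeholder processor for a database source."""
    return True


# Hypothetical registry built at initialization time.
PROCESSORS: Dict[str, ProcessorFn] = {
    "redis": delete_cache_entry,
    "database": refresh_db_row,
}


def select_processor(params: Dict[str, str]) -> ProcessorFn:
    """Return the processor registered for the data source named in the parameters."""
    source = params.get("dataSource", "")
    if source not in PROCESSORS:
        raise LookupError(f"no processor registered for data source {source!r}")
    return PROCESSORS[source]


selected = select_processor({"dataSource": "redis", "match": "prodBM_"})
```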
Step S208, performing synchronization processing on each piece of data included in the data to be synchronized by using the data processor corresponding to the parameters to obtain a synchronization processing result of each piece of data.
In the embodiment of the application, because the data acquirer is defined based on the acquirer interface and the data processor is defined based on the processor interface, user-defined logic can be supported. When a user needs to synchronize data from another data source, acquire the data to be synchronized with different acquisition logic, or synchronize the data with different processing logic, the data synchronization system can be configured accordingly. The data synchronization system thus provides configurable data source selection, configurable acquisition of synchronization information, and configurable processing of synchronization information.
Step S209, generating log data of each piece of data based on the synchronization processing result of each piece of data.
The log data of each piece of data is generated from the key information of the synchronization processing. In this embodiment of the application, generating the log data of each piece of data may be implemented as: generating a synchronization identifier of each piece of data based on a preset rule; acquiring state information of each piece of data based on the synchronization processing result of each piece of data; and determining the data identifier, the synchronization processing result, and the state information of each piece of data in the data source as the log data of each piece of data. For example, the log data of each piece of data can be represented as (id, uuid, isSuccess, message).
Step S210, recording the log data of each piece of data into an ES log file.
The key information of the process is recorded in ES. The ES type used for storage needs to define fields for the database primary key, the uuid, success or failure, and remark information. ES search performance is high, and each triggered round of data processing generates only one unique uuid, so the data that was not processed successfully in the previous round can be obtained simply by searching on the uuid and success-or-failure fields, and the next round of data processing can then be triggered. The ES log file is used because it provides comprehensive recording and fast search. If logs were printed instead, because log recording capacity has an upper limit, the data records would be incomplete when millions of records are processed, and key information would be difficult to search and acquire in batches.
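The search on the uuid and success-or-failure fields can be expressed as an Elasticsearch query body. The sketch below only builds the request body as a dictionary and does not call any client. The field names `uuid`, `isSuccess` and `id` follow the log tuple used in this application, and it is assumed that `uuid` is mapped as a keyword field and `isSuccess` as a boolean; the function name `failed_records_query` is hypothetical.

```python
from typing import Any, Dict


def failed_records_query(sync_uuid: str, size: int = 1000) -> Dict[str, Any]:
    """Build a query body selecting the failed entries of one synchronization round."""
    return {
        "size": size,
        "_source": ["id"],  # only the data identifier is needed for the retry
        "query": {
            "bool": {
                "filter": [
                    {"term": {"uuid": sync_uuid}},
                    {"term": {"isSuccess": False}},
                ]
            }
        },
    }


query_body = failed_records_query("u1")
```

Filter clauses are used rather than scoring clauses because the search is a yes-or-no match on both fields.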
Step S211, based on the synchronous identification and the second identification, ES search is carried out in the ES log file to obtain the target data identification.
The data synchronization processing corresponding to the target data identification fails.
Step S212, using the data processor to perform secondary synchronization processing on the data corresponding to the target data identifier, so as to obtain a secondary synchronization processing result of each piece of data with failed synchronization processing.
Step S213, generating log data of each piece of data whose synchronization processing failed based on the secondary synchronization processing result of each such piece of data.
Step S214, recording the log data of each piece of data whose synchronization processing failed into the ES log file.
For example, if 1.5 million caches need to be deleted, in the first round of scanning an ES search is performed according to the conditions, the cache keys with the specified prefix are spliced by the system itself, the deletion logic is finally executed, and the processing results are stored in log_es. If the second search of log_es shows 50 thousand uncleaned caches, the remaining caches need to be retrieved again and deleted. The data that was not successfully processed in the previous round is processed again by retrieving the previous round's data processing results in log_es, scanning a cache key with a specified prefix by using redis.
According to the method provided by the embodiment of the application, when a new data source is added or new data acquisition logic and processing logic are needed, it is only necessary to add a corresponding acquirer class based on the acquirer interface of the data synchronization system and a corresponding processor class based on the processor interface of the data synchronization system. Data synchronization is then performed with the defined data acquirer and data processor, which makes the data synchronization system highly reusable and reduces the amount of code to develop and the development cost. Considering the poor performance of redis.scan, when checking whether cleaning is complete, cleaning can first be performed through the ES records before scanning, which improves data processing efficiency. Recording the uuid and other key information during data execution allows the data processing result of the batch to be obtained quickly, which improves the performance and accuracy of data processing.
On the basis of the above embodiments, the present application further provides a data synchronization method applied to the data synchronization system. Fig. 3 is a schematic flow chart of another implementation of the data synchronization method according to the embodiment of the present application, and as shown in fig. 3, the data synchronization method includes the following steps:
Step S301, initializing the data synchronization system, and acquiring a data acquirer and a data processor in the data synchronization system.
In the embodiment of the present application, the implementation manners of step S301 to step S303 may refer to the implementation manners of step S201 to step S203 in the embodiment shown in fig. 2.
Step S302, initializing the data acquirer, and associating the data acquirer with the corresponding data source.
Step S303, initializing the data processor, and associating the data processor with the corresponding data source.
Step S304, acquiring parameters required for acquiring the data to be synchronized based on the pre-configured data source.
Determining the type of a data source based on a pre-configured data source, wherein the type of the data source comprises a database, a Redis cache and an ES; when the type of the data source is a database or ES, acquiring a search field and a search space required for acquiring data to be synchronized; and when the type of the data source is Redis cache, acquiring a search field, a matching rule and a search space which are required for acquiring the data to be synchronized.
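The dependence of the required parameters on the data source type can be sketched as a lookup table. The parameter names `search_field`, `matching_rule` and `search_space`, the type labels and the helper `missing_params` are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Required parameters per data source type, as described above.
REQUIRED_PARAMS: Dict[str, List[str]] = {
    "database": ["search_field", "search_space"],
    "es": ["search_field", "search_space"],
    "redis": ["search_field", "matching_rule", "search_space"],
}


def missing_params(source_type: str, params: Dict[str, str]) -> List[str]:
    """Return the required parameter names that are absent for this source type."""
    if source_type not in REQUIRED_PARAMS:
        raise ValueError(f"unsupported data source type: {source_type!r}")
    return [name for name in REQUIRED_PARAMS[source_type] if name not in params]


params = {"search_field": "value 1", "search_space": "slave"}
```

With these parameters, an ES source has everything it needs, while a Redis cache source still lacks the matching rule.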
Step S305, acquiring the data to be synchronized from the data source by using the parameters and a data acquirer in the data synchronization system.
The data fetcher is defined based on a fetcher interface of the data synchronization system.
When the type of the data source is database or ES, acquiring the data to be synchronized from the data source, including: based on the search field, performing ES search in the search space by using a data acquirer to obtain search data meeting the search field condition; and determining the search data as the data to be synchronized.
When the type of the data source is the Redis cache, acquiring the data to be synchronized from the data source may be implemented as: based on the search field, performing ES search in the search space by using a data acquirer to obtain search data meeting the search field condition; fusing the matching rule with the search data to obtain a cache keyword; and based on the cache keywords, performing ES search in the search space by using the data acquirer to obtain data to be synchronized.
Step S306, determining whether an operation instruction for starting the synchronization processing is received.
When the operation instruction is received, the process proceeds to step S307; when the operation instruction is not received, step S306 continues to be executed to wait for the operation instruction.
Step S307, determining a data processor corresponding to the parameters from the plurality of data processors in the data synchronization system.
When it is determined that the data to be synchronized needs to be synchronized, the data processor corresponding to the parameters is determined from the plurality of data processors according to the custom parameter Map. The data processor is defined based on a processor interface of the data synchronization system.
Step S308, performing synchronization processing on each piece of data included in the data to be synchronized by using the data processor corresponding to the parameters to obtain a synchronization processing result of each piece of data.
In the embodiment of the application, because the data acquirer is defined based on the acquirer interface and the data processor is defined based on the processor interface, user-defined logic can be supported. When a user needs to synchronize data from another data source, acquire the data to be synchronized with different acquisition logic, or synchronize the data with different processing logic, the data synchronization system can be configured accordingly. The data synchronization system thus provides configurable data source selection, configurable acquisition of synchronization information, and configurable processing of synchronization information.
Step S309, generating log data of each piece of data based on the synchronization processing result of each piece of data.
This may be implemented as: generating a synchronization identifier of each piece of data based on a preset rule; acquiring state information of each piece of data based on the synchronization processing result of each piece of data; and determining the data identifier, the synchronization processing result, and the state information of each piece of data in the data source as the log data of each piece of data.
Step S310, recording the log data of each piece of data into an ES log file.
After all the data in the ES log file that failed the synchronization process are obtained, the data processor performs the resynchronization process on the data, and the process proceeds to step S311.
Step S311, based on the synchronous identification and the second identification, ES search is carried out in the ES log file to obtain the target data identification.
The data synchronization processing corresponding to the target data identification fails.
Step S312, determining whether the number of target data identifiers is less than a preset number.
When the number of target data identifiers is less than the preset number, the end condition of the ES search is reached, and synchronization processing is performed by using redis. When the number of target data identifiers is not less than the preset number, synchronization processing based on ES search still needs to be performed, and the process proceeds to step S313.
Step S313, performing synchronization processing again on the data corresponding to the target data identifier by using the data processor to obtain a resynchronization processing result.
Step S314, generating log update data based on the resynchronization processing result.
Step S315, updating the ES log file based on the log update data to obtain an updated ES log file.
After step S315, step S311 is performed again. Through multiple rounds of synchronization processing, the amount of data whose synchronization processing failed can be greatly reduced, the performance and accuracy of data processing are improved, and the precision requirements of most scenarios are met.
Step S316, performing synchronization processing on the ES log file based on redis.
For example, when the data volume is small, a redis scan is performed to obtain the cache keys with the specified prefix, thereby cleaning the cache.
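The loop formed by steps S311 to S315, and its hand-over to step S316 once few failures remain, can be sketched as follows. A dictionary of per-key results stands in for the ES log file; the function name `sync_until_small`, the sample keys and the `max_rounds` guard (added here to keep the loop finite) are assumptions, not features of the embodiment.

```python
from typing import Callable, Dict, List


def sync_until_small(status: Dict[str, bool], process: Callable[[str], bool],
                     preset_number: int, max_rounds: int = 10) -> List[str]:
    """Retry failed keys until fewer than preset_number remain; return the leftovers."""
    for _ in range(max_rounds):
        failed = [key for key, succeeded in status.items() if not succeeded]
        if len(failed) < preset_number:
            return failed  # hand these over to the final redis-based step
        for key in failed:
            status[key] = process(key)  # resynchronize and update the log
    return [key for key, succeeded in status.items() if not succeeded]


# Stand-in for the logged results of the first round.
status = {"prodBM_01": False, "prodBM_03": False, "prodBM_06": False, "prodBM_07": True}
# Hypothetical processor that keeps failing for one key.
leftovers = sync_until_small(status, lambda key: key != "prodBM_06", preset_number=2)
```

In this run the first pass finds three failures, which is not less than the preset number, so they are retried; the second pass finds a single remaining failure and stops.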
Step S317, determining whether an operation instruction for starting the synchronization processing of the master cluster is received.
When the operation instruction is received, the process proceeds to step S318; when the operation instruction is not received, step S317 continues to be executed to wait for the operation instruction.
Step S318, performing data synchronization processing on the master cluster data.
Performing data synchronization processing on the master cluster data means changing the search space parameter from slave to master in steps S304 to S316 while keeping the other parameters unchanged. After step S316 is completed, the synchronization processing of both the master cluster and the slave cluster is finished, and data synchronization is achieved.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
In an era of electronic information explosion, ever-growing volumes of information are transmitted rapidly between internet systems. Although today's systems treat a high degree of data consistency as a basic requirement, many factors make it hard to achieve in practice, such as data loss caused by network problems and data falling out of sync because of code logic. After data inconsistency is detected, there are two kinds of solutions: one is to refresh the data and fill in the missing data in time; the other is to delete the redundant data.
In the related art, the ways of implementing data synchronization after data inconsistency occurs, and their drawbacks, are as follows:
1) Re-triggering database changes for data synchronization: when Redis or the Elasticsearch search server has not synchronized with the database, a database change can be triggered again so that the database sends binary log files (binary log, binlog), and the existing logic in the system is used to perform the data synchronization.
This scheme has the following drawbacks. Unless it is clear exactly which data is inconsistent, developers usually do not know how much data is inconsistent, so a comparison tool still has to be developed for the data table and the data related to the inconsistent fields. Such a tool is highly specific, so it has low reusability when other data becomes inconsistent and must be developed again, which greatly increases development pressure and time cost. In addition, operating on the database directly affects data that is running normally online, and developers cannot accurately evaluate whether the scope of impact and the cost of re-triggering the change are acceptable, so the scheme carries some risk.
2) Developing a tool for refreshing and deleting data to perform data synchronization: this solution requires developing not only a data synchronization tool but also a tool for checking whether the data has been completely synchronized. When the check shows that data has not been fully synchronized and data processing needs to be performed multiple times, there are the problems of how to store the uncleaned data and how to trigger cleaning again, that is, how to handle all dirty data as a whole. This scheme therefore still suffers from low reusability and a large development workload.
To address the low reusability and large development workload of the above schemes, the solution provided by the embodiment of the application provides configurable data source selection, synchronization information acquisition, and synchronization information processing, as well as the ability to check whether data processing is thorough and to trigger data synchronization again.
Fig. 4 is a schematic overall structure diagram of a data synchronization system according to an embodiment of the present application, and as shown in fig. 4, the data synchronization system includes:
1) A data acquirer: the implementation classes that obtain exactly the data needing to be synchronized. Each implementation class inherits the acquirer interface and implements a unified method, inside which custom logic can be coded. Each acquirer is associated with a data source, which is injected during initialization; the data sources that can be associated include: commodity basic information database-table, commodity special attribute database-table, commodity picture database-table, Redis cache and Elasticsearch index-type. The query information enumeration in the input parameters is converted so that the required query information can be obtained from the corresponding data source. When the application is initialized, all acquirer classes are obtained through a reflection mechanism and assembled into a Map, so that they can be matched with the corresponding processors when the data processors are subsequently invoked.
2) A data processor: the implementation classes that process data to implement data synchronization. As above, each implementation class inherits the data processor interface and is associated with a data source so that it can be assembled into the Map during initialization; the data processing logic for each data source is likewise implemented in the corresponding processor.
3) A custom parameter Map: the parameters required for querying or processing are configured in this Map, for example the selection of the master cluster or the slave cluster, and the setting of the "match" field of the scan operation in Redis.
4) Query information enumeration: the fields to be looked up in a query, which specify the desired query content.
5) A configuration management center: the switch configuration that controls the details of the program flow. By toggling configuration switches, the flow state can be changed dynamically, for example dynamically changing the size of each batch of acquired information, or dynamically terminating the flow to stop losses when data has been refreshed incorrectly.
6) Key information recording: key information in the flow is recorded to es. The es type used for storage needs fields for the database primary key, uuid, success flag and remark information. The purpose is as follows: es search performance is high, and each triggered data processing run generates only one unique uuid, so the data that was not processed successfully in the previous run can be obtained by searching on the uuid and the success-or-failure field, and the next round of data processing can then be triggered. es is used because it currently offers both comprehensive recording and fast search; if logs were printed instead, the log capacity has an upper limit, so when millions of records are processed the records would be incomplete, and the key information would be hard to search and retrieve in batches.
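The acquirer and processor components described above can be sketched, under stated assumptions, as follows. The class names, the source_name attribute and the build_registry helper are hypothetical; Python's `__subclasses__()` is used here only as a rough analogue of the reflection mechanism mentioned above, and plain in-memory objects stand in for the es and Redis data sources.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class DataAcquirer(ABC):
    """Acquirer interface: every implementation exposes one unified method."""
    source_name: str

    def __init__(self, source: Any) -> None:
        self.source = source  # data source injected during initialization

    @abstractmethod
    def acquire(self, params: Dict[str, Any]) -> List[Any]: ...


class DataProcessor(ABC):
    """Processor interface: one unified method per piece of data."""
    source_name: str

    def __init__(self, source: Any) -> None:
        self.source = source

    @abstractmethod
    def process(self, item: Any) -> bool: ...


class EsAcquirer(DataAcquirer):
    source_name = "es"

    def acquire(self, params: Dict[str, Any]) -> List[Any]:
        # The in-memory list of documents stands in for an es index.
        field = params["search_field"]
        return [doc["id"] for doc in self.source if doc.get(field)]


class RedisCacheProcessor(DataProcessor):
    source_name = "redis"

    def process(self, item: Any) -> bool:
        # The in-memory dict stands in for the Redis cache.
        self.source.pop(item, None)
        return item not in self.source


def build_registry(interface: type, sources: Dict[str, Any]) -> Dict[str, Any]:
    # Discover every direct implementation class and assemble it into a Map.
    return {cls.source_name: cls(sources[cls.source_name])
            for cls in interface.__subclasses__()}
```

With this arrangement, adding a new data source only requires a new subclass of each interface; the registry picks it up without changes to the calling code.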
The following describes a specific flow of the data synchronization method provided in the embodiment of the present application, with reference to an example in which the data to be cleaned is cached data of a key with a certain prefix. Fig. 5 is a schematic flow chart of an implementation of a data synchronization method provided in an embodiment of the present application, and as shown in fig. 5, the method includes the following steps:
Step S501, the keys that need to be cleaned in the cache are obtained according to the conditions, the cleaning operation is executed, and the cleaning result corresponding to each id is recorded.
The reason for acquiring the data through es is that the Redis scan operation performs poorly, so using it directly for a large-scale search is avoided. Instead, the data meeting the "search field 1" condition is retrieved through es, the Redis cache keys are spliced automatically from the incoming key prefix and arranged into a set, and the cache is finally cleaned in batches. The batch size of the batch cache operation can be modified dynamically through the configuration center, which prevents exceptions or low efficiency caused by an unreasonable batch size.
The input parameters are as follows:
Fig. 6 is a schematic diagram of an implementation flow for executing the cleaning and recording operations, and as shown in fig. 6, the cleaning and recording operations executed in step S501 may be implemented by the following steps:
Step S5011, in the "es" data acquirer, data meeting the "search field 1" condition is acquired and arranged into an id set.
Step S5012, in the "delUselesCacheSync" data processor, each id in the id set is spliced with the Redis cache prefix "prodBM_", the spliced keys are arranged into a cache key set, and cache cleaning is performed in batches on the slave cluster.
Step S5013, during the processing of the "es" data acquirer, the key information of this processing (id, uuid, isSuccess, message) is recorded into the "log_es" type of es.
In step S5014, this process ends.
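Steps S5011 to S5013 can be sketched roughly as follows. This is an illustration under assumptions rather than the implementation of the embodiment: clean_cache and delete_batch are hypothetical names, delete_batch stands in for a batch delete against the slave cluster and is assumed to raise an exception on failure, and a plain list stands in for the "log_es" type.

```python
import uuid
from typing import Callable, Dict, List


def clean_cache(ids: List[int], prefix: str,
                delete_batch: Callable[[List[str]], None],
                batch_size: int, log: List[Dict]) -> str:
    """Splice cache keys from ids, delete them in batches, record each result."""
    run_uuid = str(uuid.uuid4())  # one unique uuid per triggered run
    for start in range(0, len(ids), batch_size):
        batch_ids = ids[start:start + batch_size]
        keys = [f"{prefix}{item_id}" for item_id in batch_ids]
        try:
            delete_batch(keys)
            ok, message = True, "deleted"
        except Exception as exc:  # the whole batch is recorded as failed
            ok, message = False, str(exc)
        for item_id in batch_ids:
            log.append({"id": item_id, "uuid": run_uuid,
                        "isSuccess": ok, "message": message})
    return run_uuid
```

The returned uuid is what a later round would use to look up the ids that failed in this run.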
Step S502, the ids whose execution failed in the previous round are retrieved, and cache cleaning is performed again.
Using the fast search capability of es, the ids whose execution failed in the previous round are retrieved from "log_es" by the unique uuid and the execution-failure condition, and cache cleaning is performed on them again through the "delUselesCacheSync" data processor.
The input parameters are as follows:
Fig. 7 is a schematic diagram of an implementation flow of performing the cleaning operation again, and as shown in fig. 7, the performing of the cleaning operation again in step S502 can be implemented by the following steps:
Step S5021, the ids whose execution failed in the previous round are retrieved through "log_es" using the unique uuid and the execution-failure condition.
Step S5022, cache cleaning is performed again on the slave cluster through the "delUselesCacheSync" data processor.
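The retrieval and re-cleaning of step S502 can be sketched as follows, assuming log records shaped like the key information (id, uuid, isSuccess, message) described above. The function names failed_ids and retry_failed are hypothetical, a list of dicts stands in for the es log type, and delete is assumed to raise an exception on failure.

```python
from typing import Callable, Dict, List


def failed_ids(log: List[Dict], run_uuid: str) -> List[int]:
    """Select ids of one run (by uuid) whose processing was recorded as failed."""
    return [entry["id"] for entry in log
            if entry["uuid"] == run_uuid and not entry["isSuccess"]]


def retry_failed(log: List[Dict], run_uuid: str, prefix: str,
                 delete: Callable[[str], object]) -> List[int]:
    """Clean the failed ids again; return the ids that still fail."""
    still_failed = []
    for item_id in failed_ids(log, run_uuid):
        try:
            delete(f"{prefix}{item_id}")
        except Exception:
            still_failed.append(item_id)
    return still_failed
```

Filtering on both the uuid and the failure flag keeps failures from earlier, unrelated runs out of the retry set.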
Step S503, the prefix is scanned in the cache to check whether the data has been completely cleaned, and the information that has not been cleaned is recorded to "log_es".
The reason for this step is that when Redis performs a batch delete operation, if deleting a certain key does not succeed, that key is skipped and the next key is processed. Because Redis does not throw an exception in this case, an operation may be recorded in "log_es" as successful although it actually failed, so the cache needs to be scanned precisely. The scan operation performs poorly, but because two rounds of cleaning have already been carried out, little data meeting the "search field 1" condition remains, and the Redis performance bottleneck is tolerable.
The input parameters are as follows:
Fig. 8 is a schematic diagram of an implementation flow of performing the search and record operation again, and as shown in fig. 8, the search and record operation in step S503 may be implemented by the following steps:
Step S5031, in the redis data acquirer, data is acquired that meets the "search field 1" condition, has the prefix "prodBM_", and whose Redis slots are distributed between 0 and 6400.
Step S5032, cache cleaning is performed again on the slave cluster through the "delUselesCacheSync" data processor.
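The check of step S503 can be sketched as follows. This is only an illustration: a dict stands in for the Redis cache, glob-style matching on its keys stands in for a scan with a match pattern, and the names scan_prefix and record_uncleaned are hypothetical. Slot filtering, as mentioned in step S5031, is not modelled.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def scan_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Return the keys that still match the prefix pattern, in sorted order."""
    pattern = prefix + "*"
    return sorted(key for key in keys if fnmatchcase(key, pattern))


def record_uncleaned(cache: Dict[str, str], prefix: str,
                     run_uuid: str, log: List[Dict]) -> List[str]:
    """Record every key that survived cleaning as a failed entry."""
    leftovers = scan_prefix(cache, prefix)
    for key in leftovers:
        log.append({"id": key[len(prefix):], "uuid": run_uuid,
                    "isSuccess": False,
                    "message": "still present after cleaning"})
    return leftovers
```

Because the leftovers are written back as failed entries under the same uuid, they can be picked up by the same retrieval used for the second round.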
Experiments show the following:
1) If 1.5 million caches need to be deleted:
Scenario one, without the data refresh tool provided by the embodiment of the application: the cache keys with the specified prefix are scanned by brute force with redis.scan, logic screening is performed according to the conditions, and the deletion logic is finally executed; this takes more than 20 hours.
Scenario two, with the data refresh tool provided by the embodiment of the application: in the first round, an es search is performed according to the conditions, the cache keys with the specified prefix are spliced by the tool itself, and the deletion logic is finally executed; this takes 8 hours.
2) If 500,000 caches remain uncleaned and need to be retrieved and deleted:
Scenario one, without the data refresh tool provided by the embodiment of the application: the cache keys with the specified prefix are scanned with redis.scan, logic screening is performed again according to the conditions, and the deletion logic is finally executed; this takes more than 10 hours. In addition, redis.scan has gaps, so after a large-scale scan some cache keys are missed.
Scenario two, with the data refresh tool provided by the embodiment of the application: the data that was not processed successfully in the previous round is processed again by retrieving the previous round's processing results from es; the cache keys with the specified prefix are then scanned with redis.scan at a much smaller scale; at this point only about 200,000 uncleaned caches remain, on which logic screening and the deletion logic are performed; this takes 1 hour, and the problem of cache keys being missed after a large-scale scan is mitigated.
In the embodiment of the application, the "delUselesCacheSync" data processor is used in all three steps of the processing, and only the specific judgment logic and rules for deletion need to be developed in the tool, which gives the data processor high reusability. On the basis of this model, if new data acquisition or data processing logic is needed, only the corresponding acquirer classes and processor classes need to be developed, which gives the model high reusability. Considering that the performance of redis.scan is very poor, when checking whether cleaning is complete, cleaning can first be performed through the es records before scanning, which improves data processing efficiency. During data processing, tasks can be interrupted in real time through configuration, and the data processing switch is closed in a subframe mode after the configuration is modified. By combining data acquirers and processors, various data processing problems can be solved while pursuing performance and accuracy. Thus, embodiments of the present application provide a configurable model architecture with high reusability, namely: 1) the capability of configurable data source selection; 2) the capability of configurable synchronization information acquisition; 3) the capability of configurable synchronization information processing. By recording the uuid and other key information during execution, the data processing results of a batch can be obtained quickly, which facilitates subsequent data verification or re-execution and can greatly improve the performance and accuracy of data processing.
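The real-time interruption and the dynamically adjustable batch size mentioned above can be sketched as follows. This is a simplified illustration under assumptions: a mutable dict stands in for the configuration center, the keys "enabled" and "batch_size" are hypothetical, and the configuration is re-read only between batches.

```python
from typing import Any, Callable, Dict, List


def process_with_switch(items: List[Any], process: Callable[[Any], None],
                        config: Dict[str, Any]) -> List[Any]:
    """Process items in batches, re-reading the configuration before each batch."""
    done: List[Any] = []
    position = 0
    while position < len(items):
        # The switch may be turned off while the task is running.
        if not config.get("enabled", True):
            break
        # The batch size may also change between batches; never go below 1.
        size = max(1, int(config.get("batch_size", 100)))
        batch = items[position:position + size]
        for item in batch:
            process(item)
            done.append(item)
        position += len(batch)
    return done
```

In this sketch a batch that has already started is always finished, so turning the switch off takes effect at the next batch boundary.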
Based on the foregoing embodiments, the embodiments of the present application provide a data synchronization apparatus, where each module included in the apparatus and each unit included in each module may be implemented by a processor in a computer device; of course, the implementation can also be realized through a specific logic circuit; in the implementation process, the processor may be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 9 is a schematic diagram illustrating a structure of the data synchronization apparatus according to an embodiment of the present invention, and as shown in fig. 9, the data synchronization apparatus 900 includes:
a first obtaining module 901, configured to obtain parameters required for obtaining data to be synchronized based on a preconfigured data source;
a second obtaining module 902, configured to obtain the data to be synchronized from a data source by using the parameter and a data obtainer in the data synchronization system, where the data obtainer is defined based on an obtainer interface of the data synchronization system;
a synchronization processing module 903, configured to perform synchronization processing on each piece of data included in the to-be-synchronized data by using a data processor in the data synchronization system to obtain a synchronization processing result of each piece of data, where the data processor is defined based on a processor interface of the data synchronization system.
In some embodiments, the data synchronization apparatus 900 may further include:
the first initialization module is used for initializing the data synchronization system;
the third acquisition module is used for acquiring the data acquirer and the data processor in the data synchronization system;
the second initialization module is used for initializing the data acquirer and enabling the data acquirer to be associated with a corresponding data source;
and the third initialization module is used for initializing the data processor so as to enable the data processor to be associated with the corresponding data source.
In some embodiments, the first obtaining module 901 may further be configured to:
determining the type of a data source based on a pre-configured data source, wherein the type of the data source comprises a database, a Redis cache and an ES;
when the type of the data source is a database or ES, acquiring a search field and a search space required for acquiring data to be synchronized;
and when the type of the data source is Redis cache, acquiring a search field, a matching rule and a search space which are required for acquiring the data to be synchronized.
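The dependence of the required parameters on the data source type can be sketched as follows. The mapping REQUIRED_PARAMS, the parameter key names and the function collect_params are hypothetical and serve only to illustrate the three cases listed above.

```python
from typing import Dict, List

# Parameters needed per data source type, following the cases listed above.
REQUIRED_PARAMS: Dict[str, List[str]] = {
    "database": ["search_field", "search_space"],
    "es": ["search_field", "search_space"],
    "redis": ["search_field", "matching_rule", "search_space"],
}


def collect_params(source_type: str, configured: Dict[str, str]) -> Dict[str, str]:
    """Pick the parameters required for the given data source type."""
    required = REQUIRED_PARAMS[source_type]
    missing = [name for name in required if name not in configured]
    if missing:
        raise ValueError(f"missing parameters for {source_type}: {missing}")
    return {name: configured[name] for name in required}
```

For example, a configuration containing a search field, a search space and a matching rule yields only the first two entries when the source type is es, and all three when it is the Redis cache.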
In some embodiments, when the type of the data source is a database or an ES, the second obtaining module 902 may further be configured to:
based on the search field, performing ES search in the search space by using the data acquirer to obtain search data meeting the search field condition;
and determining the search data as data to be synchronized.
In some embodiments, when the type of the data source is Redis cache, the second obtaining module 902 may further be configured to:
based on the search field, performing ES search in the search space by using the data acquirer to obtain search data meeting the search field condition;
fusing the matching rule with the search data to obtain a cache keyword;
and based on the cache keywords, performing ES search in the search space by using the data acquirer to obtain data to be synchronized.
In some embodiments, the data synchronization apparatus 900 may further include:
a generating module, configured to generate log data of each piece of data based on a synchronization processing result of each piece of data;
and the recording module is used for recording the log data of each piece of data into an ES log file.
In some embodiments, the generating module is further configured to:
generating a synchronous identifier of each piece of data based on a preset rule;
acquiring the state information of each piece of data based on the synchronous processing result of each piece of data;
and determining the data identifier, the synchronous processing result and the state information of each piece of data in a data source as the log data of each piece of data.
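The generation of log data by the generating module can be sketched as follows. This is a sketch under assumptions: the preset rule for the synchronization identifier is taken here to be a random UUID, the field names follow the key information (id, uuid, isSuccess, message) used in the method embodiment, and the names SyncLogEntry, new_sync_identifier and make_log_entry are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict
from uuid import uuid4


@dataclass
class SyncLogEntry:
    id: str          # data identifier in the data source
    uuid: str        # synchronization identifier shared by one run
    isSuccess: bool  # synchronization processing result
    message: str     # state information


def new_sync_identifier() -> str:
    # Assumed preset rule: one random UUID per triggered run.
    return str(uuid4())


def make_log_entry(data_id: str, run_uuid: str, succeeded: bool,
                   detail: str = "") -> Dict[str, object]:
    """Build the log record for one piece of data."""
    message = detail or ("synchronized" if succeeded else "synchronization failed")
    return asdict(SyncLogEntry(id=data_id, uuid=run_uuid,
                               isSuccess=succeeded, message=message))
```

Each record is a flat dictionary, which is a convenient shape for writing to a search index and for later filtering on the identifier and the result field.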
In some embodiments, the synchronization processing result comprises a first identifier for representing that the data synchronization processing is successful and a second identifier for representing that the data synchronization processing is failed;
the second obtaining module is further configured to perform ES search in the ES log file based on the synchronization identifier and the second identifier to obtain a target data identifier, where data synchronization processing corresponding to the target data identifier fails;
the synchronous processing module is further configured to perform secondary synchronous processing on the data corresponding to the target data identifier by using the data processor to obtain a secondary synchronous processing result of each piece of data with failed synchronous processing;
the generating module is further configured to generate log data of each piece of data that fails in the synchronization processing based on a secondary synchronization processing result of each piece of data that fails in the synchronization processing;
the recording module is further configured to record log data of each piece of data that fails in the synchronization processing into an ES log file.
In some embodiments, the data synchronization apparatus 900 may further include:
the judging module is used for judging whether an operation instruction for starting synchronous processing is received or not;
and the determining module is used for determining a data processor corresponding to the parameter from a plurality of data processors in the data synchronization system when the operating instruction is received.
Here, it should be noted that: the above description of the data synchronization apparatus embodiment is similar to the above description of the method, and has the same advantageous effects as the method embodiment. For technical details not disclosed in the embodiments of the data synchronization apparatus of the present application, those skilled in the art should understand with reference to the description of the embodiments of the method of the present application.
It should be noted that, in the embodiment of the present application, if the method is implemented in the form of a software functional module and sold or used as a standalone product, it may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Accordingly, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in the data synchronization method provided in the above embodiments.
Fig. 10 is a schematic diagram illustrating a composition structure of the data synchronization apparatus provided in the embodiment of the present application, and other exemplary structures of the data synchronization apparatus 1000 can be foreseen according to the exemplary structure of the data synchronization apparatus 1000 shown in fig. 10, so that the structure described herein should not be considered as a limitation, for example, some components described below may be omitted, or components not described below may be added to adapt to special requirements of some applications.
The data synchronization apparatus 1000 shown in fig. 10 includes: a processor 1001, at least one communication bus 1002, a user interface 1003, at least one external communication interface 1004, and a memory 1005. Wherein the communication bus 1002 is configured to enable connective communication between these components. The user interface 1003 may include a display 1031, and the external communication interface 1004 may include a standard wired interface and a wireless interface, among others. The processor 1001 is configured to execute a program of the data synchronization method stored in the memory, so as to implement the steps in the data synchronization method provided in the foregoing embodiments.
The above description of the data synchronization apparatus and storage medium embodiments is similar to the description of the method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the data synchronization device and the storage medium of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a device to perform all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The above description is only for the embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.