Data processing method, apparatus, and server
1. A method for processing data, comprising:
acquiring a data query request, wherein the query request comprises a data identifier;
acquiring a current generation time of an offline database;
determining, according to the current generation time of the offline database, a target time period of online data to be retrieved; and
retrieving target data corresponding to the data identifier from the offline database and from the online data in the target time period.
2. The method of claim 1, wherein acquiring the current generation time of the offline database comprises:
querying an update time list corresponding to the offline database, and determining the current generation time according to the latest update time in the update time list.
3. The method of claim 2, further comprising:
in a case where a time interval between the current generation time of the offline database and a current time is greater than a threshold, acquiring data to be stored that was generated from the current generation time to the current time;
determining difference data between the data to be stored and data already stored in the offline database; and
storing the difference data into the offline database, and adding the current time to the update time list.
4. The method according to any one of claims 1 to 3, wherein the query request further comprises a target service identifier, and after retrieving the target data corresponding to the data identifier, the method further comprises:
determining a target processing mode corresponding to the target data according to the target service identifier; and
processing the target data according to the target processing mode.
5. The method of claim 4, wherein determining the target processing mode corresponding to the target data according to the target service identifier comprises:
determining the target processing mode corresponding to the target service identifier according to a preset mapping relationship between service identifiers and processing modes;
or,
determining a type of the target data according to the target service identifier, and determining the target processing mode according to the type of the target data.
6. The method according to any one of claims 1 to 3, wherein determining the target time period of the online data to be retrieved according to the current generation time of the offline database comprises:
determining the time period between the current generation time of the offline database and the current time as the target time period;
or,
determining a time period of a preset length after the current generation time of the offline database as the target time period.
7. An apparatus for processing data, comprising:
a first acquisition module configured to acquire a data query request, wherein the query request comprises a data identifier;
a second acquisition module configured to acquire a current generation time of an offline database;
a first determining module configured to determine, according to the current generation time of the offline database, a target time period of online data to be retrieved; and
a third acquisition module configured to retrieve target data corresponding to the data identifier from the offline database and from the online data in the target time period.
8. A server, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the method of processing data according to any one of claims 1 to 6.
9. A computer-readable storage medium storing instructions which, when executed by a processor of a server, enable the server to perform the method of processing data according to any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements a method of processing data according to any one of claims 1 to 6.
Background
In the field of data processing, data can be divided into offline data and real-time data. When data is queried, the offline data and real-time data corresponding to the data to be queried are selected from the stored data according to the storage time of each batch of offline data, and are then merged and processed to return a query result.
In the related art, to obtain offline data, the full set of periodically generated offline data is usually synchronized to online storage, and a new data table containing the full offline data is created at each synchronization. As a result, multiple versions of the data table are stored online, which occupies more storage resources and consumes more computing resources and waiting time during data synchronization.
Disclosure of Invention
The present disclosure provides a data processing method and apparatus, a server, a storage medium, and a computer program product, to at least solve the problems in the related art that, in a data query service, multiple versions of data tables are stored, more storage resources are occupied, and more computing resources and waiting time are consumed during data synchronization. The technical solutions of the disclosure are as follows:
According to a first aspect of embodiments of the present disclosure, there is provided a data processing method, including:
acquiring a data query request, wherein the query request comprises a data identifier;
acquiring a current generation time of an offline database;
determining, according to the current generation time of the offline database, a target time period of online data to be retrieved; and
retrieving target data corresponding to the data identifier from the offline database and from the online data in the target time period.
In a possible implementation of the embodiments of the present disclosure, acquiring the current generation time of the offline database includes: querying an update time list corresponding to the offline database, and determining the current generation time according to the latest update time in the update time list.
In a possible implementation of the embodiments of the present disclosure, the data processing method further includes: in a case where the time interval between the current generation time of the offline database and the current time is greater than a threshold, acquiring data to be stored that was generated from the current generation time to the current time; determining difference data between the data to be stored and the data already stored in the offline database; and storing the difference data into the offline database, and adding the current time to the update time list.
In a possible implementation of the embodiments of the present disclosure, after acquiring the data to be stored from the current generation time to the current time, the method further includes: in a case where no data is stored in the offline database, storing the data to be stored into the offline database, and adding the current time to the update time list.
In a possible implementation of the embodiments of the present disclosure, the query request further includes a target service identifier, and after retrieving the target data corresponding to the data identifier, the method further includes: determining a target processing mode corresponding to the target data according to the target service identifier; and processing the target data according to the target processing mode.
In a possible implementation of the embodiments of the present disclosure, determining the target processing mode corresponding to the target data according to the target service identifier includes: determining the target processing mode corresponding to the target service identifier according to a preset mapping relationship between service identifiers and processing modes; or determining the type of the target data according to the target service identifier, and determining the target processing mode according to the type of the target data.
In a possible implementation of the embodiments of the present disclosure, determining the target time period of the online data to be retrieved according to the current generation time of the offline database includes: determining the time period between the current generation time of the offline database and the current time as the target time period; or determining a time period of a preset length after the current generation time of the offline database as the target time period.
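The incremental synchronization described above (store only the difference data, or the full batch when the offline database is empty, then record the current time) can be sketched as follows. This is a minimal in-memory illustration, not the disclosure's implementation: it assumes the full offline data table is a dict keyed by data identifier and the update time list is a Python list; `OfflineDatabase` and `sync_offline` are hypothetical names. Records missing from a new batch are not deleted in this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class OfflineDatabase:
    """Hypothetical stand-in: a single full offline data table plus its update time list."""
    table: Dict[str, int] = field(default_factory=dict)         # data identifier -> value
    update_times: List[datetime] = field(default_factory=list)  # one entry per sync


def sync_offline(db: OfflineDatabase, data_to_store: Dict[str, int], now: datetime) -> Dict[str, int]:
    """Write only the difference data, then add `now` to the update time list."""
    if not db.table:
        # Nothing stored yet: store the full batch of data to be stored.
        written = dict(data_to_store)
    else:
        # Difference data: identifiers that are new or whose value changed.
        written = {
            key: value
            for key, value in data_to_store.items()
            if key not in db.table or db.table[key] != value
        }
    db.table.update(written)
    db.update_times.append(now)
    return written


# Usage: the second full batch only changes "b" and adds "c".
db = OfflineDatabase()
sync_offline(db, {"a": 1, "b": 2}, datetime(2021, 5, 10, 11, 0))
changed = sync_offline(db, {"a": 1, "b": 3, "c": 4}, datetime(2021, 5, 10, 12, 0))
```

Only one table version exists at any time; each sync overwrites changed rows in place instead of creating a new full copy.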
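The two alternatives for choosing the target processing mode can be sketched as below. The service identifiers, data types, and processing functions are hypothetical examples; the disclosure does not name any. The disclosure presents the two paths as alternatives; trying the preset mapping first and falling back to the type-based path is a design choice made only for this sketch.

```python
from typing import Callable, Dict, Optional

Processor = Callable[[list], object]

# Hypothetical preset mapping: service identifier -> processing mode.
SERVICE_TO_MODE: Dict[str, Processor] = {
    "svc_total": sum,
    "svc_peak": max,
}

# Hypothetical type-based path: service identifier -> data type -> processing mode.
SERVICE_TO_TYPE: Dict[str, str] = {"svc_count": "counter"}
TYPE_TO_MODE: Dict[str, Processor] = {"counter": len}


def target_processing_mode(service_id: str) -> Optional[Processor]:
    """Return a processing mode via the preset mapping, else via the data type."""
    if service_id in SERVICE_TO_MODE:
        return SERVICE_TO_MODE[service_id]
    data_type = SERVICE_TO_TYPE.get(service_id)
    return TYPE_TO_MODE.get(data_type) if data_type else None


def process(target_data: list, service_id: str) -> object:
    """Apply the target processing mode selected for `service_id` to the target data."""
    mode = target_processing_mode(service_id)
    if mode is None:
        raise ValueError(f"no processing mode for service {service_id!r}")
    return mode(target_data)
```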
According to a second aspect of the embodiments of the present disclosure, there is provided a data processing apparatus including:
a first acquisition module configured to acquire a data query request, wherein the query request comprises a data identifier;
a second acquisition module configured to acquire a current generation time of an offline database;
a first determining module configured to determine, according to the current generation time of the offline database, a target time period of online data to be retrieved; and
a third acquisition module configured to retrieve target data corresponding to the data identifier from the offline database and from the online data in the target time period.
In a possible implementation of the embodiments of the present disclosure, the second acquisition module is further configured to query an update time list corresponding to the offline database, and determine the current generation time according to the latest update time in the update time list.
In a possible implementation of the embodiments of the present disclosure, the data processing apparatus further includes: a fourth acquisition module configured to, in a case where the time interval between the current generation time of the offline database and the current time is greater than a threshold, acquire the data to be stored that was generated from the current generation time to the current time; a second determining module configured to determine difference data between the data to be stored and the data already stored in the offline database; and a first storage module configured to store the difference data into the offline database and add the current time to the update time list.
In a possible implementation of the embodiments of the present disclosure, the first storage module is further configured to, in a case where no data is stored in the offline database, store the data to be stored into the offline database and add the current time to the update time list.
In a possible implementation of the embodiments of the present disclosure, the query request further includes a target service identifier, and the apparatus further includes: a third determining module configured to determine a target processing mode corresponding to the target data according to the target service identifier; and a processing module configured to process the target data according to the target processing mode.
In a possible implementation of the embodiments of the present disclosure, the data processing apparatus further includes: a fourth determining module configured to determine the target processing mode corresponding to the target service identifier according to a preset mapping relationship between service identifiers and processing modes, or to determine the type of the target data according to the target service identifier and determine the target processing mode according to the type of the target data.
In a possible implementation of the embodiments of the present disclosure, the first determining module further includes: a first determining unit configured to determine the time period between the current generation time of the offline database and the current time as the target time period, or to determine a time period of a preset length after the current generation time of the offline database as the target time period.
According to a third aspect of the embodiments of the present disclosure, there is provided a server, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the method of processing data as described above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of a server, enable the server to perform the method of processing data as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, which, when executed by a processor of a server, enables the server to perform the method of processing data as described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects: when a data query request is received, the latest time at which the offline data was updated is determined from the predetermined current generation time of the offline database; the target time period of the online data to be retrieved is determined according to that time; and the target data corresponding to the data identifier is retrieved from the offline database and from the online data in the target time period. Because the current generation time of the offline database records when the offline data was last updated, a query can combine the currently synchronized offline data with the real-time data in the time period after that moment. Recording this time also makes it convenient to synchronize only the offline data that changed during an update, so that offline data can be queried from a single full offline data table. The data query task can therefore be accomplished by storing only one full offline data table in the offline database, which saves storage space and reduces the computing resources and time required for offline data synchronization.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating a data processing method according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a specific data processing method according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating another specific data processing method according to an exemplary embodiment.
Fig. 4 is a flowchart illustrating yet another specific data processing method according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating a data processing apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating a server according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a data processing method according to an exemplary embodiment. As shown in Fig. 1, the data processing method is used in a server and includes the following steps.
In step S101, a data query request is obtained, where the query request includes a data identifier.
The data processing method of the present disclosure is executed by a server. Specifically, it may be executed by the data processing apparatus of the embodiments of the present disclosure, which may be configured in the server of the embodiments of the present disclosure.
It should further be noted that the server in the embodiments of the present disclosure may be any server that provides a data query function to users. To improve the accuracy and effectiveness of the queried data in the data query scenario, data is initialized from offline data and its timeliness is guaranteed by online data (i.e., real-time data); that is, the data processing method of the embodiments of the present disclosure retrieves both the offline data and the online data of the data to be queried. For this reason, the server stores in advance the offline data and online data whose production has been completed, so as to provide the data query service later. In the embodiments of the present disclosure, the periodically generated offline data is stored in a full offline data table: the offline database of the server stores the full offline data generated in each period in a single version of the data table, and the full offline data table contains no duplicate offline data. Compared with the related art, which needs to store multiple versions of the data table, the offline data in the embodiments of the present disclosure occupies fewer storage resources and contains no duplicate data, thereby reducing resource consumption.
The data query request is a request sent by a client to the server to query specific information of data. It includes a data identifier, which is the identity information of the data to be queried, such as the name of the data to be queried.
In an embodiment of the present disclosure, the client may establish a connection with the server through a server interface provided by the server, then send a data query request to the server and receive the data query result returned by the server for that request. The client sets the data identifier of the data to be queried in the data query request, so that after receiving the request, the server extracts the data identifier it contains and retrieves the data corresponding to the data identifier from the stored data, using the data identifier as the query condition.
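As an illustration only, the server-side extraction of the data identifier might look like the sketch below. The disclosure does not specify a wire format; a JSON request body with a hypothetical `data_id` field is assumed here, and `extract_data_identifier` is a hypothetical helper name.

```python
import json


def extract_data_identifier(raw_request: str) -> str:
    """Parse a query request and return its data identifier (hypothetical `data_id` field)."""
    request = json.loads(raw_request)
    data_id = request.get("data_id")
    if not isinstance(data_id, str) or not data_id:
        raise ValueError("query request must carry a non-empty data identifier")
    return data_id


# Usage: the extracted identifier becomes the query condition for retrieval.
data_id = extract_data_identifier('{"data_id": "user_clicks", "service_id": "svc_total"}')
```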
In step S102, the current generation time of the offline database is acquired.
Offline data is generated periodically, and the offline data generated in each period needs to be stored in the offline database. The generation time of the offline database is the time, recorded by the offline database, at which the offline data generated in one period finished being synchronized into it, that is, the time at which the last piece of offline data of the current period was stored. For example, if 10 pieces of data are generated in the current period, the time at which the offline database records storing the 10th piece is the current generation time. After the offline data of each period has been synchronized, the generation time of the offline database is updated accordingly, so the latest offline data generation time in the offline database can be determined from its current generation time.
For example, after the offline data generated in the (T-1)th period is stored in the offline database, the generation time of the offline database is the time at which the last piece of offline data generated in the (T-1)th period was stored. Offline data production then continues, and after the offline data generated in the next period, i.e., the Tth period, is stored in the offline database, the current generation time of the offline database is updated to the time at which the last piece of offline data generated in the Tth period was stored. Therefore, from the current generation time of the offline database, the server can determine the latest generation time of the offline data in the offline database, and the offline database stores the full offline data up to period T.
Therefore, with the data processing method provided by the embodiments of the present disclosure, recording the current generation time of the offline database allows a single full offline data table to be stored in the offline database. When offline data is stored, the offline database records the corresponding generation time; when data is queried, the generation time of the latest offline data in the offline database is determined from its current generation time, and the online data to be acquired is the real-time data after that time. In this way, both the latest full offline data and the more timely online data can be acquired, which meets the data query requirement while saving storage space in the offline database.
Optionally, in an embodiment of the present disclosure, the current generation time of the offline database may be obtained by querying an update time list corresponding to the offline database. The update time list records each generation time of the offline database; whenever the current generation time changes, the list is updated accordingly, so the latest update time in the list is the current generation time. In a specific implementation, the update time list may be maintained through the metadata of the offline database. Metadata is data describing the attributes of the offline data and can represent the storage time of the offline data. After the offline database stores offline data, the time corresponding to the currently stored offline data is synchronized to the metadata by using a modification instruction to modify the information in the metadata that indicates the data change time, and the update times in the update time list are set by reading the latest time information represented by the metadata. For example, to synchronize the time "12:10:10 on May 10, 2021" corresponding to the currently stored offline data to the metadata, the data modification time of the offline data "a" is changed to the received time with a touch instruction; that is, the instruction "touch -m -t 202105101210.10 a" sets the storage time of the offline data "a" represented by the metadata to the received time "202105101210.10".
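A minimal sketch of keeping the update time list in file metadata, assuming each offline data file's modification time plays the role of the metadata field described above. `os.utime` is the Python standard-library counterpart of `touch -m -t`; the helper names `record_storage_time` and `current_generation_time` and the file name `a` are illustrative assumptions.

```python
import os
import tempfile
from datetime import datetime
from typing import List


def record_storage_time(path: str, stored_at: datetime, update_times: List[datetime]) -> None:
    """Mirror `touch -m -t`: set the file's modification time, then log it."""
    atime = os.stat(path).st_atime  # keep the access time unchanged, like `touch -m`
    os.utime(path, (atime, stored_at.timestamp()))
    # Fill the update time list by reading the time back from the metadata.
    update_times.append(datetime.fromtimestamp(os.stat(path).st_mtime))


def current_generation_time(update_times: List[datetime]) -> datetime:
    """The latest entry in the update time list is the current generation time."""
    return max(update_times)


# Usage: two syncs of the offline data file "a" in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a")
    open(path, "w").close()
    times: List[datetime] = []
    record_storage_time(path, datetime(2021, 5, 10, 11, 10, 10), times)
    record_storage_time(path, datetime(2021, 5, 10, 12, 10, 10), times)
    latest = current_generation_time(times)
```

Naive datetimes are interpreted in local time here, matching how `touch -t` interprets its argument.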
In step S103, the target time period of the online data to be retrieved is determined according to the current generation time of the offline database.
The target time period is a time range containing online data of the same or similar timeliness. One target time period may include one or more pieces of online data, and the target time period may be determined according to practical factors such as the accuracy requirements of the data query.
As a first possible implementation, the time period between the current generation time of the offline database and the current time is determined as the target time period. In this example, the target time period covers the online data generated after the generation time of the offline database, and the online data is retrieved within this period.
As a second possible implementation, a time period of a preset length after the current generation time of the offline database is determined as the target time period. In this example, the time period of the preset length is the time window used to aggregate online data when it is stored in advance. Owing to the characteristics of online data in the data processing field, a single piece of online data cannot be stored directly and then serve external queries; instead, the online data generated at the current time must be merged, according to a corresponding rule, with the historical online data within a time period of a preset length. That is, the online data within the preset-length period is aggregated and then stored, and this period is the aggregation time window. For example, the 30 pieces of online data generated in the first minute are summed and stored in window tw1, which corresponds to the first minute.
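The two ways of fixing the target time period can be sketched as a half-open interval [start, end). The function name and the optional `preset_length` parameter are hypothetical; passing no preset length selects the first implementation, and passing one selects the second.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def target_time_period(
    generation_time: datetime,
    now: datetime,
    preset_length: Optional[timedelta] = None,
) -> Tuple[datetime, datetime]:
    """Return (start, end) of the online data to retrieve.

    Without `preset_length`: from the offline generation time up to now.
    With `preset_length`: a window of that length right after the generation time.
    """
    if preset_length is None:
        return generation_time, now
    return generation_time, generation_time + preset_length


gen = datetime(2021, 5, 10, 12, 0)
first = target_time_period(gen, datetime(2021, 5, 10, 12, 45))
second = target_time_period(gen, datetime(2021, 5, 10, 12, 45), timedelta(minutes=1))
```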
It should be noted that, to reduce the amount of stored online data, in an embodiment of the present disclosure online data is aggregated within preset time windows before being stored in an online database. That is, the online data in the same time window is aggregated and stored in the data set corresponding to the time period of that window, and the online data table of the online database stores the data sets of the successive time windows in order. For example, the online data generated in the first, second, and third minutes is aggregated in windows tw1, tw2, and tw3 respectively, and the online data table of the online database then stores the data sets of tw1, tw2, and tw3 in time order.
As another example, the online data generated in one hour may be aggregated into the data set of the online database corresponding to that hour; the specific length of the time window may be set according to actual needs and is not limited herein. In the embodiment of the present disclosure, after the generation time of the offline database is determined, the time windows recorded in the online data table after that generation time are obtained. Since the offline data is the data before the generation time of the offline database, the online data to be retrieved lies after that time; each time window of the online data after the generation time of the offline data can therefore be determined as a target time period of the online data to be retrieved, which makes it convenient to retrieve, in each target time period, the online data corresponding to the data to be queried.
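A sketch of the window aggregation just described, assuming fixed one-minute tumbling windows and summation as the merge rule (the disclosure leaves both the rule and the window length open). Each window is keyed by its start time, playing the role of tw1, tw2, and so on; `window_start` and `aggregate_online` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

EPOCH = datetime(1970, 1, 1)


def window_start(ts: datetime, length: timedelta) -> datetime:
    """Align a timestamp to the start of its tumbling window."""
    return ts - (ts - EPOCH) % length


def aggregate_online(
    events: Iterable[Tuple[str, datetime, int]],
    length: timedelta = timedelta(minutes=1),
) -> Dict[datetime, Dict[str, int]]:
    """Sum (data_id, timestamp, value) events per window, in time order."""
    table: Dict[datetime, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for data_id, ts, value in events:
        table[window_start(ts, length)][data_id] += value
    # Online data table: one data set per window, stored in time order.
    return {w: dict(table[w]) for w in sorted(table)}


events = [
    ("clicks", datetime(2021, 5, 10, 12, 0, 5), 1),
    ("clicks", datetime(2021, 5, 10, 12, 0, 40), 2),
    ("clicks", datetime(2021, 5, 10, 12, 1, 3), 4),
]
online_table = aggregate_online(events)
```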
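Choosing the target windows then reduces to filtering the online data table's window keys by the generation time. This sketch assumes the online data table is a dict mapping each window's start time to its aggregated data set; `target_windows` is a hypothetical name. As a simplification, a window that starts before the generation time but spans it is excluded.

```python
from datetime import datetime
from typing import Dict, List


def target_windows(online_table: Dict[datetime, dict], generation_time: datetime) -> List[datetime]:
    """Window start times at or after the offline generation time, in time order."""
    return sorted(w for w in online_table if w >= generation_time)


table = {
    datetime(2021, 5, 10, 11, 59): {"clicks": 9},
    datetime(2021, 5, 10, 12, 0): {"clicks": 3},
    datetime(2021, 5, 10, 12, 1): {"clicks": 4},
}
windows = target_windows(table, datetime(2021, 5, 10, 12, 0))
```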
In step S104, the target data corresponding to the data identifier is retrieved from the offline database and from the online data in the target time period.
In the embodiment of the present disclosure, according to the data identifier of the data to be queried, the offline data corresponding to the data identifier is retrieved from the offline database, and the online data corresponding to the data identifier is retrieved within the target time period of the online database. The retrieved offline data and online data are together used as the target data of the data to be queried. The offline data is the historical data before the current generation time of the offline database, and the online data is the more timely data after that time; together they fully cover the target data at different times, which makes it convenient to subsequently return a more accurate query result based on the target data.
Optionally, in an embodiment of the present disclosure, the online data corresponding to the data identifier may be retrieved by combining the data identifier with each target time period: each target time period is searched separately for online data corresponding to the data identifier, the retrieval result of each target time period is recorded separately, and finally the results of all target time periods are integrated to obtain the final target online data. This prevents different target time periods from affecting one another's retrieved data and improves the accuracy of the acquired target data.
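Retrieval of the target data, with each target window searched and recorded separately before the results are integrated, could look like the sketch below. The data layout (offline table as a dict, online table keyed by window start time) and summation as the integration rule are assumptions made for illustration; `retrieve_target_data` is a hypothetical name.

```python
from datetime import datetime
from typing import Dict, Optional


def retrieve_target_data(
    data_id: str,
    offline_table: Dict[str, int],
    online_table: Dict[datetime, Dict[str, int]],
    generation_time: datetime,
) -> Dict[str, object]:
    """Look up `data_id` offline, then in each online window after `generation_time`."""
    offline_value: Optional[int] = offline_table.get(data_id)
    # One retrieval result per target window, recorded separately.
    per_window = {
        w: data_set[data_id]
        for w, data_set in sorted(online_table.items())
        if w >= generation_time and data_id in data_set
    }
    # Integrate the per-window results with the offline value.
    total = (offline_value or 0) + sum(per_window.values())
    return {"offline": offline_value, "online_by_window": per_window, "total": total}


result = retrieve_target_data(
    "clicks",
    {"clicks": 100},
    {
        datetime(2021, 5, 10, 11, 59): {"clicks": 9},  # before generation time: ignored
        datetime(2021, 5, 10, 12, 0): {"clicks": 3},
        datetime(2021, 5, 10, 12, 1): {"clicks": 4},
    },
    datetime(2021, 5, 10, 12, 0),
)
```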
With the data processing method provided by the embodiments of the present disclosure, when a data query request is received, the latest time at which the offline data was updated is determined from the predetermined current generation time of the offline database; the target time period of the online data to be retrieved is then determined according to that time, and the target data corresponding to the data identifier is retrieved from the offline database and from the online data in the target time period. Because the current generation time marks when the offline data was last updated, the currently synchronized offline data and the real-time data in the period after that moment can be queried together. It also makes it convenient to synchronize only the offline data that changed during an update, so offline data can be queried from a single full offline data table. The data query task is therefore accomplished by storing only one full offline data table in the offline database, which saves storage space and reduces the computing resources and time required for offline data synchronization.
Based on the foregoing embodiments, in a possible implementation form of the present disclosure, only the changed portion of the offline data in each period may be stored, which further reduces the storage space occupied by the offline data and the complexity of synchronizing it.
Fig. 2 is a flow chart illustrating a specific data processing method according to an exemplary embodiment. As shown in fig. 2, the data processing method includes the following steps:
In step S201, when the time interval between the current corresponding generation time of the offline database and the current time is greater than a threshold, the data to be stored from the current corresponding generation time to the current time is acquired.
The threshold may be set according to the production cycle of the offline data and is greater than or equal to that cycle; for example, if the production cycle of the offline data is 1 hour, the threshold may be set to 1.1 hours. When the time interval between the current corresponding generation time of the offline database and the current time is greater than the threshold, an update period of the offline database has been reached, and the offline database needs to be updated with the offline data newly generated in that period, that is, the newly generated offline data needs to be stored in the offline database.
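A minimal sketch of the update trigger, assuming timestamps are Python `datetime` values and using the 1-hour production cycle and 1.1-hour threshold from the example; the function name `update_due` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta

PRODUCTION_CYCLE = timedelta(hours=1)
# Example threshold from the text: at least one production cycle (here 1.1 h = 66 min).
THRESHOLD = timedelta(hours=1.1)


def update_due(generation_time: datetime, now: datetime,
               threshold: timedelta = THRESHOLD) -> bool:
    """True when the interval since the last generation time exceeds the threshold."""
    return now - generation_time > threshold


last = datetime(2024, 1, 1, 10, 0)
print(update_due(last, datetime(2024, 1, 1, 11, 0)))   # False: only 60 minutes
print(update_due(last, datetime(2024, 1, 1, 11, 10)))  # True: 70 minutes > 66
```

Setting the threshold slightly above the cycle gives a late-running producer some slack before an update is triggered.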
In the embodiment of the present disclosure, all offline data generated from the current corresponding generation time of the offline database to the current time is acquired as the data to be stored. It should be noted that the full amount of offline data is produced in each period, so the data to be stored is the full offline data set produced between the current corresponding generation time of the offline database and the current time.
In step S202, difference data between the data to be stored and the data stored in the offline database is determined.
The data stored in the offline database is the offline data produced in the previous period, which was stored at the current corresponding generation time of the offline database. For periodically generated full offline data, the full data sets of two consecutive periods usually differ in only part of the data while the rest remains the same; the changed data is the difference data in the present disclosure.
In the embodiment of the present disclosure, the data to be stored is compared with the data stored in the offline database, the data that has changed relative to the stored data is identified, and this changed data is taken as the difference data between the data to be stored and the data stored in the offline database.
For example, suppose the offline database stores ten pieces of data with data identifiers 1 to 10, and the data to be stored contains the same ten pieces of data, but only the values of the data with identifiers 9 and 10 have changed. The server compares the data to be stored with the data stored in the offline database, determines that the values of the data with identifiers 9 and 10 have changed, and therefore determines that the difference data consists of the data with identifiers 9 and 10.
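The comparison in step S202 can be sketched as a dictionary difference. The sketch assumes each full snapshot maps data identifiers to values and, like the text, only considers new or changed records (deleted records are not handled); `diff_records` and the sample values are hypothetical.

```python
from typing import Dict, Hashable


def diff_records(to_store: Dict[Hashable, float],
                 stored: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Return the records of the new snapshot that are absent from, or differ in, `stored`."""
    return {key: value for key, value in to_store.items()
            if key not in stored or stored[key] != value}


# The example from the text: identifiers 1..10, only 9 and 10 change.
stored = {i: float(i) for i in range(1, 11)}
to_store = dict(stored)
to_store[9], to_store[10] = 90.0, 100.0
print(diff_records(to_store, stored))  # {9: 90.0, 10: 100.0}
```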
In step S203, the difference data is stored in the offline database, and the current time is added to the update time list.
In the embodiment of the present disclosure, the difference data is stored in the offline data table of the offline database. After this, the offline data table contains both the historical data stored before the difference data and the data newly changed in the current period, so the table holds the full offline data generated in every period up to the current time. Because only the changed difference data is stored, in the above example only the two pieces of data with identifiers 9 and 10 are written. Compared with storing the full offline data generated in the current period, this reduces the amount of synchronized data and saves storage resources and storage space.
Further, because the difference data generated in the latest update period has been synchronized to the offline database at the current time, the server updates the current corresponding generation time of the offline database to the current time.
In an embodiment of the present disclosure, the current corresponding generation time of the offline database may be updated to the current time in the manner described above, so that when the data query service is executed subsequently, the offline data and the online data to be retrieved are determined according to this generation time.
It should be noted that, in practical applications, the current time may be the initial time at which offline data is stored, that is, the offline database does not yet store any historical offline data. Therefore, in an embodiment of the present disclosure, when the current time is determined to be the initial time, the data to be stored corresponding to the current time is stored in the offline database as difference data, and the generation time of the offline data in the offline database is set to the current time. This extends the method to the case of an empty offline database and improves the practicability and applicability of the data query method of the embodiment of the present disclosure.
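Steps S201 to S203, including the initial-time case, can be combined into one sketch. It assumes an in-memory table keyed by data identifier and an update time list whose latest entry is the current corresponding generation time; `OfflineDB` and `sync_offline` are hypothetical names, and the full snapshot is passed in as an argument rather than fetched from a producer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class OfflineDB:
    table: Dict[str, float] = field(default_factory=dict)       # full offline data table
    update_times: List[datetime] = field(default_factory=list)  # update time list

    def generation_time(self) -> Optional[datetime]:
        # The latest update time is the current corresponding generation time.
        return max(self.update_times) if self.update_times else None


def sync_offline(db: OfflineDB, snapshot: Dict[str, float],
                 now: datetime, threshold: timedelta) -> Dict[str, float]:
    """Store only changed records; return what was written (empty if not due)."""
    gen = db.generation_time()
    if gen is None:
        diff = dict(snapshot)  # initial time: the whole snapshot is difference data
    elif now - gen > threshold:
        # S201/S202: the snapshot is the data to be stored; keep new or changed records.
        diff = {k: v for k, v in snapshot.items()
                if k not in db.table or db.table[k] != v}
    else:
        return {}  # update period not reached yet
    db.table.update(diff)        # S203: store the difference data
    db.update_times.append(now)  # and add the current time to the update time list
    return diff
```

A first call on an empty `OfflineDB` writes the whole snapshot; a later call more than `threshold` after it writes only the records whose values changed and moves the generation time forward.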
According to this data query method, when the time interval between the current corresponding generation time of the offline database and the current time is greater than the threshold, the data to be stored from the generation time to the current time is obtained, the difference data between it and the data stored in the offline database is determined, the difference data is stored in the offline database, and the current corresponding generation time of the offline database is set to the current time. Only the changed data is stored at each offline synchronization, which avoids storing duplicate data and further saves storage space. Because the full offline data set does not need to be stored at each synchronization, the amount of data written per synchronization is smaller, which reduces the computing resources and waiting time consumed by offline synchronization and avoids degrading the storage performance of the offline database.
Based on the above embodiments, to describe more clearly how online data is stored in the online database by time period, the embodiments of the present disclosure further provide another specific data processing method.
Fig. 3 is a flow chart illustrating another specific data processing method according to an example embodiment. As shown in fig. 3, the data query method includes the following steps:
In step S301, online data of the current time period is obtained, where each piece of online data includes its corresponding data identifier.
It should be noted that, in the data query method of the embodiment of the present disclosure, online data is read in real time as soon as it is generated, and each piece of online data that is read includes its corresponding data identifier.
In an embodiment of the present disclosure, when reading the online data, the server may read the record in which the online data was generated, so as to obtain the data identifier of each piece of online data contained in the record.
In step S302, each online data of the current time period is stored into the data set corresponding to the current time period according to the corresponding data identifier.
In the embodiment of the present disclosure, the online data table storing online data in the online database is divided into different data sets in advance according to time periods, each time period corresponds to one data set, for example, when the time period is divided according to hours, a first hour corresponds to a first data set in the online data table, a second hour corresponds to a second data set in the online data table, and so on.
It should be noted that, as described above, a single piece of online data cannot be stored directly and then used to provide a query service. Instead, the online data generated at the current time is merged, according to a corresponding rule, with the historical online data within a time period of a preset length; this time period of preset length is the aggregation time window. In a specific implementation, the online data of the current time period is stored into the data set corresponding to that period in order, according to the data identifiers. Since the data identifier obtained in step S301 includes the generation time of each piece of online data, the generation times are read from the identifiers before storage and compared to determine the storage order. For example, if the first piece of data is found to have been generated earlier than the second, the first piece is stored into the data set of the current time period before the second. Each piece of online data is stored together with its data identifier, so that data can later be retrieved from the data set by identifier.
Optionally, in an embodiment of the present disclosure, when a piece of online data is stored, the online data already stored in the data set may be treated as historical data, and the piece of data to be stored may be merged with that historical data according to a preset merge rule. For example, if the data set for the first minute already stores the data for a first moment within that minute, this data is treated as historical data and is merged with the online data for a second moment within the first minute that is about to be stored. The merge rule may be set to different binary operations according to the needs of the data query service, for example summing, accumulating, or overlaying the incoming data and the historical data, or taking their maximum or minimum. The merged data is then stored as result data in the data set of the current time period. Once stored, the result data becomes the new historical data; if a next piece of online data exists, merging continues in the same way until all online data of the current time period has been stored in the corresponding data set. This merging process keeps the online data of the current time period consistent.
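The merge-on-store behaviour can be sketched as a fold over one period's records. The record layout `(data_id, generation_ts, value)` and the name `store_window` are assumptions for illustration; the merge rule is any binary operation, such as `operator.add` for summing or the built-in `max`.

```python
import operator
from typing import Callable, Dict, Iterable, Tuple

# A merge rule is a binary operation: (historical value, incoming value) -> result.
MergeRule = Callable[[float, float], float]


def store_window(records: Iterable[Tuple[str, int, float]],
                 rule: MergeRule) -> Dict[str, float]:
    """Fold one time period's online records into that period's data set.

    Each record is (data_id, generation_ts, value). Records are applied in
    generation-time order, and each stored result becomes the history for the next.
    """
    data_set: Dict[str, float] = {}
    for data_id, _ts, value in sorted(records, key=lambda r: r[1]):
        if data_id in data_set:
            data_set[data_id] = rule(data_set[data_id], value)  # merge with history
        else:
            data_set[data_id] = value  # first record of this identifier
    return data_set


records = [("A", 2, 4.0), ("A", 1, 3.0), ("B", 3, 1.0)]
print(store_window(records, operator.add))  # {'A': 7.0, 'B': 1.0}
print(store_window(records, max))           # {'A': 4.0, 'B': 1.0}
```

Sorting by generation time only changes the result for order-sensitive rules, such as keeping the latest value (`lambda old, new: new`).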
Optionally, in an embodiment of the present disclosure, while the current online data is being merged with the historical data, the generated result data may be archived within the merge job, that is, transferred to a storage device configured separately for the merge job. This ensures that data already generated is not lost if the merge job fails, and improves the reliability of data query.
In this data query method, online data of the current time period is first obtained, each piece carrying its data identifier, and each piece is then stored, according to its identifier, into the data set corresponding to the current time period. Storing online data of different time periods in per-period data sets makes it easy to fetch all data of a period during a later query, which improves the efficiency and completeness of retrieving target online data, shortens query waiting time, and improves query accuracy. Merging the online data of the current time period before storage also keeps the data consistent and reduces the volume of stored data.
Based on the above embodiments, in one possible implementation form of the present disclosure, after the target data corresponding to the data identifier is obtained, it may be further processed according to the specific data query service, so that the returned query result is adapted to that service, which makes the query service more targeted and improves user satisfaction.
Fig. 4 is a flowchart illustrating yet another specific data processing method according to an exemplary embodiment. As shown in fig. 4, the data processing method includes the following steps:
In step S401, a target processing mode corresponding to the target data is determined according to the target service identifier.
The target service identifier indicates the specific service content of the data query. For example, when a user needs to query the maximum value of some data within a certain time period, the service identifier may be a combination of the data identifier of the data to be queried, the time period, and the maximum-value operation.
In the embodiment of the present disclosure, the target service identifier may be set by the user and included in the query request sent to the server, or the server may identify the target service identifier from the content of the request after receiving it; the specific manner of obtaining the target service identifier may be set according to actual needs and is not limited herein. After acquiring the data query request, the server first extracts the data identifier and the target service identifier, and after obtaining the target data corresponding to the data identifier through the steps of the above embodiments, determines the processing mode of the target data according to the target service identifier.
The processing mode of the target data may combine the retrieved offline data and online data according to different rules, for example summing them or computing their maximum or minimum value.
In the embodiment of the present disclosure, the processing mode of the target data may be determined by a management system of the server. After the target service identifier is obtained, it is sent to the management system, which determines the specific service content of the data query from the identifier, analyzes how that service can be realized, and sets the target processing mode of the target data. For example, when the target service identifier is the combination of the data identifier, the time period, and the maximum value in the above example, the management system determines that the target processing mode is to compute the maximum value over the online data and the offline data.
In a specific implementation, the management system of the server may determine the processing mode of the target data in different manners. As one possible implementation, the management system may generate a mapping relationship between service identifiers and processing modes in advance by applying machine learning to historical data, and then query this preset mapping with the obtained target service identifier to determine the corresponding target processing mode.
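Querying a preset mapping and applying the resulting mode can be sketched as follows. The service identifiers, the mode names, and the mapping contents are hypothetical placeholders; in the disclosure the mapping may be produced by machine learning, whereas here it is simply a fixed dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical processing modes combining retrieved offline and online values.
PROCESSING_MODES: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "max": max,
    "min": min,
}

# Hypothetical preset mapping: service identifier -> processing mode.
SERVICE_TO_MODE: Dict[str, str] = {
    "A:max_up_to_now": "max",
    "user:walking_mileage": "sum",
}


def process_target_data(service_id: str, offline: List[float],
                        online: List[float]) -> float:
    mode = SERVICE_TO_MODE[service_id]  # look up the target processing mode
    # Merge offline and online values, then apply the mode to the merged list.
    return PROCESSING_MODES[mode](offline + online)


print(process_target_data("A:max_up_to_now", [5.0], [3.0, 9.0]))  # 9.0
```

An unknown service identifier raises `KeyError` here; a real system would need a fallback policy, which the text does not specify.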
The process of machine learning on the historical data is as follows:
collecting a large amount of training data, for example by acquiring data recorded in the server, where the training data is historical records of the processing modes used for different data query requests, and different data query requests contain different service identifiers;
after the data is collected, performing operations such as deduplication, standardization, and error correction on it to improve the accuracy of the training data;
analyzing the training data, for example by visualization, to examine the correlation between service identifiers and processing modes and to determine correlation coefficients;
performing feature selection and vectorization on the data, so as to extract features, strengthen their representational capability through vectorization, and keep the model from becoming overly complex;
selecting a suitable algorithm according to actual requirements, for example choosing a target algorithm among linear regression, decision trees, random forests, and the like according to the required accuracy of the training result, and training the model. The trained model reflects the mapping relationship between service identifiers and processing modes; once a target service identifier is obtained, it is input into the model, which outputs the corresponding target processing mode.
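The models named above are not reproduced here. As a much simpler, explicitly swapped-in stand-in, the sketch below "learns" the mapping by taking the most frequent processing mode recorded for each service identifier in historical data; the record format, the identifiers, and the name `learn_mapping` are assumptions.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def learn_mapping(history: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each service identifier to its most frequent historical processing mode.

    A frequency baseline, not linear regression, a decision tree or a random forest.
    """
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for service_id, mode in history:
        counts[service_id][mode] += 1
    # most_common(1) returns [(mode, count)] for the top mode.
    return {sid: c.most_common(1)[0][0] for sid, c in counts.items()}


history = [("A:max_up_to_now", "max"), ("A:max_up_to_now", "max"),
           ("A:max_up_to_now", "sum"), ("user:walking_mileage", "sum")]
print(learn_mapping(history))
# {'A:max_up_to_now': 'max', 'user:walking_mileage': 'sum'}
```

Such a baseline can only reproduce modes already seen for an identifier, which is one reason the text turns to trained models that generalize from features.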
As another possible implementation, the management system of the server may determine the type of the target data according to the target service identifier and determine the target processing mode according to that type, using the general processing mode for target data of that type. For example, when the target service identifier is "obtain the walking mileage of the user", the target data is determined to be the user's walking distance in each time period, and the total walking mileage is obtained by summing these distances.
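The type-based alternative can be sketched as two lookups: service identifier to data type, then data type to its general processing mode. All identifiers, type names, and per-period distances below are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical: which kind of data each service works on.
SERVICE_TO_TYPE: Dict[str, str] = {"user:walking_mileage": "per_period_increment"}

# Hypothetical general processing mode for each data type.
TYPE_TO_MODE: Dict[str, Callable[[List[float]], float]] = {
    "per_period_increment": sum,  # increments add up to a total
    "level_reading": max,         # readings are compared, not added
}


def process_by_type(service_id: str, values: List[float]) -> float:
    data_type = SERVICE_TO_TYPE[service_id]
    return TYPE_TO_MODE[data_type](values)


# Walking distance (km) in each time period -> total walking mileage.
print(process_by_type("user:walking_mileage", [1.5, 0.5, 2.0]))  # 4.0
```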
In step S402, the target data is processed according to the target processing mode.
In the embodiment of the disclosure, according to the determined processing mode corresponding to the target data, the target data is correspondingly processed, and the processed target data is returned to the client, so that the client obtains the query result adapted to the data query service.
According to this data query method, the target processing mode of the target data is determined from the target service identifier, and the target data is processed in that mode and returned. Processing the acquired target data according to the specific data query service returns a query result adapted to that service, which makes the query service more targeted and improves user satisfaction.
In order to more clearly describe the implementation process of the data processing method described in the foregoing embodiment, a detailed description is given below with reference to a specific embodiment.
The server executing the data processing method of the embodiment of the present disclosure synchronizes both the produced offline data and the produced online data. For the offline data, when the server determines that the time interval between the current corresponding generation time recorded by the offline database and the current time is greater than the threshold, it obtains, as the data to be stored, the full offline data produced in period T+1 between the current corresponding generation time and the current time. It compares this data with the offline data of period T stored in the offline database at time t0, determines the difference data between the data to be stored and the stored data, stores the difference data in the offline database, adds the time t1 at which the difference data was synchronized to the update time list, and updates the current corresponding generation time of the offline database from t0 to t1. If no data has yet been stored in the offline database, the current data to be stored is stored in the offline database in full and t1 is added to the update time list. For the online data, the online data within each preset time window is aggregated and then stored in the online database, that is, online data in the same time window is aggregated into the data set corresponding to that time period, and the online data table stores the data sets of successive time windows in sequence. For example, the online data generated in each minute is aggregated into the data set for that minute, and the aggregated data of the windows tw1, tw2, tw3, and so on is stored in sequence in the online data table of the online database.
After receiving a data query request from a client, the server obtains from the request the data identifier "A" and the target service identifier "obtain the maximum value of A up to the current time". It queries the update time list of the offline database, determines from the latest update time that the current corresponding generation time of the offline database is t1, and then determines the target time periods tw1, tw2, tw3, and so on of the online data after t1. It retrieves the offline data corresponding to "A" from the full offline data stored in the offline database, retrieves the online data in each time window in turn by combining the data identifier with each window ("A+tw1", "A+tw2", "A+tw3", and so on), and splices the recorded per-window results to obtain the online data in the target time period.
After the target data is obtained, the management system of the server queries the preset mapping relationship with the target service identifier "obtain the maximum value of A up to the current time" and determines that the target processing mode of the target data of "A" is taking the maximum value. The server then merges the retrieved online data and offline data of "A", computes the maximum value, and returns the result to the client.
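The worked example can be traced end to end with the hypothetical in-memory state below. The timestamps, the values, and the window layout are invented for illustration; windows that start before t1 are assumed to be already reflected in the offline data and are therefore skipped.

```python
from datetime import datetime, timedelta

t1 = datetime(2024, 1, 1, 12, 0)
update_times = [t1 - timedelta(hours=1), t1]  # latest entry: generation time t1
offline_table = {"A": 40.0, "B": 7.0}         # full offline data table
window_starts = {                             # start time of each online window
    "tw0": t1 - timedelta(minutes=1),
    "tw1": t1,
    "tw2": t1 + timedelta(minutes=1),
    "tw3": t1 + timedelta(minutes=2),
}
online_table = {("A", "tw0"): 99.0, ("A", "tw1"): 35.0,
                ("A", "tw2"): 52.0, ("B", "tw3"): 3.0}


def query_max_up_to_now(data_id: str) -> float:
    generation_time = max(update_times)
    # Target time periods: windows starting at or after the generation time.
    windows = [w for w, start in sorted(window_starts.items(), key=lambda kv: kv[1])
               if start >= generation_time]
    # Retrieve "A+tw1", "A+tw2", ... one window at a time and splice the hits.
    online_hits = [online_table[(data_id, w)] for w in windows
                   if (data_id, w) in online_table]
    offline_hit = [offline_table[data_id]] if data_id in offline_table else []
    # Target processing mode "max": merge offline and online values, take the maximum.
    return max(offline_hit + online_hits)


print(query_max_up_to_now("A"))  # 52.0 (tw0's 99.0 predates t1 and is skipped)
```

If an identifier has neither offline nor online data, `max` raises `ValueError`; how such a query should be answered is not specified in the text.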
FIG. 5 is a block diagram illustrating a data processing apparatus according to an example embodiment. Referring to fig. 5, the apparatus 100 includes a first obtaining module 110, a second obtaining module 120, a first determining module 130, and a third obtaining module 140.
The first obtaining module 110 is configured to obtain a data query request, where the query request includes a data identifier.
The data query request is a request for querying specific information of data sent by a client to a server, and the data query request includes a data identifier, where the data identifier is identity information of data to be queried, such as a name of the data to be queried.
In an embodiment of the present disclosure, a client may connect to the server through a server interface provided by the server, send a data query request to the server, and obtain the data query result that the server returns for the request. The client sets the data identifier of the data to be queried in the data query request; after receiving the request, the server extracts the data identifier and uses it as the query condition to retrieve the corresponding data from the stored data.
The second obtaining module 120 is configured to obtain a current generation time corresponding to the offline database.
For periodically generated offline data, the offline data of each period needs to be stored in the offline database once that period's data is generated. The generation time corresponding to the offline database is the time at which the offline data generated in a period is synchronously stored in the offline database; the offline database records the time at which the last piece of offline data of the current period is stored. For example, if 10 pieces of data are generated in the current period, the time at which the 10th piece is stored in the offline database is the current corresponding generation time. Each time a period's offline data has been synchronized, the current corresponding generation time of the offline database is updated accordingly, so the generation time of the latest offline data in the offline database can be determined from it.
Therefore, in the data query method provided by the embodiment of the present disclosure, recording the current corresponding generation time of the offline database allows a single full offline data table to be kept in the offline database. When offline data is stored, the offline database records the corresponding generation time; when a query is performed, the generation time of the latest offline data is determined from this record, and the online data to be acquired is the real-time data produced after that time. The latest full offline data and the more timely online data can thus both be obtained, which meets the requirements of the data query while saving storage space in the offline database.
Optionally, in an embodiment of the present disclosure, the current corresponding generation time of the offline database may be obtained by querying the update time list of the offline database. The update time list records each generation time of the offline database and is updated whenever the current corresponding generation time changes, so the latest update time in the list is the current corresponding generation time. In a specific implementation, the update time list may be maintained from the metadata of the offline database, that is, the data describing attributes of the offline data, which can represent the storage time of the offline data. After the offline database stores offline data, the time of the currently stored data is synchronized to the metadata by modifying, through a modification instruction, the field that indicates the data modification time; the update times in the list are then set by reading the latest time information represented by the metadata.
The first determining module 130 is configured to determine the target time period of the online data to be retrieved according to the current corresponding generation time of the offline database.
The target time period is a time range containing online data of the same or similar timeliness; one target time period may contain one or more pieces of online data, and the target time period may be determined according to practical factors such as the accuracy requirements of the data query.
As a first possible implementation, the first determining module 130 determines the time period from the current corresponding generation time of the offline database to the current time as the target time period. This period contains the online data generated after the generation time of the offline database; retrieving the online data within this single period is simple and saves computing resources.
As a second possible implementation, the first determining module 130 determines the time periods of a preset length after the current corresponding generation time of the offline database as target time periods. Here, a time period of the preset length is the time window used to aggregate online data when the online data was stored. It should be noted that, owing to the characteristics of online data in the data processing field, a single piece of online data cannot be stored directly and then used to provide a query service; instead, the online data generated at the current time is merged with the historical online data within a time period of the preset length according to a corresponding rule, that is, the online data in that period is aggregated before being stored, and the period is the aggregation time window. For example, 30 pieces of online data generated in the first minute are summed and stored in the window tw1 corresponding to the first minute.
As another example, the online data generated within one hour may be aggregated into the data set of the online database corresponding to that hour; the specific length of the time period may be set according to actual needs and is not limited herein.
In the embodiment of the present disclosure, after determining the current corresponding generation time of the offline database, the first determining module 130 obtains each time window of online data recorded in the online data table after that generation time. Because the offline data covers the data before the current corresponding generation time of the offline database, the online data to be retrieved lies after that generation time; each time window of online data after the generation time can therefore be determined as a target time period, which makes it convenient to retrieve the online data corresponding to the queried data in each target time period.
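Selecting the target time periods in the second implementation amounts to filtering the aggregation windows by start time. The sketch assumes each window is described by a name and a start time and that windows are aligned so that none straddles the generation time; `target_windows` and the one-minute layout are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def target_windows(windows: List[Tuple[str, datetime]],
                   generation_time: datetime) -> List[str]:
    """Names of the windows starting at or after generation_time, in time order."""
    ordered = sorted(windows, key=lambda w: w[1])
    return [name for name, start in ordered if start >= generation_time]


gen = datetime(2024, 1, 1, 12, 0)
# One-minute windows tw0..tw3; tw0 starts one minute before the generation time.
windows = [(f"tw{i}", gen + timedelta(minutes=i - 1)) for i in range(4)]
print(target_windows(windows, gen))  # ['tw1', 'tw2', 'tw3']
```

Sorting by start time rather than by name keeps the order correct once window names such as `tw10` appear.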
The second obtaining module 140 is configured to retrieve and obtain the target data corresponding to the data identifier from the offline database and from the online data in the target time period.
In the embodiment of the present disclosure, according to the data identifier of the data to be queried, the offline data corresponding to the data identifier is retrieved from the offline database, and the online data corresponding to the data identifier is retrieved within the target time period of the online database. The retrieved offline data and online data corresponding to the data identifier are then used together as the target data of the data to be queried, from which a query result can subsequently be returned.
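A minimal sketch of this combined retrieval, assuming plain Python dicts stand in for the two stores: the offline database as `{data_id: value}` and the online data as `{window_start: {data_id: value}}`. The function name `retrieve_target_data` and the shape of the returned dict are hypothetical choices for illustration.

```python
def retrieve_target_data(data_id, offline_db, online_windows, target_periods):
    """Collect the offline and online data for data_id.

    offline_db:     {data_id: value}, the snapshot as of the generation time.
    online_windows: {window_start: {data_id: value}}, the aggregated online data.
    target_periods: window starts to search (the target time periods).
    """
    # Offline part: a single lookup in the full offline table (None if absent).
    offline_part = offline_db.get(data_id)
    # Online part: (window, value) pairs for every target window holding data_id.
    online_part = [
        (w, online_windows[w][data_id])
        for w in target_periods
        if data_id in online_windows.get(w, {})
    ]
    return {"offline": offline_part, "online": online_part}
```

Returning both parts side by side, rather than merging them immediately, leaves the choice of how to combine them to later processing steps.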
Optionally, in an embodiment of the present disclosure, the online data corresponding to a data identifier may be retrieved by combining the data identifier with each target time period. That is, each target time period is searched separately for online data corresponding to the data identifier, the retrieval result for each target time period is recorded separately, and the retrieval results of all target time periods are finally integrated to obtain the final target online data. This prevents the different target time periods from interfering with one another's retrieved data and improves the accuracy of the obtained target data.
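The per-period search and final integration can be sketched as two steps. The sketch assumes the same dict-based stores as above and, purely for illustration, that the values are additive counters so that integration is a sum; the source does not specify the integration rule, and `search_per_window` and `integrate` are hypothetical names.

```python
from typing import Dict, Optional


def search_per_window(data_id, online_windows, target_periods) -> Dict[int, Optional[int]]:
    """Search each target window separately and record each result on its own.

    None marks a window that holds no data for data_id.
    """
    return {w: online_windows.get(w, {}).get(data_id) for w in target_periods}


def integrate(offline_value, per_window: Dict[int, Optional[int]]) -> int:
    """Combine the offline value with the per-window results.

    Assumption: values are additive counters, so integration is a sum.
    """
    total = offline_value or 0
    for value in per_window.values():
        if value is not None:
            total += value
    return total
```

Keeping the per-window record explicit means a window with no data appears as `None` instead of silently disappearing, which makes the integration step easy to audit.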
In one embodiment of the present disclosure, the second acquisition module is further configured to query the update time list corresponding to the offline database and determine the current generation time according to the latest update time in that list, and the apparatus further includes:
a fourth acquisition module configured to acquire the data to be stored from the current generation time to the current time when the time interval between the current generation time of the offline database and the current time is greater than a threshold;
a second determining module configured to determine the difference data between the data to be stored and the data already stored in the offline database; and
a first storage module configured to store the difference data into the offline database and add the current time to the update time list.
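The first two steps above, reading the current generation time from the update time list and checking the threshold, can be sketched as follows. The threshold value of one day and the function names are hypothetical; treating an empty update list as "update needed" is an additional assumption that matches the embodiment in which nothing has yet been stored in the offline database.

```python
from typing import List, Optional

# Hypothetical threshold: one day, in seconds.
THRESHOLD_SECONDS = 24 * 3600


def current_generation_time(update_times: List[int]) -> Optional[int]:
    """Latest entry in the update time list, or None if never updated."""
    return max(update_times) if update_times else None


def needs_update(update_times: List[int], now: int, threshold: int = THRESHOLD_SECONDS) -> bool:
    """True when the gap between the generation time and now exceeds the threshold.

    Assumption: an empty update time list (no offline data yet) also
    triggers an update.
    """
    gen = current_generation_time(update_times)
    return gen is None or now - gen > threshold
```

Using the maximum instead of the last list element keeps the result correct even if update times were appended out of order.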
The difference data is the data that differs between the data to be stored and the data stored in the offline database. It should be noted that, for the full offline data generated periodically, usually only part of the data changes between two consecutive periods while the rest stays the same; the changed part is the difference data in the present disclosure.
In the embodiment of the present disclosure, the data to be stored is compared with the data stored in the offline database, and the data that has changed between them is taken as the difference data.
In one embodiment of the present disclosure, the first storage module is further configured to, when no data is stored in the offline database, store the data to be stored into the offline database and add the current time to the update time list.
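The difference computation and the two storage cases can be sketched with dicts keyed by data identifier. This is a hypothetical illustration: it assumes each record is a key/value pair, counts a record as changed when its key is new or its value differs, and does not handle records that disappear from the new full snapshot, since the text only speaks of changed data. `difference_data` and `store_offline` are not names from the source.

```python
def difference_data(to_store: dict, stored: dict) -> dict:
    """Records in to_store whose key is new or whose value has changed."""
    return {k: v for k, v in to_store.items() if k not in stored or stored[k] != v}


def store_offline(to_store: dict, offline_db: dict, update_times: list, now: int) -> dict:
    """Write to the offline database and record the update time.

    Assumption: deletions from the full snapshot are out of scope here.
    """
    if not offline_db:
        # Nothing stored yet: store all of the data to be stored.
        offline_db.update(to_store)
    else:
        # Otherwise store only the difference data.
        offline_db.update(difference_data(to_store, offline_db))
    # In both cases the current time becomes the new generation time.
    update_times.append(now)
    return offline_db
```

The explicit empty-database branch mirrors the embodiment above; functionally, `difference_data` against an empty dict would already return every record.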
In an embodiment of the present disclosure, the query request further includes a target service identifier, and the apparatus further includes:
the third determining module is configured to determine a target processing mode corresponding to the target data according to the target service identifier;
and the processing module is configured to process the target data according to the target processing mode.
In one embodiment of the present disclosure, the apparatus further includes:
a fourth determining module configured to determine the target processing mode corresponding to the target service identifier according to a preset mapping between service identifiers and processing modes, or to determine the type of the target data according to the target service identifier and then determine the target processing mode according to that type.
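The two ways of choosing a processing mode can be sketched as dictionary lookups. Everything concrete here is hypothetical: the service identifiers, the data types, and the two processing modes (sum and max) are placeholders chosen only to show the control flow of "direct mapping first, otherwise map via the data type".

```python
from typing import Callable, Dict, List


# Hypothetical processing modes.
def to_sum(values: List[int]) -> int:
    return sum(values)


def to_max(values: List[int]) -> int:
    return max(values)


# Hypothetical preset mapping: service identifier -> processing mode.
SERVICE_TO_MODE: Dict[str, Callable[[List[int]], int]] = {"billing": to_sum, "monitoring": to_max}
# Hypothetical alternative path: service identifier -> data type -> processing mode.
SERVICE_TO_TYPE: Dict[str, str] = {"report": "counter"}
TYPE_TO_MODE: Dict[str, Callable[[List[int]], int]] = {"counter": to_sum, "gauge": to_max}


def target_processing_mode(service_id: str) -> Callable[[List[int]], int]:
    """Use the direct mapping if present; otherwise go through the data type."""
    if service_id in SERVICE_TO_MODE:
        return SERVICE_TO_MODE[service_id]
    data_type = SERVICE_TO_TYPE[service_id]  # raises KeyError for unknown services
    return TYPE_TO_MODE[data_type]
```

For example, `target_processing_mode("billing")([1, 2, 3])` applies the sum mode to the target data.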
In one embodiment of the present disclosure, the first determining module includes a first determining unit configured to determine, as the target time period, either the time period between the current generation time of the offline database and the current time, or a time period of preset length following the current generation time of the offline database.
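The two options of the first determining unit can be sketched as a single function returning a `(start, end)` pair. The function name, the integer timestamps, and the rule that a supplied `preset_length` selects the second option are assumptions made for illustration.

```python
from typing import Optional, Tuple


def target_period(generation_time: int, now: Optional[int] = None,
                  preset_length: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) of the target time period.

    Option 1: from the offline generation time up to the current time.
    Option 2: a window of preset length starting at the generation time.
    Assumption: passing preset_length selects option 2.
    """
    if preset_length is not None:
        return (generation_time, generation_time + preset_length)
    if now is None:
        raise ValueError("either now or preset_length is required")
    return (generation_time, now)
```

Option 1 covers all online data not yet reflected offline, while option 2 bounds the search to one aggregation window.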
In practical use, the data processing apparatus provided by the embodiment of the present disclosure may be configured in a server to execute the foregoing data processing method. The specific manner in which each module of the apparatus in the above embodiment performs its operation has been described in detail in the method embodiment and will not be elaborated here.
When a data query request is received, the data processing apparatus according to the embodiments of the present disclosure can determine the latest update time of the offline data from the predetermined current generation time of the offline database, determine the target time period of the online data to be retrieved from that update time, and then retrieve the target data corresponding to the data identifier from the offline database and from the online data in the target time period. Because the update time of the offline data is known from the current generation time of the offline database, a query can combine the currently synchronized offline data with the real-time data in the time period following that update time. Knowing the update time also makes it convenient to synchronize only the offline data that changed during an update into the offline database, so that offline data can be queried from a single full offline data table. As a result, the data query task can be performed with only one full offline data table stored in the offline database, which saves storage space and reduces the computing resources and time required for offline data synchronization.
Fig. 6 is a block diagram illustrating a server 200 for data processing in accordance with an example embodiment.
As shown in fig. 6, the server 200 includes:
a memory 210, a processor 220, and a bus 230 connecting the different components (including the memory 210 and the processor 220). The memory 210 stores a computer program, and when the processor 220 executes the program, the data processing method according to the embodiment of the present disclosure is implemented.
Bus 230 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Server 200 typically includes a variety of electronic device readable media. Such media may be any available media that is accessible by server 200 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 210 may also include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 240 and/or cache memory 250. The server 200 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 260 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 230 by one or more data media interfaces. Memory 210 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 280 having a set (at least one) of program modules 270, including but not limited to an operating system, one or more application programs, other program modules, and program data, each of which or some combination thereof may comprise an implementation of a network environment, may be stored in, for example, the memory 210. The program modules 270 generally perform the functions and/or methodologies of the embodiments described in this disclosure.
The server 200 may also communicate with one or more external devices 290 (e.g., keyboard, pointing device, display 291, etc.), with one or more devices that enable a user to interact with the server 200, and/or with any devices (e.g., network card, modem, etc.) that enable the server 200 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 292. Also, server 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via network adapter 293. As shown, network adapter 293 communicates with the other modules of server 200 via bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the server 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 220 executes various functional applications and data processing by executing programs stored in the memory 210.
It should be noted that, for the implementation process and technical principle of the server of this embodiment, reference is made to the foregoing explanation of the data processing method of the embodiment of the present disclosure, and the details are not repeated here.
The server provided by the embodiment of the present disclosure can execute the data processing method described above. When a data query request is received, the server can determine the latest update time of the offline data from the predetermined current generation time of the offline database, determine the target time period of the online data to be retrieved from that update time, and retrieve the target data corresponding to the data identifier from the offline database and from the online data in the target time period. Because the update time of the offline data is known, a query can combine the currently synchronized offline data with the real-time data in the time period following that update time, and only the offline data that changed during an update needs to be synchronized into the offline database. The data query task can therefore be performed with a single full offline data table stored in the offline database, which saves storage space and reduces the computing resources and time required for offline data synchronization.
In order to implement the above embodiments, the present disclosure also proposes a computer-readable storage medium.
When the instructions in the computer-readable storage medium are executed by a processor of the server, the server is enabled to perform the data processing method described above.
In order to implement the above embodiments, the present disclosure also provides a computer program product which, when executed by a processor of a server, enables the server to perform the data processing method described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.