Government affair data analysis system for sharing platform
1. A government affairs data analysis system for a shared platform is characterized in that,
a first storage unit for storing a plurality of first credit files {W, P1}, wherein W is an event identifier and P1 is first execution data, the first execution data being composed of a plurality of first data blocks;
a first government affair terminal comprising a first input unit, a first identification unit, a first processing unit and a first output unit, wherein the first input unit is used for inputting at least one original file, the first identification unit is used for extracting an event identifier W, identity identifiers D1...Dn and first execution data A from the original file, the first processing unit generates a first credit file and a first abstract {W, D1...Dn, A, X}, and the first output unit stores the first credit file in the first storage unit, wherein X is a storage index of the first data block;
a second storage unit for storing a plurality of second credit files {W, P2}, wherein P2 is second execution data, the second execution data being composed of a plurality of second data blocks;
a plurality of second government affair terminals, each comprising a second execution unit, a second identification unit, a second processing unit and a second output unit, wherein the second execution unit inputs at least one original file, the second identification unit extracts an event identifier W, an identity identifier Dk and second execution data B from the original file, the second processing unit generates a second credit file and a second abstract {W, Dk, B, Y}, and the second output unit stores the second credit file in the second storage unit, wherein Y is a storage index of the second data block;
a service platform comprising a storage unit, a first analysis unit, a second analysis unit and a data management unit, wherein the first analysis unit and the second analysis unit both comprise a data comparison module, the storage unit receives the first abstract and the second abstract, the first analysis unit and the second analysis unit modify at least one second data block according to the first storage unit and the second storage unit respectively, and the data management unit stores the modified second credit file in the second storage unit;
a credit investigation terminal which generates a third abstract {W, Dk, C} according to the first abstract and the second abstract, wherein C is the non-executed data, B is the executed data, and C = A - B;
if the second data block to be stored is consistent with the first data block with the same event identifier, the first analysis unit modifies the second data block to be stored into a storage index of the first data block;
and if the second data block to be stored is consistent with the stored second data block with the same identity, the second analysis unit modifies the second data block to be stored into a storage index of the stored second data block.
2. The system of claim 1, wherein the credit investigation terminal retrieves the second credit file according to the second abstract and retrieves the first credit file according to at least one second data block of the second credit file.
3. The system of claim 1, wherein the data comparison module of the first analysis unit compares the gray scale map of the second data block to be stored with the gray scale map of the first data block in the first storage unit, and determines whether the gray scale maps are consistent according to a difference value of the gray scale maps.
4. The system of claim 1, wherein the data comparison module of the second analysis unit compares the gray scale map of the second data block to be stored with the gray scale map of the second data block in the second storage unit, and determines whether the gray scale maps are consistent according to a difference value of the gray scale maps.
5. The system of claim 3, wherein the gray scale map of the first data block and the gray scale map of the second data block are normalized to a preset size; a coarse screening difference value of the two gray scale maps is calculated according to the gradient information of the gray scale maps; if the coarse screening difference value is within a preset coarse screening threshold range, a fine screening difference value of the two gray scale maps is calculated; and if the fine screening difference value is within a preset fine screening threshold range, the corresponding first data block is determined to be consistent with the second data block.
6. The system of claim 5, wherein the preset size to which the gray scale maps are normalized is 32 × 32 pixels, and the coarse screening difference value and the fine screening difference value are calculated based on the preset size.
7. The system of claim 5, wherein calculating the coarse screening difference value comprises the steps of: S1: calculating the gray average value of the 1024 pixels of each of the two gray scale maps; S2: comparing the gray value of each pixel with the gray average value, and recording 1 for a pixel whose gray value is greater than or equal to the gray average value and 0 for a pixel whose gray value is smaller than the gray average value; S3: calculating hash fingerprints of the two gray scale maps; S4: calculating the Hamming distance of the two gray scale maps according to the hash fingerprints, wherein if the Hamming distance is less than 10, the coarse screening difference value is within the preset coarse screening threshold range.
8. The system of claim 5, wherein calculating the fine screening difference value comprises the steps of: S101: performing a DCT (discrete cosine transform) on each of the two gray scale maps to obtain a 32 × 32 coefficient matrix; S102: retaining the 8 × 8 DCT matrix at the upper left corner; S103: calculating the DCT average value of the 8 × 8 DCT matrix; S104: calculating a hash fingerprint of the DCT matrix; S105: calculating the Hamming distance of the two DCT matrices according to the hash fingerprints, and if the Hamming distance is less than 5, determining that the first data block and the second data block corresponding to the two gray scale maps are consistent.
Background
Building information interconnection by means of big data can improve the efficiency with which government affair data is used. For example, in CN112860647A, a web portal, an interactive platform and an office platform in the system are each connected to a data information sharing platform, a data acquisition unit is also connected to the data information sharing platform, and data sharing is realized through that platform. As such platforms operate, the shared storage comes under increasing data pressure. CN112860683A provides a real-time data set cleaning method, which comprises generating a data set of the operation to be executed, submitting that data set to a second site, and deleting the shared data set and the data set of the operation to be executed. This approach does not address how to quickly identify deletable data, and in particular how to find deletable unstructured data. Therefore, the prior art needs further improvement.
Disclosure of Invention
The invention provides a government affair data analysis system for a sharing platform, which generates a plurality of credit files during government affair processing, quickly searches for possibly identical credit files according to identification data, and then stores the duplicated content as index information, thereby reducing the storage occupied by the data and improving data reading efficiency.
A government affair big data analysis system for a shared platform is characterized in that,
a first storage unit for storing a plurality of first credit files {W, P1}, wherein W is an event identifier and P1 is first execution data, the first execution data being composed of a plurality of first data blocks;
a first government affair terminal comprising a first input unit, a first identification unit, a first processing unit and a first output unit, wherein the first input unit is used for inputting at least one original file, the first identification unit is used for extracting an event identifier W, identity identifiers D1...Dn and first execution data A from the original file, the first processing unit generates a first credit file and a first abstract {W, D1...Dn, A, X}, and the first output unit stores the first credit file in the first storage unit, wherein X is a storage index of the first data block;
a second storage unit for storing a plurality of second credit files {W, P2}, wherein P2 is second execution data, the second execution data being composed of a plurality of second data blocks;
a plurality of second government affair terminals, each comprising a second execution unit, a second identification unit, a second processing unit and a second output unit, wherein the second execution unit inputs at least one original file, the second identification unit extracts an event identifier W, an identity identifier Dk and second execution data B from the original file, the second processing unit generates a second credit file and a second abstract {W, Dk, B, Y}, and the second output unit stores the second credit file in the second storage unit, wherein Y is a storage index of the second data block;
a service platform comprising a storage unit, a first analysis unit, a second analysis unit and a data management unit, wherein the first analysis unit and the second analysis unit both comprise a data comparison module, the storage unit receives the first abstract and the second abstract, the first analysis unit and the second analysis unit modify at least one second data block according to the first storage unit and the second storage unit respectively, and the data management unit stores the modified second credit file in the second storage unit;
a credit investigation terminal which generates a third abstract {W, Dk, C} according to the first abstract and the second abstract, wherein C is the non-executed data, B is the executed data, and C = A - B;
if the second data block to be stored is consistent with the first data block with the same event identifier, the first analysis unit modifies the second data block to be stored into a storage index of the first data block;
and if the second data block to be stored is consistent with the stored second data block with the same identity, the second analysis unit modifies the second data block to be stored into a storage index of the stored second data block.
In the invention, the credit investigation terminal retrieves the second credit file according to the second abstract and retrieves the first credit file according to at least one second data block of the second credit file.
In the invention, the data comparison module of the first analysis unit compares the gray scale map of the second data block to be stored with the gray scale map of the first data block in the first storage unit, and determines whether the gray scale maps are consistent according to the difference value of the gray scale maps.
In the invention, the data comparison module of the second analysis unit compares the gray scale map of the second data block to be stored with the gray scale map of the second data block in the second storage unit, and determines whether the gray scale maps are consistent according to the difference value of the gray scale maps.
In the invention, the gray scale map of the first data block and the gray scale map of the second data block are normalized to a preset size; a coarse screening difference value of the two gray scale maps is calculated according to the gradient information of the gray scale maps; if the coarse screening difference value is within a preset coarse screening threshold range, a fine screening difference value of the two gray scale maps is calculated; and if the fine screening difference value is within a preset fine screening threshold range, the corresponding first data block is determined to be consistent with the second data block.
In the invention, the preset size to which the gray scale maps are normalized is 32 × 32 pixels, and the coarse screening difference value and the fine screening difference value are calculated based on the preset size.
In the invention, calculating the coarse screening difference value comprises the following steps: S1: calculating the gray average value of the 1024 pixels of each of the two gray scale maps; S2: comparing the gray value of each pixel with the gray average value, and recording 1 for a pixel whose gray value is greater than or equal to the gray average value and 0 for a pixel whose gray value is smaller than the gray average value; S3: calculating hash fingerprints of the two gray scale maps; S4: calculating the Hamming distance of the two gray scale maps according to the hash fingerprints, wherein if the Hamming distance is less than 10, the coarse screening difference value is within the preset coarse screening threshold range.
In the invention, calculating the fine screening difference value comprises the following steps: S101: performing a DCT (discrete cosine transform) on each of the two gray scale maps to obtain a 32 × 32 coefficient matrix; S102: retaining the 8 × 8 DCT matrix at the upper left corner; S103: calculating the DCT average value of the 8 × 8 DCT matrix; S104: calculating a hash fingerprint of the DCT matrix; S105: calculating the Hamming distance of the two DCT matrices according to the hash fingerprints, and if the Hamming distance is less than 5, determining that the first data block and the second data block corresponding to the two gray scale maps are consistent.
In the government affair data analysis system for the sharing platform, the processing units generate a plurality of rule-conforming credit files during government affair processing, possibly identical credit files are searched for according to different types of identification data, and the duplicated content is then stored as index information. Because the data can be read quickly through the storage index, the storage occupied by the data is reduced and data reading efficiency is improved.
Drawings
FIG. 1 is a block diagram of a government data analysis system for a shared platform according to the present invention;
FIG. 2 is a flow chart of a first government terminal according to the present invention;
FIG. 3 is a flow chart of a second government terminal according to the present invention;
FIG. 4 is a flow chart of the operation of the service platform of the present invention;
FIG. 5 is a flow chart of the operation of the data comparison module of the first analysis unit and the second analysis unit according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
In the prior art, a portal, an interactive platform and an office platform of a government information sharing system are each in communication connection with a data information sharing platform, and data sharing is realized through the data information sharing platform. As such platforms operate, the shared storage comes under increasing data pressure. Some service types store a large amount of picture data, part of which is duplicated. In credit investigation services in particular, different government ports provide different credit files (including but not limited to administrative penalty files and judicial judgment files), yet for the same kind of event part of the content of these credit files is duplicated. Therefore, the inventor proposes a government affair data analysis system for a sharing platform, which stores duplicated credit file content as index information according to the identification data, so as to reduce the storage occupied by the data and improve data reading efficiency. The embodiments of the present invention will be further explained with reference to the drawings.
FIGS. 1 to 5 show a government affairs data analysis system according to an embodiment of the present invention. The system framework comprises a first storage unit, a second storage unit, a first government affair terminal, a second government affair terminal, a service platform and a credit investigation terminal. The first storage unit and the second storage unit are, for example, remote or local large-capacity non-transitory memories. The first storage unit stores a plurality of first credit files {W, P1}, where W is an event identifier; the event identifier adopts a judgment document number or a penalty document number. P1 is the first execution data, which is composed of a plurality of first data blocks. The first data blocks are mainly image data, including identity documents, procedural documents, decision documents, etc. For a single administrative event, a first credit file may generally involve a plurality of administrative objects, and thus may contain identity documents and procedural documents. Each image file is stored as a single first data block. The second storage unit stores a plurality of second credit files {W, P2}, where P2 is the second execution data, which is composed of a plurality of second data blocks. The second credit file records the subsequent processing of the administrative event, such as the execution of a judgment document or a penalty document. Correspondingly, the second execution data includes identity documents and procedural documents of the administrative object, execution documents and payment documents.
The first government affair terminal processes administrative events and presents the processing result of each administrative event in an original file. The first government affair terminal comprises a first input unit, a first identification unit, a first processing unit and a first output unit. The first input unit inputs at least one original file, for example, an effective file sent by a government affair terminal. The first identification unit extracts an event identifier W, identity identifiers D1...Dn and first execution data A from the original file. It will be appreciated that the original file records the number of the administrative event, the identity data of the objects and the penalty amount, so the event identifier, the identity identifiers and the first execution data can be extracted from the original file. The first execution data A is, for example, the judgment result of a judgment document or the penalty result of a penalty document. The first processing unit generates a first credit file and a first abstract {W, D1...Dn, A, X}. The content of the first credit file corresponds to the original file, but its data format satisfies the aforementioned requirements. The first abstract is a structured index record of the first credit file, through which the corresponding first credit file can be found quickly. The first output unit stores the first credit file in the first storage unit, and X is a storage index of the first data block.
The second government affair terminal is a subsequent processing port for the administrative event. The second government affair terminal comprises a second execution unit, a second identification unit, a second processing unit and a second output unit. The second execution unit inputs at least one original file, and the second identification unit extracts an event identifier W, an identity identifier Dk and second execution data B from the original file. The second processing unit generates a second credit file and a second abstract {W, Dk, B, Y}. The second output unit stores the second credit file in the second storage unit, and Y is a storage index of the second data block.
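The credit files and abstracts described above can be pictured as simple records. The following Python sketch is only an illustration: the class and field names are hypothetical, the execution data A and B are assumed to be numeric amounts, and a data block is assumed to be either raw image bytes or a storage index string.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumption: a data block holds raw image bytes, or a storage index (str)
# that points at an identical block already stored elsewhere.
DataBlock = Union[bytes, str]


@dataclass
class CreditFile:
    """A credit file {W, P}: event identifier plus execution data blocks."""
    event_id: str            # W
    blocks: List[DataBlock]  # P1 (first file) or P2 (second file)


@dataclass(frozen=True)
class FirstAbstract:
    """First abstract {W, D1...Dn, A, X}."""
    event_id: str                  # W
    identity_ids: Tuple[str, ...]  # D1...Dn
    execution_data: int            # A, e.g. a penalty amount (assumed numeric)
    storage_index: str             # X


@dataclass(frozen=True)
class SecondAbstract:
    """Second abstract {W, Dk, B, Y}."""
    event_id: str        # W
    identity_id: str     # Dk
    execution_data: int  # B, e.g. an amount already paid (assumed numeric)
    storage_index: str   # Y
```

Keeping the abstracts small and structured is what lets the service platform and the credit investigation terminal look up the much larger image-based credit files by index.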
The credit investigation terminal can be deployed on any client computer, including but not limited to those of financial regulatory departments, banks, government departments and commercial financial institutions. On application, the credit investigation terminal can query the credit investigation conclusion for a given client (administrative object). The credit investigation terminal generates a third abstract {W, Dk, C} according to the first abstract and the second abstract, where C is the non-executed data, B is the executed data, and C = A - B. Here the minus sign "-" does not specifically denote subtraction between numbers; it may also denote determining the non-executed part of the credit data. For example, C is the client's unfinished penalty record and B is the client's finished penalty record; or C is the amount of debt the unit still owes and B is the amount of debt the unit has paid. In addition, the credit investigation terminal can extract the corresponding second credit file according to the storage index in the second abstract, and then retrieve the first credit file according to the storage index held in a second data block of that second credit file. Preferably, the second government affair terminal of the invention can generate a plurality of second abstracts for the same government affair event (event identifier), and the unfinished penalty record is C = A - B.
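The relation C = A - B can be illustrated with a short Python sketch. All class, field and function names are hypothetical, A and B are assumed to be numeric amounts, and summing B over several second abstracts of the same event and identity is an assumption made for illustration; the text itself states only C = A - B.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class FirstAbstract:
    event_id: str                  # W
    identity_ids: Tuple[str, ...]  # D1...Dn
    execution_data: int            # A
    storage_index: str             # X


@dataclass(frozen=True)
class SecondAbstract:
    event_id: str        # W
    identity_id: str     # Dk
    execution_data: int  # B
    storage_index: str   # Y


@dataclass(frozen=True)
class ThirdAbstract:
    event_id: str     # W
    identity_id: str  # Dk
    unexecuted: int   # C


def third_abstract(first: FirstAbstract,
                   seconds: Iterable[SecondAbstract],
                   identity_id: str) -> ThirdAbstract:
    """Build {W, Dk, C} for one administrative object of one event."""
    if identity_id not in first.identity_ids:
        raise ValueError("identity is not part of this event")
    # Assumption: every matching second abstract contributes to B.
    executed = sum(s.execution_data for s in seconds
                   if s.event_id == first.event_id
                   and s.identity_id == identity_id)
    return ThirdAbstract(first.event_id, identity_id,
                         first.execution_data - executed)
```

For a non-numeric reading of the minus sign (for example, a set of unfinished penalty records), the subtraction would be replaced by the corresponding difference operation.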
The service platform comprises a storage unit, a first analysis unit, a second analysis unit and a data management unit. The storage unit receives the first abstract and the second abstract. The first analysis unit and the second analysis unit modify at least one second data block according to the first storage unit and the second storage unit respectively. If a second data block to be stored is consistent with a first data block having the same event identifier, the first analysis unit replaces the second data block to be stored with the storage index of that first data block. Data found consistent in this way are mainly judgment files or penalty files; the original image file of the second data block to be stored is deleted, which reduces the storage occupied. If a second data block to be stored is consistent with an already stored second data block having the same identity identifier, the second analysis unit replaces the second data block to be stored with the storage index of the stored second data block. Data found consistent in this way are mainly identity files; the original image data of the second data block to be stored is deleted, which again reduces the storage occupied. The data management unit stores the modified second credit file in the second storage unit.
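The replacement of duplicate blocks by storage indices can be sketched as follows. This is a minimal Python illustration, not the platform's implementation: the function names are hypothetical, the two dictionaries are assumed to be already filtered to blocks with the same event identifier (first store) and the same identity identifier (second store), and the consistency check is passed in as a callable.

```python
from typing import Callable, Dict, List, Optional, Union

# Assumption: a block is raw image bytes, or a storage index string.
Block = Union[bytes, str]


def find_index(block: bytes, store: Dict[str, bytes],
               consistent: Callable[[bytes, bytes], bool]) -> Optional[str]:
    """Return the storage index of the first consistent stored block."""
    for index, stored in store.items():
        if consistent(block, stored):
            return index
    return None


def deduplicate(pending: List[bytes],
                first_store: Dict[str, bytes],
                second_store: Dict[str, bytes],
                consistent: Callable[[bytes, bytes], bool]) -> List[Block]:
    """Replace each pending second data block by a storage index if possible."""
    result: List[Block] = []
    for block in pending:
        # First analysis unit: compare with first data blocks (same event).
        index = find_index(block, first_store, consistent)
        if index is None:
            # Second analysis unit: compare with stored second data blocks
            # (same identity).
            index = find_index(block, second_store, consistent)
        # Keep the raw block only when no consistent block was found.
        result.append(block if index is None else index)
    return result
```

In the embodiments that follow, the role of `consistent` is played by the gray scale map comparison of the data comparison module.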
There may be many ways to determine whether the credit files are the same, and the data comparison module may compare differences between a first data block in the first credit file and a second data block in the second credit file.
Example one
In this embodiment, referring to fig. 5, the process of comparing data by the data comparing module includes the following steps.
Step 1: Perform gray scale conversion on the data picture in the first data block and the data picture in the second data block to obtain a gray scale map of each. For example, the gray scale conversion may comprise S201: acquiring the red, green and blue values of each pixel; S202: calculating a gray value Gray by using a gray scale algorithm; S203: replacing the original red, green and blue values of the pixel with Gray. One gray scale algorithm provided by this embodiment is the mean algorithm: Gray = (red + green + blue)/3, although a person of ordinary skill in the art may also use other gray scale algorithms (based on human eye perception, desaturation, decomposition, or a single channel) to calculate the gray value.
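Steps S201 to S203 with the mean algorithm can be sketched in Python as below. The image representation (a list of rows of (red, green, blue) tuples) and the function name are assumptions, as is rounding down to an integer gray value; the embodiment only gives Gray = (red + green + blue)/3.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each assumed in 0..255


def to_gray(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB image to a gray scale map with the mean algorithm."""
    gray_map = []
    for row in image:
        # S201: read red, green, blue; S202: Gray = (red + green + blue) / 3
        # S203: the single Gray value replaces the three channel values.
        gray_map.append([(r + g + b) // 3 for (r, g, b) in row])
    return gray_map
```

For example, a pixel (30, 60, 90) maps to the gray value 60.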
Step 2: Calculate a difference value from the gray scale map of the data picture in the first data block and the gray scale map of the data picture in the second data block. The two difference value algorithms provided by this embodiment are the average hash algorithm (aHash) and the perceptual hash algorithm (pHash); the inventor finds that differences between pictures are identified best when the two algorithms are combined in the data comparison module.
Step 3: Determine, according to the difference value between the data picture in the first data block and the data picture in the second data block, whether the two are consistent pictures.
Step 4: If the difference value between the data picture in the first data block and the data picture in the second data block and a certain threshold satisfy a given condition, determine that the first data block and the second data block are duplicate data.
Example two
In this embodiment, step 2 of calculating the difference value between the gray scale map of the data picture in the first data block and the gray scale map of the data picture in the second data block includes the following steps. S21: Normalize the image sizes of the two gray scale maps to obtain a gray scale map of the data picture in the first data block with the preset size and a gray scale map of the data picture in the second data block with the preset size. S22: Determine the coarse screening difference value of the data picture in the first data block and the data picture in the second data block according to the gradient information of the two gray scale maps of the preset size. S23: If the coarse screening difference value is within a preset coarse screening threshold range, calculate a fine screening difference value for the two gray scale maps. If the fine screening difference value is within a preset fine screening threshold range, the data picture in the first data block is determined to be consistent with the data picture in the second data block; otherwise, the two data pictures are determined to be inconsistent.
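The control flow of S21 to S23 can be sketched in Python as below. The function names are hypothetical, nearest-neighbour sampling is an assumed normalization method (the embodiment does not specify one), the coarse and fine difference functions are passed in as callables, and the default thresholds 10 and 5 are the Hamming distance limits given in this embodiment.

```python
from typing import Callable, List

Gray = List[List[int]]
Distance = Callable[[Gray, Gray], int]


def normalize(gray: Gray, size: int = 32) -> Gray:
    """S21: resample a gray scale map to size x size (nearest neighbour)."""
    height, width = len(gray), len(gray[0])
    return [[gray[y * height // size][x * width // size]
             for x in range(size)]
            for y in range(size)]


def two_stage_consistent(a: Gray, b: Gray,
                         coarse: Distance, fine: Distance,
                         coarse_max: int = 10, fine_max: int = 5,
                         size: int = 32) -> bool:
    """S22/S23: run the fine screening only if the coarse screening passes."""
    a_n, b_n = normalize(a, size), normalize(b, size)
    if coarse(a_n, b_n) >= coarse_max:
        return False  # rejected by the cheap coarse screening
    return fine(a_n, b_n) < fine_max
```

The cheap coarse screening discards clearly different pictures so that the more expensive fine screening runs only on likely duplicates.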
The preset size is an image size of 32 × 32 pixels. Determining the coarse screening difference value from the two gray scale maps of the preset size specifically comprises the following steps. S1: Calculate the gray average value of all 1024 pixels of each of the two 32 × 32 gray scale maps. S2: Compare the gray value of each pixel with the gray average value of its gray scale map; a pixel whose gray value is greater than or equal to the gray average value is marked "1" and a pixel whose gray value is smaller than the gray average value is marked "0", the mark being recorded at the position of that pixel. S3: Calculate the hash fingerprint by combining the comparison results of S2 in order, from left to right and from top to bottom, into a 1024-bit integer. S4: Calculate the Hamming distance of the two pictures from the hash fingerprints calculated in S3. If the Hamming distance is less than 10, calculate a fine screening difference value for the gray scale map of the data picture in the first data block and the gray scale map of the data picture in the second data block. In this embodiment, the Hamming distance is obtained as follows: an exclusive-or operation is performed on the two bit strings and the number of 1s in the result is counted; that count is the Hamming distance. For example, the Hamming distance between 0100 and 1001 is 3, and the Hamming distance between 0110 and 1110 is 1.
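Steps S1 to S4 correspond to an average hash followed by a Hamming distance, which can be sketched in Python as follows. The function names and the list-of-rows image representation are assumptions; the input is assumed to be already normalized to the preset size.

```python
from typing import List

Gray = List[List[int]]


def average_hash(gray: Gray) -> int:
    """S1-S3: one bit per pixel, 1 if gray value >= gray average, else 0."""
    pixels = [p for row in gray for p in row]  # left to right, top to bottom
    mean = sum(pixels) / len(pixels)           # S1: gray average value
    fingerprint = 0
    for p in pixels:                           # S2 + S3: build the integer
        fingerprint = (fingerprint << 1) | (1 if p >= mean else 0)
    return fingerprint


def hamming(a: int, b: int) -> int:
    """S4: XOR the fingerprints and count the 1 bits."""
    return bin(a ^ b).count("1")


def coarse_pass(a: Gray, b: Gray, threshold: int = 10) -> bool:
    """True when the coarse screening difference is within the threshold."""
    return hamming(average_hash(a), average_hash(b)) < threshold
```

With the two examples from the text, `hamming(0b0100, 0b1001)` gives 3 and `hamming(0b0110, 0b1110)` gives 1.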
A fine screening difference value is then calculated for the gray scale map of the data picture in the first data block and the gray scale map of the data picture in the second data block. This embodiment provides the following steps. S101: Calculate the DCT (discrete cosine transform): perform a DCT on each of the two gray scale maps of the preset size of 32 × 32 pixels to obtain two 32 × 32 DCT coefficient matrices. S102: Reduce the DCT: retain the 8 × 8 portion at the upper left corner of each DCT coefficient matrix. S103: Calculate the mean: calculate the DCT average value of each 8 × 8 DCT coefficient matrix. S104: Calculate the hash fingerprint: set a 64-bit hash value of 0s and 1s according to the 8 × 8 DCT coefficient matrix. Specifically, each coefficient in the 8 × 8 DCT coefficient matrix is compared with the DCT average value; a coefficient greater than or equal to the DCT average value is set to "1" and a coefficient smaller than the DCT average value is set to "0", and the results are combined in order, from left to right and from top to bottom, into a 64-bit integer. S105: Calculate the Hamming distance of the two gray scale pictures from the hash fingerprints calculated in S104; if the Hamming distance is less than 5, the two data pictures are considered consistent, otherwise they are inconsistent.
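Steps S101 to S105 can be sketched in pure Python as follows. The function names are hypothetical, an orthonormal type-II DCT is assumed (the embodiment does not state the DCT variant or scaling), the input is assumed to be a square gray scale map already normalized to the preset size, and only the upper-left 8 × 8 coefficients are computed since the rest are discarded in S102.

```python
import math
from typing import List

Gray = List[List[int]]


def dct_top_left(gray: Gray, keep: int = 8) -> List[List[float]]:
    """S101 + S102: upper-left keep x keep block of the 2-D DCT-II."""
    n = len(gray)  # assumed square, e.g. 32
    basis = [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
              * math.cos((2 * i + 1) * k * math.pi / (2 * n))
              for i in range(n)]
             for k in range(keep)]
    # Transform along the rows index first, then along the columns index.
    tmp = [[sum(basis[u][r] * gray[r][c] for r in range(n))
            for c in range(n)]
           for u in range(keep)]
    return [[sum(tmp[u][c] * basis[v][c] for c in range(n))
             for v in range(keep)]
            for u in range(keep)]


def perceptual_hash(gray: Gray) -> int:
    """S103 + S104: 64-bit fingerprint from the 8 x 8 DCT block."""
    coeffs = [c for row in dct_top_left(gray) for c in row]
    mean = sum(coeffs) / len(coeffs)  # S103: DCT average value
    fingerprint = 0
    for c in coeffs:                  # left to right, top to bottom
        fingerprint = (fingerprint << 1) | (1 if c >= mean else 0)
    return fingerprint


def fine_pass(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """S105: consistent when the Hamming distance is below the threshold."""
    distance = bin(perceptual_hash(a) ^ perceptual_hash(b)).count("1")
    return distance < threshold
```

Because the low-frequency DCT coefficients capture the overall structure of a picture, this fingerprint is less sensitive to small pixel-level changes than the average hash used for the coarse screening.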
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. All or part of the steps of the methods of the above embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present application, and these should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.