File analysis method and device, electronic equipment and computer readable medium
1. A file parsing method, comprising:
acquiring a file set to be analyzed;
carrying out format conversion on each file to be analyzed in the file set to be analyzed so as to generate structured information to be analyzed and obtain a structured information set to be analyzed;
analyzing each piece of structured information to be analyzed in the structured information set to be analyzed to generate an information group to be warehoused, and obtaining an information group set to be warehoused;
determining the warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused to obtain a warehousing priority set;
and storing the information groups to be warehoused in the information group set to be warehoused to a target database according to the warehousing priority set.
2. The method of claim 1, wherein the structured information to be analyzed in the structured information set to be analyzed comprises: first type warehousing information, second type warehousing information, third type warehousing information and fourth type warehousing information; and
the analyzing each piece of structured information to be analyzed in the structured information set to be analyzed to generate an information group to be warehoused comprises:
in response to determining that a first file analysis type exists in a target file analysis type group, extracting the first type warehousing information in the structured information to be analyzed as information to be warehoused, wherein the target file analysis type group is the file analysis type group corresponding to the structured information to be analyzed.
3. The method of claim 2, wherein the analyzing each piece of structured information to be analyzed in the structured information set to be analyzed to generate an information group to be warehoused further comprises:
in response to determining that a second file analysis type exists in the target file analysis type group, extracting the second type warehousing information in the structured information to be analyzed as information to be warehoused.
4. The method of claim 3, wherein the analyzing each piece of structured information to be analyzed in the structured information set to be analyzed to generate an information group to be warehoused further comprises:
in response to determining that a third file analysis type exists in the target file analysis type group, extracting the third type warehousing information in the structured information to be analyzed as information to be warehoused.
5. The method of claim 4, wherein the analyzing each piece of structured information to be analyzed in the structured information set to be analyzed to generate an information group to be warehoused further comprises:
in response to determining that a fourth file analysis type exists in the target file analysis type group, extracting the fourth type warehousing information in the structured information to be analyzed as information to be warehoused.
6. The method of claim 5, wherein before the extracting, in response to determining that a first file analysis type exists in the target file analysis type group, the first type warehousing information in the structured information to be analyzed as information to be warehoused, the method further comprises:
acquiring a file analysis type information set;
and determining, according to the file analysis type information set, the target file analysis type group corresponding to each piece of structured information to be analyzed in the structured information set to be analyzed, to obtain a target file analysis type group set.
7. The method of claim 1, wherein the structured information to be analyzed in the structured information set to be analyzed further comprises: a first total value, a second total value, a first unit value and a second unit value; and
the determining the warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused comprises:
determining the warehousing priority based on the structured information set to be analyzed and on the first total value, the second total value, the first unit value and the second unit value included in the structured information to be analyzed that corresponds to the information group to be warehoused.
8. A file parsing apparatus, comprising:
an acquisition unit configured to acquire a file set to be analyzed;
a format conversion unit configured to perform format conversion on each file to be analyzed in the file set to be analyzed so as to generate structured information to be analyzed, and obtain a structured information set to be analyzed;
an analysis unit configured to analyze each piece of structured information to be analyzed in the structured information set to be analyzed so as to generate an information group to be warehoused, and obtain an information group set to be warehoused;
a determining unit configured to determine a warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused, so as to obtain a warehousing priority set;
and a storage unit configured to store the information groups to be warehoused in the information group set to be warehoused to a target database according to the warehousing priority set.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.
Background
In fields such as engineering construction, large numbers of unstructured documents (for example, bid documents) often need to be processed. At present, bid documents in the form of unstructured documents are generally processed manually.
However, this approach often has the following technical problems:
first, in the prior art, bid documents are often stored directly in the database, and because bid documents are often unstructured documents, they often have to be processed manually, which reduces the processing efficiency of bid documents;
second, bid documents are often stored in the database in random order, so high-priority bid documents cannot be backed up to the database in time; when data loss occurs, the high-priority bid documents that should have been backed up cannot be retrieved from the database, which affects the progress of the whole project.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a file parsing method, apparatus, electronic device and computer readable medium to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a method of file parsing, the method including: acquiring a file set to be analyzed; carrying out format conversion on each file to be analyzed in the file set to be analyzed so as to generate structured information to be analyzed and obtain a structured information set to be analyzed; analyzing each piece of structured information to be analyzed in the structured information set to be analyzed to generate an information group to be warehoused, and obtaining an information group set to be warehoused; determining the warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused to obtain a warehousing priority set; and storing the information groups to be warehoused in the information group set to be warehoused to a target database according to the warehousing priority set.
In a second aspect, some embodiments of the present disclosure provide a file parsing apparatus, including: an acquisition unit configured to acquire a file set to be analyzed; a format conversion unit configured to perform format conversion on each file to be analyzed in the file set to be analyzed so as to generate structured information to be analyzed, and obtain a structured information set to be analyzed; an analysis unit configured to analyze each piece of structured information to be analyzed in the structured information set to be analyzed so as to generate an information group to be warehoused, and obtain an information group set to be warehoused; a determining unit configured to determine a warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused, and obtain a warehousing priority set; and a storage unit configured to store the information groups to be warehoused in the information group set to be warehoused to a target database according to the warehousing priority set.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following beneficial effects: the file parsing method of some embodiments of the present disclosure improves the processing efficiency of bid documents. Specifically, the processing efficiency of bid documents is low because, in the prior art, a bid document is often stored directly in the database and, being an unstructured document, often has to be processed manually (for example, by manual information extraction). Based on this, in the file parsing method of some embodiments of the present disclosure, first, a set of files to be parsed is acquired, which provides data support for the subsequent information processing and warehousing. Then, format conversion is performed on each file to be parsed in the set of files to be parsed to generate structured information to be parsed, obtaining a structured information set to be parsed. Converting the files turns unstructured data into structured data, which facilitates the subsequent information extraction and storage. Next, each piece of structured information to be parsed in the structured information set to be parsed is parsed to generate an information group to be warehoused, obtaining an information group set to be warehoused. The information groups to be warehoused can thus serve as a reference for processing the bid documents. In addition, the information in the files can be extracted in batches, which improves the efficiency of information extraction and of subsequent information use. After that, the warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused is determined, obtaining a warehousing priority set. In practice, different files differ in importance, so determining the warehousing priority corresponding to each file ensures that files of higher importance are stored preferentially. Finally, the information groups to be warehoused in the information group set to be warehoused are stored to a target database according to the warehousing priority set. The parsed information of the bid documents is thus available for direct subsequent processing. In this way, unstructured files are parsed and stored, and the processing efficiency of bid documents is greatly improved compared with manual processing.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of an application scenario of a file parsing method according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of some embodiments of a file parsing method according to the present disclosure;
FIG. 3 is a flow diagram of further embodiments of a file parsing method according to the present disclosure;
FIG. 4 is a schematic block diagram of some embodiments of a file parsing apparatus according to the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the relevant invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an" and "the" in the present disclosure are illustrative rather than limiting, and those skilled in the art should understand them to mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of an application scenario of a file parsing method of some embodiments of the present disclosure.
In the application scenario of fig. 1, first, the computing device 101 may obtain a set of files to be parsed 102. Secondly, the computing device 101 may perform format conversion on each file to be parsed in the set of files to be parsed 102 to generate structured information to be parsed, and obtain a structured set of information to be parsed 103. Further, the computing device 101 may parse each structured to-be-parsed information in the structured to-be-parsed information set 103 to generate a to-be-warehoused information group, resulting in a to-be-warehoused information group set 104. Then, the computing device 101 may determine the warehousing priority corresponding to each information group to be warehoused in the information group set 104 to obtain a warehousing priority set 105. Finally, the computing device 101 may store the information groups to be warehoused in the set of information groups to be warehoused to the target database 106 according to the set of warehousing priorities 105.
The computing device 101 may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or as a single server or a single terminal device. When the computing device is software, it may be installed in the hardware devices listed above, and may be implemented, for example, as multiple pieces of software or software modules providing distributed services, or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.
With continued reference to fig. 2, a flow 200 of some embodiments of a file parsing method according to the present disclosure is shown. The file analysis method comprises the following steps:
Step 201, acquiring a set of files to be parsed.
In some embodiments, an execution body of the file parsing method (such as the computing device 101 shown in fig. 1) may acquire the set of files to be parsed through a wired or wireless connection. Each file to be parsed in the set of files to be parsed may be a file whose contents are unstructured information. The unstructured information may be character string information (for example, 'file identifier: 202101259656475; sub-label identifier: 2021010499368260; sub-package identifier: 2021010499652094').
Step 202, performing format conversion on each file to be analyzed in the file set to be analyzed to generate structured information to be analyzed, so as to obtain a structured information set to be analyzed.
In some embodiments, the execution main body may perform format conversion on each file to be parsed in the set of files to be parsed, so as to generate structured information to be parsed, and obtain a structured set of information to be parsed. The execution body may first read data in the file to be parsed. The executing entity may read data in the file to be parsed by an OCR (Optical character recognition) technique. And then, carrying out format conversion on the read data through a regular expression to generate structured information to be analyzed. The structured information to be parsed may be information in JSON (JavaScript Object Notation) format. As an example, the structured set of information to be parsed may be:
[ { 'file identification': 001, 'sub-label identification': 001, 'sub-package identification': 001 },
{ 'file identification': 002, 'sub-label identification': 001, 'sub-package identification': 001 },
{ 'file identification': 003, 'sub-label identification': 001, 'sub-package identification': 002 },
{ 'file identification': 004, 'sub-label identification': 002, 'sub-package identification': 001 },
{ 'file identification': 005, 'sub-label identification': 003, 'sub-package identification': 001 } ].
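The regular-expression conversion described in step 202 can be sketched as follows. This is only a minimal illustration, assuming the OCR step has already produced a string of `key: value` pairs separated by semicolons; the function name `to_structured`, the pattern and the sample string are hypothetical and are not taken from the disclosure.

```python
import json
import re

# Hypothetical pattern: a key, a colon, then a value that ends at ';' or end of text.
PAIR_PATTERN = re.compile(r"\s*([^:;]+?)\s*:\s*([^;]+?)\s*(?:;|$)")


def to_structured(raw_text):
    """Convert 'key: value; key: value' text into a dict (assumed input layout)."""
    return {key: value for key, value in PAIR_PATTERN.findall(raw_text)}


# Illustrative input only; real OCR output may need additional cleaning.
raw = ("file identification: 001; sub-label identification: 001; "
       "sub-package identification: 002")
structured = to_structured(raw)
print(json.dumps(structured, indent=2))
```

Values are kept as strings so that leading zeros in identifiers such as 001 are preserved when the result is serialized as JSON.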
Step 203, analyzing each piece of structured information to be analyzed in the structured information set to be analyzed to generate an information group to be warehoused, and obtaining an information group set to be warehoused.
In some embodiments, the executing body may analyze each piece of structured information to be analyzed in the set of structured information to be analyzed to generate an information group to be warehoused, so as to obtain a set of information groups to be warehoused.
The execution main body can analyze each piece of structured information to be analyzed in the structured information set to be analyzed through the following steps to generate an information group to be put into storage:
firstly, using a regular expression to perform information segmentation on the structured information to be analyzed so as to generate a key value pair group.
And secondly, determining the key value pairs in the key value pair group as information to be stored in a warehouse.
As an example, the structured information to be parsed may be: { 'file identification': 001, 'sub-label identification': 001, 'sub-package identification': 001 }. The resulting information group to be warehoused may then be: [ (file identification, 001), (sub-label identification, 001), (sub-package identification, 001) ].
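The two steps above (segmenting the structured information with a regular expression, then treating each key-value pair as information to be warehoused) might look like the sketch below. The pattern and the function name `to_warehousing_group` are assumptions made for illustration; the disclosure does not specify the expression that is actually used.

```python
import re

# Hypothetical pattern: a single-quoted key, a colon, then a value that
# runs until the next comma, closing brace or whitespace.
KV_PATTERN = re.compile(r"'([^']+)'\s*:\s*([^,}\s]+)")


def to_warehousing_group(structured_text):
    """Split structured text into a list of (key, value) pairs."""
    return KV_PATTERN.findall(structured_text)


text = ("{ 'file identification': 001, 'sub-label identification': 001, "
        "'sub-package identification': 001 }")
group = to_warehousing_group(text)
print(group)
```

If the structured information is strictly valid JSON, parsing it with `json.loads` and iterating over `items()` would be a sturdier alternative to a hand-written pattern.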
Step 204, determining the warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused to obtain a warehousing priority set.
In some embodiments, the execution main body may determine a warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused, to obtain a warehousing priority set.
The execution main body may determine the warehousing priority corresponding to the information group to be warehoused by:
First, for each information group to be warehoused in the information group set to be warehoused, determine the first piece of information to be warehoused in that group as first information to be warehoused, obtaining a first information group to be warehoused.
Second, sort the first information to be warehoused in the first information group to be warehoused in descending order of the values they include, obtaining a first warehousing information sequence.
Third, determine the position, in the first warehousing information sequence, of the first information to be warehoused corresponding to the information group to be warehoused as the warehousing priority.
As an example, the information group set to be warehoused may be: { [ (file identification: 001), (sub-label identification: 001), (sub-package identification: 001) ], [ (file identification: 002), (sub-label identification: 001), (sub-package identification: 001) ], [ (file identification: 003), (sub-label identification: 001), (sub-package identification: 002) ], [ (file identification: 004), (sub-label identification: 002), (sub-package identification: 001) ], [ (file identification: 005), (sub-label identification: 003), (sub-package identification: 001) ] }. The information group to be warehoused may be: [ (file identification: 001), (sub-label identification: 001), (sub-package identification: 001) ]. The first information to be warehoused of this information group may be: (file identification: 001). The key included in the first information to be warehoused may be: file identification. The value included in the first information to be warehoused may be: 001. The first warehousing information sequence obtained by sorting may be { (file identification: 005), (file identification: 004), (file identification: 003), (file identification: 002), (file identification: 001) }. Therefore, the warehousing priority of the information group to be warehoused [ (file identification: 001), (sub-label identification: 001), (sub-package identification: 001) ] may be 5, and the warehousing priority set may be {5, 4, 3, 2, 1}.
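The three steps for determining the warehousing priority can be sketched as follows, using the data of the example above. The function name `warehousing_priorities` is hypothetical, and the sketch assumes that the values of the first information to be warehoused are distinct numeric strings.

```python
def warehousing_priorities(groups):
    """Return one warehousing priority per information group."""
    # Step 1: take the first key-value pair of every group.
    firsts = [group[0] for group in groups]
    # Step 2: sort those pairs by value, largest first.
    # Assumption: values are distinct numeric strings such as "001".
    ordered = sorted(firsts, key=lambda pair: int(pair[1]), reverse=True)
    # Step 3: the 1-based position in the sorted sequence is the priority.
    return [ordered.index(first) + 1 for first in firsts]


groups = [
    [("file identification", "001"), ("sub-label identification", "001"),
     ("sub-package identification", "001")],
    [("file identification", "002"), ("sub-label identification", "001"),
     ("sub-package identification", "001")],
    [("file identification", "003"), ("sub-label identification", "001"),
     ("sub-package identification", "002")],
    [("file identification", "004"), ("sub-label identification", "002"),
     ("sub-package identification", "001")],
    [("file identification", "005"), ("sub-label identification", "003"),
     ("sub-package identification", "001")],
]
priorities = warehousing_priorities(groups)
print(priorities)  # [5, 4, 3, 2, 1]
```

The group whose file identification is 001 sorts last in the descending sequence, so it receives priority 5, which agrees with the example.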
Step 205, storing the information groups to be warehoused in the information group set to be warehoused to the target database according to the warehousing priority set.
In some embodiments, the execution body may store the information groups to be warehoused in the information group set to be warehoused to the target database according to the warehousing priority set. For example, the execution body may store the information groups to be warehoused in the information group set to be warehoused to the target database one by one, in descending order of the warehousing priorities in the warehousing priority set.
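Storing in descending order of warehousing priority can be sketched as below. An in-memory SQLite database stands in for the target database, which the disclosure does not specify; the table name `warehoused`, its columns and the function `store_by_priority` are hypothetical.

```python
import json
import sqlite3


def store_by_priority(groups, priorities, connection):
    """Insert groups into the database, highest priority value first."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS warehoused (priority INTEGER, payload TEXT)"
    )
    # Pair every group with its priority and order the pairs descending.
    ordered = sorted(zip(priorities, groups), key=lambda pair: pair[0], reverse=True)
    for priority, group in ordered:
        connection.execute(
            "INSERT INTO warehoused (priority, payload) VALUES (?, ?)",
            (priority, json.dumps(group)),
        )
    connection.commit()


# Illustrative data: three single-item groups with arbitrary priorities.
sample_groups = [
    [("file identification", "001")],
    [("file identification", "002")],
    [("file identification", "003")],
]
sample_priorities = [1, 3, 2]
conn = sqlite3.connect(":memory:")
store_by_priority(sample_groups, sample_priorities, conn)
rows = conn.execute(
    "SELECT priority, payload FROM warehoused ORDER BY rowid"
).fetchall()
print(rows)
```

Ordering the pairs before inserting keeps the insertion sequence identical to the priority sequence, so an interrupted run would already have stored the groups with the highest priority values.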
The above embodiments of the present disclosure have the following beneficial effects: the file parsing method of some embodiments of the present disclosure improves the processing efficiency of bid documents. Specifically, the processing efficiency of bid documents is low because, in the prior art, a bid document is often stored directly in the database and, being an unstructured document, often has to be processed manually (for example, by manual information extraction). Based on this, in the file parsing method of some embodiments of the present disclosure, first, a set of files to be parsed is acquired, which provides data support for the subsequent information processing and warehousing. Then, format conversion is performed on each file to be parsed in the set of files to be parsed to generate structured information to be parsed, obtaining a structured information set to be parsed. Converting the files turns unstructured data into structured data, which facilitates the subsequent information extraction and storage. Next, each piece of structured information to be parsed in the structured information set to be parsed is parsed to generate an information group to be warehoused, obtaining an information group set to be warehoused. The information groups to be warehoused can thus serve as a reference for processing the bid documents. In addition, the information in the files can be extracted in batches, which improves the efficiency of information extraction and of subsequent information use. After that, the warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused is determined, obtaining a warehousing priority set. In practice, different files differ in importance, so determining the warehousing priority corresponding to each file ensures that files of higher importance are stored preferentially. Finally, the information groups to be warehoused in the information group set to be warehoused are stored to a target database according to the warehousing priority set. The parsed information of the bid documents is thus available for direct subsequent processing. In this way, unstructured files are parsed and stored, and the processing efficiency of bid documents is greatly improved compared with manual processing.
With further reference to FIG. 3, a flow 300 of further embodiments of a file parsing method is illustrated. The process 300 of the file parsing method includes the following steps:
Step 301, acquiring a set of files to be parsed.
Step 302, performing format conversion on each file to be analyzed in the file set to be analyzed to generate structured information to be analyzed, so as to obtain a structured information set to be analyzed.
In some embodiments, the specific implementation manner and technical effects of the steps 301 and 302 can refer to the steps 201 and 202 in the embodiments corresponding to fig. 2, which are not described herein again.
Step 303, in response to determining that the first file parsing type exists in the target file parsing type group, extracting the first type of warehousing information in the structured information to be parsed as information to be warehoused.
In some embodiments, an execution body of the file parsing method (e.g., the computing device 101 shown in fig. 1) may, in response to determining that the first file parsing type exists in the target file parsing type group, extract the first type warehousing information in the structured information to be parsed as information to be warehoused. The structured information to be parsed may include: first type warehousing information, second type warehousing information, third type warehousing information and fourth type warehousing information. The first type warehousing information may be, but is not limited to, at least one of the following: a consortium bid leader and member information. The consortium bid leader may be the lead party of a consortium bid (e.g., consortium bid leader: Shanghai xxxx development, Inc.). The member information may be information on the members of the consortium bid leader, and may be, but is not limited to, at least one of the following: identity information (e.g., name: Zhang San). The second type warehousing information may be, but is not limited to, at least one of the following: technical specification bid response value information. The technical specification bid response value information may be information describing the technical response in the bid document (e.g., response value description: "D:\2021010499649784\for details in the bid document.docx"). The third type warehousing information may be a bid agent and bidding details. The bid agent may be the bid agent named in the bid document (e.g., bid agent: Shanghai xx, Inc.).
The bidding details may be, but are not limited to, at least one of the following: a tax-exclusive total value (e.g., tax-exclusive total: 5096025), a tax-inclusive total value (e.g., tax-inclusive total: 5096025), a tax-exclusive unit value (e.g., tax-exclusive unit price: 5), and a tax-inclusive unit value (e.g., tax-inclusive total: 5096025). The fourth type warehousing information may be, but is not limited to, at least one of the following: a qualification performance voucher number. The qualification performance voucher number may be the number of a qualification performance voucher (e.g., qualification performance voucher number: 80000000000818632021012413591828). A target file parsing type in the target file parsing type group may be a parsing type of the file to be parsed corresponding to the structured information to be parsed. The target file parsing type group may include at least one of the following: a first file parsing type (e.g., a business type), a second file parsing type (e.g., a technology type), a third file parsing type (e.g., a price type), and a fourth file parsing type (e.g., a qualification type). The execution body may extract the first type warehousing information in the structured information to be parsed as information to be warehoused through the following steps:
First, the execution body may acquire a file parsing type information set through a wired or wireless connection. Each piece of file parsing type information may include a file name and a file parsing type. The file name may be the name of the file to be parsed corresponding to the structured information to be parsed (e.g., a1b2c3d4.json). The file parsing type may be the parsing type (e.g., a business type) of that file.
Second, the execution body may extract, as the target file parsing type group, the file parsing types included in those pieces of file parsing type information in the file parsing type information set that satisfy an extraction condition. The extraction condition may be that the file name included in the file parsing type information is the name of the file to be parsed corresponding to the structured information to be parsed.
Third, in response to determining that the first file parsing type exists in the target file parsing type group, the execution body may extract the first type warehousing information in the structured information to be parsed as information to be warehoused, where the target file parsing type group may be the file parsing type group corresponding to the structured information to be parsed.
As an example, the target file parsing type group may be: [business type, qualification type, price type, technology type]. The first file parsing type may be: the business type. The first type of warehousing information may be: ((Union Bid Pioneer, Shanghai xxxx development Co., Ltd.), (name, Zhang)). Then, the information to be warehoused may be: ((Union Bid Pioneer, Shanghai xxxx development Co., Ltd.), (name, Zhang)).
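The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the record shape `(file name, file parsing type)`, the key `first_type`, the string labels such as `"business"`, and the second file name are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (file name, file parsing type).
FileParsingTypeInfo = Tuple[str, str]


def target_type_group(infos: List[FileParsingTypeInfo], file_name: str) -> List[str]:
    """Collect the parsing types of all records whose file name matches."""
    return [parsing_type for name, parsing_type in infos if name == file_name]


def extract_first_type(structured: Dict[str, dict], group: List[str]) -> dict:
    """Return the first type warehousing information if the business type is present."""
    if "business" in group:
        return structured.get("first_type", {})
    return {}


# Illustrative data only; the second file name is made up for the example.
infos = [
    ("a1b2c3d4.json", "business"),
    ("a1b2c3d4.json", "price"),
    ("e5f6g7h8.json", "technology"),
]
structured = {"first_type": {"name": "Zhang"}}

group = target_type_group(infos, "a1b2c3d4.json")
to_warehouse = extract_first_type(structured, group)
print(group, to_warehouse)
```

Filtering by file name before checking for the business type mirrors the order of the second and third steps; an empty dictionary is returned here when the first file parsing type is absent, which is one possible convention.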
Step 304, in response to determining that the second file parsing type exists in the target file parsing type group, extracting the second type of warehousing information in the structured information to be parsed as information to be warehoused.
In some embodiments, the executing entity may extract, in response to determining that the second file parsing type exists in the target file parsing type group, second-type warehousing information in the structured information to be parsed as the information to be warehoused.
As an example, the target file parsing type group may be: [price type, technology type]. The second file parsing type may be: the technology type. The second type of warehousing information may be: ((response value description, "D:\2021010499649784\for details in the invitation document.docx")). Then, the information to be warehoused may be: ((response value description, "D:\2021010499649784\for details in the invitation document.docx")).
Step 305, in response to determining that the third file parsing type exists in the target file parsing type group, extracting the third type of warehousing information in the structured information to be parsed as information to be warehoused.
In some embodiments, the executing entity may extract, in response to determining that a third file parsing type exists in the target file parsing type group, third-type warehousing information in the structured information to be parsed as the information to be warehoused.
As an example, the target file parsing type group may be: [price type, qualification type]. The third file parsing type may be: the price type. The third type of warehousing information may be: ((bid agent, Shanghai xx (group) Co., Ltd.), (non-tax total, 5096025), (tax-containing total, 5096025), (tax-free unit price, 5), (tax-containing total, 5096025)). Then, the information to be warehoused may be: ((bid agent, Shanghai xx (group) Co., Ltd.), (non-tax total, 5096025), (tax-containing total, 5096025), (tax-free unit price, 5), (tax-containing total, 5096025)).
Step 306, in response to determining that the fourth file parsing type exists in the target file parsing type group, extracting fourth type warehousing information in the structured information to be parsed as information to be warehoused.
In some embodiments, the executing entity may extract, in response to determining that a fourth file parsing type exists in the target file parsing type group, fourth type warehousing information in the structured information to be parsed as the information to be warehoused.
As an example, the target file parsing type group may be: [business type, qualification type]. The fourth file parsing type may be: the qualification type. The fourth type of warehousing information may be: ((qualification performance voucher number, 80000000000818632021012413591828)). Then, the information to be warehoused may be: ((qualification performance voucher number, 80000000000818632021012413591828)).
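Steps 303 to 306 follow the same pattern: for each file parsing type present in the target file parsing type group, the matching type of warehousing information is extracted. A minimal Python sketch of that dispatch is given below. The mapping `TYPE_TO_KEY`, the dictionary keys, and the function name are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Hypothetical mapping from file parsing type to the key that holds the matching
# warehousing information inside the structured information to be parsed.
TYPE_TO_KEY = {
    "business": "first_type",
    "technology": "second_type",
    "price": "third_type",
    "qualification": "fourth_type",
}


def build_information_group(structured: Dict[str, dict], group: List[str]) -> List[dict]:
    """Collect one piece of information to be warehoused per parsing type present."""
    information_group = []
    for parsing_type, key in TYPE_TO_KEY.items():
        if parsing_type in group and key in structured:
            information_group.append(structured[key])
    return information_group


structured = {
    "first_type": {"name": "Zhang"},
    "third_type": {"tax-free unit price": 5},
    "fourth_type": {
        "qualification performance voucher number": "80000000000818632021012413591828"
    },
}
result = build_information_group(structured, ["price", "qualification"])
print(result)
```

With the group [price type, qualification type], only the third and fourth types of warehousing information are collected; the first type is skipped because the business type is not in the group.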
Step 307, determining the warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused to obtain a warehousing priority set.
In some embodiments, the execution main body may determine a warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused, to obtain a warehousing priority set. The structured information to be parsed in the structured information set to be parsed further includes: a first total value (e.g., a tax-containing total bid), a second total value (e.g., a tax-free total bid), a first unit value (e.g., a tax-containing unit bid), and a second unit value (e.g., a tax-free unit bid). The first total value may be a tax-containing total bid (e.g., 1060). The second total value may be a tax-free total bid (e.g., 1000). The first unit value may be a tax-containing unit bid (e.g., 10.6). The second unit value may be a tax-free unit bid (e.g., 10). Based on the first total value, the second total value, the first unit value, and the second unit value included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused, the execution main body may determine the warehousing priority of an information group to be warehoused according to the following formula:
$$S_1 = \frac{1}{1 + e^{-\min_{1 \le i \le n} a_i / a}}, \qquad S_2 = \frac{1}{1 + e^{-\max_{1 \le i \le n} b_i / b}},$$
$$S_3 = \frac{1}{1 + e^{-\min_{1 \le i \le n} c_i / c}}, \qquad S_4 = \frac{1}{1 + e^{-\max_{1 \le i \le n} d_i / d}},$$
$$P = w_1 S_1 + w_2 S_2 + w_3 S_3 + w_4 S_4.$$
Here, $S_1$ represents the first total value score. $i$ represents a serial number. $n$ represents the number of information groups to be warehoused in the information group set to be warehoused. $a_i$ represents the first total value included in the structured information to be parsed corresponding to the $i$-th information group to be warehoused in the information group set to be warehoused. $\min_{1 \le i \le n} a_i$ represents the minimum of the first total values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused. $a$ represents the first total value included in the structured information to be parsed corresponding to the information group to be warehoused.
$S_2$ represents the second total value score. $b_i$ represents the second total value included in the structured information to be parsed corresponding to the $i$-th information group to be warehoused in the information group set to be warehoused. $\max_{1 \le i \le n} b_i$ represents the maximum of the second total values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused. $b$ represents the second total value included in the structured information to be parsed corresponding to the information group to be warehoused.
$S_3$ represents the first unit value score. $c_i$ represents the first unit value included in the structured information to be parsed corresponding to the $i$-th information group to be warehoused in the information group set to be warehoused. $\min_{1 \le i \le n} c_i$ represents the minimum of the first unit values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused. $c$ represents the first unit value included in the structured information to be parsed corresponding to the information group to be warehoused.
$S_4$ represents the second unit value score of the information group to be warehoused. $d_i$ represents the second unit value included in the structured information to be parsed corresponding to the $i$-th information group to be warehoused in the information group set to be warehoused. $\max_{1 \le i \le n} d_i$ represents the maximum of the second unit values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused. $d$ represents the second unit value included in the structured information to be parsed corresponding to the information group to be warehoused.
$P$ represents the warehousing priority of the information group to be warehoused. $w_1$ represents a first preset weight, $w_2$ represents a second preset weight, $w_3$ represents a third preset weight, and $w_4$ represents a fourth preset weight; each weight takes a value within a preset range.
As an example, for the information group to be warehoused, the first total value included in the corresponding structured information to be parsed may be 1060, the second total value may be 1000, the first unit value may be 10.6, and the second unit value may be 10. The first total values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused may be [1060, 1545, 1800, 1236, 1016]. The corresponding second total values may be [1000, 1500, 1800, 1200, 900]. The corresponding first unit values may be [10.6, 18.6, 20, 10.3, 9]. The corresponding second unit values may be [10, 18, 20, 10, 8]. The first preset weight may be 0.2. The second preset weight may be 0.4. The third preset weight may be 0.2. The fourth preset weight may be 0.2. The warehousing priority of the information group to be warehoused is then determined according to the formula:
$$0.2 \cdot \frac{1}{1 + e^{-1016/1060}} + 0.4 \cdot \frac{1}{1 + e^{-1800/1000}} + 0.2 \cdot \frac{1}{1 + e^{-9/10.6}} + 0.2 \cdot \frac{1}{1 + e^{-20/10}} \approx 0.804.$$
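The code for determining the warehousing priority of the information group to be warehoused is as follows:
import math

# Rows: first total values, second total values, first unit values, second unit values.
# Column 0 holds the values of the information group whose priority is computed.
L = [[1060, 1545, 1800, 1236, 1016],
     [1000, 1500, 1800, 1200, 900],
     [10.6, 18.6, 20, 10.3, 9],
     [10, 18, 20, 10, 8]]
Q = [0.2, 0.4, 0.2, 0.2]  # preset weights, one per row of L
tmp = 0
for i in range(len(L)):
    if i % 2 == 0:
        # First total / first unit value: minimum over all groups divided by own value.
        tmp += Q[i] * 1 / (1 + math.exp(-(min(L[i]) / L[i][0])))
    else:
        # Second total / second unit value: maximum over all groups divided by own value.
        tmp += Q[i] * 1 / (1 + math.exp(-(max(L[i]) / L[i][0])))
print(tmp)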
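The listing above computes the priority of a single information group (column 0). A warehousing priority set needs one priority per information group, so the same calculation can be wrapped in a function and applied to every column. The following is a sketch under that assumption; the function names `sigmoid` and `warehousing_priorities` are introduced here for illustration and do not appear in the disclosure.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function used as the normalization in the listing above."""
    return 1.0 / (1.0 + math.exp(-x))


def warehousing_priorities(
    values: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """values[k][j] is the k-th value of group j, in the row order
    first total, second total, first unit, second unit."""
    group_count = len(values[0])
    priorities = []
    for j in range(group_count):
        score = 0.0
        for k, row in enumerate(values):
            # Rows 0 and 2 use the minimum, rows 1 and 3 the maximum,
            # matching the even/odd branches of the original listing.
            reference = min(row) if k % 2 == 0 else max(row)
            score += weights[k] * sigmoid(reference / row[j])
        priorities.append(score)
    return priorities


values = [
    [1060, 1545, 1800, 1236, 1016],
    [1000, 1500, 1800, 1200, 900],
    [10.6, 18.6, 20, 10.3, 9],
    [10, 18, 20, 10, 8],
]
weights = [0.2, 0.4, 0.2, 0.2]
priorities = warehousing_priorities(values, weights)
print(priorities)
```

The first entry of `priorities` reproduces the value printed by the original listing; the remaining entries are the priorities of the other four information groups in the example data.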
The formula and related content in step 307 serve as an invention point of the present disclosure and solve the technical problem mentioned in the background art, namely: "the bidding documents are often stored in the database according to a random sequence, so that the bidding documents with high priority cannot be backed up in the database in time, and when data loss occurs, the bidding documents with high priority which should be backed up cannot be taken out from the database, which causes the whole project progress to be affected." The factor that reduces data storage efficiency is often the following: the bidding documents are stored in the database in a random order. Addressing this factor improves data storage efficiency. To this end, first, the present disclosure introduces the first total value (e.g., a tax-containing total price), the second total value (e.g., a tax-free total price), the first unit value (e.g., a tax-containing unit price), and the second unit value (e.g., a tax-free unit price) included in the structured information to be parsed corresponding to the information group to be warehoused. In practice, the priority of a bidding document is often determined by its value score: a bidding document with a high value score also has a high priority. Second, the value score is often composed of four sub-value scores: a first total value score, a second total value score, a first unit value score, and a second unit value score. The lower the first total value and the first unit value included in the information group to be warehoused, the less financial resources the overall project needs to invest in the sub-project corresponding to the information group to be warehoused. 
Therefore, the result obtained by normalizing the ratio of the minimum of the first total values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused to the first total value corresponding to the information group to be warehoused is determined as the first total value score. Likewise, the result obtained by normalizing the ratio of the minimum of the first unit values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused to the first unit value corresponding to the information group to be warehoused is determined as the first unit value score. Next, the larger the second total value and the second unit value included in the information group to be warehoused, the higher the cost performance of the sub-project corresponding to the information group to be warehoused. Therefore, the result obtained by normalizing the ratio of the maximum of the second total values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused to the second total value corresponding to the information group to be warehoused is determined as the second total value score. Likewise, the result obtained by normalizing the ratio of the maximum of the second unit values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused to the second unit value corresponding to the information group to be warehoused is determined as the second unit value score. The normalization operation maps each of the four sub-value scores to a value between 0 and 1, which avoids result deviations caused by large differences in magnitude and makes the weighted summation convenient. In addition, different sub-value scores affect the total value score to different degrees. 
Introducing the first preset weight, the second preset weight, the third preset weight, and the fourth preset weight represents the degree to which each sub-value score influences the value score. The first total value score, the second total value score, the first unit value score, and the second unit value score are then weighted and summed to obtain the warehousing priority, which comprehensively accounts for the influence of the first total value, the second total value, the first unit value, and the second unit value on the value of the bidding document. The information groups to be warehoused in the information group set to be warehoused are then stored to a target database according to the warehousing priority set. In this way, the information to be warehoused corresponding to high-value bidding documents can be stored preferentially. Furthermore, bidding documents with a high priority are backed up first, which reduces the influence of data loss on the progress of the whole project.
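As a worked check of the value ranges, the sub-value scores can be bounded as follows. This assumes that the normalization is the logistic function appearing in the code of step 307 and that all values are positive; the symbols $\sigma$, $r_{\min}$, $r_{\max}$, $x$, and $x_i$ are introduced here for illustration only.

```latex
\sigma(r) = \frac{1}{1 + e^{-r}}, \qquad
r_{\min} = \frac{\min_i x_i}{x} \in (0, 1]
\;\Rightarrow\;
\sigma(r_{\min}) \in \left(\tfrac{1}{2},\ \tfrac{1}{1 + e^{-1}}\right] \approx (0.5,\ 0.731],
\qquad
r_{\max} = \frac{\max_i x_i}{x} \in [1, \infty)
\;\Rightarrow\;
\sigma(r_{\max}) \in \left[\tfrac{1}{1 + e^{-1}},\ 1\right) \approx [0.731,\ 1).
```

Under these assumptions every sub-value score lies strictly between 0 and 1, so a weighted sum whose weights add up to 1 (as with the example weights 0.2, 0.4, 0.2, and 0.2) also lies between 0 and 1.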
Step 308, storing the information groups to be warehoused in the information group set to be warehoused to a target database according to the warehousing priority set.
In some embodiments, the specific implementation manner and technical effects of step 308 may refer to step 205 in those embodiments corresponding to fig. 2, and are not described herein again.
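One way to picture storage according to the warehousing priority set is to sort the information groups by priority and insert them in that order. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the target database. The table name, column names, payload strings, and priority values are hypothetical, and descending order is an assumption based on the statement that high-value information is stored preferentially.

```python
import sqlite3
from typing import List, Tuple


def store_by_priority(
    conn: sqlite3.Connection, groups: List[str], priorities: List[float]
) -> List[Tuple[str, float]]:
    """Insert groups in descending priority order and return rows in insertion order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehoused ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, priority REAL)"
    )
    # Assumption: a larger priority value means the group is stored earlier.
    ordered = sorted(zip(groups, priorities), key=lambda pair: pair[1], reverse=True)
    with conn:  # commit all inserts as one transaction
        conn.executemany(
            "INSERT INTO warehoused (payload, priority) VALUES (?, ?)", ordered
        )
    return conn.execute(
        "SELECT payload, priority FROM warehoused ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
rows = store_by_priority(conn, ["group-a", "group-b", "group-c"], [0.61, 0.80, 0.72])
print(rows)
```

Reading the rows back by their auto-incremented `id` shows the order in which the groups reached the database.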
The above embodiments of the present disclosure have the following advantages: first, the present disclosure introduces the first total value (e.g., a tax-containing total price), the second total value (e.g., a tax-free total price), the first unit value (e.g., a tax-containing unit price), and the second unit value (e.g., a tax-free unit price) included in the structured information to be parsed corresponding to the information group to be warehoused. In practice, the priority of a bidding document is often determined by its value score: a bidding document with a high value score also has a high priority. Second, the value score is often composed of four sub-value scores: a first total value score, a second total value score, a first unit value score, and a second unit value score. The lower the first total value and the first unit value included in the information group to be warehoused, the less financial resources the overall project needs to invest in the sub-project corresponding to the information group to be warehoused. Therefore, the result obtained by normalizing the ratio of the minimum of the first total values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused to the first total value corresponding to the information group to be warehoused is determined as the first total value score. Likewise, the result obtained by normalizing the ratio of the minimum of the first unit values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused to the first unit value corresponding to the information group to be warehoused is determined as the first unit value score. 
Next, the larger the second total value and the second unit value included in the information group to be warehoused, the higher the cost performance of the sub-project corresponding to the information group to be warehoused. Therefore, the result obtained by normalizing the ratio of the maximum of the second total values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused to the second total value corresponding to the information group to be warehoused is determined as the second total value score. Likewise, the result obtained by normalizing the ratio of the maximum of the second unit values included in the structured information to be parsed corresponding to each information group to be warehoused in the information group set to be warehoused to the second unit value corresponding to the information group to be warehoused is determined as the second unit value score. The normalization operation maps each of the four sub-value scores to a value between 0 and 1, which avoids result deviations caused by large differences in magnitude and makes the weighted summation convenient. In addition, different sub-value scores affect the total value score to different degrees. Introducing the first preset weight, the second preset weight, the third preset weight, and the fourth preset weight represents the degree to which each sub-value score influences the value score. The first total value score, the second total value score, the first unit value score, and the second unit value score are then weighted and summed to obtain the warehousing priority, which comprehensively accounts for the influence of the first total value, the second total value, the first unit value, and the second unit value on the value of the bidding document. 
The information groups to be warehoused in the information group set to be warehoused are then stored to a target database according to the warehousing priority set. In this way, the information to be warehoused corresponding to high-value bidding documents can be stored preferentially. Furthermore, bidding documents with a high priority are backed up first, which reduces the influence of data loss on the progress of the whole project.
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a file parsing apparatus, which correspond to those shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 4, the file parsing apparatus 400 of some embodiments includes: an acquisition unit 401, a format conversion unit 402, a parsing unit 403, a determination unit 404, and a storage unit 405. The acquisition unit 401 is configured to acquire a file set to be parsed; the format conversion unit 402 is configured to perform format conversion on each file to be parsed in the file set to be parsed, so as to generate structured information to be parsed and obtain a structured information set to be parsed; the parsing unit 403 is configured to parse each piece of structured information to be parsed in the structured information set to be parsed to generate an information group to be warehoused, so as to obtain an information group set to be warehoused; the determination unit 404 is configured to determine a warehousing priority corresponding to each information group to be warehoused in the information group set to be warehoused, to obtain a warehousing priority set; and the storage unit 405 is configured to store the information groups to be warehoused in the information group set to be warehoused to a target database according to the warehousing priority set.
It will be understood that the elements described in the apparatus 400 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 400 and the units included therein, and will not be described herein again.
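The correspondence between the five units and the method steps can be sketched as a small pipeline in which each unit is a callable. This is only an illustration of the data flow: the class name, field names, and the toy lambdas below are hypothetical and stand in for the real unit logic.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FileParsingApparatus:
    """Hypothetical composition of the five units; each unit is a plain callable."""

    acquire: Callable[[], List[str]]
    convert: Callable[[str], dict]
    parse: Callable[[dict], list]
    prioritize: Callable[[List[list]], List[float]]
    store: Callable[[List[list], List[float]], None]

    def run(self) -> None:
        files = self.acquire()                           # acquisition unit
        structured = [self.convert(f) for f in files]    # format conversion unit
        groups = [self.parse(s) for s in structured]     # parsing unit
        priorities = self.prioritize(groups)             # determination unit
        self.store(groups, priorities)                   # storage unit


stored = []
apparatus = FileParsingApparatus(
    acquire=lambda: ["a.json", "b.json"],
    convert=lambda name: {"name": name},
    parse=lambda info: [info["name"]],
    # Toy priority: length of the file name, used only to produce a number.
    prioritize=lambda groups: [float(len(g[0])) for g in groups],
    store=lambda groups, priorities: stored.extend(zip(groups, priorities)),
)
apparatus.run()
print(stored)
```

Keeping each unit as an injected callable means any one of them can be swapped, for example replacing the toy `prioritize` with a real priority calculation, without changing the order of the steps.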
Referring now to FIG. 5, a block diagram of an electronic device (e.g., computing device 101 of FIG. 1) 500 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in FIG. 5 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 500 may include a processing device (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage device 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a file set to be analyzed; carrying out format conversion on each file to be analyzed in the file set to be analyzed so as to generate structured information to be analyzed and obtain a structured information set to be analyzed; analyzing each piece of structured information to be analyzed in the structured information set to be analyzed to generate an information group to be warehoused, and obtaining an information group set to be warehoused; determining the warehousing priority corresponding to each information group to be warehoused in the information group set to obtain a warehousing priority set; and storing the information groups to be warehoused in the information group set to be warehoused to a target database according to the warehousing priority set.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a format conversion unit, a parsing unit, a determination unit, and a storage unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as a "unit for acquiring a file set to be parsed".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. For example, the above features and (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure are mutually replaced to form the technical solution.