Bidding information processing method, system and readable storage medium
1. A bid information processing method, comprising the steps of:
S1: collecting bidding information from the Internet and acquiring link entries to the detail pages of tender projects;
S2: requesting the detail page link of each bidding announcement, acquiring the detail page of the tender project, parsing the publication time and the specific details of the announcement from the detail page, classifying the announcement according to preset announcement classification rules, and storing the classified announcement in the corresponding category table in a database;
S3: extracting elements of the bidding information, the elements including the project number, the project name, the tenderer, the agent, and the bidder.
2. The bid information processing method according to claim 1, wherein in step S1, the bidding information is collected by a data collection system developed in the Java language and built on the open-source WebMagic framework.
3. The bid information processing method according to claim 2, wherein in step S1, during collection of the bidding information, the summary (list) entry of the projects is located, and the link entries to the bidding announcement detail pages are obtained according to specific jsoup or XPath parsing rules or a regular expression.
4. The bid information processing method according to claim 1, wherein the extracting of the elements of the bid information in step S3 specifically includes:
S31: processing the HTML data of the tender project detail page, the HTML data being converted into text data by the Python third-party library bs4;
S32: splitting the text data into sentences;
S33: obtaining the classification of each character in the sentence-split text data;
S34: obtaining the project number and the project name by matching well-formed category strings for project numbers and project names;
S35: obtaining the tenderer, the agent, and the bidder.
5. The bid information processing method of claim 4, wherein the steps S33-S34 specifically include:
obtaining the classification of each character through a deep learning model of word vectors + bidirectional LSTM + CRF, there being nine classifications, namely B_code, M_code, E_code, S_code, B_name, M_name, E_name, S_name and O, which respectively represent the first character of a number, a middle character of a number, the last character of a number, a single-character number, the first character of a name, a middle character of a name, the last character of a name, a single-character name, and an ordinary character;
and finally obtaining the project number and the project name by matching well-formed category strings for project numbers and project names, such as B_code + M_code × n + E_code and B_name + M_name × n + E_name.
6. The bid information processing method of claim 4, wherein in step S35, obtaining the tenderer, the agent, and the bidder specifically comprises:
first identifying enterprise names through the Python third-party library foolnltk; then, for each company, taking the corresponding context at each position where its name appears; and identifying the classification of the company from that context through a deep learning model (the deep learning model includes but is not limited to word vectors + bidirectional LSTM + softmax), wherein the classification results comprise tenderer, agent, first bid-winning candidate, second bid-winning candidate, third bid-winning candidate, and none.
7. The bid information processing method of claim 1, further comprising:
providing a tender project search interface, receiving search conditions entered by a user, and pushing bidding information according to the search conditions.
8. The bid information processing method of claim 7, wherein the method further comprises:
automatically pushing bidding information according to the search conditions, the push time, and the receiving mode entered by the user.
9. A bid information processing system, comprising:
an entry acquisition module, configured to collect bidding information from the Internet and acquire link entries to the detail pages of tender projects;
a bidding announcement classification module, configured to request the detail page link of each bidding announcement, acquire the detail page of the tender project, parse the publication time and the specific details of the announcement from the detail page, classify the announcement according to preset announcement classification rules, and store the classified announcement in the corresponding category table in a database;
a bid information extraction module, configured to extract elements of the bidding information, the elements including the project number, the project name, the tenderer, the agent, and the bidder.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a bid information processing method program which, when executed by a processor, implements the bid information processing method according to any one of claims 1 to 8.
Background
In the course of their business, enterprises frequently need to process bidding information, and therefore need to check the latest bidding announcements on various bidding websites in real time. This requires dedicated personnel to monitor the websites, but viewing the information manually is inefficient, time-consuming, and labor-intensive. How to acquire bidding information automatically is therefore a problem that urgently needs to be solved.
Disclosure of Invention
In view of the above problems, it is an object of the present invention to provide a bid information processing method, system and readable storage medium, which can automatically and efficiently acquire bid information.
In order to solve the technical problems, the technical scheme of the invention is as follows:
A first aspect of the invention provides a bid information processing method, which comprises the following steps:
S1: collecting bidding information from the Internet and acquiring link entries to the detail pages of tender projects;
S2: requesting the detail page link of each bidding announcement, acquiring the detail page of the tender project, parsing the publication time and the specific details of the announcement from the detail page, and classifying the announcement according to preset announcement classification rules, wherein the category of the announcement comprises any one or more of the following: change announcements, tender announcements, bid-winning information, tender previews, tender Q&A, tender documents, qualification review results, laws and regulations, news, proposed projects, exhibition promotion, and owner procurement; and then storing the announcement in the corresponding category table in an Oracle database;
S3: extracting elements of the bidding information, the elements including the project number, the project name, the tenderer, the agent, and the bidder.
In this scheme, in step S1, the bidding information is collected by a data collection system developed in the Java language and built on the open-source WebMagic framework.
In this scheme, in step S1, during collection of the bidding information, the summary (list) entry of the projects is located, and the link entries to the bidding announcement detail pages are obtained according to specific jsoup or XPath parsing rules or a regular expression.
In this embodiment, in step S3, the extracting elements of the bid information specifically includes:
S31: processing the HTML data of the tender project detail page, the HTML data being converted into text data by the Python third-party library bs4;
S32: splitting the text data into sentences;
S33: obtaining the classification of each character in the sentence-split text data;
S34: obtaining the project number and the project name by matching well-formed category strings for project numbers and project names;
S35: obtaining the tenderer, the agent, and the bidder.
In this embodiment, the steps S33-S34 specifically include:
obtaining the classification of each character through a deep learning model of word vectors + bidirectional LSTM + CRF, there being nine classifications, namely B_code, M_code, E_code, S_code, B_name, M_name, E_name, S_name and O, which respectively represent the first character of a number, a middle character of a number, the last character of a number, a single-character number, the first character of a name, a middle character of a name, the last character of a name, a single-character name, and an ordinary character;
and finally obtaining the project number and the project name by matching well-formed category strings for project numbers and project names, such as B_code + M_code × n + E_code and B_name + M_name × n + E_name.
In this scheme, in step S35, obtaining the tenderer, the agent, and the bidder specifically includes:
first identifying enterprise names through the Python third-party library foolnltk; then, for each company, taking the corresponding context at each position where its name appears; and identifying the classification of the company from that context through a deep learning model (the deep learning model includes but is not limited to word vectors + bidirectional LSTM + softmax), wherein the classification results comprise tenderer, agent, first bid-winning candidate, second bid-winning candidate, third bid-winning candidate, and none.
In this scheme, the method further includes:
providing a tender project search interface, receiving search conditions entered by a user, and pushing bidding information according to the search conditions. For example, projects can be searched comprehensively by conditions such as project keywords, region, announcement publication time, and information category, and advanced search modes such as exact, fuzzy, and intelligent search can be used.
In this scheme, the method further includes:
automatically pushing bidding information according to the search conditions, the push time, and the receiving mode entered by the user.
The second aspect of the present invention also provides a bid information processing system, including:
an entry acquisition module, configured to collect bidding information from the Internet and acquire link entries to the detail pages of tender projects;
a bidding announcement classification module, configured to request the detail page link of each bidding announcement, acquire the detail page of the tender project, parse the publication time and the specific details of the announcement from the detail page, and classify the announcement according to preset announcement classification rules, wherein the category of the announcement comprises any one or more of the following: change announcements, tender announcements, bid-winning information, tender previews, tender Q&A, tender documents, qualification review results, laws and regulations, news, proposed projects, exhibition promotion, and owner procurement; and then to store the announcement in the corresponding category table in an Oracle database;
a bid information extraction module, configured to extract elements of the bidding information, the elements including the project number, the project name, the tenderer, the agent, and the bidder.
A third aspect of the present invention provides a computer-readable storage medium having embodied therein a bid information processing method program, which when executed by a processor implements the bid information processing method.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects: the invention provides a bid information processing method, a bid information processing system, and a readable storage medium, in which bidding information is collected from the Internet and link entries to the detail pages of tender projects are obtained; the detail page link of each bidding announcement is requested, the publication time and the specific details of the announcement are parsed, and the announcement is classified according to preset announcement classification rules; and elements of the bidding information are extracted, including the project number, the project name, the tenderer, the agent, and the bidder. With this method and system, bidding information can be acquired automatically and efficiently.
Drawings
Fig. 1 is a flowchart of a bidding information processing method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a bidding information processing system according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
As shown in fig. 1, the present invention discloses a bid information processing method, comprising the following steps:
S1: collecting bidding information from the Internet and acquiring link entries to the detail pages of tender projects;
S2: requesting the detail page link of each bidding announcement, acquiring the detail page of the tender project, parsing the publication time and the specific details of the announcement from the detail page, and classifying the announcement according to preset announcement classification rules, wherein the category of the announcement comprises any one or more of the following: change announcements, tender announcements, bid-winning information, tender previews, tender Q&A, tender documents, qualification review results, laws and regulations, news, proposed projects, exhibition promotion, and owner procurement; and then storing the announcement in the corresponding category table in an Oracle database;
S3: extracting elements of the bidding information, the elements including the project number, the project name, the tenderer, the agent, and the bidder.
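The classification in step S2 can be illustrated with a minimal Python sketch. The source only states that the rules are preset; the keyword-based rule table below, the keywords themselves, and the function name `classify_announcement` are assumptions made purely for illustration, and only three of the twelve categories are shown.

```python
from typing import Optional

# Hypothetical rule table: the document does not disclose the actual rules.
# Each entry pairs a category with keywords that are assumed to signal it.
# Order matters: the first rule that matches wins.
CATEGORY_RULES = [
    ("change announcement", ["change notice", "amendment"]),
    ("bid-winning information", ["winning bid", "award notice", "bid result"]),
    ("tender announcement", ["tender notice", "invitation to tender"]),
]


def classify_announcement(title: str, body: str = "") -> Optional[str]:
    """Return the first category whose keywords occur in the title or body."""
    text = f"{title} {body}".lower()
    for category, keywords in CATEGORY_RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return None  # no rule matched; the caller decides how to handle this
```

An announcement that matches no rule returns `None` here, so it can be routed to a default table or to manual review rather than being forced into a category.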
In step S1, the bidding information is collected by a data collection system developed in the Java language and built on the open-source WebMagic framework.
It should be noted that WebMagic is an open-source Java vertical crawler framework that aims to simplify crawler development so that developers can concentrate on the business logic. WebMagic adopts a fully modular design that covers the whole crawler life cycle (link extraction, page downloading, content extraction, and persistence), supports multi-threaded and distributed crawling, and supports features such as automatic retry and custom User-Agent/cookies. WebMagic includes page extraction functionality: developers can use CSS selectors, XPath, and regular expressions to extract links and content, and multiple selectors can be chained.
According to the embodiment of the invention, in step S1, during collection of the bidding information, the summary (list) entry of the projects is located, and the link entries to the bidding announcement detail pages are obtained according to specific jsoup or XPath parsing rules or a regular expression.
It should be noted that jsoup is a Java HTML parser that can directly parse a URL or HTML text. It provides a very convenient set of APIs for fetching and manipulating data through DOM, CSS, and jQuery-like methods.
XPath, in full the XML Path Language, is a language for locating information in XML documents. It was originally intended for searching XML documents but applies equally to HTML documents, so XPath can be used to extract the relevant information when building a crawler.
A regular expression describes a pattern for matching character strings. It can be used to check whether a string contains a certain substring, to replace a matching substring, or to extract from a string the substrings that satisfy a certain condition.
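The regular-expression option for obtaining detail page links can be sketched as follows. This is a Python illustration rather than the Java/WebMagic collector described above, and the URL pattern `/detail/<id>.html`, the function name `extract_detail_links`, and the example host are hypothetical; a real site would need its own pattern.

```python
import re
from typing import List
from urllib.parse import urljoin

# Hypothetical pattern: assumes detail pages are linked as ".../detail/<digits>.html".
DETAIL_LINK = re.compile(r'href="([^"]*/detail/\d+\.html)"')


def extract_detail_links(list_html: str, base_url: str) -> List[str]:
    """Return absolute detail-page URLs from a list page, in order, without duplicates."""
    links: List[str] = []
    for href in DETAIL_LINK.findall(list_html):
        url = urljoin(base_url, href)  # resolve relative links against the list page URL
        if url not in links:
            links.append(url)
    return links
```

For pages with irregular markup, a CSS-selector or XPath rule is usually more robust than a regular expression; the regular expression is shown only because it needs no third-party parser.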
According to the embodiment of the present invention, in step S3, the extracting elements of the bid information specifically includes:
S31: processing the HTML data of the tender project detail page, the HTML data being converted into text data by the Python third-party library bs4;
S32: splitting the text data into sentences;
S33: obtaining the classification of each character in the sentence-split text data;
S34: obtaining the project number and the project name by matching well-formed category strings for project numbers and project names;
S35: obtaining the tenderer, the agent, and the bidder.
It should be noted that bs4 is the package name of Beautiful Soup, which provides simple, Pythonic functions for navigating, searching, and modifying the parse tree. It automatically converts input documents to Unicode and output documents to UTF-8.
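Steps S31 and S32 can be sketched as below. To keep the example free of third-party dependencies, it swaps in the standard-library `html.parser` for bs4 (with bs4, `BeautifulSoup(html, "html.parser").get_text()` plays the same role). The sentence terminators chosen and the function names `html_to_text` and `split_sentences` are assumptions, since the source does not specify how sentences are split.

```python
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """S31: convert HTML into plain text, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def split_sentences(text: str) -> List[str]:
    """S32: split on line breaks and on the assumed terminators 。！？；!?;"""
    # "." is deliberately not a terminator, so numbers such as "1.5" stay intact.
    pieces = re.findall(r"[^。！？；!?;\n]+[。！？；!?;]?", text)
    return [piece.strip() for piece in pieces if piece.strip()]
```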
According to the embodiment of the present invention, the steps S33-S34 specifically include:
obtaining the classification of each character through a deep learning model of word vectors + bidirectional LSTM + CRF, there being nine classifications, namely B_code, M_code, E_code, S_code, B_name, M_name, E_name, S_name and O, which respectively represent the first character of a number, a middle character of a number, the last character of a number, a single-character number, the first character of a name, a middle character of a name, the last character of a name, a single-character name, and an ordinary character;
and finally obtaining the project number and the project name by matching well-formed category strings for project numbers and project names, such as B_code + M_code × n + E_code and B_name + M_name × n + E_name.
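The matching in step S34 can be sketched as follows, assuming the per-character labels have already been predicted by the model (the model itself is not reproduced here). The function name `decode_fields`, the one-letter encoding, and the treatment of a lone S_code or S_name as a complete span are illustrative assumptions.

```python
import re
from typing import Dict, List

# Encode each of the nine labels as one letter so that a label sequence
# becomes a string that ordinary regular expressions can scan.
LABEL_TO_CHAR = {
    "B_code": "a", "M_code": "b", "E_code": "c", "S_code": "d",
    "B_name": "e", "M_name": "f", "E_name": "g", "S_name": "h",
    "O": "o",
}
# Well-formed spans: B + M x n + E with n >= 0, or (assumed) a single S label.
CODE_SPAN = re.compile(r"ab*c|d")
NAME_SPAN = re.compile(r"ef*g|h")


def decode_fields(chars: str, labels: List[str]) -> Dict[str, List[str]]:
    """Return the substrings of `chars` whose labels form well-formed spans."""
    if len(chars) != len(labels):
        raise ValueError("exactly one label per character is required")
    encoded = "".join(LABEL_TO_CHAR[label] for label in labels)
    return {
        "code": [chars[m.start():m.end()] for m in CODE_SPAN.finditer(encoded)],
        "name": [chars[m.start():m.end()] for m in NAME_SPAN.finditer(encoded)],
    }
```

Because only well-formed label strings are matched, a malformed prediction such as a B_code that is never closed by an E_code yields no project number instead of a truncated one.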
According to the embodiment of the present invention, in step S35, obtaining the tenderer, the agent, and the bidder specifically includes:
first identifying enterprise names through the Python third-party library foolnltk; then, for each company, taking the corresponding context at each position where its name appears; and identifying the classification of the company from that context through a deep learning model (the deep learning model includes but is not limited to word vectors + bidirectional LSTM + softmax), wherein the classification results comprise tenderer, agent, first bid-winning candidate, second bid-winning candidate, third bid-winning candidate, and none.
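The context-gathering part of step S35 can be sketched as below. The sketch assumes the company names have already been recognized (for example by foolnltk) and covers only the extraction of the surrounding text that would be fed to the classifier; the function name `context_windows` and the window width of 10 characters are hypothetical, as the source gives no window size.

```python
from typing import List, Tuple


def context_windows(text: str, company: str, width: int = 10) -> List[Tuple[str, str]]:
    """Return the (left, right) context of every occurrence of `company` in `text`."""
    if not company:
        return []  # an empty name would match everywhere
    windows: List[Tuple[str, str]] = []
    start = text.find(company)
    while start != -1:
        end = start + len(company)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        windows.append((left, right))
        start = text.find(company, end)  # continue after this occurrence
    return windows
```

A company that appears several times yields several windows, so the same name can be classified as the tenderer in one place and as a bid-winning candidate in another.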
According to an embodiment of the invention, the method further comprises:
providing a tender project search interface, receiving search conditions entered by a user, and pushing bidding information according to the search conditions. For example, projects can be searched comprehensively by conditions such as project keywords, region, announcement publication time, and information category, and advanced search modes such as exact, fuzzy, and intelligent search can be used.
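A conditional search of this kind can be sketched in memory as follows. The `Notice` record, its field names, and the function `search_notices` are hypothetical; in the described system the conditions would more likely be translated into a query against the category tables in the database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Notice:
    title: str
    region: str
    category: str
    published: date


def search_notices(
    notices: Iterable[Notice],
    keyword: Optional[str] = None,
    region: Optional[str] = None,
    category: Optional[str] = None,
    published_from: Optional[date] = None,
    published_to: Optional[date] = None,
) -> List[Notice]:
    """Return notices that satisfy every supplied condition (None means no filter)."""
    results = []
    for notice in notices:
        if keyword and keyword.lower() not in notice.title.lower():
            continue  # fuzzy match: case-insensitive substring of the title
        if region and notice.region != region:
            continue
        if category and notice.category != category:
            continue
        if published_from and notice.published < published_from:
            continue
        if published_to and notice.published > published_to:
            continue
        results.append(notice)
    # Newest first, so the most recent announcements appear at the top.
    return sorted(results, key=lambda n: n.published, reverse=True)
```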
According to an embodiment of the invention, the method further comprises:
automatically pushing bidding information according to the search conditions, the push time, and the receiving mode entered by the user.
As shown in fig. 2, the present invention discloses a bid information processing system, comprising:
an entry acquisition module, configured to collect bidding information from the Internet and acquire link entries to the detail pages of tender projects;
a bidding announcement classification module, configured to request the detail page link of each bidding announcement, acquire the detail page of the tender project, parse the publication time and the specific details of the announcement from the detail page, and classify the announcement according to preset announcement classification rules, wherein the category of the announcement comprises any one or more of the following: change announcements, tender announcements, bid-winning information, tender previews, tender Q&A, tender documents, qualification review results, laws and regulations, news, proposed projects, exhibition promotion, and owner procurement; and then to store the announcement in the corresponding category table in an Oracle database;
a bid information extraction module, configured to extract elements of the bidding information, the elements including the project number, the project name, the tenderer, the agent, and the bidder.
A third aspect of the present invention provides a computer-readable storage medium having embodied therein a bid information processing method program, which when executed by a processor implements the bid information processing method.
The invention discloses a bid information processing method, a bid information processing system, and a readable storage medium, in which bidding information is collected from the Internet and link entries to the detail pages of tender projects are obtained; the detail page link of each bidding announcement is requested, the publication time and the specific details of the announcement are parsed, and the announcement is classified according to preset announcement classification rules; and elements of the bidding information are extracted, including the project number, the project name, the tenderer, the agent, and the bidder. With this method and system, bidding information can be acquired automatically and efficiently.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.