FPGA chip design method for increasing running speed

Document No.: 8227; published: 2021-09-17

1. An FPGA chip design method for increasing running speed, characterized by comprising the following steps:

acquiring an RTL level description file, and converting the RTL level description file into a general circuit;

dividing the general circuit into a plurality of sub-circuits with connection relation, wherein each sub-circuit corresponds to a hardware area in the FPGA chip, and the total logic resource amount of each sub-circuit does not exceed the total logic resource amount in the corresponding hardware area;

performing logic optimization and device mapping on each sub-circuit separately to obtain a sub-netlist corresponding to each sub-circuit, wherein each sub-netlist comprises a plurality of instance modules and the netlist nets between the instance modules, and connection relations exist among the sub-netlists;

and performing packing, placement and routing on the FPGA chip based on each sub-netlist to obtain a global placement and routing result of the FPGA chip, and completing the chip design of the FPGA chip by using the global placement and routing result.

2. The method according to claim 1, wherein the FPGA chip is a multi-die FPGA comprising a plurality of connected FPGA dies, each hardware area of the FPGA chip represents one FPGA die, and the total logic resource amounts of the sub-circuits may be the same or different.

3. The method according to claim 1, wherein the FPGA chip is a single-die FPGA comprising one FPGA die, each hardware area of the FPGA chip represents a different area on the same FPGA die, and the total logic resource amounts of the sub-circuits may be the same or different.

4. The method of claim 3, wherein performing packing, placement and routing on the FPGA chip based on the respective sub-netlists to obtain the global placement and routing result of the FPGA chip comprises:

combining the sub-netlists to form a global netlist for the FPGA die, and performing packing, placement and routing on the FPGA die based on the global netlist to obtain the global placement and routing result.

5. The method of claim 3, wherein performing packing, placement and routing on the FPGA chip based on the respective sub-netlists to obtain the global placement and routing result of the FPGA chip comprises:

performing packing and placement in the respective corresponding hardware areas based on each sub-netlist to obtain corresponding placement sub-results;

combining the placement sub-results to form a global placement result for the FPGA die;

and performing routing on the FPGA die based on the global placement result to obtain the global placement and routing result.

6. The method according to claim 2 or 3, wherein performing packing, placement and routing on the FPGA chip based on each sub-netlist to obtain the global placement and routing result of the FPGA chip comprises:

performing packing, placement and routing in the respective corresponding hardware areas based on each sub-netlist to obtain corresponding placement and routing sub-results;

and combining the placement and routing sub-results to form the global placement and routing result of the FPGA chip.

7. The method according to any one of claims 1-3, wherein dividing the general circuit into a plurality of connected sub-circuits comprises:

determining a hierarchical structure of the general circuit according to the RTL-level description file, wherein the hierarchical structure indicates the relationships among the sub-modules formed by instantiating gate-level elements in the general circuit; proceeding from the top level to the bottom level, each sub-module comprises one or more sub-modules at the next level until the bottom level is reached, and each bottom-level sub-module comprises one or more gate-level elements;

determining the logic resource demand of each sub-module, wherein the logic resource demand of a sub-module is the sum of the logic resource demands of all sub-modules contained in it;

dividing, according to a preset algorithm, each sub-module into a corresponding sub-circuit based on the logic resource demand of each sub-module and the total logic resource amount of each sub-circuit, wherein when a sub-module is divided into a sub-circuit, all sub-modules contained in it are divided into the same sub-circuit, and the total logic resource amount of each sub-circuit is no less than the sum of the logic resource demands of all sub-modules divided into that sub-circuit.

8. The method of claim 7, wherein, when the FPGA chip is a multi-die FPGA, the total amount of cross-circuit signals of each sub-circuit, including the input and output signals connected to other sub-circuits, is no less than the sum of the cross-circuit signal requirements of all sub-modules divided into that sub-circuit.

9. The method of claim 7, wherein dividing each sub-module into a corresponding sub-circuit according to a preset algorithm comprises:

traversing all the sub-modules in sequence according to a preset traversal order based on a greedy algorithm, and dividing each sub-module into a corresponding sub-circuit.

10. The method of claim 9, further comprising:

starting from the topmost sub-module in the hierarchical structure of the general circuit, determining the preset traversal order by using a topological sorting algorithm, wherein in the preset traversal order each sub-module always precedes the next-level sub-modules it contains.

11. The method of claim 10, wherein the topological sorting algorithm used is a BFS (breadth-first search) algorithm.

12. The method of claim 10, wherein the topological sorting algorithm used is a DFS (depth-first search) algorithm.

13. The method of claim 7, wherein dividing each sub-module into a corresponding sub-circuit according to a preset algorithm comprises:

solving a 0-1 knapsack problem for the first sub-circuit over all sub-modules to determine the sub-modules corresponding to the first sub-circuit;

solving a 0-1 knapsack problem for the (i+1)-th sub-circuit over the remaining undivided sub-modules to determine the sub-modules corresponding to the (i+1)-th sub-circuit, wherein i is a parameter with an initial value of 1;

and setting i = i + 1 and repeating the step of solving the 0-1 knapsack problem for the (i+1)-th sub-circuit over the remaining undivided sub-modules until all sub-circuits have been traversed.

14. The method of claim 7, wherein determining the logic resource demand of each sub-module comprises:

adding the logic resource demands of all sub-modules contained in a sub-module to obtain the logic resource demand of that sub-module;

or, flattening the sub-module to determine its logic resource demand.

15. The method of claim 7, wherein each bottom-level sub-module comprises one gate-level element.

Background

The design process of an FPGA (Field Programmable Gate Array) is the process of developing an FPGA chip with EDA (Electronic Design Automation) development software and programming tools. The EDA development flow mainly comprises design entry, logic synthesis, packing, placement and routing, timing analysis and bitstream generation. Logic synthesis converts the input user design (an RTL-level description file) into a device netlist and mainly comprises four steps: reading, translation, optimization and mapping. The first step reads the RTL-level description file; the second converts it into a general circuit that is independent of any specific process; the third optimizes the circuit structure of the general circuit according to the design goals; and the fourth maps the optimized circuit structure onto the target process library of the FPGA chip, selecting suitable library elements to implement it, which yields a device netlist suitable for the FPGA chip.

In logic synthesis, the time consumed by reading (the first step) and translation (the second step) is proportional to the size of the problem being processed, whereas the time consumed by optimization (the third step) and mapping (the fourth step) grows exponentially with the problem size (the number of instances). As integrated circuits keep growing in scale, the overall run time of logic synthesis keeps increasing, which makes the FPGA design flow excessively long.

Disclosure of Invention

To address the above problems and technical requirements, the invention provides an FPGA chip design method for increasing running speed. The technical solution of the invention is as follows:

An FPGA chip design method for increasing running speed comprises the following steps:

obtaining an RTL level description file, and converting the RTL level description file into a general circuit;

dividing the general circuit into a plurality of sub-circuits with connection relation, wherein each sub-circuit corresponds to a hardware area in the FPGA chip, and the total logic resource amount of each sub-circuit does not exceed the total logic resource amount in the corresponding hardware area;

performing logic optimization and device mapping on each sub-circuit separately to obtain a sub-netlist corresponding to each sub-circuit, wherein each sub-netlist comprises a plurality of instance modules and the netlist nets between the instance modules, and connection relations exist among the sub-netlists;

and performing packing, placement and routing on the FPGA chip based on each sub-netlist to obtain a global placement and routing result of the FPGA chip, and completing the chip design of the FPGA chip by using the global placement and routing result.

Further, when the FPGA chip is a multi-die FPGA, it internally comprises a plurality of connected FPGA dies, each hardware area of the FPGA chip represents one FPGA die, and the total logic resource amounts of the sub-circuits may be the same or different.

Further, when the FPGA chip is a single-die FPGA, it internally comprises one FPGA die, each hardware area of the FPGA chip represents a different area on the same FPGA die, and the total logic resource amounts of the sub-circuits may be the same or different.

Further, performing packing, placement and routing on the FPGA chip based on each sub-netlist to obtain the global placement and routing result of the FPGA chip comprises:

combining the sub-netlists to form a global netlist for the FPGA die, and performing packing, placement and routing on the FPGA die based on the global netlist to obtain the global placement and routing result.

Further, performing packing, placement and routing on the FPGA chip based on each sub-netlist to obtain the global placement and routing result of the FPGA chip comprises:

performing packing and placement in the respective corresponding hardware areas based on each sub-netlist to obtain corresponding placement sub-results;

combining the placement sub-results to form a global placement result for the FPGA die;

and performing routing on the FPGA die based on the global placement result to obtain the global placement and routing result.

Further, performing packing, placement and routing on the FPGA chip based on each sub-netlist to obtain the global placement and routing result of the FPGA chip comprises:

performing packing, placement and routing in the respective corresponding hardware areas based on each sub-netlist to obtain corresponding placement and routing sub-results;

and combining the placement and routing sub-results to form the global placement and routing result of the FPGA chip.

Further, dividing the general circuit into a plurality of connected sub-circuits comprises:

determining a hierarchical structure of the general circuit according to the RTL-level description file, wherein the hierarchical structure indicates the relationships among the sub-modules formed by instantiating gate-level elements in the general circuit; proceeding from the top level to the bottom level, each sub-module comprises one or more sub-modules at the next level until the bottom level is reached, and each bottom-level sub-module comprises one or more gate-level elements;

determining the logic resource demand of each sub-module, wherein the logic resource demand of a sub-module is the sum of the logic resource demands of all sub-modules contained in it;

dividing, according to a preset algorithm, each sub-module into a corresponding sub-circuit based on the logic resource demand of each sub-module and the total logic resource amount of each sub-circuit, wherein when a sub-module is divided into a sub-circuit, all sub-modules contained in it are divided into the same sub-circuit, and the total logic resource amount of each sub-circuit is no less than the sum of the logic resource demands of all sub-modules divided into that sub-circuit.

Further, when the FPGA chip is a multi-die FPGA, the total amount of cross-circuit signals of each sub-circuit is no less than the sum of the cross-circuit signal requirements of all sub-modules divided into that sub-circuit, the cross-circuit signals comprising the input signals and output signals connected to other sub-circuits.

Further, dividing each sub-module into a corresponding sub-circuit according to a preset algorithm comprises:

traversing all the sub-modules in sequence according to a preset traversal order based on a greedy algorithm, and dividing each sub-module into a corresponding sub-circuit.

Further, the method also comprises:

starting from the topmost sub-module in the hierarchical structure of the general circuit, determining the preset traversal order by using a topological sorting algorithm, wherein in the preset traversal order each sub-module always precedes the next-level sub-modules it contains.

Further, the topological sorting algorithm used is a BFS algorithm.

Alternatively, the topological sorting algorithm used is a DFS algorithm.
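Because the hierarchy is a tree, any order in which a sub-module appears before the sub-modules it contains is a valid topological order, and both BFS and DFS started from the top-level module produce one. The following minimal Python sketch illustrates this, assuming the hierarchy is available as a hypothetical `HIERARCHY` dictionary that maps each sub-module to its next-level sub-modules; the names A to G reuse the first example hierarchy given in the detailed description.

```python
from collections import deque

# Hypothetical hierarchy: sub-module name -> names of its next-level sub-modules.
HIERARCHY = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": ["D1", "D2"],
    "E": ["E1", "E2"],
    "F": [], "G": [], "D1": [], "D2": [], "E1": [], "E2": [],
}


def bfs_order(tree, top):
    """Level-by-level order: every module is emitted before any of its children."""
    order, queue = [], deque([top])
    while queue:
        module = queue.popleft()
        order.append(module)
        queue.extend(tree[module])
    return order


def dfs_order(tree, top):
    """Pre-order DFS: every module is emitted before the whole subtree below it."""
    order, stack = [], [top]
    while stack:
        module = stack.pop()
        order.append(module)
        # Push children in reverse so that they are visited left to right.
        stack.extend(reversed(tree[module]))
    return order


print(bfs_order(HIERARCHY, "A"))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'D1', 'D2', 'E1', 'E2']
print(dfs_order(HIERARCHY, "A"))
# ['A', 'B', 'D', 'D1', 'D2', 'E', 'E1', 'E2', 'C', 'F', 'G']
```

BFS finishes a whole level before descending, while DFS finishes one branch before moving to the next; either satisfies the parent-before-child requirement, so the choice mainly affects which large sub-modules a greedy pass considers first.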

Further, dividing each sub-module into a corresponding sub-circuit according to a preset algorithm comprises:

solving a 0-1 knapsack problem for the first sub-circuit over all sub-modules to determine the sub-modules corresponding to the first sub-circuit;

solving a 0-1 knapsack problem for the (i+1)-th sub-circuit over the remaining undivided sub-modules to determine the sub-modules corresponding to the (i+1)-th sub-circuit, wherein i is a parameter with an initial value of 1;

and setting i = i + 1 and repeating the step of solving the 0-1 knapsack problem for the (i+1)-th sub-circuit over the remaining undivided sub-modules until all sub-circuits have been traversed.
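The sequential knapsack procedure above can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: it assumes integer logic resource demands, uses each sub-module's demand as both its weight and its value (so that each sub-circuit is filled as fully as possible), and assumes the candidate items are disjoint sub-modules (for example, the sub-modules of one hierarchy level), since a sub-module and a sub-module it contains cannot both be selected. The function names are hypothetical.

```python
def knapsack_01(items, capacity):
    """Choose a subset of (name, demand) items maximizing total demand <= capacity.

    Classic 0-1 knapsack dynamic programming; value is taken equal to demand,
    an assumption made so that each sub-circuit is filled as fully as possible.
    """
    n = len(items)
    # best[i][c]: largest total demand achievable with the first i items within capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, demand) in enumerate(items, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: leave item i out
            if demand <= c:              # option 2: put item i in, if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - demand] + demand)
    # Backtrack through the table to recover which items were chosen.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, demand = items[i - 1]
            chosen.append(name)
            c -= demand
    return chosen[::-1]


def partition_by_knapsack(demands, capacities):
    """Solve one knapsack per sub-circuit (1st, 2nd, ..., (i+1)-th) over unassigned modules."""
    remaining = dict(demands)  # insertion order is preserved (Python 3.7+)
    assignment = []
    for capacity in capacities:
        chosen = knapsack_01(list(remaining.items()), capacity)
        assignment.append(chosen)
        for name in chosen:
            del remaining[name]
    if remaining:
        raise ValueError(f"sub-modules left undivided: {sorted(remaining)}")
    return assignment


print(partition_by_knapsack({"D": 2, "E": 2, "F": 1, "G": 1}, [3, 3]))
# [['D', 'F'], ['E', 'G']]
```

The dynamic program runs in O(n·C) time per sub-circuit, where n is the number of remaining candidates and C is the sub-circuit's capacity, so it is practical when capacities are expressed in moderately sized integer units.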

Further, determining the logic resource demand of each sub-module comprises:

adding the logic resource demands of all sub-modules contained in a sub-module to obtain the logic resource demand of that sub-module;

or, flattening the sub-module to determine its logic resource demand.

Further, each bottom-level sub-module comprises one gate-level element.

The beneficial technical effects of the invention are as follows:

The application discloses an FPGA chip design method for increasing running speed that optimizes the FPGA design flow, in particular the logic synthesis process. In logic synthesis, after the RTL-level description file is converted into a general circuit, the circuit structure is first split, and logic optimization and device mapping are then performed separately on each of the resulting sub-circuits. Instead of processing one large general circuit as a whole, the method processes several small circuit structures, which reduces the overall run time and shortens FPGA design time. The method is particularly suitable for large-scale FPGA chips such as multi-die FPGAs.

Drawings

FIG. 1 is a flowchart of the FPGA chip design method disclosed in the present application.

FIG. 2 is a flowchart of an embodiment in which the method is applied to a multi-die FPGA.

FIG. 3 is a flowchart of an embodiment in which the method is applied to a single-die FPGA.

FIG. 4 is a flowchart of dividing the general circuit into several sub-circuits according to the present application.

Detailed Description

The following further describes the embodiments of the present invention with reference to the drawings.

The application discloses an FPGA chip design method for increasing running speed. Referring to the flowchart shown in FIG. 1, the method comprises the following steps:

Step S1: acquire an RTL-level description file and convert it into a general circuit. Reading and translating the RTL-level description file are the same as the reading and translation operations in the logic synthesis of conventional FPGA chip design, so they are not elaborated here.

Step S2: divide the general circuit into a plurality of connected sub-circuits, each corresponding to a hardware area in the FPGA chip.

The hardware areas may cover the same or different extents and may contain the same or different total amounts of logic resources. The total logic resource amount of each sub-circuit does not exceed the total logic resource amount of its corresponding hardware area, and the total logic resource amounts of the sub-circuits may be the same or different.

By contrast, in the logic synthesis of conventional FPGA chip design, once the general circuit has been formed by translation, logic optimization and device mapping are performed directly on the whole chip.

Step S3: perform logic optimization and device mapping on each sub-circuit separately to obtain a sub-netlist corresponding to each sub-circuit, wherein each sub-netlist comprises a plurality of instance modules and the netlist nets between them, and the sub-netlists are connected with one another.

The specific logic optimization and device mapping operations are the same as in the logic synthesis of conventional FPGA chip design and are not elaborated here. The difference from the conventional flow is that, conventionally, logic optimization and device mapping are performed on the whole large general circuit, whereas here they are performed separately on each small sub-circuit. Because the time for optimization and mapping grows exponentially with circuit scale, the total time to optimize and map all sub-circuits is far shorter than the time to optimize and map the whole general circuit directly, even if the sub-circuits are processed one after another. Moreover, the sub-circuits can be optimized and mapped in parallel, which reduces the run time further.
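The parallel option described above can be sketched with Python's standard `concurrent.futures` module. This is a minimal sketch under stated assumptions: `optimize_and_map` and `SubNetlist` are hypothetical stand-ins for the conventional synthesis step and its output, not part of the described method, and the stub "mapping" merely renames gates.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubNetlist:
    """Hypothetical result of optimizing and mapping one sub-circuit."""
    name: str
    instances: list


def optimize_and_map(sub_circuit):
    """Placeholder for conventional logic optimization plus device mapping.

    A real flow would call the synthesis engine here; this stub only turns each
    gate-level element name into a made-up LUT instance name.
    """
    name, gates = sub_circuit
    return SubNetlist(name, [f"{name}/LUT_{gate}" for gate in gates])


def synthesize_all(sub_circuits, workers=4):
    """Optimize and map every sub-circuit concurrently.

    Executor.map returns results in input order, so result k belongs to sub-circuit k.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(optimize_and_map, sub_circuits))


netlists = synthesize_all([("sc0", ["OR2"]), ("sc1", ["AND2", "INV"])])
print([n.name for n in netlists])  # ['sc0', 'sc1']
```

Threads keep the sketch simple and portable; for CPU-bound optimization written in pure Python, a `ProcessPoolExecutor` or separate synthesis-tool processes would be the more likely choice, because threads share one interpreter lock.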

Step S4: perform packing, placement and routing on the FPGA chip based on each sub-netlist to obtain the global placement and routing result of the FPGA chip, and complete the chip design of the FPGA chip using that result. The global placement and routing result covers the entire FPGA chip; once it is obtained, the chip design is completed by performing operations such as timing analysis and bitstream generation, which are similar to the conventional operations and are not elaborated here.

In the application, the FPGA chip to be designed is either a multi-die FPGA, which comprises a plurality of connected FPGA dies, or a single-die FPGA, which comprises one FPGA die. The specific implementation differs somewhat between the two cases, which are introduced in the following two embodiments:

First embodiment: the FPGA chip is a multi-die FPGA. Referring to the flowchart shown in FIG. 2, the chip design method comprises the following steps:

Step 1-1: obtain an RTL-level description file and convert it into a general circuit.

Step 1-2: divide the general circuit into a plurality of connected sub-circuits, each corresponding to a hardware area in the FPGA chip.

In this embodiment, for ease of implementation, each hardware area of the FPGA chip represents one FPGA die, and the total logic resource amount of each sub-circuit does not exceed that of its corresponding FPGA die. The FPGA dies may contain the same or different total amounts of logic resources, and the total logic resource amounts of the sub-circuits may likewise be the same or different.

It should be noted that, in practice, the hardware areas need not correspond one-to-one to FPGA dies: a hardware area may cover part of one FPGA die or span areas of several FPGA dies. This makes the subsequent operations somewhat harder, but the technical solution of this embodiment still applies.

Step 1-3: perform logic optimization and device mapping on each sub-circuit separately to obtain a sub-netlist corresponding to each sub-circuit.

Step 1-4: perform packing, placement and routing in the corresponding hardware area based on each sub-netlist, that is, on the corresponding FPGA die, to obtain the corresponding placement and routing sub-results. Packing, placement and routing on each FPGA die are the same as the existing operations and are not elaborated here.

Step 1-5: combine all the placement and routing sub-results to form the global placement and routing result of the FPGA chip. Connection relations exist among the sub-circuits and hence among the corresponding sub-netlists, and the FPGA dies in a multi-die FPGA are linked by inherent physical transmission lines. The signal connections between different FPGA dies can therefore be determined from the connection relations among the sub-netlists and the physical transmission lines between the dies, and the sub-results are combined accordingly into the global placement and routing result of the FPGA chip.
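Part of Step 1-5 can be illustrated with a small sketch: binding each signal that crosses between sub-netlists on different dies to a physical die-to-die line. The sketch assumes, hypothetically, that each cross-die connection is a point-to-point net with one source die and one sink die, and that each ordered pair of dies offers a fixed number of unidirectional lines; nets with sinks on several dies and any timing-driven choice of lines are ignored. `assign_cross_die_nets` is an illustrative name.

```python
from collections import defaultdict


def assign_cross_die_nets(cross_nets, channel_capacity):
    """Bind each cross-die net to one physical line between its two dies.

    cross_nets: list of (net_name, source_die, sink_die) tuples, derived from the
        connections between sub-netlists that were placed on different dies.
    channel_capacity: dict mapping (source_die, sink_die) to the number of
        physical lines available in that direction.
    Returns a dict net_name -> (source_die, sink_die, line_index).
    """
    used = defaultdict(int)  # lines already consumed per directed die pair
    binding = {}
    for net, src, dst in cross_nets:
        pair = (src, dst)
        if used[pair] >= channel_capacity.get(pair, 0):
            raise ValueError(f"no free line from die {src} to die {dst} for net {net}")
        binding[net] = (src, dst, used[pair])  # take the next free line index
        used[pair] += 1
    return binding


print(assign_cross_die_nets([("n1", 0, 1), ("n2", 1, 0)], {(0, 1): 2, (1, 0): 1}))
# {'n1': (0, 1, 0), 'n2': (1, 0, 0)}
```

A failure here signals that the division step let too many cross-circuit signals through, which is why the multi-die embodiment also constrains cross-circuit signals during division.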

Step 1-6: complete the chip design of the FPGA chip using the global placement and routing result.

Second embodiment: the FPGA chip is a single-die FPGA. Referring to the flowchart shown in FIG. 3, the chip design method comprises the following steps:

Step 2-1: acquire the RTL-level description file and convert it into a general circuit.

Step 2-2: divide the general circuit into a plurality of connected sub-circuits, each corresponding to a hardware area in the FPGA chip, where each hardware area represents a different area on the same FPGA die. The hardware areas may contain the same or different total amounts of logic resources, and correspondingly the total logic resource amounts of the sub-circuits may be the same or different. Typically, the hardware areas and the sub-circuits are configured with roughly equal total logic resource amounts, so that the time spent on logic optimization and device mapping is roughly balanced across sub-circuits and the total run time stays short.

Step 2-3: perform logic optimization and device mapping on each sub-circuit separately to obtain a sub-netlist corresponding to each sub-circuit.

Step 2-4: perform packing, placement and routing on the FPGA chip based on each sub-netlist to obtain the global placement and routing result of the FPGA chip. For a single-die FPGA, this step can be implemented with any one of the following three methods:

Method one is similar to the multi-die case: packing, placement and routing are performed in the corresponding hardware area based on each sub-netlist to obtain the corresponding placement and routing sub-results, which are then combined to form the global placement and routing result of the FPGA chip.

Although this is feasible in theory, it requires a relatively large amount of work in practice, because packing, placement and routing must be modeled on an underlying resource template. In the multi-die embodiment above, each sub-netlist is packed, placed and routed on its own FPGA die, and the underlying resource template is relatively easy to describe with one FPGA die as the unit. In this embodiment, by contrast, performing placement and routing in one hardware area inside an FPGA die based on a single sub-netlist requires subdividing the resources inside the die and constructing and describing the corresponding underlying resource templates with a single hardware area as the unit.

Method two performs packing and placement in the corresponding hardware area based on each sub-netlist to obtain the corresponding placement sub-result; that is, each hardware area is first packed and placed without routing. The placement sub-results are then combined to form a global placement result for the FPGA die. Finally, routing is performed on the FPGA die based on the global placement result to obtain the global placement and routing result.

As described for method one, the main difficulty of performing packing, placement and routing within a single hardware area of an FPGA die lies in routing. Method two therefore performs only the easy-to-implement packing and placement in each hardware area, skips the difficult per-area routing, and routes the whole FPGA die as a unit after merging, which shortens the overall run time compared with method one.

Method three directly combines the sub-netlists to form a global netlist for the FPGA die, and performs packing, placement and routing on the FPGA die based on the global netlist to obtain the global placement and routing result.
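Method three can be pictured with a minimal sketch of the netlist merge. The sub-netlist format below (a dictionary of instance names and of nets listing `instance.port` pins) is a hypothetical simplification, and the sketch assumes that a net crossing sub-netlists keeps the name it had in the general circuit, so that stitching the sub-netlists together reduces to a union of nets by name. `merge_sub_netlists` is an illustrative name.

```python
def merge_sub_netlists(sub_netlists):
    """Combine named sub-netlists into one global netlist for the FPGA die.

    sub_netlists: dict sub_name -> {"instances": [...], "nets": {net_name: [pins]}},
        where each pin is written "instance.port".
    Instance names are prefixed with their sub-netlist name to keep them unique;
    nets with the same name in different sub-netlists are joined into one net.
    """
    instances, nets = [], {}
    for sub_name, sub in sub_netlists.items():
        instances.extend(f"{sub_name}/{inst}" for inst in sub["instances"])
        for net_name, pins in sub["nets"].items():
            nets.setdefault(net_name, []).extend(f"{sub_name}/{pin}" for pin in pins)
    return {"instances": instances, "nets": nets}


merged = merge_sub_netlists({
    "sc0": {"instances": ["lut0"], "nets": {"n1": ["lut0.O"]}},
    "sc1": {"instances": ["lut1"], "nets": {"n1": ["lut1.I0"], "n2": ["lut1.O"]}},
})
print(merged["nets"]["n1"])  # ['sc0/lut0.O', 'sc1/lut1.I0']
```

The merged netlist is then handed to an ordinary whole-die packing, placement and routing flow, which is what makes this method the simplest of the three to implement.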

Step 2-5: complete the chip design of the FPGA chip using the global placement and routing result.

In either embodiment, the general circuit must be divided into several sub-circuits, and the division method is similar in both. Referring to the flowchart shown in FIG. 4, it comprises the following steps:

Step 3-1: determine the hierarchical structure of the general circuit according to the RTL-level description file. The general circuit is a process-independent circuit structure expressed in gate-level elements, formed before mapping to look-up tables, and its hierarchical structure is defined in the RTL-level description file that was read. The hierarchical structure indicates the relationships among the sub-modules instantiated from gate-level elements in the general circuit; proceeding from the top level to the bottom level, each sub-module comprises one or more sub-modules at the next level until a bottom-level sub-module is reached.

Optionally, each bottom-level sub-module comprises one gate-level element. For example, the hierarchical structure may be as follows:

(a) The top-level design A comprises sub-modules B and C;

(b) Sub-module B comprises sub-modules D and E, and sub-module C comprises sub-modules F and G. Sub-module F comprises the gate-level element OR2 (a two-input OR gate), and sub-module G comprises the gate-level element AND2 (a two-input AND gate).

(c) Sub-module D comprises sub-modules D1 and D2, and sub-module E comprises sub-modules E1 and E2. Sub-module D1 comprises the gate-level element INV (a NOT gate), sub-module D2 comprises AND2, sub-module E1 comprises AND2, and sub-module E2 comprises AND2.

In this example, sub-modules F, G, D1, D2, E1 and E2 are all bottom-level sub-modules, and each comprises one gate-level element.
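The first example hierarchy can be represented as a simple tree. The sketch below uses a hypothetical data structure, not one defined by the application: each `SubModule` holds its next-level sub-modules and, at the bottom level, its gate-level elements.

```python
from dataclasses import dataclass, field


@dataclass
class SubModule:
    """One node of the general circuit's hierarchy (illustrative structure)."""
    name: str
    children: list = field(default_factory=list)  # next-level sub-modules
    gates: list = field(default_factory=list)     # gate-level elements (bottom level only)

    def is_bottom(self):
        return not self.children


def example_hierarchy():
    """Build the first example: A -> B, C; B -> D, E; C -> F, G; D -> D1, D2; E -> E1, E2."""
    d = SubModule("D", [SubModule("D1", gates=["INV"]), SubModule("D2", gates=["AND2"])])
    e = SubModule("E", [SubModule("E1", gates=["AND2"]), SubModule("E2", gates=["AND2"])])
    b = SubModule("B", [d, e])
    c = SubModule("C", [SubModule("F", gates=["OR2"]), SubModule("G", gates=["AND2"])])
    return SubModule("A", [b, c])


def bottom_modules(module):
    """Return all bottom-level sub-modules under `module`, left to right."""
    if module.is_bottom():
        return [module]
    return [leaf for child in module.children for leaf in bottom_modules(child)]


print([m.name for m in bottom_modules(example_hierarchy())])
# ['D1', 'D2', 'E1', 'E2', 'F', 'G']
```

The second example hierarchy fits the same structure: D and E simply become bottom-level nodes whose `gates` lists hold several elements.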

Optionally, a bottom-level sub-module comprises a plurality of gate-level elements. Based on the example above, the hierarchical structure may instead be as follows:

(a) The top-level design A comprises sub-modules B and C;

(b) Sub-module B comprises sub-modules D and E, and sub-module C comprises sub-modules F and G. Sub-module D comprises one INV and one AND2, sub-module E comprises two AND2, sub-module F comprises OR2, and sub-module G comprises AND2.

In this example, sub-modules D, E, F and G are all bottom-level sub-modules: F and G each comprise one gate-level element, while D and E each comprise several gate-level elements.

As can be seen, which sub-modules in the hierarchy of the general circuit count as bottom-level can be configured freely, and the bottom level may contain sub-modules with only one gate-level element, sub-modules with several gate-level elements, or both.

Step 3-2: determine the logic resource demand of each sub-module, which is the sum of the logic resource demands of all sub-modules it contains. In the first example hierarchy above, sub-module B contains sub-modules D and E, so the logic resource demand of B is the sum of those of D and E; sub-module D in turn contains D1 and D2, so its demand is the sum of those of D1 and D2; and so on.

The logic resource demand of a sub-module can be determined in either of two ways: by adding the logic resource demands of all sub-modules it contains, or by directly flattening the sub-module and determining its logic resource demand from the result.
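The two ways of determining the demand can be compared in a short sketch. It assumes, hypothetically, that every gate-level element of the first example hierarchy costs one logic resource unit (`GATE_COST` is a placeholder; real costs depend on the target device and on mapping), and it checks that summation over children and flattening give the same result.

```python
# Hypothetical hierarchy (first example in the text): next-level sub-modules.
CHILDREN = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
    "D": ["D1", "D2"], "E": ["E1", "E2"],
}
# Gate-level elements held by each bottom-level sub-module.
GATES = {"D1": ["INV"], "D2": ["AND2"], "E1": ["AND2"], "E2": ["AND2"],
         "F": ["OR2"], "G": ["AND2"]}
# Placeholder cost, in logic resource units, of each gate-level element type.
GATE_COST = {"INV": 1, "AND2": 1, "OR2": 1}


def demand_by_summation(module, memo=None):
    """Way 1: a sub-module's demand is the sum of its children's demands."""
    if memo is None:
        memo = {}
    if module not in memo:
        if module in CHILDREN:
            memo[module] = sum(demand_by_summation(c, memo) for c in CHILDREN[module])
        else:  # bottom level: cost the gate-level elements directly
            memo[module] = sum(GATE_COST[g] for g in GATES[module])
    return memo[module]


def flatten(module):
    """Collect every gate-level element below `module`, discarding the hierarchy."""
    if module in CHILDREN:
        return [g for c in CHILDREN[module] for g in flatten(c)]
    return list(GATES[module])


def demand_by_flattening(module):
    """Way 2: flatten the sub-module and cost its gate-level elements in one go."""
    return sum(GATE_COST[g] for g in flatten(module))


print(demand_by_summation("A"), demand_by_summation("B"), demand_by_flattening("D"))  # 6 4 2
```

Memoizing the summation lets one bottom-up pass cover every sub-module, whereas flattening each sub-module independently repeats work for nested sub-modules but needs no previously computed child demands.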

Step 3-3: divide the sub-modules into the corresponding sub-circuits using a predetermined algorithm, based on the logic resource demand of each sub-module and the total logic resource amount of each sub-circuit.

When a sub-module is divided into a sub-circuit, all sub-modules it contains are divided into the same sub-circuit. For example, because sub-module B includes sub-modules D and E, dividing sub-module B into a sub-circuit also divides sub-module D with all of its sub-modules, and sub-module E with all of its sub-modules, into that sub-circuit. In other words, in the tree-shaped hierarchy formed by the general circuit, the entire branch rooted at sub-module B is divided into the same sub-circuit.
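The branch rule can be sketched as a subtree walk. `subtree`, `assign` and the sub-circuit label `SC1` are hypothetical names used only for illustration.

```python
CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}


def subtree(module):
    """Return module plus every sub-module beneath it (its whole branch)."""
    result = [module]
    for child in CHILDREN.get(module, []):
        result.extend(subtree(child))
    return result


def assign(module, circuit, assignment):
    """Divide module into circuit; all of its descendants follow it."""
    for m in subtree(module):
        assignment[m] = circuit


assignment = {}
assign("B", "SC1", assignment)
print(assignment)  # {'B': 'SC1', 'D': 'SC1', 'E': 'SC1'}
```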

In both the multi-die FPGA embodiment and the single-die FPGA embodiment, the total amount of logic resources of each sub-circuit must be no less than the sum of the logic resource demands of all sub-modules divided into that sub-circuit. That is, a sub-module can be divided into a sub-circuit only if the remaining logic resources of the sub-circuit are greater than or equal to the logic resource demand of the sub-module; otherwise, the sub-module cannot be divided into that sub-circuit.
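A minimal sketch of this logic resource check, assuming each sub-circuit tracks its capacity and the resources already consumed; the `SubCircuit` class is hypothetical and the numbers are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class SubCircuit:
    name: str
    capacity: int                 # total logic resources of the hardware area
    used: int = 0                 # resources taken by sub-modules already divided in
    modules: list = field(default_factory=list)

    @property
    def remaining(self):
        return self.capacity - self.used

    def try_place(self, module, demand):
        """Place module only if remaining resources >= its demand."""
        if self.remaining < demand:
            return False
        self.used += demand
        self.modules.append(module)
        return True


sc = SubCircuit("SC1", capacity=5)
print(sc.try_place("B", 4))  # True  -> 1 resource left
print(sc.try_place("C", 2))  # False -> needs 2, only 1 remains
print(sc.remaining)          # 1
```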

In the multi-die FPGA embodiment, in addition to the logic resource constraint above, the total number of cross-circuit signals of each sub-circuit must be no less than the sum of the cross-circuit signal requirements of all sub-modules divided into that sub-circuit, where cross-circuit signals are the input signals and output signals connected to other sub-circuits. The reason is that cross-circuit signals between sub-circuits must be carried by physical transmission lines between the corresponding hardware areas. In the multi-die FPGA embodiment these are the connection channels between different FPGA dies, and because the number of such channels is limited, the total number of cross-circuit signals between sub-circuits is also limited and cannot exceed the number of connection channels between the corresponding FPGA dies. Therefore, the sum of the cross-circuit signal requirements of the sub-modules divided into a sub-circuit must be less than or equal to the total cross-circuit signal amount of that sub-circuit, and this must hold for inputs and outputs separately: the sum of the input signal requirements of those sub-modules must not exceed the total input signal amount of the sub-circuit, and the sum of their output signal requirements must not exceed the total output signal amount of the sub-circuit.
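For the multi-die case, the check extends to three separate budgets. The sketch below assumes each sub-module's requirements and each die's budget are given as (logic, inputs, outputs) triples and, following the text, treats them as additive; the `Need` type, the `fits` function and the budget values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Need:
    logic: int    # logic resource demand
    inputs: int   # cross-circuit input signals
    outputs: int  # cross-circuit output signals


def fits(capacity: Need, placed: list, candidate: Need) -> bool:
    """Check the logic, input and output budgets separately, since the
    die-to-die connection channels of a multi-die FPGA are limited."""
    total = [candidate] + placed
    return (sum(n.logic for n in total) <= capacity.logic
            and sum(n.inputs for n in total) <= capacity.inputs
            and sum(n.outputs for n in total) <= capacity.outputs)


die0 = Need(logic=100, inputs=8, outputs=8)  # hypothetical die budget
placed = [Need(logic=40, inputs=5, outputs=2)]
print(fits(die0, placed, Need(30, 2, 3)))  # True
print(fits(die0, placed, Need(30, 4, 1)))  # False: 5 + 4 = 9 inputs > 8
```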

(1) In one embodiment, the predetermined algorithm is a greedy algorithm: all sub-modules are traversed in a predetermined traversal order, and each sub-module is greedily divided into a corresponding sub-circuit.

In this embodiment, the traversal order of the sub-modules must be determined first. A topological sorting algorithm is used to determine the predetermined traversal order, starting from the top-level sub-module in the hierarchy of the general circuit, so that each sub-module always precedes the next-level sub-modules it contains. For example, since sub-module B contains sub-modules D and E, sub-module B always precedes D and E in the traversal order.

Optionally, the topological sorting algorithm is a BFS (Breadth-First Search) algorithm; alternatively, it is a DFS (Depth-First Search) algorithm.
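The greedy embodiment might be sketched as follows. The source does not say how a target sub-circuit is chosen, so this sketch assumes first-fit placement, and it assumes that a sub-module which fits nowhere is left for its children, which appear later in the BFS order. Demands reuse the one-unit-per-gate assumption; `greedy_partition` and the capacities are hypothetical.

```python
from collections import deque

CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
DEMAND = {"A": 6, "B": 4, "C": 2, "D": 2, "E": 2, "F": 1, "G": 1}


def bfs_order(top):
    """Topological order over the hierarchy: every parent precedes its children."""
    order, queue = [], deque([top])
    while queue:
        m = queue.popleft()
        order.append(m)
        queue.extend(CHILDREN.get(m, []))
    return order


def subtree(m):
    """m together with its whole branch."""
    return [m] + [d for c in CHILDREN.get(m, []) for d in subtree(c)]


def greedy_partition(top, capacities):
    """First-fit greedy: put each still-unassigned module into the first
    sub-circuit with enough remaining resources; its whole branch follows.
    A module that fits nowhere is skipped so its children can be tried."""
    remaining = dict(capacities)
    assignment = {}
    for m in bfs_order(top):
        if m in assignment:
            continue  # already placed together with an ancestor
        target = next((sc for sc, cap in remaining.items() if cap >= DEMAND[m]), None)
        if target is not None:
            remaining[target] -= DEMAND[m]
            for d in subtree(m):
                assignment[d] = target
        elif not CHILDREN.get(m):
            raise ValueError(f"bottom-level module {m} fits in no sub-circuit")
    return assignment


print(greedy_partition("A", {"SC1": 4, "SC2": 3}))
```

With capacities of 4 and 3, A fits nowhere, so B with its branch lands in SC1 and C with its branch in SC2. A DFS preorder would satisfy the parent-before-child requirement equally well.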

(2) In another embodiment, the predetermined algorithm is a 0-1 knapsack algorithm. First, the 0-1 knapsack problem for the first sub-circuit is solved over all sub-modules to determine the sub-modules corresponding to the first sub-circuit. Next, the 0-1 knapsack problem for the (i+1)-th sub-circuit is solved over the remaining undivided sub-modules to determine the sub-modules corresponding to the (i+1)-th sub-circuit, where i is a counter whose initial value is 1. Then i is set to i+1, and the step of solving the 0-1 knapsack problem for the (i+1)-th sub-circuit over the remaining undivided sub-modules is repeated until all sub-circuits have been traversed.
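The sequential 0-1 knapsack embodiment can be sketched with the textbook dynamic-programming solver. Two assumptions are made that the source does not state: the candidate items are non-overlapping sub-modules (here the bottom-level D, E, F and G, so no demand is counted twice), and each item's value equals its logic demand, so each knapsack maximizes how full its sub-circuit becomes. All names are hypothetical.

```python
def knapsack_01(items, capacity):
    """Classic 0-1 knapsack by dynamic programming.
    items: list of (name, weight); value is taken equal to weight (an
    assumption), so the solver maximizes the resources used."""
    n = len(items)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, w) in enumerate(items, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - w] + w)
    # Walk back through the table to recover the chosen items.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, w = items[i - 1]
            chosen.append(name)
            c -= w
    return chosen[::-1]


def sequential_knapsack(modules, capacities):
    """Solve one knapsack per sub-circuit (i = 1, 2, ...), each time over
    only the sub-modules not yet divided."""
    remaining = list(modules)
    result = {}
    for sc, cap in capacities:
        picked = knapsack_01(remaining, cap)
        result[sc] = picked
        remaining = [m for m in remaining if m[0] not in picked]
    return result, [m[0] for m in remaining]


mods = [("D", 2), ("E", 2), ("F", 1), ("G", 1)]
print(sequential_knapsack(mods, [("SC1", 3), ("SC2", 3)]))
# ({'SC1': ['D', 'F'], 'SC2': ['E', 'G']}, [])
```

The first knapsack picks D and F to fill SC1 exactly; E and G then fill SC2. Any names left in the second return value could not be placed under the given capacities.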

What has been described above is only a preferred embodiment of the present application, and the present invention is not limited to the above embodiment. It is to be understood that other modifications and variations directly derivable or suggested by those skilled in the art without departing from the spirit and concept of the present invention are to be considered as included within the scope of the present invention.
