Pipeline processor based on asynchronous single rail
1. An asynchronous single-rail based pipeline processor, comprising: an asynchronous control module, an instruction fetching module, a decoding module, an execution module, an adaptive selection module, a memory access module, a write-back module, a storage module, a control and status register, and a general-purpose register, wherein data communication among the modules is completed through asynchronous single-rail handshaking, the asynchronous control module comprises a plurality of control units, the control units are phase decoupling Click units, and the phase decoupling Click units are cascaded with one another through handshaking and are each connected to a corresponding pipeline.
2. The asynchronous single-rail based pipeline processor as recited in claim 1, wherein a Click signal for controlling the current-stage pipeline is generated after a successful handshake between the phase decoupling Click units.
3. The asynchronous single-rail based pipeline processor as recited in claim 1, wherein a first-stage pipeline comprises the instruction fetching module, a second-stage pipeline comprises the decoding module, a third-stage pipeline comprises the execution module, a fourth-stage pipeline comprises the memory access module, and a fifth-stage pipeline comprises the write-back module.
4. The asynchronous single-rail based pipeline processor as recited in claim 1, wherein the phase decoupling Click unit generates a Click signal that causes a program counter to calculate an instruction address according to a control signal and to transmit the instruction address to the instruction fetching module, and simultaneously sends a request signal to a next-stage phase decoupling Click unit, wherein the control signal is an electrical signal comprising jump, exception, and interrupt, and the request signal is an electrical signal sent by a previous-stage phase decoupling Click unit to the next-stage phase decoupling Click unit to request the next-stage phase decoupling Click unit to operate.
5. The asynchronous single-rail based pipeline processor as recited in claim 4, wherein the instruction fetching module reads an instruction from the storage module according to the instruction address and transmits the instruction to the decoding module.
6. The asynchronous single-rail based pipeline processor as recited in claim 5, wherein the decoding module decodes the instruction, reads data to be processed that is associated with the instruction from the control and status register and the general-purpose register, and transmits the data to be processed to the execution module.
7. The asynchronous single-rail based pipeline processor as recited in claim 6, wherein the execution module executes a corresponding operation according to the data to be processed and, after obtaining an operation result, transmits the operation result to the memory access module.
8. The asynchronous single-rail based pipeline processor as recited in claim 7, wherein the memory access module performs read and write operations on the storage module according to the operation result.
9. The asynchronous single-rail based pipeline processor as recited in claim 1, wherein the execution module comprises a predictive flush module, a jump module, and a bypass module, wherein the predictive flush module flushes incorrectly predicted instructions, the jump module generates a jump address according to a jump signal and an operation result and returns the jump address to a program counter, and the bypass module obtains needed data in advance from a subsequent module according to a register address.
10. The asynchronous single-rail based pipeline processor as recited in claim 1, wherein the adaptive selection module obtains an operation result from the execution module and transmits the operation result to the memory access module and the write-back module according to a control signal; the write-back module receives the operation results transmitted by the execution module and the memory access module and writes the operation results back to a register; and the storage module comprises an instruction storage module for storing received instructions and a data storage module for storing received data.
Background
With the rapid development of the Internet of Things and artificial intelligence, system-on-chip (SoC) technology continues to mature, and most existing chips integrate their own processors. Processors therefore play an important role in electronic technology, and processor design has attracted extensive attention. A processor is roughly composed of an arithmetic logic unit, a register unit, and a control unit. These units are all built from a large number of registers, and data processing instructions operate only on the registers. With a global clock, the operation speed and execution efficiency are high, but the registers toggle continuously with the clock, which consumes more energy and adds extra power consumption. In addition, most processors are designed as synchronous circuits, in which global clock skew is a serious problem and a complex clock tree network is required; the clock tree is difficult to design and occupies considerable chip area and power. Meanwhile, in a synchronous circuit all paths work under the same clock. To ensure that all logic operations complete within one clock cycle, the clock frequency is limited by the delay of the critical path in the circuit, which constrains every other path as well. Because the critical path is difficult to optimize, the clock frequency is difficult to raise, and the performance of the whole processor is limited. The prior art therefore suffers from high power consumption, serious global clock skew, and a clock frequency that limits the speed of the pipeline processor.
Disclosure of Invention
The embodiment provides an asynchronous single-rail based pipeline processor to solve the problems of high power consumption, serious global clock skew, and speed limited by the clock frequency in pipeline processors of the related art.
The asynchronous single-rail based pipeline processor disclosed in the application includes: an asynchronous control module, an instruction fetching module, a decoding module, an execution module, an adaptive selection module, a memory access module, a write-back module, a storage module, a control and status register, and a general-purpose register, wherein data communication among the modules is completed through asynchronous single-rail handshaking, the asynchronous control module comprises a plurality of control units, the control units are phase decoupling Click units, and the phase decoupling Click units are cascaded with one another through handshaking and are each connected to a corresponding pipeline.
A Click signal for controlling the current-stage pipeline is generated after the handshake between the phase decoupling Click units succeeds.
The first-stage pipeline comprises the instruction fetching module, the second-stage pipeline comprises the decoding module, the third-stage pipeline comprises the execution module, the fourth-stage pipeline comprises the memory access module, and the fifth-stage pipeline comprises the write-back module.
The phase decoupling Click unit generates a Click signal, upon which a program counter calculates an instruction address according to a control signal and transmits the instruction address to the instruction fetching module; at the same time, a request signal is sent to the next-stage phase decoupling Click unit. The control signal comprises jump, exception, and interrupt, and the request signal is an electrical signal sent by the previous-stage phase decoupling Click unit to the next-stage phase decoupling Click unit to request the next-stage phase decoupling Click unit to work.
The instruction fetching module reads an instruction from the storage module according to the instruction address and transmits the instruction to the decoding module.
The decoding module decodes the instruction, reads data to be processed that is related to the instruction from the control and status register and the general-purpose register, and transmits the data to be processed to the execution module.
The execution module executes a corresponding operation according to the data to be processed and, after obtaining the operation result, transmits the operation result to the memory access module.
The memory access module performs read and write operations on the storage module according to the operation result.
The execution module comprises a predictive flush module, a jump module, and a bypass module, wherein the predictive flush module flushes incorrectly predicted instructions, the jump module generates a jump address according to a jump signal and an operation result and returns the jump address to the program counter, and the bypass module obtains needed data in advance from a subsequent module according to a register address.
The adaptive selection module obtains the operation result from the execution module and transmits the operation result to the memory access module and the write-back module according to the control signal. The write-back module receives the operation results transmitted by the execution module and the memory access module and writes the operation results back to the register. The storage module comprises an instruction storage module for storing received instructions and a data storage module for storing received data.
Compared with the related art, the asynchronous single-rail based pipeline processor provided in the embodiment solves the problems of high power consumption, serious global clock skew, and speed limited by the clock frequency in pipeline processors of the related art, and allows the pipeline to run at high speed and low power consumption without a clock.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an asynchronous single-rail based pipeline processor according to the present application;
FIG. 2 is a logic diagram of a phase decoupling Click unit according to an embodiment of the present application;
FIG. 3 is a logic diagram of the control unit C_EX2MEM according to an embodiment of the present application;
FIG. 4 is a logic diagram of the control unit C_MEM2WB according to an embodiment of the present application;
FIG. 5 is a logic diagram of a first-stage control unit of the asynchronous control module in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in the present application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
In the present embodiment, an asynchronous single-rail based pipeline processor is provided. Fig. 1 is a schematic structural diagram of the asynchronous single-rail based pipeline processor according to the present application. The processor shown in Fig. 1 includes: an asynchronous control module, an instruction fetching module, a decoding module, an execution module, an adaptive selection module, a memory access module, a write-back module, a storage module, a control and status register, and a general-purpose register, wherein data communication among the modules is completed through asynchronous single-rail handshaking, the asynchronous control module comprises a plurality of control units, the control units are phase decoupling Click units, and the phase decoupling Click units are cascaded with one another through handshaking and are each connected to a corresponding pipeline. In this embodiment, the instruction fetching module corresponds to the first-stage pipeline 20, the decoding module corresponds to the second-stage pipeline 30, the execution module corresponds to the third-stage pipeline 40, the memory access module corresponds to the fourth-stage pipeline 50, and the write-back module corresponds to the fifth-stage pipeline 60.
In the embodiment, after the handshake between the phase decoupling Click units succeeds, a Click signal for controlling the current-stage pipeline is generated. The phase decoupling Click units generate the Click signal through a handshake of "request" and "response" signals. The Click signal replaces the global clock of a synchronous circuit in controlling the current-stage pipeline; it is an event-driven clock, so the phase decoupling Click units form a circuit without a global clock. Fig. 1 shows the connection between the first control unit 101 and the first-stage pipeline 20, which work together as follows: the first control unit 101 generates a Click signal for the program counter according to an enable signal for processor operation; the program counter then operates once, calculates the next instruction address according to control signals such as jump, exception, and interrupt, and transmits the next instruction address to the first-stage pipeline 20. While generating the Click signal of the current stage, the control unit also generates a request signal for the control unit of the next stage. The control signal is an electrical signal comprising jump, exception, and interrupt, and the request signal is an electrical signal sent by the previous-stage phase decoupling Click unit to the next-stage phase decoupling Click unit to request the next-stage phase decoupling Click unit to work.
Request signals are transmitted between adjacent upstream and downstream control units as follows: the upstream control unit (for example, the first control unit) sends a request signal to the adjacent downstream control unit (for example, the second control unit). The request signal is an electrical signal that enables the next-stage control unit to make its corresponding pipeline work. After the next-stage control unit receives the request signal sent by the previous-stage control unit, the pipeline connected to the next-stage control unit starts to work, and the next-stage control unit sends a response signal to the previous-stage control unit to inform it that the current-stage control unit has completed its control task. This embodiment demonstrates only a simple five-stage pipeline structure; the content of the pipeline can be changed if a practical application requires. Compared with the prior art, the clock signal generated by the Click module (control unit) in the asynchronous single-rail circuit according to the states of the request and response signals replaces the global clock of a synchronous circuit, so no external clock input is needed. A complex clock network is therefore unnecessary, which avoids a clock tree network that occupies a large amount of chip area and increases power consumption. The circuit is also simple to design, and it can improve the running speed of the processor and reduce power consumption.
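By way of illustration only, the request and response exchange between two adjacent control units can be sketched in Python as follows. The sketch assumes transition (two-phase) signalling, in which a new request or response is indicated by inverting a wire; the class name `Channel` and its methods are hypothetical and do not appear in the present application.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Hypothetical two-phase request/response channel between two control units."""
    req: int = 0  # driven by the previous-stage (upstream) control unit
    ack: int = 0  # driven by the next-stage (downstream) control unit

    def pending(self) -> bool:
        # A request is outstanding while the two wires differ.
        return self.req != self.ack

    def send_request(self) -> None:
        # The upstream unit may only issue a request when the channel is idle.
        if self.pending():
            raise RuntimeError("previous request not yet acknowledged")
        self.req ^= 1

    def send_response(self) -> None:
        # The downstream unit acknowledges by inverting the response wire.
        if not self.pending():
            raise RuntimeError("no request to acknowledge")
        self.ack ^= 1


channel = Channel()
channel.send_request()              # first control unit requests the second
request_seen = channel.pending()    # second control unit observes the request
channel.send_response()             # second control unit responds
idle_again = not channel.pending()  # channel is ready for the next request
```

In this model no clock is involved: each unit acts only when the state of the two wires shows that there is something to do.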
It should be noted that, before the processor starts to operate, an initialization signal is required to initialize it, which ensures that the data in each internal module, the memory, and the registers are in an initial state. After initialization, the asynchronous pipeline is in a stalled state. When the enable signal for processor operation is at a high level, the asynchronous single-rail pipeline processor starts to operate.
In this embodiment, the first to fifth control units refer to different phase decoupling Click units, named as follows: the first control unit is the phase decoupling Click unit C_PC, the second control unit is the phase decoupling Click unit C_IF2ID, the third control unit is the phase decoupling Click unit C_ID2EX, the fourth control unit is the phase decoupling Click unit C_EX2MEM, and the fifth control unit is the phase decoupling Click unit C_MEM2WB. Correspondingly, the five control units respectively control five groups of registers: the first control unit C_PC controls the PC register, the second control unit C_IF2ID controls the IF2ID register, the third control unit C_ID2EX controls the ID2EX register, the fourth control unit C_EX2MEM controls the EX2MEM register, and the fifth control unit C_MEM2WB controls the CSR status register and the general-purpose register. This correspondence is only one specific embodiment and does not mean that no other correspondence is possible. In the above embodiment, after the instruction fetching module (the first-stage pipeline 20) receives the instruction address, the subsequent modules of the processor operate as follows: the instruction fetching module reads the instruction from the storage module according to the instruction address and transmits the instruction to the decoding module. Meanwhile, the control unit C_IF2ID receives the request signal generated by the C_PC unit and the response signal returned by the C_ID2EX unit, generates a Click signal for controlling the current-stage pipeline after the handshake succeeds, and generates a request signal for the next-stage pipeline and a response signal returned to the previous-stage pipeline.
The decoding module (the second-stage pipeline 30) decodes the instruction, reads data to be processed that is related to the instruction from the control and status register and the general-purpose register, and transmits the data to be processed to the execution module. Meanwhile, the control unit C_ID2EX receives the request signal generated by the C_IF2ID unit and the response signal returned by the C_EX2MEM unit, generates a Click signal for controlling the operation of the current-stage pipeline after the handshake succeeds, and generates a request signal for the next-stage pipeline and a response signal returned to the previous-stage pipeline. The execution module (the third-stage pipeline 40) executes a corresponding operation according to the data to be processed, obtains an operation result, and transmits the operation result to the memory access module. Meanwhile, the control unit C_EX2MEM receives the request signal generated by the C_ID2EX unit and the response signal returned by the C_MEM2WB unit, generates a Click signal for controlling the operation of the current-stage pipeline after the handshake succeeds, and generates a request signal for the next-stage pipeline and a response signal returned to the previous-stage pipeline. The memory access module (the fourth-stage pipeline 50) performs read and write operations on the storage module according to the operation result. The adaptive selection module obtains the operation result from the execution module and transmits the operation result to the memory access module and the write-back module according to the control signal. The write-back module (the fifth-stage pipeline 60) receives the operation results transmitted by the execution module and the memory access module and writes the operation results back to the register.
The control units C_Wreg and C_Wcsr are write-back control units that generate the Click signals for writing back to the CSR status register and the general-purpose register. Meanwhile, the control unit C_MEM2WB receives the two request signals generated by the C_EX2MEM unit and the response signals returned by the C_Wreg and C_Wcsr units, generates a Click signal for controlling the operation of the current-stage pipeline after the handshake succeeds, and simultaneously generates a request signal for the next-stage pipeline and a response signal returned to the previous-stage pipeline. Specifically, the control units C_Wreg and C_Wcsr are the control units that control the CSR register and the general-purpose register, and they are implemented as phase decoupling Click units. The storage module comprises an instruction storage module for storing received instructions and a data storage module for storing received data. With the above working mode and connection relationship, each pipeline stage is driven by a pulse signal generated upon completion of an event. The pulse frequency differs from stage to stage and is limited only by the longest path of that stage, so the processor can process faster than a synchronous circuit.
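As a purely illustrative calculation, the effect of per-stage timing can be sketched with hypothetical stage delays. The delay values and stage labels below are assumptions chosen for the example and are not taken from the present application. The sketch compares only the latency of one instruction through an otherwise empty pipeline and ignores handshake overhead.

```python
# Hypothetical longest-path delay of each pipeline stage, in picoseconds.
stage_delay_ps = {"IF": 200, "ID": 300, "EX": 500, "MEM": 400, "WB": 100}

# Synchronous design: one global clock period, set by the slowest stage,
# is spent in every stage.
sync_period_ps = max(stage_delay_ps.values())
sync_latency_ps = sync_period_ps * len(stage_delay_ps)

# Asynchronous design: each stage is driven by its own Click pulse and
# takes only as long as its own longest path.
async_latency_ps = sum(stage_delay_ps.values())

print(sync_latency_ps, async_latency_ps)
```

Under these assumed delays the synchronous latency is five periods of 500 ps, while the asynchronous latency is the sum of the individual stage delays.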
It should be added that the execution module includes a predictive flush module, a jump module, and a bypass module. The predictive flush module flushes incorrectly predicted instructions; the jump module generates a jump address according to a jump signal and an operation result and returns the jump address to the program counter; and the bypass module obtains needed data in advance from a subsequent module according to a register address, so as to avoid data conflicts.
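The role of the bypass module can be sketched, by way of example only, as an operand lookup that prefers a result still held in a later stage over the value stored in the register. The function name `read_operand`, the dictionary used as a register file, and the `in_flight` list are hypothetical names introduced for this sketch.

```python
def read_operand(reg_addr, register_file, in_flight):
    """Return the operand for reg_addr, bypassing from later stages if possible.

    in_flight is a list of (destination_register, value) pairs produced by
    later pipeline stages but not yet written back, newest first.
    """
    for dest_reg, value in in_flight:
        if dest_reg == reg_addr:
            return value  # bypass: take the result before it is written back
    return register_file[reg_addr]  # no conflict: read the register normally


register_file = {1: 10, 2: 20}
in_flight = [(1, 99)]  # a later stage is about to write 99 to register 1

bypassed = read_operand(1, register_file, in_flight)
direct = read_operand(2, register_file, in_flight)
```

Here register 1 is read as the newer in-flight value, while register 2, which has no pending write, is read directly.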
With respect to the second control unit C_IF2ID, the register IF2ID used in the present application acts as a switch for data transmission: when the Click pulse generated by the second control unit C_IF2ID arrives, the switch turns on and passes the data transmitted from the previous-stage pipeline, and at all other times it is off. This normally-off design effectively prevents the data in the register from being disturbed or corrupted while the corresponding control unit is not issuing a Click pulse. It should be noted that the other registers used in the present application work in the same way as the register IF2ID, that is, the switch is on when the Click pulse arrives and off the rest of the time.
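The switch-like behaviour of such a register can be modelled, as a minimal sketch, by a register that accepts new data only while a Click pulse is present. The class name `ClickRegister` and its `update` method are hypothetical and serve only to illustrate the behaviour described above.

```python
class ClickRegister:
    """Hypothetical model of a pipeline register gated by a Click pulse."""

    def __init__(self, reset_value=0):
        self.q = reset_value  # value currently held by the register

    def update(self, d, click):
        # Data passes through only when the Click pulse arrives;
        # otherwise the stored value is held unchanged.
        if click:
            self.q = d
        return self.q


if2id = ClickRegister()
held_without_click = if2id.update(5, click=False)  # input ignored
captured = if2id.update(5, click=True)             # input captured
held_after = if2id.update(7, click=False)          # later input ignored
```

Changes on the input between Click pulses therefore never reach the stored value.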
Fig. 2 is a logic diagram of a phase decoupling Click unit according to an embodiment of the present application. The naming convention in the figure is as follows: D is the module controlled by the phase decoupling Click unit; In_Data and Out_Data respectively represent the data input into the module and the data output by the module; In_Req and Out_Req respectively represent the input request signal and the externally output request signal; and In_Ack and Out_Ack respectively represent the input response signal and the externally output response signal. Accordingly, the first control unit C_PC has only two signals, Out_Req and In_Ack; it does not need to output a response signal or receive a request signal, because no control unit precedes the first control unit. As shown in Fig. 2, the phase decoupling Click unit works as follows: assume that In_Req is 1, In_Ack is 0, Out_Ack is 1, and Out_Req is 0. The In_Req signal is exclusive-ORed with the In_Ack signal, and the Out_Req signal is exclusive-ORed with the Out_Ack signal; both results are 1, so the AND gate produces a Click pulse. The Click pulse triggers Pi and Po, which invert In_Ack and Out_Req so that both become 1. In other words, the returned response signal and the generated request signal are both 1, and one handshake is complete. The phase decoupling Click unit can thus generate a Click pulse at the required time to drive the pipeline connected to it. Because the generated Click signal replaces the global clock of a synchronous circuit, a clock tree network that occupies a large amount of chip area and increases power consumption is avoided. The phase decoupling Click unit is also simple to design, and it can improve the running speed of the processor and reduce power consumption.
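The workflow just described can be followed step by step in a small behavioural model. The sketch below simply transcribes the gate-level description (two exclusive-OR gates, an AND gate, and the two registers Pi and Po) into Python; it is not a timing-accurate model of the circuit, and the class name `ClickUnit` and its `step` method are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClickUnit:
    """Behavioural sketch of a phase decoupling Click unit."""
    in_ack: int = 0   # held by register Pi
    out_req: int = 0  # held by register Po

    def step(self, in_req: int, out_ack: int) -> int:
        # Two XOR gates feed an AND gate to form the Click pulse.
        click = (in_req ^ self.in_ack) & (self.out_req ^ out_ack)
        if click:
            # The Click pulse triggers Pi and Po, inverting both outputs.
            self.in_ack ^= 1
            self.out_req ^= 1
        return click


unit = ClickUnit()
first_click = unit.step(in_req=1, out_ack=1)   # values assumed in the text
state_after = (unit.in_ack, unit.out_req)
second_click = unit.step(in_req=1, out_ack=1)  # same inputs, no new event
```

With the signal values assumed in the description, the first evaluation produces a Click pulse and leaves In_Ack and Out_Req at 1; re-evaluating with unchanged inputs produces no further pulse, which is the event-driven behaviour that replaces a free-running clock.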
It should be noted that the control unit C_EX2MEM and the control unit C_MEM2WB are a pair of control units used in cooperation; together they implement the selection function of the handshake signals. Their operating principles are described next.
Fig. 3 is a logic diagram of the control unit C_EX2MEM according to the embodiment of the present application. As shown in Fig. 3, the C_EX2MEM control unit is a pre-handshake selector, which selects and outputs different handshake request signals according to the control signal. The Sel signal and the previously generated request signal are subjected to exclusive-OR and exclusive-NOR operations, respectively, and the request signals are then obtained through registers triggered by the Click signal. When Sel is 1, the handshake generates request signal req2 = 1 while req1 = 0; when Sel is 0, the handshake generates request signal req1 = 1 while req2 = 0. With this design, the control unit C_EX2MEM can output different request signals according to the value of Sel to control the corresponding pipelines to execute the corresponding operations.
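One possible reading of this selector is sketched below. It assumes that the two gates are an exclusive-OR gate (for req2) and an exclusive-NOR gate (for req1), and that both request registers start at 0 after initialization; neither assumption is stated explicitly in the description, and the class name `RequestSelector` is hypothetical.

```python
class RequestSelector:
    """Sketch of a pre-handshake selector under the assumptions stated above."""

    def __init__(self):
        self.req1 = 0  # assumed initial state after initialization
        self.req2 = 0

    def click(self, sel):
        # XNOR: req1 is inverted when sel is 0 and held when sel is 1.
        self.req1 = 1 - (self.req1 ^ sel)
        # XOR: req2 is inverted when sel is 1 and held when sel is 0.
        self.req2 = self.req2 ^ sel
        return self.req1, self.req2


after_sel_1 = RequestSelector().click(sel=1)  # starts from the initial state
after_sel_0 = RequestSelector().click(sel=0)  # starts from the initial state
```

Starting from the initial state, Sel = 1 raises only req2 and Sel = 0 raises only req1, which matches the values given in the description.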
Fig. 4 is a logic diagram of the control unit C_MEM2WB according to the embodiment of the present application. As shown in Fig. 4, the C_MEM2WB control unit is a post-handshake selector, which selects different request and response signals for the handshake according to the control signal. The Sel signal and the response signal generated by the control unit of the previous stage are subjected to exclusive-OR and exclusive-NOR operations, respectively, and the response signal is obtained through a register triggered by the Click signal. The response signal is exclusive-ORed with the request signal given by the previous stage, the Click signal is obtained through combinational logic, and the Click signal triggers a register that inverts the request signal given to the next-stage pipeline. As shown in Fig. 4, the C_MEM2WB control unit can thus implement the handshake function for two pairs of request and response signals. Its Sel signal is the same signal as the Sel of the C_EX2MEM unit, and the two control units are used together to realize the selection function of the handshake signals.
Fig. 5 is a logic diagram of the first-stage control unit of the asynchronous control module in an embodiment of the present application. As shown in Fig. 5, the control unit C_PC generates a Click signal for the program counter according to an enable signal for processor operation, so that the program counter operates once, calculates the next instruction address according to control signals such as jump, exception, and interrupt, and transmits the next instruction address to the instruction fetching module. While generating the Click signal of the current stage, the first-stage control unit also generates a request signal for the next-stage control unit C_IF2ID. The C_PC unit is the first-stage control unit of the pipeline, so it requires neither a request signal input nor a response signal output. D in the figure is the first-stage pipeline corresponding to the instruction fetching module. The first-stage pipeline D starts to work after receiving the Click signal generated by the first control unit and transmits its output data Out_Data to the next-stage pipeline.
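The address calculation performed by the program counter on each Click signal can be sketched as a simple selection among the control signals. The priority order (interrupt, then exception, then jump), the fixed increment of 4, and the function name `next_pc` are assumptions made for this sketch only; the present application does not specify them.

```python
def next_pc(pc, jump=None, exception=None, interrupt=None, step=4):
    """Return the next instruction address.

    jump, exception, and interrupt are target addresses, or None when the
    corresponding control signal is inactive. The priority order and the
    default step are assumptions of this sketch.
    """
    if interrupt is not None:
        return interrupt
    if exception is not None:
        return exception
    if jump is not None:
        return jump
    return pc + step  # sequential execution


sequential = next_pc(0x100)
jumped = next_pc(0x100, jump=0x200)
interrupted = next_pc(0x100, jump=0x200, interrupt=0x10)
```

Each call corresponds to one operation of the program counter, that is, to one Click signal from the control unit C_PC.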
As can be seen from the above description of the embodiments, the present application has at least the following advantages. Because an asynchronous single-rail circuit is adopted and the global clock of a synchronous circuit is replaced by a Click signal, no clock tree network needs to be designed. Since a clock tree network consumes considerable power and occupies a large area, the power consumption of the processor circuit can be greatly reduced and usable area in the circuit is saved, which leaves more room for circuit reconstruction and optimization. At the same time, the asynchronous circuit avoids the problem of a global clock frequency limited by the delay of the critical path in the circuit: the critical paths of the modules are independent of one another, so each critical path can be optimized more effectively.
The above examples express only several embodiments of the present application, and although their description is specific and detailed, they should not be construed as limiting the scope of patent protection. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.