Simulation method for the C64x+ DSP software pipeline loop buffer mechanism
1. A simulation method for a C64x+ DSP software pipeline loop buffer mechanism, characterized by comprising the following steps:
(1) Determining the state and phase of the state machine from the loop buffer control information.
(2) Selecting and executing the sequence of loop buffer operations that corresponds to the state and phase obtained in step (1), so as to perform the actual operations on the program memory and the loop buffer while the loop buffer is running.
(3) Updating the loop buffer timing information according to the execution result of step (2), performing the state and phase transitions of the state machine according to the updated loop control information, and determining whether the loop exits.
(4) Determining and handling the restart of the normal pipeline and interrupts according to the loop control information obtained in step (3).
The above steps are repeated until the simulation ends, wherein the loop buffer control information comprises:
Loop buffer status information: records the state and phase of the loop buffer, together with interrupt request and interrupt mask information.
Loop buffer completion judgment information: includes the loop termination condition type and the ILC register availability flag.
Loop buffer timing information: includes the software pipeline stage information and the corresponding cycle information.
Instruction fetch information: includes instruction mask information, the pipeline start/stop flag, and the number of cycles required from the drain phase to pipeline restart.
The states of the state machine include:
initial_termination state: the software pipeline loop buffer is activated by an SPLOOP instruction and the ILC register is flagged as 0 (not available) when the SPLOOP instruction executes.
normal state: the software pipeline loop buffer is activated by executing one of the three SPLOOP(D/W) instructions rather than by an interrupt return, and the loop iterates at least once.
reload state: the software pipeline loop buffer is activated by executing an SPMASKR or SPKERNELR instruction and the reload condition is satisfied.
after_interrupt state: the software pipeline loop buffer is reactivated by an interrupt return.
idle state: the software pipeline loop buffer is not activated, or the loop has exited.
The phases of the state machine include:
prolog_lb phase: comprises the prolog phase of the software pipelined loop and the first stage of its kernel phase.
kernel_lb phase: comprises the kernel phase of the software pipelined loop except its first stage; on entering this phase the loop state transitions back to normal.
epilog_lb phase: comprises the epilog phase of the software pipelined loop plus one stage, together with the NOP cycles executed before the stage boundary is reached.
early_exit_lb phase: the prolog_lb phase is converted to the early_exit_lb phase when, because there are too few iterations, the loop count has been decremented to 0 before the kernel_lb phase is reached.
2. The simulation method for the C64x+ DSP software pipeline loop buffer mechanism according to claim 1, wherein the loop buffer operations comprise:
pipeline operation: instructions are fetched from the program memory and executed.
load operation: instructions are fetched from the program memory. An instruction is executed only, without being stored, when it is in parallel with an SPLOOP(D/W) instruction, is masked by an SPMASK(R) instruction, or is a NOP or a loop-buffer-related instruction; all other instructions are loaded into the corresponding locations of the loop buffer, marked valid, and executed.
fetch operation: the instructions for the cycle given by the LBC are fetched from the loop buffer and executed if they are valid.
drain operation: the instructions of the stage given by draincounts, at the corresponding cycle in the loop buffer, are marked invalid. The LBC records the current cycle number within the stage, and draincounts records the total number of cycles for which the drain operation has been performed. When a loop reload occurs and the reloaded loop enters the early_exit_lb phase, both the previous loop and the reloaded loop may need to perform drain operations; in that case the previous loop uses a second LBC and loadcounts pair to hold its information.
reload operation: the instructions of the stage given by loadcounts, at the corresponding cycle in the loop buffer, are marked valid. loadcounts records the total number of cycles for which the reload operation has been performed.
initial_load operation: an operation specific to the initial_termination state. Instructions are fetched from the program memory; an instruction is executed when it is in parallel with an SPLOOP(D/W) instruction, is masked by an SPMASK(R) instruction, or is a loop-buffer-related instruction; all other instructions are loaded into the corresponding locations of the loop buffer and marked invalid.
3. The method according to claim 2, wherein the operation sequences corresponding to each loop buffer state and phase are as follows:
wherein cond A denotes last_spl == 1, that is, the epilog_lb phase of the previous loop has not yet ended when the loop is reloaded, and cond B denotes that loadcounts < dynlen (the length of the loop buffer load or drain phase) or that a branch instruction takes effect. drain2 and fetch2 denote operations on the loop preceding the reloaded loop.
4. The simulation method for the C64x+ DSP software pipeline loop buffer mechanism according to claim 1, wherein the state and phase transitions of the state machine are as follows:
wherein n/r/a denotes normal/reload/after_interrupt, initial_t denotes initial_termination, and after_itr denotes after_interrupt; the function judge_spl_terminal() determines whether the loop has completed; terminal_cond denotes the loop termination condition, where terminal_cond == 3 corresponds to the SPLOOPW loop termination condition being met and terminal_cond != 3 corresponds to it not being met.
5. The simulation method for the C64x+ DSP software pipeline loop buffer mechanism according to claim 1, wherein in step (4) the normal pipeline restarts under the following conditions:
An SPLOOP(D) loop exits after the epilog_lb phase completes; the loop buffer then changes to the idle state and the normal pipeline restarts.
When the loop termination condition of an SPLOOPW loop is met, the loop buffer changes directly to the idle state and the normal pipeline restarts.
An SPLOOP(D) loop enters the epilog_lb phase; after the cycle indicated by the fstg/fcyc field of the SPKERNEL(R) instruction has executed, the normal pipeline restarts and runs in parallel with the loop buffer.
The loop is not in the reload state and a branch instruction takes effect while the loop is active; the loop buffer changes directly to the idle state and the normal pipeline restarts.
An interrupt is taken, causing the loop buffer to enter the epilog_lb phase and exit once draining completes; the loop buffer then changes to the idle state and the normal pipeline restarts.
6. The simulation method for the C64x+ DSP software pipeline loop buffer mechanism according to claim 1, wherein in step (4), while the loop buffer is active, a pending interrupt can be handled only when all of the following conditions are met, after which the loop buffer enters the epilog_lb phase:
the LBC register value is 0 and the loop termination condition is not satisfied;
the loop is in the kernel_lb phase and is not in the reload state;
the loop has run for at least 4 cycles;
ILC ≥ ceil(dynlen ÷ iteration_interval), which ensures that the loop does not enter the early_exit_lb phase after the interrupt returns.
Background
The DSP is a microprocessor chip designed specifically for digital signal processing tasks. Its programs are usually developed by cross-compilation, but this development process requires hardware devices and makes it difficult to obtain sufficient debugging feedback. To provide the functions that DSP program development needs, such as dynamic debugging and fault injection, and to support functional debugging, algorithm optimization, and performance evaluation, a software simulator for the DSP chip needs to be developed.
The TMS320C64x+ (C64x+) series is a high-performance DSP family developed by TI (Texas Instruments) on the VelociTI architecture and is widely used in national defense and aerospace in China. Because digital signal processing algorithms typically apply a large number of mathematical operations in loops over a series of data samples, software pipelining can improve loop performance by a factor of 4.2. Compared with other VLIW DSP chips, the C64x+ DSP was the first to introduce a loop buffer mechanism for software pipelining, which optimizes both the code and the execution of software pipelined loops. The loop buffer mechanism reduces the size of the generated software pipelined code by 17.4% on average and the power consumption of the program memory by a factor of 2.6 on average. However, the loop buffer core hardware structure, the group of instructions that control loop operation, and the detailed rules governing the loop buffer's operation under different conditions make the loop buffer the most intricate and complex part of a C64x+ simulator implementation. A correct and high-performance software pipeline loop buffer mechanism is therefore both the bottleneck and the key to implementing a C64x+ DSP simulator.
Disclosure of Invention
The invention aims to provide a simulation method for the C64x+ DSP software pipeline loop buffer mechanism that addresses the complexity of simulating this mechanism.
The purpose of the invention is achieved by the following technical solution: a simulation method for the C64x+ DSP software pipeline loop buffer mechanism, comprising the following steps:
(1) Determining the state and phase of the state machine from the loop buffer control information.
(2) Selecting and executing the sequence of loop buffer operations that corresponds to the state and phase obtained in step (1), so as to perform the actual operations on the program memory and the loop buffer while the loop buffer is running.
(3) Updating the loop buffer timing information according to the execution result of step (2), performing the state and phase transitions of the state machine according to the updated loop control information, and determining whether the loop exits.
(4) Determining and handling the restart of the normal pipeline and interrupts according to the loop control information obtained in step (3).
The above steps are repeated until the simulation ends. The loop buffer control information includes:
Loop buffer status information: records the state and phase of the loop buffer, together with interrupt request and interrupt mask information.
Loop buffer completion judgment information: includes the loop termination condition type and the ILC register availability flag.
Loop buffer timing information: includes the software pipeline stage information and the corresponding cycle information.
Instruction fetch information: includes instruction mask information, the pipeline start/stop flag, and the number of cycles required from the drain phase to pipeline restart.
The states of the state machine include:
initial_termination state: the software pipeline loop buffer is activated by an SPLOOP instruction and the ILC register is flagged as 0 (not available) when the SPLOOP instruction executes.
normal state: the software pipeline loop buffer is activated by executing one of the three SPLOOP(D/W) instructions rather than by an interrupt return, and the loop iterates at least once.
reload state: the software pipeline loop buffer is activated by executing an SPMASKR or SPKERNELR instruction and the reload condition is satisfied.
after_interrupt state: the software pipeline loop buffer is reactivated by an interrupt return.
idle state: the software pipeline loop buffer is not activated, or the loop has exited.
The phases of the state machine include:
prolog_lb phase: comprises the prolog phase of the software pipelined loop and the first stage of its kernel phase.
kernel_lb phase: comprises the kernel phase of the software pipelined loop except its first stage; on entering this phase the loop state transitions back to normal.
epilog_lb phase: comprises the epilog phase of the software pipelined loop plus one stage, together with the NOP cycles executed before the stage boundary is reached.
early_exit_lb phase: the prolog_lb phase is converted to the early_exit_lb phase when, because there are too few iterations, the loop count has been decremented to 0 before the kernel_lb phase is reached.
Further, the loop buffer operations include:
pipeline operation: instructions are fetched from the program memory and executed.
load operation: instructions are fetched from the program memory. An instruction is executed only, without being stored, when it is in parallel with an SPLOOP(D/W) instruction, is masked by an SPMASK(R) instruction, or is a NOP or a loop-buffer-related instruction; all other instructions are loaded into the corresponding locations of the loop buffer, marked valid, and executed.
fetch operation: the instructions for the cycle given by the LBC are fetched from the loop buffer and executed if they are valid.
drain operation: the instructions of the stage given by draincounts, at the corresponding cycle in the loop buffer, are marked invalid. The LBC records the current cycle number within the stage, and draincounts records the total number of cycles for which the drain operation has been performed. When a loop reload occurs and the reloaded loop enters the early_exit_lb phase, both the previous loop and the reloaded loop may need to perform drain operations; in that case the previous loop uses a second LBC and loadcounts pair to hold its information.
reload operation: the instructions of the stage given by loadcounts, at the corresponding cycle in the loop buffer, are marked valid. loadcounts records the total number of cycles for which the reload operation has been performed.
initial_load operation: an operation specific to the initial_termination state. Instructions are fetched from the program memory; an instruction is executed when it is in parallel with an SPLOOP(D/W) instruction, is masked by an SPMASK(R) instruction, or is a loop-buffer-related instruction; all other instructions are loaded into the corresponding locations of the loop buffer and marked invalid.
Further, the operation sequences corresponding to each loop buffer state and phase are as follows:
cond A denotes last_spl == 1, that is, the epilog_lb phase of the previous loop has not ended when the loop is reloaded; cond B denotes that loadcounts < dynlen or that a branch instruction takes effect.
drain2 and fetch2 denote operations on the loop preceding the reloaded loop.
Further, the state and phase transitions of the state machine are as follows:
n/r/a abbreviates normal/reload/after_interrupt.
initial_t abbreviates initial_termination, and after_itr abbreviates after_interrupt.
The function judge_spl_terminal() determines whether the loop has completed.
Further, in step (4), the normal pipeline restarts under the following conditions:
An SPLOOP(D) loop exits after the epilog_lb phase completes; the loop buffer then changes to the idle state and the normal pipeline restarts.
When the loop termination condition of an SPLOOPW loop is met, the loop buffer changes directly to the idle state and the normal pipeline restarts.
An SPLOOP(D) loop enters the epilog_lb phase; after the cycle indicated by the fstg/fcyc field of the SPKERNEL(R) instruction has executed, the normal pipeline restarts and runs in parallel with the loop buffer.
The loop is not in the reload state and a branch instruction takes effect while the loop is active; the loop buffer changes directly to the idle state and the normal pipeline restarts.
An interrupt is taken, causing the loop buffer to enter the epilog_lb phase and exit once draining completes; the loop buffer then changes to the idle state and the normal pipeline restarts.
Further, in step (4), while the loop buffer is active, a pending interrupt can be handled only when all of the following conditions are met, after which the loop buffer enters the epilog_lb phase:
the LBC register value is 0 and the loop termination condition is not satisfied;
the loop is in the kernel_lb phase and is not in the reload state;
the loop has run for at least 4 cycles;
ILC ≥ ceil(dynlen ÷ iteration_interval), which ensures that the loop does not enter the early_exit_lb phase after the interrupt returns.
The beneficial effects of the invention are as follows: the invention simulates the C64x+ DSP software pipeline loop buffer mechanism with CPU-cycle accuracy, and its simulation speed is 3.3 times that of the official simulator.
Drawings
FIG. 1 is a block diagram of the loop buffer simulation module of the invention.
Detailed Description
A simulation method for the C64x+ DSP software pipeline loop buffer mechanism, as shown in FIG. 1, comprises the following steps:
(1) Determining the state and phase of the state machine from the loop buffer control information.
(2) Selecting and executing the sequence of loop buffer operations that corresponds to the state and phase obtained in step (1), so as to perform the actual operations on the program memory and the loop buffer while the loop buffer is running.
(3) Updating the loop buffer timing information according to the execution result of step (2), performing the state and phase transitions of the state machine according to the updated loop control information, and determining whether the loop exits.
(4) Determining and handling the restart of the normal pipeline and interrupts according to the loop control information obtained in step (3).
The above steps are repeated until the simulation ends.
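The four steps can be read as a per-cycle driver loop. The following is a minimal sketch under stated assumptions, not the patented implementation: the function `simulate`, the dictionary-based control record, and the toy callbacks are all hypothetical names introduced for illustration, and "until the end" is modeled here as "until the state becomes idle".

```python
def simulate(ctrl, ops_table, transition, restart_and_interrupt, max_cycles=1000):
    """Run the four-step cycle until the loop buffer state becomes idle."""
    trace = []
    for _ in range(max_cycles):
        # Step 1: read the state and phase from the control information.
        key = (ctrl["state"], ctrl["phase"])
        if ctrl["state"] == "idle":
            break
        # Step 2: run the operation sequence selected for this state and phase.
        for op in ops_table[key]:
            op(ctrl)
        trace.append(key)
        # Step 3: update timing information and perform state/phase transitions.
        transition(ctrl)
        # Step 4: decide on normal-pipeline restart and interrupt handling.
        restart_and_interrupt(ctrl)
    return trace


# Toy wiring (hypothetical): three kernel cycles, one epilog cycle, then idle.
def count_cycle(ctrl):
    ctrl["cycles"] += 1


def toy_transition(ctrl):
    if ctrl["phase"] == "kernel_lb" and ctrl["cycles"] >= 3:
        ctrl["phase"] = "epilog_lb"
    elif ctrl["phase"] == "epilog_lb":
        ctrl["state"] = "idle"


def no_restart(ctrl):
    pass


toy_ctrl = {"state": "normal", "phase": "kernel_lb", "cycles": 0}
toy_table = {
    ("normal", "kernel_lb"): [count_cycle],
    ("normal", "epilog_lb"): [count_cycle],
}
toy_trace = simulate(toy_ctrl, toy_table, toy_transition, no_restart)
```

Keeping the operation table separate from the transition function mirrors the split between step (2) and step (3), so each can be checked against its own table independently.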
Further, the loop buffer control information specifically includes:
Loop buffer status information: records the state and phase of the loop buffer, together with interrupt request and interrupt mask information.
Loop buffer completion judgment information: includes the loop termination condition type and the ILC register availability flag (which indicates whether the ILC register is available). The loop termination condition type covers the three different termination conditions of the SPLOOP(D/W) instructions; it is recorded in terminal_cond, whose values 1 to 3 correspond to the different SPLOOP(D/W) termination conditions.
Loop buffer timing information: includes the software pipeline stage information and the corresponding cycle information. The software pipeline stage information identifies the current stage within a loop buffer phase, for example the stage within the prolog_lb phase. The cycle information refers to the cycle counts associated with the stage information and the number of cycles in each stage, namely the LBC, the number of cycles for which the loop buffer reload operation has been executed, and the number of cycles for which the drain operation has been executed.
Instruction fetch information: includes instruction mask information, the pipeline start/stop flag, and the number of cycles required from the drain phase to pipeline restart.
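One way to picture the four groups of control information is as a single record. This is a minimal sketch; apart from `terminal_cond`, `lbc`, `loadcounts`, and `draincounts`, which follow names used in the text, every field name and default value is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoopBufferControl:
    # Status information: state, phase, interrupt request and mask.
    state: str = "idle"
    phase: str = ""
    interrupt_pending: bool = False   # hypothetical field name
    interrupt_masked: bool = False    # hypothetical field name
    # Completion judgment information.
    terminal_cond: int = 1            # 1 to 3; the text ties value 3 to SPLOOPW
    ilc_available: bool = True        # ILC register availability flag
    # Timing information: current stage and the cycle counters.
    stage: int = 0
    lbc: int = 0                      # cycle number within the current stage
    loadcounts: int = 0               # cycles of reload performed so far
    draincounts: int = 0              # cycles of drain performed so far
    # Fetch information (all three names are hypothetical).
    instr_mask: List[bool] = field(default_factory=list)
    pipeline_running: bool = True     # pipeline start/stop flag
    restart_delay: int = 0            # cycles from drain phase to pipeline restart

    def is_active(self) -> bool:
        """The loop buffer is active in every state except idle."""
        return self.state != "idle"


idle_ctrl = LoopBufferControl()
spl_ctrl = LoopBufferControl(state="normal", phase="prolog_lb", terminal_cond=3)
```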
Further, the states of the state machine are specifically:
(1) initial_termination state: the software pipeline loop buffer is activated by an SPLOOP instruction and the ILC is 0 (not available) when the SPLOOP instruction executes. In this state the loop initialization instructions are executed, and the instructions in the loop body are loaded into the loop buffer and marked invalid, although each is executed as a NOP (a no-operation instruction, which causes a one-cycle stall).
(2) normal state: the software pipeline loop buffer is activated by executing one of the three SPLOOP(D/W) instructions rather than by an interrupt return, and the loop iterates at least once. In this state the loop body instructions fetched from the program memory are loaded into the loop buffer, while loop initialization instructions fetched from the program memory execute in parallel with the instructions fetched from the loop buffer. Thereafter the loop buffer runs through the kernel phase and is then drained.
(3) reload state: the software pipeline loop buffer is activated by executing an SPMASKR or SPKERNELR instruction and the reload condition is satisfied. In this state, a software pipelined loop that has already executed is executed again, and the instructions in the loop buffer become valid again. Loop body instructions are fetched only from the loop buffer, while initialization instructions for the outer and inner loops are fetched from the program memory and executed in parallel with the loop body instructions.
(4) after_interrupt state: the software pipeline loop buffer is reactivated by an interrupt return. In this state, the interrupted software pipelined loop is executed again. In hardware, an interrupt return resumes execution from the start of the loop, but the simulator can model this by reloading. This state differs from the reload state in that there is no initialization phase and no instructions are fetched from the program memory.
(5) idle state: the software pipeline loop buffer is not active, or the loop has exited. The normal pipeline then executes as usual and the loop buffer module is not needed.
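The five states above are distinguished mainly by what activated the loop buffer. The sketch below encodes that selection; the enum, the function `state_on_activation`, and the `trigger` strings are hypothetical, and returning idle when a reload condition is not satisfied is an assumption standing in for "no new activation".

```python
from enum import Enum


class LbState(Enum):
    INITIAL_TERMINATION = "initial_termination"
    NORMAL = "normal"
    RELOAD = "reload"
    AFTER_INTERRUPT = "after_interrupt"
    IDLE = "idle"


def state_on_activation(trigger: str, ilc_zero: bool = False,
                        reload_ok: bool = False) -> LbState:
    """Pick the loop buffer state from the activating event (illustrative only)."""
    if trigger == "interrupt_return":
        return LbState.AFTER_INTERRUPT
    if trigger in ("SPMASKR", "SPKERNELR"):
        # Assumption: without a satisfied reload condition nothing is activated.
        return LbState.RELOAD if reload_ok else LbState.IDLE
    if trigger == "SPLOOP" and ilc_zero:
        # The text ties initial_termination to SPLOOP executing with ILC at 0.
        return LbState.INITIAL_TERMINATION
    if trigger in ("SPLOOP", "SPLOOPD", "SPLOOPW"):
        return LbState.NORMAL
    return LbState.IDLE
```

For example, `state_on_activation("SPKERNELR", reload_ok=True)` yields the reload state, while any unrelated instruction leaves the loop buffer idle.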
Further, the phases of the state machine are specifically:
(1) prolog_lb phase: comprises the prolog phase of the software pipelined loop and the first stage of its kernel phase. Its role in the loop is to increase the number of valid instructions in the loop buffer. In the normal state the added valid instructions are loaded from the program memory; in the reload and after_interrupt states they come from reactivating instructions already in the loop buffer. In this phase, instructions may be fetched from the program memory and executed in parallel with the loop buffer.
(2) kernel_lb phase: comprises the kernel phase of the software pipelined loop except its first stage; on entering this phase the loop state transitions back to normal. In this phase the contents of the loop buffer do not change, and the simulator executes only the valid instructions fetched from the loop buffer.
(3) epilog_lb phase: comprises the epilog phase of the software pipelined loop plus one stage, together with the NOP cycles executed before the stage boundary is reached. Because loops can be reloaded, this phase may also overlap part or all of the prolog_lb phase of the reloaded loop. In this phase, instructions in the loop buffer are marked invalid at the corresponding cycles. If draining completes before the stage boundary is reached, NOPs continue to execute until the loop buffer state changes to idle at the stage boundary. During draining, the program memory pipeline may restart and then fetch and execute in parallel with the loop buffer; a loop reload may also occur, after which reactivation of instructions in the loop buffer proceeds simultaneously with draining.
(4) early_exit_lb phase: the prolog_lb phase is converted to the early_exit_lb phase when, because there are too few iterations, the loop count has been decremented to 0 before the kernel_lb phase is reached. In this phase, loading and draining of the loop buffer occur simultaneously, so instructions may be fetched from the program memory and executed in parallel with the loop buffer.
Further, the loop buffer operations are specifically:
(1) pipeline operation: instructions are fetched from the program memory and executed. For ease of implementation, the fetch and execute operations of the normal pipeline during a loop reload are carried out by the loop buffer operations.
(2) load operation: instructions are fetched from the program memory. An instruction is executed only, without being stored, when it is in parallel with an SPLOOP(D/W) instruction, is masked by an SPMASK(R) instruction, or is a NOP or a loop-buffer-related instruction; all other instructions are loaded into the corresponding locations of the loop buffer, marked valid, and executed.
(3) fetch operation: the instructions for the corresponding cycle (stored in the LBC) are fetched from the loop buffer and executed if they are valid.
(4) drain operation: the instructions of the corresponding stage (stored in draincounts) at the corresponding cycle in the loop buffer are marked invalid. When a loop reload occurs and the reloaded loop enters the early_exit_lb phase, both the previous loop and the reloaded loop may need to perform drain operations; in that case the previous loop uses a second LBC and loadcounts pair to hold its information.
(5) reload operation: the instructions of the corresponding stage (stored in loadcounts) at the corresponding cycle in the loop buffer are marked valid.
(6) initial_load operation: an operation specific to the initial_termination state. Instructions are fetched from the program memory; an instruction is executed when it is in parallel with an SPLOOP(D/W) instruction, is masked by an SPMASK(R) instruction, or is a loop-buffer-related instruction; all other instructions are loaded into the corresponding locations of the loop buffer and marked invalid.
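The operations that touch the loop buffer all come down to storing instructions in a slot and toggling a valid bit. The following sketch illustrates that idea only; the `LoopBuffer` class, its (stage, cycle) slot keys, and the string instructions are assumptions, and the filtering of SPLOOP-parallel, masked, and NOP instructions described above is left out for brevity.

```python
from typing import Dict, List, Tuple


class LoopBuffer:
    """Toy loop buffer: one slot per (stage, cycle), each with a valid bit."""

    def __init__(self) -> None:
        self.slots: Dict[Tuple[int, int], dict] = {}

    def load(self, stage: int, cycle: int, instrs: List[str]) -> List[str]:
        """load: store instructions, mark them valid, and execute them."""
        self.slots[(stage, cycle)] = {"instrs": list(instrs), "valid": True}
        return list(instrs)

    def initial_load(self, stage: int, cycle: int, instrs: List[str]) -> List[str]:
        """initial_load: store instructions but mark them invalid; none execute."""
        self.slots[(stage, cycle)] = {"instrs": list(instrs), "valid": False}
        return []

    def fetch(self, cycle: int) -> List[str]:
        """fetch: return the valid instructions of every stage at this cycle."""
        out: List[str] = []
        for stage, slot_cycle in sorted(self.slots):
            slot = self.slots[(stage, slot_cycle)]
            if slot_cycle == cycle and slot["valid"]:
                out.extend(slot["instrs"])
        return out

    def drain(self, stage: int, cycle: int) -> None:
        """drain: mark the slot of the given stage and cycle invalid."""
        if (stage, cycle) in self.slots:
            self.slots[(stage, cycle)]["valid"] = False

    def reload(self, stage: int, cycle: int) -> None:
        """reload: mark the slot of the given stage and cycle valid again."""
        if (stage, cycle) in self.slots:
            self.slots[(stage, cycle)]["valid"] = True
```

Because drain and reload only flip the valid bit, a reloaded loop can reuse the stored instructions without touching program memory, which matches the description of the reload state.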
Further, the operation sequences corresponding to each loop buffer state and phase are specifically:
Table 1. Operation sequences corresponding to each loop buffer state and phase
cond A denotes last_spl == 1, that is, the epilog_lb phase of the previous loop has not ended when the loop is reloaded; cond B denotes that loadcounts < dynlen or that a branch instruction takes effect; and (if X) Y denotes that operation Y is performed when condition X is satisfied.
drain2 and fetch2 denote operations on the loop preceding the reloaded loop.
Further, the state and phase transitions of the state machine are specifically:
Table 2. Loop buffer state and phase determination and transition table
n/r/a abbreviates normal/reload/after_interrupt.
initial_t abbreviates initial_termination, and after_itr abbreviates after_interrupt.
The function judge_spl_terminal() determines whether the loop has completed.
Here && denotes logical AND and || denotes logical OR. active_nums is the number of loop buffer activations.
Further, the conditions for restarting the normal pipeline and for handling interrupts in step (4) are specifically as follows.
The normal pipeline restarts under the following conditions:
An SPLOOP(D) loop exits after the epilog_lb phase completes; the loop buffer then changes to the idle state and the normal pipeline restarts.
When the loop termination condition of an SPLOOPW loop is met, the loop buffer changes directly to the idle state and the normal pipeline restarts, because no epilog_lb phase is needed.
An SPLOOP(D) loop enters the epilog_lb phase; after the cycle indicated by the fstg/fcyc field of the SPKERNEL(R) instruction has executed, the normal pipeline restarts and runs in parallel with the loop buffer.
A branch instruction takes effect while the loop is active; the loop buffer changes directly to the idle state and the normal pipeline restarts.
An interrupt is taken, causing the loop buffer to enter the epilog_lb phase and exit once draining completes; the loop buffer then changes to the idle state and the normal pipeline restarts.
While the loop buffer is active, a pending interrupt can be handled only when all of the following conditions are met, after which the loop buffer enters the epilog_lb phase:
The LBC register value is 0 and the loop termination condition is not satisfied; otherwise the loop should enter the epilog_lb phase and exit after draining normally.
The loop is in the kernel_lb phase and is not in the reload state.
The loop has run for at least 4 cycles.
ILC ≥ ceil(dynlen ÷ iteration_interval), which ensures that the loop does not enter the early_exit_lb phase after the interrupt returns. Here iteration_interval is the number of cycles in each stage, and ceil denotes rounding up.
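The four interrupt conditions combine into a single predicate. The sketch below is illustrative: the function and parameter names are hypothetical, `ii` stands for the number of cycles per stage, and the ceiling is computed with integer arithmetic to avoid floating-point rounding.

```python
def min_ilc_for_interrupt(dynlen: int, ii: int) -> int:
    """Smallest ILC that keeps the resumed loop out of early_exit_lb: ceil(dynlen / ii)."""
    return -(-dynlen // ii)


def can_take_interrupt(lbc: int, terminated: bool, phase: str, state: str,
                       cycles_run: int, ilc: int, dynlen: int, ii: int) -> bool:
    """Return True only when all four conditions listed above hold."""
    return (
        lbc == 0 and not terminated                    # at a stage boundary, loop not finished
        and phase == "kernel_lb" and state != "reload" # in the kernel, not reloading
        and cycles_run >= 4                            # loop has run at least 4 cycles
        and ilc >= min_ilc_for_interrupt(dynlen, ii)   # enough iterations left to resume
    )
```

With dynlen = 7 and 2 cycles per stage, the threshold is ceil(7 ÷ 2) = 4, so an interrupt is deferred while ILC is 3 even if the other three conditions hold.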
It should be understood that the above embodiments are given only for clarity of illustration and do not limit the implementations. Persons skilled in the art can make other variations and modifications in light of the above description. It is neither necessary nor possible to enumerate all implementations here, and obvious variations or modifications derived from the above remain within the scope of protection of the invention.