
Acceleration of Behavioral Simulation on Simulation Specific Machines

Minoru Shoji, Fumiyasu Hirose, Shintaro Shimogori, Satoshi Kowatari, Hiroshi Nagai

CAD group, FUJITSU LIMITED

4-1-1, KAMIKODANAKA NAKAHARA-KU, KAWASAKI, 211 JAPAN

E-mail: shoji@fd.cad.fujitsu.co.jp

Abstract

Behavioral simulation is faster than gate-level logic simulation; however, its speed is still too slow for large systems. Simulation specific machines accelerate simulation by parallel processing. We developed methods to extract parallelism from behavioral descriptions for fast simulation on these machines.

We evaluated our methods on the CAD accelerator TP5000. Extracting the parallelism accelerated simulation by a factor of about 7.

1 Introduction

In order to develop large and complex digital systems quickly, we need logic verification at as early a design stage as possible. Therefore, we describe the design as high-level behavioral descriptions in languages such as VHDL and employ behavioral simulation. Behavioral simulation is about 10 to 100 times faster than gate-level simulation; however, the simulation speed is still too slow for large-scale designs.

To accelerate behavioral simulation, we must employ simulation specific machines, which are composed of a number of processors that each simulate a part of the description in parallel. These machines accelerate simulation by exploiting the parallelism of gate-level simulation models. If we use these machines for behavioral simulation, the simulation speed is limited by the parallelism of the description. High-level behavioral descriptions consist of a number of sequential statements, and their parallelism is not as high as that of gate-level descriptions. To speed up simulation on simulation machines, we must therefore parallelize these behavioral descriptions.

It is the purpose of this paper to present methods to parallelize high-level behavioral descriptions. Experimental results showed that our methods make simulation on a simulation specific machine 7 times faster.

2 Simulation model and the problem

To accelerate logic simulation, we employed logic simulation specific machines [1][2]. These machines are composed of many processors that simulate gate-level simulation models in parallel, and the speed-up they achieve on gate-level descriptions is very high [3][4]. If we use these machines for behavioral simulation, we must convert behavioral descriptions into models suitable for them. In this section, we describe the simulation model for high-level behavioral descriptions and the problem this model poses for speeding up simulation.

ED&TC ’97 on CD-ROM

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for fee or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. ©1997 ACM/0-791-849-5/97/0003/$3.50

2.1 Simulation model

The simulation models for behavioral descriptions are divided into concurrent models and sequential models. Concurrent signal assignment statements of VHDL are converted to concurrent simulation models, which are equivalent to gate-level simulation models. Each sequential model corresponds to a sequence of statements, such as a VHDL process statement. A sequential model consists of operators and the controllers that control the execution order of the operators; both operators and controllers are built from gate-level simulation models. To realize this model, we split the function of an event in the event-driven simulation algorithm into two parts. The first is value delivery; the second is an instruction to start evaluating the fan-out gates. We call an event that has only the first function an “update-only event”, and an event that has only the second function an “evaluate-only event”. Figure 1 shows an example of VHDL sequential statements and the corresponding simulation model. Each rectangle in figure 1(b) denotes a ‘block’ corresponding to an operator of the statement, and lines connecting blocks denote event connections. Each block consists of an evaluation-part and an evaluation-control-part. When the evaluation-control-part receives an “evaluate-only event” from the previous block through a broken line, the evaluation-part evaluates the operation. The result is fed to other blocks through dotted lines by “update-only events”. The blocks labeled ‘if_true’ or ‘if_false’ send an event only if their input value is true or false, respectively.
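To make the split between update-only and evaluate-only events concrete, the following Python sketch hand-codes the sequential model of figure 1 for `if A > B then C := D and E; else G := D or F; end if;`. It is an illustration under stated assumptions, not the simulator described here: the block names, the shared `values` table standing in for update-only events, and the FIFO queue standing in for evaluate-only events are all hypothetical choices.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Values = Dict[str, int]
# A block pairs an evaluation-part with an evaluation-control-part.
# evaluation-part: computes the operator and delivers results (update-only events).
# evaluation-control-part: names the blocks to start next (evaluate-only events).
Block = Tuple[Callable[[Values], bool], Callable[[bool], List[str]]]


def gt(v: Values) -> bool:
    v["cond"] = int(v["A"] > v["B"])  # deliver the comparison result
    return bool(v["cond"])


def and_assign(v: Values) -> bool:
    v["C"] = v["D"] & v["E"]  # C := D and E
    return True


def or_assign(v: Values) -> bool:
    v["G"] = v["D"] | v["F"]  # G := D or F
    return True


# Hypothetical block graph for figure 1(b).
BLOCKS: Dict[str, Block] = {
    ">": (gt, lambda r: ["if_true", "if_false"]),
    "if_true": (lambda v: bool(v["cond"]), lambda r: ["and"] if r else []),
    "if_false": (lambda v: not v["cond"], lambda r: ["or"] if r else []),
    "and": (and_assign, lambda r: []),
    "or": (or_assign, lambda r: []),
}


def simulate(values: Values, start: str = ">") -> List[str]:
    """Process evaluate-only events in FIFO order; return the blocks evaluated."""
    queue, trace = deque([start]), []
    while queue:
        name = queue.popleft()
        evaluate, control = BLOCKS[name]
        result = evaluate(values)      # evaluation-part
        trace.append(name)
        queue.extend(control(result))  # evaluation-control-part
    return trace


v = {"A": 5, "B": 3, "D": 1, "E": 0, "F": 1}
print(simulate(v), v["C"])  # ['>', 'if_true', 'if_false', 'and'] 0
```

In this sketch either branch costs four block evaluations even though only one of them performs an assignment; the rest is control overhead carried by the evaluation-control-parts.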

2.2 Problem

Simulation specific machines accelerate simulation by using a number of processors, each of which simulates a part of the circuit concurrently. For gate-level simulation models, the gates are distributed among the processors and simulated in parallel. If simulating one gate takes one unit-time, every processor finishes its part within the same unit-time, so no processor has to wait for the others to finish.

Here the time for a VHDL delta-delay is measured as the number of unit-times between the start of the process statements and the end of the simulation of all process statements. For a gate-level description, a delta-delay is simulated in one unit-time. For behavioral simulation, however, the time to simulate a delta-delay is determined by the longest process statement that started simulation. A processor that simulates a shorter model wastes the time after the end of its simulation, and the parallelism factor is reduced to one for that duration. To exploit the parallelism of simulation specific machines for fast simulation, the length of each process statement model should therefore be minimized.
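The cost model in this paragraph can be written down directly. The sketch below assumes, as a simplification, one process model per processor and one unit-time per block evaluation; the model lengths are made-up numbers, not measurements.

```python
from typing import List


def delta_cost(model_lengths: List[int]) -> int:
    """Unit-times needed for one delta-delay: all processors wait for the
    longest process model started in this delta."""
    return max(model_lengths)


def utilization(model_lengths: List[int]) -> float:
    """Fraction of processor unit-times spent on useful evaluations."""
    return sum(model_lengths) / (len(model_lengths) * delta_cost(model_lengths))


gate_level = [1, 1, 1, 1]   # one gate per processor
behavioral = [12, 3, 1, 1]  # one long process statement dominates
print(delta_cost(gate_level), utilization(gate_level))            # 1 1.0
print(delta_cost(behavioral), round(utilization(behavioral), 2))  # 12 0.35
```

Shortening the longest model is the only way to lower `delta_cost`; adding processors does not help while one long model dominates.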

Behavioral descriptions contain many sequential statements because algorithms and protocols are easy to write that way. Therefore, long process statements become a problem in behavioral simulation. In the following section, we describe methods to minimize the length of statements and extract maximum parallelism from behavioral descriptions.

Figure 1 simulation model for sequential statements
(a) an example of VHDL description for sequential statements:

    if A > B then
      C := D and E;
    else
      G := D or F;
    end if;

(b) simulation model for (a) [figure: blocks ‘>’ (inputs A, B), ‘if_true’, ‘if_false’, ‘and’ (D, E → C) and ‘or’ (D, F → G), started by a start-event and linked by evaluate-only and update-only events; each block has an evaluation-part and an evaluation-control-part]

3 Increasing parallelism factor

We developed the following three methods to increase the parallelism factor: (1) extraction of independent parts from descriptions, (2) conversion to concurrent models, and (3) extraction of parallel parts from descriptions.

3.1 Independent parts extraction

Some descriptions contain parts that can be simulated independently, and behavioral descriptions written in VHDL contain many such parts. Figure 2 shows a part of a VHDL process statement. Part A of the example contains only one conditional branch (an if statement) with a number of signal assignment statements inside it. VHDL defines that values assigned to signals become valid after a delta delay. In this example, the values of the signals assigned in part A become valid at the next simulation of the process statement. Therefore, there are no dependencies between part A and part B, which means that we can simulate part A and part B in parallel. Part B in this example consists of three statements that have no dependencies on each other, so these statements can also be simulated in parallel.
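A minimal sketch of this dependency test, assuming each statement (or a whole if-block such as part A) has already been summarized as the names it reads, the variables it writes and the signals it writes. This representation and the union-find grouping are illustrative choices, not the compiler described here; statements share a group only if a chain of dependencies connects them.

```python
from typing import Dict, List, Set, Tuple

# (label, names read, variables written, signals written)
Stmt = Tuple[str, Set[str], Set[str], Set[str]]


def depends(earlier: Stmt, later: Stmt) -> bool:
    _, r1, v1, s1 = earlier
    _, r2, v2, s2 = later
    # Variable updates are immediate, so RAW, WAR and WAW on variables order
    # the statements. A signal write only takes effect after a delta delay,
    # so reading a signal never depends on an earlier write in the same
    # activation; only two writes to the same signal (the last one wins) do.
    return bool(v1 & r2 or r1 & v2 or v1 & v2 or s1 & s2)


def independent_groups(stmts: List[Stmt]) -> List[List[str]]:
    parent = list(range(len(stmts)))  # union-find forest over statements

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for j in range(len(stmts)):
        for i in range(j):
            if depends(stmts[i], stmts[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for k, stmt in enumerate(stmts):
        groups.setdefault(find(k), []).append(stmt[0])
    return list(groups.values())


# Figure 2: part A writes signals S and T; part B reads them.
FIG2: List[Stmt] = [
    ("part A", {"A", "SOLD", "TOLD"}, set(), {"S", "T"}),
    ("U := S and T", {"S", "T"}, {"U"}, set()),
    ("V := S or T", {"S", "T"}, {"V"}, set()),
    ("W := S xor T", {"S", "T"}, {"W"}, set()),
]
print(independent_groups(FIG2))  # four singleton groups
```

If S and T were variables instead of signals, part B would read part A's new values and all four entries would collapse into a single group.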

A study of real descriptions showed that longer models usually contain loop statements. Circuit designers always use loop statements to describe operations between array objects, and in most cases we can determine the number of loop iterations in advance. We used the ‘loop-unrolling and constant propagation’ method to reduce the length of sequential simulation models. Figure 3 shows an example of applying this method. Figure 3(a) shows a part of a description with a loop statement. Each rectangle in the figure denotes a block as defined in figure 1, and lines denote event connections; value connections are omitted. Simulating the model shown in figure 3(b) requires evaluating a total of 19 blocks. Applying this method reduces the length of the model from 19 to 4.
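The counts 19 and 4 can be checked with a short calculation. This sketch assumes that every block in figure 3(b) costs one evaluation and that the loop bounds are known constants; `rolled_blocks` and `unroll` are hypothetical helper names, and the string template stands in for a real VHDL syntax tree.

```python
from typing import List


def rolled_blocks(lo: int, hi: int, body_blocks: int = 1) -> int:
    """Blocks evaluated by the loop model of figure 3(b): I := 0, then per
    iteration the test I <= hi, if_true, the body and I := I + 1, and on
    exit one final test plus if_false."""
    iterations = hi - lo + 1
    return 1 + iterations * (3 + body_blocks) + 2


def unroll(lo: int, hi: int, template: str) -> List[str]:
    """Loop-unrolling with constant propagation: substitute each value of
    the loop index into the body, leaving no index variable or test."""
    return [template.format(i=i) for i in range(lo, hi + 1)]


body = unroll(0, 3, "D({i}) <= A and B({i})")
print(rolled_blocks(0, 3), len(body))  # 19 4
print(body[0])                         # D(0) <= A and B(0)
```

The unrolled statements no longer share the index variable I, which is why they can afterwards be split into independent descriptions.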

There are cases where this method increases the size of the simulation model. For object code on conventional computers, a large code size increases the probability of instruction-cache misses, which decreases execution speed. However, the simulation speed of a simulation specific machine is determined by the number of evaluations and the length of the descriptions. This method reduces both the number of evaluations and the length of the description, so simulation is accelerated. Furthermore, in many cases the relations between the statements disappear after applying this method. Figure 3(c) shows that there are no relations between the statements, so we can split the description into 4 small descriptions.

Figure 2 dependencies of signals and variables

    process
    begin
      .......
      if A = '0' then        -- part A
        S <= SOLD;
        T <= TOLD;
      end if;
      U := S and T;          -- part B
      V := S or T;
      W := S xor T;
      .......
    end process;

Figure 3 an example of loop-unrolling and constant propagation
(a) a VHDL example:

    for I in 0 to 3 loop
      D(I) <= A and B(I);
    end loop;

(b) original simulation model [figure: I := 0 → test I <= 3 → if_true → D(I) <= A and B(I) → I := I + 1 → back to the test; if_false → to the next of the loop statement]
(c) after loop-unrolling and constant propagation [figure: D(0) <= A and B(0), D(1) <= A and B(1), D(2) <= A and B(2), D(3) <= A and B(3) → to the next of the loop statement]

3.2 Conversion to concurrent models

Because simulation specific machines are designed to accelerate gate-level simulation, gate-level simulation models suit them well. If we can convert sequential descriptions into gate-level simulation models, simulation is accelerated.

If a sequential description satisfies the following conditions, we can convert the corresponding simulation model into a number of concurrent simulation models.

(1) There is no loop statement and no variable/signal dependency that forms a loop.
(2) The process statement has a sensitivity list, and every signal read in the process statement is declared in the sensitivity list.
(3) There is no variable whose value must be stored after the simulation of the process statement ends.
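The three conditions can be checked mechanically on a simplified process representation. The sketch below is an illustration, not the converter used in this work: it assumes a straight-line process body (no if statements), one assignment target per statement, and treats every name not declared as a variable as a signal. Condition (3) is approximated as "no variable is read before it is assigned", since such a variable would have to keep its value between activations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Stmt:
    target: str            # name assigned by the statement
    reads: Set[str]        # names read by the statement
    is_loop: bool = False  # True for a loop statement


@dataclass
class Process:
    sensitivity: Set[str]
    variables: Set[str]    # every other name is treated as a signal
    body: List[Stmt] = field(default_factory=list)


def has_cycle(edges: Dict[str, Set[str]]) -> bool:
    """Depth-first search for a cycle in the read -> written dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}

    def visit(n: str) -> bool:
        color[n] = GREY
        for m in edges.get(n, set()):
            c = color.get(m, WHITE)
            if c == GREY or (c == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


def convertible(p: Process) -> bool:
    # (1) no loop statement and no dependency that forms a loop
    edges: Dict[str, Set[str]] = {}
    for s in p.body:
        for r in s.reads:
            edges.setdefault(r, set()).add(s.target)
    if any(s.is_loop for s in p.body) or has_cycle(edges):
        return False
    # (2) every signal read by the process is in the sensitivity list
    signals_read = {r for s in p.body for r in s.reads} - p.variables
    if not signals_read <= p.sensitivity:
        return False
    # (3) no variable is read before it is assigned in this activation,
    #     i.e. no variable value has to survive until the next activation
    assigned: Set[str] = set()
    for s in p.body:
        if (s.reads & p.variables) - assigned:
            return False
        assigned.add(s.target)
    return True


# Figure 4(a): P1: process (C, S, T) with variable X
P1 = Process(
    sensitivity={"C", "S", "T"},
    variables={"X"},
    body=[Stmt("T", {"S", "C"}), Stmt("X", {"T"}), Stmt("U", {"X", "S"})],
)
print(convertible(P1))  # True
```

Dropping T from the sensitivity list, or moving `U <= X and S` ahead of `X := not T`, makes the same process fail conditions (2) and (3) respectively.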

Figure 4 an example of conversion to concurrent model
(a) example VHDL statements:

    P1: process (C, S, T)
      variable X : bit;
    begin
      T <= S or C;
      X := not T;
      U <= X and S;
    end process;

(b) sequential simulation model [figure: sense (C, S, T) → T <= S or C → X := not T → U <= X and S]
(c) concurrent simulation model [figure: ‘or’ (S, C → T) → ‘not’ (T → X) → ‘and’ (X, S → U)]

Figure 5 an example of VHDL with wait statement
(a) a part of VHDL description with wait statements:

    wait until CLK'event and CLK = '1';   -- part A
    V := D and E;
    U := F or W;
    wait until S = '0';                   -- part B
    W := U and T;

(b) parallelized simulation model of part A [figure: wait until CLK'event and CLK='1' → V := D and E and U := F or W in parallel → to the next statement of part A]

Figure 4 shows an example of a VHDL process statement and the corresponding sequential and concurrent models. Figure 4(b) shows the blocks and the event connections. In this case, we can convert this model into the concurrent simulation model shown in figure 4(c). Each circle in the figure denotes a concurrent simulation model for one operation. This model corresponds to the evaluation-part of figure 1(b), except that it starts evaluation only when one of its input values changes. A line connecting circles denotes a data flow that may start other operations. The original description is simulated every time at least one of its input signals changes value. With an event-driven simulation algorithm, we do not have to simulate parts whose input values have not changed. Furthermore, concurrent models do not need the evaluation-control-parts of sequential models, which reduces the number of evaluations for the part. This further increases simulation speed.
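The saving from event-driven evaluation of the concurrent model can be seen in a small Python sketch of figure 4(c). It assumes bit values 0/1, one node per operation, and a FIFO event queue; the node names and the `propagate` helper are hypothetical, and delta-delay and unit-delay timing are not modeled.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Figure 4(c): one node per operation -> (input names, operation, output name)
NODES: Dict[str, Tuple[List[str], Callable[..., int], str]] = {
    "or": (["S", "C"], lambda s, c: s | c, "T"),
    "not": (["T"], lambda t: 1 - t, "X"),
    "and": (["X", "S"], lambda x, s: x & s, "U"),
}


def fanout(name: str) -> List[str]:
    """Nodes that read the given value."""
    return [n for n, (ins, _, _) in NODES.items() if name in ins]


def propagate(values: Dict[str, int], changed: List[str]) -> List[str]:
    """Evaluate only nodes reached by value changes; return evaluation order."""
    queue = deque(n for c in changed for n in fanout(c))
    evaluated: List[str] = []
    while queue:
        node = queue.popleft()
        ins, op, out = NODES[node]
        new = op(*(values[i] for i in ins))
        evaluated.append(node)
        if values.get(out) != new:  # emit an event only when the output changes
            values[out] = new
            queue.extend(fanout(out))
    return evaluated


state = {"S": 1, "C": 0, "T": 1, "X": 0, "U": 0}  # a settled state
state["C"] = 1
print(propagate(state, ["C"]))  # ['or']: T stays 1, so nothing else runs
```

When C changes while S is 1, only the ‘or’ node is evaluated, whereas the sequential model re-runs the whole process. When S drops to 0 with C at 0, the change ripples through all three nodes, and ‘and’ is evaluated twice because this sketch does not merge events that arrive at different times.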

3.3 Extraction of parallel parts

Figure 5(a) shows another example of VHDL statements. In this case we must consider the activation and suspension statements, that is, the VHDL ‘wait’ statements. Because the description contains wait statements, we cannot use the methods described in the previous sections. In this section, we describe a method that accelerates the simulation of such descriptions.

When the simulation of the statements reaches a wait statement, the simulation of the process statement is suspended until the condition written in the wait statement becomes true. In this case, we must split the process statement into a number of parts, each starting at a wait statement. During simulation, only one part is simulated at a time, so the dependencies of variables are cut at each wait statement. In this example, the variable U is assigned a new value in part A, and that value is used in part B. This dependency is ignored because there is a wait statement between them. After this partitioning, we can find the dependencies between statements within each part. Part A of figure 5(a) consists of 3 statements. There is no dependency between the second and the third statement, so we can simulate those statements in parallel. Figure 5(b) shows the parallelized simulation model for part A of figure 5(a). Furthermore, we can apply algorithms used by compilers for superscalar machines, VLIW machines, or vector machines [7][10] to extract parts that can be simulated in parallel.
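Splitting at wait statements and then grouping independent statements can be sketched as follows. This is an assumption-laden illustration: statements are summarized as (text, reads, writes), any statement whose text begins with "wait" starts a new part, and the grouping is a simple as-soon-as-possible levelization rather than the superscalar, VLIW or vector scheduling algorithms cited above.

```python
from typing import List, Set, Tuple

# (statement text, names read, names written)
Stmt = Tuple[str, Set[str], Set[str]]


def split_at_waits(body: List[Stmt]) -> List[List[Stmt]]:
    """Each part starts at a wait statement; dependencies are cut between parts."""
    parts: List[List[Stmt]] = []
    for stmt in body:
        if stmt[0].startswith("wait") or not parts:
            parts.append([])
        parts[-1].append(stmt)
    return parts


def levelize(part: List[Stmt]) -> List[List[str]]:
    """As-soon-as-possible levels: statements on the same level are independent."""
    levels: List[int] = []
    for j, (_, r2, w2) in enumerate(part):
        lv = 0
        for i in range(j):
            text1, r1, w1 = part[i]
            # The leading wait statement gates the rest of its part;
            # otherwise order statements only for RAW, WAR or WAW conflicts.
            if (i == 0 and text1.startswith("wait")) or w1 & r2 or r1 & w2 or w1 & w2:
                lv = max(lv, levels[i] + 1)
        levels.append(lv)
    grouped: List[List[str]] = [[] for _ in range(max(levels, default=-1) + 1)]
    for (text, _, _), lv in zip(part, levels):
        grouped[lv].append(text)
    return grouped


# Figure 5(a)
FIG5: List[Stmt] = [
    ("wait until CLK'event and CLK = '1'", {"CLK"}, set()),
    ("V := D and E", {"D", "E"}, {"V"}),
    ("U := F or W", {"F", "W"}, {"U"}),
    ("wait until S = '0'", {"S"}, set()),
    ("W := U and T", {"U", "T"}, {"W"}),
]
for part in split_at_waits(FIG5):
    print(levelize(part))
```

In part A the two assignments land on the same level, matching figure 5(b). If the second wait statement were removed, `W := U and T` would fall into part A one level after `U := F or W`, because it reads U.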

4 Experimental results

We evaluated the methods described above using a unit-delay event-driven VHDL simulator running on the CAD accelerator TP5000. We first describe the TP5000 and then present the experimental results.

4.1 TP5000

The TP5000 performs gate-level simulation about 100 times faster than software simulators. It consists of a number of processor groups, each of which contains 15 processors. Each processor has microcode memory that stores its executable code, as well as fast memory that stores a part of the simulation models. Each processor group forms a pipeline that realizes an event-driven unit-delay logic simulator, and each pipeline simulates its part of the simulation models concurrently.

4.2 Results

We used the descriptions of two commercial circuit systems for the experiments. The first system (circuit system A) consists of descriptions of a processor and a number of pseudo circuits used for the simulation; the descriptions comprise 106k VHDL description steps and contain 936 process statements. The second system (circuit system B) consists of descriptions of a processor; the descriptions comprise 40k VHDL description steps and contain 172 process statements.

Table 1 shows the number of unit-times spent per delta-delay and the simulation speed ratio for (a) the original descriptions, (b) the descriptions after applying the methods of sections 3.1 and 3.2, and (c) the descriptions after applying all methods. The maximum number of unit-times is the largest number of unit-times spent on one delta-delay during the simulation. The simulation speed is given as a ratio to the original description. For circuit system A, our methods reduce the average number of unit-times to 15% of the original, and simulation becomes 6.7 times faster than with the original descriptions. For circuit system B, the average number of unit-times is reduced to 6%, and simulation becomes 7.8 times faster. From these results, our methods accelerate simulation by a factor of about 7.
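The percentages quoted above follow from the average unit-time row of Table 1, and the two speed-ups average to about 7. A quick check in Python (the variable names are illustrative; the numbers are copied from Table 1):

```python
# Average unit-times per delta-delay from Table 1: (a) original, (c) all methods.
avg_unit_time = {"A": (141.4, 21.12), "B": (51.4, 3.0)}
speed_all_methods = {"A": 6.7, "B": 7.8}

for system, (original, optimized) in avg_unit_time.items():
    print(system, f"{optimized / original:.0%}")  # A 15%, then B 6%

mean_speed = sum(speed_all_methods.values()) / len(speed_all_methods)
print(f"{mean_speed:.2f}")  # 7.25
```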

5 Conclusions

We showed that the speed-up of behavioral simulation on simulation specific machines is restricted by the length of statements. We developed methods that reduce the length of each sequential simulation model and convert sequential simulation models into concurrent simulation models, maximizing the parallelism available to the simulation machines. Experimental results on real system descriptions showed that the methods accelerate simulation by a factor of about 7.

Table 1 results of experiments

    Circuit system                       A                       B
    Number of process stmt.             936                     172
    Simulation model             (a)     (b)     (c)     (a)     (b)     (c)
    Ave. number of unit-time    141.4   40.57   21.12    51.4     3.1     3.0
    Max. number of unit-time    420942094195841952786 (six values run together in the source)
    Simulation speed              1      4.5     6.7      1      7.5     7.8

References

[1] T. Blank, “A survey of hardware accelerators used in computer aided design,” IEEE Design and Test of Computers, 1, pp. 21-39, 1984.

[2] F. Hirose, “Simulation Processor SP,” Proc. of IEEE International Conference on Computer Aided Design, pp. 484-487, 1987.

[3] S. Shimogori, K. Takayama, and H. Matsuoka, “TP5000: Reconfigurable Hardware Accelerator for CAD Applications,” FUJITSU Sci. Tech. J., 31, 2, pp. 152-160, 1995.

[4] F. Hirose, “Performance Evaluation of an Event-Driven Simulation Machine,” Proc. of 29th Design Automation Conference, pp. 428-431, 1992.

[5] M. Shoji and F. Hirose, “High-Level VHDL Simulator Running on Logic Simulation Machines,” Proc. of VHDL International Users’ Forum Spring 1993, pp. 145-1, 1993.

[6] M. Shoji, F. Hirose, and K. Takayama, “VHDL compiler of behavioral descriptions for ultrahigh-speed simulation,” Proc. 2nd Asian Pacific Conference on Hardware Description Languages, pp. 85-88, 1994.

[7] A. V. Aho, R. Sethi, and J. D. Ullman, “Compilers,” Addison-Wesley Publishers, 1986.

[8] “IEEE Standard VHDL Language Reference Manual,” IEEE Std. 1076-1987, IEEE, New York, NY, 1988.

[9] R. Lipsett, C. Schaefer, and C. Ussery, “VHDL: Hardware Description and Design,” Kluwer Academic Publishers, Norwell, MA, 19.

[10] J. L. Hennessy and D. A. Patterson, “Computer Architecture: A Quantitative Approach,” Morgan Kaufmann Publishers, San Mateo, CA, 1990.
