-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathUPDATE
205 lines (190 loc) · 11.9 KB
/
UPDATE
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
############################################################
# Kite: Architecture Simulator for RISC-V Instruction Set #
# Developed by William J. Song #
# Computer Architecture and Systems Lab, Yonsei University #
# Version: 1.12 #
############################################################
Update history:
* Version 1.0 (May 21, 2019)
- Initial release
* Version 1.1 (May 23, 2019)
- A possible bug was fixed in the case that the execution result of an
instruction affects its next run if the instruction is re-executed in a
loop (or replayed for any reason). The instruction information is reset
in the writeback stage when it retires.
- Another bug was found for conditional branches not updating the PC when
the branch target address is zero. Thus, branch instructions (i.e., beq
bge, blt, bne) could not be used to terminate a program by setting the
next PC to zero.
* Version 1.2 (May 31, 2019)
- PC = 0 is reserved as an invalid address in the instruction memory, and
it no longer prints a debug message, "Instruction memory has reached out
of code segment (or end of program)." For PC = 0, the instruction fetch
stage silently stops fetching instructions from the instruction memory.
PC = 0 is re-purposed for branch instructions to disable instruction
fetching until the next PC is resolved.
- Stall cycles for conditional branch instructions are changed to match
those of the textbook. Every conditional branch instruction now causes
three stall cycles until the next PC is resolved. Unconditional jump
instructions do not incur stalls unless they have dependency on preceding
instructions.
- "bool stall" is removed from the inst_t class as it turns out to be
unnecessary for pipeline execution.
- The execute() function of proc_t is re-organized for better code
readability and maintenance.
- ALU detects divide-by-zero exceptions. It does not raise an exception
on detection but only prints out a debug message. If debug messages are
not enabled, divide-by-zero exceptions are silenced.
* Version 1.3 (June 1, 2019)
- Data forwarding feature is added. Since there is no way for a value-
producing instruction to figure out which subsequent instructions
are dependent on it, dependent instructions must catch the output of the
preceding instruction if it has become ready but not yet written back to
the register file. To enable this feature, inst_t was modified to add
"bool rd_ready" field to indicate that the rd value has been calculated
and thus can be forwarded to whoever depends on it. When determining data
hazards in the register file, dependent instructions check if the rd
value of the latest producer has become ready and thus can proceed as if
there is no data hazard.
- A branch predictor is added.
* Version 1.4 (June 13, 2019)
- A bug was fixed for the dependency checking table of the register file.
With the addition of the data forwarding feature, the entries of the
dependency checking table do not always point to retiring instructions.
Hence, simply clearing the entries of the dependency checking table at
the writeback of instructions accidentally misses data hazards in some
situations. A fix is to double-check if the latest producer of a
destination register is still a retiring instruction. If so, the entry of
the dependency checking table is cleared. Otherwise, the entry retains
its current pointer since it indicates that a subsequent instruction is
the new latest producer of the register.
* Version 1.5 (June 17, 2019)
- When calculating branch prediction accuracy, the stat counter of
num_br_predicts inadvertently counts predictions made for wrong-path
instructions. An update was made that only on-path instructions are
counted and reflected in the branch prediction accuracy.
* Version 1.6 (June 28, 2019)
- Data forwarding and branch prediction become options to choose when
compiling the simulator. To enable the data forwarding, compile with
make -j OPT="-DDATA_FWD" or set $OPT=-DDATA_FWD in the Makefile.
Similarly, enabling the branch prediction should compile with make -j
OPT="-DBR_PRED" or $OPT=-DBR_PRED in the Makefile.
- Data memory is split into data cache (data_cache.h/cc) and memory
(data_memory.h/cc) parts. The data cache is directly hooked up to the
memory stage of the pipeline, backed by the data memory. The data memory
is organized as an one-dimensional array of 64-bit unsigned integers,
which still requires strict 8-byte alignment for data access. The data
cache is flexibly configurable with variable cache size, block size, and
set associativity. A miss in the cache retrieves data from the lower-
level memory. Miss penalty is configurable but set to zero by default.
- inst_t was updated by renaming the "is_taken" flag to "pred_taken", and a
new variable "pred_target" was added to the class that carries a
predicted branch target address obtained from the branch target buffer
(BTB). When a branch prediction is made in the fetch stage, an inst
collects both pieces of information, i.e., pred_taken and pred_target.
These are used later in the writeback stage to confirm if the control
flow has made a correct progress. A pipeline flush occurs because of the
following reasons. The first cause is simply due to branch misprediction,
and the other is when the BTB supplies a wrong target address despite the
correct prediction of a taken or not-taken decision.
- A possible bug is found that having a multi-cycle instruction (e.g., mul,
div) alone in the pipeline may prematurely finish a simulation loop.
Extra conditions are added to the simulation loop to check if ALU and
data memory are busy. The simulation can terminate only when all pipeline
registers, ALU, and data memory are empty.
* Version 1.7 (April 26, 2020)
- A bug was fixed in inst_memory.cc when parsing UJ-type instruction
(i.e., jal).
- A redundant condition in proc_t::fetch() is removed. It previously
checked if inst was valid before get_op_type(inst->op) == op_sb_type for
conditional branches. It was unnecessary since this line of code was
never executed when inst_memory->read(pc) returned null.
- A few error-checking lines were added to reg_file.cc. Previously,
undefined registers in reg_state were initialized to zero by default.
After the update, all the registers must be explicitly initialized.
- Error-checking lines were added to data_memory.cc to detect an undefined
memory address or value on the left or right side of an equal sign.
- Documentation was updated by proofreading and structurally reorganizing
it.
* Version 1.8 (Apr 25, 2022)
- A typo in the documentation was fixed, i.e., sd instruction format in the
supported instruction table.
- RISC-V specifications define that a jalr instruction resets the LSB of
a new PC, which requires an extra AND operation with uint64_t(-2). This
minor change was added to Kite to conform to the specifications but has
no impact on simulations.
- Support for a U-type instruction (i.e., lui instruction) has been added.
- When loading a program code, constant values are checked if they fit into
the immediate field of instructions, i.e., 12 bits for I-type and 20 bits
for U-type instructions. Likewise, PC-relative distances of SB-type and
UJ-type instructions must fit to the immediate fields.
- block_t constructors in data_cache.h were fixed. The constructors did not
initialize the "last_access" variable that records the clock cycle that
the corresponding block was last accessed in the data cache.
- data_cache.cc was fixed for a smoother extension to a set-associative
cache for future assignments.
- print_state() function was added to reg_file_t. Previously, it was
directly printed in proc.cc. This change removes a need for the read()
function of reg_file_t.
- is_num_str() macro in defs.h is split into is_pos/neg_num_str() to
differentiate positive or negative integers.
- In the constructor of data_memory_t, memory and accessed are zeroed out
for initialization.
- #include <cstring> was added to data_memory.cc because of the compiler
error of "memset was not declared in this scope."
- The default size of memory was increased from 1KB to 4KB. The first 1KB
is assumed to be reserved for the code segment and thus inaccessible by
load or store instructions. Addresses in the code segment are used as PCs
of instructions.
* Version 1.9 (June 16, 2022)
- Support for non-doubleword memory instructions (i.e., lb, lbu, lh, lhu,
lw, lwu, sb, sh, sw) was added to data_cache.cc. Restriction on
doubleword-granular address alignment is loosened such that a memory
address can be any multiple of the accessed data size (e.g., word-
granular address for lw, lwu, and sw).
- Error checking for the validity of register numbers was added to
inst_memory.cc when loading a program code.
- Double-precision arithmetic instructions (i.e., fadd.d, fdiv.d, fmul.d,
fsub.d) are added to the list of supported instructions. Double-precision
numbers are stored as int64_t type but type-cast when they are read.
Similarly, double-precision numbers are type-cast when they are stored
in registers or memory.
- Double-precision memory instructions (i.e., fld, fsd) are added to the
list of supported instructions.
- Values in the memory_state file can be put as double-precision numbers in
the form of "(sign)int.fraction" such as -1234.5678.
- is_fp_str() macro in defs.h is fixed to double-check if the string
contains a decimal point.
- Branch prediction accuracy was previously calculated only for the branch
predictor itself, and it did not include branch target mispredictions.
After the fix, the pipeline stats print both branch mispredictions and
branch target mispredictions.
* Version 1.10 (April 3, 2023)
- Supports for non-double word memory instructions and floating-point
operations are removed. This version rolls back to version 1.8, except
for a few essential fixes.
- The validity of a register number is checked by a macro, is_reg_str(),
in defs.h when a program code is loaded.
- The code segment size is no longer fixed to 1KB. When a program code is
loaded, the number of instructions dynamically defines the code segment
size. When a data memory access is made, accessing the code segment
region throws an error.
* Version 1.11 (May 9, 2023)
- The lui instruction mistakenly loaded the immediate value to the 20th-
bit position of the destination register. The error was fixed to load
the value to the 12th-bit position.
- branch_taken state is added to inst_t to print branch taken/not-taken
states in the debugging messages.
- Branch stat is fixed to count branch target mispredictions in the
prediction accuracy.
- The branch predictor is formatted as a two-level predictor (b, p, h)
with b-bit branch history table (BHT) indexing, p-bit pattern history
table (PHT) indexing, and h-bit branch history per BHT entry.
* Version 1.12 (Mar 22, 2024)
- Debug messages for SB-type instructions are updated to print actual
branch results (i.e., inst->branch_taken) after they are resolved in
the execute stage. If the branch predictor option is enabled (i.e.,
OPT=-DBR_PRED), branch prediction results (i.e., inst->pred_taken)
are printed from the fetch stage, but the actual branch results show
up from the execute stage.
- The memeory_state file was renamed to mem_state.