Some habits are too hard to break, huh
After a few hours of working on nullDC Dynarec instrumentation/profiling …
Shenmue: Ingame
mov:32 25.07% 37931888
mov:64 0.61% 929370
readm:8 0.14% 215023
readm:16 0.88% 1325655
readm:32 21.96% 33223344
    46.25% 15364458 to mem
    0.00% 300 to route
    53.75% 17858586 to inline
    27.54% 9150191 static
    18.71% 6214567 fmem
readm:64 0.54% 817860
    90.76% 742252 to mem
    0.00% 0 to route
    9.24% 75608 to inline
    8.65% 70780 static
    82.10% 671472 fmem
writem:32 3.02% 4563780
    98.73% 4505665 to mem
    0.06% 2880 to route
    1.21% 55235 to inline
    0.00% 0 static
    98.79% 4508545 fmem
writem:64 0.11% 168904
    100.00% 168904 to mem
    0.00% 0 to route
    0.00% 0 to inline
    0.00% 0 static
    100.00% 168904 fmem
cmp:32 9.04% 13672121
test:32 2.98% 4502301
SaveT:32 13.33% 20163008
LoadT:32 1.95% 2943077
not:32 0.20% 302600
and:32 0.33% 500402
or:32 0.28% 429952
xor:32 0.14% 212814
shl:32 0.74% 1125146
shr:32 0.12% 182725
rcl:32 0.26% 398469
movex:8 0.19% 288026
add:32 10.19% 15414719
sub:32 2.02% 3055821
fadd:32 0.13% 192646
fsub:32 1.05% 1582237
fmul:32 1.30% 1969777
fdiv:32 0.17% 250172
fneg:32 0.11% 167815
fmac:32 0.30% 447457
ifb:8 0.41% 627217
ftrv:32 0.25% 377367
fipr:32 0.25% 376035
floatfpul:32 0.25% 382908
ftrc:32 0.25% 378803
fcmp:32 0.58% 880835
pref:32 0.46% 701954
rest(18 ops) 0.38% 578451
Total 151.28M
Profiling games sure is fun :D
These are IL opcode counts, per Dreamcast second. Sadly it's not very practical to get execution time, so execution count will have to do for now … It’s interesting to note that most games achieve between 120 and 200 MIPS (with most 30 fps RPGs on the low side, and DOA2LE consistently getting around 202 MIPS ingame :p)
mov32, readm32, writem32, cmp32, test32, SaveT, LoadT, add32 and sub32 make up about 90% of the opcodes executed. Out of these, readm32, SaveT and LoadT could be optimized, and maybe something can be done for movs as well.
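To double check that "about 90%" figure, here is a quick sketch that just adds up the nine counts from the dump above. The names `counts`, `TOTAL`, `hot` and `hot_share` are made up for this example, and the total is the rounded 151.28M from the dump, so the result is only approximate.

```python
# Counts copied from the Shenmue in-game dump above
# (IL opcode executions per Dreamcast second).
counts = {
    "mov:32": 37931888,
    "readm:32": 33223344,
    "writem:32": 4563780,
    "cmp:32": 13672121,
    "test:32": 4502301,
    "SaveT:32": 20163008,
    "LoadT:32": 2943077,
    "add:32": 15414719,
    "sub:32": 3055821,
}

# "Total 151.28M" from the dump; rounded, so the share is approximate.
TOTAL = 151.28e6

hot = sum(counts.values())
hot_share = 100.0 * hot / TOTAL

print(f"{hot / 1e6:.2f}M of {TOTAL / 1e6:.2f}M ops ({hot_share:.1f}%)")
```

The share lands a bit under 90%, which matches the claim closely enough for a rounded total.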
Memory
readm:32 21.96% 33,223,344
    46.25% 15,364,458 to mem
    0.00% 300 to route
    53.75% 17,858,586 to inline
    27.54% 9,150,191 static
    18.71% 6,214,567 fmem
writem:32 3.02% 4,563,780
    98.73% 4,505,665 to mem
    0.06% 2,880 to route
    1.21% 55,235 to inline
    0.00% 0 static
    98.79% 4,508,545 fmem
Reads are about 7x more common than writes. Array/pointer accesses are roughly comparable between reads and writes (6.2M vs 4.5M; in other spots/games the difference is smaller). What's interesting is that static accesses (predicted static + inline) are over 27M for reads, but just 55K for writes. This verifies that the SH4 really sucks at loading constants, so pretty much all of the constants are loaded as mem-reads, and it also raises some questions about the generated code quality. Also, register reads+writes were REALLY low (10 mmr reads/frame, 96 mmr writes/frame). Interesting, huh?
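The arithmetic behind those comparisons can be reproduced from the readm:32 and writem:32 rows. This is only a sketch: the dictionary layout and variable names are invented for the example, and "static accesses" is taken to mean the `static` row plus the `to inline` row, as described above.

```python
# Rows copied from the readm:32 / writem:32 breakdown above.
readm32 = {
    "total": 33_223_344,
    "to_inline": 17_858_586,
    "static": 9_150_191,
    "fmem": 6_214_567,
}
writem32 = {
    "total": 4_563_780,
    "to_inline": 55_235,
    "static": 0,
    "fmem": 4_508_545,
}

# How many 32-bit reads happen per 32-bit write.
ratio = readm32["total"] / writem32["total"]

# "Static" accesses as used in the text: predicted static + inline.
static_reads = readm32["static"] + readm32["to_inline"]
static_writes = writem32["static"] + writem32["to_inline"]

print(f"reads per write: {ratio:.2f}")
print(f"static reads: {static_reads:,}  static writes: {static_writes:,}")
print(f"fmem reads: {readm32['fmem']:,}  fmem writes: {writem32['fmem']:,}")
```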
Anyway, these numbers and the other statistics I plan to gather over the following days will help to better optimize nullDC!