Some habits are too hard to break, huh

Posted in emulators, nullDC

After a few hours of working on nullDC Dynarec instrumentation/profiling …

Shenmue: Ingame

mov:32      25.07%  37931888
mov:64       0.61%  929370
readm:8      0.14%  215023
readm:16     0.88%  1325655
readm:32    21.96%  33223344
  46.25% 15364458 to mem
   0.00% 300 to route
  53.75% 17858586 to inline
  27.54% 9150191 static
  18.71% 6214567 fmem
readm:64     0.54%  817860
  90.76% 742252 to mem
   0.00% 0 to route
   9.24% 75608 to inline
   8.65% 70780 static
  82.10% 671472 fmem
writem:32    3.02%  4563780
  98.73% 4505665 to mem
   0.06% 2880 to route
   1.21% 55235 to inline
   0.00% 0 static
  98.79% 4508545 fmem
writem:64    0.11%  168904
 100.00% 168904 to mem
   0.00% 0 to route
   0.00% 0 to inline
   0.00% 0 static
 100.00% 168904 fmem
cmp:32       9.04%  13672121
test:32      2.98%  4502301
SaveT:32    13.33%  20163008
LoadT:32     1.95%  2943077
not:32       0.20%  302600
and:32       0.33%  500402
or:32        0.28%  429952
xor:32       0.14%  212814
shl:32       0.74%  1125146
shr:32       0.12%  182725
rcl:32       0.26%  398469
movex:8      0.19%  288026
add:32      10.19%  15414719
sub:32       2.02%  3055821
fadd:32      0.13%  192646
fsub:32      1.05%  1582237
fmul:32      1.30%  1969777
fdiv:32      0.17%  250172
fneg:32      0.11%  167815
fmac:32      0.30%  447457
ifb:8        0.41%  627217
ftrv:32      0.25%  377367
fipr:32      0.25%  376035
floatfpul:32 0.25%  382908
ftrc:32      0.25%  378803
fcmp:32      0.58%  880835
pref:32      0.46%  701954
rest(18 ops) 0.38%  578451
Total       151.28M

Profiling games sure is fun :D

These are IL opcode counts per emulated Dreamcast second. Sadly, it’s not very practical to measure execution time, so execution counts will have to do for now … It’s interesting to note that most games achieve between 120 and 200 MIPS (with most 30 fps games on the low side, and DOA2LE consistently hitting around 202 MIPS in-game :p).
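To make that arithmetic concrete: each top-level line of the dump reads as `<opcode>:<bits> <share>% <count>`, with the count taken per emulated second. The share is then just count / total, and the total itself reads directly as millions of ops per second, which is how the MIPS figures are meant. The sketch below parses a few lines copied from the Shenmue dump. The line format is inferred from the dump, and `parse_top_level`, `share_percent` and `mips` are hypothetical helpers, not part of nullDC.

```python
import re

# A few top-level lines copied from the Shenmue in-game dump.
# Assumed format: "<opcode>:<bits> <share>% <count per emulated second>".
SAMPLE = """\
mov:32      25.07%  37931888
readm:32    21.96%  33223344
  46.25% 15364458 to mem
SaveT:32    13.33%  20163008
add:32      10.19%  15414719
"""

TOTAL_PER_SECOND = 151.28e6  # the "Total 151.28M" line of the dump

# Matches "name:bits  pct%  count"; nested breakdown lines have no "name:" prefix.
LINE_RE = re.compile(r"^(\w+):(\d+)\s+([\d.]+)%\s+(\d+)$")


def parse_top_level(text):
    """Return {"name:bits": count} for top-level opcode lines only."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            counts[f"{m.group(1)}:{m.group(2)}"] = int(m.group(4))
    return counts


def share_percent(count, total=TOTAL_PER_SECOND):
    """Share of all executed IL ops, in percent."""
    return 100.0 * count / total


def mips(total=TOTAL_PER_SECOND):
    """Counts are per emulated second, so millions of ops per second == MIPS."""
    return total / 1e6


counts = parse_top_level(SAMPLE)
for name, n in counts.items():
    print(f"{name:10} {share_percent(n):6.2f}%")
print(f"{mips():.2f} MIPS")
```

Because the total is rounded to 151.28M, recomputed shares can drift in the last digit for other lines, but the ones sampled here reproduce the dump’s percentages.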

mov32, readm32, writem32, cmp32, test32, SaveT, LoadT, add32 and sub32 make up about 90% of the executed opcodes. Of these, readm32, SaveT and LoadT could be optimized, and maybe something can be done for the movs as well.
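A quick sanity check of the 90% figure, summing the shares copied from the Shenmue dump. The grouping into `TOP_OPS` and `OPTIMIZABLE` is just for illustration:

```python
# Shares (% of executed IL ops) copied from the Shenmue in-game dump.
TOP_OPS = {
    "mov:32": 25.07,
    "readm:32": 21.96,
    "writem:32": 3.02,
    "cmp:32": 9.04,
    "test:32": 2.98,
    "SaveT:32": 13.33,
    "LoadT:32": 1.95,
    "add:32": 10.19,
    "sub:32": 2.02,
}

# The three opcodes singled out as optimization candidates.
OPTIMIZABLE = ("readm:32", "SaveT:32", "LoadT:32")

combined = sum(TOP_OPS.values())
optimizable = sum(TOP_OPS[name] for name in OPTIMIZABLE)

print(f"{len(TOP_OPS)} opcodes cover {combined:.2f}% of executed IL ops")
print(f"optimization candidates cover {optimizable:.2f}%")
```

This gives 89.56% for the nine opcodes, and 37.24% for readm32, SaveT and LoadT together, so those three alone account for more than a third of everything executed in this scene.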

Memory

readm:32    21.96%  33,223,344
  46.25% 15,364,458 to mem
   0.00% 300 to route
  53.75% 17,858,586 to inline
  27.54% 9,150,191 static
  18.71% 6,214,567 fmem
writem:32    3.02%  4,563,780
  98.73% 4,505,665 to mem
   0.06% 2,880 to route
   1.21% 55,235 to inline
   0.00% 0 static
  98.79% 4,508,545 fmem
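The nested lines under each memory opcode look like two overlapping breakdowns: `to mem` + `to route` + `to inline` add up to the opcode’s count, and `static` + `fmem` add up to `to mem` + `to route`. That reading is inferred purely from the numbers, not from nullDC documentation. The sketch below just checks it against all four memory opcodes in the dump; the dictionary keys are illustrative names, not nullDC identifiers.

```python
# Breakdown lines copied from the Shenmue dump (counts per emulated second).
BREAKDOWN = {
    "readm:32": dict(total=33_223_344, mem=15_364_458, route=300,
                     inline=17_858_586, static=9_150_191, fmem=6_214_567),
    "readm:64": dict(total=817_860, mem=742_252, route=0,
                     inline=75_608, static=70_780, fmem=671_472),
    "writem:32": dict(total=4_563_780, mem=4_505_665, route=2_880,
                      inline=55_235, static=0, fmem=4_508_545),
    "writem:64": dict(total=168_904, mem=168_904, route=0,
                      inline=0, static=0, fmem=168_904),
}

results = {}
for op, b in BREAKDOWN.items():
    # Observation 1: the three "to ..." lines partition the opcode's count.
    split_ok = b["mem"] + b["route"] + b["inline"] == b["total"]
    # Observation 2: "static" + "fmem" partition the "to mem" + "to route" part.
    overlap_ok = b["static"] + b["fmem"] == b["mem"] + b["route"]
    results[op] = (split_ok, overlap_ok)
    print(f"{op:10} to-split ok={split_ok}  static+fmem ok={overlap_ok}")
```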

Reads are 7x more common than writes. Array/pointer accesses are pretty much the same between reads and writes (6.2M vs 4.5M; in other spots/games the difference is smaller). What’s interesting is the static accesses (predicted static + inline): over 27M for reads, but just 55K for writes. This verifies that the SH4 really sucks at loading constants (pretty much all constants are loaded as memory reads), and it also raises some questions about the quality of the generated code. Also, memory-mapped register reads and writes were REALLY low (10 mmr reads/frame, 96 mmr writes/frame). Interesting, huh?
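The numbers in that paragraph can be reproduced from the excerpt. Two assumptions are made here: that the `to route` lines count the memory-mapped register accesses, and that this Shenmue scene runs at 30 frames per emulated second. Under those assumptions, 300 and 2,880 routed accesses per second give exactly the 10 and 96 per frame mentioned. Variable names are illustrative only.

```python
# Counts per emulated second, copied from the Memory excerpt.
READS = {"total": 33_223_344, "static": 9_150_191, "inline": 17_858_586,
         "fmem": 6_214_567, "route": 300}
WRITES = {"total": 4_563_780, "static": 0, "inline": 55_235,
          "fmem": 4_508_545, "route": 2_880}

FPS = 30  # assumption: frame rate of the profiled scene

read_write_ratio = READS["total"] / WRITES["total"]

# "Static" accesses as counted in the post: predicted static + inline.
static_reads = READS["static"] + READS["inline"]
static_writes = WRITES["static"] + WRITES["inline"]

# Array/pointer accesses (fmem), reads vs writes.
fmem_ratio = READS["fmem"] / WRITES["fmem"]

# Assumption: "to route" == memory-mapped register (mmr) accesses.
mmr_reads_per_frame = READS["route"] / FPS
mmr_writes_per_frame = WRITES["route"] / FPS

print(f"reads/writes:        {read_write_ratio:.2f}x")
print(f"static reads/writes: {static_reads:,} / {static_writes:,}")
print(f"fmem reads/writes:   {fmem_ratio:.2f}x")
print(f"mmr per frame:       {mmr_reads_per_frame:.0f} reads, "
      f"{mmr_writes_per_frame:.0f} writes")
```

The read/write ratio comes out at about 7.28x (the "7x" above), and static reads total 27,008,777 against 55,235 for writes.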

Anyway, these numbers, along with other statistics I plan to gather over the next few days, will help me optimize nullDC further!