Cortex-A7 vs Cortex-A9 vs Cortex-A53 vs Silvermont

Looking at the table at http://wlog.flatlib.jp/item/1800, I realized that now that a Raspberry Pi 3 has arrived at my house I can run a similar experiment, so I did.

These are binutils build times. Power consumption was read by eye off a Kill A Watt meter.

raspberry pi 3 (idle 1.8W, load 4.7W, diff=2.9W)
 Performance counter stats for 'sh -c ../configure ; make -j4':

     797005.468900      task-clock (msec)         #    2.953 CPUs utilized
           121,732      context-switches          #    0.153 K/sec
            29,936      cpu-migrations            #    0.038 K/sec
         8,586,527      page-faults               #    0.011 M/sec
   954,251,209,471      cycles                    #    1.197 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   422,118,328,351      instructions              #    0.44  insns per cycle
    53,087,297,010      branches                  #   66.608 M/sec
     6,646,202,484      branch-misses             #   12.52% of all branches

     269.885547005 seconds time elapsed


raspberry pi 2 (idle 1.6W, load 2.9W, diff=1.3W)
 Performance counter stats for 'sh -c ../configure ; make -j4':

    1477301.886492      task-clock (msec)         #    3.017 CPUs utilized
           138,249      context-switches          #    0.094 K/sec
            31,201      cpu-migrations            #    0.021 K/sec
         8,587,257      page-faults               #    0.006 M/sec
 1,326,853,390,190      cycles                    #    0.898 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   422,185,890,968      instructions              #    0.32  insns per cycle
    52,974,180,568      branches                  #   35.859 M/sec
    13,626,377,155      branch-misses             #   25.72% of all branches

     489.620091792 seconds time elapsed

parallella (idle 2.8W, load 3.5W, diff=0.7W)
 Performance counter stats for 'sh -c ../configure ; make -j2':

    1477010.321602      task-clock (msec)         #    1.674 CPUs utilized
            138413      context-switches          #    0.094 K/sec
             12771      cpu-migrations            #    0.009 K/sec
           8209396      page-faults               #    0.006 M/sec
      982777694036      cycles                    #    0.665 GHz
       83520783309      stalled-cycles-frontend   #    8.50% frontend cycles idle
      629510158808      stalled-cycles-backend    #   64.05% backend  cycles idle
      553902409016      instructions              #    0.56  insns per cycle
                                                  #    1.14  stalled cycles per insn
       57338355895      branches                  #   38.821 M/sec
       15218097669      branch-misses             #   26.54% of all branches

     882.186437213 seconds time elapsed

liva ecs (idle 3.5W, load 7.2W, diff=3.7W) Celeron N2807 2core
 Performance counter stats for 'sh -c ../configure ; make -j2':

     399205.341262      task-clock (msec)         #    1.790 CPUs utilized
            180763      context-switches          #    0.453 K/sec
             24511      cpu-migrations            #    0.061 K/sec
           8537573      page-faults               #    0.021 M/sec
      832800007036      cycles                    #    2.086 GHz                      (51.63%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
      407625678041      instructions              #    0.49  insns per cycle          (76.75%)
       88939447423      branches                  #  222.791 M/sec                    (76.15%)
        7372201988      branch-misses             #    8.29% of all branches          (76.20%)

     222.991643196 seconds time elapsed
            time [s]  IPC   W(load)  W(idle)  W(load-idle)  CPU
rpi3        270       0.44  4.7      1.8      2.9           Cortex-A53, 4 cores
rpi2        490       0.32  2.9      1.6      1.3           Cortex-A7, 4 cores
parallella  882       0.56  3.5      2.8      0.7           Cortex-A9, 2 cores
liva ecs    223       0.49  7.2      3.5      3.7           Silvermont, 2 cores
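
Time and power can be combined into a rough energy-per-build figure. The sketch below is not part of the original measurements: it assumes the load wattage stayed constant for the whole build, which an eyeballed Kill A Watt reading cannot really guarantee, and the names `ROWS`, `energy_joules`, and `summarize` are made up for illustration.

```python
# Rows copied from the summary table: (name, build time [s], W(load), W(idle)).
ROWS = [
    ("rpi3", 270, 4.7, 1.8),
    ("rpi2", 490, 2.9, 1.6),
    ("parallella", 882, 3.5, 2.8),
    ("liva ecs", 223, 7.2, 3.5),
]


def energy_joules(seconds, watts):
    """Energy = power x time, assuming constant power (a simplification)."""
    return seconds * watts


def summarize(rows):
    """Return {name: (total energy [J], energy above idle [J])} for each board."""
    result = {}
    for name, seconds, w_load, w_idle in rows:
        total = energy_joules(seconds, w_load)
        above_idle = energy_joules(seconds, w_load - w_idle)
        result[name] = (total, above_idle)
    return result


if __name__ == "__main__":
    for name, (total, above_idle) in summarize(ROWS).items():
        print(f"{name:<12}{total:>9.1f} J total {above_idle:>9.1f} J above idle")
```

Under that assumption the rpi3 uses the least total energy per build (about 1269 J) and the parallella the most (about 3087 J), even though the parallella has the smallest load-idle difference.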


What this shows:

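  • Between the Cortex-A9 and the Cortex-A53, I had assumed the A53 would have the better IPC, but the A9 is actually ahead (0.56 vs 0.44). The A9 is out-of-order after all, but about five years passed between the Cortex-A9 and the Cortex-A53, so I figured the gap would have been closed somehow.
  • The branch miss rate is about the same on the A9 and the A7 (26.54% and 25.72%), but on the A53 it drops to roughly half (12.52%).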
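
The IPC and branch miss figures quoted above are simple ratios of the raw counters in the `perf stat` output. As a minimal sketch, the values can be recomputed like this; the counter values are copied from the dumps above, while the dictionary layout and function names are only illustrative.

```python
# Raw counters copied from the perf stat output above.
COUNTERS = {
    "rpi3": {
        "cycles": 954_251_209_471,
        "instructions": 422_118_328_351,
        "branches": 53_087_297_010,
        "branch_misses": 6_646_202_484,
    },
    "rpi2": {
        "cycles": 1_326_853_390_190,
        "instructions": 422_185_890_968,
        "branches": 52_974_180_568,
        "branch_misses": 13_626_377_155,
    },
    "parallella": {
        "cycles": 982_777_694_036,
        "instructions": 553_902_409_016,
        "branches": 57_338_355_895,
        "branch_misses": 15_218_097_669,
    },
    "liva ecs": {
        "cycles": 832_800_007_036,
        "instructions": 407_625_678_041,
        "branches": 88_939_447_423,
        "branch_misses": 7_372_201_988,
    },
}


def ipc(c):
    """Instructions retired per cycle."""
    return c["instructions"] / c["cycles"]


def branch_miss_pct(c):
    """Mispredicted branches as a percentage of all branches."""
    return 100.0 * c["branch_misses"] / c["branches"]


if __name__ == "__main__":
    for name, c in COUNTERS.items():
        print(f"{name:<12} IPC {ipc(c):.2f}  branch-miss {branch_miss_pct(c):.2f}%")
```

Note that the liva ecs counters were multiplexed (the percentages in parentheses in its output), so perf scaled those values and they are estimates rather than exact counts.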


That said, this is not a very good comparison, because:

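  • On current CPUs, idle power is dominated by peripheral I/O rather than the CPU, so it is unclear whether comparing idle figures means anything.
  • The load-idle difference gets larger the stronger the power-saving features are, so comparing by load-idle is not the answer either.
  • For work that parallelizes well, such as a build, having many small cores is an advantage.
  • For compilation, the work being done differs between x86 and ARM.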
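
The last two caveats can be seen in the raw counters. This is only a rough sketch, not something the measurements were designed for: `task-clock` is CPU time summed over all cores, so it discounts the core-count advantage that elapsed time includes, and the instruction totals show how much the workload itself differs between machines. The values are copied from the `perf stat` dumps above; the names `RUNS`, `cpu_seconds`, `cpus_utilized`, and `instructions_vs` are made up for illustration.

```python
# (task-clock [ms], elapsed [s], instructions) copied from the perf stat output.
RUNS = {
    "rpi3": (797005.468900, 269.885547005, 422_118_328_351),
    "rpi2": (1477301.886492, 489.620091792, 422_185_890_968),
    "parallella": (1477010.321602, 882.186437213, 553_902_409_016),
    "liva ecs": (399205.341262, 222.991643196, 407_625_678_041),
}


def cpu_seconds(name):
    """Total CPU time across all cores, in seconds."""
    task_clock_ms, _, _ = RUNS[name]
    return task_clock_ms / 1000.0


def cpus_utilized(name):
    """Average number of busy cores: CPU time divided by wall-clock time."""
    _, elapsed_s, _ = RUNS[name]
    return cpu_seconds(name) / elapsed_s


def instructions_vs(name, baseline="rpi3"):
    """Instruction count relative to a baseline machine."""
    return RUNS[name][2] / RUNS[baseline][2]


if __name__ == "__main__":
    for name in RUNS:
        print(
            f"{name:<12} {cpu_seconds(name):8.1f} CPU-s  "
            f"{cpus_utilized(name):.3f} cores busy  "
            f"{instructions_vs(name):.2f}x instructions vs rpi3"
        )
```

Measured in CPU-seconds, the rpi2 and the parallella come out nearly identical (about 1477 s each), although their elapsed times differ by almost 2x. The parallella run also retired about 1.31x as many instructions as the rpi3 run, so the builds were not doing exactly the same work even among the ARM boards.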