BERT inference cost/performance analysis: CPU vs GPU

The model:

The data:

The infrastructure:

Results:

| Instance | Cost / hour | Cost / inference | Max inferences / hour |
|----------|-------------|------------------|-----------------------|
| 8 CPU    | 0.197872    | 0.003848         | 51.42                 |
| 16 CPU   | 0.395744    | 0.004584         | 86.33                 |
| 32 CPU   | 0.791488    | 0.008266         | 95.74                 |
| 1 GPU    | 2.004       | 0.001447         | 1384.61               |
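The cost-per-inference column follows directly from the hourly instance price and the measured maximum throughput. A minimal sketch of that arithmetic, using the figures from the results table (the instance names and the `cost_per_inference` helper are illustrative, not from the original analysis; the derived values match the table's cost-per-inference column to rounding):

```python
# Derive cost per inference from the hourly instance price and the
# measured maximum inference throughput, per the results table:
#   cost_per_inference = cost_per_hour / max_inferences_per_hour

RESULTS = {
    # instance: (cost per hour, max inferences per hour)
    "8cpu":  (0.197872, 51.42),
    "16cpu": (0.395744, 86.33),
    "32cpu": (0.791488, 95.74),
    "1gpu":  (2.004,    1384.61),
}

def cost_per_inference(cost_per_hour: float, inferences_per_hour: float) -> float:
    """Cost of a single inference on an instance billed by the hour."""
    return cost_per_hour / inferences_per_hour

for name, (price, throughput) in RESULTS.items():
    print(f"{name}: {cost_per_inference(price, throughput):.6f} per inference")
```

The takeaway the numbers support: despite having by far the highest hourly price, the GPU instance is the cheapest per inference (roughly 2.7x cheaper than the 8-CPU instance), because its throughput is over an order of magnitude higher.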
