Saturday, July 21, 2012

Intel OpenCL Implicit Vectorizer


OpenCL compilers differ from vendor to vendor, and Intel tunes its compiler to auto-vectorize code that can take advantage of the SSE and AVX instructions.
For example, running the Black-Scholes equation as single-threaded C99 and as a single-threaded OpenCL kernel gives the execution times below:
  • Input: 10 MB of data
  • Calculates both call and put options
  • Both use the -O3 compiler option of gcc-4.4
C99    : 1612.203 ms
OpenCL :  673.248 ms
That is roughly a 2.4x speedup. This 'hidden' optimization is kinda cool, isn't it?
