Saturday, July 21, 2012

Coding with Emacs and e2wm, a brilliant emacs windows manager

I've been using Emacs for a while now and never intend to move on to vi, let alone the GUI IDEs, simply because I just can't move my pinky finger from Ctrl, it automatically gets stuck there lol. And the IDE editors are just too fancy for me.

Anyway, the most annoying problem when developing with Emacs is project management, right? I believe you don't wanna C-x C-c out of Emacs just to grep for some stuff; you'd rather split the Emacs window and do your shell stuff there. But it's pretty troublesome when you quit Emacs and then have to split the windows to your preference all over again. At times like these you end up defining the splitting functions in your .emacs, and that's gonna eat even more of your precious time.

Well, brace yourself: e2wm is the perfect project-management tool for Emacs! Basically there are five perspectives you can choose from for splitting the windows: Code, Two, Doc, Dashboard, and Array, each equipped with various Emacs tools.

Using e2wm is simple: just download e2wm.el, e2wm-config.el, and e2wm-vcs.el, put them in your Emacs load path, and then add:
(require 'e2wm)
(global-set-key (kbd "M-+") 'e2wm:start-management)
to your .emacs file!

Here is the link to kiwanami's GitHub (the author of e2wm, thanks!): https://github.com/kiwanami
and here is e2wm's GitHub repo: https://github.com/kiwanami/emacs-window-manager

Intel OpenCL Implicit Vectorizer


The OpenCL compiler differs depending on its vendor, and Intel optimizes its compiler to auto-vectorize some loops so they can take advantage of the SSE and AVX instructions.
For example, pricing with the Black-Scholes equation in single-threaded C99 versus single-threaded OpenCL gives the execution times below:
  • Input: 10MB of data
  • Calculates both call and put options
  • Both use the -O3 compiler option of gcc 4.4
c99 : 1612.203 ms
OpenCL : 673.248 ms
This 'hidden' optimization is kinda cool, isn't it?

Wednesday, July 18, 2012

Let's parallelize everything

Who does not want to see his programs run faster on exactly the same machine, same CPU, same GPU, same programming language? Many parallel programming frameworks/APIs have been developed and released recently, and with the help of many amazing programmers worldwide, these APIs have also been wrapped beautifully into many front-end languages such as Python, Ruby, or even JavaScript. Some are also being ported to more elegant programming languages such as Haskell, Scala, or Clojure. So for those of you who haven't seen or experienced the elegance of parallel computing, what are you waiting for? Let's parallelize everything!

Let me share here some of the parallel programming frameworks I know; I believe they are famous enough that you won't be disappointed using them.

First of all, there is CUDA, developed by NVIDIA, a leading company in graphics processing units. I have been developing programs with CUDA for two years now. The API itself is pretty simple and straightforward to use, and I am amazed by the number of sample programs provided in the CUDA Toolkit; they make it incredibly easy to learn not only the CUDA API but also some important algorithms and tuning techniques for developing parallel programs. Having said that, it was actually pretty hard to develop with CUDA back then (when it was still version 1.x), but NVIDIA just released a new version (CUDA 4.2, heading toward CUDA 5) and everything became easier: to understand, to install, and to develop with. Most importantly, note that since NVIDIA is the developer and CUDA is not open source software, CUDA is only available for NVIDIA's GPUs (from the GeForce 8800 to the newest ones). The technique of taking computation that would normally run sequentially on the CPU and running it on the GPU is basically called GPGPU (General-Purpose computing on GPUs).

And then there is OpenCL, short for Open Computing Language, initially developed by Apple and the Khronos Group. OpenCL is a framework for parallelization that aims to run on many platforms. At first OpenCL was released with only the standard C99 API, but C++ wrappers for the runtime API were added later, which makes it easier to do some OOP stuff. OpenCL can run on the major vendors' devices, AMD, Intel, and NVIDIA, where each vendor has its own compiler (or library) to interpret and/or optimize the standardized OpenCL runtime API; AFAIK it is Khronos that has been leading OpenCL development and standardizing the API. Since every vendor equips its devices with different technology, each of them releases its own OpenCL programming SDK, which can be found on their respective websites: the AMD OpenCL SDK, the Intel OpenCL SDK, and the NVIDIA OpenCL SDK, which comes with the CUDA Toolkit. Each SDK is provided with a unique library and some sample programs, and each vendor's compiler has its own compile-time optimizations. IMO, learning OpenCL will not be that hard if you have previously done some CUDA programming.

Those two above are the APIs I've been using for a while to do parallel stuff. There are plenty more out there, and you can get some of them for free (maybe even already installed on your computer), but some you can't.
Free APIs: OpenMP, Intel's TBB, Intel's ArBB, Pthreads
Not free: PGI's compiler with OpenACC, CAPS' compiler with OpenACC