Hi,
I am an OpenCL programmer. I use AMD tools (CodeXL) and libraries in my work. I haven't used Brook before.
I assume that your code is in C or C++. I would like to know what work or algorithm you are looking to offload to the GPU using OpenCL, so that I can understand it and see whether OpenCL or Bolt is a good fit. Bolt is a C++ template library that uses OpenCL underneath; it is easy to use, but it implements very few functions.
You mentioned that some part of the code is in assembly. Is it host-side code, and are you expecting some SSE2/SSE3 optimization?
If the code that you have is proprietary, I am ready to sign an NDA.
We can renegotiate the price and schedule once I understand the scope of work.