Conclusions: Altera's Offerings and Competitive Landscape

FPGA vendors have long preached about the efficiency of reconfigurable hardware over general purpose processors. However, FPGAs have often been rejected as an option by many due to the programming challenges associated with them.  CPUs, and even GPUs, typically offered a much faster time to market and a larger talent pool of programmers. With OpenCL, FPGA vendors can play on an equal footing as far as programming is concerned.   Earlier, the decision to use an FPGA over another accelerator required significant resource commitment.  OpenCL allows FPGAs to be used as just another option lowering the risk and is potentially a game changer.

On a personal note, I am hoping for cheaper OpenCL capable FPGAs to hit the market. Currently, OpenCL capable FPGAs run into thousands of dollars. This is likely not an issue for the enterprise market typically targeted by FPGA vendors. However, OpenCL on FPGAs has not attracted as much mindshare as GPUs. GPU vendors have a huge advantage that anyone with a cheap laptop can start experimenting with and learning about GPUs. The easy and cheap access to GPUs enabled GPU computing to take off.  Whenever computing technology has become cheaper and/or easier to program, it has enabled many creative products around it in fields not thought of by the original technology makers. FPGAs have not yet reached that stage. While there is a community of FPGA enthusiasts, enabling OpenCL on cheaper FPGAs can increase this community many-fold.

Altera's OpenCL offering effectively promises customized hardware for your OpenCL kernels and the claim is that FPGAs will be more efficient than CPUs or GPUs at many tasks.  Applications that are not necessarily floating-point heavy, for example applications relying on custom integer datatypes, heavy bit-manipulation or fixed point calculations, are an area where FPGAs can shine because CPU and GPU hardware is not really tailored for such applications. The high-speed I/O connections available on an FPGA with external bandwidth far outstripping other accelerators is another advantage. I think streaming/filtering type of applications are an obvious niche that FPGAs can fulfill.  On the other hand, accelerators such as Nvidia Tesla and Xeon Phi will likely continue to do well in many double-precision floating-point applications because these accelerators are heavily optimized for such use cases. Applications such as image processing or data visualization that can make use of dedicated graphics related hardware on GPUs are also best done on a GPU. 

Finally, I would say I am cautiously optimistic at the prospect of using OpenCL on FPGAs.  I am impressed by the theoretical potential for OpenCL on FPGAs. However, I would  like to see third party studies comparing OpenCL SDKs for FPGAs and general purpose processors on various tasks to get a better understanding of performance and power consumption of various accelerator options. If you are evaluating GPUs or Xeon Phi for your application, you should definitely also consider evaluating OpenCL on FPGAs and compare their performance against other options for your application. OpenCL on FPGAs looks to be gaining steam and this will be an interesting space to watch in the near future and may very well be a turning point for wider adoption of FPGAs in various high-performance application segments.

Altera's OpenCL Implementation Details
Comments Locked

56 Comments

View All Comments

  • BryanC - Wednesday, October 9, 2013 - link

    Thanks for the article. Are you planning a follow up where you write some programs and measure performance? I'm curious to see how it compares when you actually try to use it.
  • rahulgarg - Wednesday, October 9, 2013 - link

    It is on my to-do list. Will have to ask Altera if they are up for it. Not sure if they are used to being covered and benched by websites such as Anandtech :P. I think it is likely new territory for both us and them.

    Also, experimental design will have to be careful. Doing an experiment would involve tuning the kernels for each device first. So even if assume that I do get some hardware, it will certainly be a time-consuming process.
  • Kevin G - Wednesday, October 9, 2013 - link

    I'd be curious to see the raw initial result. Knowing what you can get by recycling your OpenCL code is of interest to parties that don't have the resources to do a good port.
  • rahulgarg - Wednesday, October 9, 2013 - link

    Thanks for the feedback. If I do testing, I will keep that in mind.
  • toyotabedzrock - Wednesday, October 9, 2013 - link

    If you do a followup could you explain vectorization in more detail? Your other explanations where very understandable.
  • vladx - Friday, October 11, 2013 - link

    Vectorization simply means in his example adding all the vector's elements all at once instead of doing it iterative with a loop, thus the algorithm's time is constant (1) instead of linear (n).
  • Brutalizer - Tuesday, October 15, 2013 - link

    Vectorization is done like this. Compare this non vectorized code:
    for (i = 0; i < 10000; i++)
    A = B + C

    To this vectorized code:
    A = B + C
    Here, A, B and C are vectors. So you can add each element at once. You dont have to add one element at a time, instead you add them all at once. You add vectors in one operation, instead of lot of scalars. The many GPU processors will add one element each, at once - thus you have vectorized code.
  • GNUminex - Wednesday, October 16, 2013 - link

    Your post somewhat goes against my knowledge of FPGAs. FPGA performance is result of the number of slices of your fpga, the max frequency, the HDL compiler's optimization capabilities and your code. What exactly could you test other than the performance of openCL versus traditional hardware design languages on the same fpga? If you are comparing an FPGA to a GPU you might as well also compare them to a CPU because the optimal applications of each piece of hardware are completely different.
  • esoel - Wednesday, October 9, 2013 - link

    Interesting stuff but the article would be _so_much_ better with some hands on and benchmarks… Altera don't be cheap, send this guy a review unit! ;-)
  • rahulgarg - Wednesday, October 9, 2013 - link

    Yes, I do think doing actual benchmarking should ideally be next on the list but do see my reply above.

Log in

Don't have an account? Sign up now