VHDL: Using '*' operator when implementing multipliers in design

If you just want to multiply two numbers and they suit the DSP block then the * operator should infer a DSP block. If not, send the synthesis tool back :)
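For reference, a minimal sketch of what "just use the * operator" looks like. The entity name and the 18x18 signed widths are my own choices for illustration (18x18 fits the multiplier in both the DSP48A1 and the DSP48E1); whether a DSP block is actually inferred depends on your tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: a purely combinational 18x18 signed multiply.
entity mult_simple is
  port (
    a : in  signed(17 downto 0);
    b : in  signed(17 downto 0);
    p : out signed(35 downto 0)
  );
end entity mult_simple;

architecture rtl of mult_simple is
begin
  -- numeric_std "*" returns a'length + b'length bits, so 36 bits here.
  p <= a * b;
end architecture rtl;
```

Check the synthesis report afterwards to see whether the multiply landed in a DSP block or in fabric.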

However, taking advantage of the more complex combinations of DSP functionality often requires instantiating the block directly and configuring its parameters. Examples of things which may not map well by inference (using the Xilinx DSP48E1 as an example):

  • Use of pre-adder
  • Use of post accumulator
  • Use of pattern detector
  • Use of the logic unit

And especially combinations of the above.

The synthesis tools are not yet good enough to map completely arbitrary combinations of logic and arithmetic as efficiently as you might hope.

I've done this a few times myself.

Generally, the design tools will choose between a fabric implementation and a DSP slice based on the synthesis settings.

For instance, in Xilinx ISE, the synthesis process settings (under HDL Options) include a "-use_dsp48" setting with the options Auto, AutoMax, Yes, and No. As you can imagine, this controls how hard the tools try to use DSP slices. I once had a problem where I multiplied an integer by 3, which inferred a DSP slice. But I was already manually instantiating every DSP slice in the chip, so synthesis failed! I changed the setting to No to fix it.
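If you would rather not change the setting for the whole project, XST also accepts use_dsp48 as a VHDL attribute on a single signal or entity, as far as I recall from the XST user guide. The sketch below is an assumption-laden illustration of the multiply-by-3 case: the entity and signal names are made up, and you should check the attribute syntax against the guide for your tool version.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: multiply by the constant 3 without a DSP slice.
entity times_three is
  port (
    x : in  unsigned(7 downto 0);
    y : out unsigned(9 downto 0)
  );
end entity times_three;

architecture rtl of times_three is
  signal prod : unsigned(9 downto 0);

  -- XST-specific attribute (assumed syntax, verify in the XST user guide).
  -- "no" asks the tool to keep this multiply out of the DSP slices.
  attribute use_dsp48 : string;
  attribute use_dsp48 of prod : signal is "no";
begin
  -- 8 bits * 2 bits gives a 10-bit product, enough for 255 * 3 = 765.
  prod <= x * to_unsigned(3, 2);
  y    <= prod;
end architecture rtl;
```

Other tools ignore attributes they do not recognize, so this should be harmless outside XST, but it will not have any effect there either.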

This is probably a good rule of thumb (which I just made up): if your design is clocked at less than 50 MHz and you expect to use less than 50% of the DSP slices in the chip, then just use the *, +, and - operators. This will infer DSP slices with no pipeline registers, which really limits the top speed. (I have no idea what happens when you use division.)

However, if it looks like you're going to run the slices closer to the max speed of the DSP slice (333 MHz for Spartan 6 normal speed grade), or you're going to use all of the slices, you should manually instantiate them.

In this case, you have two options.

Option 1: manually use the raw DSP instantiation template.

Option 2: use an IP block from Xilinx Core Generator. (I would use this option. Along the way you will learn all about Core Generator, which will help in the future.)

Before you do either of these, read the first couple of pages of the DSP slice user guide. For the Spartan 6 (DSP48A1), that is Xilinx doc UG389: http://www.xilinx.com/support/documentation/user_guides/ug389.pdf

Consider the Core Generator option first. I usually create a testing project in Core Generator for the part I'm working with, where I create any number of IP blocks just to learn the system. Then, when I'm ready to add one to my design in ISE, I right click in the Design Hierarchy, click new source, and select "IP (CORE Generator & Architecture Wizard)" so that I can edit and regenerate the block directly from my project.

In Core Generator, take a look at the different IP blocks you can choose from. There are a few dozen, most of which are pretty cool.

The Multiplier Core is what you should look at first. Check out every page, and click the datasheet button. The important parts are the integer bit widths, the pipeline stages (latency), and any control signals. Selecting only what you need produces the simplest possible block, because it takes away all the ports you don't need.

When I was building a 5 by 3 order IIR filter last year, I had to use the manual instantiation template since I was building a very custom implementation, with 2 DSP slices clocked 4x faster than the sample rate. It was a total pain.

If there are DSP blocks present, you should use them if you can, because they will be more efficient than using LUTs to do the same thing. The exception is when you don't need a high-performance multiplication, in which case you could implement, say, a pipelined adder and shift register to save space.

However, I would look at inferring DSP blocks before going into the GUI tools. The Xilinx XST manual has HDL 'recipes' for how to infer DSP blocks with pure Verilog/VHDL. Basically, if you add enough registers before and/or after the multipliers, XST will use a DSP block to implement the operation automatically. You can check the synthesis logs to see whether it is inferring the DSP blocks correctly. I presume Altera has something similar.
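As a rough sketch of the "add enough registers" idea (this is not copied from the XST manual): the multiply below has one register stage on the inputs and two after the product, which gives the tool something to pack into the DSP block's internal pipeline registers. The entity name, the 18x18 widths, and the number of stages are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: registered 18x18 signed multiply, 3 cycles of latency.
entity mult_pipelined is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)
  );
end entity mult_pipelined;

architecture rtl of mult_pipelined is
  signal a_reg, b_reg : signed(17 downto 0) := (others => '0');
  signal m_reg        : signed(35 downto 0) := (others => '0');
  signal p_reg        : signed(35 downto 0) := (others => '0');
begin
  -- No reset on purpose: in my experience asynchronous resets tend to
  -- stop registers from being absorbed into the DSP block.
  process (clk)
  begin
    if rising_edge(clk) then
      a_reg <= a;              -- stage 1: input registers
      b_reg <= b;
      m_reg <= a_reg * b_reg;  -- stage 2: registered product
      p_reg <= m_reg;          -- stage 3: output register
    end if;
  end process;

  p <= p_reg;
end architecture rtl;
```

The synthesis log should tell you how many of these registers were actually absorbed into the DSP block.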

Incidentally, I was just mulling this over a few minutes ago, as I am currently working on a Mersenne twister implementation that only uses a multiplier for the initial seed. My first pass implementation doesn't meet timing, but the functionality is correct. XST also put the multiply operation into DSP blocks; however, it is not optimized, so it runs about half as fast as I would like. I will likely be reimplementing the multiply using a shift-and-add technique that will take 32x the number of clock cycles, but will no longer require a hardware multiplier.
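For anyone curious, a shift-and-add multiplier along those lines could look roughly like the sketch below. This is not my actual Mersenne twister code; the entity name, the start/done handshake, and the full 64-bit product are assumptions for illustration (if you only need the low 32 bits, the accumulator and multiplicand can be narrowed).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: 32x32 unsigned multiply, one bit per clock.
entity mult_shift_add is
  port (
    clk   : in  std_logic;
    start : in  std_logic;              -- pulse high to begin a multiply
    a     : in  unsigned(31 downto 0);
    b     : in  unsigned(31 downto 0);
    done  : out std_logic;              -- high for one cycle when p is valid
    p     : out unsigned(63 downto 0)
  );
end entity mult_shift_add;

architecture rtl of mult_shift_add is
  signal mcand  : unsigned(63 downto 0) := (others => '0');
  signal mplier : unsigned(31 downto 0) := (others => '0');
  signal acc    : unsigned(63 downto 0) := (others => '0');
  signal count  : integer range 0 to 31 := 0;
  signal busy   : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      if start = '1' and busy = '0' then
        -- Load operands and clear the accumulator.
        mcand  <= resize(a, 64);
        mplier <= b;
        acc    <= (others => '0');
        count  <= 0;
        busy   <= '1';
      elsif busy = '1' then
        -- Add the shifted multiplicand whenever the current multiplier bit is set.
        if mplier(0) = '1' then
          acc <= acc + mcand;
        end if;
        mcand  <= shift_left(mcand, 1);
        mplier <= shift_right(mplier, 1);
        if count = 31 then
          busy <= '0';
          done <= '1';   -- acc takes its final value on this same clock edge
        else
          count <= count + 1;
        end if;
      end if;
    end if;
  end process;

  p <= acc;
end architecture rtl;
```

This processes one multiplier bit per clock, so a multiply takes 32 cycles after the load cycle, which matches the 32x figure above.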