Build a multiplying machine using NAND logic gates

60 55 50 48 gates

48-gate multiplier

The original (60 gates) was the systematic approach - multiply each digit with each, and then sum them together. Namely, see Wallace trees and Dadda trees

60-gate multiplier

The top half is the multiplication network - multiply each digit with each, and group output digits with the same weight. Some bits have been left inverted to save gates.

The second half is the adder network. Each box represents a single adder - either a half-adder (5 gates - 1x XOR and an inverter), or a full adder (9 gates - 2x XOR and NAND the inverted carry bits). The top are inputs, the bottom output is the sum, the left output is the carry-out. see the previous challenge

The 2x2 multiplier has then been hand-optimised to a custom-built 13-gate network, which is the optimal size as found by @boothby. Thanks!

Pasting it into the low-bit corner and reoptimising the adder tree saves five gates (see revision #2). Pasting it into the high-bit corner as well, however, produces overlap. A little bit of math tells us, however, that dropping the low-bit of the high multiplier solves the overlap and what's left to do is to add the remaining two bits and sum up stuff.

This alone, unfortunately, does not provide any savings, but it does open two optimisations. First, the two multipliers have two gates in common, and can be fused together. At this point, we're back at 55. Second, in the addition network, we don't need a half-adder because we know its carry will be zero. We can replace it with an OR. An OR is a NAND with its inputs inverted. This produces us with two 2-chains of NOTs on each branch, which can then be removed, for a total saving of five gates. Unfortunately, the half-adder at C16 still carries, so we cannot do the same there. Third, a full adder has a useful property: if you invert its inputs and its outputs, it still behaves the same. Since all its inputs are already inverted, we can just as well move the inverters behind it. Twice. We could have done the same in the original, but... oh well. We still have a half-adder with two inverted inputs. I want optimise this part more, but I doubt I can.

Since we're pulling out a NOT from inside a component, we have to signify that somehow. We have obtained a half-adder with inverted carry (AKA tapped XOR) at a cost of four gates.

In the meantime, we have also redrawn the diagram significantly.

39 gates

I am quite sure there are no simpler designs than mine here. It was very difficult to make. I make other minimal circuits too.

Transmission delay is indicated by downward position of each NAND gate on the sheet.

Minimal 3 bit multiplier

Verilog code and testing:

// The simplest 3 bit multiplicator possible, using 39 NAND gates only.
// I have also made multiplicators that are faster, more efficient,
// use different gates, and multiply bigger numbers. And I also do
// hard optimization of other circuits. You may contact me at
// [email protected]
// This is my entry to win this hard Programming Puzzle & Code Golf
// at Stack Exchange:
// By Kim Øyhus 2018 (c) into (CC BY-SA 3.0)
// This work is licensed under the Creative Commons Attribution 3.0
// Unported License. To view a copy of this license, visit

module mul3x3 ( in_000, in_001, in_002, in_003, in_004, in_005, out000, out001, out002, out003, out004, out005 );
  input  in_000, in_001, in_002, in_003, in_004, in_005;
  output out000, out001, out002, out003, out004, out005;
  wire   wir000, wir001, wir002, wir003, wir004, wir005, wir006, wir007, wir008, wir009, wir010, wir011, wir012, wir013, wir014, wir015, wir016, wir017, wir018, wir019, wir020, wir021, wir022, wir023, wir024, wir025, wir026, wir027, wir028, wir029, wir030, wir031, wir032;

  nand gate000 ( wir000, in_000, in_005 );
  nand gate001 ( wir001, in_000, in_004 );
  nand gate002 ( wir002, in_000, in_003 );
  nand gate003 ( out000, wir002, wir002 );
  nand gate004 ( wir003, in_004, in_001 );
  nand gate005 ( wir004, wir003, wir003 );
  nand gate006 ( wir005, in_003, in_002 );
  nand gate007 ( wir006, wir000, wir005 );
  nand gate008 ( wir007, in_004, in_002 );
  nand gate009 ( wir008, in_001, in_005 );
  nand gate010 ( wir009, wir008, wir007 );
  nand gate011 ( wir010, in_001, in_003 );
  nand gate012 ( wir011, wir001, wir010 );
  nand gate013 ( wir012, out000, wir004 );
  nand gate014 ( wir013, wir004, wir012 );
  nand gate015 ( wir014, wir011, wir012 );
  nand gate016 ( out001, wir014, wir014 );
  nand gate017 ( wir015, in_002, in_005 );
  nand gate018 ( wir016, wir015, wir015 );
  nand gate019 ( wir017, out000, wir016 );
  nand gate020 ( wir018, wir017, wir013 );
  nand gate021 ( wir019, wir016, wir018 );
  nand gate022 ( wir020, wir019, wir009 );
  nand gate023 ( wir021, wir020, wir017 );
  nand gate024 ( wir022, wir020, wir009 );
  nand gate025 ( wir023, wir022, wir021 );
  nand gate026 ( out005, wir022, wir022 );
  nand gate027 ( wir024, wir016, wir022 );
  nand gate028 ( wir025, wir006, wir018 );
  nand gate029 ( wir026, wir025, wir006 );
  nand gate030 ( wir027, wir025, wir018 );
  nand gate031 ( out002, wir026, wir027 );
  nand gate032 ( wir028, wir004, wir027 );
  nand gate033 ( wir029, wir023, wir028 );
  nand gate034 ( wir030, wir028, wir028 );
  nand gate035 ( wir031, wir030, wir021 );
  nand gate036 ( out004, wir031, wir024 );
  nand gate037 ( wir032, wir029, wir031 );
  nand gate038 ( out003, wir032, wir032 );

module mul3x3_test; 
   reg  [5:0] AB; // C=A*B
   wire [5:0] C;
  mul3x3 U1 ( 
  .in_000 (AB[0]), 
  .in_001 (AB[1]), 
  .in_002 (AB[2]), 
  .in_003 (AB[3]), 
  .in_004 (AB[4]), 
  .in_005 (AB[5]), 
  .out000 (C[0]), 
  .out001 (C[1]), 
  .out002 (C[2]), 
  .out003 (C[3]), 
  .out004 (C[4]), 
  .out005 (C[5])
  initial  AB=0;
  always  #10  AB = AB+1;
  initial  begin
    $monitor("%d,\t%b\t%b\t%b",$time, AB[5:3], AB[2:0],C); 
  initial  #630  $finish; 

// iverilog -o mul3x3_test mul3x3_test.v
// vvp mul3x3_test

Kim Øyhus