Why does lisp use gensym and other languages don't?

Common Lisp has a powerful macro system. You can make new syntax patterns that behave exactly the way you want them to behave. It's even expressed in its own language, making everything in the language available to transform the code from what you want to write to something that CL actually understands. All languages with powerful macro systems provide gensym or do it implicitly in their macro implementation.

In Common Lisp you use gensym when you want to make code where the symbol shouldn't match elements used any other places in the result. Without it there is no guarantee that a user uses a symbol that the macro implementer also use and they start to interfere and the result is something different than the intended behavior. It makes sure nested expansions of the same macro don't interfere with previous expansions. With the Common Lisp macro system it's possible to make more restrictive macro systems similar to Scheme syntax-rules and syntax-case.

In Scheme there are several macro systems. One with pattern matching where new introduced symbols act automatically as if they are made with gensym. syntax-case will also by default make new symbols as if they were made with gensym and there is also a way to reduce hygiene. You can make CL defmacro with syntax-case but since Scheme doesn't have gensym you wouldn't be able to make hygienic macros with it.

Java, C, C++, Python, Javascript are all Algol dialects and none of them have other than simple template based macros. Thus they don't have gensym because they don't need it. Since the only way to introduce new syntax in these languages is to wish next version of it will provide it.

There are two Algol dialects with powerful macros that come to mind. Nemerle and Perl6. Both of them have hygienic approach, meaning variables introduced behave as if they are made with gensym.

In CL, Scheme, Nemerle, Perl6 you don't need to wait for language features. You can make them yourself! The news in both Java and PHP are easily implemented with macros in any of them should it not already be available.

Can't say which languages have an equivalent of GENSYM. Many languages don't have a first-class symbol data type (with interned and uninterned symbols) and many are not providing similar code generation (macros, ...) facilities.

An interned symbol is registered in a package. An uninterned is not. If the reader (the reader is the Lisp subsystem which takes textual s-expressions as input and returns data) sees two interned symbols in the same package and with the same name, it assumes that it is the same symbol:

CL-USER 35 > (eq 'cl:list 'cl:list)
T

If the reader sees an uninterned symbol, it creates a new one:

CL-USER 36 > (eq '#:list '#:list)
NIL

Uninterned symbols are written with #: in front of the name.

GENSYM is used in Lisp to create numbered uninterned symbols, because it is sometimes useful in code generation and then debugging this code. Note that the symbols are always new and not eq to anything else. But the symbol name could be the same as the name of another symbol. The number gives a clue to the human reader about the identity.

An example using MAKE-SYMBOL

make-symbol creates a new uninterned symbol using a string argument as its name.

Let's see this function generating some code:

CL-USER 31 > (defun make-tagbody (exp test)
               (let ((start-symbol (make-symbol "start"))
                     (exit-symbol  (make-symbol "exit")))
                 `(tagbody ,start-symbol
                           ,exp
                           (if ,test
                               (go ,start-symbol)
                             (go ,exit-symbol))
                           ,exit-symbol)))
MAKE-TAGBODY

CL-USER 32 > (pprint (make-tagbody '(incf i) '(< i 10)))

(TAGBODY
 #:|start| (INCF I)
         (IF (< I 10) (GO #:|start|) (GO #:|exit|))
 #:|exit|)

Above generated code uses uninterned symbols. Both #:|start| are actually the same symbol. We would see this if we would have *print-circle* to T, since the printer then would clearly label identical objects. But here we don't get this added information. Now if you nest this code, then you would see more than the one start and one exit symbol, each which was used in two places.

An example using GENSYM

Now let's use gensym. Gensym also creates an uninterned symbol. Optionally this symbol is named by a string. A number (see the variable CL:*GENSYM-COUNTER*) is added.

CL-USER 33 > (defun make-tagbody (exp test)
               (let ((start-symbol (gensym "start"))
                     (exit-symbol  (gensym "exit")))
                 `(tagbody ,start-symbol
                           ,exp
                           (if ,test
                               (go ,start-symbol)
                             (go ,exit-symbol))
                           ,exit-symbol)))
MAKE-TAGBODY

CL-USER 34 > (pprint (make-tagbody '(incf i) '(< i 10)))

(TAGBODY
 #:|start213051| (INCF I)
         (IF (< I 10) (GO #:|start213051|) (GO #:|exit213052|))
 #:|exit213052|)

Now the number is an indicator that the two uninterned #:|start213051| symbols are actually the same. When the code would be nested, the new version of the start symbol would have a different number:

CL-USER 7 > (pprint (make-tagbody `(progn
                                     (incf i)
                                     (setf j 0)
                                     ,(make-tagbody '(incf ij) '(< j 10)))
                                  '(< i 10)))

(TAGBODY
 #:|start2756| (PROGN
                 (INCF I)
                 (SETF J 0)
                 (TAGBODY
                  #:|start2754| (INCF IJ)
                          (IF (< J 10)
                              (GO #:|start2754|)
                            (GO #:|exit2755|))
                  #:|exit2755|))
         (IF (< I 10) (GO #:|start2756|) (GO #:|exit2757|))
 #:|exit2757|)

Thus it helps understanding generated code, without the need to turn *print-circle* on, which would label the identical objects:

CL-USER 8 > (let ((*print-circle* t))
              (pprint (make-tagbody `(progn
                                       (incf i)
                                       (setf j 0)
                                       ,(make-tagbody '(incf ij) '(< j 10)))
                                    '(< i 10))))

(TAGBODY
 #3=#:|start1303| (PROGN
                    (INCF I)
                    (SETF J 0)
                    (TAGBODY
                     #1=#:|start1301| (INCF IJ)
                             (IF (< J 10) (GO #1#) (GO #2=#:|exit1302|))
                     #2#))
         (IF (< I 10) (GO #3#) (GO #4=#:|exit1304|))
 #4#)

Above is readable for the Lisp reader (the subsystem which reads s-expressions for textual representations), but a bit less for the human reader.

Why does lisp use gensym and other languages don't?

Tags:

Lisp

Common Lisp

Related

Recent Posts