How can I map HTML named entities to LaTeX commands?

In ConTeXt, the char-ent.lua file contains the list of all HTML entities. You can access them using the characters.entities table. For example, the following code prints all the entities and their values to screen.

\starttext
  \startluacode
      local entities = characters.entities
        for name, value in next, entities do
            print(name,value)
        end
  \stopluacode
\stoptext

The entities are not translated to the corresponding TeX command, rather they are translated to the corresponding unicode symbol. If you want the corresponding TeX name for the symbol, you can search the characters.data table (defined in char-def.lua)

\starttext
  \startluacode
      local data = characters.data
      local function context_name(value)
          value = data[value]
          if value then
              if value.contextname then
                  return value.contextname 
              elseif value.mathname then
                  return value.mathname
              elseif value.mathspec then
                  return value.mathspec[1].name
              else
                return "not defined"
              end
          else
            return "not defined"
          end
      end
      local entities = characters.entities
        for name, value in next, entities do
            print(name,value, context_name(value))
        end
  \stopluacode
\stoptext

The resulting list is here. I don't know if LaTeX follows the same naming convention for all commands.


Just to provide closure on this, my original question was for mapping HTML 4.0 entities to LaTeX. Aditya's response provides the ability to map a far wider range of named character entities.

I've used Aditya's data set to produce the mappings just for HTML 4.0 as below. If it is not on the list, there is no mapping (or I've messed up the data set reduction).

Please see comments -- this table is only of much value in ConTeXt.

    HTML 4.0 / LaTeX
    Aacute  /  Aacute
    aacute  /  aacute
    Acirc  /  Acircumflex
    acirc  /  acircumflex
    acute  /  textacute
    AElig  /  AEligature
    aelig  /  aeligature
    Agrave  /  Agrave
    agrave  /  agrave
    alefsym  /  aleph
    Alpha  /  greekAlpha
    alpha  /  greekalpha
    and  /  wedge
    ang  /  angle
    Aring  /  Aring
    aring  /  aring
    asymp  /  approx
    Atilde  /  Atilde
    atilde  /  atilde
    Auml  /  Adiaeresis
    auml  /  adiaeresis
    bdquo  /  quotedblbase
    Beta  /  greekBeta
    beta  /  greekbeta
    brvbar  /  textbrokenbar
    bull  /  textbullet
    cap  /  cap
    Ccedil  /  Ccedilla
    ccedil  /  ccedilla
    cedil  /  textcedilla
    cent  /  textcent
    Chi  /  greekChi
    chi  /  greekchi
    circ  /  textcircumflex
    clubs  /  clubsuit
    cong  /  approxEq
    copy  /  copyright
    crarr  /  carriagereturn
    cup  /  cup
    curren  /  textcurrency
    dagger  /  textdag
    Dagger  /  textddag
    darr  /  downarrow
    dArr  /  Downarrow
    deg  /  textdegree
    Delta  /  greekDelta
    delta  /  greekdelta
    diams  /  blacklozenge
    divide  /  textdiv
    Eacute  /  Eacute
    eacute  /  eacute
    Ecirc  /  Ecircumflex
    ecirc  /  ecircumflex
    Egrave  /  Egrave
    egrave  /  egrave
    empty  /  emptyset
    emsp  /  emspace
    ensp  /  enspace
    Epsilon  /  greekEpsilon
    epsilon  /  greekepsilon
    equiv  /  equiv
    Eta  /  greekEta
    eta  /  greeketa
    ETH  /  Eth
    eth  /  eth
    Euml  /  Ediaeresis
    euml  /  ediaeresis
    exist  /  exists
    fnof  /  fhook
    forall  /  forall
    frac12  /  onehalf
    frac14  /  onequarter
    frac34  /  threequarter
    frasl  /  textfraction
    Gamma  /  greekGamma
    gamma  /  greekgamma
    ge  /  geq
    gt  /  gt
    harr  /  leftrightarrow
    hArr  /  Leftrightarrow
    hellip  /  textellipsis
    Iacute  /  Iacute
    iacute  /  iacute
    Icirc  /  Icircumflex
    icirc  /  icircumflex
    iexcl  /  exclamdown
    Igrave  /  Igrave
    igrave  /  igrave
    image  /  Im
    infin  /  infty
    int  /  intop
    Iota  /  greekIota
    iota  /  greekiota
    iquest  /  questiondown
    isin  /  in
    Iuml  /  Idiaeresis
    iuml  /  idiaeresis
    Kappa  /  greekKappa
    kappa  /  greekkappa
    Lambda  /  greekLambda
    lambda  /  greeklambda
    lang  /  langle
    laquo  /  leftguillemot
    larr  /  leftarrow
    lArr  /  Leftarrow
    lceil  /  lceiling
    ldquo  /  quotedblleft
    le  /  leq
    lfloor  /  lfloor
    lowast  /  ast
    loz  /  lozenge
    lsaquo  /  guilsingleleft
    lsquo  /  quoteleft
    lt  /  lt
    macr  /  textmacron
    mdash  /  emdash
    micro  /  textmu
    middot  /  periodcentered
    Mu  /  greekMu
    mu  /  greekmu
    nbsp  /  nobreakspace
    ndash  /  endash
    ne  /  neq
    ni  /  ni
    not  /  textlognot
    notin  /  nin
    nsub  /  nsubset
    Ntilde  /  Ntilde
    ntilde  /  ntilde
    Nu  /  greekNu
    nu  /  greeknu
    Oacute  /  Oacute
    oacute  /  oacute
    Ocirc  /  Ocircumflex
    ocirc  /  ocircumflex
    OElig  /  OEligature
    oelig  /  oeligature
    Ograve  /  Ograve
    ograve  /  ograve
    Omega  /  greekOmega
    omega  /  greekomega
    Omicron  /  greekOmicron
    omicron  /  greekomicron
    oplus  /  oplus
    or  /  vee
    ordf  /  ordfeminine
    ordm  /  ordmasculine
    Oslash  /  Ostroke
    oslash  /  ostroke
    Otilde  /  Otilde
    otilde  /  otilde
    otimes  /  otimes
    Ouml  /  Odiaeresis
    ouml  /  odiaeresis
    para  /  paragraphmark
    part  /  partial
    permil  /  perthousand
    perp  /  bot
    Phi  /  greekPhi
    phi  /  greekphi
    Pi  /  greekPi
    pi  /  greekpi
    piv  /  greekpialt
    plusmn  /  textpm
    pound  /  textsterling
    prime  /  prime
    Prime  /  doubleprime
    prod  /  prod
    prop  /  propto
    Psi  /  greekPsi
    psi  /  greekpsi
    radic  /  surd
    rang  /  rangle
    raquo  /  rightguillemot
    rarr  /  rightarrow
    rArr  /  Rightarrow
    rceil  /  rceiling
    rdquo  /  quotedblright
    real  /  Re
    reg  /  registered
    rfloor  /  rfloor
    Rho  /  greekRho
    rho  /  greekrho
    rsaquo  /  guilsingleright
    rsquo  /  quoteright
    sbquo  /  quotesinglebase
    Scaron  /  Scaron
    scaron  /  scaron
    sdot  /  cdot
    sect  /  sectionmark
    shy  /  softhyphen
    Sigma  /  greekSigma
    sigma  /  greeksigma
    sigmaf  /  greekfinalsigma
    sim  /  sim
    spades  /  spadesuit
    sub  /  subset
    sube  /  subseteq
    sum  /  sum
    sup  /  supset
    sup1  /  onesuperior
    sup2  /  twosuperior
    sup3  /  threesuperior
    supe  /  supseteq
    szlig  /  ssharp
    Tau  /  greekTau
    tau  /  greektau
    there4  /  therefore
    Theta  /  greekTheta
    theta  /  greektheta
    thetasym  /  greekthetaalt
    thinsp  /  breakablethinspace
    THORN  /  Thorn
    thorn  /  thorn
    tilde  /  texttilde
    times  /  textmultiply
    trade  /  trademark
    Uacute  /  Uacute
    uacute  /  uacute
    uarr  /  uparrow
    uArr  /  Uparrow
    Ucirc  /  Ucircumflex
    ucirc  /  ucircumflex
    Ugrave  /  Ugrave
    ugrave  /  ugrave
    uml  /  textdiaeresis
    Upsilon  /  greekUpsilon
    upsilon  /  greekupsilon
    Uuml  /  Udiaeresis
    uuml  /  udiaeresis
    weierp  /  wp
    Xi  /  greekXi
    xi  /  greekxi
    Yacute  /  Yacute
    yacute  /  yacute
    yen  /  textyen
    yuml  /  ydiaeresis
    Yuml  /  Ydiaeresis
    Zeta  /  greekZeta
    zeta  /  greekzeta
    zwj  /  zwj
    zwnj  /  zwnj

There's a chart here: ISO character entities and their LATEX equivalents by Vidar Bronken Gundersen and Rune Mathisen. Source material is here: http://www.bitjungle.com/isoent/ - including a big XML file with mappings between various formats.

The same folks have turned that into a Perl program, available here: http://llg.cubic.org/docs/ent2latex.html (from this answer to How to look up a symbol or identify a math symbol or character?.)

And here's another list: http://www.w3.org/Math/characters/unicode.xml, and the same data compiled into python: https://gist.github.com/piquadrat/798549

Tags:

Html