Transform fancy usage messages into a 1D string

Edit:

Since my old approach with TeXForm turned out to be quite a bad idea, here is a new one that uses InputForm. It is much more stable and already correctly covers many, many symbols. Let's start with the code:

usageString[s_Symbol] := Module[{string,
   rules = {
     "\\\"" ~~ a___ ~~ "\\\"" /; StringFreeQ[a, "\\"] :> a,
     "\"" ~~ a__ ~~ "\"" :> a,
     "StyleBox[" ~~ a___ ~~ ", " ~~ ("TI]" | "TR]") /; 
       StringFreeQ[a, "]"] :> a,
     "\\!\\(\\*" ~~ a___ ~~ "\\)" /; StringFreeQ[a, "\\("] :> a, 
     "SubscriptBox[" ~~ a__ ~~ ", " ~~ b__ ~~ "]" /; 
       StringFreeQ[a <> b, "Box" | "]"] :> a <> "_" <> b,
     "SuperscriptBox[" ~~ a__ ~~ ", " ~~ b__ ~~ "]" /; 
       StringFreeQ[a <> b, "Box" | "]"] :> a <> "^" <> b,
     "SubsuperscriptBox[" ~~ a___ ~~ ", " ~~ b___ ~~ ", " ~~ c___ ~~ 
        "]" /; StringFreeQ[a <> b <> c, "Box" | "]"] :> 
      a <> "_(" <> b <> ")^(" <> c <> ")"}
   },
  string = 
   Fold[StringReplace, ToString[MessageName[s, "usage"], InputForm], 
    rules];
  string = FixedPoint[StringReplace[#,
      "RowBox[{" ~~ a__ ~~ "}]" /; StringFreeQ[a, "RowBox" | "}]"] :> 
       a] &,
    string];
  StringReplace[StringJoin@StringSplit[string, ", "], "\\n" -> "\n"]
  ]
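To see why the function chains its rules with Fold and strips the RowBox wrappers with FixedPoint, here is a minimal sketch on toy strings. The inputs, the names toyRules, folded, onePass and stripped, and the parenthesis rule are made up for illustration and are not part of usageString.

```mathematica
(* Toy rules, for illustration only; not the rules used by usageString *)
toyRules = {"a" -> "b", "b" -> "c"};

(* Fold applies one rule after another, so later rules see earlier output *)
folded = Fold[StringReplace, "abc", toyRules]
(* "ccc" *)

(* A single StringReplace call applies all rules in one pass instead *)
onePass = StringReplace["abc", toyRules]
(* "bcc" *)

(* FixedPoint repeats an innermost-first replacement until nothing changes;
   parentheses stand in for the nested RowBox[{...}] wrappers *)
stripped = FixedPoint[
  StringReplace[#,
    "(" ~~ a__ ~~ ")" /; StringFreeQ[a, "(" | ")"] :> a] &,
  "((x))"]
(* "x" *)
```

The order of the rules matters with Fold: a rule that unwraps an outer construct only matches once the rules before it have removed whatever was nested inside.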

Before covering the main problem that usageString still has, let's have a look at what it can do (sorry, but I have to use images to convey this):

[Image: example output of usageString]

You can see that it transforms many of the RowBox, SuperscriptBox etc. constructs found in those fancy usage messages into standard strings. It still lacks some transformation rules, however, for things like UnderoverscriptBox or StyleBox with options:

[Image: usage messages that usageString does not yet convert completely]

I think that by adding some more replacement rules to cover the remaining boxing constructs and options, this could be a nice way to get simple string representations of the fancy 2D strings.


I think I found an easy solution. Although my question asked how to extract a simple 1D string, I will show how to transform usage messages into nice and simple HTML. The rules can be adapted so that each box structure is converted into whatever representation is wanted.

The basic trick is the following: a usage message consists of plain text and of special 2D strings, which are enclosed in "\!\(\*" and "\)". The idea is to extract the contents of such a special string and turn it into a Mathematica box expression. Inside these nested boxes we can replace reliably. This is the main difference from what @einbandi proposed: doing the box replacement directly on the string will always fail at some point.
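As a small illustration of why working on expressions is more robust, the sketch below parses box text into a held expression and replaces on it. It uses ToExpression with a HoldComplete wrapper, which is an assumption of this sketch and not the MakeExpression call used later in this post; boxText is a made-up example, not a real usage message.

```mathematica
(* Hypothetical box text, as it would appear between the 2D string delimiters *)
boxText = "SubscriptBox[RowBox[{\"a\", \"+\", \"b\"}], \"1\"]";

(* Parse the text into an expression without evaluating it *)
held = ToExpression[boxText, InputForm, HoldComplete]
(* HoldComplete[SubscriptBox[RowBox[{"a", "+", "b"}], "1"]] *)

(* The pattern binds the two arguments directly, however deeply they nest *)
replaced = held /. SubscriptBox[base_, sub_] :> {base, "_", sub}
(* HoldComplete[{RowBox[{"a", "+", "b"}], "_", "1"}] *)
```

A string pattern has to guess where the nested brackets end, whereas the expression pattern receives the arguments already separated.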

Now we need a set of rules to replace the box expressions. Since I wanted this for my IDEA plugin, which can handle HTML, I will create a mixture of HTML and MathML. The boxes use pure HTML:

boxRules = {
   StyleBox[f_, "TI"] :> {"<em>", f, "</em>"},
   StyleBox[f_, ___] :> {f},
   RowBox[l_] :> {l},
   SubscriptBox[a_, b_] :> {a, "<sub>", b, "</sub>"},
   SuperscriptBox[a_, b_] :> {a, "<sup>", b, "</sup>"},
   RadicalBox[x_, n_] :> {x, "<sup>1/", n, "</sup>"},
   FractionBox[a_, b_] :> {"(", a, ")/(", b, ")"},
   SqrtBox[a_] :> {"&radic;(", a, ")"},
   CheckboxBox[a_, ___] :> {"<u>", a, "</u>"},
   OverscriptBox[a_, b_] :> {"Overscript[", a, b, "]"},
   OpenerBox[a__] :> {"Opener[", a, "]"},
   RadioButtonBox[a__] :> {"RadioButton[", a, "]"},
   UnderscriptBox[a_, b_] :> {"Underscript[", a, b, "]"},
   UnderoverscriptBox[a_, b_, c_] :> {"Underoverscript[", a, b, c, 
     "]"},
   SubsuperscriptBox[a_, b_, c_] :> {a, "<sub><small>", b, 
     "</small></sub><sup><small>", c, "</small></sup>"}
   };

With these rules we can replace inside a box expression until nothing changes anymore:

convertBoxExpressionToHTML[boxexpr_] := 
 StringJoin[
  ToString /@ 
   Flatten[ReleaseHold[MakeExpression[boxexpr] //. boxRules]]]
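To make the repeated replacement concrete, here is a minimal sketch that applies a reduced copy of the rules to a hand-written box expression. The names demoRules, demoBoxes and demoHtml and the input are hypothetical; the input is built directly as an expression, so the string-parsing step is skipped.

```mathematica
(* Reduced copy of the rules above, so that this sketch runs on its own *)
demoRules = {
   StyleBox[f_, "TI"] :> {"<em>", f, "</em>"},
   RowBox[l_] :> {l},
   SubscriptBox[a_, b_] :> {a, "<sub>", b, "</sub>"}
   };

(* Hand-written box expression for f[x_1] with an italic x *)
demoBoxes =
  RowBox[{"f", "[", SubscriptBox[StyleBox["x", "TI"], "1"], "]"}];

(* Replace until nothing changes, then flatten the nested lists and join *)
demoHtml = StringJoin[Flatten[{demoBoxes //. demoRules}]]
(* "f[<em>x</em><sub>1</sub>]" *)
```

Each pass rewrites the outermost boxes that match, so the RowBox goes first, then the SubscriptBox, and finally the StyleBox.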

This is basically everything you need to create an HTML page of the usage messages of all known functions. I added a few more things, such as

  • creation of links to the official online documentation
  • display of attributes
  • display of options

I'll put the whole code at the end of this post; please note that it has not been cleaned up. The whole page then looks like this. There are still some minor issues (such as nested 2D strings, which occur about 5 times), but for my plugin I can live with them:

[Image: the generated HTML page]

extractUsage[str_] := 
 With[{usg = 
    Function[expr, expr::usage, HoldAll] @@ MakeExpression[str]},
  If[Head[usg] === String, usg, ""]]

createLinkName[s_] := 
 If[StringMatchQ[ToString@FullForm[s], "\"\\[" ~~ __ ~~ "]\""],
  {StringReplace[ToString@FullForm[s], {"\"" :> "", "\\" -> "\\\\"}],
   StringReplace[
    ToString@FullForm[s], {"\"" :> "", 
     "\\[" ~~ c__ ~~ "]" :> "character/" ~~ c}]},
  {s, s}]

createOptionString[s_] := 
 With[{opts = 
    Function[expr, Options[expr], HoldAll] @@ MakeExpression[s]},
  If[opts === {},
   "</p><b>Symbol has no options.</b>",
   "</p><b>Options: </b>" <> 
    StringJoin@Riffle[ToString[First[#]] & /@ opts, ", "]
   ]
  ]

createHtmlUsage[s_String] := Module[{
   usg = extractUsage[s],
   attr = Attributes[s],
   link, linkname},
  {linkname, link} = createLinkName[s];

  "<h3><a href=\"http://reference.wolfram.com/mathematica/ref/" <> 
   link <> ".html\">" <> linkname <> "</a></h3>" <> If[usg =!= "",
    "<ul><li>" <>
     StringReplace[
      StringReplace[
       usg, {Shortest["\!\(\*" ~~ content__ ~~ "\)"] :> 
         convertBoxExpressionToHTML[content],
        "\n" :> "<li>"}
       ], {"\[Null]" :> "", 
       a_?(StringMatchQ[ToString@FullForm[#], 
            "\"\\[" ~~ __ ~~ "]\""] &) :> 
        StringReplace[
         ToString[a, MathMLForm], {WhitespaceCharacter :> ""}]}
      ] <> "</ul>", ""] <> "<b>Attributes:</b> " <> 
   StringJoin[ToString /@ Riffle[attr, ", "]] <> 
   createOptionString[s] <> "\n"
  ]

names = Names["System`*"];
Export["tmp/usageMessages.html", StringJoin[createHtmlUsage /@ names], "Text"]