Searching for a phrase in all *.nb files

Here is a way to search from within Mathematica:

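(* All notebooks in the current notebook's directory, up to two levels deep *)
notebooks = Quiet@FileNames["*.nb", NotebookDirectory[], 2];
Monitor[
  Select[
   Table[
    {nb,
     (* Keep at most 5 matching lines per notebook, printing on each match *)
     StringJoin@Select[StringSplit[Import[nb, "Plaintext"], "\n"],
       ((If[#, Print["match on:", nb]]; #) &@
          StringMatchQ[#, "*NIntegrate*"]) &, 5]},
    {nb, notebooks}],
   (* Drop notebooks without any matching line *)
   #[[2]] != "" &],
  {nb}] // Grid[#, Alignment -> {Left, Top}, Dividers -> All] &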

This is painfully slow, but it searches and shows only the plain text of each notebook.


Note: the following method isn't robust. See this answer of mine for a robust solution.


Here is an approach which does not rely on NBImport.exe (which actually performs the import of NB files as "Plaintext" under the hood) and performs all the operations in the Kernel only. Currently NBImport.exe contains a bug due to which it returns $Failed when it has to import a NB file whose path contains non-ASCII characters.

The weak side of the following method is that it relies upon the ability of MakeExpression to convert a low-level Notebook expression into the high-level DocumentNotebook, which it is not always able to do even for correct NB files (and this ability is not guaranteed by the developers in general). This conversion is necessary because ToString doesn't accept raw boxes as the low-level representation of a WL expression (even wrapping the raw boxes in RawBoxes is simply ignored).

The simple function presented below currently fails in many situations but demonstrates the idea.

Here is a function which Gets the contents of a NB file as a Notebook expression, extracts all the Cells as actual WL expressions wrapped in HoldComplete, converts them into strings and checks whether they contain the specified string pattern:

findInNBFile[NBFilePath_String, stringPattern_] := 
  Module[{expr = MakeExpression[Get[NBFilePath], StandardForm], cellExprPos, foundPos},
   cellExprPos = Replace[Position[expr, ExpressionCell | TextCell], 0 -> 1, {2}];
   foundPos = 
    Flatten@Position[
      StringFreeQ[
       StringTake[ToString /@ Extract[expr, cellExprPos, HoldComplete], {14, -2}], 
       stringPattern], False];
   If[foundPos =!= {}, 
    Grid[Join[{{Row[{"Found \"", stringPattern, "\" in file \"", NBFilePath, "\""}], 
        SpanFromLeft}, {"Cell #", "The Cell"}}, 
      Transpose[{foundPos, Extract[expr, Most /@ cellExprPos[[foundPos]], HoldForm]}]], 
     Frame -> All], {NBFilePath, False}]
   ];

It can be used as follows:

findInNBFile["ExampleData/document.nb", "abcde"]

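To apply the same function to every notebook in a directory tree, a minimal sketch could look like the following. It assumes findInNBFile as defined above; searchDir and results are names introduced here for illustration only, and the caveat about the function failing on many notebooks still applies.

```mathematica
(* searchDir is a hypothetical placeholder; point it at your own directory *)
searchDir = NotebookDirectory[];

(* Run the search on every notebook up to two levels deep *)
results = findInNBFile[#, "abcde"] & /@ FileNames["*.nb", searchDir, 2];

(* Files without a match return {path, False}; drop them and keep the grids *)
DeleteCases[results, {_String, False}]
```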


A couple of additional solutions. The first, with FindList, is probably the simplest and quickest.

Using FindList

searchDir = "<NB dir>";

fnames = FileNames["*.nb", searchDir, 2];
Length@fnames

sres = {#, FindList[#, {"curve"}, WordSearch -> False]} & /@ fnames;
sres = Select[sres, Length[#[[2]]] > 0 &];
Grid[sres, Dividers -> All, Alignment -> {Left, Top}]

See the options of FindList.
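As a sketch of how those options can be used, the variant below makes the search case-insensitive and stops reading each file at the first matching record, which is enough when you only need to know which notebooks contain the phrase. It assumes fnames as defined above; hits is a name introduced here for illustration. Keep in mind that FindList reads the raw file contents, so a match may come from notebook markup rather than from visible text.

```mathematica
(* Assumes fnames from above; hits is a hypothetical name *)
(* The third argument 1 limits FindList to the first matching record per file *)
(* MatchQ with {__String} also rejects $Failed from files that cannot be opened *)
hits = Select[fnames,
   MatchQ[FindList[#, "curve", 1, IgnoreCase -> True], {__String}] &];

Column[hits]
```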

Using CreateSearchIndex

searchDir = "<NB dir>";

index = CreateSearchIndex[searchDir]

sobjs = TextSearch[index, {"curve", "regression"}]

sres = MapThread[{#1, 
     StringCases[#2, "curve" ~~ (Except["\n"] ...) ~~ "regression", 
      IgnoreCase -> True]} &,
   {Through[sobjs["Location"]], Through[sobjs["Plaintext"]]}];

Grid[sres, Dividers -> All, Alignment -> {Left, Top}]

See the signature of TextSearch -- it allows complicated "and", "or", "except" searches.
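A hedged sketch of such compound queries, going by the query forms listed in the TextSearch documentation (alternatives for "or", a list for "and", Except for exclusion). It assumes index from the CreateSearchIndex call above; anyObjs and onlyCurve are names introduced here for illustration, and the exact form of the returned result may differ between versions.

```mathematica
(* Assumes index as created above *)

(* "or": notebooks containing "curve" or "regression" *)
anyObjs = TextSearch[index, "curve" | "regression"];

(* "and" with "except": notebooks containing "curve" but not "regression" *)
onlyCurve = TextSearch[index, {"curve", Except["regression"]}];
```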

This solution seems to be fairly slow.