Notebook cells space on disk profiler

Here is another try, this time closer to what you actually wanted. It generates a list of buttons for the largest cells, each labeled with the cell's size in kB. Pressing a button selects the corresponding cell. You can change the number of rows shown with the "+" and "-" buttons.

The code needs CellIDs to be set, so the palette has buttons to create and delete them. There is no real reason to ever delete them, but if CreateCellID was never set for the notebook, its cells won't have IDs. I have not tested the CellID generation with a 500 MB notebook, and the way I generate them is somewhat naive, so only try the create/delete buttons on a copy of your notebook. The other code should not do any harm even if it crashes. If you run into problems, you can also create CellIDs manually: set CreateCellID -> True for your notebook and cut and paste each cell (or cell group, or section, whatever works well). More work, but probably safer...

Here is the code for the palette:

CreatePalette[
  DynamicModule[{
      numrows=5,update,buttons={},numcells, cellcount
    },
    update=Function[nb,
      Module[{cellcontent,cellsizes={},thisid,prgrsdia = Null, progress = 0},
        cellcount = 0;
        SelectionMove[nb, All, Notebook];
        numcells = CurrentValue[nb, "CellCount"];
        If[numcells > 10,
          prgrsdia = CreateDialog[
            ProgressIndicator[Dynamic[N[cellcount/numcells]]],
            WindowTitle -> "Cell-Size-Progress"
          ]
        ];
        SelectionMove[nb,Before,Notebook];
        SelectionMove[nb,Next,Cell];
        cellcontent=NotebookRead[nb];
        While[cellcontent=!={},
          thisid=CurrentValue[NotebookSelection[nb],CellID];
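          If[thisid==0,
            (* close the progress dialog before aborting so it is not left open *)
            If[prgrsdia =!= Null, NotebookClose[prgrsdia]];
            MessageDialog["Found a cell without a CellID, aborting..."];
            Abort[]
          ];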
          cellsizes={cellsizes,thisid->ByteCount@cellcontent};
          SelectionMove[nb,Next,Cell,AutoScroll->False];
          cellcontent=NotebookRead[nb];
          cellcount++;
        ];
        If[prgrsdia =!= Null, NotebookClose[prgrsdia]];
        Apply[
          (* label each button with the cell size in kB (1 kB = 1000 bytes) *)
          Button[ToString[Round[#2/10.^3]]<>" kB",NotebookFind[nb,#1,All,CellID],ImageSize->Scaled[1]]&,
          Reverse@SortBy[Flatten[cellsizes],Last],
          {1}
        ]
      ]
    ];
    Column[{
      Button["Create CellIDs",
        With[{nb=InputNotebook[]},
          SetOptions[nb,CreateCellID->True];
          SelectionMove[nb,All,Notebook];
          FrontEndTokenExecute[nb,"Cut"];
          FrontEndTokenExecute[nb,"Paste"];
        ],
        Method->"Queued"
      ],
      Button["Remove CellIDs",
        With[{nb=InputNotebook[]},
          SetOptions[nb,CreateCellID->False];
          NotebookPut[
            NotebookGet[nb] /. {
              Cell[x___,Verbatim[Rule][CellID,_],y___]:>Cell[x,y]
            },
            nb
          ];
        ],
        Method->"Queued"
      ],
      Row[{
        Button["Update",buttons=update[InputNotebook[]],Method->"Queued"],
        Button["+",numrows++],
        Button["-",numrows=Max[1,numrows-1]]
      }],
      Dynamic[Column[Take[buttons,Clip[numrows,{1,Length[buttons]}]]]]
    }]
  ],
  WindowTitle->"Large Cells"
];

EDIT: I have now added a progress bar that shows the progress while the list of cell sizes is updated (it only appears for notebooks with more than 10 cells). Note that the code that adds and removes the CellIDs handles the complete notebook content in one go (the removal even copies it into kernel memory via NotebookGet), which might fail for huge notebooks. In that case one could use an approach similar to the one that determines the cell sizes and add/remove the IDs cell by cell. That would be more memory efficient, but might be much slower. To get a good balance one would probably need a chunked approach: select, e.g., 10 cells and treat them in one go, then select and treat the next 10 cells, and so on. I have not tried whether, and by how much, that speeds things up, and don't have time to do so...


The code below does not do exactly what you asked for, but it should contain the relevant pieces. To create an index ordered by cell size, I think it would be easiest to set CreateCellID to True for your notebook. To create IDs for the existing cells you'd have to "Cut" and "Paste" all of them once; new cells automatically get a unique CellID. Then you could use code similar to the one below to collect a list of CellIDs and corresponding cell sizes, order that list by size, and write a column of buttons that use NotebookFind (searching by CellID) to select the given cell in your notebook.
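As a minimal sketch of the collect-and-sort step described above, the following works on a notebook *expression* instead of the open notebook, so the logic can be tried in the kernel without the front end. The names cellSizesByID and largestCells are hypothetical helpers, not part of either palette. It assumes the cells already carry CellID options and a version that has UpTo (10.3 or later). Keep in mind that obtaining the expression with NotebookGet or Get loads the whole notebook into kernel memory, which may be exactly the problem with a very large notebook.

```mathematica
(* Hypothetical helper: CellID -> ByteCount rules for every cell that has a
   CellID, largest first. Cell-group wrappers (Cell[CellGroupData[...], ...])
   carry no CellID, so they are skipped and sizes are not double counted. *)
cellSizesByID[nbExpr_Notebook] :=
  SortBy[
    Cases[nbExpr, c : Cell[___, CellID -> id_, ___] :> (id -> ByteCount[c]), Infinity],
    -Last[#] &
  ]

(* Hypothetical helper: the n largest entries, or all of them if there are fewer *)
largestCells[nbExpr_Notebook, n_Integer?Positive] :=
  Take[cellSizesByID[nbExpr], UpTo[n]]
```

For example, largestCells[NotebookGet[nb], 5] would give the five largest CellID -> byte-count rules, which could then be turned into NotebookFind buttons as in the palette.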

The following code creates a palette that adds a cell tag showing the size of each cell in kilobytes. There is also a button that removes those cell tags and one that switches ShowCellTags back to its default, False:

CreatePalette[Column[{
   Button["Show Sizes",
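    Module[{nb = InputNotebook[], cellcontent},
     (* make the size tags visible; "Hide Cell-Tags" switches this off again *)
     CurrentValue[nb, ShowCellTags] = True;
     SelectionMove[nb, Before, Notebook];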
     SelectionMove[nb, Next, Cell];
     cellcontent = NotebookRead[nb];
     While[cellcontent =!= {},
      SetOptions[NotebookSelection[nb], CellTags -> Append[
         Select[
          Flatten[{CurrentValue[NotebookSelection[nb], CellTags]}],
          Not@StringMatchQ[#, "size:" ~~ DigitCharacter .. ~~ "kB"] &
          ],
         "size:" <> ToString[Round[ByteCount@cellcontent/10.^3]] <> 
          "kB"
         ]
       ];
      SelectionMove[nb, Next, Cell];
      cellcontent = NotebookRead[nb];
      ]
     ],
    Method -> "Queued"
    ],
   Button["Remove Size Tags",
    Module[{nb = InputNotebook[], cellcontent},
     SelectionMove[nb, Before, Notebook];
     SelectionMove[nb, Next, Cell];
     cellcontent = NotebookRead[nb];
     While[cellcontent =!= {},
      SetOptions[NotebookSelection[nb],
       CellTags -> Select[
         Flatten[{CurrentValue[NotebookSelection[nb], CellTags]}],
         Not@StringMatchQ[#, "size:" ~~ DigitCharacter .. ~~ "kB"] &
         ]
       ];
      SelectionMove[nb, Next, Cell];
      cellcontent = NotebookRead[nb];
      ]
     ],
    Method -> "Queued"
    ],
   Button["Hide Cell-Tags",
    (* nb is not defined in this button's scope, so use InputNotebook[] *)
    CurrentValue[InputNotebook[], ShowCellTags] = False,
    Method -> "Queued"
    ]
}], WindowTitle -> "Cell Sizes"]
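The tag bookkeeping done by the "Show Sizes" and "Remove Size Tags" buttons can be factored into small pure functions, which makes it easy to check the logic in the kernel without touching a notebook. sizeTagQ, stripSizeTags and updateSizeTags are hypothetical names used only for this sketch; the tag format size:<n>kB with 1 kB = 1000 bytes is taken from the palette above.

```mathematica
(* Hypothetical helper: True for tags of the form "size:<digits>kB" *)
sizeTagQ[tag_String] := StringMatchQ[tag, "size:" ~~ DigitCharacter .. ~~ "kB"]
sizeTagQ[_] := False

(* Hypothetical helper: drop all size tags; a single tag string is wrapped in a list first *)
stripSizeTags[tags_] := Select[Flatten[{tags}], ! sizeTagQ[#] &]

(* Hypothetical helper: replace any old size tag by one for the given byte count *)
updateSizeTags[tags_, bytes_Integer] :=
  Append[stripSizeTags[tags], "size:" <> ToString[Round[bytes/10.^3]] <> "kB"]
```

Inside the loop, the SetOptions call of "Show Sizes" would then read SetOptions[NotebookSelection[nb], CellTags -> updateSizeTags[CurrentValue[NotebookSelection[nb], CellTags], ByteCount@cellcontent]], and "Remove Size Tags" would use stripSizeTags in the same place.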