Best way to convert numerical columns of a dataset with missing values into a matrix

It's not entirely clear if you want to replace the rows that contain a Missing or if you want to leave them out. When you want to leave them out, the very verbose way would be to write it out like this:

Query[
  Select[!MissingQ[#"passengerAge"] && ! MissingQ[#"passengerWeight"] &], 
  {#"passengerAge", #"passengerWeight"} &
][ds]

and then normalizing.

Mathematica graphics

Query is not strictly necessary, but due to the good advertisement of WReach, I learned to prefer it. This is possible too

ds[Select[! MissingQ[#"passengerAge"] && !MissingQ[#"passengerWeight"] &],
 {#"passengerAge", #"passengerWeight"} &]

Edit

Thank you for your answer! Preferably, I would like the Missing to be replaced by 0's.

Then probably:

Query[ReplaceAll[Missing[] -> 0], {#"passengerAge", #"passengerWeight"} &][ds]

This works:

Block[{Missing},
 Missing[] = 0;
 ds[All, Select[NumberQ]][Values]]

enter image description here


ds[associationIndex][Transpose][
  GroupBy[Query[DeleteMissing /* Query[Apply[And], NumericQ]]] /* 
   Query[{Key[True] -> Query[All, All, Replace[_Missing -> 0.]]}] /* 
   Values /* Apply[Join]][Transpose]

enter image description here

Where the utility associationIndex is one of dozens one-liners included in the API of my forthcoming book "Functional Data Workflow".

associationIndex[a_Association]:=Query[Normal/*MapIndexed[First[#2]->#1&]/*Association][a]

associationIndex[l_List]:=Query[MapIndexed[First[#2]->#1&]/*Association][l]

Without it, after the Apply[Join] you'd have to thread keys back in - note the use of bracketing Transpose. Such commutation is a common pattern, (btw Transpose will also impute Missing values to normalize ragged data.) - but careful as the op-form is buggy (I avoided RightComposition)

You might want to mod the logic below to suit various applications:

Query[DeleteMissing /* Query[Apply[And], NumericQ]]

If you just want the numeric matrix, there's no need for the Join:

ds[associationIndex][Transpose][
  GroupBy[Query[DeleteMissing /* Query[Apply[And], NumericQ]]] /* 
   Key[True] /* 
   Query[Values, Values, Replace[_Missing -> 0.]]][Transpose]

{{36.5, 186.5}, {9., 90.}, {35., 0.}, {60., 160.}, {23., 0.}, {22., 122.}, {0., 120}, {0., 0.}, {17., 170.}}

Tags:

Dataset

Matrix