Bad linear fit for simple data

Use PlotRange -> All. Most plot functions tend to throw away points that aren't nicely clustered with the bulk:

Show[{ListPlot[data, PlotRange -> All], Plot[ab, {x, 0, 2}, PlotStyle -> Red]}]

enter image description here

As you can see, there is a number of points that completely mess up the fit.


Maybe you could use RANSAC to find inliers by consensus. This implementation isn't exactly right but it finds a pretty decent fit:

samplesize = 30;
inliers[fit_, points_, d_] :=
 Select[points, Abs[#[[2]] - (fit /. x -> #[[1]])] < d &]
votes = Association[# -> 0 & /@ data];
Do[
  sample = RandomSample[data, samplesize];
  fit = Fit[sample, {1, x}, x];
  Scan[votes[#] += 1 &, inliers[fit, data, 0.05]];
  , 2000];
finalfit = Fit[Keys[TakeLargest[votes, samplesize]], {1, x}, x];
Show[{ListPlot[data], Plot[finalfit, {x, 0, 2}, PlotStyle -> Red]}, PlotRange -> All]

ransac fit


Use Quantile Regression:

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MonadicProgramming/MonadicQuantileRegression.m"]

QRMonUnit[data]⟹
  QRMonQuantileRegressionFit[{1, x}]⟹
  QRMonLeastSquaresFit[{1, x}]⟹
  QRMonPlot;

enter image description here

(And, yes, that is a good example of Quantile Regression's robustness.)

Update

Instead of computing with the QRMon package utilized above, the computations can be done with the Wolfram Function Repository function QuantileRegression. That function uses B-splines, but if the fitting is made with one knot and interpolation order one then linear function fits are obtained.

probs = {0.25, 0.5, 0.75};
qFuncs = ResourceFunction["QuantileRegression"][data, 1, probs, InterpolationOrder -> 1];
Simplify[Through[qFuncs[x]]]
Show[{ListPlot[data, PlotStyle -> Gray, PlotRange -> All, ImageSize -> Large]},
 Plot[Evaluate[Through[qFuncs[x]]], {x, Min[data[[All, 1]]], 
   Max[data[[All, 1]]]}, PlotLegends -> probs, PlotTheme -> "Detailed"]] 

enter image description here

Tags:

Fitting