Error using geom_density_2d() in R : Computation failed in `stat_density2d()`: bandwidths must be strictly positive

So the real problem is that the S1.x and S1.y values only have one non-zero value in their columns. And it turns out that geom_density_2d can't really estimate a density with only a value or two. But read on...

Update:

This question has been asked before, and the answers are usually that you need to have non-zero variance in your data columns. But you do have non-zero variance, so why isn't it working?

  • Looking at the internals of geom_density_2d we see that it uses the MASS::kde2d package function to calculate the distribution.
  • Looking at kde2d we see that it uses MASS::bandwidth.nrd(df$x) to get an estimate of the bandwidth.
  • Looking at the help (which has the code) for bandwidth.nrd we see it uses a rule of thumb that gets the quantile of the distribution, and subtracts the 2nd quantile from the 1st quantile to get a bandwidth estimate.
  • Doing a quantile on your original data we see that the quantiles of the data were zero.
  • And running MASS::kde2d on your original data with that bandwidth.nrd estimate of the bandwidth gives you the same error:
library(MASS)
nn <- c("DQ459412","DQ459413","DQ459415","DQ459418","DQ459419","DQ459420")
s1x <- c(0,1.584963,0,0,0,0)
s1y <- c(0,2.358379,0,0,0,0) 
s2x <- c(0,4.392317,0,0,4,0)
s2y <- c(0,3.085722,0,0,2.891544,0) 
df <- data.frame(transcriptID=nn,S1.x=s1x,S1.y=s1y,S2.x=s2x,S2.y=s2y)
> quantile(df$s1x)
      0%      25%      50%      75%     100% 
0.000000 0.000000 0.000000 0.000000 1.584963 
> quantile(df$s1y)
      0%      25%      50%      75%     100% 
0.000000 0.000000 0.000000 0.000000 2.358379 
h <- c(MASS::bandwidth.nrd(df$x), MASS::bandwidth.nrd(df$y))
dens <- MASS::kde2d(df$s1x, df$s1y, h = h, n = n,  lims = c(0,1,0,1))

Error in MASS::kde2d(df$s1x, df$s1y, h = h, n = n, lims = c(0, 1, 0, 1)) : bandwidths must be strictly positive

So the real criteria for using geom_density_2D is that both the x- and the y-data needs to have a non-zero gap between their 1st and 2nd quantiles.

Now to fix it, if I make a small modification - replacing one of the zeros with 0.1, like this:

nn <- c("DQ459412","DQ459413","DQ459415","DQ459418","DQ459419","DQ459420")
s1x <- c(0,1.584963,0,0,0.1,0)
s1y <- c(0,2.358379,0,0,0.1,0) 
s2x <- c(0,4.392317,0,0,4,0)
s2y <- c(0,3.085722,0,0,2.891544,0) 
df <- data.frame(transcriptID=nn,S1.x=s1x,S1.y=s1y,S2.x=s2x,S2.y=s2y)
print(df)

yielding:

  transcriptID     S1.x     S1.y     S2.x     S2.y
1     DQ459412 0.000000 0.000000 0.000000 0.000000
2     DQ459413 1.584963 2.358379 4.392317 3.085722
3     DQ459415 0.000000 0.000000 0.000000 0.000000
4     DQ459418 0.000000 0.000000 0.000000 0.000000
5     DQ459419 0.100000 0.100000 4.000000 2.891544
6     DQ459420 0.000000 0.000000 0.000000 0.000000

Then I get this plot instead of your error.

enter image description here You can let that 0.1 value approach zero, eventually it will not be able to calculate a distribution anymore and you will get your error again.

One general way to deal with this situation is to add a very small quantity of noise to your data, kind of simulating the fact that any meaningful calculation based on a real measurement from a continuous distribution should be impervious to that small quantity of noise.

Hope that helps.


The answer of @Mike Wise is pretty solid indeed and my answer is somewhat complementary to it. Actually, the bandwidth.nrd function computes the difference between the 3rd and the 1st quantile not the 2nd and 1st (code from the function):

r <- quantile(distances, c(0.25, 0.75))

Instead of adding random noise to your data, I would suggest to precompute the bandwidths yourself and pass them to the function, testing for non-zero values like so:

kde2d(df$s1x, df$s1y, 
      h = c(ifelse(bandwidth.nrd(df$s1x) == 0, 0.1, bandwidth.nrd(df$s1x)),
            ifelse(bandwidth.nrd(df$s1y) == 0, 0.1, bandwidth.nrd(df$s1y))))

Hope this helps.

Tags:

R

Ggplot2