Derivative and calculus over sets such as the rational numbers

It really looks to me like you're after the concept of a generalized function, which allows you to manipulate things that are decidedly not functions as if they were. The basic premise of a generalized function is that you know how it is supposed to integrate against nice functions, even though it is clearly not a function itself - which is exactly the path you seem to be going down. You can also think of generalized functions as what happens if you keep differentiating a continuous function - much the same way that "a measure" is a reasonably good answer to the question of what happens if you differentiate an increasing function.

More formally, to define a generalized function, we start by defining some functions which are very well behaved, known as test functions. A function $f:\mathbb R\rightarrow\mathbb R$ is a test function if and only if it is smooth and compactly supported. Write $T$ for the set of test functions. We also put a topology on $T$ by the rule that $f_1,f_2,\ldots$ converges to $f$ if and only if the supports of the $f_i$ all lie in a common compact set and, for each $n$, the sequence of $n^{th}$ derivatives converges uniformly to the $n^{th}$ derivative of $f$.
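To make this concrete, here is a quick sketch in Python (my own illustration; all the snippets below assume numpy) of the prototypical test function, the bump $e^{-1/(1-x^2)}$ on $(-1,1)$:

```python
import numpy as np

# The prototypical test function: smooth everywhere, supported on [-1, 1].
def bump(x):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

print(bump(np.linspace(-2, 2, 9)))  # zero outside (-1, 1), positive inside
```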

A generalized function is then just a continuous linear map from $T$ to $\mathbb R$. The idea is that we associate to a continuous function $g:\mathbb R\rightarrow\mathbb R$ the map $$f\mapsto \int f(x)g(x)\,dx.$$ More generally, we can associate to each measure $\mu$ the generalized function $$f\mapsto \int f(x)\,d\mu.$$ The really neat thing about this definition is that we can use tricks to start to reason about distributions the same way we would reason about functions; let me focus on your second example, as it's a nice illustration of how this reasoning works out without too much difficulty.

So, first, we would like to define the derivative of a generalized function $g$. You can think about this as a convenient extension of the map $g\mapsto g'$ from the subset of generalized functions which are actually differentiable to the entire set; or, noting that differentiation is linear, as taking the transpose of the map $f\mapsto f'$ on the space of test functions; or you can just think of integration by parts as the justification. In any case, you should come across the following formula (where we now abuse the integral sign, writing $\int g' \cdot f$ to formally mean $g'(f)$ - avoiding this abuse of notation turns out to be more confusing than embracing it!): $$\int g'\cdot f = -\int g\cdot f'.$$ This is, of course, actually true if $g$ and $f$ are differentiable and their product is compactly supported - being a consequence of integration by parts - and one can (and should) use it to differentiate distributions in general.
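If you want to see this formula in action, here is a small numerical sanity check (my own sketch, not part of the theory; the choice $g(x)=|x|$, whose distributional derivative is $\operatorname{sign}(x)$, and the shifted bump are just convenient picks):

```python
import numpy as np

def bump(x):
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

xs = np.linspace(-2, 2, 400001)
f = bump(xs - 0.3)             # a test function (shifted so nothing cancels)
fp = np.gradient(f, xs)        # f' by finite differences
g = np.abs(xs)                 # g(x) = |x|
gp = np.sign(xs)               # the distributional derivative of |x|

print(np.trapz(gp * f, xs))    #  int g'.f
print(-np.trapz(g * fp, xs))   # -int g.f'  -- matches up to quadrature error
```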

You can also define compositions of distributions with nice functions in a similar manner; let's stick with linear functions for simplicity. In particular, if $g(x)$ is a distribution, let's define $g(2x)$ in the same way as before (abusing notation even more now - just remember that $g(x)$ and $g(2x)$ are not truly functions, but symbols in their own right): $$\int g(2x)\cdot f(x)\,dx = \frac{1}2\int g(x)\cdot f\left(\frac{x}2\right)\,dx,$$ which is essentially the result of a $u$-substitution. Similarly, we could define $$\int g(2x-1)\cdot f(x)\,dx = \frac{1}2\int g(x)\cdot f\left(\frac{x+1}2\right)\,dx.$$
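Since the identity already holds for honest functions, one can sanity-check it numerically with an ordinary $g$; here is a sketch (my choices of $g(x)=e^{-x^2}$ and the shifted bump are arbitrary):

```python
import numpy as np

def bump(x):
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

xs = np.linspace(-4, 4, 200001)
g = np.exp(-xs ** 2)                                 # an honest function g
f = lambda x: bump(x - 0.3)                          # a test function

print(np.trapz(np.exp(-(2 * xs) ** 2) * f(xs), xs))  # int g(2x).f(x) dx
print(0.5 * np.trapz(g * f(xs / 2), xs))             # (1/2) int g(x).f(x/2) dx
```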

So, even though we can't plot these things (though, as long as we're still within the realm of measure theory - which these density-like objects absolutely are - we can make histograms like the ones you've produced), we can reason about them by analogies to functions that are rooted formally in a theory that will let us extract results via integration later. In particular, our theory lets us, without reservation, differentiate any increasing function - and thus fills in what your values $f_Z$ have to be.

So, let $G_Z$ be the cumulative distribution function of the number in $[0,1]$ whose bits are drawn independently from a Bernoulli distribution with parameter $p$. While we could notice some properties of $G_Z$ directly and infer things from differentiation, let's instead look at the more interesting reasoning on $g_Z$ that this machinery enables.
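Incidentally, $G_Z$ itself is very computable: conditioning on the first bit gives $G_Z(x) = (1-p)\,G_Z(2x) + p\,G_Z(2x-1)$ for $0<x<1$ (my own derivation from the bit-picking description, with the convention that a first bit of $1$ contributes $\tfrac12$). A sketch:

```python
def G(x, p, depth=50):
    """CDF of Z via G(x) = (1-p)*G(2x) + p*G(2x-1) for 0 < x < 1.
    One of the two recursive arguments always leaves (0, 1), so the
    recursion is effectively linear in depth."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    if depth == 0:
        return 0.5  # any value in [0,1] works; error decays like max(p,1-p)**depth
    return (1 - p) * G(2 * x, p, depth - 1) + p * G(2 * x - 1, p, depth - 1)

print(G(0.5, p=0.3))   # P(Z <= 1/2) = P(first bit is 0) = 0.7
```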

So, as a first step, let $Z_n$ be the discrete random variable whose base-two expansion is $0.b_1b_2b_3\ldots b_n$, with bits chosen from the given Bernoulli distribution. Then $g_{Z_n}$ will represent our probability mass function, and we take it to satisfy $$\int g_{Z_n}(x) \cdot f(x) = \mathbb E[f(Z_n)].$$ This is, of course, just summing up some of the values of $f$. However, it is helpful, because to produce the distribution of $Z_{n+1}$, all we need to do is randomly pick the first bit, then add half of a value chosen as in $Z_n$. This works out to the following equation: $$g_{Z_{n+1}}(x) = 2(1-p)\cdot g_{Z_n}(2x) + 2p\cdot g_{Z_n}(2x-1),$$ which can be explicitly verified in terms of the expectations, if desired.

The distribution you want ought to be (and is, in pretty much any topology one might put on the generalized functions) the limit of $g_{Z_n}$ as $n$ goes to $\infty$ - and the operations here are continuous, so we get $$g_{Z}(x) = 2(1-p)\cdot g_Z(2x) + 2p\cdot g_Z(2x-1),$$ which is exactly the sort of algebraic equation that would lead to the structures you are seeing - and gives you the iterative formula that leads to the sort of density plots you've made.

Of course, as a measure, $g_Z$ is concentrated on (i.e., assigns full measure to) the set of numbers $x$ in $[0,1]$ for which the asymptotic density of the indices where the corresponding binary bit of $x$ is $1$ equals $p$. For $p\neq\tfrac12$ this set is Lebesgue-null, so $g_Z$ can't be converted to a function, even via tools like the Radon-Nikodym derivative, which converts measures to functions where possible. (For $p=\tfrac12$, the distribution is just Lebesgue measure on $[0,1]$.) Nonetheless, these generalized functions can still provide some of the framework you seem to be after.
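Here is that iterative formula in code (my own rendering, assuming numpy; the convention is that a bit of value $1$ sends you to the right half of the interval):

```python
import numpy as np

def dyadic_masses(p, n):
    """Masses the distribution assigns to the 2^n dyadic subintervals of
    [0, 1]: each interval splits into a left child (next bit 0, weight 1-p)
    and a right child (next bit 1, weight p)."""
    m = np.array([1.0])
    for _ in range(n):
        new = np.empty(2 * m.size)
        new[0::2] = (1 - p) * m   # next bit 0 -> left half
        new[1::2] = p * m         # next bit 1 -> right half
        m = new
    return m

m = dyadic_masses(p=0.3, n=12)
heights = m * 2 ** 12   # "histogram heights" over each dyadic interval --
print(m.sum())          # total mass 1.0; the heights oscillate ever more
                        # wildly as n grows instead of converging to a density
```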


As an aside, you can work with measures this way too; if you let $\mu$ be the probability measure associated to the process we've been discussing, then conditioning on the first bit gives $$\mu(S) = (1-p)\,\mu(2S) + p\,\mu(2S-1),$$ which has the same structure. If you're happy to work with measures and don't want to try differentiating the measure any further, then they're a good answer to your search as well.
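One can check this identity directly on dyadic intervals using the masses from the sketch above (again my own code, assuming numpy):

```python
import numpy as np

p, n = 0.3, 10

def dyadic_masses(p, n):
    # same construction as in the previous snippet
    m = np.array([1.0])
    for _ in range(n):
        new = np.empty(2 * m.size)
        new[0::2], new[1::2] = (1 - p) * m, p * m
        m = new
    return m

m_n = dyadic_masses(p, n)        # level-n masses
m_c = dyadic_masses(p, n - 1)    # level-(n-1) masses
half = 2 ** (n - 1)
j = np.arange(2 ** n)
# For S = [j/2^n, (j+1)/2^n): if j < half then 2S stays in [0,1] and
# mu(2S-1) = 0; if j >= half then 2S-1 stays in [0,1] and mu(2S) = 0.
rhs = np.where(j < half,
               (1 - p) * np.concatenate([m_c, np.zeros(half)]),
               p * np.concatenate([np.zeros(half), m_c]))
print(np.allclose(m_n, rhs))     # True
```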


I might note that these tools absolutely cannot see the sort of holes that you are talking about - indeed, every finite measure on $\mathbb R$ gives full measure to some totally disconnected set (only countably many points can carry positive mass, so one can delete a countable dense set of massless points without changing the measure of anything, and what remains is totally disconnected). Generally, with probability, the "blurring" effect of using smooth test functions is actually desirable, because the measure of an interval is not the sum of the measures of its points. This is also why one rarely ever works with functions when dealing with measures, but instead works with $L^p$ spaces (whose members are not functions, but rather almost-everywhere equivalence classes of functions).

To say this more generally: a function is defined by the fact that it can be evaluated at any point. This is not really something you desire when talking about probability distributions, because the quantity is too spread out for pointwise evaluation to register anything. A measure, more usefully, is defined by how it integrates against continuous functions (e.g. as in the Riesz representation theorem), and a distribution is a further generalization, defined by how it integrates against smooth compactly supported functions. The latter two objects tend to be more useful in these situations.


Measure-theoretic probability provides a formalism in which all quantities in your question are rigorously defined. In my opinion this is the natural framework to use (instead of introducing an ad-hoc approach) since you have already invoked measure theory in your question!

Even better, there is substantial (and beautiful) literature studying the exact same fractal systems you are interested in - read to the bottom of this answer (skimming the pedantic bits you probably already know) to find out more.

TL;DR The answer to your question

"Is there an existing theory to handle this type of density-like-substance?"

is a resounding Yes!


Some background

In measure-theoretic probability, one starts with a measurable space $(\Omega,\mathcal F)$: a sample space $\Omega$, which is an arbitrary set, equipped with a $\sigma$-algebra $\mathcal F$. In this case, it is natural to take $\Omega=\{0,1\}^{\mathbb N}$ and use the product $\sigma$-algebra. Equip $(\Omega,\mathcal F)$ with the measure $\mathbb P_p=\textrm{Ber}_p^{\mathbb N}$, the product of $p$-Bernoulli measures, as you have described.

In this formalism, a random variable is nothing other than a measurable function $f\colon \Omega\to\mathbb R$ where $\mathbb R$ is equipped with the Borel $\sigma$-algebra. In particular, we have a random variable given by $$ Z(\omega_1,\omega_2,\ldots)=\sum_{n=1}^{\infty}\frac{\omega_n}{2^n}. $$ It is a measurable function since it is a pointwise limit of measurable functions (namely, each partial sum is straightforwardly seen to be measurable).
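In code, $Z$ is exactly as easy to realize as this formula suggests; here is a Monte Carlo sketch (the function name and the truncation at 53 bits are my choices; 53 bits is all a double can hold anyway):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_Z(p, size, n_bits=53):
    """Samples of Z = sum_n omega_n / 2^n with omega_n i.i.d. Bernoulli(p),
    truncated at n_bits terms (the precision of a double)."""
    bits = (rng.random((size, n_bits)) < p).astype(float)
    return bits @ (0.5 ** np.arange(1, n_bits + 1))

Z = sample_Z(p=0.3, size=100_000)
print(Z.mean())   # E[Z] = sum_n p/2^n = p, so this should be close to 0.3
```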

Since $\mathbb P_p(0\leq Z\leq 1)=1$, the random variable $Z$ is seen to be integrable, so that the mean $\mathbb EZ$ is well-defined. By the same token, $\mathbb E[Z^k]$ is well-defined for all integers $k\geq 0$.

This is not just abstract nonsense: it leads to an operational calculus of the kind you seek. For instance, we have the well-known tail integral for moments of a random variable: $$ \mathbb E[Z^k]=\int_0^{\infty}k\,t^{k-1}\,\mathbb P_p(Z>t)\ dt,\qquad k\in\mathbb N, $$ which is a special case of Tonelli's theorem.
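Here is a quick Monte Carlo check of that formula (a sketch under my own conventions; the sample size, $p=0.3$, $k=3$, and the grid are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
bits = (rng.random((200_000, 53)) < 0.3).astype(float)
Z = bits @ (0.5 ** np.arange(1, 54))        # samples of Z with p = 0.3

k = 3
ts = np.linspace(0, 1, 2001)                # Z is supported in [0, 1]
Zs = np.sort(Z)
tail = 1.0 - np.searchsorted(Zs, ts, side="right") / Z.size  # P(Z > t)

print((Z ** k).mean())                           # direct estimate of E[Z^k]
print(np.trapz(k * ts ** (k - 1) * tail, ts))    # tail-integral estimate
```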

The distribution of the random variable $Z$, let's call it $\mu_p$, is defined to be the pushforward of the measure $\mathbb P_p$ under the mapping $Z\colon\Omega\to\mathbb R$. In symbols, $$\mu_p(A)=\mathbb P_p\bigl(Z^{-1}(A)\bigr)$$ for all Borel subsets $A$ of $\mathbb R$. It is a probability measure on $\mathbb R$, with support contained in $[0,1]$. As you have pointed out, the density does not exist in general. In fact, by the Radon-Nikodym theorem it exists if and only if the measure $\mu_p$ is absolutely continuous with respect to Lebesgue measure, in which case the density is precisely the Radon-Nikodym derivative of $\mu_p$ with respect to the Lebesgue measure.

The power of measure-theoretic probability is that all techniques and formulas that one is accustomed to working with carry over to the language of probability measures and random variables, ceasing to rely upon the existence of a density. (Of course, complications arise and a little adjustment is required - but this forms the core of the modern probabilist's arsenal of tools.) Thus one has the Lebesgue integral $$ \mathbb E[Z^k]=\int_0^{\infty}t^{k}\ d\mu_p(t),\qquad k\in\mathbb N, $$ regardless of whether $\mu_p$ possesses a density.


Modern research

With all this being said, it is still a very interesting question (related to open lines of current research) to understand when the density exists, and what are its fractal properties. I refer you to the survey paper by Peres, Schlag, and Solomyak entitled "Sixty Years of Bernoulli Convolutions", where a closely related generalization of your $p=\tfrac12$ case is considered, namely to the random variable $$ Z_\lambda(\omega_1,\omega_2,\ldots)=\sum_{n=1}^{\infty}\omega_n\lambda^n,\qquad \lambda\in[0,1]. $$ The distribution $\nu_\lambda$ of the random variable $Z_\lambda$ has fascinating properties when $\lambda\in(\tfrac12,1)$, and the study of these properties leads to beautiful connections with "harmonic analysis, the theory of algebraic numbers, dynamical systems, and Hausdorff dimension estimation" (direct quote from the abstract).
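Simulating $Z_\lambda$ takes only one line more than simulating $Z$; here is a sketch (my own code; the choice $\lambda = 1/\varphi \approx 0.618$, the reciprocal golden ratio, is the classical example where Erdős proved $\nu_\lambda$ is singular):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_Z_lambda(lam, p, size, n_terms=120):
    """Samples of Z_lambda = sum_n omega_n * lam^n, omega_n i.i.d. Ber(p)."""
    bits = (rng.random((size, n_terms)) < p).astype(float)
    return bits @ (lam ** np.arange(1.0, n_terms + 1))

# Reciprocal golden ratio: Erdos (1939) showed nu_lambda is singular here.
Z = sample_Z_lambda(lam=(5 ** 0.5 - 1) / 2, p=0.5, size=100_000)
hist, edges = np.histogram(Z, bins=512, density=True)
# Plotting hist reveals the fractal oscillations studied in the survey.
```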

The fractal nature of these measures is a prominent motif throughout this line of inquiry, and appears at the forefront in the article "Self-similar measures and intersections of Cantor sets", written by a subset of the authors of the survey paper, which goes beyond the $p=\tfrac12$ case to study the measure $\nu_{\lambda}^{p}:=(Z_\lambda)_*(\mathbb P_p)$, of which the special case $\lambda=\tfrac12$ is the distribution of your random variable $Z$.

The aforementioned survey article is featured prominently in the proceedings of an international conference on fractal geometry, collected in the (unfortunately non-freely available) book Fractal Geometry and Stochastics II, which serves as a jumping-off point to the literature on dynamical systems leading to fractal geometries closely related to your list of "other related problems" as well. Note that many of the individual articles collected in this book (and its bibliography) are freely available at either the authors' webpages or on preprint servers.


Broader picture

To make contact with the answer posted by Milo here, let me point out that the theory of measures may be regarded as a special case of the theory of generalized functions - i.e., every finite measure is a generalized function with respect to the class of bounded continuous test functions (with the Lebesgue integral supplying the pairing between measure and test function).

My perspective is that measure theory is (broadly speaking) a more developed field than that of generalized functions, for the same reason that continuous functions are better understood than measures. What I mean is that one can group the objects of study in real analysis by how "well-behaved" the objects are. After linear functions, the best-behaved functions are polynomials, after which come (in order of increasing generality) analytic functions, then smooth functions, then continuous functions, then measurable functions, then measures, and finally generalized functions. (Every locally integrable function $f$ that appears before measures in this list can be canonically associated with the measure $f\cdot \textrm{Lebesgue}$; and measures can be regarded as generalized functions as already explained - so everything is on the same footing, just with varying degrees of regularity.)

Fortunately the problems you are interested in (and all the different flavors of dynamical systems, including ergodic theory) live at (or below) the "measure theory" rung on the ladder of abstractions.