What is the most scientific way to assign weights to historical data?

There are several ways to do that, depending on your goals. You can find examples on any website that reports economic data, for example stock quote histories. Here are the most popular methods:

Method 1: Perhaps the most well-behaved function is the exponential, as Rahul suggested. Pick some number $0<a<1$ and use the geometric progression $1,a,a^2,a^3,\dots$ to assign weights to each year, starting from the most recent. The total sum is $\frac{1}{1-a}$, so if you want all weights to add up to $1$, divide each weight by that sum, which gives $(1-a), (1-a)a, (1-a)a^2, \dots$.
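As a rough illustration of Method 1, here is a minimal Python sketch. The function names and the sample values are made up for the example. It also assumes a finite window of $n$ periods, so it normalizes by the finite sum $\frac{1-a^n}{1-a}$ rather than by the infinite-series sum $\frac{1}{1-a}$.

```python
def exponential_weights(a, n):
    """Normalized weights a^0, a^1, ..., a^(n-1), most recent period first."""
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    raw = [a ** k for k in range(n)]
    total = sum(raw)  # equals (1 - a**n) / (1 - a) for a finite window
    return [w / total for w in raw]


def weighted_average(values, weights):
    """Weighted average of values, both ordered most recent first."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical yearly observations, most recent first.
values = [10.0, 12.0, 11.0, 9.0]
weights = exponential_weights(0.5, len(values))
estimate = weighted_average(values, weights)
```

A smaller $a$ concentrates the weight on the latest years; as $a$ approaches $1$ the weights approach a plain average.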

Method 2: If you want to give equal weights to the most recent $n$ years and $0$ to earlier ones, you can use what economists call a "running average" (also known as a moving average): weight $\frac{1}{n}$ for each of the most recent $n$ years. This function is popular as well, but not as well-behaved as the exponential, because the weight drops abruptly from $\frac{1}{n}$ to $0$. I guess this answers the 2nd question as well.
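For comparison, the running-average weights from Method 2 can be sketched the same way. The function name and the `total_periods` parameter are assumptions made for this example.

```python
def running_average_weights(n, total_periods):
    """Weight 1/n for the n most recent periods and 0 for older ones.

    Index 0 is the most recent period.
    """
    if not 1 <= n <= total_periods:
        raise ValueError("n must be between 1 and total_periods")
    return [1.0 / n if k < n else 0.0 for k in range(total_periods)]


# Hypothetical setup: 5 years of history, average over the latest 3.
weights = running_average_weights(3, 5)
```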

Method 3: Some engineers prefer a sigmoid. This is an analytic function that assigns nearly equal weights to the most recent data and then quickly decays to $0$, but without the discontinuous step of the running average.
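One possible way to turn a sigmoid into weights is sketched below, using the logistic function. The parametrization is an assumption made for illustration, not something stated above: `cutoff` is the age at which the weight falls to half of its plateau, and `steepness` controls how quickly it falls.

```python
import math


def sigmoid_weights(total_periods, cutoff, steepness):
    """Normalized logistic weights, index 0 being the most recent period.

    Raw weight at age k is 1 / (1 + exp((k - cutoff) / steepness)):
    close to 1 for k well below cutoff, close to 0 well above it.
    """
    if steepness <= 0:
        raise ValueError("steepness must be positive")
    raw = [1.0 / (1.0 + math.exp((k - cutoff) / steepness))
           for k in range(total_periods)]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical setup: 10 periods, smooth cutoff around age 5.
weights = sigmoid_weights(total_periods=10, cutoff=5, steepness=0.5)
```

As `steepness` shrinks toward $0$, these weights approach the running average of Method 2.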

Other methods: In some circumstances physics or probability dictates other distributions. For example, if signal propagation is Gaussian (which happens often enough in physics), then the natural choice is the complementary error function, erfc.

About justification: yes, there are many reasons to put more emphasis on more recent data and to assign less weight, or even $0$, to older data. For example, older economic data may be irrelevant to projections. Or you may be modeling some other effect that naturally decays in time.


Michael's ideas sound great. If you are able to frame your problem as a prediction problem, there is an empirical way to make this choice between Michael's suggestions.

In machine learning terms, this is a supervised learning problem where, given sufficient historical data, the choice can be made using cross-validation. E.g. the method (1, 2, or 3) and its parameters (e.g. $n$ years) can be selected so that predictive performance on a holdout set is maximized. Be careful to evaluate performance only on holdout windows that lie in the future relative to the training data. "Time series cross validation" is easy to do wrong.
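To make the "future holdout windows" point concrete, here is a minimal sketch of a rolling-origin split in plain Python. The function name and its parameters are assumptions made for this example. It only generates index splits and assumes observations are ordered from oldest to newest; fitting and scoring each weighting scheme on those splits is left out.

```python
def rolling_origin_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Every test index is strictly later than every train index, so no
    fold is evaluated on data that precedes its training window.
    """
    test_size = (n_samples - min_train) // n_folds
    if min_train < 1 or test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        train = list(range(train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test


# Hypothetical setup: 10 yearly observations, 3 folds, at least 4 training years.
splits = list(rolling_origin_splits(n_samples=10, n_folds=3, min_train=4))
```

Each candidate (method plus parameter) would be scored on the test window of every fold, and the candidate with the best average score kept. A random shuffle of the years, as in ordinary k-fold cross-validation, would leak future information into training.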