Correction: evaluation of Matérn 3/2 GP density in Stan using the SDE representation takes quadratic time!

Summary: The method proposed in has quadratic time complexity (per evaluation of density+gradient) due to the gradient evaluation. This explains why the speedup seen in the experiment was only proportional to the number of input points (consistent with the difference between cubic and quadratic time complexity), not to the square of the number of input points as I expected. NB: I have not examined the code Stan produces, the possibility remains that the time-complexity is even higher due to something I have missed...

1   Correction

The idea of the method I presented in was that the log-density can be evaluated using the decomposition

\begin{equation*} \log p(x_{1:T} \mid \sigma^2,l,\mu) = \sum_{k=1}^T \log p(x_k \mid x_{1:k-1}, \sigma^2, l,\mu). \end{equation*}

where each component of the right hand side takes constant time due to the SDE representation of the Matérn 3/2 process enabling a Kalman filter type iteration. That is, each component is \(p(x_k \mid x_{1:k-1}) = \mathrm{N}(x_k\mid m_k,P_k)\), where \(m_k\) and \(P_k\) come from the Kalman filter. The log-density can then be evaluated in linear time. However, since the terms \(m_k\) and \(P_k\) depend on all of \(x_1,x_2,\ldots,x_{k-1}\), differentiating the decomposition with respect to all parameters leads to computing terms of the form

\begin{equation*} \frac{\partial}{\partial x_j} \log p(x_i \mid x_{1:i-1}, \sigma^2, l, \mu) \end{equation*}

for all \(j\leq i\). The number of such pairs is quadratic in the number of datapoints.

When writing the original post, my intuition was that the gradient evaluation would be linear, too, as the so called sensitivity equations [1] obtained by differentiating the Kalman filter equations enable evaluating the derivative of the log-likelihood with respect to parameters in linear time. However, the present case has the crucial difference that the parameters \(x_k\) of the Stan model are measurements in the Kalman filter analogy. The classical sensitivity equations (that would be linear-time) produce only the derivatives with respect to static parameters, not with respect to the measurements.

2   References

  1. N. K. Gupta and R. K. Mehra. Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations. IEEE Transactions on Automatic Control, 19(6):774–783, 1974. doi: 10.1109/TAC.1974.1100714.