Smoothed Empirical Likelihood in R

Smoothed Empirical Likelihood R package announcement

GitHub repository: https://github.com/Fifis/smoothemplik

This package provides the following features.

  1. Fast kernel methods in C++. Floating-point numbers have finite precision, so round-off inevitably occurs; therefore, even infinite-support kernels, like the Gaussian one, evaluate in the tails to values too small to change the sum. Motivating example: for the Gaussian CDF, pnorm(8.3) - 1 == 0 is TRUE. All valid kernels are thus bounded in practice, and a quick check of whether X is in the kernel support saves CPU cycles. The highly efficient RcppArmadillo library speeds up vector handling, making it faster than in vanilla C++.

  2. Memory saving in C++. Given the large size of modern data sets, even sparse matrices cannot hold kernel weight matrices due to memory limitations. On the other hand, smoothing for each observation separately introduces too much overhead. Hence, internal data chunking is required to carry out non-parametric estimation in reasonable time. The functions kernelWeights() and kernelDensity() from smoothemplik never load more than 1 GB of matrix data into memory.

  3. Parallelisation in C++. With each non-parametric job now taking at most 1 GB of RAM, parallel processing of these manageable chunks through RcppParallel is feasible and efficient.

  4. Built-in de-duplication routines. Data pre-processing can reduce the number of operations required for non-parametric estimation. Many base R functions have a weights option that handles repeated observations by multiplying their contribution to the objective function. However, popular packages, such as np, do not support weights for smoothing. In smoothemplik, weighted non-parametric estimators have a fast pre-processing stage with data.table that counts duplicates in nearly linear time and uses the counts as weights to save quadratic kernel-matrix time.

  5. Efficient kernel estimation with mixed kernels. If an exogenous variable is discrete, then it yields zero product-kernel weights for pairs of observations with different values of this variable. This creates a block kernel-weight matrix structure that facilitates blockwise parallel processing and reduces memory usage. As a result, non-parametric estimation with mixed kernels takes less time and memory.

  6. Fast and reliable univariate EL. Art B. Owen (2017) provides an R function for SEL with counts where the lambda search maximises the weighted sum of \(\log(1+\lambda'Z_i)\) via a quasi-Newton method. In the univariate case, a more efficient bracketed root-finding method (Brent's) searches for the zero of the derivative of this sum on a bracketed interval, allowing for faster convergence and preliminary checks. The function weightedEL() from smoothemplik checks the spanning condition (zero must lie strictly inside the range of the data) and stops immediately if it fails, instead of waiting until the iteration limit is reached. In many cases, this bracketed root search is more than 10 times faster than the quasi-Newton maximisation for univariate weighted EL.

  7. SEL in the presence of discrete variables. This package provides a function to efficiently compute SEL for block-wise weight matrices. It saves memory and enables parallelisation.
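
The support check from feature 1 can be illustrated with a minimal standard-library C++ sketch. It is not the package's implementation (which relies on RcppArmadillo); the function names and the cut-off of 8.5 bandwidths are hypothetical choices made for illustration. Observations beyond the cut-off are skipped before exp() is ever called.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard Gaussian kernel: the density of N(0, 1) evaluated at u.
inline double gaussianKernel(double u) {
    const double invSqrt2Pi = 0.3989422804014327;  // 1 / sqrt(2 * pi)
    return invSqrt2Pi * std::exp(-0.5 * u * u);
}

// Density estimate at x0 that calls exp() for every observation.
double kdeFull(double x0, const std::vector<double>& x, double h) {
    assert(h > 0.0 && !x.empty());
    double sum = 0.0;
    for (double xi : x) sum += gaussianKernel((xi - x0) / h);
    return sum / (static_cast<double>(x.size()) * h);
}

// Same estimate, but observations farther than `cutoff` bandwidths from x0
// are skipped before exp() is called. `cutoff` is a hypothetical tuning value;
// `skipped`, if non-null, receives the number of skipped observations.
double kdeTruncated(double x0, const std::vector<double>& x, double h,
                    double cutoff = 8.5, std::size_t* skipped = nullptr) {
    assert(h > 0.0 && cutoff > 0.0 && !x.empty());
    double sum = 0.0;
    std::size_t nSkipped = 0;
    for (double xi : x) {
        const double u = (xi - x0) / h;
        if (std::fabs(u) > cutoff) {  // outside the effective kernel support
            ++nSkipped;
            continue;
        }
        sum += gaussianKernel(u);
    }
    if (skipped != nullptr) *skipped = nSkipped;
    return sum / (static_cast<double>(x.size()) * h);
}
```

For data with a few points near x0 and the rest far away, the two functions should agree to within round-off while the truncated version evaluates far fewer exponentials.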
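
The bracketed lambda search from feature 6 can be sketched as follows for a scalar Z_i with strictly positive weights. This is an illustration under stated assumptions, not the code of weightedEL(): the names elScore and elLambda are hypothetical, and plain bisection stands in for Brent's method to keep the example short. The bracket comes from requiring 1 + lambda * z_i > 0 for every observation, and the derivative is decreasing in lambda, so its zero is unique inside the bracket.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Derivative with respect to lambda of sum_i w_i * log(1 + lambda * z_i).
double elScore(double lambda, const std::vector<double>& z,
               const std::vector<double>& w) {
    double s = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i)
        s += w[i] * z[i] / (1.0 + lambda * z[i]);
    return s;
}

// Returns the lambda at which the score is zero, or NaN if the spanning
// condition fails. Assumes all weights are strictly positive.
double elLambda(const std::vector<double>& z, const std::vector<double>& w,
                double tol = 1e-12, int maxIter = 200) {
    assert(!z.empty() && z.size() == w.size());
    const auto mm = std::minmax_element(z.begin(), z.end());
    const double zMin = *mm.first;
    const double zMax = *mm.second;
    // Spanning condition: zero must lie strictly inside the range of z.
    // If it fails, there is nothing to search for, so stop immediately.
    if (!(zMin < 0.0 && zMax > 0.0))
        return std::numeric_limits<double>::quiet_NaN();
    // 1 + lambda * z_i > 0 for all i  <=>  -1/zMax < lambda < -1/zMin.
    double lo = -1.0 / zMax;
    double hi = -1.0 / zMin;
    // The score tends to +infinity at lo and to -infinity at hi, so the
    // end points are never evaluated; only interior midpoints are.
    for (int it = 0; it < maxIter && hi - lo > tol; ++it) {
        const double mid = 0.5 * (lo + hi);
        if (elScore(mid, z, w) > 0.0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

For z = (-1, 2) with unit weights, the score equation -1/(1 - lambda) + 2/(1 + 2 lambda) = 0 has the solution lambda = 0.25, which the search should recover up to the tolerance.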

Other non-parametric procedures, like the Sieve Minimum Distance (SMD) of Ai & Chen (2003), can also benefit from de-duplication and optimised kernel routines.
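
The de-duplication idea can be sketched in standard C++ as follows. The package itself is described as using data.table for this step; here an ordered map is swapped in, which costs O(n log n), and the names UniqueCounts, deduplicate, and weightedKernelSum are hypothetical. The sketch assumes the data contain no NaN values and uses an unnormalised Gaussian kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct UniqueCounts {
    std::vector<double> values;  // distinct observations, in ascending order
    std::vector<double> counts;  // how many times each one occurs
};

// Collapses repeated observations into (value, count) pairs.
UniqueCounts deduplicate(const std::vector<double>& x) {
    std::map<double, double> tally;
    for (double xi : x) tally[xi] += 1.0;  // missing keys start at 0.0
    UniqueCounts out;
    out.values.reserve(tally.size());
    out.counts.reserve(tally.size());
    for (const auto& kv : tally) {
        out.values.push_back(kv.first);
        out.counts.push_back(kv.second);
    }
    return out;
}

// Weighted sum of unnormalised Gaussian kernel values centred at x0.
double weightedKernelSum(double x0, const std::vector<double>& x,
                         const std::vector<double>& w, double h) {
    assert(h > 0.0 && x.size() == w.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double u = (x[i] - x0) / h;
        sum += w[i] * std::exp(-0.5 * u * u);
    }
    return sum;
}
```

Calling weightedKernelSum() on the unique values with their counts as weights should match, up to round-off, the sum over the full sample with unit weights, while evaluating the kernel only once per distinct value.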

Overall, this package makes SEL a computationally feasible method that shares with generalised empirical likelihood (GEL) such desirable properties as internal studentisation and no need for variance estimation.