Polynomial Representation of K-dimensional Data¶
The resampler allows polynomial order to vary by dimension, represented by \(o_{k}\) for dimension \(k\) in \(K\) dimensions. Let the polynomial approximation at coordinate \(x\) be represented as
\[f(x) = \sum_{p_{1}=0}^{o_{1}} \sum_{p_{2}=0}^{o_{2}} ... \sum_{p_{K}=0}^{o_{K}} \lambda_{(p_{1},p_{2},...,p_{K})} a_{(p_{1},p_{2},...,p_{K})} x_{1}^{p_{1}} x_{2}^{p_{2}} ... x_{K}^{p_{K}}\]
where \(\lambda_{(p_{1},p_{2},...,p_{K})} \in \{0,1\}\) defines the redundancy:
\begin{eqnarray} \lambda_{(p_{1},p_{2},...,p_{K})} = \begin{cases} 1, & \text{if } \sum_{k=1}^{K}{p_k} \leq max(o) \\ 0, & \text{otherwise} \end{cases} \end{eqnarray}
Now define the redundancy set \(s\) so that \(f(x)\) may be represented by a sum over its \(S\) members, for which \(\lambda=1\).
\begin{eqnarray} s &=& \{ (p_{1}, p_{2},...,p_{K}) \; | \; \lambda_{(p_{1}, p_{2},...,p_{K})} = 1 \} \\ c_{m} &=& a_{s_{m}} \\ \Phi_{m} &=& \prod_{k=1}^{K}{x_{k}^{s_{(m, k)}}}\\ f(x) &=& \sum_{m=1}^{S}{c_{m} \Phi_{m}} \end{eqnarray}
The set \(s\) is in lexicographic order, such that \((0, 1, 1)\) comes before \((1, 0, 0)\) (for \(K=3\)). For example, when \(K=2\) and the polynomial order is 2 in all dimensions:
\begin{eqnarray} & s_{(p=2,K=2)} & = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)] \\ & \Phi & = [1, x_{2}, {x_{2}}^{2}, x_{1}, x_{1}x_{2}, {x_{1}}^{2} ] \\ \therefore & f(x) & = c_{1} + c_{2}x_{2} + c_{3}{x_{2}}^2 + c_{4}x_{1} + c_{5}x_{1}x_{2} + c_{6}{x_{1}}^{2} \end{eqnarray}
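As a minimal sketch, the redundancy set and its lexicographic ordering can be generated with a filtered Cartesian product over exponents. `exponent_set` is a hypothetical helper for illustration, not part of the resampler's API:

```python
from itertools import product

def exponent_set(orders):
    """Exponent tuples (p_1, ..., p_K) with sum(p) <= max(orders),
    in lexicographic order (the redundancy set s)."""
    ranges = [range(o + 1) for o in orders]
    return [p for p in product(*ranges) if sum(p) <= max(orders)]

print(exponent_set([2, 2]))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
```

`itertools.product` already iterates in lexicographic order over the per-dimension ranges, so no explicit sort is needed.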
Likewise, for 3-dimensional (\(K=3\)) data where \(p_{1}=1\), \(p_{2}=2\), and \(p_{3}=3\):
\begin{eqnarray} s_{(p=[1,2,3], K=3)} = [ &(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 0), \\ &(0, 2, 1), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), (1, 2, 0) \;] \end{eqnarray}\begin{eqnarray} \therefore f(x) = &c_{1} + c_{2}x_{3} + c_{3}{x_{3}}^{2} + c_{4}{x_{3}}^{3} + c_{5}x_{2} + c_{6}x_{2}x_{3} + c_{7}x_{2}{x_{3}}^2 + &c_{8}{x_{2}}^{2} + \\ &c_{9}{x_{2}}^{2}x_{3} + c_{10}x_{1} + c_{11}x_{1}x_{3} + c_{12}x_{1}{x_{3}}^2 + c_{13}x_{1}x_{2} + c_{14}x_{1}x_{2}x_{3} + &c_{15}x_{1}{x_{2}}^2 \end{eqnarray}
Polynomial Resampling¶
When resampling, we wish to evaluate \(f(x)\) at coordinate \(v\). To derive \(f(x)\), consider a set of \(N\) samples contained within an ellipsoid hyper-surface (the window region set \(\Omega\)) centered about \(v\).
The preliminary step is to define \(s(o)\), which for each sample \(i\) allows us to define the set \(\Phi_{i}\). This is then used to build the design matrix \(\Phi\) of \(N\) rows by \(S\) columns. \(c\) may then be found by computing the least-squares solution of
\[\Phi c + \varepsilon = y\]
such that \(y_{i}\) is the value of sample \(i\), with an associated error of \(\varepsilon_{i}\) or,
\[\begin{split}\begin{bmatrix} 1 & \Phi_{(1,2)} & \Phi_{(1,3)} & \cdots & \Phi_{(1,S)} \\ 1 & \Phi_{(2,2)} & \Phi_{(2,3)} & \cdots & \Phi_{(2,S)} \\ 1 & \Phi_{(3,2)} & \Phi_{(3,3)} & \cdots & \Phi_{(3,S)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \Phi_{(N,2)} & \Phi_{(N,3)} & \cdots & \Phi_{(N,S)} \end{bmatrix} \begin{bmatrix} c_{1} \\ c_{2} \\ \vdots \\ c_{S} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \varepsilon_{3} \\ \vdots \\ \varepsilon_{N} \end{bmatrix} = \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{N} \end{bmatrix}\end{split}\]
Where \(\Phi_{(i, m)}\) is the \(m^{th}\) element of set \(\Phi\) for sample \(i\).
Using least-squares, the estimated values for \(c\) are given by:
\[\hat{c} = (\Phi^{T} \Phi)^{-1} \Phi^{T} y\]
If performing a weighted fit, \(\hat{c}\) is given by:
\[\hat{c} = (\Phi^{T}W \Phi)^{-1} \Phi^{T} W y\]
where \(W\) is an \((N, N)\) diagonal weight matrix such that \(W_{i,i}\) contains the assigned weighting for sample \(i\).
Once \(\hat{c}\) is known, we can then evaluate \(f(v)\) as:
\begin{eqnarray} &\Phi_{m}^{v} & = \prod_{k=1}^{K}{v_{k}^{s_{(m, k)}}} \\ &f(v) & = \sum_{m=1}^{S}{\hat{c}_{m} \Phi_{m}^{v}} \end{eqnarray}
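The full pipeline above (build \(\Phi\) from \(s\), solve for \(\hat{c}\) by least-squares, then evaluate \(f(v)\)) can be sketched with NumPy. `design_matrix` is an illustrative helper, and the target function is chosen so the recovered coefficients are exact:

```python
import numpy as np
from itertools import product

def design_matrix(x, exponents):
    """One row per sample; column m is prod_k x_k^{s_{m,k}}."""
    x = np.atleast_2d(np.asarray(x, float))   # (N, K)
    s = np.asarray(exponents)                 # (S, K)
    return np.prod(x[:, None, :] ** s[None, :, :], axis=2)  # (N, S)

# order 1 in each of K=2 dimensions: s = [(0, 0), (0, 1), (1, 0)]
exponents = [p for p in product(range(2), range(2)) if sum(p) <= 1]
rng = np.random.default_rng(0)
x = rng.uniform(size=(20, 2))
y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1]       # exactly representable
Phi = design_matrix(x, exponents)
c_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
f_v = design_matrix([[0.5, 0.5]], exponents) @ c_hat  # evaluate at v = (0.5, 0.5)
```

With the lexicographic ordering \([(0,0), (0,1), (1,0)]\), the columns of \(\Phi\) are \([1, x_{2}, x_{1}]\), so `c_hat` recovers \([1, 3, 2]\).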
Implementation¶
Terminology¶
In the following sections, the samples at coordinates \(x\) will be referred to as samples, and the points at which we wish to derive \(f(v)\) will be referred to as points. For any set \(A\), \(|A|\) represents the number of members within \(A\), which by definition are all unique.
Unit Conversion and the Window Parameter¶
The window parameter \(\omega\) defines the semi-principal axes of a hyper-ellipsoid such that samples \(\Omega_{j}\) are said to be within the window region of point \(j\) if:
\[\sum_{k=1}^{K}{\frac{(x_{k}^{\prime} - v_{k, j}^{\prime})^{2}} {\omega_{k}^{2}}} \leq 1\]
Where \(x^{\prime}\) and \(v^{\prime}\) are coordinates before any unit conversion has been applied. \(\omega\) should be supplied in the same units as \(x^{\prime}\) and \(v^{\prime}\) for each dimension.
Coordinates are then converted into units of \(\omega\) using:
\begin{eqnarray} x &= \{ \frac{x_{1}^{\prime} - min(x_{1}^{\prime})}{\omega_{1}}, \frac{x_{2}^{\prime} - min(x_{2}^{\prime})}{\omega_{2}}, ..., \frac{x_{K}^{\prime} - min(x_{K}^{\prime})}{\omega_{K}} \} \\ v &= \{ \frac{v_{1}^{\prime} - min(x_{1}^{\prime})}{\omega_{1}}, \frac{v_{2}^{\prime} - min(x_{2}^{\prime})}{\omega_{2}}, ..., \frac{v_{K}^{\prime} - min(x_{K}^{\prime})}{\omega_{K}} \} \end{eqnarray}
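This conversion is a per-dimension shift and scale; a sketch (with `to_window_units` as a hypothetical helper):

```python
import numpy as np

def to_window_units(x, v, omega):
    """Shift by the per-dimension sample minimum and scale by omega,
    so each window region becomes a unit ball."""
    x = np.asarray(x, float)
    v = np.asarray(v, float)
    omega = np.asarray(omega, float)
    x0 = x.min(axis=0)
    return (x - x0) / omega, (v - x0) / omega

x_prime = np.array([[0.0, 10.0], [2.0, 14.0], [4.0, 18.0]])
v_prime = np.array([[2.0, 14.0]])
x, v = to_window_units(x_prime, v_prime, omega=[2.0, 4.0])
# x -> [[0, 0], [1, 1], [2, 2]]; v -> [[1, 1]]
```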
We can then define the set of samples within the window region of point \(j\) as
\begin{eqnarray} & \Omega_{j} = & \{ i \; | \; \left\| x_{i} - v_{j} \right\|_2 \leq 1 \; | \; i \in \mathbb{Z^{+}} \leq N \} \\ & N_{j} = & | \Omega_{j} | \end{eqnarray}
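In window units, membership of \(\Omega_{j}\) reduces to a Euclidean distance test against a unit ball; a brute-force sketch (the implementation uses the block and ball-tree machinery described below):

```python
import numpy as np

def window_members(x, vj):
    """Indices of samples inside the unit ball around point v_j (window units)."""
    x = np.atleast_2d(x)
    return np.flatnonzero(np.linalg.norm(x - np.asarray(vj), axis=1) <= 1.0)

omega_j = window_members([[0.0, 0.0], [0.5, 0.5], [2.0, 0.0]], [0.0, 0.0])
# -> indices [0, 1]; the sample at (2, 0) lies outside the unit ball
```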
At this point, \(\Phi\) and \(\Phi^{v}\) are also calculated for all samples and points. This is a fast operation, but it changes the memory requirements from \(O(K)\) to \(O(S)\), so in most cases memory requirements are increased. Generally though, one expects low-order (\(p \leq 3\)) polynomials to be used, such that calculating \(\Phi\) at this stage does not introduce significant memory issues. The alternative would be to calculate \(\Phi\) later, for all samples within the window region of each resampling point, leading to many duplicate calculations when the same sample falls within the window of multiple points.
The Search Problem¶
Finding \(\Omega_{j}\) for all \(M\) points using the standard brute-force method of looping through all \(N\) samples comes at the heavy computational cost of \(O(MN)\).
This problem is overcome in two stages. The first stage breaks up the full extent of sample and point space into blocks (hypercubes in \(K\) dimensions) where the length of each side of a block is 1 following unit conversion.
The search problem may then be constrained by recognizing that for each point within a block, one should only consider membership of samples to that point’s window region from within that same block and neighboring blocks (direct and diagonal). Block membership \(\square\) for sample coordinates \(x\) or point coordinates \(v\) is simply defined by:
\begin{eqnarray} & \square(x) = & \lfloor x \rfloor \\ & \square(v) = & \lfloor v \rfloor \end{eqnarray}
The block itself and all neighboring blocks are referred to as a neighborhood, such that for block \(b\), the neighborhood is defined as:
\[\square_{b}^{hood} = \square_{b} + (-1, 0, 1)^K\]
where \((-1, 0, 1)^K\) indicates all permutations of (-1, 0, 1) over \(K\) dimensions. The algorithm creates two sparse matrices for samples and points where each row \(b\) is the block index, and sample or point indices are stored in the columns. This allows fast access to all samples and points contained within each block. A block will be removed from any further processing if the point block population \(| \square_{b}(v) | = 0\) or the sample neighborhood population \(| \square_{b}^{hood}(x) | = 0\).
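The block and neighborhood bookkeeping can be sketched as follows. `block_members` and `neighborhood` are illustrative helpers (the implementation stores memberships in sparse matrices rather than a dictionary):

```python
import numpy as np
from itertools import product
from collections import defaultdict

def block_members(coords):
    """Map each occupied block (floor of the coordinates) to the indices it holds."""
    blocks = defaultdict(list)
    for i, c in enumerate(np.floor(np.asarray(coords, float)).astype(int)):
        blocks[tuple(c)].append(i)
    return blocks

def neighborhood(block):
    """The block plus all direct and diagonal neighbors: block + (-1, 0, 1)^K."""
    return [tuple(np.add(block, d)) for d in product((-1, 0, 1), repeat=len(block))]

x = [[0.2, 0.8], [1.1, 0.5], [3.9, 3.9]]
sample_blocks = block_members(x)
hood = neighborhood((0, 0))
# samples 0 and 1 fall in blocks (0, 0) and (1, 0); both blocks lie in hood
```

Because blocks have unit side length after unit conversion, a point's unit-radius window can only contain samples from its own block or one of the \(3^{K} - 1\) neighbors.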
Once all valid blocks have been identified, the algorithm may proceed to process these blocks in parallel. Having narrowed the search for each point in a block down to a local neighborhood, Euclidean distances from each point within the block to every sample within that neighborhood must be derived in order to find \(\Omega\). There are multiple ways this could be accomplished, but for this implementation, a \(K\)-dimensional ball tree is used to directly derive all \(\Omega_{j \in \square_{b}(v)}\) efficiently.
Block Processing¶
For a single block \(b\), a value for \(f(v_{j})\) is derived in series for each \(j \in \square_{b}(v)\). However, in order to calculate \(f(v_{j})\) we must first ensure that \(\hat{c}_{j}\) is solvable via least-squares based on the sample distribution of \(\Omega_{j}\). Let us define the vector \(x(j)\) such that
\[x(j) = x_{i} \; | \; i \in \Omega_{j}\]
and \(x(j)_{k}\) are the coordinates of vector \(x(j)\) along the \(k^{th}\) dimension.
Sample Distribution¶
There are three methods available to check whether an attempt to solve \(\hat{c}_{j}\) should be made, based on sample distribution. The most basic method (“counts”) is to require that
\[| \Omega_{j} | > \prod_{k=1}^{K}{(o_{k} + 1)}\]
That is, the number of samples within the window region is greater than that required to derive a solution assuming zero redundancy (\(\lambda_{(p_{1},p_{2},...,p_{K})} \equiv 1\)). This can be dangerous, as we place a limit only on the number of samples, not on how they are distributed. For example, if the samples are collinear in any dimension, a solution is not possible. Therefore, while fast, “counts” should only be used when it is assured that samples are uniformly distributed (e.g. an image) and \(\omega\) is sufficiently large.
The second method, “extrapolate” is robust, but does not guarantee that \(v_{j}\) is bounded by \(x(j)\). As such, it is possible that any solution may deviate significantly, especially for higher order polynomials. The “extrapolate” method requirement is
\[| \{ x(j)_{k} \} | > o_{k} + 1, \; \forall k = [1, 2, ..., K]\]
or that in each dimension, there must be enough uniquely valued coordinates \(x\) to solve for \(\hat{c}\) given zero redundancy. There may be circumstances in which the user wishes to attain extrapolated values of a polynomial fit. For example, when deriving \(f(v)\) near the edge of an image.
Finally, “edges” (default) is the most robust of the three methods, similar to “extrapolate” with the additional requirement that \(v_{j}\) is bounded by \(x(j)\):
\[min(| \{ x(j)_{k} < v_{k} \} |, | \{ x(j)_{k} > v_{k} \} |) > o_{k} + 1, \; \forall k = [1, 2, ..., K]\]
or that in each dimension there are more than \(o + 1\) uniquely valued sample coordinates to the “left” and “right” of \(v\).
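The three distribution checks can be sketched directly from the formulas above. These helper names are hypothetical, and the checks follow the inequalities as written in this section:

```python
import numpy as np

def counts_check(xj, order):
    """'counts': more samples than needed at zero redundancy."""
    return len(xj) > int(np.prod([o + 1 for o in order]))

def extrapolate_check(xj, order):
    """'extrapolate': more than o_k + 1 uniquely valued coordinates per dimension."""
    xj = np.atleast_2d(xj)
    return all(len(np.unique(xj[:, k])) > order[k] + 1
               for k in range(xj.shape[1]))

def edges_check(xj, v, order):
    """'edges': the 'extrapolate' count, applied on each side of v."""
    xj = np.atleast_2d(xj)
    return all(min(len(np.unique(xj[xj[:, k] < v[k], k])),
                   len(np.unique(xj[xj[:, k] > v[k], k]))) > order[k] + 1
               for k in range(xj.shape[1]))

# order 0 in one dimension: "edges" needs > 1 unique value on each side of v
xj = np.array([[0.0], [0.5], [1.5], [2.0]])
```

For example, with `v = [1.0]` the samples straddle the point two-deep on each side, so `edges_check` passes; with `v = [1.9]` only one unique value lies to the right, so it fails.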
If a sample distribution fails the above check, no value will be derived for \(f(v_{j})\). However, in certain cases when the polynomial order is symmetrical across all dimensions (\(|\{p\}| = 1\)), the user may allow \(p\) to decrease until the condition is met. While this is theoretically possible for asymmetric polynomial orders, doing so would require recalculating \(\Phi\) with potentially significant overhead.
Edge Clipping¶
An additional check may be performed on the sample distribution \(x(j)\) based on how close \(v_{j}\) is to the “edge” of \(x(j)\). This edge is defined by the parameter \(\epsilon_{k} \; | \; k = [1, 2, ..., K]\) where \(0 < \epsilon_{k} < 1\) and one of the three definitions of an edge.
The “range” definition of the edge requires that point \(j\) satisfies
\[\exists (x(j)_{k} - v_{j, k} \leq -\epsilon_{k}) \land \exists (x(j)_{k} - v_{j, k} \geq \epsilon_{k}) \; \forall k, k = [1, 2, ..., K]\]
or that there is at least one sample to both the “left” and “right” of a resampling point in each dimension at a distance of at least \(\epsilon\).
The “com_feature” edge clipping mode requires that
\[\frac{1}{N_{j}} \sum_{i=1}^{N_{j}}{ \frac{| x(j)_{(i,k)} - v_{(j,k)} |}{1 - \epsilon_{k}}} \leq 1 \; \forall{k}, k = [1, 2, ..., K]\]
Finally, the “com_distance” edge clipping mode (default) requires that
\[\frac{1}{N_{j}} \sum_{i=1}^{N_{j}} { \left[ \sum_{k=1}^{K} { \left(\frac{ x(j)_{(i,k)} - v_{(j,k)} } { 1 - \epsilon_{k} }\right)^{2} } \right]^{0.5} } \leq 1\]
In all cases, as \(\epsilon \to 1\), edge clipping effects will become increasingly severe.
Weighting¶
The solution to \(f(v)\) may be derived by placing optional weights on each sample, based on the associated measurement error in \(y\) and/or the proximity of \(x\) to \(v\). The weight matrix \(W\) is a diagonal \((N \times N)\) matrix in which we define the weight for sample \(i\) as \(w_{i} = W_{(i,i)}\) and
\[w_{i} = w_{i}^{\varepsilon} w_{i}^{\delta}\]
For each point \(j\), the vector of weights is
\[w(j) = w(j)^{\varepsilon} w(j)^{\delta}\]
where \(w^{\delta}\) is the distance weighting and \(w^{\varepsilon}\) is the error weighting. If error weighting is disabled then \(w^{\varepsilon} = 1\), and if distance weighting is disabled, \(w^{\delta} = 1\).
Error Weighting¶
For error weighting to be applied, \(\varepsilon\) must be supplied by the user.
\begin{eqnarray} w_{i}^{\varepsilon} & = & \frac{1}{{\varepsilon_{i}}^{2}} \\ w^{\varepsilon}(j) & = & w_{i}^{\varepsilon} \; | \; i \in \Omega_{j} \end{eqnarray}
Distance Weighting¶
Distance weights use a Gaussian function of the Euclidean distance of samples from a point. The user may either supply a smoothing parameter \(\alpha\), or the Gaussian sigma in the original coordinate units \(\alpha^{\prime}\).
\[\alpha_{k} = \frac{2 {\alpha_{k}^{\prime}}^{2}}{{\omega_k}^{2}}, k = [1, 2, ..., K]\]
The vector of distance weights applied to point \(j\) is
\[w^{\delta}(j) = exp \left( -\sum_{k=1}^{K}{ \frac{(x(j)_{k} - v_{(j,k)})^{2}}{\alpha_{k}} } \right)\]
In the initial units (marked by \(\prime\)), this is equivalent to
\[w^{\delta}(j) = exp \left( -\sum_{k=1}^{K}{ \frac{(x^{\prime}(j)_{k} - v^{\prime}_{(j,k)})^{2}} {2 {\alpha_{k}^{\prime}}^{2}} } \right)\]
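A sketch of the distance weighting in window units, with `distance_weights` as a hypothetical helper:

```python
import numpy as np

def distance_weights(xj, v, alpha):
    """Gaussian distance weights: exp(-sum_k (x_k - v_k)^2 / alpha_k)."""
    xj = np.atleast_2d(xj)
    return np.exp(-np.sum((xj - np.asarray(v)) ** 2 / np.asarray(alpha), axis=1))

w = distance_weights([[0.0, 0.0], [1.0, 1.0]], [0.0, 0.0], alpha=[2.0, 2.0])
# w = [1.0, exp(-1)]: the sample at the point gets full weight
```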
Deriving Point Solutions¶
At point \(j\), define
\begin{eqnarray} X & = & [\Phi_{i} \; \forall \; i \in \Omega_{j}] \\ W & = & \text{ diagonal matrix } \; | \; diag(W) = w(j) \\ Y & = & [y_{i} \; \forall \; i \in \Omega_{j}] \\ Z & = & \text{ diagonal matrix } \; | \; diag(Z) = \varepsilon(j) \end{eqnarray}
The estimated polynomial coefficients \(\hat{c}\) are then solved for by finding the least-squares solution of
\begin{eqnarray} & (X^{T} W X) \hat{c_{j}} = X^{T} W Y \\ & \hat{c_{j}} = (X^{T} W X)^{-1} X^{T} W Y \end{eqnarray}
\(f(v_{j})\) may then be evaluated as
\[f(v_{j}) = \sum_{m=1}^{S}{\hat{c}_{m} \Phi_{j, m}^{v}}\]
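A minimal numerical sketch of the point solution, using illustrative arrays for a single point (the implementation operates per block, and the sample values here are noiseless so the coefficients are recovered exactly):

```python
import numpy as np

# one point: X is (N_j, S), w the combined weights, Y the sample values
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(10), rng.uniform(size=10)])  # f = c1 + c2 * x
Y = 2.0 + 3.0 * X[:, 1]
w = rng.uniform(0.5, 1.5, size=10)
W = np.diag(w)
c_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)   # weighted normal equations
phi_v = np.array([1.0, 0.5])                        # Phi^v evaluated at v = 0.5
f_v = phi_v @ c_hat
```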
Error Estimates¶
If errors are to be propagated through the system, the estimated variance-covariance matrix \(C\) is
\[C = (X^{T} W X)^{-1} X^{T}W Z^{T} Z W^{T} X (X^{T} W X)^{-1}\]
If no errors are provided, but an estimate of the error in the fit is required, the variance-covariance matrix is derived from the residuals (\(r\)) of the fit.
\begin{eqnarray} r & = & Y - f(v_{j}) \\ C & = & (X^{T} W X)^{-1} X^{T} W r^{T} r W^{T} X (X^{T} W X)^{-1} \end{eqnarray}
The error \(\sigma_{j}\) is then given by
\[\sigma_{j}^{2} = \frac{\sum_{s1=1}^{S} \sum_{s2=1}^{S} \Phi_{j,s1}^{v} C_{s1, s2} \Phi_{j,s2}^{v}} {N_{j} - rank(X^{T} W X)}\]
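A sketch of the residual-based error estimate, reading \(r^{T} r\) as the scalar sum of squared residuals; the arrays here are illustrative and unweighted (\(W = I\)):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(12), rng.uniform(size=12)])
Y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=12)
W = np.eye(12)
XtWX_inv = np.linalg.inv(X.T @ W @ X)
c_hat = XtWX_inv @ X.T @ W @ Y
r = Y - X @ c_hat                                   # residuals of the fit
C = (r @ r) * XtWX_inv @ (X.T @ W @ W.T @ X) @ XtWX_inv
phi_v = np.array([1.0, 0.5])                        # Phi^v at the point
rank = np.linalg.matrix_rank(X.T @ W @ X)
sigma2 = (phi_v @ C @ phi_v) / (12 - rank)          # estimated variance at v
```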
Symmetric Order Zero¶
If \(o_{k} = 0 \; \forall k\) then the weighted mean is used instead of a polynomial fit. In this case
\begin{eqnarray} f(v_{j}) & = & \frac{tr(W Y)}{tr(W)} \\ \sigma_{j}^{2} & = & \frac{tr(W Z Z^{T} W^{T})}{tr(W^{T} W)} \end{eqnarray}
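With diagonal \(W\) and \(Z\), the traces reduce to simple weighted sums; a small numerical sketch:

```python
import numpy as np

# symmetric order zero: weighted mean and its error (diagonal W and Z)
w = np.array([1.0, 2.0, 1.0])          # weights, diag(W)
y = np.array([4.0, 5.0, 6.0])          # sample values
z = np.array([0.5, 0.5, 0.5])          # 1-sigma errors, diag(Z)
f_v = np.sum(w * y) / np.sum(w)                 # tr(WY) / tr(W) -> 5.0
sigma2 = np.sum(w**2 * z**2) / np.sum(w**2)     # tr(WZZ'W') / tr(W'W) -> 0.25
```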
Goodness of Fit¶
If needed, a measure of how well the fit matched the data is given by the reduced \(\chi^2\).
\begin{eqnarray} R & = & Z^{-1} (Y - f(v_{j})) \\ \chi_{r}^{2} & = & \frac{tr(R^{T} W R)}{tr(W)} \frac{N_{j}}{N_{j} - rank(X^{T} W X)} \end{eqnarray}
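For diagonal \(W\) and \(Z\), the reduced \(\chi^2\) also collapses to weighted sums; a sketch for a single point with illustrative data and a constant (order zero) fit:

```python
import numpy as np

# reduced chi-squared for a point with N_j = 20 samples and a constant fit
rng = np.random.default_rng(3)
y = rng.normal(size=20)                 # sample values Y
fit = np.full(20, y.mean())             # f(v_j) broadcast over the samples
err = np.ones(20)                       # diag(Z), 1-sigma errors
w = np.ones(20)                         # diag(W)
R = (y - fit) / err                     # Z^{-1} (Y - f(v_j))
rank = 1                                # rank(X^T W X) for an order-zero fit
chi2_r = (np.sum(w * R**2) / np.sum(w)) * (20 / (20 - rank))
```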