fitpeaks1d() is designed to fit a signal, or superimposed signals, along with a background to 1-dimensional data. It is essentially a wrapper for the astropy.modeling module allowing for customization, and ease of use for the most common case uses (developed using astronomical data). The algorithm is divided into three phases: the search phase, an initial baseline estimate, and the refined fitting phase:

Search Phase¶

The purpose of the search phase is to get a good approximation of peak locations in the data along the independent axis. These may be supplied by the user (e.g. known apertures), or an estimate may be made. Estimates are made using the following process:

1. Processing function (Optional)¶

Apply a processing function to \((x, y)\) data to return a approximate baseline and new dependent values (\(y^{\prime}\)) for the subsequent step to operate on. The default processing function (medabs_baseline()) subtracts the median y value and then returns absolute values:

\[y^{\prime} = | y - median(y) |\]

(Source code, png, hires.png, pdf)

The default medabs_baseline() function is just designed to be a general usage quick function to handle positive and negative peaked signals. This may be not be suitable for some data sets in which case the user should write their own function that takes in (x, y) and returns (\(y^{\prime}\), baseline) where all inputs and outputs are 1-D numpy arrays of the same size.

2. Guess function (Optional)¶

The next step passes \((x, y^{\prime}\) to the guess function which should be designed to return a quick estimate of the x and y value for the most prominent peak \((x_{peak}, y^{\prime}_{peak}) `in :math:`y^{\prime}\) (not \(y\)). The default guess function (guess_xy_mad()) is designed to work with the default processing function (medabs_baseline()). Therefore, if changing one, be considerate of the other. medabs_baseline() returns the following:

\begin{eqnarray} y_{peak} & = & max(|y^{\prime} - median(y^{\prime}) | ) \\ x_{peak} & = & x \, | \, y^{\prime}(x) = y_{peak} \end{eqnarray}

(Source code, png, hires.png, pdf)

3. Rough Initial Parameter Estimation and the Box Function¶

A fit is then performed at either the most prominent peak identified in steps 1 and 2, or the first user supplied guess position. The parameters determined here will be used as initialization parameters for the final fitting and also (if required), to search for other peaks in the data. When fitting peaks that are close together, or the data has structure in the baseline, it can be important to use a box function to limit the range of the fitting along \(x\). Initial fits to the most prominent peak are shown below with and without the boxing function.

(Source code, png, hires.png, pdf)

There are a couple of things to note above. Firstly, the fit on the right (without a box function) has fit both peaks present in the data with a single Gaussian, which is obviously incorrect. Secondly, the fit on the left did not fit the peak very well, which is why a second more refined fit occurs at a later stage. Clipping of the fit due to the box function can also be seen in the left plot. Finally, while the initial fit on the right failed, there is still a good chance that the final fit will still fit the data correctly since this is a fairly simple model.

The default box function is a astropy.modeling.models.Box1D with a width of 6 times the width parameter of the peak function (fwhm in the case of the default astropy.modeling.models.Gaussian1D). However, since it will not always be the case that we wish to fit Gaussians, or that the best way to separate one peak from another is a standard box (hat) function, the box function, the peak width parameter and it’s relationship to the width of the box function can be configured. The box model must be a astropy.modeling.Fittable1DModel and is passed into fitpeaks1d() with the box_class parameter.

The box_width parameter can either be a single float or int value, in which case the width if the box will be held fixed (in units of \(x\)). Alternatively, a tuple of the form (peak_parameter_name, width) can be supplied indicating that the width of the box should always a width multiple of certain parameter of the peak model. More complex relationships can also be defined through the search_kwargs parameter which will be described later.

4. Identify remaining peaks¶

Once the first or most prominent peak has undergone initial parameterization, it’s fit will be subtracted from \(y^{\prime}\) and steps 1-3 will be repeated do identify up to npeaks (default=1) peaks. If more than one peaks are to be identified, this MUST be specified by the user. If automatically detecting peaks, peaks will be fit in the order of most prominent to least as determined from the initial search phase. Otherwise, if guess positions were supplied by the user, peaks will be fit in the supplied order.

(Source code, png, hires.png, pdf)

Initial Baseline Fitting¶

Once estimates of the peak parameters have been determined from the search phase, they are used to initialize a final model comprising of npeaks peaks and an optional baseline. By default, a constant offset value will be used to model the baseline (astropy.modeling.models.Const1D) although any astropy.modeling.Fittable1DModel may be used if passed into via the background_class keyword argument.

Before final fitting, the user may process the data one final time in a similar way to baseline_func using the optional_func keyword argument to pass in a function of the form \(y^{\prime} = f(x, y)\). There are not many instances when one may want to do this, except for possibly filtering or smoothing the initial data.

Next, if the user has requested a baseline fit, a fit may be performed to initialize the background component of the final model. This baseline fit will be performed on the residual between the initial peak fit and the original data. Alternatively, the user may supply expected background parameters. For example, offset for a constant background or slope and offset for a linear function. These may be supplied through the binit keyword argument.

(Source code, png, hires.png, pdf)

The above plot shows an initial baseline fit on the residual of the data to the initial peak fits on the left, and the initialized model generated from the peak and baseline approximation on the right. As can be seen, the model does show some discrepancy from the actual data, but provides a good starting point for a final fit to be performed.

Final Refined Fit¶

Finally, a new model, initialized by the previous peak and baseline fits is used as a starting point to fit the full model and baseline in it’s entirety. This time, no boxing function is applied as we will have hopefully untangled any compound peaks or confusing baseline structure in the previous steps. The code below runs all steps using the fitpeaks1d() function.

import numpy as np
import matplotlib.pyplot as plt
from astropy.modeling import models, fitting
from sofia_redux.toolkit.fitting.fitpeaks1d import fitpeaks1d

# Create some fake data with noise
x = np.linspace(0, 10, 1001)
model1 = models.Gaussian1D(amplitude=3, mean=5, stddev=0.5)
model2 = models.Gaussian1D(amplitude=4, mean=4, stddev=0.2)
y = model1(x) + model2(x) + 0.05 * x + 100
rand = np.random.RandomState(42)
y += rand.normal(0, 0.01, y.size)

model = fitpeaks1d(x, y, npeaks=2, background_class=models.Linear1D,
                   box_width=('stddev', 3),
                   search_kwargs={'stddev_0': 0.1})

fit = model(x)
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
ax[0].plot(x, y, label='Data', color='blue')
ax[0].plot(x, model(x), label='Model', color='red')
ax[0].legend()
ax[0].set_xlabel('x')
ax[0].set_ylabel('y')
ax[0].set_title("Data and Fit")
ax[1].plot(x, y - fit)
ax[1].set_xlabel('x')
ax[1].set_ylabel('y')
ax[1].set_title("Residual")

(Source code, png, hires.png, pdf)

Note that in the above example, search_kwargs={'stddev_0': 0.1} is used to set an initial value for the fwhm of each peak to 0.1 as if not specified, the default value for the astropy model will be used instead. In this case, for a astropy.modeling.models.Gaussian1D, the default value is 1 which is much too large for our expected peaks. Therefore, during the iterative fitting process, as fwhm is decreased, a local minimum in the cost function will be encountered by fitting both peaks with a single Gaussian, causing the iteration to stop at a result similar to the plot showing a fit without a box function. Also note that when using a boxing function during the search phase, peak name parameters are distinguished from box name parameters by the ‘_0’ suffix (stddev_0 in this case). Likewise, box name parameters are suffixed by ‘_1’.

Model Functionality¶

Once a model has been fit to the data, there are a few features that should hopefully be useful to the user:

Fit Information¶

The print() function can be used to display what the components of the model are, how they are related, and give parameter values. For example, using print on the above model generates the following:

>> print(model)
Model: CompoundModel
Name: 2_peaks_with_background
Inputs: ('x',)
Outputs: ('y',)
Model set size: 1
Expression: [0] + [1] + [2]
Components:
    [0]: <Gaussian1D(amplitude=2.99771258, mean=5.0000777, stddev=0.49993379)>

    [1]: <Gaussian1D(amplitude=4.00083062, mean=4.00004859, stddev=0.20010122)>

    [2]: <Linear1D(slope=0.05014067, intercept=99.99969687)>
Parameters:
       amplitude_0          mean_0      ...       slope_2         intercept_2
    ------------------ ---------------- ... ------------------- ----------------
    2.9977125792665062 5.00007770385712 ... 0.05014066710980406 99.9996968671665

This tells us that there are 3 components to our final model consisting of 2 peaks and one background. If a baseline was fit to the data, it will always exist as the last component of the model. Each component may be accessed through standard python indexing., i.e.:

peak1 = model[0]
peak2 = model[1]
baseline = model[2]

Again, each component can also display information to the user through print():

>> print(peak1)
Model: Gaussian1D
Inputs: ('x',)
Outputs: ('y',)
Model set size: 1
Parameters:
        amplitude            mean             stddev
    ------------------ ---------------- ------------------
    2.9977125792665062 5.00007770385712 0.4999337879214602

Parameter values may be extracted programmatically using the parameters attribute of the full model or any component model. Names of the parameters are stored in the param_names attribute. Note that parameter names are suffixed with “_n” if accessed from the full model where n indicates the component index. For example, “mean_1” would contain the mean \(x\) location of the 2nd Gaussian peak in the above examples. Component parameter names have no such suffix. Parameters may also be accessed through model attributes:

print(model.param_names[:3])
print(model[0].param_names)
assert np.allclose(model.parameters[:3], model[0].parameters)
assert np.equal(model.amplitude_0, model[0].amplitude)

Output:

('amplitude_0', 'mean_0', 'stddev_0')
('amplitude', 'mean', 'stddev')

Evaluating Models¶

The full and component models may be evaluated at user supplied dependent variable locations. Standard arithmetic operations may also be performed on models (or components) to create new manipulations:

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
ax[0].plot(x, y - model[2](x))
ax[0].set_title('Baseline Subtracted Data')
ax[1].plot(x, y, label='Data', color='blue')
ax[1].plot(x, (model[0] + model[2])(x), '--',
           label='baseline + 2nd peak', color='red')
ax[1].set_title("Adding Model Components")
ax[1].legend()

(Source code, png, hires.png, pdf)

Further Examples¶

Large Baseline Structure (baseline_func usage and fixing known parameters)¶

import numpy as np
import matplotlib.pyplot as plt
from astropy.modeling import models
from sofia_redux.toolkit.fitting.fitpeaks1d import fitpeaks1d
from astropy.modeling.polynomial import Polynomial1D

x = np.linspace(0, 10, 1000)
baseline = 10 - (x - 5) ** 2
positive_source = models.Voigt1D(x_0=4, amplitude_L=5, fwhm_G=0.1)
negative_source = models.Voigt1D(x_0=7, amplitude_L=-5, fwhm_G=0.2)
rand = np.random.RandomState(41)
noise = rand.normal(loc=0, scale=1.5, size=x.size)
y = baseline + positive_source(x) + negative_source(x) + noise

def baseline_func(x, y):
    baseline = np.poly1d(np.polyfit(x, y, 2))(x)
    return np.abs(y - baseline), baseline

model = fitpeaks1d(x, y, npeaks=2,
                   peak_class=models.Voigt1D,
                   box_width=(None, 3),
                   background_class=Polynomial1D(2),
                   baseline_func=baseline_func)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
ax[0].plot(x, y, '.', label='Data', color='blue', markersize=2)
ax[0].plot(x, model(x), label='Fit', color='red')
ax[0].plot(x, model[2](x), '--', label='Baseline', color='green',
           linewidth=3)
ax[0].legend(loc='lower center')
ax[0].set_title("Fit to data with large baseline structure")
ax[0].set_xlabel('x')
ax[0].set_ylabel('y')
ax[1].plot(x, (model[0] + model[1])(x), label='Fitted peaks')
ax[1].plot(x, positive_source(x) + negative_source(x), label='True peaks')
ax[1].legend(loc='lower left')
ax[1].set_title("Identified peaks")
ax[1].set_xlabel('x')
ax[1].set_ylabel('y')
print(model)

(Source code, png, hires.png, pdf)

Output:

Model: CompoundModel
Name: 2_peaks_with_background
Inputs: ('x',)
Outputs: ('y',)
Model set size: 1
Expression: [0] + [1] + [2]
Components:
    [0]: <Voigt1D(x_0=7.00235917, amplitude_L=-10.44529587, fwhm_L=0.244132,
                  fwhm_G=0.57882632)>

    [1]: <Voigt1D(x_0=3.95826805, amplitude_L=5.05682365, fwhm_L=0.63766468,
                  fwhm_G=0.00500501)>

    [2]: <Polynomial1D(2, c0=-15.1200609, c1=10.00847117, c2=-1.00159934)>
Parameters:
          x_0_0          amplitude_L_0    ...        c1_2                c2_2
    ----------------- ------------------- ... ------------------ -------------------
    7.002359167002396 -10.445295870401356 ... 10.008471173888237 -1.0015993391982998

In the above example, a synthetic data set was generated where the structure of the baseline would confuse a naive fit. fitpeaks1d() is used to overcome this by defining baseline_func for use during the search phase. This user-defined function subtracts a 2nd order polynomial fit to the data as an estimate of the baseline, and also returns the absolute values of the baseline subtracted data in order to search for peak locations. The absolute value is taken since we are interested in negative peaks as well.

Some other points to note are:

1. box_width=(None, 3) links the width of the box function to the autodetected (None) primary width parameter of the peak function so that the box width is always 3 times the peak width.

2. A second order Polynomial1D has been used to model the background. It is perfectly acceptable to use 1-dimensional classes from the astropy.modeling.polynomial module. However, unlike classes from astropy.modeling.models, they require positional arguments during initialization. This can be done by passing in a initialized object, as was done above, or through the bg_args keyword argument to fitpeaks1d().

3. Examining the output fit parameters above, we see the background fit was pretty close to the actual synthetic baseline (\(-15 + 10x - x^2\)), but the peak width parameters (fwhm_L, fwhm_G) do not match the actual peak parameters (0.634, 0.1). However, the overall fit is not bad considering the noise and these two parameters are fit in parallel.

If some parameters are known and fixed, this information may be passed into the fit through keyword arguments. For example:

kwargs = {'fixed': {'fwhm_G_0': True, 'fwhm_G_1': True},
          'fwhm_G_0': 0.1, 'fwhm_G_1': 0.1}
model = fitpeaks1d(x, y, npeaks=2,
                   peak_class=models.Voigt1D,
                   box_width=(None, 3),
                   background_class=Polynomial1D(2),
                   baseline_func=baseline_func,
                   **kwargs)
print(model[:2])

Output:

Model: CompoundModel
Inputs: ('x',)
Outputs: ('y',)
Model set size: 1
Expression: [0] + [1]
Components:
    [0]: <Voigt1D(x_0=7.00295202, amplitude_L=-5.03354134, fwhm_L=0.65804315, fwhm_G=0.1)>

    [1]: <Voigt1D(x_0=3.95908859, amplitude_L=5.0493435, fwhm_L=0.5885241, fwhm_G=0.1)>
Parameters:
          x_0_0         amplitude_L_0         fwhm_L_0      ...      fwhm_L_1      fwhm_G_1
    ----------------- ------------------ ------------------ ... ------------------ --------
    7.002952020756621 -5.033541341853574 0.6580431499448574 ... 0.5885241025687908      0.1

This time, fwhm_L_0 and fwhm_L_1 are closer to the true value of 0.63661977.

Default Behaviour (Over-fitting with composite peaks)¶

import numpy as np
import matplotlib.pyplot as plt
import imageio
from sofia_redux.toolkit.fitting.fitpeaks1d import fitpeaks1d

image = imageio.imread('imageio:hubble_deep_field.png')
y = image[400].sum(axis=1).astype(float)
x = np.arange(y.size)
model = fitpeaks1d(x, y, npeaks=10)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
ax[0].plot(x, y)
background = model[-1].amplitude

for i in range(10):
    px, py = model[i].mean.value, model[i].amplitude.value + background
    ax[0].plot(px, py, 'x',
             markersize=10, color='red')
    ax[0].annotate(str(i + 1), (px - 40, py))

ax[0].legend(['Data', 'Fitted peak'], loc='upper right')
ax[0].set_title("Default Settings and Identification Order")
ax[0].set_xlabel('x')
ax[0].set_ylabel('y')

ax[1].plot(x, y, label='Data', color='blue')
ax[1].plot(x, model[0](x) + background, '-.', label='Gaussian Fit',
           color='green')
ax[1].plot(x, model(x), '--', label='Composite Fit',
           color='red')
ax[1].set_xlim(90, 160)
ax[1].legend(loc='upper right')
ax[1].set_title("Peak 1: Simple and Composite Fit")
ax[1].set_xlabel('x')
ax[1].set_ylabel('y')

(Source code, png, hires.png, pdf)

In this example, we use the default behaviour of fitpeaks1d() to identify the 10 most prominent peaks (positive and negative), in order of amplitude, from a set of data containing many possible sources. By default, fitpeaks1d() will fit Gaussian sources and with a constant background offset.

For this particular data set, attempting to identify this many peaks resulted in 4 false detections (points 7-10). This is because we are using Gaussians to model sources that do not fit such a profile. Therefore, the model has fit the most prominent source as a composite of 5 Gaussian sources as shown in the right-most plot.

Navigation

Search Phase¶

1. Processing function (Optional)¶

2. Guess function (Optional)¶

3. Rough Initial Parameter Estimation and the Box Function¶

4. Identify remaining peaks¶

Initial Baseline Fitting¶

Final Refined Fit¶

Model Functionality¶

Fit Information¶

Evaluating Models¶

Further Examples¶

Large Baseline Structure (baseline_func usage and fixing known parameters)¶

Default Behaviour (Over-fitting with composite peaks)¶