Processing Data#

Tip

All of the presets mentioned below can be set during instantiation:

magdata = MagentroData(..., presets={...})

from magentropy import MagentroData

magdata = MagentroData('magdata.dat')
"[Data]" tag detected, assuming QD .dat file.
The sample mass was determined from the QD .dat file: 0.1
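
For instance, a concrete call might supply a few of the presets described below at construction; the preset values here are purely illustrative:

# illustrative only: supply presets up front instead of calling set_presets() later
magdata = MagentroData('magdata.dat', presets={'npoints': 1000, 'd_order': 2})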

Grouping#

Before smoothing, the magnetization data must be grouped by field. Normally, the measured fields are not exact, so groups must be inferred. The method test_grouping() can be used to test grouping presets prior to fully processing the data.

grouping_presets, grouped_by = magdata.test_grouping()

With no arguments passed, the defaults in presets are used. The method returns a dictionary of the grouping presets used to perform the grouping and a pandas DataFrameGroupBy object for inspecting the results. In particular, the DataFrameGroupBy.count() method is useful.

grouping_presets
{'fields': array([], dtype=float64), 'decimals': 5, 'max_diff': inf}
grouped_by['T'].count()
20.0     100
40.0     100
60.0     100
80.0     100
100.0    100
Name: T, dtype: int64

Above, we see that the default grouping presets are an empty array of fields, rounding to 5 decimal places, and an infinite max_diff. Detailed information on each of these can be found in the process_data() documentation.

In this instance, the presets direct the grouping method to group the fields simply by rounding to the 5th decimal place, which accurately determines the field groups, as shown by the count() method. There are five fields, each with 100 temperature measurements. In most cases, grouping by rounding should be sufficient.
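
If rounding alone were not sufficient, the grouping presets could be adjusted and re-tested. The sketch below assumes test_grouping() accepts the same keyword overrides as process_data(); the values are illustrative:

# assumption: test_grouping() takes grouping presets as keyword overrides
grouping_presets, grouped_by = magdata.test_grouping(decimals=0, max_diff=1.0)
grouped_by['T'].count()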

Smoothing#

Reference

J. J. Stickel, Comput. Chem. Eng. 34, 467 (2010)

There are a number of options that control the smoothing. The defaults have been chosen sensibly but are easily changed. All parameters, including the grouping parameters, can either be set as new defaults using presets or set_presets(), or applied to a single run by passing them as arguments to process_data(). See the documentation for a complete description of all parameters; they are summarized below. set_presets() is demonstrated in each case with the default values purely as an example; it is not necessary to set presets if the defaults are to be used.
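
For example, either of the following would change npoints (described next); the second form affects only that one run. The values are illustrative:

# set a new default that persists for future process_data() runs
magdata.set_presets(npoints=1000)

# or override for a single process_data() call only
magdata.process_data(npoints=500)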

Output temperatures#

The smoothed magnetic moment will be evaluated at npoints evenly spaced temperatures in the range temp_range. npoints expects an integer, and temp_range expects an array_like of length 2. The default range [-numpy.inf, numpy.inf] adjusts automatically to the maximum range in the data. Additionally, only those fields with at least min_sweep_len measured temperatures in their respective temperature sweeps will be processed. The default is 10.

from numpy import inf

magdata.set_presets(npoints=1000, temp_range=[-inf, inf], min_sweep_len=10)

Regularization#

The two most important options for the regularization (smoothing) itself are the derivative order d_order and the regularization parameter \(\lambda\) for each field, lmbds.

The derivative of order d_order of the magnetic moment with respect to temperature is used to quantify the “roughness” of the fitted curves. Generally, 2 or 3 work well. The default is 2.

The regularization parameter determines the emphasis given to the roughness regularization penalty. A higher \(\lambda\) results in a smoother curve, and a \(\lambda\) of zero results in interpolation. A \(\lambda\) can be specified for each field (in increasing field order) as an array_like of the same length as the number of fields. Any field with a corresponding \(\lambda\) of numpy.nan will have an “optimal” \(\lambda\) determined automatically; see below. The default lmbds is an array with a single numpy.nan, which indicates that an optimal \(\lambda\) should be found for each field. The same behavior occurs if an empty list is given.

magdata.set_presets(d_order=2, lmbds=[])
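
For example, with five fields, \(\lambda\) could be fixed for the two lowest fields and optimized automatically for the rest (the fixed values here are illustrative):

from numpy import nan

# fix lambda for the two lowest fields; nan requests automatic optimization
magdata.set_presets(lmbds=[1e-4, 5e-5, nan, nan, nan])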

Optimal regularization parameters#

Numerical optimization is used to determine the optimal regularization parameter for each field for which a \(\lambda\) is not provided. Three metrics are available to quantify the meaning of “optimal”:

  1. Generalized cross validation (GCV). The GCV variance is minimized. Set match_err to False (default).

  2. Error matching. The standard deviation of the absolute differences between the measured and smoothed magnetic moment points is matched to a value. The squared difference between the standard deviation and this value is minimized. Set match_err to a single value to match this value for all fields, an array_like of the same length as the number of fields to match a different value for each field (in order of increasing field), or one of 'min', 'mean', or 'max' to use the minimum, mean, or maximum value of the error column for each field as the value.

  3. Per-point error matching (experimental). The absolute differences between the measured and smoothed magnetic moment points are computed, and the sum of squared differences between these and the corresponding values in the error column is minimized. Set match_err to True.

Each of these requires an initial guess, given by lmbd_guess. Currently, a single guess to use for all fields is supported. For control over the minimization, keyword arguments to pass to scipy.optimize.minimize() can be given as a dictionary to min_kwargs. Keep in mind that any values passed to min_kwargs should be with respect to \(\log_{10} \lambda\), since this is the value that is minimized internally. (However, lmbd_guess is the guess for \(\lambda\) itself; no \(\log_{10}\).) Lastly, weight_err specifies whether to weight measurements by the normalized inverse squares of the errors. The default is True.

See process_data() for full documentation.

magdata.set_presets(
    lmbd_guess=1e-4, weight_err=True, match_err=False,
    min_kwargs={
        'method': 'Nelder-Mead',
        'bounds': ((-inf, inf),),
        'options': {'maxfev': 50, 'xatol': 1e-2, 'fatol': 1e-6}
    }
)
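
For example, the error-matching modes described above could be selected as follows; each call is an alternative, not a sequence, and the numeric values are illustrative:

# match the mean of the error column for each field
magdata.set_presets(match_err='mean')

# match a single value for all fields, or one value per field (increasing field order)
magdata.set_presets(match_err=1e-6)
magdata.set_presets(match_err=[1e-6, 2e-6, 1e-6, 5e-7, 1e-6])

# per-point error matching (experimental)
magdata.set_presets(match_err=True)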

Integrating from zero field#

The calculation of entropy requires that the derivative of the magnetic moment with respect to temperature be integrated with respect to magnetic field, starting at zero field. Zero-field measurements (with zero moment) are prepended automatically before integration, so it is not necessary to include zero-field measurements in the input data.
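
Schematically, this is the usual Maxwell-relation form of the isothermal entropy change (unit and per-mass normalization conventions depend on the data and are not specified here):

\[
\Delta S_M(T, H) = \int_0^{H} \left( \frac{\partial M}{\partial T} \right)_{H'} \,\mathrm{d}H'
\]

The prepended zero-field points anchor the lower limit of this integral.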

The zeros can be included in processed_df if add_zeros is set to True. It is False by default.

magdata.set_presets(add_zeros=False)

Demonstration#

Simple usage of process_data() is shown below, including adjusting the regularization parameters by eye after they have been estimated automatically. Plots are used to verify the success of the smoothing. See Plotting Data for more information.

magdata.process_data()
The data contains the following 5 magnetic field strengths and observations per field:
20.0     100
40.0     100
60.0     100
80.0     100
100.0    100
Name: T, dtype: int64

Processing data using the following settings:
{
	npoints: 1000,
	temp_range: [-inf  inf],
	fields: [],
	decimals: 5,
	max_diff: inf,
	min_sweep_len: 10,
	d_order: 2,
	lmbds: [nan],
	lmbd_guess: 0.0001,
	weight_err: True,
	match_err: False,
	min_kwargs: {'method': 'Nelder-Mead', 'bounds': ((-inf, inf),), 'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
	add_zeros: False
}

scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 20.0
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 40.0
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 60.0
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 80.0
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 100.0
Calculated raw derivative and entropy.

last_presets set to:
{
	npoints: 1000,
	temp_range: [  0.99999934 100.00000083],
	fields: [ 20.  40.  60.  80. 100.],
	decimals: 5,
	max_diff: inf,
	min_sweep_len: 10,
	d_order: 2,
	lmbds: [0.00091728 0.00054639 0.00072862 0.00091728 0.00095775],
	lmbd_guess: 0.0001,
	weight_err: True,
	match_err: False,
	min_kwargs: {'method': 'Nelder-Mead', 'bounds': ((-inf, inf),), 'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
	add_zeros: False
}

Finished.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(14, 5))

magdata.plot_lines(data_prop='M_per_mass', data_version='compare', ax=ax[0])

magdata.plot_lines(
    data_prop='dM_dT', data_version='compare', ax=ax[1], colorbar=True,
    colorbar_kwargs={'ax': ax, 'fraction': 0.1, 'pad': 0.02}
);
[Figure: processing demonstration initial plot, showing raw vs. smoothed M_per_mass (left) and dM_dT (right)]

Note

The errors in this fake data are greatly exaggerated! Most instruments will have relative errors much smaller than those shown here.

We can see that the smoothed data (lines) looks much better than the raw data (dots), especially in the derivative plot on the right. Generalized cross validation has done a pretty good job of selecting optimal regularization parameters, which we can view using last_presets:

magdata.last_presets['lmbds']
array([0.00091728, 0.00054639, 0.00072862, 0.00091728, 0.00095775])

The presets are the same as they were before; however, setting them to last_presets is simple:

magdata.presets = magdata.last_presets
magdata.presets
{'npoints': 1000,
 'temp_range': array([  0.99999934, 100.00000083]),
 'fields': array([ 20.,  40.,  60.,  80., 100.]),
 'decimals': 5,
 'max_diff': inf,
 'min_sweep_len': 10,
 'd_order': 2,
 'lmbds': array([0.00091728, 0.00054639, 0.00072862, 0.00091728, 0.00095775]),
 'lmbd_guess': 0.0001,
 'weight_err': True,
 'match_err': False,
 'min_kwargs': {'method': 'Nelder-Mead',
  'bounds': ((-inf, inf),),
  'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
 'add_zeros': False}

We could also adjust lmbds for a single run and re-process:

magdata.process_data(lmbds=[1e-4, 5e-5, 1e-4, 1e-5, 1e-5])
The data contains the following 5 magnetic field strengths and observations per field:
20.0     100
40.0     100
60.0     100
80.0     100
100.0    100
Name: T, dtype: int64

Processing data using the following settings:
{
	npoints: 1000,
	temp_range: [  0.99999934 100.00000083],
	fields: [ 20.  40.  60.  80. 100.],
	decimals: 5,
	max_diff: inf,
	min_sweep_len: 10,
	d_order: 2,
	lmbds: [1.e-04 5.e-05 1.e-04 1.e-05 1.e-05],
	lmbd_guess: 0.0001,
	weight_err: True,
	match_err: False,
	min_kwargs: {'method': 'Nelder-Mead', 'bounds': ((-inf, inf),), 'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
	add_zeros: False
}

Processed M(T) at field: 20.0
Processed M(T) at field: 40.0
Processed M(T) at field: 60.0
Processed M(T) at field: 80.0
Processed M(T) at field: 100.0
Calculated raw derivative and entropy.

last_presets set to:
{
	npoints: 1000,
	temp_range: [  0.99999934 100.00000083],
	fields: [ 20.  40.  60.  80. 100.],
	decimals: 5,
	max_diff: inf,
	min_sweep_len: 10,
	d_order: 2,
	lmbds: [1.e-04 5.e-05 1.e-04 1.e-05 1.e-05],
	lmbd_guess: 0.0001,
	weight_err: True,
	match_err: False,
	min_kwargs: {'method': 'Nelder-Mead', 'bounds': ((-inf, inf),), 'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
	add_zeros: False
}

Finished.
fig, ax = plt.subplots(1, 2, figsize=(14, 5))

magdata.plot_lines(data_prop='M_per_mass', data_version='compare', ax=ax[0])

magdata.plot_lines(
    data_prop='dM_dT', data_version='compare', ax=ax[1], colorbar=True,
    colorbar_kwargs={'ax': ax, 'fraction': 0.1, 'pad': 0.02}
);
[Figure: processing demonstration second plot, showing raw vs. smoothed M_per_mass (left) and dM_dT (right)]

The error column in processed_df will still be empty after all this. See Bootstrap Estimates.
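
For a quick check of the results, the processed data can be inspected directly; this assumes processed_df is a pandas DataFrame, as its name suggests:

# peek at the smoothed results (column names depend on the data)
magdata.processed_df.head()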