Processing Data#
Tip
All of the presets mentioned below can be set during instantiation:
magdata = MagentroData(..., presets={...})
from magentropy import MagentroData
magdata = MagentroData('magdata.dat')
"[Data]" tag detected, assuming QD .dat file.
The sample mass was determined from the QD .dat file: 0.1
Grouping#
Before smoothing, the magnetization data must be grouped by field.
Normally, the measured fields are not exact, so groups must be inferred.
The method test_grouping()
can be used to test grouping presets prior to
fully processing the data.
grouping_presets, grouped_by = magdata.test_grouping()
With no arguments passed, the defaults in presets
are used. The method returns a dictionary
of the grouping presets used to perform the grouping and a pandas
DataFrameGroupBy
object
to see the results. In particular, the
DataFrameGroupBy.count()
method is useful.
grouping_presets
{'fields': array([], dtype=float64), 'decimals': 5, 'max_diff': inf}
grouped_by['T'].count()
20.0 100
40.0 100
60.0 100
80.0 100
100.0 100
Name: T, dtype: int64
Above, we see that the default grouping presets are an empty array of fields, a decimal
place of 5 for rounding, and an infinite max_diff. Detailed information on each of these
can be found in the process_data() documentation.
In this instance, the presets direct the grouping method to group the fields simply by rounding to
the 5th decimal place, which accurately determines the field groups, as shown by the count()
method. There are five fields, each with 100 temperature measurements. In most cases, grouping
by rounding should be sufficient.
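The rounding-based grouping can be sketched in plain pandas (illustrative values only; this is not magentropy's internal implementation):

```python
import numpy as np
import pandas as pd

# Hypothetical noisy field measurements clustered around nominal values
fields = np.array([19.999998, 20.000002, 39.999997, 40.000001, 60.000003])
df = pd.DataFrame({'H': fields})

# Group by rounding to 5 decimal places, mirroring the default presets
counts = df.groupby(df['H'].round(5)).size()
print(counts)
```

Each noisy measurement collapses onto its nominal field, so the group sizes recover the number of observations per field, just as count() did above.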
Smoothing#
Reference
J. J. Stickel, Comput. Chem. Eng. 34, 467 (2010)
There are a number of options to control the smoothing. The default presets have been chosen
sensibly but can be easily changed. All parameters, including grouping parameters, can be either
set as new defaults using presets
or set_presets()
, or used for a single process_data()
run by entering an argument in process_data()
. See the documentation for a complete
description of all parameters. They are summarized below. The use of set_presets()
is
demonstrated in each case with the default presets purely for example; it is not necessary to set
presets if the defaults are to be used.
Output temperatures#
The smoothed magnetic moment will be evaluated at npoints
evenly-spaced temperatures in the range
temp_range
. npoints
expects an integer, and temp_range
expects an array_like of
length 2. The default range [-numpy.inf, numpy.inf]
adjusts to the maximum range in the data automatically.
Additionally, only those fields with at least min_sweep_len
measured temperatures in their
respective temperature sweeps will be processed. The default is 10.
from numpy import inf
magdata.set_presets(npoints=1000, temp_range=[-inf, inf], min_sweep_len=10)
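The output grid itself is just an evenly spaced temperature array. With the default temp_range of [-numpy.inf, numpy.inf], the range adjusts to the data; for hypothetical data spanning 1 K to 100 K, the defaults amount to:

```python
import numpy as np

# npoints evenly spaced temperatures across the data's range
# (the 1 K to 100 K endpoints here are assumed example values)
temps = np.linspace(1.0, 100.0, 1000)
```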
Regularization#
The two most important options for the regularization (smoothing) itself are the derivative order
d_order
and the regularization parameter \(\lambda\) for each field, lmbds
.
The derivative of the magnetic moment with respect to temperature of order d_order
is used to
quantify the “roughness” of the fitted curves. Generally, 2 or 3 work well. The default is 2.
The regularization parameter determines the emphasis that is given to the roughness
regularization penalty. A higher \(\lambda\) results in a smoother curve, and a \(\lambda\)
of zero results in interpolation. A \(\lambda\) can be specified for each field
(in increasing field order) as an array_like of the same length as the number of fields.
Any field with a corresponding \(\lambda\) of numpy.nan
will have an “optimal”
\(\lambda\) determined automatically; see below.
The default lmbds
is an array with a single numpy.nan
, which indicates that an optimal
\(\lambda\) should be found for each field. The same behavior occurs if an empty list
is given.
magdata.set_presets(d_order=2, lmbds=[])
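The roles of d_order and \(\lambda\) can be illustrated with a minimal penalized least-squares smoother in the spirit of Stickel's method (a sketch only, not magentropy's implementation; equally spaced points and no error weighting are assumed):

```python
import numpy as np

def smooth(y, lmbd, d_order=2):
    """Minimize ||y - y_hat||^2 + lmbd * ||D y_hat||^2,
    where D is the d_order-th finite-difference matrix."""
    n = len(y)
    D = np.eye(n)
    for _ in range(d_order):
        D = np.diff(D, axis=0)   # raise the difference order by one
    # Normal equations: (I + lmbd * D^T D) y_hat = y
    return np.linalg.solve(np.eye(n) + lmbd * D.T @ D, y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

y_smooth = smooth(y, lmbd=10.0)   # larger lambda -> smoother curve
y_interp = smooth(y, lmbd=0.0)    # lambda = 0 returns the data unchanged
```

A larger \(\lambda\) increases the weight of the roughness penalty, while \(\lambda = 0\) reproduces the input exactly, matching the interpolation behavior described above.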
Optimal regularization parameters#
Numerical optimization is used to determine the optimal regularization parameter for each field without a \(\lambda\) provided. Three metrics are available to quantify the meaning of “optimal”:

1. Generalized cross validation (GCV). The GCV variance is minimized. Set match_err to False (default).

2. Error matching. The standard deviation of the absolute differences between the measured and smoothed magnetic moment points is matched to a value. The squared difference between the standard deviation and this value is minimized. Set match_err to a single value to match this value for all fields, an array_like of the same length as the number of fields to match a different value for each field (in order of increasing field), or one of 'min', 'mean', or 'max' to use the minimum, mean, or maximum value of the error column for each field as the value.

3. Per-point error matching (experimental). The absolute differences between the measured and smoothed magnetic moment points are computed, and the sum of squared differences between these and the corresponding values in the error column is minimized. Set match_err to True.
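As an illustration of the GCV criterion, the GCV variance can be computed from the smoother's hat matrix and minimized over a grid of \(\lambda\) values (a sketch under the same penalized least-squares assumptions as the smoother above; not magentropy's internals, which minimize over \(\log_{10} \lambda\) with scipy.optimize.minimize):

```python
import numpy as np

def gcv_score(y, lmbd, d_order=2):
    """GCV variance n * RSS / tr(I - H)^2 for a difference-penalized smoother."""
    n = len(y)
    D = np.eye(n)
    for _ in range(d_order):
        D = np.diff(D, axis=0)                        # d_order-th difference matrix
    H = np.linalg.inv(np.eye(n) + lmbd * D.T @ D)     # hat matrix: y_hat = H y
    resid = y - H @ y
    return n * (resid @ resid) / np.trace(np.eye(n) - H) ** 2

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(40)

# Coarse grid search over lambda in place of numerical optimization
lmbds = 10.0 ** np.arange(-8.0, 3.0)
best = lmbds[np.argmin([gcv_score(y, l) for l in lmbds])]
```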
Each of these requires an initial guess, given by lmbd_guess
. Currently, a single guess to use
for all fields is supported. For control over the minimization, keyword arguments to pass
to scipy.optimize.minimize()
can be given as a dictionary to min_kwargs
. Keep in mind
that any values passed to min_kwargs
should be with respect to \(\log_{10} \lambda\),
since this is the value that is minimized internally. (However, lmbd_guess
is the guess for
\(\lambda\) itself; no \(\log_{10}\).) Lastly, weight_err
specifies whether to weight measurements by the
normalized inverse squares of the errors. The default is True
.
See process_data()
for full documentation.
magdata.set_presets(
lmbd_guess=1e-4, weight_err=True, match_err=False,
min_kwargs = {
'method': 'Nelder-Mead',
'bounds': ((-inf, inf),),
'options': {'maxfev': 50, 'xatol': 1e-2, 'fatol': 1e-6}
}
)
Integrating from zero field#
The calculation of entropy requires that the derivative of the magnetic moment with respect to temperature be integrated with respect to magnetic field, starting at zero field. Zero field measurements (with zero moment) are prepended before integration during processing, so it is not necessary to include zero field measurements in the input data.
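The zero-field prepending and cumulative integration can be sketched with a trapezoid rule (made-up numbers for a single temperature; not magentropy's internal code):

```python
import numpy as np

# dM/dT at one temperature for the measured fields (illustrative values)
fields = np.array([20.0, 40.0, 60.0])
dM_dT = np.array([-0.5, -0.9, -1.2])

# Prepend the zero-field point: zero moment implies dM/dT = 0 at H = 0
H = np.concatenate(([0.0], fields))
f = np.concatenate(([0.0], dM_dT))

# Cumulative trapezoidal integration from zero field
dS = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(H))))
```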
The zeros can be included in processed_df
if add_zeros
is set to True
. It is False
by
default.
magdata.set_presets(add_zeros=False)
Demonstration#
Simple usage of process_data()
is shown, including the adjustment of the regularization
parameters by eye after they have been estimated initially. Plots are used to verify the
success of the smoothing. See Plotting Data for more information.
magdata.process_data()
The data contains the following 5 magnetic field strengths and observations per field:
20.0 100
40.0 100
60.0 100
80.0 100
100.0 100
Name: T, dtype: int64
Processing data using the following settings:
{
npoints: 1000,
temp_range: [-inf inf],
fields: [],
decimals: 5,
max_diff: inf,
min_sweep_len: 10,
d_order: 2,
lmbds: [nan],
lmbd_guess: 0.0001,
weight_err: True,
match_err: False,
min_kwargs: {'method': 'Nelder-Mead', 'bounds': ((-inf, inf),), 'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
add_zeros: False
}
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 20.0
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 40.0
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 60.0
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 80.0
scipy.optimize.minimize: Optimization terminated successfully.
Processed M(T) at field: 100.0
Calculated raw derivative and entropy.
last_presets set to:
{
npoints: 1000,
temp_range: [ 0.99999934 100.00000083],
fields: [ 20. 40. 60. 80. 100.],
decimals: 5,
max_diff: inf,
min_sweep_len: 10,
d_order: 2,
lmbds: [0.00091728 0.00054639 0.00072862 0.00091728 0.00095775],
lmbd_guess: 0.0001,
weight_err: True,
match_err: False,
min_kwargs: {'method': 'Nelder-Mead', 'bounds': ((-inf, inf),), 'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
add_zeros: False
}
Finished.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
magdata.plot_lines(data_prop='M_per_mass', data_version='compare', ax=ax[0])
magdata.plot_lines(
data_prop='dM_dT', data_version='compare', ax=ax[1], colorbar=True,
colorbar_kwargs={'ax': ax, 'fraction': 0.1, 'pad': 0.02}
);
Note
The errors in this fake data are greatly exaggerated! Most instruments will have relative errors much smaller than those shown here.
We can see that the smoothed data (lines) looks much better than the raw data (dots), especially
in the derivative plot on the right. Generalized cross validation has done a pretty good job of
selecting optimal regularization parameters, which we can view using last_presets
:
magdata.last_presets['lmbds']
array([0.00091728, 0.00054639, 0.00072862, 0.00091728, 0.00095775])
The presets
are the same as they were before; however, setting them to last_presets
is simple:
magdata.presets = magdata.last_presets
magdata.presets
{'npoints': 1000,
'temp_range': array([ 0.99999934, 100.00000083]),
'fields': array([ 20., 40., 60., 80., 100.]),
'decimals': 5,
'max_diff': inf,
'min_sweep_len': 10,
'd_order': 2,
'lmbds': array([0.00091728, 0.00054639, 0.00072862, 0.00091728, 0.00095775]),
'lmbd_guess': 0.0001,
'weight_err': True,
'match_err': False,
'min_kwargs': {'method': 'Nelder-Mead',
'bounds': ((-inf, inf),),
'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
'add_zeros': False}
We could also adjust lmbds
for a single run and re-process:
magdata.process_data(lmbds=[1e-4, 5e-5, 1e-4, 1e-5, 1e-5])
The data contains the following 5 magnetic field strengths and observations per field:
20.0 100
40.0 100
60.0 100
80.0 100
100.0 100
Name: T, dtype: int64
Processing data using the following settings:
{
npoints: 1000,
temp_range: [ 0.99999934 100.00000083],
fields: [ 20. 40. 60. 80. 100.],
decimals: 5,
max_diff: inf,
min_sweep_len: 10,
d_order: 2,
lmbds: [1.e-04 5.e-05 1.e-04 1.e-05 1.e-05],
lmbd_guess: 0.0001,
weight_err: True,
match_err: False,
min_kwargs: {'method': 'Nelder-Mead', 'bounds': ((-inf, inf),), 'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
add_zeros: False
}
Processed M(T) at field: 20.0
Processed M(T) at field: 40.0
Processed M(T) at field: 60.0
Processed M(T) at field: 80.0
Processed M(T) at field: 100.0
Calculated raw derivative and entropy.
last_presets set to:
{
npoints: 1000,
temp_range: [ 0.99999934 100.00000083],
fields: [ 20. 40. 60. 80. 100.],
decimals: 5,
max_diff: inf,
min_sweep_len: 10,
d_order: 2,
lmbds: [1.e-04 5.e-05 1.e-04 1.e-05 1.e-05],
lmbd_guess: 0.0001,
weight_err: True,
match_err: False,
min_kwargs: {'method': 'Nelder-Mead', 'bounds': ((-inf, inf),), 'options': {'maxfev': 50, 'xatol': 0.01, 'fatol': 1e-06}},
add_zeros: False
}
Finished.
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
magdata.plot_lines(data_prop='M_per_mass', data_version='compare', ax=ax[0])
magdata.plot_lines(
data_prop='dM_dT', data_version='compare', ax=ax[1], colorbar=True,
colorbar_kwargs={'ax': ax, 'fraction': 0.1, 'pad': 0.02}
);
The error column in processed_df
will still be empty after all this.
See Bootstrap Estimates.