Reading Data#
from magentropy import MagentroData
Constructor#
Data must be read when a MagentroData object is instantiated via the constructor. Supported
input formats are Quantum Design .dat data files (default), delimited files, and DataFrames.
QD data files#
The default arguments are configured for QD .dat files. These files are expected to consist of:
- A header section with metadata. In particular, the sample mass should be given as INFO,<sample_mass>,SAMPLE_MASS, where <sample_mass> is replaced by a decimal number.
- A \n[Data]\n tag separating the header section from the data. (Here, \n indicates a newline.)
- The delimited data. The default separator is ','.
- Data columns with names 'Comment', 'Temperature (K)', 'Magnetic Field (Oe)', 'Moment (emu)', and 'M. Std. Err. (emu)'.
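To make the expected layout concrete, the following sketch builds a heavily abbreviated file body and splits it into sample mass and data rows using only the standard library. The file contents, the `[Header]` line, and the helper name `split_qd_text` are illustrative assumptions. This is not how MagentroData parses files internally.

```python
import csv
import io

# Hypothetical, abbreviated contents; real QD files carry more header lines
# and more data columns than shown here.
QD_TEXT = """[Header]
INFO,0.1,SAMPLE_MASS
[Data]
Comment,Temperature (K),Magnetic Field (Oe),Moment (emu),M. Std. Err. (emu)
,1.0,20.0,0.002023,0.00005
,2.0,20.0,0.001977,0.00005
"""


def split_qd_text(text):
    """Return (sample_mass, rows) for text laid out as described above."""
    # Everything before the "[Data]" tag is header; everything after is data.
    header, _, data = text.partition("\n[Data]\n")
    sample_mass = None
    for line in header.splitlines():
        parts = line.split(",")
        if len(parts) == 3 and parts[0] == "INFO" and parts[2] == "SAMPLE_MASS":
            sample_mass = float(parts[1])
    # The data section is ordinary comma-delimited text with a header row.
    rows = list(csv.DictReader(io.StringIO(data)))
    return sample_mass, rows


sample_mass, rows = split_qd_text(QD_TEXT)
print(sample_mass)
print(rows[0]["Temperature (K)"])
```

An empty 'Comment' field, as in both rows here, corresponds to a row without a comment.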
magdata_dat = MagentroData('magdata.dat')
"[Data]" tag detected, assuming QD .dat file.
The sample mass was determined from the QD .dat file: 0.1
Tip
If the column names, delimiter, or sample mass format are different, these can be set manually as
described below. The **read_csv_kwargs keyword arguments will also be applied
to the delimited data in .dat files.
Delimited files#
A delimited input file may be indicated by passing qd_dat = False to the constructor.
Additionally, different column names can be specified, including the absence of a comment
or error column. Here, the comment column is excluded.
magdata_csv = MagentroData(
'magdata.csv', qd_dat=False,
comment_col=None, T='T', H='H', M='M', M_err='M_err'
)
Since the sample mass is not present in delimited files, it is set to the default of 1.0. This can be changed after instantiation (per-mass columns are updated accordingly):
print(magdata_csv.sample_mass)
magdata_csv.sample_mass = 0.1
print(magdata_csv.sample_mass)
1.0
0.1
The mass can also be provided in the constructor itself:
magdata_csv = MagentroData(
'magdata.csv', qd_dat=False,
comment_col=None, T='T', H='H', M='M', M_err='M_err',
sample_mass=0.1
)
magdata_csv.sample_mass
0.1
Note
It is not strictly necessary to set qd_dat = False. The delimited data will be read correctly,
though a warning will be printed, and of course the sample mass must still be set manually.
Tip
Delimited data is read using pandas.read_csv(). Keyword arguments can be passed to
pandas.read_csv() as additional keyword arguments to the constructor.
For example, if the file is tab-delimited, magdata_tab = MagentroData(..., sep='\t').
These will be ignored if the input is a DataFrame, described in the next section.
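The forwarding of keyword arguments can be pictured with a small stand-in. The helper `read_delimited` below is hypothetical and only mimics the idea of handing extra keyword arguments to pandas.read_csv(); the in-memory text replaces a real tab-delimited file.

```python
import io

import pandas as pd


def read_delimited(source, **read_csv_kwargs):
    """Hypothetical helper: pass extra keyword arguments on to pandas.read_csv()."""
    return pd.read_csv(source, **read_csv_kwargs)


# A tiny tab-delimited "file" held in memory for illustration.
tab_text = "T\tH\tM\tM_err\n1.0\t20.0\t0.002023\t0.00005\n"
df = read_delimited(io.StringIO(tab_text), sep="\t")
print(list(df.columns))
```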
DataFrames#
If data is in a DataFrame, perhaps because preprocessing was required, the procedure
is exactly the same as for delimited files. Here, we create a new
MagentroData instance using the raw data from magdata_csv:
magdata_df = MagentroData(
magdata_csv.raw_df,
comment_col=None, T='T', H='H', M='M', M_err='M_err',
sample_mass=0.1
)
The column labels and sample mass are specified as before. The qd_dat parameter is ignored
because a DataFrame is detected, so it need not be included.
Missing values#
If a comment column label is supplied, any row in which the comment column has a non-NaN value is
dropped (i.e., rows with comments are removed, since comments in QD .dat output files indicate
measurement problems).
Additionally, any row containing a missing value in the temperature, field, or moment column is dropped. If a moment error column is supplied, any row with a missing value or a value equal to zero in the error column will be dropped.
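The filtering rules above can be sketched in plain pandas. This is an illustration of the stated rules on made-up data, not the library's actual implementation; the column labels and the comment text are assumptions.

```python
import pandas as pd

nan = float("nan")
raw = pd.DataFrame({
    "Comment": [None, "hypothetical comment", None, None, None],
    "T": [1.0, 2.0, nan, 4.0, 5.0],
    "H": [20.0, 20.0, 20.0, 20.0, 20.0],
    "M": [2.0e-3, 1.9e-3, 1.8e-3, 1.7e-3, 1.6e-3],
    "M_err": [5e-5, 5e-5, 5e-5, 0.0, 5e-5],
})

keep = raw["Comment"].isna()                         # drop rows that carry a comment
keep &= raw[["T", "H", "M"]].notna().all(axis=1)     # drop rows missing T, H, or M
keep &= raw["M_err"].notna() & (raw["M_err"] != 0)   # drop missing or zero errors

clean = raw.loc[keep].drop(columns="Comment")
print(clean.index.tolist())
```

Row 1 is removed for its comment, row 2 for its missing temperature, and row 3 for its zero error, leaving rows 0 and 4.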
Viewing data#
Raw, converted (SI units), and processed (smoothed) data is available through the
attributes raw_df, converted_df, and processed_df. For example:
magdata_dat.raw_df
| | T | H | M | M_err | M_per_mass | M_per_mass_err | dM_dT | Delta_SM |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.000000 | 20.000001 | 0.002023 | 0.00005 | 20.232376 | 0.5 | NaN | NaN |
| 1 | 2.000000 | 20.000000 | 0.001977 | 0.00005 | 19.770351 | 0.5 | NaN | NaN |
| 2 | 3.000001 | 19.999998 | 0.001969 | 0.00005 | 19.691176 | 0.5 | NaN | NaN |
| 3 | 4.000000 | 19.999999 | 0.001970 | 0.00005 | 19.703463 | 0.5 | NaN | NaN |
| 4 | 4.999999 | 20.000001 | 0.001886 | 0.00005 | 18.861315 | 0.5 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 95.999998 | 100.000002 | 0.001403 | 0.00005 | 14.030179 | 0.5 | NaN | NaN |
| 496 | 97.000000 | 99.999999 | 0.001446 | 0.00005 | 14.458498 | 0.5 | NaN | NaN |
| 497 | 98.000001 | 100.000001 | 0.001324 | 0.00005 | 13.244919 | 0.5 | NaN | NaN |
| 498 | 98.999999 | 99.999999 | 0.001184 | 0.00005 | 11.835327 | 0.5 | NaN | NaN |
| 499 | 100.000000 | 100.000000 | 0.001172 | 0.00005 | 11.722362 | 0.5 | NaN | NaN |
500 rows × 8 columns
Each DataFrame attribute contains columns corresponding to temperature, magnetic field strength,
moment, moment error, moment per mass, moment per mass error, moment derivative with respect to
temperature, and magnetic entropy.
Units can be viewed as a second header level by appending _with_units to any of these attributes.
magdata_dat.raw_df_with_units
| | T | H | M | M_err | M_per_mass | M_per_mass_err | dM_dT | Delta_SM |
|---|---|---|---|---|---|---|---|---|
| unit | K | Oe | emu | emu | emu/g | emu/g | cal/K/Oe/g | cal/K/g |
| 0 | 1.000000 | 20.000001 | 0.002023 | 0.00005 | 20.232376 | 0.5 | NaN | NaN |
| 1 | 2.000000 | 20.000000 | 0.001977 | 0.00005 | 19.770351 | 0.5 | NaN | NaN |
| 2 | 3.000001 | 19.999998 | 0.001969 | 0.00005 | 19.691176 | 0.5 | NaN | NaN |
| 3 | 4.000000 | 19.999999 | 0.001970 | 0.00005 | 19.703463 | 0.5 | NaN | NaN |
| 4 | 4.999999 | 20.000001 | 0.001886 | 0.00005 | 18.861315 | 0.5 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 95.999998 | 100.000002 | 0.001403 | 0.00005 | 14.030179 | 0.5 | NaN | NaN |
| 496 | 97.000000 | 99.999999 | 0.001446 | 0.00005 | 14.458498 | 0.5 | NaN | NaN |
| 497 | 98.000001 | 100.000001 | 0.001324 | 0.00005 | 13.244919 | 0.5 | NaN | NaN |
| 498 | 98.999999 | 99.999999 | 0.001184 | 0.00005 | 11.835327 | 0.5 | NaN | NaN |
| 499 | 100.000000 | 100.000000 | 0.001172 | 0.00005 | 11.722362 | 0.5 | NaN | NaN |
500 rows × 8 columns
The same applies to the sample mass:
magdata_dat.sample_mass
0.1
magdata_dat.sample_mass_with_units
(0.1, 'mg')
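As a quick consistency check, the per-mass columns shown above agree with dividing the moment by the sample mass expressed in grams. The arithmetic below assumes that relationship; it is inferred from the tabulated values, not taken from the library's source.

```python
moment_emu = 0.002023        # first 'M' entry shown above (rounded for display)
moment_err_emu = 0.00005     # first 'M_err' entry shown above
sample_mass_mg = 0.1         # magdata_dat.sample_mass
sample_mass_g = sample_mass_mg * 1e-3

m_per_mass = moment_emu / sample_mass_g          # emu/g
m_per_mass_err = moment_err_emu / sample_mass_g  # emu/g

# Compare with the tabulated 20.232376 and 0.5; the small difference in the
# first value comes from the rounding of 'M' in the displayed table.
print(round(m_per_mass, 2), round(m_per_mass_err, 2))
```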
Tip
All DataFrame attributes are immutable and return copies of the internal instance attributes.
If repeated access is required, for example to a DataFrame’s columns, it is best to first save
the DataFrame as a local variable to avoid repeatedly copying large amounts of data.
Don’t do this:
col_means = [magdata.raw_df['T'].mean(), magdata.raw_df['H'].mean(), magdata.raw_df['M'].mean()]
Instead, do this:
raw_df = magdata.raw_df
col_means = [raw_df['T'].mean(), raw_df['H'].mean(), raw_df['M'].mean()]
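The cost difference can be seen with a small stand-in class. `CopyOnAccess` below is hypothetical and only imitates a property that returns a copy on every access; the `n_copies` counter exists purely for this demonstration.

```python
import pandas as pd


class CopyOnAccess:
    """Hypothetical stand-in for a class exposing a DataFrame via a read-only property."""

    def __init__(self, df):
        self._df = df
        self.n_copies = 0  # bookkeeping for this demonstration only

    @property
    def raw_df(self):
        self.n_copies += 1
        return self._df.copy()


data = CopyOnAccess(
    pd.DataFrame({"T": [1.0, 2.0], "H": [20.0, 20.0], "M": [2.0e-3, 1.9e-3]})
)

# Three property accesses trigger three copies.
slow = [data.raw_df["T"].mean(), data.raw_df["H"].mean(), data.raw_df["M"].mean()]
copies_slow = data.n_copies

# One property access triggers one copy, reused for all three columns.
raw_df = data.raw_df
fast = [raw_df["T"].mean(), raw_df["H"].mean(), raw_df["M"].mean()]
copies_fast = data.n_copies - copies_slow

print(copies_slow, copies_fast)
```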
Simulating data#
The class method sim_data() can be used to generate data for testing and examples.
The data are "simulated" by adding noise to a decreasing logistic function with a Gaussian "bump"
whose center depends on the field strength. (The quotation marks are there because the function
has no physical significance.)
The following code returns a DataFrame with columns 'T', 'H', 'M', and 'M_err'.
It contains the same data found in the magdata.dat and magdata.csv files used in
these examples.
import numpy as np
sim_df = MagentroData.sim_data(
temps=np.linspace(1., 100., 100),
fields=np.linspace(20., 100., 5),
sigma_m=5e-5,
random_seed=0
)
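For intuition about the shape of such a function, here is a rough NumPy sketch of a decreasing logistic curve with a Gaussian bump whose center shifts with the field, plus Gaussian noise. The function `toy_moment`, all of its parameters, and the overall scale are assumptions made for illustration; this is not the formula used by sim_data() and will not reproduce its output.

```python
import numpy as np


def toy_moment(T, H, T0=50.0, width=10.0, bump_height=0.2, bump_width=5.0):
    """Decreasing logistic in T plus a Gaussian bump centered at a field-dependent T.

    All parameter names and default values are hypothetical.
    """
    logistic = 1.0 / (1.0 + np.exp((T - T0) / width))
    bump_center = T0 + 0.1 * H  # the bump moves to higher T as H grows
    bump = bump_height * np.exp(-0.5 * ((T - bump_center) / bump_width) ** 2)
    return logistic + bump


rng = np.random.default_rng(0)
temps = np.linspace(1.0, 100.0, 100)
fields = np.linspace(20.0, 100.0, 5)
T, H = np.meshgrid(temps, fields)  # one row of temperatures per field

sigma_m = 5e-5
M = toy_moment(T, H) + rng.normal(0.0, sigma_m, size=T.shape)
print(M.shape)
```

With 100 temperatures and 5 fields, the grid holds 500 points, matching the 500 rows shown in the tables above.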
Units and presets#
It is possible to set presets and units during instantiation. See Processing Data and Units and Conversions for additional information.