Reading Data
from magentropy import MagentroData
Constructor
Data must be read in when a MagentroData object is instantiated via the constructor. Supported input formats are Quantum Design (QD) .dat data files (the default), delimited files, and pandas DataFrames.
QD data files
The default arguments are configured for QD .dat files. These files are expected to consist of:
- A header section with metadata. In particular, the sample mass should be given as INFO,<sample_mass>,SAMPLE_MASS, where <sample_mass> is replaced by a decimal number.
- A \n[Data]\n tag separating the header section from the data. (Here, \n indicates a newline.)
- The delimited data. The default separator is ','.
- Data columns with names 'Comment', 'Temperature (K)', 'Magnetic Field (Oe)', 'Moment (emu)', and 'M. Std. Err. (emu)'.
magdata_dat = MagentroData('magdata.dat')
"[Data]" tag detected, assuming QD .dat file.
The sample mass was determined from the QD .dat file: 0.1
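To make the expected file layout concrete, here is a minimal standard-library sketch that splits a QD-style .dat string into its header and data sections and pulls out the sample mass. The helper parse_qd_dat and the sample text are hypothetical; magentropy's own reader is based on pandas and may handle more variations (for example, other line endings).

```python
import csv
import io
import re

# Hypothetical helper for illustration; this is not magentropy's parser.
SAMPLE_MASS_RE = re.compile(r"^INFO,([^,]+),SAMPLE_MASS[ \t\r]*$", re.MULTILINE)


def parse_qd_dat(text, sep=","):
    """Split a QD-style .dat string into (sample_mass, data rows as dicts)."""
    # The header and the data are separated by a line containing only [Data].
    header, tag, body = text.partition("\n[Data]\n")
    if not tag:
        raise ValueError("no [Data] tag found")
    # The sample mass is stored in a header line of the form INFO,<mass>,SAMPLE_MASS.
    match = SAMPLE_MASS_RE.search(header)
    sample_mass = float(match.group(1)) if match else None
    # Everything after the tag is ordinary delimited text with a header row.
    rows = list(csv.DictReader(io.StringIO(body), delimiter=sep))
    return sample_mass, rows


example = (
    "[Header]\n"
    "INFO,0.1,SAMPLE_MASS\n"
    "[Data]\n"
    "Comment,Temperature (K),Magnetic Field (Oe),Moment (emu),M. Std. Err. (emu)\n"
    ",1.0,20.0,0.002023,5e-05\n"
)
mass, rows = parse_qd_dat(example)
print(mass)                     # 0.1
print(rows[0]["Moment (emu)"])  # 0.002023
```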
Tip
If the column names, delimiter, or sample mass format differ from these defaults, they can be set manually as described below. The **read_csv_kwargs keyword arguments are also applied to the delimited data in .dat files.
Delimited files
A delimited input file may be indicated by passing qd_dat=False to the constructor.
Different column names can also be specified, including the absence of a comment or error column (by passing None for that label). Here, the comment column is excluded:
magdata_csv = MagentroData(
'magdata.csv', qd_dat=False,
comment_col=None, T='T', H='H', M='M', M_err='M_err'
)
Since the sample mass is not present in delimited files, it is set to the default of 1.0. This can be changed after instantiation (per-mass columns are updated accordingly):
print(magdata_csv.sample_mass)
magdata_csv.sample_mass = 0.1
print(magdata_csv.sample_mass)
1.0
0.1
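Updating the per-mass columns amounts to dividing the moment by the sample mass. As a worked check, assuming the sample mass is in mg (as sample_mass_with_units reports) and per-mass moments are in emu/g (as the unit row of raw_df_with_units shows), the first raw row of magdata.dat gives 0.002023 emu / 0.1 mg ≈ 20.23 emu/g, which matches M_per_mass up to the rounding of the displayed M. The helper below is hypothetical and not part of magentropy.

```python
def moment_per_mass(moment_emu, sample_mass_mg):
    """Hypothetical helper: convert a moment in emu to emu/g for a mass given in mg."""
    return moment_emu / (sample_mass_mg * 1e-3)  # mg -> g


# Default mass of 1.0 mg versus the actual mass of 0.1 mg, first row of magdata.dat
print(moment_per_mass(0.002023, 1.0))  # about 2.023 emu/g
print(moment_per_mass(0.002023, 0.1))  # about 20.23 emu/g
print(moment_per_mass(0.00005, 0.1))   # about 0.5 emu/g, matching M_per_mass_err
```

Changing the mass from 1.0 to 0.1 scales every per-mass value by a factor of 10, while the raw moments stay the same.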
The mass can also be provided in the constructor itself:
magdata_csv = MagentroData(
'magdata.csv', qd_dat=False,
comment_col=None, T='T', H='H', M='M', M_err='M_err',
sample_mass=0.1
)
magdata_csv.sample_mass
0.1
Note
It is not strictly necessary to set qd_dat=False. The delimited data will still be read correctly, though a warning will be printed, and the sample mass must still be set manually.
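The behavior described in this note can be pictured as a simple dispatch on the file contents. The sketch below is an assumption about the logic, not magentropy's source: the function choose_reader is hypothetical, and the "[Data]" check mirrors the message printed when a QD .dat file is detected.

```python
import warnings


def choose_reader(text, qd_dat=True):
    """Hypothetical dispatch: return "qd_dat" or "delimited" for the given file text."""
    if not qd_dat:
        # The caller explicitly asked for a delimited file.
        return "delimited"
    if "\n[Data]\n" in text:
        # Header/data separator present: treat the input as a QD .dat file.
        return "qd_dat"
    # qd_dat=True (the default) but no [Data] tag: warn and fall back.
    warnings.warn("no [Data] tag found; reading the input as a delimited file")
    return "delimited"


csv_text = "T,H,M,M_err\n1.0,20.0,0.002023,5e-05\n"
print(choose_reader(csv_text, qd_dat=False))  # delimited, no warning
print(choose_reader(csv_text))                # delimited, with a UserWarning
```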
Tip
Delimited data is read using pandas.read_csv(). Any additional keyword arguments given to the constructor are passed on to pandas.read_csv(). For example, for a tab-delimited file, use magdata_tab = MagentroData(..., sep='\t').
These keyword arguments are ignored if the input is a DataFrame (see the next section).
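The column-label and separator options can be illustrated without pandas. In this sketch the hypothetical function read_delimited uses Python's csv module in place of pandas.read_csv(); its sep argument only mirrors the pandas parameter of the same name, and the T/H/M/M_err labels mirror the constructor arguments shown earlier.

```python
import csv
import io


def read_delimited(text, T="T", H="H", M="M", M_err=None, sep=","):
    """Hypothetical reader mapping user-supplied column labels onto T, H, M, M_err."""
    labels = {"T": T, "H": H, "M": M}
    if M_err is not None:  # the error column is optional
        labels["M_err"] = M_err
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    # Keep only the requested columns, renamed to the standard names.
    return [{name: float(rec[label]) for name, label in labels.items()} for rec in reader]


tab_text = "temp\tfield\tmoment\n1.0\t20.0\t0.002023\n2.0\t20.0\t0.001977\n"
rows = read_delimited(tab_text, T="temp", H="field", M="moment", sep="\t")
print(rows[0])  # {'T': 1.0, 'H': 20.0, 'M': 0.002023}
```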
DataFrames
If the data is already in a DataFrame, perhaps because preprocessing was required, the procedure is the same as for delimited files. Here, we create a new MagentroData instance from the raw data of magdata_csv:
magdata_df = MagentroData(
magdata_csv.raw_df,
comment_col=None, T='T', H='H', M='M', M_err='M_err',
sample_mass=0.1
)
The column labels and sample mass are specified as before. The qd_dat parameter is ignored when a DataFrame is passed, so it can be omitted.
Missing values
If a comment column label is supplied, any row in which the comment column has a non-NaN value is dropped. That is, rows with comments are removed, since comments in QD .dat output files indicate measurement problems.
Additionally, any row containing a missing value in the temperature, field, or moment column is dropped. If a moment error column is supplied, any row with a missing value or a value equal to zero in the error column will be dropped.
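These rules can be summarized as a row filter. The following is a hypothetical sketch on plain dictionaries rather than magentropy's pandas-based implementation; drop_invalid_rows and is_missing are illustrative names, and passing None for comment_col or err_col mirrors omitting those columns in the constructor.

```python
import math


def is_missing(value):
    """Treat None, empty strings, and NaN as missing values."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def drop_invalid_rows(rows, comment_col="Comment", err_col="M_err"):
    """Hypothetical filter applying the missing-value rules to a list of row dicts."""
    kept = []
    for row in rows:
        # Any non-missing comment marks a problematic measurement.
        if comment_col is not None and not is_missing(row.get(comment_col)):
            continue
        # Temperature, field, and moment are always required.
        if any(is_missing(row.get(key)) for key in ("T", "H", "M")):
            continue
        # With an error column, its value must be present and nonzero.
        if err_col is not None:
            err = row.get(err_col)
            if is_missing(err) or err == 0:
                continue
        kept.append(row)
    return kept


nan = float("nan")
rows = [
    {"Comment": None, "T": 1.0, "H": 20.0, "M": 0.002023, "M_err": 5e-5},   # kept
    {"Comment": "bad", "T": 2.0, "H": 20.0, "M": 0.001977, "M_err": 5e-5},  # has a comment
    {"Comment": None, "T": nan, "H": 20.0, "M": 0.001969, "M_err": 5e-5},   # missing T
    {"Comment": None, "T": 4.0, "H": 20.0, "M": 0.001970, "M_err": 0.0},    # zero error
]
print([row["T"] for row in drop_invalid_rows(rows)])                # [1.0]
print([row["T"] for row in drop_invalid_rows(rows, err_col=None)])  # [1.0, 4.0]
```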
Viewing data
Raw data, data converted to SI units, and processed (smoothed) data are available through the attributes raw_df, converted_df, and processed_df, respectively. For example:
magdata_dat.raw_df
| | T | H | M | M_err | M_per_mass | M_per_mass_err | dM_dT | Delta_SM |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.000000 | 20.000001 | 0.002023 | 0.00005 | 20.232376 | 0.5 | NaN | NaN |
| 1 | 2.000000 | 20.000000 | 0.001977 | 0.00005 | 19.770351 | 0.5 | NaN | NaN |
| 2 | 3.000001 | 19.999998 | 0.001969 | 0.00005 | 19.691176 | 0.5 | NaN | NaN |
| 3 | 4.000000 | 19.999999 | 0.001970 | 0.00005 | 19.703463 | 0.5 | NaN | NaN |
| 4 | 4.999999 | 20.000001 | 0.001886 | 0.00005 | 18.861315 | 0.5 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 95.999998 | 100.000002 | 0.001403 | 0.00005 | 14.030179 | 0.5 | NaN | NaN |
| 496 | 97.000000 | 99.999999 | 0.001446 | 0.00005 | 14.458498 | 0.5 | NaN | NaN |
| 497 | 98.000001 | 100.000001 | 0.001324 | 0.00005 | 13.244919 | 0.5 | NaN | NaN |
| 498 | 98.999999 | 99.999999 | 0.001184 | 0.00005 | 11.835327 | 0.5 | NaN | NaN |
| 499 | 100.000000 | 100.000000 | 0.001172 | 0.00005 | 11.722362 | 0.5 | NaN | NaN |
500 rows × 8 columns
Each DataFrame attribute contains columns for temperature (T), magnetic field strength (H), moment (M), moment error (M_err), moment per mass (M_per_mass), moment per mass error (M_per_mass_err), the derivative of the moment with respect to temperature (dM_dT), and the magnetic entropy change (Delta_SM).
Units can be viewed as a second column-header level by appending _with_units to any of these attribute names:
magdata_dat.raw_df_with_units
| | T | H | M | M_err | M_per_mass | M_per_mass_err | dM_dT | Delta_SM |
|---|---|---|---|---|---|---|---|---|
| unit | K | Oe | emu | emu | emu/g | emu/g | cal/K/Oe/g | cal/K/g |
| 0 | 1.000000 | 20.000001 | 0.002023 | 0.00005 | 20.232376 | 0.5 | NaN | NaN |
| 1 | 2.000000 | 20.000000 | 0.001977 | 0.00005 | 19.770351 | 0.5 | NaN | NaN |
| 2 | 3.000001 | 19.999998 | 0.001969 | 0.00005 | 19.691176 | 0.5 | NaN | NaN |
| 3 | 4.000000 | 19.999999 | 0.001970 | 0.00005 | 19.703463 | 0.5 | NaN | NaN |
| 4 | 4.999999 | 20.000001 | 0.001886 | 0.00005 | 18.861315 | 0.5 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 95.999998 | 100.000002 | 0.001403 | 0.00005 | 14.030179 | 0.5 | NaN | NaN |
| 496 | 97.000000 | 99.999999 | 0.001446 | 0.00005 | 14.458498 | 0.5 | NaN | NaN |
| 497 | 98.000001 | 100.000001 | 0.001324 | 0.00005 | 13.244919 | 0.5 | NaN | NaN |
| 498 | 98.999999 | 99.999999 | 0.001184 | 0.00005 | 11.835327 | 0.5 | NaN | NaN |
| 499 | 100.000000 | 100.000000 | 0.001172 | 0.00005 | 11.722362 | 0.5 | NaN | NaN |
500 rows × 8 columns
Similarly for sample mass:
magdata_dat.sample_mass
0.1
magdata_dat.sample_mass_with_units
(0.1, 'mg')
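The converted_df attribute holds the same data in SI units. As a rough illustration of the arithmetic involved, the sketch below applies the standard CGS-to-SI factors to one raw row, assuming the field is expressed in A/m, the moment in A·m², and the sample mass converted from mg to kg. These target units are an assumption; magentropy's actual choices are described in Units and Conversions, and the helper to_si is hypothetical.

```python
import math

# Standard CGS-to-SI factors (assumed target units: A/m, A·m², A·m²/kg).
OE_TO_A_PER_M = 1e3 / (4 * math.pi)  # 1 Oe corresponds to about 79.577 A/m
EMU_TO_A_M2 = 1e-3                   # 1 emu = 1e-3 A·m²
MG_TO_KG = 1e-6                      # 1 mg = 1e-6 kg


def to_si(T_K, H_Oe, M_emu, sample_mass_mg):
    """Hypothetical helper converting one raw row to SI units."""
    M_si = M_emu * EMU_TO_A_M2
    return {
        "T": T_K,                                          # kelvin is already SI
        "H": H_Oe * OE_TO_A_PER_M,                         # A/m
        "M": M_si,                                         # A·m²
        "M_per_mass": M_si / (sample_mass_mg * MG_TO_KG),  # A·m²/kg
    }


row = to_si(1.0, 20.0, 0.002023, 0.1)
print(round(row["H"], 2))           # 1591.55
print(round(row["M_per_mass"], 2))  # 20.23, numerically equal to the value in emu/g
```

Because 1 emu/g equals 1 A·m²/kg, the per-mass moment keeps the same numerical value after conversion, while the field changes by a factor of about 79.6.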
Tip
All DataFrame attributes are read-only and return copies of the internal instance data, so modifying a returned DataFrame does not affect the MagentroData object.
If repeated access is required, for example to several of a DataFrame's columns, it is best to first save the DataFrame as a local variable to avoid copying large amounts of data on every access.
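Don’t do this:
# Each access to magdata_dat.raw_df makes a new copy of the whole DataFrame
col_means = [magdata_dat.raw_df['T'].mean(), magdata_dat.raw_df['H'].mean(), magdata_dat.raw_df['M'].mean()]
Instead, do this:
# Copy once, then reuse the local DataFrame for every column
raw_df = magdata_dat.raw_df
col_means = [raw_df['T'].mean(), raw_df['H'].mean(), raw_df['M'].mean()]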
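The cost difference comes from the property returning a fresh copy on every access. The following stdlib sketch models that pattern with a hypothetical ReadOnlyTable class (not part of magentropy) that counts how many copies are made.

```python
import copy


class ReadOnlyTable:
    """Hypothetical stand-in for an object whose table attribute returns a copy."""

    def __init__(self, columns):
        self._columns = columns  # internal data: column name -> list of values
        self.copies_made = 0     # counts how often the property was accessed

    @property
    def raw(self):
        # No setter is defined, so the attribute is read-only; each access copies.
        self.copies_made += 1
        return copy.deepcopy(self._columns)


def mean(values):
    return sum(values) / len(values)


data = ReadOnlyTable({"T": [1.0, 2.0, 3.0], "H": [20.0, 20.0, 20.0], "M": [2e-3, 2e-3, 2e-3]})

# Repeated attribute access: three full copies are made.
slow = [mean(data.raw["T"]), mean(data.raw["H"]), mean(data.raw["M"])]

# Cached in a local variable: a single copy is reused.
raw = data.raw
fast = [mean(raw["T"]), mean(raw["H"]), mean(raw["M"])]

print(slow == fast)      # True
print(data.copies_made)  # 4 (three for the slow version, one for the fast version)
```

Both versions compute the same means; only the number of copies differs, and editing the local copy leaves the internal data untouched.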
Simulating data
The class method sim_data() can be used to generate data for testing and examples.
It “simulates” noisy data using a decreasing logistic function with a Gaussian “bump” whose center depends on the field strength. (The quotation marks are there because the function has no physical significance.)
The following code returns a DataFrame with columns 'T', 'H', 'M', and 'M_err' (100 temperatures × 5 fields = 500 rows).
This is the same data found in the magdata.dat and magdata.csv files used in these examples.
import numpy as np
sim_df = MagentroData.sim_data(
temps=np.linspace(1., 100., 100),
fields=np.linspace(20., 100., 5),
sigma_m=5e-5,
random_seed=0
)
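For readers who want to see the shape of such a generator, here is a toy stdlib version in the same spirit: a decreasing logistic curve plus a Gaussian bump whose center shifts with the field, with seeded Gaussian noise added. The function sim_moments and all of its constants (logistic midpoint and width, bump amplitude, width, and center formula, overall scale) are arbitrary assumptions; it does not reproduce sim_data() or the contents of magdata.dat.

```python
import math
import random


def sim_moments(temps, fields, sigma_m=5e-5, random_seed=0):
    """Hypothetical toy generator; all constants below are arbitrary choices."""
    rng = random.Random(random_seed)  # seeded for reproducible noise
    rows = []
    for H in fields:
        center = 20.0 + 0.3 * H  # assumed: the bump center moves with the field
        for T in temps:
            logistic = 1.0 / (1.0 + math.exp((T - 50.0) / 10.0))  # decreasing in T
            bump = 0.5 * math.exp(-((T - center) ** 2) / (2 * 5.0 ** 2))
            M = 1e-3 * (logistic + bump) + rng.gauss(0.0, sigma_m)
            rows.append({"T": T, "H": H, "M": M, "M_err": sigma_m})
    return rows


temps = [1.0 + i for i in range(100)]         # 1, 2, ..., 100, like np.linspace(1., 100., 100)
fields = [20.0 + 20.0 * j for j in range(5)]  # 20, 40, ..., 100, like np.linspace(20., 100., 5)
rows = sim_moments(temps, fields)
print(len(rows))  # 500
```

Rows are ordered field by field, with temperature varying fastest, which matches the ordering visible in the raw_df listing above.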
Units and presets
It is possible to set presets and units during instantiation. See the Processing Data and Units and Conversions pages for more information.