Reading Data

from magentropy import MagentroData

Constructor

Data is read when a MagentroData object is instantiated, so the input must be passed to the constructor. Supported input formats are Quantum Design (QD) .dat data files (the default), delimited text files, and pandas DataFrames.

QD data files

The default arguments are configured for QD .dat files. These files are expected to consist of:

A header section with metadata. In particular, the sample mass should be given as INFO,<sample_mass>,SAMPLE_MASS, where <sample_mass> is replaced by a decimal number.
A \n[Data]\n tag separating the header section from the data. (Here, \n indicates a newline.)
The delimited data. The default separator is ','.
Data columns with names 'Comment', 'Temperature (K)', 'Magnetic Field (Oe)', 'Moment (emu)', and 'M. Std. Err. (emu)'.
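The layout listed above can be made concrete with a tiny, made-up file. The sketch uses only the standard library. split_qd_dat is a hypothetical helper that pulls out the sample mass and the delimited rows. It is not how MagentroData parses files internally, and real QD headers contain many more lines.

```python
import csv
import io
import re

# A tiny, made-up file in the layout described above (values are illustrative).
QD_TEXT = """[Header]
INFO,0.1,SAMPLE_MASS
[Data]
Comment,Temperature (K),Magnetic Field (Oe),Moment (emu),M. Std. Err. (emu)
,1.0,20.0,0.002023,0.00005
,2.0,20.0,0.001977,0.00005
"""


def split_qd_dat(text, sep=","):
    """Hypothetical helper: return (sample_mass, column_names, data_rows)."""
    # The [Data] tag on its own line separates header from data.
    header, data = text.split("\n[Data]\n", 1)
    # Look for a header line of the form INFO,<sample_mass>,SAMPLE_MASS.
    match = re.search(r"^INFO,([-+0-9.eE]+),SAMPLE_MASS\s*$", header, re.MULTILINE)
    sample_mass = float(match.group(1)) if match else None
    rows = list(csv.reader(io.StringIO(data), delimiter=sep))
    return sample_mass, rows[0], rows[1:]


mass, columns, rows = split_qd_dat(QD_TEXT)
print(mass)        # 0.1
print(columns[1])  # Temperature (K)
print(len(rows))   # 2
```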

magdata_dat = MagentroData('magdata.dat')

"[Data]" tag detected, assuming QD .dat file.
The sample mass was determined from the QD .dat file: 0.1

Tip

If the column names or delimiter differ, or the sample mass is not given in the expected format, these can be set manually as described below. The **read_csv_kwargs keyword arguments are also applied to the delimited data section of .dat files.

Delimited files

To read a delimited file, pass qd_dat=False to the constructor. Different column labels can also be specified, and the comment or error column can be left out entirely. Here, the comment column is excluded by passing comment_col=None.

magdata_csv = MagentroData(
    'magdata.csv', qd_dat=False,
    comment_col=None, T='T', H='H', M='M', M_err='M_err'
)
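The keyword names T, H, M, and M_err identify each column's role, and their values are the labels used in the file. raw_df (shown under Viewing data) uses the short labels T, H, M, and M_err even for the QD .dat example, so a mapping step like the one below presumably takes place. standardize_columns is a hypothetical helper written for illustration. It is not part of magentropy.

```python
import pandas as pd


def standardize_columns(df, T, H, M, M_err=None, comment_col=None):
    """Hypothetical helper: keep the labelled columns and rename them to
    the short names T, H, M (and M_err if given)."""
    mapping = {T: "T", H: "H", M: "M"}
    if M_err is not None:
        mapping[M_err] = "M_err"
    keep = list(mapping)
    if comment_col is not None:
        keep.append(comment_col)  # kept under its own label for later filtering
    return df[keep].rename(columns=mapping)


# Column labels as they appear in a QD .dat file.
qd = pd.DataFrame({
    "Comment": [None],
    "Temperature (K)": [1.0],
    "Magnetic Field (Oe)": [20.0],
    "Moment (emu)": [0.002023],
    "M. Std. Err. (emu)": [5e-5],
})
std = standardize_columns(
    qd, T="Temperature (K)", H="Magnetic Field (Oe)",
    M="Moment (emu)", M_err="M. Std. Err. (emu)", comment_col="Comment",
)
print(std.columns.tolist())  # ['T', 'H', 'M', 'M_err', 'Comment']
```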

Since the sample mass is not present in delimited files, it is set to the default of 1.0. This can be changed after instantiation (per-mass columns are updated accordingly):

print(magdata_csv.sample_mass)
magdata_csv.sample_mass = 0.1
print(magdata_csv.sample_mass)

1.0
0.1
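The per-mass columns are just the moment divided by the sample mass. The units shown later for raw_df_with_units and sample_mass_with_units are emu/g for M_per_mass and mg for the sample mass. Assuming those units, the arithmetic can be sketched as follows. moment_per_mass is a hypothetical helper, not a magentropy function.

```python
MG_PER_G = 1000.0


def moment_per_mass(moment_emu, sample_mass_mg):
    """Hypothetical helper: moment per mass in emu/g, with the mass in mg."""
    return moment_emu / (sample_mass_mg / MG_PER_G)


m = 0.002023     # emu, first row of the raw data shown under Viewing data (rounded)
m_err = 0.00005  # emu, the corresponding moment error

print(moment_per_mass(m, 1.0))      # ≈ 2.023 emu/g with the default mass of 1.0
print(moment_per_mass(m, 0.1))      # ≈ 20.23 emu/g after setting 0.1
print(moment_per_mass(m_err, 0.1))  # ≈ 0.5 emu/g, matching M_per_mass_err
```

Reducing the mass by a factor of 10 raises every per-mass value by the same factor, which is why changing sample_mass after instantiation has to update those columns.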

The mass can also be provided in the constructor itself:

magdata_csv = MagentroData(
    'magdata.csv', qd_dat=False,
    comment_col=None, T='T', H='H', M='M', M_err='M_err',
    sample_mass=0.1
)
magdata_csv.sample_mass

0.1

Note

Setting qd_dat=False is not strictly necessary. Without it, the delimited data is still read correctly, but a warning is printed, and the sample mass must still be set manually.
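The constructor reports '"[Data]" tag detected, assuming QD .dat file.' when it finds the tag. The note above describes a warning-plus-fallback when it does not. The sketch below shows what such a check could look like. has_data_tag and choose_reader are hypothetical names, and the warning text is invented. This is not magentropy's actual logic or message.

```python
import warnings


def has_data_tag(text):
    """True if some line of the file is exactly the [Data] tag."""
    return any(line.strip() == "[Data]" for line in text.splitlines())


def choose_reader(text, qd_dat=True):
    """Hypothetical sketch of the fallback described in the note above."""
    if not qd_dat:
        return "delimited"
    if has_data_tag(text):
        return "qd_dat"
    # No tag: fall back to plain delimited reading, but tell the user.
    warnings.warn("No [Data] tag found; reading as a plain delimited file.")
    return "delimited"
```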

Tip

Delimited data is read using pandas.read_csv(), and any additional keyword arguments passed to the constructor are forwarded to it. For example, if the file is tab-delimited, use magdata_tab = MagentroData(..., sep='\t'). These arguments are ignored if the input is a DataFrame, as described in the next section.
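The next sketch shows the pandas call that a sep='\t' keyword ends up in. It uses a small, made-up tab-separated snippet in place of a real file. Only pandas itself is exercised here, not the MagentroData constructor.

```python
import io

import pandas as pd

# Made-up tab-separated text with the column labels used in the examples.
tab_text = "T\tH\tM\tM_err\n1.0\t20.0\t0.002023\t5e-05\n2.0\t20.0\t0.001977\t5e-05\n"

# sep="\t" is the kind of keyword argument forwarded to pandas.read_csv().
df = pd.read_csv(io.StringIO(tab_text), sep="\t")
print(df.columns.tolist())  # ['T', 'H', 'M', 'M_err']
```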

DataFrames

If the data is already in a DataFrame (for example, because preprocessing was required), the procedure is the same as for delimited files. Here, we create a new MagentroData instance from the raw data of magdata_csv:

magdata_df = MagentroData(
    magdata_csv.raw_df,
    comment_col=None, T='T', H='H', M='M', M_err='M_err',
    sample_mass=0.1
)

The column labels and sample mass are specified as before. The qd_dat parameter is ignored because a DataFrame is detected, so it need not be included.

Missing values

If a comment column label is supplied, any row with a non-NaN value in the comment column is dropped. That is, rows with comments are removed, since comments in QD .dat output files indicate measurement problems.

Additionally, any row containing a missing value in the temperature, field, or moment column is dropped. If a moment error column is supplied, any row with a missing value or a value equal to zero in the error column will be dropped.
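The row-dropping rules above can be sketched in pandas as follows. drop_invalid_rows is a hypothetical function that restates the stated rules. It is not taken from magentropy's source, and the example data is made up.

```python
import numpy as np
import pandas as pd


def drop_invalid_rows(df, T="T", H="H", M="M", M_err=None, comment_col=None):
    """Hypothetical sketch of the row-dropping rules described above."""
    out = df
    if comment_col is not None:
        # Any comment marks a problem measurement: keep comment-free rows only.
        out = out[out[comment_col].isna()]
    # Temperature, field, and moment must all be present.
    out = out.dropna(subset=[T, H, M])
    if M_err is not None:
        # Missing or zero error values are also rejected.
        out = out[out[M_err].notna() & (out[M_err] != 0)]
    return out


raw = pd.DataFrame({
    "Comment": [np.nan, "drift", np.nan, np.nan, np.nan],
    "T": [1.0, 2.0, np.nan, 4.0, 5.0],
    "H": [20.0, 20.0, 20.0, 20.0, 20.0],
    "M": [2.0e-3, 1.9e-3, 1.9e-3, 1.8e-3, 1.7e-3],
    "M_err": [5e-5, 5e-5, 5e-5, 0.0, 5e-5],
})
clean = drop_invalid_rows(raw, M_err="M_err", comment_col="Comment")
print(clean.index.tolist())  # [0, 4]
```

Row 1 is dropped for its comment, row 2 for the missing temperature, and row 3 for its zero error.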

Viewing data

Raw, converted (SI units), and processed (smoothed) data are available through the attributes raw_df, converted_df, and processed_df, respectively. For example:

magdata_dat.raw_df

	T	H	M	M_err	M_per_mass	M_per_mass_err	dM_dT	Delta_SM
0	1.000000	20.000001	0.002023	0.00005	20.232376	0.5	NaN	NaN
1	2.000000	20.000000	0.001977	0.00005	19.770351	0.5	NaN	NaN
2	3.000001	19.999998	0.001969	0.00005	19.691176	0.5	NaN	NaN
3	4.000000	19.999999	0.001970	0.00005	19.703463	0.5	NaN	NaN
4	4.999999	20.000001	0.001886	0.00005	18.861315	0.5	NaN	NaN
...	...	...	...	...	...	...	...	...
495	95.999998	100.000002	0.001403	0.00005	14.030179	0.5	NaN	NaN
496	97.000000	99.999999	0.001446	0.00005	14.458498	0.5	NaN	NaN
497	98.000001	100.000001	0.001324	0.00005	13.244919	0.5	NaN	NaN
498	98.999999	99.999999	0.001184	0.00005	11.835327	0.5	NaN	NaN
499	100.000000	100.000000	0.001172	0.00005	11.722362	0.5	NaN	NaN

500 rows × 8 columns

Each DataFrame attribute contains the same columns: temperature (T), magnetic field strength (H), moment (M), moment error (M_err), moment per mass (M_per_mass) and its error (M_per_mass_err), the derivative of the moment with respect to temperature (dM_dT), and the magnetic entropy change (Delta_SM).
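converted_df holds these columns in SI units. This page does not list which SI units are used (see "Units and Conversions"), so the sketch below only applies the standard CGS-to-SI relations to the first raw row shown above. It assumes the field is expressed in A/m and the moment in A·m². The constant names are illustrative.

```python
import math

OE_TO_A_PER_M = 1000.0 / (4.0 * math.pi)  # 1 Oe = 1000/(4π) A/m ≈ 79.577 A/m
EMU_TO_A_M2 = 1.0e-3                       # 1 emu = 10^-3 A·m²
EMU_PER_G_TO_A_M2_PER_KG = 1.0             # 1 emu/g = 1 A·m²/kg

# First row of raw_df above (values as displayed).
H_oe, M_emu, M_per_mass = 20.000001, 0.002023, 20.232376

print(H_oe * OE_TO_A_PER_M)                    # ≈ 1591.55 A/m
print(M_emu * EMU_TO_A_M2)                     # ≈ 2.023e-06 A·m²
print(M_per_mass * EMU_PER_G_TO_A_M2_PER_KG)   # ≈ 20.23 A·m²/kg
```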

Units can be viewed as a second header level by appending _with_units to any of these attributes.

magdata_dat.raw_df_with_units

	T	H	M	M_err	M_per_mass	M_per_mass_err	dM_dT	Delta_SM
unit	K	Oe	emu	emu	emu/g	emu/g	cal/K/Oe/g	cal/K/g
0	1.000000	20.000001	0.002023	0.00005	20.232376	0.5	NaN	NaN
1	2.000000	20.000000	0.001977	0.00005	19.770351	0.5	NaN	NaN
2	3.000001	19.999998	0.001969	0.00005	19.691176	0.5	NaN	NaN
3	4.000000	19.999999	0.001970	0.00005	19.703463	0.5	NaN	NaN
4	4.999999	20.000001	0.001886	0.00005	18.861315	0.5	NaN	NaN
...	...	...	...	...	...	...	...	...
495	95.999998	100.000002	0.001403	0.00005	14.030179	0.5	NaN	NaN
496	97.000000	99.999999	0.001446	0.00005	14.458498	0.5	NaN	NaN
497	98.000001	100.000001	0.001324	0.00005	13.244919	0.5	NaN	NaN
498	98.999999	99.999999	0.001184	0.00005	11.835327	0.5	NaN	NaN
499	100.000000	100.000000	0.001172	0.00005	11.722362	0.5	NaN	NaN

500 rows × 8 columns

Similarly for sample mass:

magdata_dat.sample_mass

0.1

magdata_dat.sample_mass_with_units

(0.1, 'mg')
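The two-level header shown for raw_df_with_units (column names on top, a "unit" level underneath) is a pandas column MultiIndex. The sketch below builds one in plain pandas. with_units is a hypothetical helper, not necessarily how magentropy constructs these attributes.

```python
import pandas as pd


def with_units(df, units):
    """Return a copy of df whose columns carry a second 'unit' header level."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_arrays(
        [df.columns, [units[c] for c in df.columns]],
        names=[None, "unit"],
    )
    return out


df = pd.DataFrame({"T": [1.0, 2.0], "H": [20.0, 20.0], "M": [0.002023, 0.001977]})
labeled = with_units(df, {"T": "K", "H": "Oe", "M": "emu"})
print(labeled.columns.tolist())      # [('T', 'K'), ('H', 'Oe'), ('M', 'emu')]
print(labeled[("T", "K")].tolist())  # [1.0, 2.0]
```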

Tip

All DataFrame attributes are read-only and return copies of the internal instance attributes. If repeated access is required, for example to several of a DataFrame’s columns, first save the DataFrame as a local variable to avoid repeatedly copying large amounts of data.

Don’t do this:

col_means = [magdata_dat.raw_df['T'].mean(), magdata_dat.raw_df['H'].mean(), magdata_dat.raw_df['M'].mean()]

Instead, do this:

raw_df = magdata_dat.raw_df
col_means = [raw_df['T'].mean(), raw_df['H'].mean(), raw_df['M'].mean()]
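The cost difference can be seen with a toy class that exposes a DataFrame through a property returning a copy, as the tip describes. Holder and its copies counter are hypothetical and exist only to count how many copies each pattern makes.

```python
import pandas as pd


class Holder:
    """Toy stand-in for a class exposing a DataFrame via a copying property."""

    def __init__(self, df):
        self._df = df
        self.copies = 0  # counts how many copies have been handed out

    @property
    def raw_df(self):
        self.copies += 1
        return self._df.copy()


h = Holder(pd.DataFrame({"T": [1.0, 2.0], "H": [20.0, 20.0], "M": [2e-3, 1.9e-3]}))

slow = [h.raw_df["T"].mean(), h.raw_df["H"].mean(), h.raw_df["M"].mean()]
print(h.copies)  # 3

h.copies = 0
raw_df = h.raw_df
fast = [raw_df["T"].mean(), raw_df["H"].mean(), raw_df["M"].mean()]
print(h.copies)  # 1

# Editing the local copy leaves the internal data untouched.
raw_df.loc[0, "T"] = 99.0
print(h._df.loc[0, "T"])  # 1.0
```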

Simulating data

The class method sim_data() can be used to generate data for testing and examples. A decreasing logistic function with a Gaussian “bump” whose center depends on the field strength is used to “simulate” noisy data. (There are quotation marks because the function has no physical significance.)

The following code returns a DataFrame with columns 'T', 'H', 'M', and 'M_err'. This data is the same data found in the magdata.dat and magdata.csv files used in these examples.

import numpy as np

sim_df = MagentroData.sim_data(
    temps=np.linspace(1., 100., 100),
    fields=np.linspace(20., 100., 5),
    sigma_m=5e-5,
    random_seed=0
)
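The exact formula and constants used by sim_data() are not given on this page. The NumPy-only sketch below builds data of the same general shape: a decreasing logistic curve in temperature, plus a Gaussian bump whose center shifts with field, plus Gaussian noise with standard deviation sigma_m. Temperatures vary fastest within each field, matching the row order of raw_df shown earlier. toy_sim and all of its constants are hypothetical, so it will not reproduce magdata.dat.

```python
import numpy as np


def toy_sim(temps, fields, sigma_m, random_seed=None):
    """Hypothetical stand-in for sim_data(): logistic decay plus a
    field-dependent Gaussian bump, with Gaussian noise of width sigma_m."""
    rng = np.random.default_rng(random_seed)
    T, H = np.meshgrid(temps, fields)  # shape (n_fields, n_temps)
    T, H = T.ravel(), H.ravel()        # temperatures vary fastest
    logistic = 2e-3 / (1.0 + np.exp((T - 60.0) / 10.0))  # decreasing in T
    center = 20.0 + 0.3 * H                              # bump moves with field
    bump = 3e-4 * np.exp(-((T - center) ** 2) / (2 * 5.0 ** 2))
    M = logistic + bump + rng.normal(0.0, sigma_m, T.size)
    M_err = np.full(T.size, sigma_m)
    return {"T": T, "H": H, "M": M, "M_err": M_err}


d = toy_sim(np.linspace(1., 100., 100), np.linspace(20., 100., 5), 5e-5, random_seed=0)
print(d["T"].shape)  # (500,)
```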

Units and presets

It is also possible to set presets and units during instantiation. See "Processing Data" and "Units and Conversions" for additional information.