Reading Data#

from magentropy import MagentroData

Constructor#

Data must be read when a MagentroData object is instantiated via the constructor. Supported input formats are Quantum Design .dat data files (default), delimited files, and DataFrames.

QD data files#

The default arguments are configured for QD .dat files. These files are expected to consist of:

  1. A header section with metadata. In particular, the sample mass should be given as INFO,<sample_mass>,SAMPLE_MASS, where <sample_mass> is replaced by a decimal number.

  2. A \n[Data]\n tag separating the header section from the data. (Here, \n indicates a newline.)

  3. The delimited data. The default separator is ','.

  4. Data columns with names 'Comment', 'Temperature (K)', 'Magnetic Field (Oe)', 'Moment (emu)', and 'M. Std. Err. (emu)'.

magdata_dat = MagentroData('magdata.dat')
"[Data]" tag detected, assuming QD .dat file.
The sample mass was determined from the QD .dat file: 0.1
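
To make the expected layout concrete, the following is a minimal sketch of a file with this structure and one way the header and data could be separated using only the standard library. The file contents are invented for illustration, and the parsing shown is an assumption about how such a file can be read, not magentropy's actual implementation.

```python
import csv
import io
import re

# Hypothetical miniature QD-style file: header, "[Data]" tag, delimited data.
text = (
    "[Header]\n"
    "INFO,0.1,SAMPLE_MASS\n"
    "[Data]\n"
    "Comment,Temperature (K),Magnetic Field (Oe),Moment (emu),M. Std. Err. (emu)\n"
    ",1.0,20.0,0.002023,0.00005\n"
    ",2.0,20.0,0.001977,0.00005\n"
)

# Split at the "\n[Data]\n" tag described in item 2.
header, data = text.split("\n[Data]\n", maxsplit=1)

# Extract the mass from the "INFO,<sample_mass>,SAMPLE_MASS" line (item 1).
match = re.search(r"^INFO,([0-9.eE+-]+),SAMPLE_MASS$", header, flags=re.MULTILINE)
sample_mass = float(match.group(1))

# Read the comma-delimited data (items 3 and 4) into a list of dicts.
rows = list(csv.DictReader(io.StringIO(data)))

print(sample_mass)
print(rows[0]["Temperature (K)"])
```

An empty first field in each data row corresponds to a measurement without a comment.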

Tip

If the column names, delimiter, or sample mass format are different, these can be set manually as described below. The **read_csv_kwargs keyword arguments will also be applied to the delimited data in .dat files.

Delimited files#

A delimited input file is indicated by passing qd_dat=False to the constructor. Different column names can also be specified, and a comment or error column may be absent. Here, the comment column is excluded.

magdata_csv = MagentroData(
    'magdata.csv', qd_dat=False,
    comment_col=None, T='T', H='H', M='M', M_err='M_err'
)

Since the sample mass is not present in delimited files, it is set to the default of 1.0. This can be changed after instantiation (per-mass columns are updated accordingly):

print(magdata_csv.sample_mass)
magdata_csv.sample_mass = 0.1
print(magdata_csv.sample_mass)
1.0
0.1

The mass can also be provided in the constructor itself:

magdata_csv = MagentroData(
    'magdata.csv', qd_dat=False,
    comment_col=None, T='T', H='H', M='M', M_err='M_err',
    sample_mass=0.1
)
magdata_csv.sample_mass
0.1

Note

It is not strictly necessary to set qd_dat=False. The delimited data will still be read correctly, although a warning will be printed and the sample mass must still be set manually.

Tip

Delimited data is read using pandas.read_csv(). Keyword arguments can be passed to pandas.read_csv() as additional keyword arguments to the constructor. For example, if the file is tab-delimited, use magdata_tab = MagentroData(..., sep='\t'). These keyword arguments are ignored if the input is a DataFrame, as described in the next section.
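
As a rough illustration of what the sep keyword does once it reaches pandas.read_csv(), the sketch below reads a small tab-delimited table from an in-memory string. The column names and values are made up; only pandas.read_csv() and its sep parameter are taken from the text above.

```python
import io

import pandas as pd

# Invented tab-delimited content standing in for a data file.
tab_text = (
    "T\tH\tM\tM_err\n"
    "1.0\t20.0\t0.002023\t0.00005\n"
    "2.0\t20.0\t0.001977\t0.00005\n"
)

# sep="\t" tells pandas to split fields on tab characters.
df = pd.read_csv(io.StringIO(tab_text), sep="\t")

print(list(df.columns))
print(df.shape)
```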

DataFrames#

If data is in a DataFrame, perhaps because preprocessing was required, the procedure is exactly the same as for delimited files. Here, we create a new MagentroData instance using the raw data from magdata_csv:

magdata_df = MagentroData(
    magdata_csv.raw_df,
    comment_col=None, T='T', H='H', M='M', M_err='M_err',
    sample_mass=0.1
)

The column labels and sample mass are specified as before. The qd_dat parameter is ignored because a DataFrame is detected, so it need not be included.
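
As one hedged example of the kind of preprocessing that might precede this step, the sketch below renames instrument-style columns and restricts the temperature range using plain pandas. The original column names, the values, and the 100 K cutoff are all assumptions invented for illustration.

```python
import pandas as pd

# Invented measurement table with instrument-style column names.
raw = pd.DataFrame({
    "temp_K": [3.0, 1.0, 2.0, 150.0],
    "field_Oe": [20.0, 20.0, 20.0, 20.0],
    "moment_emu": [0.001969, 0.002023, 0.001977, 0.000100],
    "moment_err_emu": [0.00005, 0.00005, 0.00005, 0.00005],
})

# Rename to the short labels used in the constructor calls in this section.
pre = raw.rename(columns={
    "temp_K": "T", "field_Oe": "H",
    "moment_emu": "M", "moment_err_emu": "M_err",
})

# Keep only the temperature range of interest and sort by temperature.
pre = pre[pre["T"] <= 100.0].sort_values("T").reset_index(drop=True)

print(pre)
```

The resulting pre could then be passed as the first constructor argument in place of magdata_csv.raw_df.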

Missing values#

If a comment column label is supplied, any row in which the comment column has a non-NaN value is dropped (i.e., rows with comments are removed, since comments in QD .dat output files indicate measurement problems).

Additionally, any row containing a missing value in the temperature, field, or moment column is dropped. If a moment error column is supplied, any row with a missing value or a value equal to zero in the error column will be dropped.
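
The dropping rules above can be mimicked with ordinary pandas operations. This is a sketch of the described behaviour on an invented five-row table, not the library's internal code; the column labels are chosen only for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Comment": [np.nan, "sensor glitch", np.nan, np.nan, np.nan],
    "T": [1.0, 2.0, np.nan, 4.0, 5.0],
    "H": [20.0, 20.0, 20.0, 20.0, 20.0],
    "M": [0.0020, 0.0019, 0.0019, 0.0019, 0.0018],
    "M_err": [5e-5, 5e-5, 5e-5, 0.0, 5e-5],
})

# 1. Drop rows whose comment column is not NaN.
kept = df[df["Comment"].isna()]
# 2. Drop rows missing a temperature, field, or moment value.
kept = kept.dropna(subset=["T", "H", "M"])
# 3. Drop rows whose error is missing or exactly zero.
kept = kept[kept["M_err"].notna() & (kept["M_err"] != 0)]

print(kept.index.tolist())
```

Row 1 is removed for its comment, row 2 for its missing temperature, and row 3 for its zero error, leaving rows 0 and 4.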

Viewing data#

Raw, converted (SI units), and processed (smoothed) data are available through the attributes raw_df, converted_df, and processed_df, respectively. For example:

magdata_dat.raw_df
              T           H         M    M_err  M_per_mass  M_per_mass_err  dM_dT  Delta_SM
  0    1.000000   20.000001  0.002023  0.00005   20.232376             0.5    NaN       NaN
  1    2.000000   20.000000  0.001977  0.00005   19.770351             0.5    NaN       NaN
  2    3.000001   19.999998  0.001969  0.00005   19.691176             0.5    NaN       NaN
  3    4.000000   19.999999  0.001970  0.00005   19.703463             0.5    NaN       NaN
  4    4.999999   20.000001  0.001886  0.00005   18.861315             0.5    NaN       NaN
...         ...         ...       ...      ...         ...             ...    ...       ...
495   95.999998  100.000002  0.001403  0.00005   14.030179             0.5    NaN       NaN
496   97.000000   99.999999  0.001446  0.00005   14.458498             0.5    NaN       NaN
497   98.000001  100.000001  0.001324  0.00005   13.244919             0.5    NaN       NaN
498   98.999999   99.999999  0.001184  0.00005   11.835327             0.5    NaN       NaN
499  100.000000  100.000000  0.001172  0.00005   11.722362             0.5    NaN       NaN

500 rows × 8 columns

Each DataFrame attribute contains columns corresponding to temperature, magnetic field strength, moment, moment error, moment per mass, moment per mass error, moment derivative with respect to temperature, and magnetic entropy.

Units can be viewed as a second header level by appending _with_units to any of these attributes.

magdata_dat.raw_df_with_units
               T           H         M    M_err  M_per_mass  M_per_mass_err       dM_dT  Delta_SM
unit           K          Oe       emu      emu       emu/g           emu/g  cal/K/Oe/g   cal/K/g
   0    1.000000   20.000001  0.002023  0.00005   20.232376             0.5         NaN       NaN
   1    2.000000   20.000000  0.001977  0.00005   19.770351             0.5         NaN       NaN
   2    3.000001   19.999998  0.001969  0.00005   19.691176             0.5         NaN       NaN
   3    4.000000   19.999999  0.001970  0.00005   19.703463             0.5         NaN       NaN
   4    4.999999   20.000001  0.001886  0.00005   18.861315             0.5         NaN       NaN
 ...         ...         ...       ...      ...         ...             ...         ...       ...
 495   95.999998  100.000002  0.001403  0.00005   14.030179             0.5         NaN       NaN
 496   97.000000   99.999999  0.001446  0.00005   14.458498             0.5         NaN       NaN
 497   98.000001  100.000001  0.001324  0.00005   13.244919             0.5         NaN       NaN
 498   98.999999   99.999999  0.001184  0.00005   11.835327             0.5         NaN       NaN
 499  100.000000  100.000000  0.001172  0.00005   11.722362             0.5         NaN       NaN

500 rows × 8 columns

Similarly for sample mass:

magdata_dat.sample_mass
0.1
magdata_dat.sample_mass_with_units
(0.1, 'mg')
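
The per-mass columns shown earlier are consistent with dividing by the sample mass expressed in grams. The following is a worked check of that arithmetic using the displayed error values, assuming the 0.1 above is in mg and the per-mass unit is emu/g; it is not a description of how the library performs the conversion.

```python
# Values as displayed in the tables above.
moment_err_emu = 0.00005   # M_err, in emu
sample_mass_mg = 0.1       # sample mass, in mg

# Convert mg to g, then divide to get emu/g.
sample_mass_g = sample_mass_mg * 1e-3
err_per_mass = moment_err_emu / sample_mass_g

print(err_per_mass)  # approximately 0.5 emu/g, matching M_per_mass_err
```

The same division applied to the displayed moment 0.002023 gives roughly 20.23 emu/g; the small difference from the tabulated 20.232376 presumably comes from the moment being rounded for display.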

Tip

All DataFrame attributes are read-only, and each access returns a copy of the internal instance attribute. If repeated access is required, for example to a DataFrame’s columns, it is best to first save the DataFrame as a local variable to avoid repeatedly copying large amounts of data.

Don’t do this:

col_means = [
    magdata_dat.raw_df['T'].mean(),
    magdata_dat.raw_df['H'].mean(),
    magdata_dat.raw_df['M'].mean()
]

Instead, do this:

raw_df = magdata_dat.raw_df
col_means = [raw_df['T'].mean(), raw_df['H'].mean(), raw_df['M'].mean()]
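
The cost difference can be seen with a small stand-in class. The class below is hypothetical and only mimics the copy-on-access behaviour described in this tip; it is not MagentroData itself.

```python
import pandas as pd


class CopyingHolder:
    """Hypothetical stand-in that returns a copy on every attribute access."""

    def __init__(self, df):
        self._df = df
        self.n_copies = 0  # counts how many copies have been made

    @property
    def raw_df(self):
        self.n_copies += 1
        return self._df.copy()


holder = CopyingHolder(
    pd.DataFrame({"T": [1.0, 2.0], "H": [20.0, 20.0], "M": [0.1, 0.2]})
)

# Three attribute accesses trigger three full copies.
means_slow = [holder.raw_df[c].mean() for c in ("T", "H", "M")]
copies_slow = holder.n_copies

# One access saved to a local variable triggers a single copy.
local_df = holder.raw_df
means_fast = [local_df[c].mean() for c in ("T", "H", "M")]
copies_fast = holder.n_copies - copies_slow

print(copies_slow, copies_fast)
```

Both approaches give the same means; only the number of copies differs.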

Simulating data#

The class method sim_data() can be used to generate data for testing and examples. A decreasing logistic function with a Gaussian “bump” whose center depends on the field strength is used to “simulate” noisy data. (The quotation marks are used because the function has no physical significance.)

The following code returns a DataFrame with columns 'T', 'H', 'M', and 'M_err'. These are the same data found in the magdata.dat and magdata.csv files used in these examples.

import numpy as np

sim_df = MagentroData.sim_data(
    temps=np.linspace(1., 100., 100),
    fields=np.linspace(20., 100., 5),
    sigma_m=5e-5,
    random_seed=0
)
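
To give a feel for the kind of function described, here is a toy sketch of a decreasing logistic curve with a field-dependent Gaussian bump plus Gaussian noise, evaluated on the same temperature and field grid. Every constant and the function name toy_moment are invented for illustration; the actual form and parameters used by sim_data() are not given here and may differ, and this generator is not claimed to reproduce its output.

```python
import numpy as np


def toy_moment(T, H):
    """Hypothetical curve: decreasing logistic plus a Gaussian bump."""
    logistic = 2e-3 / (1.0 + np.exp((T - 90.0) / 10.0))
    bump_center = 0.5 * H  # assumed dependence of the bump center on field
    bump = 2e-4 * np.exp(-0.5 * ((T - bump_center) / 5.0) ** 2)
    return logistic + bump


temps = np.linspace(1.0, 100.0, 100)   # 1, 2, ..., 100
fields = np.linspace(20.0, 100.0, 5)   # 20, 40, 60, 80, 100

# One value per (field, temperature) pair: 5 * 100 = 500 points.
H_grid, T_grid = np.meshgrid(fields, temps, indexing="ij")
rng = np.random.default_rng(0)
M = toy_moment(T_grid, H_grid) + rng.normal(0.0, 5e-5, size=T_grid.shape)

print(M.size)
```

The 500 points correspond to the 500 rows shown in the tables above (100 temperatures at each of 5 fields).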

Units and presets#

It is possible to set presets and units during instantiation. See Processing Data and Units and Conversions for additional information.