Difference between revisions of "PMF-AMS Analysis Guide"

From Jimenez Group Wiki
(Creating the Data and Error Matrices (Step 0))
(View PMF Analysis Results *Step 2*)
 
(137 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
A shortcut to this page is [http://tinyurl.com/PMF-guide http://tinyurl.com/PMF-guide]
 
A shortcut to this page is [http://tinyurl.com/PMF-guide http://tinyurl.com/PMF-guide]
  
The PMF Evaluation Tool (PET) was described in [http://cires.colorado.edu/jimenez/ams-papers.html#Database_PMFTool Ulbrich et al., ACP 2009].  Please cite the tool with this work in publications in which you have used the PET.  The PET consists of several [http://www.wavemetrics.com Igor] procedure files (ipfs).  This wiki serves as the help and documentation for the software.  To run PMF with the panel, the PMF executable and associated files, accessed separately, are required (see Section 3, [[#Installing PMF with Igor| Installing PMF with Igor]]).   
+
The PMF Evaluation Tool (PET) was described in [http://cires.colorado.edu/jimenez/ams-papers.html#Database_PMFTool Ulbrich et al., ACP 2009].  Please cite the tool with this work in publications in which you have used the PET.  The PET consists of several [http://www.wavemetrics.com Igor] procedure files (ipfs).  This wiki serves as the help and documentation for the software.  To run PMF with the panel, the PMF executable and associated files, accessed separately, are required (see Section 2, [[#Installing PMF with Igor| Installing PMF with Igor]]).   
  
The ipfs were written by Ingrid Ulbrich and Donna Sueper (Jimenez Group, University of Colorado, Boulder) and Greg Brinkman (Hannigan Group, University of Colorado, Boulder).  Questions about this code can be addressed to [mailto:support@aerodyne.com?Subject=Igor-PMF%20Question Donna].
+
The ipfs were written by Ingrid Ulbrich (formerly of the Jimenez Group, University of Colorado, Boulder), Donna Sueper (Aerodyne Research and Jimenez Group, University of Colorado, Boulder), and Greg Brinkman (Hannigan Group, University of Colorado, Boulder).  Questions about this code can be addressed to [mailto:support@aerodyne.com?Subject=Igor-PMF%20Question Donna].
  
PMF (Positive Matrix Factorization) was developed by Dr. P. Paatero (Dept. of Physics, University of Helsinki).  Links to Paatero's PMF documentation, including an order form for purchasing PMF from Paatero, and many PMF method papers can be found in Section 9, [[#Other Resources |Other Resources]].
+
PMF (Positive Matrix Factorization) was developed by Dr. P. Paatero (retired, Dept. of Physics, University of Helsinki).  Links to Paatero's PMF documentation and many PMF method papers can be found in Section 9, [[#Other Resources |Other Resources]].
  
This Igor toolkit was intended for use in analyzing AMS data, but there are only few assumptions in the toolkit relating to AMS-type data.  Some information on ways to create the necessary waves and matrices from non-AMS data are found in [[#For non-AMS Users|Section 4.1.4, Creating the Data and Error Matrices for non-AMS Users]].
+
Originally, the PMF executable and license files could be purchased by sending an email to Paatero. The PMF executable files can now be downloaded for free below, but a license file is needed to run them. Please email [mailto:support@aerodyne.com?Subject=PMF%20Operating%20System%20Issue Donna] for credentials to download the executables. The Swiss company Datalystica is now the sole official seller of the multi-linear engine (ME-2) solver package, which contains the license used for the PMF executable. Please go to [http://datalystica.com/me-2-solver/ https://datalystica.com/me-2-solver/] to purchase ME-2. Once the ME-2 package is downloaded, copy the ME2key.key file, rename the copy to pmf2key.key, and place it in the same folder as the PMF executable files.  One license file can be used by everyone in the same research group.
 +
 
 +
This Igor toolkit was intended for use in analyzing AMS data, but there are only a few assumptions in the toolkit relating to AMS-type data.  Some information on ways to create the necessary waves and matrices from non-AMS data is found in Section 3.2.4 below.
  
 
== A Message to Contributors ==
 
== A Message to Contributors ==
Line 19: Line 21:
 
=== PMF and Operating Systems ===
 
=== PMF and Operating Systems ===
  
The PMF executable is compiled only for Windows/DOS.  The PET has generally been tested using Windows 11, 10 and older operating systems Windows XP and Windows Vista. It should run well on either platform; please contact [mailto:sueper@colorado.edu?Subject=PMF%20Operating%20System%20Issue Donna] if you suspect you have operating system problems with the PET. On Windows machines, using OneDrive to access the PMF executable file within PET has been problematic for some users.  If possible, it is recommended access the PMF executable within PET from a local drive.
+
The PMF executable is compiled only for Windows/DOS; it has been used with Windows 11 and 10 and with the older Windows XP and Windows Vista operating systems. Using OneDrive to access the PMF executable file within PET has been problematic for some users.  If possible, it is recommended that users access the PMF executable and the Igor template experiment PET from a local drive.
  
 
For users with Macs, two options have proven successful.  One method is to execute PMF on a Windows computer and then analyze the experiment on a Macintosh.  The other method is to execute PMF under a Windows emulator on a Mac.  Note that in this latter case, you may need to have the PMF executable somewhere in the c:\ directory; this isn't necessary when running Windows on a PC.
 
For users with Macs, two options have proven successful.  One method is to execute PMF on a Windows computer and then analyze the experiment on a Macintosh.  The other method is to execute PMF under a Windows emulator on a Mac.  Note that in this latter case, you may need to have the PMF executable somewhere in the c:\ directory; this isn't necessary when running Windows on a PC.
Line 26: Line 28:
  
 
==== Download the PMF executable package ====
 
==== Download the PMF executable package ====
The file pmf2wopt.exe is the PMF solver. It is downloadable via the PMF executable package below for free; however a username and password is required.  Email [mailto:sueper@colorado.edu?Subject=PMF%20Operating%20System%20Issue Donna] for these credentials. An older executable file named pmf2wtst.exe, also comes with this package but is not typically used.  The PMF executable package will also contain an initialization, or configuration file named mypmft.ini  This pmf initialization file contains parameters that is used by PET but is not explicitly needed for the executable to run. For every call to the PMF executable PET will generate an initialization file based on mypmft.ini.  This file, that is specific to  that is named imupmf.ini
+
The file pmf2wopt.exe is the PMF solver. It can be downloaded for free via the PMF executable package below; however, a username and password are required.  Email [mailto:sueper@colorado.edu?Subject=PMF%20Operating%20System%20Issue Donna] for these credentials. An older executable file named pmf2wtst.exe also comes with this package but is not typically used.  The PMF executable package also contains an initialization/configuration file named mypmft.ini. This PMF initialization file contains parameters that are used by PET but is not explicitly needed for the executable to run. For every call to the PMF executable, PET will generate a 'working' initialization file based on mypmft.ini.  This file, named imupmf.ini, is specific to the PET package.
PMF executable package
 
  
 
PMF executable file step:
 
PMF executable file step:
Line 38: Line 39:
 
PMF license file steps:  
 
PMF license file steps:  
  
Step 1: Purchase and download ME-2 from Datalystica
+
Step 1: Purchase and download ME-2 solver from [https://datalystica.com/me-2-solver/ Datalystica]
  
 
Step 2: Copy the file named me2key.key into the PMF executable folder and rename as pmf2key.key
 
Step 2: Copy the file named me2key.key into the PMF executable folder and rename as pmf2key.key
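
The two license file steps amount to a copy and rename. A minimal Python sketch of that operation follows; the helper name install_pmf_key and both folder arguments are hypothetical and are not part of the ME-2 or PMF packages. The key file name is written here as me2key.key; on Windows the capitalization of the file name does not matter.

```python
import shutil
from pathlib import Path


def install_pmf_key(me2_dir, pmf_dir):
    """Copy me2key.key from me2_dir into pmf_dir under the name pmf2key.key."""
    source = Path(me2_dir) / "me2key.key"
    target = Path(pmf_dir) / "pmf2key.key"
    if not source.is_file():
        raise FileNotFoundError(f"license file not found: {source}")
    # copyfile leaves the original ME-2 key in place (a copy, not a move).
    shutil.copyfile(source, target)
    return target
```

If several PMF executable folders are kept on one computer, the same helper could be called once per folder so that each folder has its own copy of pmf2key.key.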
  
 
==== Download PET Igor template experiment or zipped file of ipfs (ipf = Igor Procedure File) ====
 
==== Download PET Igor template experiment or zipped file of ipfs (ipf = Igor Procedure File) ====
The Igor interface for viewing and evaluating PMF solutions is called PET (PMF Evaluation Tool).
+
The Igor interface for viewing and evaluating PMF solutions is called PET (PMF Evaluation Tool).  
  
 
PET step:  
 
PET step:  
 
Step 1: [http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/IgorPMF_Template_v3_08E.pxt Download the PET Igor template file (IgorPMF_Template_v*.pxt)] and save locally on the computer.  It is not recommended that the Igor template experiment reside in the same folder as the PMF executable package.
 
Step 1: [http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/IgorPMF_Template_v3_08E.pxt Download the PET Igor template file (IgorPMF_Template_v*.pxt)] and save locally on the computer.  It is not recommended that the Igor template experiment reside in the same folder as the PMF executable package.
 +
 +
The topmost Igor menu bar contains a menu named "PMF", which offers these options:
 +
*"Perform PMF analysis *Step 1*"
 +
*"View PMF analysis results *Step 2*"
 +
*"Small Pop-Plots Panel for *Step 2*",
 +
*"Compare PMF results with external factors *Step 3*"
 +
 +
Selecting "Perform PMF analysis" will bring up a panel from which you can perform data preparation steps and select data and conditions for running PMF.
 +
 +
Selecting "View PMF analysis results" will bring up a panel from which you can select waves that were previously input into PMF. Once choices have been selected the large PET panel with multiple graphs showing results and diagnostics will appear.  Within one PET experiment, it is possible to run PMF either on multiple data sets or under different conditions. This panel allows one to select the PMF results from possible different data sets or conditions.
 +
 +
Selecting "Small Pop-Plots Panel" will generate a mini version of the big PMF results panel. This option was initially intended for users with small screen sizes.
 +
 +
Selecting "Compare PMF results with external factors *Step 3*" will generate a panel from which one can compare PMF results with other data that was not included in the PMF analysis. For example, if one had some gas phase or meteorological measurements with the same time stamps as the input data, this option will show a panel of correlation plots of PMF factors with this 'external' data.
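
As an illustration of what a correlation between a PMF factor and 'external' data means, the sketch below computes a Pearson correlation coefficient for two time series that share the same time stamps. How PET itself calculates and displays correlations is not described here, so the choice of Pearson r, the function name, and the numbers are all assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant series has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: one PMF factor time series and a gas-phase tracer
# measured at the same time stamps.
factor = [0.8, 1.1, 2.3, 1.9, 0.7]
tracer = [110.0, 135.0, 260.0, 220.0, 100.0]
r = pearson_r(factor, tracer)
```

A value of r near 1 indicates that the factor and the external measurement rise and fall together; it does not by itself identify the source of the factor.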
  
 
==== Set up options ====
 
==== Set up options ====
  
It is recommended that users keep the PET Igor template experiment either within a Wavemetrics folder/subfolder or adjacent to the PMF executable package folder. Once PET data preparation steps have been completed, the user will be prompted to save the template experiment with a unique name.
+
It is recommended that users keep the PET Igor template experiment either within a Wavemetrics folder/subfolder or adjacent to the PMF executable package folder. Once PET data preparation steps have been completed and the gold "Begin PMF..." button is pushed, the user will be prompted to save the template experiment with a unique name.
 +
 
 +
One may have many copies of the PMF executable packages on one computer. A file organization suggestion is to set up multiple copies like so: C:\Users\*\Documents\PMF\PMF Executable1, C:\Users\*\Documents\PMF\PMF Executable2 , etc, each folder with a copy of the pmf2key.key file. One can have multiple versions of Igor open at once; this allows one to run multiple copies of PMF (from different Igor experiments) at the same time.
  
One may have many copies of the PMF executable packages on one computer. One file organization suggestion is to set up multiple copies like so: C:\Users\*\Documents\PMF\PMF Executable1, C:\Users\*\Documents\PMF\PMF Executable2 , etc, each folder with a copy of the pmf2key.key file. One can have multiple versions of Igor open at once; this allows one to run multiple copies of PMF (from different Igor experiments) at the same time.
+
===== Upgrade PET version options =====
 +
One can upgrade versions of the PET software without having to redo any PMF analysis steps. One can download the latest PET ipfs (Igor Procedure Files) below, load them into an existing PET experiment, and then kill all the old ipfs. You may want to kill and recreate PET panels such as the "View Results" and "Setup" panels to take advantage of new features.
 
<!--
 
<!--
 
(The BareBones version is currently out of date.  You can skip the BareBones version and install the full version of PMF.)
 
(The BareBones version is currently out of date.  You can skip the BareBones version and install the full version of PMF.)
Line 118: Line 136:
 
0.  Make sure that the path to the executable is correct.  If red text appears in the section 'PMF Executable File Path' then something is wrong. Look in the folder you created with the PMF2wtst.exe or pmf2wopt.exe file. If necessary go to the tab entitled 'PMF defaults and advanced options' and reselect the executable file; then go to the run tab, 'PMF Executable File Path' and reselect the folder with your chosen .exe file.
 
0.  Make sure that the path to the executable is correct.  If red text appears in the section 'PMF Executable File Path' then something is wrong. Look in the folder you created with the PMF2wtst.exe or pmf2wopt.exe file. If necessary go to the tab entitled 'PMF defaults and advanced options' and reselect the executable file; then go to the run tab, 'PMF Executable File Path' and reselect the folder with your chosen .exe file.
  
1.  Look in the PMF executable folder that contains PMF2wtst.exe (and/or pmf2wopt.exe) and check for the existence of the files
+
1.  Look in the PMF executable folder that contains pmf2wopt.exe (and/or pmf2wtst.exe) and check for the existence of the files
 
   Matrix.dat and StdDev.dat
 
   Matrix.dat and StdDev.dat
  
Line 125: Line 143:
 
:If these files do not exist, Igor was not able to access the correct folder. Check the existence and spelling of the license file pmf2key.key
 
:If these files do not exist, Igor was not able to access the correct folder. Check the existence and spelling of the license file pmf2key.key
  
2.  Look in the PMF executable folder that contains PMF2wtst.exe for the existence of the file
+
2.  Look in the PMF executable folder that contains PMF2*.exe for the existence of the file
  
 
   PMF2.LOG
 
   PMF2.LOG
Line 153: Line 171:
 
:*Make sure that this folder contains the file imupmf.ini, provided with the PMF executable package.
 
:*Make sure that this folder contains the file imupmf.ini, provided with the PMF executable package.
  
:If the file imupmf.ini already exists in the same folder as the PMF2wtst.exe file, continue with Step 3.
+
:If the file imupmf.ini already exists in the same folder as the PMF2*.exe file, continue with Step 3.
  
 
2b)
 
2b)
Line 174: Line 192:
 
:If this file does not appear, Igor was not able to write this file to your C:\ drive.  This might be due to high security settings on your computer.  You should create a text file with this name (NOT runpmf.bat.txt) and choose to '''edit''' it (not open it) with a text editor (e.g., Notepad, WordPad, Emacs, etc.).  The file should contain the lines
 
:If this file does not appear, Igor was not able to write this file to your C:\ drive.  This might be due to high security settings on your computer.  You should create a text file with this name (NOT runpmf.bat.txt) and choose to '''edit''' it (not open it) with a text editor (e.g., Notepad, WordPad, Emacs, etc.).  The file should contain the lines
  
   cd C:\Documents and Settings\Ingrid Ulbrich\Desktop\
+
   cd C:\Documents and Settings\*\Desktop\
   pmf2wtst imupmf   
+
   pmf2wopt imupmf   
  
:'''NOTE''' that you must change the path in this example to the path to the folder where you have put the file PMF2wtst.exe!
+
:'''NOTE''' that you must change the path in this example to the path to the folder where you have put the file PMF2*.exe!
  
 
3a) Execute runpmf.bat by double-clicking on its icon.  You should see the black DOS window pop up, scroll output, and close again.   
 
3a) Execute runpmf.bat by double-clicking on its icon.  You should see the black DOS window pop up, scroll output, and close again.   
Line 191: Line 209:
 
   Start -> Programs -> Accessories -> Command Prompt
 
   Start -> Programs -> Accessories -> Command Prompt
  
:Change from the path displayed in the prompt to the folder where you put the PMF2wtst.exe file.  Use the command
+
:Change from the path displayed in the prompt to the folder where you put the PMF2*.exe file.  Use the command
  
 
   cd
 
   cd
  
:to change directories (e.g.,
+
:to change directories (e.g.)
  
 
   cd Desktop\PMF
 
   cd Desktop\PMF
  
:).  You can change one directory at a time, or several at a time as shown in the example.  To move up 1 directory, use
+
:You can change one directory at a time, or several at a time as shown in the example.  To move up 1 directory, use
  
 
   cd ..
 
   cd ..
Line 205: Line 223:
 
:Be sure that the folder contains the three files  
 
:Be sure that the folder contains the three files  
  
   PMF2wtst.exe
+
   pmf2wopt.exe and/or PMF2wtst.exe
 
   imupmf.ini
 
   imupmf.ini
 
   pmf2key.key
 
   pmf2key.key
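
The file check above can also be scripted. This is a minimal Python sketch that assumes only the file names listed above; the function name and the folder argument are hypothetical.

```python
from pathlib import Path

# File names taken from the list above; either solver executable is acceptable.
SOLVERS = ("pmf2wopt.exe", "pmf2wtst.exe")
REQUIRED = ("imupmf.ini", "pmf2key.key")


def missing_pmf_files(folder):
    """Return a list describing the files that are missing from folder."""
    # Compare lower-cased names, since Windows file names are case-insensitive.
    present = {p.name.lower() for p in Path(folder).iterdir() if p.is_file()}
    missing = [name for name in REQUIRED if name not in present]
    if not any(name in present for name in SOLVERS):
        missing.append(" or ".join(SOLVERS))
    return missing
```

An empty list means that all three kinds of file were found; anything else names what still needs to be copied into the folder.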
Line 214: Line 232:
  
 
:Then type this command at the prompt to run PMF:
 
:Then type this command at the prompt to run PMF:
 +
 +
  pmf2wopt imupmf
 +
 +
or
  
 
   pmf2wtst imupmf
 
   pmf2wtst imupmf
Line 224: Line 246:
 
   2 rank1 step chi2=  7411.7    Penalty=  1.4084E+04 Flags GF
 
   2 rank1 step chi2=  7411.7    Penalty=  1.4084E+04 Flags GF
  
:PMF ran successfully here. D''elete the file PMF2.LOG'' and in the Igor experiment, press the second button on the panel to whether PMF runs successfully from Igor.
+
:PMF ran successfully here. Make sure to close this file after you are finished examining it so that the next time PMF runs it won't complain about writing to this file.  
  
 
:If the output does not contain these lines, go back to Step 2 of this section to examine errors that might be reported by PMF in the file PMF2.LOG.
 
:If the output does not contain these lines, go back to Step 2 of this section to examine errors that might be reported by PMF in the file PMF2.LOG.
  
== Creating the Data and Error Matrices (Step 0) ==
+
== Data Preparation *Step 1* ==
  
=== Creating the Matrices, by Instrument (Software) type ===
+
=== Exporting the Data by Instrument, Analysis Software Type into PET ===
  
* PMF analysis requires two 2-dimensional matrices of the same dimensions, a data matrix and a matrix corresponding to uncertainties, which is typically a matrix corresponding to errors. Both of these matrices must not have any nans (Not a Number) values. The data matrix may contain negative values (PMF solutions will always be >=0) and the uncertainty matrix must have all values >=0).
+
* PMF analysis requires two 2-dimensional matrices of the same dimensions: a data matrix and a matrix of the corresponding uncertainties, which is typically the calculated errors. Neither matrix may contain any NaN (Not a Number) values. The data preparation section of PET has a step for the automated removal of rows containing all NaNs and of columns containing all zeros or NaNs. The data matrix may contain negative values (PMF solutions will always be >=0), and the uncertainty matrix must have all values greater than zero. The data preparation section of PET also has a step for the automated setting of a minimum non-zero error value.
  
 
The Igor PET PMF tool needs additional 1-dimensional input waves to facilitate the plotting and interpretation of PMF solutions. Required are a 1-dimensional numeric wave corresponding to index values for the rows, which typically indicates measurement time in increasing order, and a 1-dimensional numeric wave corresponding to index values for the columns, which is typically an m/q in increasing order. For data sets where the columns correspond to high resolution data, i.e. each column indicates a chemical formula such as C2H3O, a 1-dimensional text wave will also be required.
 
The Igor PET PMF tool needs additional 1-dimensional input waves to facilitate the plotting and interpretation of PMF solutions. Required are a 1-dimensional numeric wave corresponding to index values for the rows, which typically indicates measurement time in increasing order, and a 1-dimensional numeric wave corresponding to index values for the columns, which is typically an m/q in increasing order. For data sets where the columns correspond to high resolution data, i.e. each column indicates a chemical formula such as C2H3O, a 1-dimensional text wave will also be required.
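
The requirements above can be checked before any data is brought into PET. The following is a minimal Python sketch and is not part of PET: the function name is hypothetical, the matrices are represented as plain lists of rows, and the error matrix is required to be greater than zero everywhere.

```python
import math


def check_pmf_inputs(data, error, row_index, col_index):
    """Return a list of problems found in the PMF inputs (empty if none).

    data and error are lists of rows; row_index and col_index are 1-D lists.
    """
    problems = []
    n_rows = len(data)
    n_cols = len(data[0]) if data else 0
    same_shape = len(error) == n_rows and all(
        len(row) == n_cols for row in list(data) + list(error)
    )
    if not same_shape:
        # The element-wise checks below assume matching dimensions.
        return ["data and error matrices do not have the same dimensions"]
    if any(math.isnan(v) for row in data for v in row):
        problems.append("data matrix contains NaN values")
    if any(math.isnan(v) for row in error for v in row):
        problems.append("error matrix contains NaN values")
    # NaN compares False here, so NaNs are reported only by the check above.
    if any(v <= 0 for row in error for v in row):
        problems.append("error matrix contains values that are not greater than zero")
    if len(row_index) != n_rows:
        problems.append("row index wave length does not match the number of rows")
    if len(col_index) != n_cols:
        problems.append("column index wave length does not match the number of columns")
    return problems
```

Negative values in the data matrix are deliberately not reported, because the data matrix is allowed to contain them.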
  
==== ToF CIMS and non AMS/ACSM data ====
+
==== ToF AMS, Q-AMS & ToF ACSM, Q-ACSM ====
 
 
Please refer to controls in Tofware.
 
 
 
==== ToF AMS, Q-AMS ACSM ====
 
  
 
* Regardless of whether the data were generated from a quadrupole or ToF instrument, users should select the MS airbeam correction option when generating the matrices. This is because the error matrices will be generated using the airbeam correction factor if it exists. In the various analysis software packages users may generate the data in units of ug/m3 or Hz. An Igor text file can be generated to facilitate the exporting of all relevant input information for PET.
 
* Regardless of whether the data were generated from a quadrupole or ToF instrument, users should select the MS airbeam correction option when generating the matrices. This is because the error matrices will be generated using the airbeam correction factor if it exists. In the various analysis software packages users may generate the data in units of ug/m3 or Hz. An Igor text file can be generated to facilitate the exporting of all relevant input information for PET.
Line 253: Line 271:
  
 
* Buttons are provided for the export of data to PET.
 
* Buttons are provided for the export of data to PET.
 +
 +
* It is important to know that the CE and RIE are *NOT* applied to the Org and Org error matrices or to the HROrg and HROrg error matrices if you choose units of ug/m3.  The general convention is that whenever Squirrel or Pika outputs anything with an m/z dimension the units are NO3-equivalent.  By using nitrate-equivalent mass, we are scaling all ions equally (i.e. not scaling differently according to which species each ion contributes to). This means that the mass spectra in ug/m3 units are identical to those in Hz units except for a single scaling factor.  The thinking behind this convention is that whenever we have an m/z dimension, such as with an average mass spectrum, we can sum the unit resolution at any nominal mass to compare to the raw spectra.
  
 
===== In the Q-AMS Analysis Software (James') =====
 
===== In the Q-AMS Analysis Software (James') =====
Line 264: Line 284:
 
* You may want to load the saved waves into a new experiment to run PMF.
 
* You may want to load the saved waves into a new experiment to run PMF.
 
** You'll also need to include your time series wave (t_series) and a wave of the m/z's in the matrix (amus).
 
** You'll also need to include your time series wave (t_series) and a wave of the m/z's in the matrix (amus).
 +
<!-- [[#Deleting NaNs/zeros for all Instruments|Continue to the Deleting NaNs/zeros for all Instruments section]]  -->
  
[[#Deleting NaNs/zeros for all Instruments|Continue to the Deleting NaNs/zeros for all Instruments section]]
+
====== Recommended Practice: Removing Spikes (generally, for Quadrupole data) ======
 
+
* "Spikes" in the time series of an ''m/z'' can occur in Q-AMS data from large but infrequent particles during the scanning of the quadrupole.  If such spikes have a common source with a factor that can be retrieved by PMF, they may increase the variation of that factor profile and additional factors may be found that represent this variation, but not a physically-meaningful, separate component.  The "excess signal" from these spikes can be subtracted from the spikes and the average mass spectrum of the spikes examined.  See [http://cires.colorado.edu/jimenez/ams-papers.html#ZhangEST05 Zhang, Q. et al., ES&T, 2005].
====== Recommended Practice: Removing Spikes ======
 
"Spikes" in the time series of an ''m/z'' can occur in Q-AMS data from large but infrequent particles during the scanning of the quadrupole.  If such spikes have a common source with a factor that can be retrieved by PMF, they may increase the variation of that factor profile and additional factors may be found that represent this variation, but not a physically-meaningful, separate component.  The "excess signal" from these spikes can be subtracted from the spikes and the average mass spectrum of the spikes examined.  See [http://cires.colorado.edu/jimenez/ams-papers.html#ZhangEST05 Zhang, Q. et al., ES&T, 2005].
 
  
Note that if you remove "excess signal" in the method of Zhang et al. 2005 and leave the error values for these points unchanged, you are automatically "downweighting" these points in PMF.  This is appropriate because the replacement value for the original spike is not known as well as the values for points without spikes.
+
* Note that if you remove "excess signal" in the method of Zhang et al. 2005 and leave the error values for these points unchanged, you are automatically "downweighting" these points in PMF.  This is appropriate because the replacement value for the original spike is not known as well as the values for points without spikes.
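
The "downweighting" can be made concrete with the quantity that PMF minimizes, Q, which sums the squared residuals after each is divided by its error value. The sketch below uses hypothetical numbers: the same residual is evaluated once with an error typical of an ordinary point and once with the larger error retained from a removed spike.

```python
def q_contribution(measured, modeled, sigma):
    """Contribution of one matrix element to Q: ((measured - modeled) / sigma)^2."""
    return ((measured - modeled) / sigma) ** 2


# Hypothetical values: after spike removal the data point is 5.0 and the
# model reconstructs 4.0, so the residual is the same in both cases.
normal = q_contribution(5.0, 4.0, 1.0)    # error typical for a value near 5
despiked = q_contribution(5.0, 4.0, 3.0)  # error left unchanged from the spike
```

Because the retained error is larger, the despiked point contributes less to Q for the same residual, so it has less influence on the fit.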
  
 
====== Optional Practice: Smoothing ======
 
====== Optional Practice: Smoothing ======
Smoothing can be used to reduce high-frequency noise in the data that could also be fit as additional factors.  If you smooth the data, be sure to propagate this smoothing in the error matrix ''(not added to the wiki yet)''.
+
* Smoothing can be used to reduce high-frequency noise in the data that could also be fit as additional factors.  If you smooth the data, be sure to propagate this smoothing in the error matrix ''(not added to the wiki yet)''.
 
+
<!-- [[#Deleting NaNs/zeros for all Instruments|Continue to the Deleting NaNs/zeros section]] -->
[[#Deleting NaNs/zeros for all Instruments|Continue to the Deleting NaNs/zeros for all Instruments section]]
+
<!--
 
 
<--
 
 
==== In SQUIRREL ====
 
==== In SQUIRREL ====
  
Line 284: Line 301:
 
* If the error matrix has not yet been calculated, this must be done first. In the Corrections tab, Errors sub-tab, check the calc MS errors checkbox.  Make sure you don't intend to do any other calculations (such as creating a new AB correction factor) in the corrections tab and then press the Do Corrections button.  This will generate the MSSDiff_p_err data set.   
 
* If the error matrix has not yet been calculated, this must be done first. In the Corrections tab, Errors sub-tab, check the calc MS errors checkbox.  Make sure you don't intend to do any other calculations (such as creating a new AB correction factor) in the corrections tab and then press the Do Corrections button.  This will generate the MSSDiff_p_err data set.   
  
* Squirrel verions 1.46 - 1.48
+
* Squirrel versions 1.46 - 1.48
 
In these versions the creation and export of the waves needed for PMF analysis is performed in several steps.
 
In these versions the creation and export of the waves needed for PMF analysis is performed in several steps.
 
** The Squirrel interface for generating the Org and Org error matrices has changed slightly before and after versions 1.46 - 1.48. In all versions one goes to the MS tab, average mass spectra section, and enters (only) Org as a species.  In versions 1.46 and higher one needs to check the 'Calc, plot 1sigma err' checkbox, optionally use the 'Prompt for max m/z' checkbox, and then press the 'Calc Time Series Spectra' button (v. 1.47+).
 
** The Squirrel interface for generating the Org and Org error matrices has changed slightly before and after versions 1.46 - 1.48. In all versions one goes to the MS tab, average mass spectra section, and enters (only) Org as a species.  In versions 1.46 and higher one needs to check the 'Calc, plot 1sigma err' checkbox, optionally use the 'Prompt for max m/z' checkbox, and then press the 'Calc Time Series Spectra' button (v. 1.47+).
Line 322: Line 339:
 
In Pika 1.09F there is a handy checkbox that when checked will automatically generate an itx containing all the waves necessary for the PMF (with the exception of any data preparation steps such as the minimum error, Nan removal, etc.)
 
In Pika 1.09F there is a handy checkbox that when checked will automatically generate an itx containing all the waves necessary for the PMF (with the exception of any data preparation steps such as the minimum error, Nan removal, etc.)
  
* It is important to know that the RIE and CE are *NOT* applied to the HROrg and HROrg error matrix if you choose unit so ug/m3.  The general convention is that whenever Squirrel or Pika outputs anything with an m/z dimension the units are NO3-equivalent.  By using nitrate equivalent mass, we are scaling all ions equally (i.e. not scaling differently according to which species each ion contributes to). This means that the mass spectra in ugm3 units are identical to those in Hz units except for a single scaling factor.  The thinking behind this convention is that whenever we have an m/z dimension, such as with an average mass spectra, we can sum the unit resolution at any nominal mass to compare to the unspeciated signal.
 
 
-->
 
-->
 +
 +
==== Tofware for ToF CIMS and ACSM data ====
 +
 +
Controls in Tofware allow the export of all necessary Igor waves into one external file.
 +
 
=== For AMS Organics + Other Signals including AMS Inorganics ===
 
=== For AMS Organics + Other Signals including AMS Inorganics ===
  
Setting up the signal and error matricies for org+inorg is not terribly difficult.  The interpretation is much more complicated and nuanced than for a PMF analysis on only the organics. More on this in the * and ** sections below.  
+
Setting up the signal and error matrices for org+inorg is not terribly difficult.  The interpretation is much more complicated and nuanced than for a PMF analysis on only the organics. One can look through 2009 - 2019 [https://cires1.colorado.edu/jimenez-group/wiki/index.php/AMSUsrMtgs AMS Users Meeting talks] by searching for "PMF" to review many options that have been presented and subsequently published.
 
+
<!--
 
Adding other signals to the organic matrix and error matrix is done most simply by a cut and paste operation.  In Igor make 2 tables: one containing the organic matrix, the other containing the error matrix.  You will cut and paste columns (columns correspond to the time series in the 2D matrices) to the organic and error matrices.  In the case of UMR you will likely have to add new columns to the organic matrix at the right side end, after the max m/z, and keep track of which columns indicate which species m/z.  This is because, for example, m/z 30 will have independent signals from org and nitrate.
 
Adding other signals to the organic matrix and error matrix is done most simply by a cut and paste operation.  In Igor make 2 tables: one containing the organic matrix, the other containing the error matrix.  You will cut and paste columns (columns correspond to the time series in the 2D matrices) to the organic and error matrices.  In the case of UMR you will likely have to add new columns to the organic matrix at the right side end, after the max m/z, and keep track of which columns indicate which species m/z.  This is because, for example, m/z 30 will have independent signals from org and nitrate.
  
Line 355: Line 376:
 
** A small but tedious detail to be aware of is the H2O contributions for sulfate and organics. If you do an organic+sulfate+xx PMF analysis I'd just remove all signals of OH and H2O (and O?).  We estimate these HR ion contributions from the major signals (CO2, etc) anyway.
 
** A small but tedious detail to be aware of is the H2O contributions for sulfate and organics. If you do an organic+sulfate+xx PMF analysis I'd just remove all signals of OH and H2O (and O?).  We estimate these HR ion contributions from the major signals (CO2, etc) anyway.
  
==== For non-AMS Users ====
+
==== For non-AMS/ACSM Users ====
  
 
You will need to create 5 waves to run your PMF analysis in Igor:
 
You will need to create 5 waves to run your PMF analysis in Igor:
Line 363: Line 384:
 
* a '''numeric''' wave that has index values for your species (1, 2, 3, ...)
 
* a '''numeric''' wave that has index values for your species (1, 2, 3, ...)
 
* a time wave
 
* a time wave
 
+
-->
 +
<!--
 
This section describes how you might make those waves, assuming your data is in Excel and is loaded to Igor with one wave for each species and another wave for each species' error values.  Depending on the initial format of your data, some steps may not apply to you.
 
This section describes how you might make those waves, assuming your data is in Excel and is loaded to Igor with one wave for each species and another wave for each species' error values.  Depending on the initial format of your data, some steps may not apply to you.
  
Line 417: Line 439:
  
 
  make/N=(numpnts(year))/D timeseries = date2secs(year,month,day) + hour*60*60 + minute*60 + second
 
  make/N=(numpnts(year))/D timeseries = date2secs(year,month,day) + hour*60*60 + minute*60 + second
 +
-->
  
=== Further Preparation of the Error Matrix ===
+
=== Further Data Preparation Before Calling PMF ('''Prep''' tab in the PET panel) ===
 +
* The following steps are recommended for AMS datasets and follow the practices laid out in [http://www.atmos-chem-phys.net/9/2891/2009/acp-9-2891-2009.html| Ulbrich et al., ACP, 2009] (with more detailed references in each section below).  Note that '''the only mandatory steps''' are step 0 (importing data into the experiment and selecting it) and step 4 (Deleting NaNs/zeros). More extensive documentation on use of the remove/delete NaN functions is given in the header comments in the ipf pmf_ErrPrep_AMS_v*.ipf.
  
The following steps are recommended for AMS datasets and follow the practices laid out in [http://www.atmos-chem-phys.net/9/2891/2009/acp-9-2891-2009.html| Ulbrich et al., ACP, 2009] (with more detailed references in each section below).  Note that '''the only mandatory step''' is [[#Deleting NaNs/zeros for all Instruments|Deleting NaNs/zeros]].  The functions for the error modifications can be found in [[#PET Software | pmf_ErrPrep_AMS_v2_3.ipf]].  More extensive documentation on use of the functions is given in the headers in that file.
+
''Before running these functions, you may want to rename your error matrix so that it has fewer than 12 characters. Each data prep step lengthens the wavename and the code will complain if the name gets too long.''
  
''Before running these functions, you should duplicate your error matrix and give it a short name (fewer than 12 characters). Each function lengthens the wavename and the functions will complain if the name gets too long.''
+
==== Data Prep Step 0. Select Initial data ====
 +
* From the top Igor menu, choose Data - Load Waves and load the Igor file that you exported from the analysis software (Tofware, Squirrel, etc.). It is often good practice to create a data folder to contain these newly imported data waves.
  
==== Recommended Practice: Set a Minimum Error ====
+
* Select the data folder containing the PMF input and the subsequent DATA, ERROR, etc. waves.
  
Ions arrive at the mass spectrometer detector with a [http://en.wikipedia.org/wiki/Poisson_distribution| Poisson distribution].  The error for a counted number of ions is sqrt(counted number of ions). The smallest number of ions we can count in one run is, of course, zero ions, but perhaps there was one and it was missed.  The error for counting zero is sqrt(0), but an error of 1 would be more appropriate in this case.  Hence a minimum error threshold of 1 ion is set.
+
==== Recommended Practice: Data Prep Step 1. Apply a Minimum Error ====
 +
* Ions arrive at the mass spectrometer detector with a [http://en.wikipedia.org/wiki/Poisson_distribution| Poisson distribution].  The error for a counted number of ions is sqrt(counted number of ions). The smallest number of ions we can count in one run is, of course, zero ions, but perhaps there was one and it was missed.  The error for counting zero is sqrt(0), but an error of 1 would be more appropriate in this case.  Hence a minimum error threshold of 1 ion is set.
  
Within this step the user must choose the mass spec detector type.  For ToF detectors, there is an m/z dependence to account for the unequal 'pushing' of the ions into the detector. Small m/z ions move faster than large m/z at an approximate rate of square root of the mass. This phenomena is sometimes referred to as an 'm/z duty cycle' correction.   The application of a minimum error must undo this duty cycle correction if it was applied to the data to reflect true ions counted.  In practice then, when one selects the AMS-ToF for the MS type, a sqrt(28)/sqrt(m/z) factor is temporarily multiplied to the error matrix. For AMS-quad data and for tofware-processed CIMS data, this undoing of the correction factor is not needed.
+
* Within this step the user must choose the mass spec detector type.  For ToF detectors, there is an m/z dependence to account for the unequal 'pushing' of the ions into the detector. Small m/z ions move faster than large m/z ions, by a factor that scales approximately with the square root of the mass. This phenomenon is sometimes referred to as an 'm/z duty cycle' correction. If this duty cycle correction was applied to the data, the application of a minimum error must undo it so that the errors reflect true ions counted.  In practice then, when one selects AMS-ToF for the MS type, the error matrix is temporarily multiplied by a factor of sqrt(28)/sqrt(m/z). For AMS-quad data and for Tofware-processed CIMS data, this undoing of the correction factor is not needed.
  
The one ion wave needs to be in the same units as the data and error matricies.  If the data and error matricies are in units of Hz and the data acquisition (the actual time measuring i.e. in the 'MS open' mode) for each run is 1 second, then the one ion wave is simply a wave containing all 1s.  If each run had an MS open mode lasting one minute, the wave would consist of 1/60.  If the data and error matricies are in typical AMS units of ug/m3 then one needs to convert from Hz using the flow rate, ionization efficiency, AB correction factor, etc.
+
* The one ion wave needs to be in the same units as the data and error matrices.  If the data and error matrices are in units of Hz and the data acquisition time (for AMS, the actual time measuring, i.e., in the 'MS open' mode) for each run is 1 second, then the one ion wave is simply a wave containing all 1s.  If each run had an MS open mode lasting one minute, the wave would consist of values of 1/60.  If the data and error matrices are in typical AMS units of ug/m3, then one needs to convert from Hz using the flow rate, ionization efficiency, AB correction factor, etc.  This will have been done for you in the PMF export step in the Squirrel, Pika, or ACSM Tofware code.
  
 +
* Pressing the "Apply minimum error" button produces the following waves:
 +
# The error matrix with minimum error applied called nameOfWave(errMx)+"_min" (e.g., Orgerr becomes Orgerr_min)
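
The minimum-error logic described above can be sketched outside Igor. The following Python is only an illustration under stated assumptions, not the PET code: the function names are hypothetical, the floor is taken to be an element-wise maximum of the error and the one-ion value, and "temporarily multiplied" is read as multiply, apply the floor, then divide by the same factor again.

```python
import math


def one_ion_value_hz(open_seconds):
    """Value of one counted ion, in Hz, for a run with the given MS-open sampling time."""
    return 1.0 / open_seconds


def apply_min_error(err_row, mz_values, one_ion, tof=False):
    """Return a copy of one error-matrix row with a one-ion floor applied.

    Assumption: for ToF data the error is temporarily multiplied by
    sqrt(28)/sqrt(m/z), floored at one ion, and then divided by the
    same factor so the result is back in the original units.
    """
    floored_row = []
    for err, mz in zip(err_row, mz_values):
        factor = math.sqrt(28.0) / math.sqrt(mz) if tof else 1.0
        floored = max(err * factor, one_ion)
        floored_row.append(floored / factor)
    return floored_row
```

For example, a run with a 60 second MS-open period gives a one-ion value of 1/60 Hz, and an error of 0 in any column is raised to that value.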
  
 +
<!--
 
The minimum error is applied in three steps:
 
The minimum error is applied in three steps:
  
Line 442: Line 471:
 
# The error matrix with minimum error applied called nameOfWave(errMx)+"_min" (e.g., OrgMSerr becomes OrgMSerr_min)
 
# The error matrix with minimum error applied called nameOfWave(errMx)+"_min" (e.g., OrgMSerr becomes OrgMSerr_min)
 
# A matrix of the fractional increase of the errors called nameOfWave(errMx)+"_adjErrMask" where the value of each point is  (new/old)-1.
 
# A matrix of the fractional increase of the errors called nameOfWave(errMx)+"_adjErrMask" where the value of each point is  (new/old)-1.
 +
-->
 +
''See also'' discussion of Ulbrich et al., ACPD 2008, [http://www.cosis.net/copernicus/EGU/acpd/8/S2726/acpd-8-S2726.pdf?PHPSESSID=511a6d01a82e0953a68adf517ffd08f8 P. Paatero comment (p. S5730)] and [http://www.cosis.net/copernicus/EGU/acpd/8/S11954/acpd-8-S11954.pdf?PHPSESSID=511a6d01a82e0953a68adf517ffd08f8 Author response (p. S11960)]
  
''See also'' discussion of Ulbrich et al., ACPD 2008, [http://www.cosis.net/copernicus/EGU/acpd/8/S2726/acpd-8-S2726.pdf?PHPSESSID=511a6d01a82e0953a68adf517ffd08f8 P. Paatero comment (p. S5730)] and [http://www.cosis.net/copernicus/EGU/acpd/8/S11954/acpd-8-S11954.pdf?PHPSESSID=511a6d01a82e0953a68adf517ffd08f8 Author response (p. S11960)]
+
==== Data Prep Step 2. Removing Spikes (data dependent) ====
 +
 
 +
* Pressing the "Remove spikes (optional)" button generates new data and error matrices with '''spikes''' removed.
  
==== Propagation of Smoothing (when relevant) ====
+
==== Data Prep Step 3. Propagation of Smoothing (generally not relevant for ToF data) ====
Any smoothing of the data matrix must be propagated in the error matrix.  The function '''pmf_err_propogateSmooth''' propagates box or Gaussian smoothing.   
+
*Any smoothing of the data matrix must be propagated in the error matrix.  The function '''pmf_err_propogateSmooth''' propagates box or Gaussian smoothing.   
  
 
Some notes about specifying the smoothing that was performed for the data:
 
Some notes about specifying the smoothing that was performed for the data:
Line 455: Line 488:
 
#* For Gaussian smoothing, the number of points refers to the number of adjacent points used.  I.e., smoothing that includes 1 adjacent point on each side is 1-point smoothing.
 
#* For Gaussian smoothing, the number of points refers to the number of adjacent points used.  I.e., smoothing that includes 1 adjacent point on each side is 1-point smoothing.
  
The function produces a wave with the propagated error called nameOfWave(errMx)+"Prop" (e.g., OrgMSerr_Min becomes OrgMSerr_minProp).
+
* Pressing the "Propogate smoothing to the error mx" button produces a wave with the propagated error called nameOfWave(errMx)+"Prop" (e.g., OrgMSerr_min becomes OrgMSerr_minProp).
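
As a rough illustration of what propagating smoothing means, the sketch below applies standard error propagation to box smoothing of a single time series. It is not the '''pmf_err_propogateSmooth''' code: the function name is hypothetical, the errors are assumed to be independent, the box width is assumed to be odd, and the window is simply truncated at the ends of the series.

```python
import math


def propagate_box_smooth(errors, width):
    """Propagate an odd-width box (moving-average) smooth through a list of errors.

    Each smoothed point is the mean of n values, so for independent errors
    its uncertainty is sqrt(sum of squared errors) / n.
    """
    half = width // 2
    propagated = []
    for i in range(len(errors)):
        window = errors[max(0, i - half): i + half + 1]  # truncated at the edges
        n = len(window)
        propagated.append(math.sqrt(sum(e * e for e in window)) / n)
    return propagated
```

With equal errors, a 3-point box smooth reduces the error of an interior point by a factor of sqrt(3), which is the expected gain from averaging three independent measurements.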
  
==== Deleting NaNs/zeros for all Instruments====
+
==== Data Prep Step 4. Removing NaNs and 0 columns ====
 +
*This step is required for all data.
  
The matrices produced in the previous steps have NaNs in all rows from bad runs and 0 values in columns with good runs that have no organic fragments.  All of these rows and columns need to be removed before running PMF.  This is a two-step process.
+
* The data matrix produced in the previous steps may have NaNs in all rows from bad runs and 0 values in columns with good runs that have no fragments of interest (e.g., argon at m/z 40 for AMS data).  All of these rows and columns need to be removed before running PMF.   
 +
 
 +
* The function produces a wave with the NaN rows and 0 columns removed, called nameOfWave(errMx)+"_noNans" (e.g., OrgMSerr_min becomes OrgMSerr_min_noNans).
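
The row and column removal can be pictured with the short Python sketch below. It is an illustration only, with a hypothetical function name: it assumes the matrices are lists of rows (one row per run), drops any row that contains a NaN, drops any column that is 0 in every remaining row, and returns the kept indices so that the removed rows and columns could be reinserted later.

```python
import math


def remove_nan_rows_and_zero_columns(data, error):
    """Drop NaN rows and all-zero columns from a data matrix and its error matrix."""
    keep_rows = [i for i, row in enumerate(data)
                 if not any(math.isnan(v) for v in row)]
    n_cols = len(data[0]) if data else 0
    keep_cols = [j for j in range(n_cols)
                 if any(data[i][j] != 0 for i in keep_rows)]
    clean_data = [[data[i][j] for j in keep_cols] for i in keep_rows]
    clean_error = [[error[i][j] for j in keep_cols] for i in keep_rows]
    return clean_data, clean_error, keep_rows, keep_cols
```

The same row and column indices are applied to both matrices so that the data and error matrices keep identical dimensions, which PMF requires.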
 +
<!--
 +
This is a two-step process.
  
 
'''1.''' First, change the columns with 0's to NaNs.   
 
'''1.''' First, change the columns with 0's to NaNs.   
Line 549: Line 587:
 
; pmf_ams_RemoveBlockSize_NaNsWv(NaNsWv, BlockSizeToRemove)
 
; pmf_ams_RemoveBlockSize_NaNsWv(NaNsWv, BlockSizeToRemove)
 
:  Creates ''NaNsWv_''type''_trunc'' and deletes the records for original blocks of NaNs of size BlockSizeToRemove.  If you did V-W switching and are only running the V-mode data in PMF, your NaNsWv_tseries will have an entry for every other row of your original matrix noting that 1 NaN row was removed.  You may also have periods with longer contiguous runs of NaNs (e.g., instrument was down, filter runs, etc.).  For publication plots, you should insert NaNs into your final factor waves so that periods of missing data are not crossed by a line connecting the good data before and after the real gaps.  For NaNs removed due to menu switching, however, you may not want to insert all of these NaNs again.  (In the example case, each solved value would have a NaN on each side of it and you couldn't plot with Lines Between Points.  If you uncheck "Gaps" to connect these points, you'll also have the undesirable lines across periods of missing data.)  Running '''pmf_ams_removeBlockSize_NaNsWv(NaNsWv_tseries, 1)''' would create ''NaNsList_tseries_trunc'' which would not include a record of all the removed W runs and then could be used to insert NaNs representing only missing data periods in the final waves.
 
:  Creates ''NaNsWv_''type''_trunc'' and deletes the records for original blocks of NaNs of size BlockSizeToRemove.  If you did V-W switching and are only running the V-mode data in PMF, your NaNsWv_tseries will have an entry for every other row of your original matrix noting that 1 NaN row was removed.  You may also have periods with longer contiguous runs of NaNs (e.g., instrument was down, filter runs, etc.).  For publication plots, you should insert NaNs into your final factor waves so that periods of missing data are not crossed by a line connecting the good data before and after the real gaps.  For NaNs removed due to menu switching, however, you may not want to insert all of these NaNs again.  (In the example case, each solved value would have a NaN on each side of it and you couldn't plot with Lines Between Points.  If you uncheck "Gaps" to connect these points, you'll also have the undesirable lines across periods of missing data.)  Running '''pmf_ams_removeBlockSize_NaNsWv(NaNsWv_tseries, 1)''' would create ''NaNsList_tseries_trunc'' which would not include a record of all the removed W runs and then could be used to insert NaNs representing only missing data periods in the final waves.
 +
-->
 +
 +
==== Data Prep Steps 5. and 6. Recommended Practice: Downweight "Weak" Variables (''m/z'''s) ====
 +
 +
* Any m/zs that have low signal-to-noise ratio (SNR) may, in fact, have more noise than signal.  If these m/zs contribute enough Q, PMF tries to fit the noisy data.  In this way, the inclusion of such m/zs can be detrimental to the PMF analysis.  If the error associated with these ''m/z'' 's is increased, the Q-contribution (residual/error) is decreased, "downweighting" these points' contribution to the fit.  m/zs with SNR<0.2 are considered "bad" by Paatero and Hopke (2003) and should be removed or strongly downweighted (factor of ~10).  ''m/z'' 's with 0.2<SNR<2 are considered "weak" and should be downweighted (factor of 2-3).
  
==== Recommended Practice: Downweight "Weak" Variables (''m/z'''s)====
+
* When a user presses the "Calculate, view SNR panel" button, a separate panel appears that shows time series, errors, and SNR calculations. This tool is handy for examining ions that are deemed to be "bad" or "weak".  This panel is informational only and is used in step 6.
  
Any ''m/z'' 's that have low signal-to-noise ratio (SNR) may, in fact, have more noise than signal.  If these ''m/z'' 's contribute enough Q, PMF tries to fit the noisy data.  In this way, the inclusion of such ''m/z'' 's can be detrimental to the PMF analysis. If the error associated with these ''m/z'' 's is increased, the Q-contribution (residual/error) is decreased, "downweighting" these points' contribution to the fit.  ''m/z'' 's with SNR<0.2 are considered "bad" by Paatero and Hopke (2003) and should be removed or strongly downweighted (factor of ~10). ''m/z'' 's with 0.2<SNR<2 are considered "weak" and should be downweighted (factor of 2-3).
+
* When a user presses the "Downweight weak, bad mzs" button, the code generates a new error matrix in which the bad and weak m/z columns are adjusted. The new error matrix is called nameofWave(errMx)+"Wk" (e.g., noNaNs_orgMSerr_minProp would become noNaNs_orgMSerr_minPropWk).
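
The SNR thresholds above translate into a simple per-column rule. The Python below is a hedged sketch rather than the PET implementation: the function names are hypothetical, the SNR of each m/z is assumed to have been calculated already, and the default factors (2 for "weak", 10 for "bad") are taken from the ranges quoted above.

```python
def downweight_factor(snr, weak_factor=2.0, bad_factor=10.0):
    """Return the multiplier for one column's errors, given that column's SNR."""
    if snr < 0.2:
        return bad_factor   # "bad": strongly downweighted
    if snr < 2.0:
        return weak_factor  # "weak": mildly downweighted
    return 1.0              # adequate SNR: errors unchanged


def downweight_weak_columns(err_matrix, snr_per_column,
                            weak_factor=2.0, bad_factor=10.0):
    """Multiply each column of the error matrix by its SNR-based factor."""
    factors = [downweight_factor(s, weak_factor, bad_factor)
               for s in snr_per_column]
    return [[err * f for err, f in zip(row, factors)] for row in err_matrix]
```

Increasing the error rather than deleting the column keeps the m/z in the solution while reducing its Q-contribution.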
  
 +
<!--
 
The calculation of SNR and downweighting of "weak" ''m/z'''s is carried out in three steps:
 
The calculation of SNR and downweighting of "weak" ''m/z'''s is carried out in three steps:
# Calculate SNR of each ''m/z'' using function '''pmf_err_SNRwv''' using the data matrix, the version of the error matrix generated in the previous step, and the model error that will be used in the panel.  The function generates a wave of the SNR for each ''m/z'' called nameofwave(DataMx) + "_SNRwv".
+
# Calculate SNR of each m/z using function '''pmf_err_SNRwv''' using the data matrix, the version of the error matrix generated in the previous step, and the model error that will be used in the panel.  The function generates a wave of the SNR for each m/z called nameofwave(DataMx) + "_SNRwv".
 
# Check the graph produced for "bad" ''m/z'''s.  These are not removed in the next function.  To remove these columns, you'll need to rerun the ''DeleteNaNs'' step after making them into NaNs or changing them in the mask wave.  (This is better than just deleting the columns by hand because they'll be added to the ''NaNsList_amus'' and will be reinserted if you '''insertNaNs''' later.)
 
# Check the graph produced for "bad" ''m/z'''s.  These are not removed in the next function.  To remove these columns, you'll need to rerun the ''DeleteNaNs'' step after making them into NaNs or changing them in the mask wave.  (This is better than just deleting the columns by hand because they'll be added to the ''NaNsList_amus'' and will be reinserted if you '''insertNaNs''' later.)
# Downweight "weak" ''m/z'' 's with function '''pmf_err_DwntWeakColumns''' using the SNRwv generated in the previous step, the error matrix used to calculate the SNRwv, and the multiplicative value used to downweight the weak ''m/z'' 's (Paatero and Hopke recommend 2-3).
+
# Downweight "weak" m/zs with function '''pmf_err_DwntWeakColumns''' using the SNRwv generated in the previous step, the error matrix used to calculate the SNRwv, and the multiplicative value used to downweight the weak ''m/z'' 's (Paatero and Hopke recommend 2-3).
  
  
The function generates a new error matrix called nameofWave(errMx)+"Wk" (e.g., noNaNs_orgMSerr_minProp would become noNaNs_orgMSerr_minPropWk).
 
  
 +
-->
 +
''See also'' Paatero, P., and Hopke, P. K.: Discarding or downweighting high-noise variables in factor analytic models, Anal. Chim. Acta, 490, 277-289, 10.1016/s0003-2670(02)01643-4, 2003.  [http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TF4-47RJM08-2&_user=918210&_coverDate=08/25/2003&_rdoc=24&_fmt=high&_orig=browse&_srch=doc-info(%23toc%235216%232003%23995099998%23448216%23FLA%23display%23Volume)&_cdi=5216&_sort=d&_docanchor=&_ct=33&_acct=C000047944&_version=1&_urlVersion=0&_userid=918210&md5=884956903b4a3f2da627f57207589050 Abstract]
  
''See also'' Paatero, P., and Hopke, P. K.: Discarding or downweighting high-noise variables in factor analytic models, Anal. Chim. Acta, 490, 277-289, 10.1016/s0003-2670(02)01643-4, 2003.  [http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TF4-47RJM08-2&_user=918210&_coverDate=08/25/2003&_rdoc=24&_fmt=high&_orig=browse&_srch=doc-info(%23toc%235216%232003%23995099998%23448216%23FLA%23display%23Volume)&_cdi=5216&_sort=d&_docanchor=&_ct=33&_acct=C000047944&_version=1&_urlVersion=0&_userid=918210&md5=884956903b4a3f2da627f57207589050 Abstract]
+
==== Data Prep Step 7. Recommended Practice: Downweight Peaks Related to ''m/z'' 44 in Frag Table (for AMS and ACSM data only) ====
  
==== Recommended Practice: Downweight Peaks Related to ''m/z'' 44 in Frag Table ====
+
* In the default fragmentation table, the information in ''m/z'' 44 is repeated 6 or 7 times (in ''m/z'' 's 16, 17, 18, 19, 20, (28 in the Aiken et al. 2008 revision), and 44) with different proportionalities.  PMF fits correlations, regardless of the magnitudes of the signals.  Repeating the information of ''m/z'' 44 several times implies that it's really (x6 or 7) important, which it isn't!  It is possible to downweight the columns of these ''m/z'' 's so that in total they only contribute the ''m/z'' 44 signal once.  (It would be possible to remove the repeated information and replace those columns after running PMF, but we think that downweighting them is logistically simpler.)
  
In the default fragmentation table, the information in ''m/z'' 44 is repeated 6 or 7 times (in ''m/z'' 's 16, 17, 18, 19, 20, (28 in the Aiken et al. 2008 revision), and 44) with different proportionalities.  PMF fits correlations, regardless of the magnitudes of the signals.  Repeating the information of ''m/z'' 44 several times implies that it's really (x6 or 7) important, which it isn't!  It is possible to downweight the columns of these ''m/z'' 's so that in total they only contribute the ''m/z'' 44 signal once. (It would be possible to remove the repeated information and replace those columns after running PMF, but we think that downweighting them is logistically simpler.) 
+
* There are two buttons, one for UMR AMS or ACSM data and another for HR AMS or ACSM data. Once you press the button, the list of duplicate ions will be filled in. If you are analyzing HR data and using the default HR organics frag waves, the equivalent set of HR ions (the mz44list) corresponds to O, HO, H2O, CO and CO2.  If you are analyzing organic UMR data, the default list will include 16, 17, 18, 19, 20, 28, 44.  Then press the '''Downweight duplicated columns''' button.  
  
 +
* The function generates a new error matrix called nameofWave(errMx)+"44" (e.g., noNaNs_orgMSerr_minPropWk would become noNaNs_orgMSerr_minPropWk44).
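
One way to make the duplicated columns contribute the ''m/z'' 44 information only once is sketched below in Python. This is an assumption-laden illustration, not the PET code: the function name is hypothetical, and the factor sqrt(n) (with n the number of duplicated columns actually present) is chosen because Q sums the squared scaled residuals, so multiplying the errors of n equivalent columns by sqrt(n) reduces their combined Q-contribution to that of a single column.

```python
import math


def downweight_duplicates(err_matrix, mz_values, duplicate_mz):
    """Multiply the errors of duplicated columns by sqrt(number of duplicates present)."""
    duplicates = set(duplicate_mz)
    n_present = sum(1 for mz in mz_values if mz in duplicates)
    factor = math.sqrt(n_present) if n_present else 1.0
    return [[err * factor if mz in duplicates else err
             for err, mz in zip(row, mz_values)]
            for row in err_matrix]
```

For example, if only ''m/z'' 16, 17, 18 and 44 remain in the matrix after removing NaNs and 0 columns, their errors are doubled and all other columns are left unchanged.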
 +
<!--
 
Downweighting the ''m/z'' 's related to ''m/z'' 44 is accomplished in two steps:
 
Downweighting the ''m/z'' 's related to ''m/z'' 44 is accomplished in two steps:
 
# Make a wave that contains the ''m/z'' 's related to m/z 44 (in any order), e.g.
 
# Make a wave that contains the ''m/z'' 's related to m/z 44 (in any order), e.g.
Line 576: Line 623:
 
:: make/N=7 mz44peaksWv = {16, 17, 18, 19, 20, 28, 44}
 
:: make/N=7 mz44peaksWv = {16, 17, 18, 19, 20, 28, 44}
 
# Use function '''pmf_err_dnwt44peaks''' with the error matrix generated in the previous step, the ''mz44peaksWv'', and the ''noNaNs_amus'' wave.
 
# Use function '''pmf_err_dnwt44peaks''' with the error matrix generated in the previous step, the ''mz44peaksWv'', and the ''noNaNs_amus'' wave.
 +
-->
 +
 +
==== Data Prep Conclusion: Indicate that all Data Prep Steps have been Completed ====
  
The function generates a new error matrix called nameofWave(errMx)+"44" (e.g., noNaNs_orgMSerr_minPropWk would become noNaNs_orgMSerr_minPropWk44).  
+
* Press the gold "Select prepared waves for PMF" button.  The second tab of the PET panel, "Run", will become active.
  
If you are analyzing HR data and using the default HR organics frag waves, the equivalent set of HR ions of (the mz44list) correspond to O, HO, H2O, CO and CO2.
+
== Perform PMF Analysis  ==
  
== Perform PMF Analysis *Step 1* ==
+
* At this point you are ready to indicate where the PMF executable file resides, where the prepared data and error matrices reside within the Igor experiment, and which PMF solutions you want to investigate.
  
 +
Press the "Create path to PMF executable and batch files" button. A window will pop up asking you where the PMF executable file resides. The path will be displayed once a user has selected a folder. If this path text is red, something is amiss and the code couldn't find the PMF executable.
 +
 +
If you pressed the gold button "Select prepared waves for PMF" in the previous data prep tab, entries in the data section will be filled in for you.
 +
 +
<!--
 
=== Additional File for PMF Executable Directory ===
 
=== Additional File for PMF Executable Directory ===
  
Line 607: Line 662:
 
#*Choose the folder (must be root: or a directory in root:)
 
#*Choose the folder (must be root: or a directory in root:)
 
#*Choose the Data and Error matrices (use noNaNs_ versions)
 
#*Choose the Data and Error matrices (use noNaNs_ versions)
#*Choose model error (PMF increases the errors provided by newError = oldError + modelError*dataValue)  
+
-->
 +
#*Choose model error  
 +
#**It is very rare for users to change this model error value from the default of 0.  (PMF increases the errors provided by newError = oldError + modelError*dataValue)  
 
# Choose the type of PMF analysis
 
# Choose the type of PMF analysis
 
#* Exploration will run PMF for a range of number of factors and FPEAKs or SEEDs.  This is the typical use for exploring a dataset and comparing many solutions.
 
#* Exploration will run PMF for a range of number of factors and FPEAKs or SEEDs.  This is the typical use for exploring a dataset and comparing many solutions.
 
#* Bootstrapping explores the uncertainty of '''one''' solution (i.e., ''one'' number of factors at ''one'' fpeak for ''one'' seed).  This is usually a final step run only on the solution you have selected from the exploratory analysis.
 
#* Bootstrapping explores the uncertainty of '''one''' solution (i.e., ''one'' number of factors at ''one'' fpeak for ''one'' seed).  This is usually a final step run only on the solution you have selected from the exploratory analysis.
 
# Choose a range for number of factors.   
 
# Choose a range for number of factors.   
#* When checking to make sure that everything runs properly, you may want to run just one case (Min p = 2, Max p = 2).
+
#* When checking to make sure that everything runs properly, you may want to run just one case (Min p = 2, Max p = 2; p is the number of solved factors).
 
#* Recommended Practice: Run cases with 1 factor to have a context for the meaning of the 2-factor solution.
 
#* Recommended Practice: Run cases with 1 factor to have a context for the meaning of the 2-factor solution.
 
#* In the Bootstrapping mode, only the "min p" is read; the "max p" is ignored.
 
#* In the Bootstrapping mode, only the "min p" is read; the "max p" is ignored.
Line 627: Line 684:
 
# Select checkboxes
 
# Select checkboxes
 
#* '''Run PMF in background'''  Each execution of PMF (see [[#Exploration Mode Summary| Exploration Mode Summary]]) creates a black DOS window that pops up.  If the box is not checked, this window "grabs the focus" and makes itself the top window.  This makes it hard to use the computer for anything else.  If the box is checked, the window will not grab focus, but Igor and your computer's CPU will be busy.
 
#* '''Run PMF in background'''  Each execution of PMF (see [[#Exploration Mode Summary| Exploration Mode Summary]]) creates a black DOS window that pops up.  If the box is not checked, this window "grabs the focus" and makes itself the top window.  This makes it hard to use the computer for anything else.  If the box is checked, the window will not grab focus, but Igor and your computer's CPU will be busy.
 
+
#* '''Save experiment before and after PMF has finished''' It is generally a good idea to save the experiment before calling PMF so that all the data prep steps and user selections are saved before attempting to call the PMF executable.  Once the code has finished all its calls to the PMF executable, it is a good idea to save the experiment again because all the PMF results are now saved as waves within the experiment.
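
The model error option in the list above follows the relation newError = oldError + modelError*dataValue. A minimal Python sketch of that relation is given here for illustration only; the function name is hypothetical and the matrices are assumed to be lists of rows with identical dimensions.

```python
def add_model_error(data, error, model_error=0.0):
    """Return the error matrix increased by model_error times the data value."""
    return [[e + model_error * x for x, e in zip(data_row, error_row)]
            for data_row, error_row in zip(data, error)]
```

With the default model error of 0, the error matrix is returned unchanged, which is why most users never need to touch this setting.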
  
 
==== What the Software Does When You Press the Button... ====
 
==== What the Software Does When You Press the Button... ====
  
 
===== Exploration Mode Summary =====
 
===== Exploration Mode Summary =====
The software will execute PMF once for every combination of number of factors and FPEAK/seed.  So if you run 1-5 factors and 5 FPEAKs, PMF will run 5x5=25 times.  Each run starts a new black DOS window that will close when the run is completed.  The duration of each run is printed in the history at the end of each run.  In general, runs which solve for more factors and runs with FPEAK farther from 0 take longer.  The code runs all of the FPEAKS or seeds for one number of factors, then advances to the next number of factors (e.g., run 1 factor with each of 5 FPEAK values, then 2 factors with each of 5 FPEAK values, etc.).
+
The software will execute PMF once for every combination of number of factors and FPEAK/seed.  So if you run 1-6 factors and 5 FPEAKs, PMF will run 6x5=30 times.  Each run starts a new black DOS window that will close when the run is completed.  The duration of each run is printed in the history at the end of each run.  In general, runs which solve for more factors and runs with FPEAK farther from 0 take longer.  The code runs all of the FPEAKs or seeds for one number of factors, then advances to the next number of factors (e.g., run 1 factor with each of 5 FPEAK values, then 2 factors with each of 5 FPEAK values, etc.).
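
The run order described above amounts to a pair of nested loops, with the number of factors in the outer loop. The Python below is just a sketch of that ordering with a hypothetical function name; it does not call PMF.

```python
def exploration_runs(min_p, max_p, fpeaks):
    """List (number of factors, FPEAK) pairs in the order the runs are executed."""
    return [(p, fpeak)
            for p in range(min_p, max_p + 1)  # outer loop: number of factors
            for fpeak in fpeaks]              # inner loop: FPEAK (or seed) values
```

Calling this with factors 1 to 6 and five FPEAK values yields 30 pairs, starting with all five FPEAK values for the 1-factor case.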
 +
 
 +
When all of the calls to the PMF executable have completed, many diagnostic values are calculated. The panel from which you selected "Begin PMF analysis with these choices" will be killed and a new panel, "View PMF Analysis Results", will be created with the PMF inputs filled in for you.  Select options for possible additional calculations.  For example, for AMS/ACSM HR Organic data, calibrated elemental ratios can be generated for each factor.
  
 
====== A little more detail ======
 
====== A little more detail ======
Line 638: Line 697:
 
  C:\delete_log.bat    C:\runPMF.bat
 
  C:\delete_log.bat    C:\runPMF.bat
  
and writes your DataMatrix and ErrorMatrix as MATRIX.DAT and STD_DEV.DAT, respectively to the folder with your PMF Executable.  The software also writes a file to that folder called STD_DEV_PROP.DAT, which has the same number of points as the DataMatrix and in which every element is equal to the ModelError.
+
and writes the DataMatrix and ErrorMatrix as MATRIX.DAT and STD_DEV.DAT, respectively to the folder with your PMF Executable.  The software also writes a file to that folder called STD_DEV_PROP.DAT, which has the same number of points as the DataMatrix and in which every element is equal to the ModelError. These files will be read by the PMF executable.
 
 
  
 
The software then enters a pair of nested loops in which the following steps occur:
 
The software then enters a pair of nested loops in which the following steps occur:
Line 649: Line 707:
 
   imupmf.ini
 
   imupmf.ini
 
:::which is used as the control file for PMF.
 
:::which is used as the control file for PMF.
:::* In v2.03, the '''convergence criteria''' for completing the PMF calculations can be set proportional to Q/Qexp by changing headers in the ''PMF_Execution...ipf''.  The default is to use the convergence criteria in ''mypmft.ini''.  
+
* The '''convergence criteria''' for completing the PMF calculations can be set in the 3rd tab of the PMF_PerformCalcPanel, "PMF defaults & advanced options".  
   //Settings for PMF Iteration Convergence proportional to Qexp
+
   //Default settings for PMF iteration convergence proportional to Qexp
 
   constant ITERATION_CHI2_PROP_QEXP = 0
 
   constant ITERATION_CHI2_PROP_QEXP = 0
 
   constant FIRST_ITER_LEVEL_PROP_CONST = 2e-6
 
   constant FIRST_ITER_LEVEL_PROP_CONST = 2e-6
 
   constant SECOND_ITER_LEVEL_PROP_CONST = 2e-6
 
   constant SECOND_ITER_LEVEL_PROP_CONST = 2e-6
 
   constant THIRD_ITER_LEVEL_PROP_CONST = 1e-6
 
   constant THIRD_ITER_LEVEL_PROP_CONST = 1e-6
:::* In v2.06 and higher, within the advanced tab of the PMF_performCalc_panel, the user may change PMF convergence criteria.
 
 
:::: You can read more about convergence criteria and large datasets in the PMF Users Manual Part 1 (pg. 10) (see [[#Other Resources|Other Resources]]).
 
:::: You can read more about convergence criteria and large datasets in the PMF Users Manual Part 1 (pg. 10) (see [[#Other Resources|Other Resources]]).
 
:*Delete the old PMF2.LOG file by running delete_log.bat
 
:*Delete the old PMF2.LOG file by running delete_log.bat
Line 662: Line 719:
 
:*Load PMF output (including log file and factors)
 
:*Load PMF output (including log file and factors)
  
At the completion of the loops, the software calculates some statistics from the output and then creates a panel to select data for viewing.
+
At the completion of the loops, the software calculates some statistics from the output and then creates a panel to select PMF results for viewing.
  
===== Bootstrapping (added in v2.02) =====
+
===== Bootstrapping =====
  
 
The bootstrapping mode is developed after the method described in the EPA PMF v3.0 Users Manual Sect. 6.4 (see [[#Other Resources|Other Resources]]).  The bootstrapping method is used to estimate the uncertainty in both the factor mass spectra and time series.  This is achieved by running PMF on the full dataset once (''from one PMF solution (i.e., combination of number of factors and fpeak) at a time'') and then making a series of variations (the number is specified by the user when selecting the bootstrapping mode) in which a subset of the original rows (mass spectra) are randomly replaced by other rows from the original matrix and running PMF on each of these.   
 
The bootstrapping mode is developed after the method described in the EPA PMF v3.0 Users Manual Sect. 6.4 (see [[#Other Resources|Other Resources]]).  The bootstrapping method is used to estimate the uncertainty in both the factor mass spectra and time series.  This is achieved by running PMF on the full dataset once (''from one PMF solution (i.e., combination of number of factors and fpeak) at a time'') and then making a series of variations (the number is specified by the user when selecting the bootstrapping mode) in which a subset of the original rows (mass spectra) are randomly replaced by other rows from the original matrix and running PMF on each of these.   
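The row replacement described above can be sketched as follows. This is only an illustration, not the PET code: it assumes that "randomly replaced by other rows" amounts to drawing row indices with replacement from the original matrix, and the helper names <code>bootstrap_rows</code> and <code>resample_matrix</code> are hypothetical.

```python
import random


def bootstrap_rows(n_rows, seed=None):
    """Return n_rows row indices drawn with replacement from range(n_rows).

    Assumption: this is one plausible way to vary the input matrix;
    PET's actual row selection may differ.
    """
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def resample_matrix(matrix, indices):
    """Build a new matrix (list of rows) from the chosen row indices."""
    return [list(matrix[i]) for i in indices]


# Toy data matrix: 4 rows (mass spectra) by 2 columns.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
idx = bootstrap_rows(len(data), seed=0)
resampled = resample_matrix(data, idx)
```

In such a scheme the error matrix would presumably be resampled with the same indices so that each spectrum keeps its own uncertainties.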
Line 688: Line 745:
 
Note that the criteria for including bootstrapped cases in calculations of the average bootstrapped factors is different in our code than in EPA PMF.  Bootstrapped cases in which two or more factors are mapped to the same original factor are not included in the averages in our code.  In these instances, the whole bootstrapping case is rejected before calculating averages.  In EPA PMF, all bootstrapped factors mapped to the same original factor are included in the average.
 
Note that the criteria for including bootstrapped cases in calculations of the average bootstrapped factors is different in our code than in EPA PMF.  Bootstrapped cases in which two or more factors are mapped to the same original factor are not included in the averages in our code.  In these instances, the whole bootstrapping case is rejected before calculating averages.  In EPA PMF, all bootstrapped factors mapped to the same original factor are included in the average.
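The rejection rule above can be illustrated with a short sketch. The representation is an assumption made for this example: each bootstrapped case is a list in which entry i is the index of the original factor that bootstrapped factor i was mapped to, and <code>accept_bootstrap_case</code> is a hypothetical helper, not a PET function.

```python
def accept_bootstrap_case(mapping):
    """Return True only if no two bootstrapped factors share an original factor."""
    return len(set(mapping)) == len(mapping)


# Three hypothetical bootstrapped cases for a 3-factor solution.
cases = [[0, 1, 2], [0, 0, 2], [2, 1, 0]]

# The second case maps two factors to original factor 0, so the whole case is dropped.
kept = [m for m in cases if accept_bootstrap_case(m)]
```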
  
''Important:'' At the present time (v. 2.02) the plots produced by the bootstrapping code expect to find the waves '''noNaNs_amus''' (for plotting profiles) and '''noNaNs_t_series''' (for plotting time series).  If these are not the names of your waves that describe the columns and rows, respectively) of the input matrix, you should duplicate your row and column description waves to have these names.
+
''Important:'' At the present time the plots produced by the bootstrapping code expect to find the waves '''noNaNs_amus''' (for plotting profiles) and '''noNaNs_t_series''' (for plotting time series).  If these are not the names of the waves that describe the columns and rows, respectively, of your input matrix, you should duplicate your column and row description waves under these names.
 +
 
 +
===== When PMF seems to work for some solutions but not others =====
 +
In the "Perform PMF calculations" panel, "PMF defaults and advanced options" tab, in versions 3.08D and higher there is a checkbox named "Save the log file for every iteration".  If this is checked, then after every call to PMF the PET code will copy the PMF2.LOG file and rename it PMF2_x_y.LOG  where x and y are the current number of factors and fpeak. One can then look through these files to see what problems may have arisen for a specific case.
  
==== Running Two Simultaneous Analyses on Dual-Processor Computers ====
+
==== Running Two Simultaneous Analyses on Multi-Processor Computers ====
  
You can run two PMF analyses (in two separate experiments) on the same computer simultaneously if you have dual processors.  Each analysis will run at the same speed as on a single-processor computer (or when one analysis is run on the dual processor computer).  The PMF executable is not "multi-processor aware," meaning that it can not utilize both processors simultaneously for one PMF run.
+
You can run two or more PMF analyses (in separate experiments) on the same computer simultaneously if you have multiple processors.  Each analysis will run at the same speed as on a single-processor computer (or as when only one analysis is run on the multi-processor computer).  The PMF executable is not "multi-processor aware," meaning that it cannot use more than one processor for a single PMF run.
  
 
To run two simultaneous analyses with the PET, you'll need '''two''' directories on your computer with the PMF .exe, .key, and mypmft.ini files.  The directory names must end in "1" and "2," respectively.  For example, you could have
 
To run two simultaneous analyses with the PET, you'll need '''two''' directories on your computer with the PMF .exe, .key, and mypmft.ini files.  The directory names must end in "1" and "2," respectively.  For example, you could have
  
; <nowiki>C:\PMF\PMF Executable1</nowiki>
+
; <nowiki>C:\Users\*\Documents\PMF\PMF Executable1</nowiki>
: pmf2wtst.exe
+
: pmf2wopt.exe
 
: pmf2key.key
 
: pmf2key.key
 
: mypmft.ini
 
: mypmft.ini
; <nowiki>C:\PMF\PMF Executable2</nowiki>
+
; <nowiki>C:\Users\*\Documents\PMF\PMF Executable2</nowiki>
: pmf2wtst.exe
+
: pmf2wopt.exe
 
: pmf2key.key
 
: pmf2key.key
 
: mypmft.ini
 
: mypmft.ini
  
 
'''Each experiment running PMF must use a separate Executable directory.'''  While a PMF analysis associated with a PMF Executable directory is running, a file called "PMFrunning.txt" exists in that directory.  If you try to run your second PMF analysis using the same Executable directory, the PET will give you an error.  You can choose a different Executable directory and press the button to start the analysis.
 
'''Each experiment running PMF must use a separate Executable directory.'''  While a PMF analysis associated with a PMF Executable directory is running, a file called "PMFrunning.txt" exists in that directory.  If you try to run your second PMF analysis using the same Executable directory, the PET will give you an error.  You can choose a different Executable directory and press the button to start the analysis.
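As a rough illustration of the lock-file behavior described above, the sketch below picks the first Executable directory that does not currently contain "PMFrunning.txt". It is not part of PET; the function name <code>first_free_executable_dir</code> is hypothetical, and the only detail taken from the text is the lock file name.

```python
from pathlib import Path

LOCK_NAME = "PMFrunning.txt"  # file that exists while an analysis is running


def first_free_executable_dir(candidates):
    """Return the first existing directory without the lock file, or None."""
    for folder in candidates:
        folder = Path(folder)
        if folder.is_dir() and not (folder / LOCK_NAME).exists():
            return folder
    return None
```

For example, calling it with the two example directories would return the second one while an analysis is running in the first.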
 
  
 
==== Convergence ====
 
==== Convergence ====
Line 725: Line 784:
  
 
== View PMF Analysis Results *Step 2* ==
 
== View PMF Analysis Results *Step 2* ==
 +
Once PET has finished calling PMF and has calculated some diagnostics, the "Perform PMF" panel will close and a "View Results" panel will appear with the earlier folder and wave selections preselected. For AMS/ACSM high resolution organic data, additional calculations of elemental sums and ratios can be selected. Press the gold button "View PMF analysis" to generate the large PMF results panel.
 
<!--
 
<!--
 
This is a clickable picture of the main PMF panel.  Clicking on a region will take you down the page to more information on the options possible for that control or graph.
 
This is a clickable picture of the main PMF panel.  Clicking on a region will take you down the page to more information on the options possible for that control or graph.
Line 946: Line 1,006:
  
 
=== AMS HR factors colored by families hack ===
 
=== AMS HR factors colored by families hack ===
Versions 3.0 and higher of PET have incorporated data and code to create elemental ratios for High resolution AMS data. Information below is only useful for older (2.x versions of PET.
+
Versions 3.0 and higher of PET have incorporated data and code to create elemental ratios for high resolution AMS organic data. Information below is only useful for older (2.x versions of PET).
  
As of PMF version 2.08D, the grouping of HR ions info CH, CHO1 etc families and elemental ratios of factors is built into thePMF "View Results" panel. For historical purposes, a 'hack' for performing these tasks for use with older PMF versions is described below.
+
As of PMF version 2.08D, the grouping of HR ions into CH, CHO1, etc. families and the elemental ratios of factors are built into the PMF "View Results" panel. For historical purposes, a 'hack' for performing these tasks with older PMF versions is described below.
  
 
During the examination of factors of High Resolution AMS spectra, it is often helpful to plot the HR spectra summed into chemical 'families', i.e. familyCH which consists of all chemical fragments consisting only of C and H atoms, in a unit resolution dimension. As of November 2013 the following method can be employed.
 
During the examination of factors of High Resolution AMS spectra, it is often helpful to plot the HR spectra summed into chemical 'families', i.e. familyCH which consists of all chemical fragments consisting only of C and H atoms, in a unit resolution dimension. As of November 2013 the following method can be employed.
Line 969: Line 1,029:
 
== Compare PMF Results with External Factors *Step 3* ==
 
== Compare PMF Results with External Factors *Step 3* ==
  
Caution: This part of the code is a bit fussy!  Please contact [mailto:ulbrich@colorado.edu?Subject=Trouble%20with%20Scatter%20Panel Ingrid] if you have tried the tips here and have trouble getting this panel to work.  
+
Caution: This part of the code is a bit fussy!  Please contact [mailto:support@aerodyne.com?Subject=Trouble%20with%20Scatter%20Panel Donna] if you have tried the tips here and have trouble getting this panel to work.  
  
 
=== Setting up the External Data ===
 
=== Setting up the External Data ===
Line 1,024: Line 1,084:
  
 
= Considerations for Choosing a Solution =
 
= Considerations for Choosing a Solution =
 +
* This topic is beyond the scope of discussion on this wiki.
  
 
= PMF Evaluation Tool Software =
 
= PMF Evaluation Tool Software =
Line 1,031: Line 1,092:
 
!Date
 
!Date
 
!Release Notes
 
!Release Notes
!PET ipf files
+
!PET ipf Files for Upgrading
 
!
 
!
!PET Template
+
!PET Igor Template Experiment
 
!
 
!
!mypmft.ini
+
!PMF Executable Package
  
 
|-
 
|-
|3.08D
+
|3.08E
|12 Dec 2023
+
|24 Oct 2024
 
|[http://cires.colorado.edu/jimenez-group/PMFResources/PMF_v3_00_ReadMe.txt Release Notes v3.00]
 
|[http://cires.colorado.edu/jimenez-group/PMFResources/PMF_v3_00_ReadMe.txt Release Notes v3.00]
 
<!-- | ''a required file '' -->
 
<!-- | ''a required file '' -->
|[http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/PMF_v3_08D.zip Zipped PMF file containing these 7 ipfs]:
+
|[http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/PMF_v3_08E.zip Zipped PMF file containing these 7 ipfs]:
PMF_SetupPanel_v3_08D.ipf<br>
+
PMF_SetupPanel_v3_08E.ipf<br>
PMF_ErrPrep_AMS_v3_08D.ipf<br>
+
PMF_ErrPrep_AMS_v3_08E.ipf<br>
PMF_Execution_v3_08D.ipf<br>
+
PMF_Execution_v3_08E.ipf<br>
PMF_ViewResults_v3_08D.ipf<br>
+
PMF_ViewResults_v3_08E.ipf<br>
PMF_scatter_v3_08D.ipf<br>
+
PMF_scatter_v3_08E.ipf<br>
PMF_EACode_v3_08D.ipf<br>
+
PMF_EACode_v3_08E.ipf<br>
 
SNRAnalysis_v1_02A.ipf<br>
 
SNRAnalysis_v1_02A.ipf<br>
 
|''or''
 
|''or''
|[http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/IgorPMF_Template_v3_08D.pxt Igor PMF template experiment v3_08D]
+
|[http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/IgorPMF_Template_v3_08E.pxt Igor PMF template experiment v3_08E]
 
|rowspan="6"|''and''
 
|rowspan="6"|''and''
|rowspan="6"|[http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/mypmft.ini mypmft.ini for use with v2.03 and higher]
+
|rowspan="6"|[http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/PMF_executable_package.zi PMF_executable_package.zip]
 +
 
  
<!--  
+
<!--
 
|-
 
|-
 
|2.06 BETA
 
|2.06 BETA
Line 1,132: Line 1,194:
 
-->
 
-->
 
|}
 
|}
Quick review: Igor code called PET (PMF evaluation tool) assists in a PMF analysis. Anyone with credentials can download PET; there is no need to request an individual account. When you click on a link to download, a prompt should appear asking for credentials. If you need credentials email [mailto:sueper@colorado.edu?Subject=Igor-PMF%20Question Donna] The PET code v3.08 has been tested on these versions of Igor: 6.37, 7.08, 8.04, 9.0.2
+
Quick review: Igor code called PET (PMF Evaluation Tool) assists in a PMF analysis. Anyone with credentials can download PET; there is no need to request an individual account. When you click on a link to download, a prompt should appear asking for credentials. If you need credentials to download items on this wiki email [mailto:support@aerodyne.com?Subject=Igor-PMF%20Credentials Donna]. The PET code v3.08E has been tested on these versions of Igor: 6.37, 7.08, 8.04, 9.0.6
 +
 
 +
A PMF executable file and a license file are required for PET to interface with. The executable file can be downloaded using the PMF_executable_package link above. Originally the PMF executable and license files could be purchased by sending an email to its creator Pentti Paatero (retired U. of Helsinki). The Swiss company Datalystica is now the sole official seller of the multi-linear engine (ME-2) solver package. This ME-2 package will contain a license that is used for the PMF executable. Once this ME-2 package is downloaded, copy and rename the ME2key.key file to pmf2key.key and place this in the same folder as the PMF executable files. Please go to [http://datalystica.com/me-2-solver/ https://datalystica.com/me-2-solver/] to purchase ME-2.
  
PMF executable code is required for PET to interface with. PMF is available for purchase from Penti Paatero, [mailto:pentti.paatero86@gmail.com?Subject=PMFOrder%20Pentti Paatero] and you must email and pay him separately and after payment he will provide a license key which allows PMF to run. 
 
 
The Igor PET code is not needed to run PMF; PET provides a handy interface to generate and view PMF results.
 
The Igor PET code is not needed to run PMF; PET provides a handy interface to generate and view PMF results.
  
 
+
<!--
 
==BareBones Starter Kit==
 
==BareBones Starter Kit==
 
Download the [http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/BareBones_PMF_Starter_Kit.zip BareBones_PMF_Starter_Kit.zip] described for [[#Setting up PMF on Your Computer | initial installation of PMF on a computer]].
 
Download the [http://cires.colorado.edu/jimenez-group/PMFResources/Downloads/BareBones_PMF_Starter_Kit.zip BareBones_PMF_Starter_Kit.zip] described for [[#Setting up PMF on Your Computer | initial installation of PMF on a computer]].
Line 1,182: Line 1,245:
 
   
 
   
 
PET version 3.0 has been uploaded to the web site and fixes the bug.
 
PET version 3.0 has been uploaded to the web site and fixes the bug.
 +
 +
-->
  
 
= Other Resources =
 
= Other Resources =
Originally the PMF software could be purchased by sending an email to its creator [mailto:pentti.paatero86@gmail.com?Subject=PMFOrder%20Pentti Paatero].  The Swiss company Datalystica is now the sole official seller of the multi-linear engine (ME-2) package, which contains the PMF executable. Please go to [http://datalystica.com/me-2-solver/ https://datalystica.com/me-2-solver/] for further assistance.
 
 
 
 
<!--  
 
<!--  
 
* FTP sites with PMF code, documentation
 
* FTP sites with PMF code, documentation

Latest revision as of 08:27, 30 October 2024

A shortcut to this page is http://tinyurl.com/PMF-guide

The PMF Evaluation Tool (PET) was described in Ulbrich et al., ACP 2009. Please cite the tool with this work in publications in which you have used the PET. The PET consists of several Igor procedure files (ipfs). This wiki serves as the help and documentation for the software. To run PMF with the panel, the PMF executable and associated files, accessed separately, are required (see Section 2, Installing PMF with Igor).

The ipfs were written by Ingrid Ulbrich (formerly of the Jimenez Group, University of Colorado, Boulder) and Donna Sueper (Aerodyne Research and Jimenez Group, University of Colorado, Boulder) and Greg Brinkman (Hannigan Group, University of Colorado, Boulder). Questions about this code can be addressed to Donna.

PMF (Positive Matrix Factorization) was developed by Dr. P. Paatero (retired, Dept. of Physics, University of Helsinki). Links to Paatero's PMF documentation and many PMF method papers can be found in Section 9, Other Resources.

Originally, the PMF executable and license files could be purchased by sending an email to Paatero. Now one can freely download the pmf executable files below, but a license file is needed for it to run. Please email Donna for credentials to download the executables. The Swiss company Datalystica is now the sole official seller of the multi-linear engine (ME-2) solver package that contains a license that is used for the PMF executable. Once the ME-2 package is downloaded, copy and rename the ME2key.key file to pmf2key.key and place this in the same folder as the PMF executable files. Please go to https://datalystica.com/me-2-solver/ to purchase ME-2. One license file can be used for all in the same research group.

This Igor toolkit was intended for use in analyzing AMS data, but there are only a few assumptions in the toolkit relating to AMS-type data. Some information on ways to create the necessary waves and matrices from non-AMS data is found in Section 3.2.4 below.


A Message to Contributors

We want to encourage active participation by all users in the evolution of the information contained within this wiki and welcome the addition of content that is beneficial to the community as a whole. However, please DO NOT delete any content from this page!! Significant time, effort, and deliberation have gone into the information contained in this page. Rather than deleting content, please feel free to voice your concerns by posting a comment to the discussion page where others can contribute (please be sure to include a topic to be referenced in responses).

Installing PMF with Igor

This section has been modified with new information on 23 October 2024 by Donna

PMF and Operating Systems

The PMF executable is compiled only for Windows/DOS; it has been used with Windows 11, 10 and older operating systems Windows XP and Windows Vista. Using OneDrive to access the PMF executable file within PET has been problematic for some users. If possible, it is recommended that users access the PMF executable and the Igor template experiment PET from a local drive.

For users with Macs, two options have proven successful. One method is to execute PMF on a Windows computer and then analyze the experiment on a Macintosh. The other method is to execute PMF under a Windows emulator on a Mac. Note that in this latter case, you may need to have the PMF executable somewhere in the c:\ directory; this isn't necessary when running Windows on a PC.

Setting up PMF and PET on Your Computer the First Time

Download the PMF executable package

The file pmf2wopt.exe is the PMF solver. It can be downloaded for free via the PMF executable package below; however, a username and password are required. Email Donna for these credentials. An older executable file named pmf2wtst.exe also comes with this package but is not typically used. The PMF executable package also contains an initialization/configuration file named mypmft.ini. This PMF initialization file contains parameters that are used by PET but are not explicitly needed for the executable to run. For every call to the PMF executable, PET generates a 'working' initialization file based on mypmft.ini. This file, named imupmf.ini, is specific to the PET package.

PMF executable file step:

Step 1: Download the zipped PMF executable package and save locally on the computer from where it will run.

Purchase a group PMF license

Along with the PMF executable file, a user will need a license file named pmf2key.key. This pmf2key.key file is checked when the PMF executable begins, to ensure that the license is valid. If this key file is missing, the PMF executable will abort without any analysis taking place. The license, or key file, is purchased through Datalystica, a Swiss company that obtained the rights from the originator of the PMF code, Pentti Paatero (University of Helsinki, retired). For PMF usage, one need not purchase SoFi from Datalystica, only ME-2 (multi-linear engine 2), a linear solver of which PMF is a part. The ME-2 package contains not only executable files such as me2gf*.exe but also a key file named me2key.key. This same key file for the ME-2 executable files will also work for the PMF executable files, but it must be copied, renamed pmf2key.key, and placed in the same folder as the PMF executable files. Not every end user within an organization needs to obtain a license; users within the same research group can share one. In Paatero's words "If you drink coffee together in a real room, you are a group." Datalystica does not support PET and PMF, but Datalystica employees may be able to answer questions regarding ME-2. The license file (me2key.key, renamed pmf2key.key) is small and will likely have the characters #PMF2KEY#PMF3KEY#ME-2KEY#3127#MSDOS somewhere in the first line.

PMF license file steps:

Step 1: Purchase and download ME-2 solver from Datalystica

Step 2: Copy the file named me2key.key into the PMF executable folder and rename as pmf2key.key
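The copy-and-rename step can also be scripted. The sketch below is only an illustration under stated assumptions: the folder arguments are placeholders, the helper names are hypothetical, and the first-line check is a loose heuristic based on the characters quoted above, not a validation of the license.

```python
import shutil
from pathlib import Path


def install_license(me2_dir, pmf_dir):
    """Copy me2key.key from the ME-2 folder into the PMF folder as pmf2key.key."""
    src = Path(me2_dir) / "me2key.key"
    dst = Path(pmf_dir) / "pmf2key.key"
    if not src.is_file():
        raise FileNotFoundError(f"license file not found: {src}")
    shutil.copyfile(src, dst)
    return dst


def first_line_looks_like_key(key_path):
    """Heuristic: the first line of the key file is expected to mention #PMF2KEY."""
    with open(key_path, "r", errors="replace") as fh:
        return "#PMF2KEY" in fh.readline()
```

The original me2key.key is left in place, so the ME-2 package continues to work.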

Download PET Igor template experiment or zipped file of ipfs (ipf = Igor Procedure File)

The Igor interface for viewing and evaluating PMF solutions is called PET (PMF Evaluation Tool).

PET step: Step 1: Download the PET Igor template file (IgorPMF_Template_v*.pxt) and save locally on the computer. It is not recommended that the Igor template experiment reside in the same folder as the PMF executable package.

The topmost Igor menu bar has a menu named "PMF", which offers these options:

  • "Perform PMF analysis *Step 1*"
  • "View PMF analysis results *Step 2*"
  • "Small Pop-Plots Panel for *Step 2*"
  • "Compare PMF results with external factors *Step 3*"

Selecting "Perform PMF analysis" will bring up a panel from which you can perform data preparation steps and select data and conditions for running PMF.

Selecting "View PMF analysis results" will bring up a panel from which you can select waves that were previously input into PMF. Once choices have been selected the large PET panel with multiple graphs showing results and diagnostics will appear. Within one PET experiment, it is possible to run PMF either on multiple data sets or under different conditions. This panel allows one to select the PMF results from possible different data sets or conditions.

Selecting "Small Pop-Plots Panel" will generate a mini version of the big PMF results panel. This option was initially intended for users with small screen sizes.

Selecting "Compare PMF results with external factors *Step 3*" will generate a panel from which one can compare PMF results with other data that was not included in the PMF analysis. For example, if one had some gas phase or meteorological measurements with the same time stamps as the input data, this option will show a panel of correlation plots of PMF factors with this 'external' data.

Set up options

It is recommended that users keep the PET Igor template experiment either within a Wavemetrics folder/subfolder or adjacent to the PMF executable package folder. Once the PET data preparation steps have been completed and the gold "Begin PMF..." button is pushed, the user will be prompted to save the template experiment with a unique name.

One may have many copies of the PMF executable packages on one computer. A file organization suggestion is to set up multiple copies like so: C:\Users\*\Documents\PMF\PMF Executable1, C:\Users\*\Documents\PMF\PMF Executable2 , etc, each folder with a copy of the pmf2key.key file. One can have multiple versions of Igor open at once; this allows one to run multiple copies of PMF (from different Igor experiments) at the same time.
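The suggested folder layout can be created with a few lines of Python. This is a sketch only: the function name <code>clone_executable_dirs</code> is hypothetical, and it assumes the source folder already holds the unzipped executable package together with pmf2key.key.

```python
import shutil
from pathlib import Path


def clone_executable_dirs(source_dir, parent_dir, count):
    """Create 'PMF Executable1' ... 'PMF Executable<count>' as copies of source_dir."""
    parent = Path(parent_dir)
    created = []
    for n in range(1, count + 1):
        target = parent / f"PMF Executable{n}"
        if not target.exists():  # never overwrite an existing folder
            shutil.copytree(source_dir, target)
        created.append(target)
    return created
```

Each copy then contains its own pmf2key.key, as suggested above.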

Upgrade PET version options

One can upgrade versions of the PET software without having to redo any PMF analysis steps. One can download the latest PET ipfs (Igor Procedure Files) below, load them into an existing PET experiment, and then kill all the old ipfs. You may want to kill and recreate PET panels such as the "View Results" and "Setup" panels to take advantage of new features.

If PMF does not Run Properly within PET

0. Make sure that the path to the executable is correct. If red text appears in the section "PMF Executable File Path" then something is wrong. Look in the folder you created with the PMF2wtst.exe or pmf2wopt.exe file. If necessary go to the tab entitled "PMF defaults and advanced options" and reselect the executable file; then go to the run tab, "PMF Executable File Path", and reselect the folder with your chosen .exe file.

1. Look in the PMF executable folder that contains pmf2wopt.exe (and/or pmf2wtst.exe) and check for the existence of the files

 MATRIX.DAT and STD_DEV.DAT
If these files exist, Igor was able to access the pmf executable and key file. Continue with Step 2.
If these files do not exist, Igor was not able to access the correct folder. Check the existence and spelling of the license file pmf2key.key

2. Look in the PMF executable folder that contains PMF2*.exe for the existence of the file

 PMF2.LOG
If this file does not exist, PMF was not run in this folder. Go to step 3.
If this file does exist, PMF attempted to run in this folder. Open the file PMF2.LOG (it is a text file).
Glance down the contents of the file and look for many lines of sequential numbered output, such as
 1 rank1 step chi2=   9282.6     Penalty=  1.5287E+04 Flags GF
 2 rank1 step chi2=   7411.7     Penalty=  1.4084E+04 Flags GF
If these lines are in the file, PMF ran successfully on your computer.
If the lines of sequential numerical output are not in the file, you should find the following lines within this file (note that every line is NOT included here, but the lines selected here are in the order they appear in the file):

2a)

 ##PMF2 .ini file for: IMUPMF.INI  --- PMF Evaluation Panel v2.3
 Successfully read task initialization file imupmf.ini
 titled:  ##PMF2 .ini file for: IMUPMF.INI  --- PMF Evaluation Panel v2.3
If these lines appear in the file, PMF found the .ini file. Continue with Step 2b.
If these lines do not appear, you should see a message about not finding an appropriate .ini file.
  • Make sure that this folder contains the file imupmf.ini, provided with the PMF executable package.
If the file imupmf.ini already exists in the same folder as the PMF2*.exe file, continue with Step 3.

2b)

 Successfully opened input file      30
 with name MATRIX.DAT
 Successfully opened input file      31
 with name STD_DEV.DAT
If these lines appear in the file, everything should have run correctly. Look at the rest of the PMF2.LOG file to see whether other errors are reported. If you still encounter difficulty, contact Donna for assistance and attach the PMF2.LOG file to your email.
If these lines do not appear in the file, you will see a message about PMF not being able to access one of these files. Check that the files are not being used by other programs, delete the file PMF2.LOG, and press the second button on the panel again to see whether PMF runs successfully.
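The checks in Step 2 can be automated with a small log scanner. The sketch below searches only for the lines quoted in this guide; the function name <code>diagnose_log</code> and the returned keys are hypothetical, and other parts of a real PMF2.LOG may look different.

```python
import re

# Matches lines such as " 1 rank1 step chi2=   9282.6     Penalty=  1.5287E+04 Flags GF"
ITERATION_LINE = re.compile(r"^\s*\d+\s+rank1 step chi2=\s*\S+\s+Penalty=\s*\S+")


def diagnose_log(text):
    """Summarize which of the expected PMF2.LOG lines are present."""
    lines = text.splitlines()
    return {
        "ini_read": any(
            "Successfully read task initialization file" in ln for ln in lines
        ),
        "matrix_opened": any("with name MATRIX.DAT" in ln for ln in lines),
        "std_dev_opened": any("with name STD_DEV.DAT" in ln for ln in lines),
        "iterations": sum(1 for ln in lines if ITERATION_LINE.match(ln)),
    }
```

A non-zero "iterations" count corresponds to the "PMF ran successfully" case; if it is zero, the three flags indicate which of Steps 2a and 2b to examine.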

3. Look in the C:\ directory of your computer (C:...\My Documents with the full version) for the existence of the file

 runpmf.bat
If this file exists, Igor was able to write to your C:\ drive. Continue with Step 3a.
If this file does not appear, Igor was not able to write this file to your C:\ drive. This might be due to high security settings on your computer. You should create a text file with this name (NOT runpmf.bat.txt) and choose to edit it (not open it) with a text editor (e.g., Notepad, WordPad, Emacs, etc.). The file should contain the lines
 cd C:\Documents and Settings\*\Desktop\
 pmf2wopt imupmf  
NOTE that you must change the path in this example to the path to the folder where you have put the file PMF2*.exe!
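If creating the batch file by hand is inconvenient, it can be written by a short script. The sketch below writes exactly the two lines shown above; the function names are hypothetical, and the executable folder is a placeholder that you must replace with your own path.

```python
def batch_file_text(executable_dir, exe_stem="pmf2wopt"):
    """Return the two lines of runpmf.bat for the given executable folder."""
    return f"cd {executable_dir}\n{exe_stem} imupmf\n"


def write_batch_file(batch_path, executable_dir, exe_stem="pmf2wopt"):
    """Write runpmf.bat with Windows (CRLF) line endings."""
    with open(batch_path, "w", newline="\r\n") as fh:
        fh.write(batch_file_text(executable_dir, exe_stem))
```

Pass "pmf2wtst" as exe_stem if you use the older executable.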

3a) Execute runpmf.bat by double-clicking on its icon. You should see the black DOS window pop up, scroll output, and close again.

If this happens, PMF has run successfully from the batch. In the Igor experiment, press the second button on the panel to see whether PMF runs successfully.
If the black window does not show scrolled output, continue with Step 3b.

3b) You now need to execute PMF from the command line.

To access the DOS command line, launch the Command Prompt from your Windows Start menu, located in
 Start -> Programs -> Accessories -> Command Prompt
Change from the path displayed in the prompt to the folder where you put the PMF2*.exe file. Use the command
 cd
to change directories (e.g.)
 cd Desktop\PMF
You can change one directory at a time, or several at a time as shown in the example. To move up 1 directory, use
 cd ..	
Be sure that the folder contains the three files
 pmf2wopt.exe and/or PMF2wtst.exe
 imupmf.ini
 pmf2key.key
by listing the contents of the current directory, using the command
 dir	
Then type this command at the prompt to run PMF:
 pmf2wopt imupmf

or

 pmf2wtst imupmf
You should see scrolling output in the command window. The window will not disappear and you can scroll back through the output (which is also saved in the file PMF2.LOG).
If the output contains many lines of sequential numbered output, such as
 1 rank1 step chi2=   9282.6     Penalty=  1.5287E+04 Flags GF
 2 rank1 step chi2=   7411.7     Penalty=  1.4084E+04 Flags GF
PMF ran successfully here. Make sure to close this file after you are finished examining it so that the next time PMF runs it won't complain about writing to this file.
If the output does not contain these lines, go back to Step 2 of this section to examine errors that might be reported by PMF in the file PMF2.LOG.
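Before running PMF from the command line, the three-file check from Step 3b can be done programmatically. This is a minimal sketch; <code>missing_required_files</code> is a hypothetical helper, and file names are compared case-insensitively because Windows does not distinguish case.

```python
from pathlib import Path


def missing_required_files(folder):
    """List the required files that are absent from the executable folder."""
    names = {p.name.lower() for p in Path(folder).iterdir() if p.is_file()}
    missing = []
    # Either executable is acceptable.
    if not ({"pmf2wopt.exe", "pmf2wtst.exe"} & names):
        missing.append("pmf2wopt.exe or pmf2wtst.exe")
    for required in ("imupmf.ini", "pmf2key.key"):
        if required not in names:
            missing.append(required)
    return missing
```

An empty list means all three files are present.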

Data Preparation *Step 1*

Exporting the Data by Instrument, Analysis Software Type into PET

  • PMF analysis requires two 2-dimensional matrices of the same dimensions: a data matrix and a matrix of the corresponding uncertainties, which are typically the calculated errors. Neither matrix may contain any NaN (Not a Number) values. The data preparation section of PET includes a step for the automated removal of rows containing all NaNs and of columns containing all zeros or NaNs. The data matrix may contain negative values (PMF solutions will always be >=0), but the uncertainty matrix must have all values greater than zero. The data preparation section of PET includes a step for automatically setting a minimum non-zero error value.

The Igor PET PMF tool needs additional 1-dimensional input waves to facilitate the plotting and interpretation of PMF solutions. Required are a 1-dimensional numeric wave of index values for the rows, which typically indicates measurement time in increasing order, and a 1-dimensional numeric wave of index values for the columns, which is typically m/q in increasing order. For data sets where the columns correspond to high resolution data, i.e. each column indicates a chemical formula such as C2H3O, a 1-dimensional text wave will also be required.
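The input requirements above can be summarized as a set of checks. The sketch below uses plain Python lists in place of Igor waves and is not the PET data preparation code; the function name <code>check_pmf_inputs</code> and its messages are hypothetical.

```python
import math


def check_pmf_inputs(data, errors, row_index, col_index):
    """Return a list of problems found in the PMF inputs (empty if none)."""
    n_rows = len(data)
    n_cols = len(data[0]) if n_rows else 0
    same_shape = (
        len(errors) == n_rows
        and all(len(row) == n_cols for row in data)
        and all(len(row) == n_cols for row in errors)
    )
    if not same_shape:
        return ["data and error matrices differ in shape"]
    problems = []
    # Negative data values are allowed; NaNs are not.
    if any(math.isnan(v) for row in data for v in row):
        problems.append("data matrix contains NaN")
    # Every uncertainty must be a number greater than zero.
    if any(math.isnan(v) or v <= 0 for row in errors for v in row):
        problems.append("error matrix contains NaN or values <= 0")
    if len(row_index) != n_rows:
        problems.append("row index wave length does not match number of rows")
    if len(col_index) != n_cols:
        problems.append("column index wave length does not match number of columns")
    for name, wave in (("row", row_index), ("column", col_index)):
        if any(b < a for a, b in zip(wave, wave[1:])):
            problems.append(f"{name} index wave is not in increasing order")
    return problems
```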

ToF AMS, Q-AMS & ToF ACSM, Q-ACSM

  • Regardless of whether the data were generated from a quadrupole or a ToF, users should select the MS airbeam correction option when generating the matrices. This is because the error matrices will be generated using the airbeam correction factor if it exists. In the various analysis software packages users may generate the data in units of ug/m3 or Hz. An Igor text file can be generated to facilitate the export of all relevant input information for PET.
  • Some steps differ slightly between the quadrupole and the ToF data sets. For quadrupole data the removal of spikes and the smoothing of data may be necessary, whereas for ToF data sets neither may be required (because the quadrupole samples only one m/z at a time).
  • For AMS and ACSM data an additional 1-dimensional numeric wave with the same length as the rows of the data matrix will be generated that contains the minimum error, i.e. the uncertainty of measuring 1 ion per time period. It is recommended that this wave be used in the PET data preparation section.
  • Buttons in the analysis software facilitate the export of data. Typically, the species exported is organics (Org) or high resolution organics (HROrg), although users have had success in combining this matrix with other signal (e.g., nitrate) or in analyzing binned raw spectra (un-speciated).
In ToF AMS Analysis Software (SQUIRREL/PIKA)
  • Buttons are provided for the export of data to PET.
  • It is important to know that the CE and RIE are *NOT* applied to the Org and Org error matrices and HROrg and HROrg error matrices if you choose units of ug/m3. The general convention is that whenever Squirrel or Pika outputs anything with an m/z dimension the units are NO3-equivalent. By using nitrate-equivalent mass, we are scaling all ions equally (i.e. not scaling differently according to which species each ion contributes to). This means that the mass spectra in ug/m3 units are identical to those in Hz units except for a single scaling factor. The thinking behind this convention is that whenever we have an m/z dimension, such as with an average mass spectrum, we can sum the unit resolution at any nominal mass to compare to the raw spectra.
In the Q-AMS Analysis Software (James')
  • Updated Q-AMS analysis software will have buttons and other controls to easily generate these matrices.
  • Be sure to use v1.41 or later of the Q-AMS Analysis Software ("James' Program"). Corrections have been made to the error calculation routines since earlier versions!
  • Download from Qi Zhang's website Extract Waves&matrices v 1.1.ipf (note that v1.2 is for Squirrel/HR data) and include it in your experiment with James' software.
  • Call org_mats, which will calculate a data matrix (organics_MS) and error matrix (organics_MS_err) in root: in your experiment, and save these matrices along with the time series for organics, sulfate, nitrate, ammonium, and chloride in a file called "WavesMatricesForOrganicAnalysis.itx" (Igor should prompt you for the folder where you want to save the data). All of the data are saved in ug/m3 with all corrections (CE, RIE) applied.
  • You may want to load the saved waves into a new experiment to run PMF.
    • You'll also need to include your time series wave (t_series) and a wave of the m/z's in the matrix (amus).
Recommended Practice: Removing Spikes (generally, for Quadrupole data)
  • "Spikes" in the time series of an m/z can occur in Q-AMS data from large but infrequent particles during the scanning of the quadrupole. If such spikes have a common source with a factor that can be retrieved by PMF, they may increase the variation of that factor profile and additional factors may be found that represent this variation, but not a physically-meaningful, separate component. The "excess signal" from these spikes can be subtracted from the spikes and the average mass spectrum of the spikes examined. See Zhang, Q. et al., ES&T, 2005.
  • Note that if you remove "excess signal" in the method of Zhang et al. 2005 and leave the error values for these points unchanged, you are automatically "downweighting" these points in PMF. This is appropriate because the replacement value for the original spike is not known as well as the values for points without spikes.
Optional Practice: Smoothing
  • Smoothing can be used to reduce high-frequency noise in the data that could also be fit as additional factors. If you smooth the data, be sure to propagate this smoothing in the error matrix (not added to the wiki yet).

Tofware for ToF CIMS and ACSM data

Controls in Tofware allow the export of all necessary Igor waves into one external file.

For AMS Organics + Other Signals including AMS Inorganics

Setting up the signal and error matrices for org+inorg is not terribly difficult. The interpretation, however, is much more complicated and nuanced than for a PMF analysis on only the organics. One can look through the 2009 - 2019 AMS Users Meeting talks by searching for "PMF" to review many options that have been presented and subsequently published.

Further Data Preparation Before Calling PMF (Prep tab in the PET panel)

  • The following steps are recommended for AMS datasets and follow the practices laid out in Ulbrich et al., ACP, 2009 (with more detailed references in each section below). Note that the only mandatory steps are step 0 (importing data into the experiment and selecting it) and step 4 (deleting NaNs/zeros). More extensive documentation on use of the remove/delete nan functions is given in the header comments in the ipf pmf_ErrPrep_AMS_v*.ipf.

Before running these functions, you may want to rename your error matrix so that it has fewer than 12 characters. Each data prep step lengthens the wavename and the code will complain if the name gets too long.

Data Prep Step 0. Select Initial data

  • From the top Igor menu, choose Data - Load Waves to load the Igor file that you exported from the analysis software (Tofware, Squirrel, etc.). It is often good practice to create a data folder to contain these newly imported data waves.
  • Select the data folder containing the PMF input and subsequent DATA, ERROR, etc waves.

Recommended Practice: Data Prep Step 1. Apply a Minimum Error

  • Ions arrive at the mass spectrometer detector with a Poisson distribution. The error for a counted number of ions is sqrt(counted number of ions). The smallest number of ions we can count in one run is, of course, zero ions, but perhaps there was one and it was missed. The error for counting zero is sqrt(0), but an error of 1 would be more appropriate in this case. Hence a minimum error threshold of 1 ion is set.
  • Within this step the user must choose the mass spec detector type. For ToF detectors, there is an m/z dependence to account for the unequal 'pushing' of the ions into the detector. Small m/z ions move faster than large m/z ions, with the difference scaling approximately as the square root of the mass. The adjustment for this phenomenon is sometimes referred to as an 'm/z duty cycle' correction. To reflect the true number of ions counted, the application of a minimum error must undo this duty cycle correction if it was applied to the data. In practice, when one selects AMS-ToF for the MS type, the error matrix is temporarily multiplied by a sqrt(28)/sqrt(m/z) factor. For AMS-quad data and for Tofware-processed CIMS data, this undoing of the correction factor is not needed.
  • The one ion wave needs to be in the same units as the data and error matrices. If the data and error matrices are in units of Hz and the data acquisition time for each run (for AMS, the actual time spent measuring, i.e. in the 'MS open' mode) is 1 second, then the one ion wave is simply a wave containing all 1s. If each run had an MS open mode lasting one minute, every point in the wave would be 1/60. If the data and error matrices are in typical AMS units of ug/m3 then one needs to convert from Hz using the flow rate, ionization efficiency, AB correction factor, etc. This will have been done for you in the PMF export step in the Squirrel, Pika, and ACSM Tofware code.
  • By pressing the "Apply minimum error" button the function produces the following waves:
  1. The error matrix with minimum error applied called nameOfWave(errMx)+"_min" (e.g., Orgerr becomes Orgerr_min)
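The minimum-error idea above can be sketched outside of Igor. The following Python snippet is only an illustration, not the PET code: the function names are hypothetical, it assumes data in Hz, it assumes that "applying" the minimum means replacing any smaller error with the one-ion value for that row, and it leaves out the temporary ToF duty cycle factor.

```python
def one_ion_error_hz(open_time_s):
    """Error of a single counted ion, in Hz, for a run with open_time_s
    seconds of actual sampling (1 s -> 1, 60 s -> 1/60)."""
    return 1.0 / open_time_s


def apply_minimum_error(err_matrix, min_err_by_row):
    """Return a copy of err_matrix (list of rows) in which no element is
    smaller than the one-ion error of its row. Assumption: the minimum is
    applied as a simple floor; the ToF duty cycle factor is not handled."""
    return [
        [max(e, row_min) for e in row]
        for row, row_min in zip(err_matrix, min_err_by_row)
    ]


# Two runs: 1 s and 10 s of sampling, three m/z columns each.
one_ion = [one_ion_error_hz(1.0), one_ion_error_hz(10.0)]
errors = [[0.0, 2.0, 0.5], [0.05, 0.3, 0.0]]
errors_min = apply_minimum_error(errors, one_ion)
```

Keeping the one-ion value per row mirrors the 1-dimensional minimum error wave described earlier, which has one point per row of the data matrix.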

See also discussion of Ulbrich et al., ACPD 2008, P. Paatero comment (p. S5730) and Author response (p. S11960)

Data Prep Step 2. Removing Spikes (data dependent)

  • By pressing the Remove spikes (optional) button new data and error matrices will be generated with spikes removed.

Data Prep Step 3. Propagation of Smoothing (generally not relevant for ToF data)

  • Any smoothing of the data matrix must be propagated in the error matrix. The function pmf_err_propogateSmooth propagates box or Gaussian smoothing.

Some notes about specifying the smoothing that was performed for the data:

  1. Allowed types of smoothing are "box" and "Gaussian".
  2. The type of smoothing that is done in the AMS software is selected in the Misc tab.
  3. The number of points used in box and Gaussian smoothing is used as defined in Igor.
    • For box smoothing, the number of points refers to the size of the box. I.e., smoothing that includes 1 adjacent point on each side is 3-point smoothing.
    • For Gaussian smoothing, the number of points refers to the number of adjacent points used. I.e., smoothing that includes 1 adjacent point on each side is 1-point smoothing.
  • By pressing the "Propogate smoothing to the error mx" button the function produces a wave with the propagated error called nameOfWave(errMx)+"Prop" (e.g., OrgMSerr_Min becomes OrgMSerr_minProp).
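As an illustration of what propagating smoothing into the errors means, here is a minimal Python sketch for box smoothing only. It is not the code of pmf_err_propogateSmooth: it assumes independent errors, so that the error of an n-point average is the quadrature sum of the n errors divided by n, and it assumes a truncated window at the ends of the series, which may differ from how Igor treats end points. The function name is hypothetical.

```python
import math


def propagate_box_error(errors, n_points):
    """Propagate n_points box (moving average) smoothing through a 1-D list
    of errors for one m/z. Assumes independent errors and a window that is
    simply truncated at the ends of the series."""
    half = n_points // 2
    propagated = []
    for i in range(len(errors)):
        window = errors[max(0, i - half): i + half + 1]
        # Error of a mean of len(window) independent values.
        propagated.append(math.sqrt(sum(e * e for e in window)) / len(window))
    return propagated


# 3-point box smoothing of a constant error of 3.0:
smoothed_err = propagate_box_error([3.0] * 5, 3)
```

For a constant error the interior points shrink by sqrt(3) under 3-point smoothing, which is the expected noise reduction for an average of three independent values.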

Data Prep Step 4. Removing Nans, 0 columns

  • This step is required for all data.
  • The data matrix produced in the previous steps may have rows that are entirely NaN from bad runs, and columns that are entirely 0 during good runs because they have no fragments of interest (e.g. argon, m/z 40, for AMS data). All of these rows and columns need to be removed before running PMF.
  • The function produces an error matrix with these rows and columns removed called nameOfWave(errMx)+"_noNans" (e.g., OrgMSerr_Min becomes OrgMSerr_min_noNans).
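The row and column removal can be sketched as follows. This Python snippet is an illustration under stated assumptions, not the PET function: the name remove_nan_rows_zero_cols is hypothetical, matrices are plain lists of rows, and columns are tested only over the rows that are kept.

```python
import math


def remove_nan_rows_zero_cols(data, err):
    """Drop rows of `data` that are all NaN and columns that are all zero or
    NaN, and drop the same rows and columns from `err`. Returns the trimmed
    matrices plus the kept row and column indices, so that the time and m/z
    index waves can be trimmed in the same way."""
    def empty(v):
        return math.isnan(v) or v == 0

    keep_rows = [i for i, row in enumerate(data)
                 if not all(math.isnan(v) for v in row)]
    keep_cols = [j for j in range(len(data[0]))
                 if not all(empty(data[i][j]) for i in keep_rows)]

    def pick(matrix):
        return [[matrix[i][j] for j in keep_cols] for i in keep_rows]

    return pick(data), pick(err), keep_rows, keep_cols


nan = float("nan")
data = [[1.0, 0.0, 2.0], [nan, nan, nan], [3.0, 0.0, 4.0]]
err = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
data_clean, err_clean, rows, cols = remove_nan_rows_zero_cols(data, err)
```

Note that this only removes rows and columns that are empty throughout; isolated NaNs elsewhere in the matrix would still need attention before running PMF.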

Data Prep Steps 5. and 6. Recommended Practice: Downweight "Weak" Variables (m/z's)

  • Any m/z's that have low signal-to-noise ratio (SNR) may, in fact, have more noise than signal. If these m/z's contribute enough Q, PMF tries to fit the noisy data. In this way, the inclusion of such m/z's can be detrimental to the PMF analysis. If the error associated with these m/z's is increased, the Q-contribution (residual/error) is decreased, "downweighting" these points' contribution to the fit. m/z's with SNR<0.2 are considered "bad" by Paatero and Hopke (2003) and should be removed or strongly downweighted (factor of ~10). m/z's with 0.2<SNR<2 are considered "weak" and should be downweighted (factor of 2-3).
  • When a user presses the "Calculate, view SNR panel" button, a separate panel appears that shows time series, errors and SNR calculations. This tool is handy for examining ions that are deemed to be "bad" or "weak". This panel is informational only and is used in step 6.
  • When a user presses the "Downweight weak, bad mzs" button the code will generate new error matrices that have the bad and weak m/z columns adjusted. The function generates a new error matrix called nameofWave(errMx)+"Wk" (e.g., noNaNs_orgMSerr_minProp would become noNaNs_orgMSerr_minPropWk).
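The classification and downweighting described above can be illustrated with a short Python sketch. It is not the PET implementation. The SNR definition used here, the square root of the sum of squared signals over the sum of squared errors for each column, is an assumption, as are the function names, the default factors (10 for "bad", 2 for "weak") and the treatment of values exactly at the 0.2 and 2 boundaries.

```python
import math


def column_snr(data, err, j):
    """Assumed SNR of column j: sqrt(sum of data^2 / sum of error^2)."""
    signal = sum(row[j] ** 2 for row in data)
    noise = sum(row[j] ** 2 for row in err)
    return math.sqrt(signal / noise)


def downweight_weak(data, err, bad_factor=10.0, weak_factor=2.0):
    """Return (new_err, scale), where scale[j] is the factor applied to the
    errors of column j: bad_factor for SNR < 0.2, weak_factor for
    0.2 <= SNR < 2, and 1 otherwise."""
    scale = []
    for j in range(len(data[0])):
        snr = column_snr(data, err, j)
        if snr < 0.2:
            scale.append(bad_factor)
        elif snr < 2.0:
            scale.append(weak_factor)
        else:
            scale.append(1.0)
    new_err = [[e * s for e, s in zip(row, scale)] for row in err]
    return new_err, scale


data = [[0.1, 1.0, 10.0], [0.1, 1.0, 10.0]]
err = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
err_wk, factors = downweight_weak(data, err)
```

In this toy matrix the three columns have SNR of 0.1, 1 and 10, so they are treated as bad, weak and good, respectively.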

See also Paatero, P., and Hopke, P. K.: Discarding or downweighting high-noise variables in factor analytic models, Anal. Chim. Acta, 490, 277-289, 10.1016/s0003-2670(02)01643-4, 2003. Abstract

Data Prep Step 7. Recommended Practice: Downweight Peaks Related to m/z 44 in Frag Table (for AMS and ACSM data only)

  • In the default fragmentation table, the information in m/z 44 is repeated 6 or 7 times (in m/z 's 16, 17, 18, 19, 20, (28 in the Aiken et al. 2008 revision), and 44) with different proportionalities. PMF fits correlations, regardless of the magnitudes of the signals. Repeating the information of m/z 44 several times implies that it's really (x6 or 7) important, which it isn't! It is possible to downweight the columns of these m/z 's so that in total they only contribute the m/z 44 signal once. (It would be possible to remove the repeated information and replace those columns after running PMF, but we think that downweighting them is logistically simpler.)
  • There are two buttons, one for UMR AMS or ACSM data and another for HR AMS or ACSM data. Once you press the button, the list of duplicate ions will be filled in. If you are analyzing HR data and using the default HR organics frag waves, the equivalent set of HR ions (the mz44list) corresponds to O, HO, H2O, CO and CO2. If you are analyzing organic UMR data, the default list will include 16, 17, 18, 19, 20, 28, 44. Then press the Downweight duplicated columns button.
  • The function generates a new error matrix called nameofWave(errMx)+"44" (e.g., noNaNs_orgMSerr_minPropWk would become noNaNs_orgMSerr_minPropWk44).
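One way to make n duplicated columns count once in total is to multiply their errors by sqrt(n): the Q contribution of a point scales as 1/error^2, so each column then contributes 1/n of its original weight. The Python sketch below illustrates that reasoning only. Whether PET applies exactly this factor is an assumption, and the function name is hypothetical.

```python
import math


def downweight_duplicates(err, mz, duplicate_mz):
    """Multiply the errors of the columns whose m/z is in duplicate_mz by
    sqrt(n), where n is the number of such columns present in the matrix."""
    cols = {j for j, m in enumerate(mz) if m in duplicate_mz}
    factor = math.sqrt(len(cols)) if cols else 1.0
    return [
        [e * factor if j in cols else e for j, e in enumerate(row)]
        for row in err
    ]


mz = [16, 17, 18, 43, 44]
umr_list = {16, 17, 18, 19, 20, 28, 44}   # default UMR list from the text
err = [[1.0, 1.0, 1.0, 1.0, 1.0]]
err_44 = downweight_duplicates(err, mz, umr_list)
```

Only the columns actually present in the matrix are counted, so here n = 4 and the affected errors are doubled while m/z 43 is left unchanged.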

Data Prep Conclusion: Indicate that all Data Prep Steps have been Completed

  • Press the gold "Select prepared waves for PMF" button. The second tab of the PET panel, "Run", will become active.

Perform PMF Analysis

  • At this point you are ready to indicate where the PMF executable file resides, where the prepared data and error matrices reside within the Igor experiment and what PMF solutions you are wanting to investigate.

Press the "Create path to PMF executable and batch files" button. A window will pop up asking you where the PMF executable file resides. The path will be displayed once a user has selected a folder. If this path text is red, something is amiss and the code couldn't find the PMF executable.

If you pressed the gold button "Select prepared waves for PMF" in the previous data prep tab, entries in the data section will be filled in for you.

    • Choose model error
      • It is very rare for users to change this model error value from the default of 0. (PMF increases the errors provided by newError = oldError + modelError*dataValue)
  1. Choose the type of PMF analysis
    • Exploration will run PMF for a range of number of factors and FPEAKs or SEEDs. This is the typical use for exploring a dataset and comparing many solutions.
    • Bootstrapping explores the uncertainty of one solution (i.e., one number of factors at one fpeak for one seed). This is usually a final step run only on the solution you have selected from the exploratory analysis.
  2. Choose a range for number of factors.
    • When checking to make sure that everything runs properly, you may want to run just one case (Min p = 2, Max p = 2; p is the number of solved factors).
    • Recommended Practice: Run cases with 1 factor to have a context for the meaning of the 2-factor solution.
    • In the Bootstrapping mode, only the "min p" is read; the "max p" is ignored.
  3. Choose FPEAK or SEED values
    • "FPEAK" is a tool used to explore rotations of the solutions of a given number of factors. Note that FPEAK does not explore all possible rotations of a solution. FPEAK = 0 does not apply any rotational forcing. Non-zero values of FPEAK create near-zero values in the factor profiles (mass spectra) or time series. More information about FPEAK can be found in the PMF Users Manual Part 1 (pp. 9,12,14,21) and Part 2 (p. 24), and in several papers by P. Paatero (see Other Resources).
      • A good first set of FPEAK values is -1.0 to +1.0 with a delta value of 0.1 or 0.2. For a full analysis, a wide enough range of FPEAKs to achieve Q/Qexp of at least 1% above the minimum value is recommended.
      • In Exploration mode when varying the fpeaks, the Seed is set in headers in the PMF_Execution...ipf file: constant DEFAULT_SEED = 0
      • In Bootstrapping mode, only the "min fpeak" is read; the "max fpeak" is ignored.
    • "SEED" is a tool used to choose different random starts (initial values) for the PMF algorithm. Using different seeds may lead to solutions in different local minima (Q/Qexp) in the solution space. One set of solutions may have more physical meaning than another, or multiple sets may make physical sense. It is impossible to test all start values, but testing many seeds may give an indication of local minima for your dataset. More information about seeds can be found in the PMF Users Manual Part 1 (p. 11) and Part 2 (p. 16).
      • Run seeds from 0 to your preferred maximum with a delta value of 1.
      • In Exploration mode when varying the Seed, the fpeak is set in the headers of the PMF_Execution...ipf file: constant DEFAULT_FPEAK = 0
        • Exploring Seeds with a non-zero fpeak should be done with caution, as the "push" of a magnitude of fpeak for one area of local minima may be different than the "push" of the same magnitude of fpeak for a different area of local minima.
      • In Bootstrapping mode, the Seed is set in headers in the PMF_Execution...ipf file: constant DEFAULT_SEED = 0
  4. Select checkboxes
    • Run PMF in background: Each execution of PMF (see Exploration Mode Summary) creates a black DOS window that pops up. If the box is not checked, this window "grabs the focus" and makes itself the top window. This makes it hard to use the computer for anything else. If the box is checked, the window will not grab focus, but Igor and your computer's CPU will be busy.
    • Save experiment before and after PMF has finished: It is generally a good idea to save the experiment before calling PMF so that all the data prep steps and user selections will be saved before attempting to call the PMF executable. Once the code has finished all its calls to the PMF executable it is a good idea to save the experiment again because all the PMF results are now saved as waves within the experiment.

What the Software Does When You Press the Button...

Exploration Mode Summary

The software will execute PMF once for every combination of number of factors and FPEAK/seed. So if you run 1-6 factors and 5 FPEAKs, PMF will run 6x5=30 times. Each run starts a new black DOS window that will close when the run is completed. The duration of each run is printed in the history at the end of each run. In general, runs which solve for more factors and runs with FPEAK farther from 0 take longer. The code runs all of the FPEAKS or seeds for one number of factors, then advances to the next number of factors (e.g., run 1 factor with each of 5 FPEAK values, then 2 factors with each of 5 FPEAK values, etc.).
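The run order described above can be written out explicitly. This Python sketch only enumerates the combinations; it does not call PMF, and the function name and argument names are hypothetical.

```python
def exploration_runs(min_p, max_p, fpeak_min, fpeak_max, delta):
    """List (number of factors, FPEAK) pairs in the order described in the
    text: all FPEAK values for one number of factors, then the next."""
    n_steps = int(round((fpeak_max - fpeak_min) / delta))
    # Rounding avoids values such as 0.30000000000000004 in the list.
    fpeaks = [round(fpeak_min + k * delta, 10) for k in range(n_steps + 1)]
    return [(p, f) for p in range(min_p, max_p + 1) for f in fpeaks]


# 1-6 factors and 5 FPEAK values (-1, -0.5, 0, 0.5, 1) give 6 x 5 = 30 runs.
runs = exploration_runs(1, 6, -1.0, 1.0, 0.5)
```

The same enumeration applies when seeds are varied instead of FPEAK values, with the seed range taking the place of the FPEAK range.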

Once all of the calls to the PMF executable have completed, many diagnostic values are calculated. The panel in which you selected "Begin PMF analysis with these choices" will be killed and a new panel, "View PMF Analysis Results", will be created with PMF inputs filled in for you. Select options for possible additional calculations. For example, for AMS/ACSM HR Organic data, calibrated elemental ratios can be generated for each factor.

A little more detail

The software writes the files

C:\delete_log.bat    C:\runPMF.bat

and writes the DataMatrix and ErrorMatrix as MATRIX.DAT and STD_DEV.DAT, respectively to the folder with your PMF Executable. The software also writes a file to that folder called STD_DEV_PROP.DAT, which has the same number of points as the DataMatrix and in which every element is equal to the ModelError. These files will be read by the PMF executable.

The software then enters a pair of nested loops in which the following steps occur:

  • for each number of factors
    • for each FPEAK or SEED
      • use the file mypmft.ini as a template to create the file imupmf.ini, which is used as the control file for PMF.
  • The convergence criteria for completing the PMF calculations can be set in the 3rd tab of the PMR_PerformCalcPanel "PMF defaults & advanced options".
 //Default settings for PMF iteration convergence proportional to Qexp
 constant ITERATION_CHI2_PROP_QEXP = 0
 constant FIRST_ITER_LEVEL_PROP_CONST = 2e-6
 constant SECOND_ITER_LEVEL_PROP_CONST = 2e-6
 constant THIRD_ITER_LEVEL_PROP_CONST = 1e-6
You can read more about convergence criteria and large datasets in the PMF Users Manual Part 1 (pg. 10) (see Other Resources).
  • Delete the old PMF2.LOG file by running delete_log.bat
  • Execute PMF by running run_PMF.bat.
  • Wait for PMF to complete its run.
  • Load PMF output (including log file and factors)

At the completion of the loops, the software calculates some statistics from the output and then creates a panel to select PMF results for viewing.

Bootstrapping

The bootstrapping mode was developed following the method described in the EPA PMF v3.0 Users Manual Sect. 6.4 (see Other Resources). The bootstrapping method is used to estimate the uncertainty in both the factor mass spectra and time series. PMF is first run once on the full dataset for one PMF solution at a time (i.e., one combination of number of factors and fpeak). The code then makes a series of variations of the dataset (the number is specified by the user when selecting the bootstrapping mode), in each of which a subset of the original rows (mass spectra) is randomly replaced by other rows from the original matrix, and runs PMF on each variation.

For each new PMF case ("bootstrapped case"), the resultant factors are compared to those from the original dataset and assigned as "matching" the original factor with which it has the highest correlation. Bootstrapped cases in which each bootstrapped factor was matched to exactly one of the original factors (i.e., there is a one-to-one mapping between original factors and those from the individual boot-strapped cases) are retained for calculation of the average mass spectrum and time series of bootstrapped factors. Plots of the original factors and the average bootstrap factors with 1-sigma variation bars are produced automatically.

The EPA PMF Users Manual recommends doing 100 bootstrap runs for final results.

A little more detail

All output from the bootstrapping runs is saved in folder root:pmf_bootstrap:.

Row (mass spectrum) replacement is performed by using the StatsResample function in Igor to select rows for replacement. The row values are then sorted in increasing order as a convenience.

  • The 2d wave RowsToBeReplaced records the rows to be used in each bootstrapped case. Each column represents one bootstrap case. Each column lists the rows of the original matrix included in that bootstrapped case.
  • The 2d wave ReplacementHistogram counts the number of times that each original matrix row was used in a bootstrap case. Each column represents one bootstrap case. Summing the rows of this matrix gives the total number of times that each original matrix row was used in the bootstrapping cases.
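The row selection and bookkeeping can be illustrated with a small Python sketch. It stands in for the Igor StatsResample call with sampling with replacement from Python's random module, which is an assumption about the resampling mode. The function name is hypothetical, and each inner list here corresponds to one column (one bootstrap case) of the waves described above.

```python
import random


def bootstrap_rows(n_rows, n_cases, seed=0):
    """For each bootstrap case, draw n_rows row indices with replacement and
    sort them. Returns (rows_used, histogram): rows_used[c] lists the
    original rows included in case c, histogram[c][r] counts how many times
    original row r was used in case c."""
    rng = random.Random(seed)
    rows_used, histogram = [], []
    for _ in range(n_cases):
        picks = sorted(rng.randrange(n_rows) for _ in range(n_rows))
        counts = [0] * n_rows
        for r in picks:
            counts[r] += 1
        rows_used.append(picks)
        histogram.append(counts)
    return rows_used, histogram


rows_used, histogram = bootstrap_rows(n_rows=8, n_cases=3)
# Total number of times each original row was used over all cases.
total_use = [sum(case[r] for case in histogram) for r in range(8)]
```

Rows with a count of 0 in a case are the ones that were replaced in that case, and rows with a count above 1 are the ones that replaced them.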

The assignment of bootstrapped factors to the factors from the original case is made by Pearson R correlation. Factors are assigned only on the basis of mass spectral comparison, and each factor is assigned to one of the original factors.

  • Note that this is different than in EPA PMF. Our code does not have a criterion for the lowest allowable correlation between bootstrapped and original factors. In EPA PMF, factors that fall below this limit are "unmapped"; no factors are "unmapped" in our code.
  • Note that if the original case has factors that are very similar to each other, the assignment of the bootstrapped factors may be incorrect or ambiguous. No current work has been done to give guidance as to what "very similar" means. No sanity checks are made in the code for this type of situation.
  • The 3d wave FactorProfile_Rval stores the correlation between the bootstrapped and original factors. Rows represent the factors from the original case, columns represent the factors from the bootstrapped cases, and layers represent each bootstrapped case.
  • The 2d wave FactorProfileSort contains the number of the original factor to which each bootstrapped factor (row) has been matched in the bootstrapped case (column). Columns which contain (e.g., in a case with three factors) "0, 1, 2" have factors which were uniquely matched to the columns in the original case; these will be included in the averages of mapped factors. Columns which contain duplicate entries (e.g., "0, 1, 0") have multiple factors that were matched to the same original factor; these cases will not be included in the averages of the mapped factors.
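The matching rule can be sketched in a few lines of Python. This is an illustration of the description above, not the PET code: the function names are hypothetical, profiles are plain lists, and each profile is assumed to be non-constant so that the correlation is defined.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def match_factors(original, bootstrapped):
    """Assign each bootstrapped profile to the original profile with the
    highest Pearson R. Returns (assignment, one_to_one); the case is kept
    for averaging only when every original factor is matched exactly once."""
    assignment = [
        max(range(len(original)), key=lambda k: pearson_r(b, original[k]))
        for b in bootstrapped
    ]
    one_to_one = sorted(assignment) == list(range(len(original)))
    return assignment, one_to_one


original = [[1.0, 0.0, 0.0, 2.0], [0.0, 3.0, 1.0, 0.0]]
kept_case = [[0.0, 2.9, 1.1, 0.0], [1.1, 0.0, 0.1, 2.0]]
rejected_case = [[1.0, 0.0, 0.0, 2.0], [2.0, 0.0, 0.1, 4.0]]
```

As in the text, there is no lower limit on the correlation here, so every bootstrapped factor is assigned to some original factor even when the best correlation is poor.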

Note that the criterion for including bootstrapped cases in calculations of the average bootstrapped factors is different in our code than in EPA PMF. Bootstrapped cases in which two or more factors are mapped to the same original factor are not included in the averages in our code. In these instances, the whole bootstrapping case is rejected before calculating averages. In EPA PMF, all bootstrapped factors mapped to the same original factor are included in the average.

Important: At the present time the plots produced by the bootstrapping code expect to find the waves noNaNs_amus (for plotting profiles) and noNaNs_t_series (for plotting time series). If these are not the names of your waves that describe the columns and rows, respectively, of the input matrix, you should duplicate your row and column description waves to have these names.

When PMF seems to work for some solutions but not others

In the "Perform PMF calculations" panel, "PMF defaults and advanced options" tab, in versions 3.08D and higher there is a checkbox named "Save the log file for every iteration". If this is checked, then after every call to PMF the PET code will copy the PMF2.LOG file and rename it PMF2_x_y.LOG where x and y are the current number of factors and fpeak. One can then look through these files to see what problems may have arisen for a specific case.

Running Two Simultaneous Analyses on Multi-Processor Computers

You can run two or more PMF analyses (in two separate experiments) on the same computer simultaneously if you have multiple processors. Each analysis will run at the same speed as on a single-processor computer (or when one analysis is run on the dual processor computer). The PMF executable is not "multi-processor aware," meaning that it can not utilize both processors simultaneously for one PMF run.

To run two simultaneous analyses with the PET, you'll need two directories on your computer with the PMF .exe, .key, and mypmft.ini files. The directory names must end in "1" and "2," respectively. For example, you could have

C:\Users\*\Documents\PMF\PMF Executable1
pmf2wopt.exe
pmf2key.key
mypmft.ini
C:\Users\*\Documents\PMF\PMF Executable2
pmf2wopt.exe
pmf2key.key
mypmft.ini

Each experiment running PMF must use a separate Executable directory. While a PMF analysis associated with a PMF Executable directory is running, a file called "PMFrunning.txt" exists in that directory. If you try to run your second PMF analysis using the same Executable directory, the PET will give you an error. You can choose a different Executable directory and press the button to start the analysis.

Convergence

PMF uses a variety of metrics to assess the progress of finding a solution. Some of these metrics are based on parameters that were initially optimized for AMS data and hence may not be appropriate for other data sets. The last tab in the PMF_PerformCalcs panel has a section entitled Q convergence criteria which lists some defaults; users may modify these values to achieve convergence.

The log text file, PMF2.LOG, records information about the progress of finding a solution. Below is an excerpt from a file where a PMF solution didn't converge.

299 rank1 step chi2= 6.22372E+06 Penalty= 2.5296E+04 Flags GF
300 rank1 step chi2= 6.22364E+06 Penalty= 2.5295E+04 Flags GF

************************************************************  1
*/ Iteration interrupted because maximum step count
*\ has been exceeded in the robust mode. NO CONVERGENCE?
************************************************************  1
300 iter-steps in all. Final chi2=  6223643.5000 in the robust mode.

As a practical matter, the last version of the PMF solution is saved within PET and users can view this last result for this PMF run. However, because it did not converge, graphs in the main PMF panel will be grayed out and the Q/Qexp value will be blank.
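When many log files have been saved, it can be convenient to scan them for the warning shown in the excerpt above. The Python sketch below is a hypothetical helper, not part of PET. It relies only on the two pieces of text visible in that excerpt ("NO CONVERGENCE" and "Final chi2="); what other messages PMF2.LOG may contain is not assumed.

```python
import re


def scan_pmf_log(log_text):
    """Return (flagged, final_chi2): flagged is True if the text contains the
    'NO CONVERGENCE' warning; final_chi2 is the number following
    'Final chi2=' or None if that line is absent."""
    flagged = "NO CONVERGENCE" in log_text
    match = re.search(r"Final chi2=\s*([0-9.Ee+-]+)", log_text)
    final_chi2 = float(match.group(1)) if match else None
    return flagged, final_chi2


excerpt = """*/ Iteration interrupted because maximum step count
*\\ has been exceeded in the robust mode. NO CONVERGENCE?
300 iter-steps in all. Final chi2=  6223643.5000 in the robust mode.
"""
flagged, final_chi2 = scan_pmf_log(excerpt)
```

The absence of the warning is not proof of convergence, so a file that is not flagged should still be read if the corresponding solution looks suspicious.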

View PMF Analysis Results *Step 2*

Once PET has finished calling PMF and has calculated some diagnostics the "Perform PMF" panel will close and a "View Results" panel will appear with the previous selections of folders and waves preselected. For AMS/ACSM high resolution organic data additional calculations regarding elemental summing & ratios can be selected. Press the gold button "View PMF analysis" to proceed with generating the large PMF results panel.

Controls

Factor Space and Fpeak Slider

Fspace Fpeak slid 1.JPG

Save Solution Space Waves

Save Sol Spac Wv 1.JPG

Select Factor

Select Factor 1.JPG
Select Factor 2.JPG
Select Factor 3.JPG
Select Factor 4.JPG

Select species

Select Species 1.JPG

Select Multiple Factors or Fpeak/Seed

Mult Fact Fpk Seed 1.JPG
Pmap plotting.JPG
Factor space selected.JPG
Fpeak seed all.JPG

Mass Fraction

Mass Fraction/Variance

Mass Frac Var 1.JPG

RotMx

RotMx 1.JPG

Q Plots

Q versus Number of Factors

Q vs num Fact 1.JPG
Q vs num Fact 2.JPG

Q versus fpeak/seed

Q vs fpeak seed 1.JPG
Q vs fpeak seed 2.JPG

Factor Graphs

Time Series

Time Series 1.JPG
Time Series 2.JPG
Time Series 3.JPG
Time Series 4.JPG
Time Series 5.JPG
Time Series 6.JPG

Profiles (usually Mass Spectra)

Profile 1.JPG
Profile 2.JPG
Profile 3.JPG
Profile 4.JPG
Profile 5.JPG

Reconstruction and Residuals

Total Reconstruction

Total Reconstruction 1.JPG

Total Residuals

Total Residual 1.JPG
Total Residual 2.JPG
Total Residual 3.JPG

Current Species Reconstruction and Residual

Current Spec Recon Res 1.JPG
Current Spec Recon Res 2.JPG

Scaled Residuals

All Species Scaled Residuals

All Spec Scal Resid 1.JPG
All Spec Scal Resid 2.JPG

Current Species Scaled Residuals

Current Spec Scal Resid 1.JPG
Current Spec Scal Resid 2.JPG
Current Spec Scal Resid 3.JPG

"RR" plot

RR plot 1.JPG
RR plot 2.JPG

AMS loadings and the RIE, CE and CDCE factors

When exporting species like Org and HROrg from all versions of Squirrel and Pika, users should be aware that the RIE or the CE or CDCE factors are not applied to the 2-dimensional data. The reasons for this rule are somewhat historical. Basically, whenever a result in Squirrel or Pika retains an MS dimension, as in the export of the 2D wave for PMF, which has dimensions of measurement time and UMR or HR mass values, the RIE and CE (or CDCE or HR CDCE values) are not applied.

Users may wish to compare factors and loadings from PMF results to species time series loadings with the RIE or CE applied. Below is a guide for applying default RIE and HR CDCE to source factors within a PET experiment; similar operations can be used for CE or UMR CDCE. Be aware that by performing this operation, the user assumes that the RIE and CE (or CDCE or HR CDCE) are the same for all factors. This may not reflect reality. For example, RIE calibrations and CE (or CDCE or HR CDCE) have been generated for bulk ambient organic aerosol; the RIE and CE for BBOA and COA, for example, may not be identical.

There are 3 main steps to incorporating the RIE and CE (or CDCE or HR CDCE) to PMF factors.

1. Generate a 1D wave matching the time series wave containing a multiplicative factor of the RIE and CE or CDCE and import it to the PMF experiment.

2. Remove the points in this 1D wave that match the nans removed in the data prep step of PMF.

3. Divide your time series factor waves in the PMF experiment by this wave.


More details and examples of commands are below.

Step 1. Within your Pika experiment, the HR CDCE is root:HRCE_fphase; for UMR data it is root:CE_fphase. The RIE values can be found in the UMR or HR batch tables. Here are commands you can use and modify as needed.

Duplicate/o root:HRCE_fphase root:HRRIECEfactor // Use a similar command for UMR. This wave should be the same length as the t_series wave

root:HRRIECEfactor *= 1.4 // 1.4 is the RIE of HROrg in this data. We will divide the PMF factors by this HRRIECEfactor wave.

Save/T/M="\r\n" HRRIECEfactor as "HRRIECEfactor.itx" // Save this wave as separate itx file.

In this example I saved this file in the same place as the exported PMF text file (that you had loaded into your PMF experiment at the beginning of that analysis). You can also load this wave into your PMF experiment via the browse experiment command without saving it as an external text file.

Step 2. There are a few ways to accomplish this step. Within your PMF experiment, in the data folder containing the original data (before the noNaNs_* versions of the data were created), create a time series wave that flags which rows (time points) were NaN.

Then load the HRRIECEfactor wave from Step 1 and multiply it by this flag wave.

LoadWave/T "HRRIECEfactor.itx"

MatrixOp/o HRtimeseries = sumRows(HROrg) // HRtimeseries will be n x 1 matrix with nans in the correct places.

HRtimeseries /=HRtimeseries // make all the values in this wave nans or 1s.

HRRIECEfactor*=HRtimeseries // nan the correct time points

RemoveNaNs(HRRIECEfactor) // The Remove Points ipf containing this RemoveNaNs function should automatically be loaded into the PMF experiment via the line #include <remove points> in the PET scatter ipf.

Step 3. In the current-factor tseries graph of the main PMF results panel, the y waves are named root:pmf_plot_globals:TSeriesFactor1, root:pmf_plot_globals:TSeriesFactor2, etc. for any factor solution. You may choose to simply copy these waves and divide the copies directly (assuming you are in the same data folder as in Step 2):

duplicate/o root:pmf_plot_globals:TSeriesFactor1 TSeriesFactor1RIECDCE

duplicate/o root:pmf_plot_globals:TSeriesFactor2 TSeriesFactor2RIECDCE //...


TSeriesFactor1RIECDCE/=HRRIECEfactor

TSeriesFactor2RIECDCE/=HRRIECEfactor //...


Note that any time you reselect the factor solution (fpeak, number of factors) in the main results panel, you will need to redo this duplication and division.
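Because the duplication and division must be repeated after every change of solution, the Step 3 commands can be wrapped in a small loop. The following is only a sketch: the function name ApplyRIECEToFactors is hypothetical (it is not part of the PET), and it assumes that the factor waves follow the root:pmf_plot_globals:TSeriesFactorN naming described above and that the correction wave already had its NaN rows removed in Step 2.

```igor
// Hypothetical helper, not part of the PET.
// Copies each displayed factor time series into the current data folder
// and divides the copy by the RIE*CE correction wave.
Function ApplyRIECEToFactors(nFactors, corrWave)
    Variable nFactors      // number of factors in the currently selected solution
    Wave corrWave          // e.g. HRRIECEfactor after RemoveNaNs (Step 2)

    Variable i
    String srcPath, destName
    for (i = 1; i <= nFactors; i += 1)
        srcPath = "root:pmf_plot_globals:TSeriesFactor" + num2istr(i)
        Wave/Z src = $srcPath
        if (!WaveExists(src))
            Print "Missing factor wave: " + srcPath
            continue
        endif
        // The row count of the factor wave must match the correction wave.
        if (DimSize(src, 0) != numpnts(corrWave))
            Abort "Length mismatch between " + srcPath + " and the correction wave"
        endif
        destName = "TSeriesFactor" + num2istr(i) + "RIECDCE"
        Duplicate/O src, $destName
        Wave dest = $destName
        dest /= corrWave[p]    // row-by-row division
    endfor
End
```

For a 4-factor solution, for example, this could be called as ApplyRIECEToFactors(4, HRRIECEfactor) from the data folder used in Step 2. The original waves in root:pmf_plot_globals are left untouched.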

AMS HR factors colored by families hack

Versions 3.0 and higher of PET have incorporated data and code to create elemental ratios for high resolution AMS organic data. The information below is only useful for older (2.x) versions of PET.

As of PMF version 2.08D, the grouping of HR ions into CH, CHO1, etc. families and the elemental ratios of factors are built into the PMF "View Results" panel. For historical purposes, a 'hack' for performing these tasks with older PMF versions is described below.

During the examination of factors of High Resolution AMS spectra, it is often helpful to plot the HR spectra summed into chemical 'families' (e.g., familyCH, which consists of all chemical fragments containing only C and H atoms) in a unit resolution dimension. As of November 2013 the following method can be employed.

(1) Add the ipf named HR_PMFsupplemental_v1_1.ipf to your PMF experiment.

(2) From the Pika experiment in which the HR spectra were generated, copy the entire folder named HR into the PMF experiment. This can easily be done via the Data Browser - Browse Experiment feature in Igor. Specifically, in your PMF experiment, press the Browse Experiment button, select the Pika experiment, and then in the right hand side of the window click and drag the HR data folder over to the left hand side, so that the cursor points to the root folder. The PMF experiment should now contain a copy of the folder's waves, such as root:HR:HR_family_B. Not all the waves in this folder will be used, but it is easiest to simply copy them all.

(3) Execute the following line, or an analogous one, to calculate and plot the HR family grouped spectrum for a currently chosen PMF factor of interest.

BreakHRSpectraIntoUMRFamilies(root:pmf_plot_globals:profilefactor1, root:noNaNs_ExactMassWaveNoIso, root:noNaNs_ExactMassTextNoIso, 1)

In the line of code above, the first parameter indicates which currently displayed HR factor is to be calculated. The second and third parameters indicate the HR mass value and the Mass Text identifier waves of the HR spectrum. These waves should be exactly the same as those you chose in Step 2 of the PMF tool, Description of COLUMNS of Data matrix (numbers) and (text). The last parameter is a simple true/false flag for indicating whether or not to plot and color the results.

Subsequent executions of the line with a different factor wave will generate a new HR colored spectrum for that factor (of the currently selected PMF solution). For example, for the second factor:

BreakHRSpectraIntoUMRFamilies(root:pmf_plot_globals:profilefactor2, root:noNaNs_ExactMassWaveNoIso, root:noNaNs_ExactMassTextNoIso, 1)

Compare PMF Results with External Factors *Step 3*

Caution: This part of the code is a bit fussy! Please contact Donna if you have tried the tips here and have trouble getting this panel to work.

Setting up the External Data

  1. Check that the waves "NaNsWv_amus" and "NaNsWv_tseries" (these were created when you deleted NaNs from your original matrix) are in the datafolder where your PMF output is being saved. If they're not already there, you'll need to find them and copy them to this location.
    • Versions 2.02 and lower used the strings "NaNsList_amus" and "NaNsList_t_series" to record the location of deleted NaNs. If you are upgrading experiments, make sure that these strings are in the folder where your PMF output is being saved; "NaNsWv_amus" and "NaNsWv_tseries" will be created and used for future calculations.
  2. You'll need separate folders (in root: ; they cannot be in a lower directory) for mass spectra and time series you want to use for comparison to the factors.
  3. The tricky part of using this panel is setting up your mass spectra and time series correctly.
    • Each wave must have either the same number of points as in the corresponding dimension of your noNaNs_ data matrix or the same number of points as in the corresponding dimension of your original matrix.
      • For example, if your original matrix is 3246 rows and 300 columns and your noNaNs_ matrix is 3200 rows and 268 columns, your "external" mass spectra can have 300 or 268 points; "external" time series can have 3246 or 3200 points.
      • As another example, if your original matrix is 100 rows and 320 columns and your "external" mass spectrum has 300 points, then you may extend the external mass spectrum to 320 points, inserting NaN values in the last rows (m/z 301 to m/z 320).
    • For mass spectral comparisons, download "full" spectra (usually 300 points) from the AMS Spectral Database instead of using the shortened ones provided in the 9th Users Meeting template. You should inspect the length of all waves from the Database to make sure that every one has the correct number of points for your work.
  4. IMPORTANT NOTES
    1. The reason for the restriction on the number of points in "external" comparison waves is the following: After you select the datafolders for the external mass spectra and time series, the code makes a folder inside each of these folders called "noNaNs". Each wave in the external datafolder is copied to the new directory. Then the code checks whether the waves have the same number of points as the same dimension in the matrix used to run PMF; if so, no change is made to the wave. If not, the code _assumes_ that the wave has the dimension of the original matrix (which it doesn't know about) and therefore uses the string "NaNsList_amus" or "NaNsList_t_series" to delete the rows that it believes were NaN in the original dataset. (It's ok if these waves still have NaNs (e.g., missing points in data from another instrument when the AMS data was good); only the points where both the factor and external waves have valid data are included in the correlation calculation.)
    2. Because of some internal coding restrictions, time series waves for comparison cannot include the string "series" in their name; such waves will not be created in the noNaNs folder.
    3. Each of the mass spectral and time series waves must be 1-dimensional. This means that you cannot use the waves from root:pmf_plot_globals: called TseriesFactor1, TseriesFactor2, etc. since these are 2-dimensional waves.
    4. Old versions of the mass spectra in the AMS Spectral Database have m/z = point number, meaning that point 0 = m/z 0. This is not usually the way that AMS matrices are saved from James' software or from Squirrel, so you may need to delete 1 point from the beginning of each spectrum that you use from the Database. This is important to check, or else you will correlate the wrong m/z's with each other.
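The wave-preparation rules above (1-dimensional waves, no point for m/z 0, and a length that matches the original matrix) can be sketched in Igor as follows. The function name PrepExternalSpectrum and the "_prep" suffix are hypothetical and not part of the PET; the sketch assumes the wave is a mass spectrum whose first point is either m/z 0 or m/z 1, and it writes the prepared copy into the current data folder, which should be your external mass spectra folder in root:.

```igor
// Hypothetical helper, not part of the PET.
// Makes a 1D copy of an external mass spectrum, optionally drops point 0
// (m/z 0), and pads the end with NaNs up to nTarget points.
Function PrepExternalSpectrum(src, nTarget, dropPointZero)
    Wave src
    Variable nTarget          // number of m/z columns in your original matrix
    Variable dropPointZero    // 1 if point 0 of src is m/z 0, otherwise 0

    Variable nOld
    String destName = NameOfWave(src) + "_prep"   // keep names short in Igor 6/7
    Duplicate/O src, $destName
    Wave dest = $destName

    // Force a 1-dimensional wave (for a 2D wave, this keeps the first column).
    Redimension/N=(-1, 0) dest

    if (dropPointZero)
        DeletePoints 0, 1, dest      // point 0 now corresponds to m/z 1
    endif

    nOld = numpnts(dest)
    if (nOld < nTarget)
        Redimension/N=(nTarget) dest // new points are created as zeros...
        dest[nOld, nTarget - 1] = NaN // ...so overwrite them with NaN
    endif
End
```

For the example above (a 300-point spectrum and a 320-column original matrix), a call such as PrepExternalSpectrum(mySpectrum, 320, 0), where mySpectrum is a placeholder name, would yield a 320-point wave whose last 20 points are NaN. Check the result by eye before running the comparison.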

Choosing the Folders with External Data

  1. The first time you want to calculate factor correlations, you can do so by pressing the "External Data Panel" button on the main panel or by choosing "Compare PMF results with External Factors *Step 3*" from the PMF pulldown menu.
    • Note that after you have accessed the "External Data Locations" panel, pressing the "External Data Panel" button on the main panel will not bring you back to the selection panel again. To choose different folders or force recalculations you must access the selection panel from the PMF pulldown menu.
  2. Select your external data folders.
    • Other choices from the pulldown menus include
      • "No external data of this type": The PET will not attempt to calculate correlations between factors and external data of this type.
      • "Update List": If after calculating factor correlations you wish to add new external waves of this type, choose this option to add the correlations for the new waves to your existing list of correlations.
  3. Press the button to proceed.

What the PET Does (and how to fix things if something goes wrong)

  1. In each external data folder, a folder called "noNaNs" is created as described in IMPORTANT NOTE 1 above (under item 4 of Setting up the External Data).
  2. Each of these new waves is compared to every factor wave. (This can take a while if you have a lot of waves for comparison.)
    • If the factor and external waves have different lengths, the function aborts and tells you that the comparison function was called with waves of different lengths. Unfortunately, it doesn't give helpful information about which wave had the wrong length (we'll try to look into that and improve that error message).
      • If this happens, you should look in the "noNaNs" folder in the appropriate external data folder and check whether all of the waves have the correct length. Waves with incorrect numbers of points in this folder may be the result of incorrect wave lengths in the "external data" folder. Try to fix all of the problem waves and then run the function for calculating the scatter comparison again by choosing step 3 from the PMF pull-down menu. Recall that this is the only way to force recalculation of the comparison of the factors!
        • If forcing recalculation didn't fix the problem, delete the "noNaNs" folder in the appropriate external data folder and then recalculate again.
  3. The correlation values between the factor waves and the external data waves are stored in waves in the folder with PMF output called
      • "RcorrMx4d_Profiles" and "RcorrMx4d_Tseries" (with Pearson R)
      • "RcorrMx4d_Profiles_pear_mzGrt44" (with Pearson R, only for m/z > 44)
    • It is also possible to use the function scat_calc_RCorrMx4d_UC() to calculate
      • "RcorrMx4d_Profiles_UC" and "RcorrMx4d_Tseries_UC" (with the Uncentered Correlation, as reported in Ulbrich et al., ACP, 2009)
  4. When the calculation is complete, the Scatter Panel is created.
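Two of the points above can be illustrated with short Igor functions. Both are sketches with hypothetical names and are not part of the PET. The first lists the numeric waves in a "noNaNs" folder whose length differs from the expected one, which may help when the comparison aborts without naming the offending wave. The second computes an uncentered correlation in its usual cosine-similarity form, sum(x*y) / sqrt(sum(x^2) * sum(y^2)), using only points where both waves have valid data; whether this matches the PET's scat_calc_RCorrMx4d_UC() in every detail is an assumption.

```igor
// Hypothetical helper: report numeric waves of unexpected length in a folder.
Function ListWrongLengthWaves(folderPath, nExpected)
    String folderPath     // full path to a "noNaNs" folder (no trailing colon)
    Variable nExpected    // points in the matching dimension of the noNaNs_ matrix

    Variable i, nWaves
    String name
    if (!DataFolderExists(folderPath))
        Abort "No such data folder: " + folderPath
    endif
    nWaves = CountObjects(folderPath, 1)            // object type 1 = waves
    for (i = 0; i < nWaves; i += 1)
        name = GetIndexedObjName(folderPath, 1, i)
        Wave/Z w = $(folderPath + ":" + PossiblyQuoteName(name))
        if (!WaveExists(w) || WaveType(w) == 0)     // skip text waves (name lists)
            continue
        endif
        if (numpnts(w) != nExpected)
            Printf "%s has %d points (expected %d)\r", name, numpnts(w), nExpected
        endif
    endfor
End

// Hypothetical helper: uncentered correlation of two equal-length waves,
// skipping any point where either wave is NaN or Inf.
Function UncenteredCorr(w1, w2)
    Wave w1, w2

    Variable i, n
    Variable sxy = 0, sxx = 0, syy = 0
    n = numpnts(w1)
    if (numpnts(w2) != n)
        Abort "UncenteredCorr: waves have different lengths"
    endif
    for (i = 0; i < n; i += 1)
        if (numtype(w1[i]) == 0 && numtype(w2[i]) == 0)
            sxy += w1[i] * w2[i]
            sxx += w1[i] * w1[i]
            syy += w2[i] * w2[i]
        endif
    endfor
    return sxy / sqrt(sxx * syy)    // NaN if there were no valid point pairs
End
```

For example, ListWrongLengthWaves("root:MyExtSpectra:noNaNs", 268) would print any spectrum that does not have 268 points; the folder name here is a placeholder.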

Other Potential Problems and Solutions

  • Pulldown menus don't have lists. Each "noNaNs" folder should also contain a text wave called "TseriesWvsNms" for time series or "FactorWvsNms" for profiles. This wave is used for the pulldown menus in the panel. If this wave is missing, the pulldown menus may not work. There should also be a string of wave names in the folder called "TseriesWvsNmsList" or "FactorWvsNmsList"; if so, you can create the text wave by calling gen_list2txtWv(listStr, wvNm).
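If gen_list2txtWv is not available in your experiment, the same conversion can be done with built-in Igor functions. This is only a sketch: the function name RebuildNameWave is hypothetical, and it assumes that the name list is a semicolon-separated string stored in the current data folder (the "noNaNs" folder).

```igor
// Hypothetical helper, not part of the PET.
// Builds a text wave with one wave name per point from a
// semicolon-separated list string in the current data folder.
Function RebuildNameWave(listName, wvName)
    String listName    // e.g. "TseriesWvsNmsList"
    String wvName      // e.g. "TseriesWvsNms"

    Variable n
    SVAR/Z listStr = $listName
    if (!SVAR_Exists(listStr))
        Abort "String not found in the current data folder: " + listName
    endif
    n = ItemsInList(listStr)              // default separator is ";"
    Make/O/T/N=(n) $wvName
    Wave/T tw = $wvName
    tw = StringFromList(p, listStr)       // point p receives list item p
End
```

With the "noNaNs" folder set as the current data folder, RebuildNameWave("TseriesWvsNmsList", "TseriesWvsNms") would recreate the time series name wave.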

Some Other Notes

  • Order of the factors. Factors are numbered 1 to N and match the factors in the main panel, counting from the bottom of the factor plots.
  • Colors. Factor 1 is black, Factor 2 is red, Factor 3 is green, Factor 4 is blue, etc. Factors in this panel have the same color as they did in the main panel. In the overlay plots, the factor is its usual color and the external species is orange.
  • Size of Factor Space and Current Fpeak value sliders. The sliders in this panel and in the main panel control both panels simultaneously. Graph updates are slower with both panels. Be patient and don't click on anything until everything has updated.

More Features

  • Assign Groups to External Data. This feature allows you to reorder the external data waves and assign them to groups. Groups then define the colors used in the R bar plots in the panel and in the Comprehensive External Data Correlation Plot (below).
  • Comprehensive External Data Correlation Plot. This plot displays the "R vs External Factor" plots for all factors at once.

Considerations for Choosing a Solution

  • This topic is beyond the scope of discussion on this wiki.

PMF Evaluation Tool Software

Version: 3.08E
Date: 24 Oct 2024
Release Notes: Release Notes v3.00
PET ipf Files for Upgrading PET: Zipped PMF file containing these 7 ipfs:

PMF_SetupPanel_v3_08E.ipf
PMF_ErrPrep_AMS_v3_08E.ipf
PMF_Execution_v3_08E.ipf
PMF_ViewResults_v3_08E.ipf
PMF_scatter_v3_08E.ipf
PMF_EACode_v3_08E.ipf
SNRAnalysis_v1_02A.ipf

Igor Template Experiment: Igor PMF template experiment v3_08E
PMF Executable Package: PMF_executable_package.zip


Quick review: the Igor code called PET (PMF Evaluation Tool) assists in a PMF analysis. Anyone with credentials can download PET; there is no need to request an individual account. When you click on a download link, a prompt should appear asking for credentials. If you need credentials to download items on this wiki, email Donna. The PET code v3.08E has been tested on these versions of Igor: 6.37, 7.08, 8.04, 9.0.6.

PET needs a PMF executable file and a license file to interface with. The executable file can be downloaded using the PMF_executable_package link above. Originally the PMF executable and license files could be purchased by sending an email to the creator, Pentti Paatero (retired, U. of Helsinki). The Swiss company Datalystica is now the sole official seller of the multi-linear engine (ME-2) solver package. This ME-2 package contains a license that is used for the PMF executable. Once the ME-2 package is downloaded, make a copy of the ME2key.key file, rename the copy to pmf2key.key, and place it in the same folder as the PMF executable files. Please go to https://datalystica.com/me-2-solver/ to purchase ME-2.

The Igor PET code is not needed to run PMF; PET provides a handy interface to generate and view PMF results.


Other Resources

  • Ingrid's presentation of the PET at the 9th AMS Users Meeting in three parts (1. Overview and PMF Execution; 2. Viewing PMF Results; 3. Using the Scatter Plot Panel)

Some PMF Method Papers

  • Paatero, P., and Tapper, U.: Analysis of different modes of factor analysis as least squares fit problems, Chemom. Intell. Lab. Syst., 18, 183-194, 1993.
  • Paatero, P., and Tapper, U.: Positive Matrix Factorization: a non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, 5, 111-126, 1994.
  • Paatero, P.: Least squares formulation of robust non-negative factor analysis, Chemom. Intell. Lab. Syst., 37, 23-35, 1997.
  • Paatero, P., Hopke, P. K., Song, X. H., and Ramadan, Z.: Understanding and controlling rotations in factor analytic models, Chemom. Intell. Lab. Syst., 60, 253-264, 2002.
  • Paatero, P., and Hopke, P. K.: Discarding or downweighting high-noise variables in factor analytic models, Anal. Chim. Acta, 490, 277-289, 10.1016/s0003-2670(02)01643-4, 2003.
  • Paatero, P., Hopke, P. K., Begum, B. A., and Biswas, S. K.: A graphical diagnostic method for assessing the rotation in factor analytical models of atmospheric pollution, Atmos. Environ., 39, 193-201, 10.1016/j.atmosenv.2004.08.018, 2005.
  • Paatero, P., and Hopke, P. K.: Rotational Tools for Factor Analytic Models, J. Chemom., 23, 91-100, 10.1002/cem.1197, 2009.