PMF coding

From Jimenez Group Wiki

PMF Analysis

For Future Releases

  • Saving large matrices is very slow; how can this be sped up?
  • Is it possible to calculate all scaled-residual distributions, box-and-whisker statistics, and diurnals after running PMF and store them as matrices? Done?
  • In bootstrapping, also calculate factor bars by the EPA method; consider looking for groups of solutions?
  • Is BareBones useful?
  • Load Q for each iteration - grep
  • Make sure every panel is given a name via NewPanel/N=

Done

  • PMF executable no longer has to be on C:\ (done in 2.04)
  • Make sure that there was output (convergence = 0 or 1) before calculating resid, correlation, etc. (done in 2.04)
  • Fix calculation of unweighting Q -- robust criterion (done in 2.04)
  • Release .ini with longer line lengths for data and std. dev. matrices (done with 2.03)
  • Implemented James' bootstrapping bug fixes (done in 2.03)
  • Use convergence criteria expressed in terms of Q/Qexp instead of absolute Q (done in 2.03)
    • level 3: 1e-6*Qexp; levels 1 and 2 are twice that
  • Made a StrConstant line for pmf2wopt.exe (currently the only version released by Paatero) (done in 2.03)
  • Fix incorrect SEED when varying FPEAK; seed had been set equal to fpeak (done in 2.03)
    • Made constants DEFAULT_SEED and DEFAULT_FPEAK that the user can change, e.g., to run a range of fpeaks at seed = 2.
  • Calculate Qexp "correctly", accounting for the degrees of freedom of the solution matrices. Qexp had been calculated as numpnts(dataMx); it is now numpnts(dataMx) - p*(rows + columns of data matrix). For normalizing the Q time series and mass spectrum, Qexp had been the number of rows or columns of the data matrix; it is now (rows or columns of data matrix) - p (done in 2.03).
  • Fix file paths for Vista machines: use SpecialDirPath instead of the C: directory (done in 2.00B).
  • Add the ability to do bootstrapping (done in 2.00A).
  • Allow users to run PMF on partitioned drives (done in 1.04A).
  • Be sure that ExecuteScriptText has /Z so no error dialogs pop up (done in 1.04A).
  • Allow user to find solutions ranging over many seed values instead of fpeak values (done in 1.03I)
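The 2.03 changes to Qexp and to the Q/Qexp-based convergence criteria reduce to a few lines of arithmetic. The sketch below is in Python purely for illustration (the tool itself is Igor Pro code). It assumes rows of the data matrix are time points and columns are species, maps "rows or columns minus p" onto the Q time series and Q mass spectrum accordingly, and reads "levels 1 and 2 are twice that" as 2e-6*Qexp for both levels. All function names are hypothetical.

```python
"""Qexp and convergence-threshold arithmetic (illustrative sketch).

Assumptions: rows of the data matrix are time points, columns are species,
and p is the number of factors. Function names are hypothetical.
"""


def q_expected(n_rows: int, n_cols: int, p: int) -> int:
    """Whole-solution Qexp: numpnts(dataMx) - p*(rows + columns)."""
    return n_rows * n_cols - p * (n_rows + n_cols)


def q_expected_per_time_point(n_cols: int, p: int) -> int:
    """Qexp for one point of the Q time series (assumed: columns - p)."""
    return n_cols - p


def q_expected_per_species(n_rows: int, p: int) -> int:
    """Qexp for one species of the Q mass spectrum (assumed: rows - p)."""
    return n_rows - p


def convergence_thresholds(qexp: float) -> dict:
    """Level 3 is 1e-6*Qexp; levels 1 and 2 are taken as twice that."""
    level3 = 1e-6 * qexp
    return {1: 2 * level3, 2: 2 * level3, 3: level3}


if __name__ == "__main__":
    qexp = q_expected(1000, 100, 5)
    print(qexp)  # 100000 - 5*1100 = 94500
    print(convergence_thresholds(qexp))
```

For a 1000 x 100 matrix and p = 5, the old definition would give Qexp = 100000, while the degrees-of-freedom version gives 94500; the difference grows with p.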

PMF Evaluation Panel

Priority

For Future Releases

  • Add scaled residual TS to species reconst/resid plot
  • generalization of RR_PMF_vs_Tracers that doesn't depend on TS (so Jill can compare TS_fract to tracer TS)
  • Can diurnals be calculated at the end of execution and stored in matrices? Done?
  • Make diurnal error-bar waves (names ending in EB) in pmf_plot_globals:Diurnals. Done?
  • In diurnals, add options for monthly, day-of-week, and weekday-vs-weekend grouping. Done?
  • for Q vs p, fpeak plots, check box for "hide minimum" (added by Ingrid, 2 Jul 2009)
  • Ken had a memory issue when maxp was 12 and the number of fpeaks was 41, during allocation of the waves for the diurnal plots. Make all evaluation-plot waves single precision to save memory. Done?
  • The box-and-whisker plot of scaled residuals for all species uses the min and max for the whiskers, with no option for 5-95% as in the diurnal plots. Should this be changed to be consistent with the diurnal plots? Done?
  • On the diurnal plots, add checkboxes for "Display Mean" and "Display Median" so that one, the other, or both can be displayed (popped graph only)
  • Mtg at ARI 12/8/08: For Total Residual Profiles plot, more ticks! For Total Residual Tseries plot, when popped also display TotalSignal_as_Tseries (organics time series from the input data)
  • List from Ingrid on 3/20/08: 8. Another longer-term thing might be to make a table of string waves where the user can name their factors, and then make bar plots with the same factors in the same color/order. Ken has done a version of that in the attached plot. This will also let us assign a name (and therefore a standard color) to factors as you look at the solutions. Note from Donna: I'm gonna hold off on this one, cuz it may be hard to implement and only advanced users will probably do this well anyway.
  • There are redimensioning issues at the creation of the Evaluation Panel when the user has selected only factor spaces of size one (and, for example, many fpeaks). In practice this would only happen if someone wanted to test the evaluation tool to see how it works and, to save PMF calculation time, chose only one factor.
  • From Patrick H.: make an option for Thursday-Friday and Sunday in addition to weekday and weekend. This matters because of 'carry-over effects' on Mondays and Saturdays at many ground sites.
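The diurnal items above (day-of-week groupings including Thursday-Friday and Sunday, mean and median, 5-95% whiskers) share one calculation: bin a time series by hour of day within a chosen set of weekdays, then take percentiles per bin. The Python sketch below illustrates that under stated assumptions: the day-type names are hypothetical, NaNs are simply skipped, and percentiles use linear interpolation (`statistics.quantiles` with `method="inclusive"`), which may differ from the interpolation the Igor code uses.

```python
import math
from statistics import median, quantiles

# Hypothetical day-type groupings; datetime.weekday() gives Monday=0 ... Sunday=6.
# Thursday-Friday and Sunday avoid Monday/Saturday carry-over effects.
DAY_TYPES = {
    "all": {0, 1, 2, 3, 4, 5, 6},
    "weekday": {0, 1, 2, 3, 4},
    "weekend": {5, 6},
    "thu_fri": {3, 4},
    "sunday": {6},
}


def diurnal_stats(times, values, day_type="all"):
    """Hourly mean, median, 25-75% box and 5-95% whiskers for one series.

    times: datetime objects; values: numbers (NaNs are skipped).
    Hours with fewer than two valid points are omitted.
    """
    days = DAY_TYPES[day_type]
    by_hour = {h: [] for h in range(24)}
    for t, v in zip(times, values):
        if t.weekday() in days and not math.isnan(v):
            by_hour[t.hour].append(v)
    stats = {}
    for hour, vals in by_hour.items():
        if len(vals) < 2:
            continue  # too few points for percentiles
        cuts = quantiles(vals, n=20, method="inclusive")  # 5%, 10%, ..., 95%
        stats[hour] = {
            "mean": sum(vals) / len(vals),
            "median": median(vals),
            "box": (cuts[4], cuts[14]),  # 25th and 75th percentiles
            "whiskers": (cuts[0], cuts[18]),  # 5th and 95th percentiles
        }
    return stats
```

Switching from min/max whiskers to 5-95% only changes which cut points are read off; the same routine could serve the scaled-residual box plot.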

Done

  • DTS added at HR clinic: in the calculation of the diurnal statistics, the binarysearchInterp function sometimes returned NaN instead of a numeric value, and Igor then complained when asked to create waves of NaN length. When using binarysearchInterp, we need to make sure values are set to zero if they are later used for creating waves. (Think this is fixed with updated diurnal code from Donna, v2.04.)
  • Is there unnecessary update of the panels when changing fpeak, p? (think this is no longer an issue, v2.04)
  • List from Ingrid on 3/20/08: 2. Can we (doesn't have to be this week) make a time series plot of % of each factor time series (so stacked factors as %) including the Residual, so it adds up to 100% (like Figure 1d in the paper http://www.atmos-chem-phys.net/5/3289/2005/acp-5-3289-2005.pdf)? Note from Donna: This is easy enough for users to do themselves. (done v2.04)
  • RR should also be calculated at end of execution!! (was done in v2.0?)
  • Created small panel with buttons to pop main-panel plots (done v2.03)
  • Save Sol'n Space Waves now also makes subfolder "Diurnals" and saves mean, median, and percentile waves for each factor, total Resid, Q/Qexp (done v2.03)
  • In bootstrapping, if noNaNs_amus and noNaNs_tseries don't exist, user is prompted to select appropriate waves (done v2.02).
  • In View PMF Results Selection panel, wavelist to choose from for Row Description, Column Description waves only includes waves that match the size of the data matrix (done v2.02)
  • Implemented a choice for Selected Factor (= -2) that displays factor time series overlaid (the -1 behavior) and mass spectra on separate axes (the = 0 behavior). (done v2.03)
  • In plots of species TS, total signal TS, resid TS, resid ratio TS, and both axes of the RR plot, make the y-axis start at 0, or use full scale if the minimum value is negative (done v2.03)
  • In Factor Profile plots, the y-axes cross the left axis at 0 (was at 10, a problem for non-AMS users whose data indexes might start at 0) (done v2.03)
  • In Total Residuals plots (time series and profile) (done v2.03):
    • Label for Q-contributions changed to be more accurate
    • Bottom axis goes from 10 to 100 (instead of 0 to 100), and left axes cross the bottom axis at 0, to make more room for axis labels.
  • Perfect Gaussian wave is scaled to the area of the histograms (done in 2.02).
  • Double check that the box and whisker plot has whiskers that go to 95%, not 100% (done in 2.00B).
  • Allow ability to plot diurnal residuals(done in 2.00A).
  • In the m/z spectra graph, add more tick marks to show m/z 30 etc (done in 1.3J)
  • If you select the m/z as -1 you get a very bad Igor crash (done in 1.3J)
  • Change the selected species to correspond to m/z instead of 1, 2, 3 (done in 1.3J)
  • Use Q/Qexpected instead of Q (I think done in 1.3H - ask Ingrid)
  • Typo in the popped current t series graph - "Curret" instead of "Current" (done in 1.3J)
  • In the time series residual label say Sum or original (done in 1.3J)
  • The stacked factors real measurement should be purple lines (but not summed for the stacking)
  • Colors in the diurnal plots should be applied to the median waves (when the factor is set to -1)
  • Make sure all popped graphs have labels
  • Is it possible to make the axes on the RR plot have the lower value be min(0, wave minimum)?
  • I think it would be helpful to people to have a button somewhere for "save this solution". I do this with a function in my "notUsed" ipf called pmf_makeStaticFactorWvs (and you could remove the part where it asks for a datafolder list; I built that for some synthetic data runs). It has a naming convention and saves the waves to root:StaticFactors: in a subfolder for the number of factors in the solution (e.g., w3factors).
  • It would be nice if the status of the XaxisTo100 box for the factors plot was examined when remaking plots after moving the sliders.
  • People have asked me before if more of the axes can have labels, like Q/Qexp for the Q plots, Mass on the total Tseries plot, and some sort of # for the Current Species Histogram.
  • The total residuals plot needs to have good colors
  • The total residuals plots need the ability to plot several factor solutions at once
  • When more than 10 traces are on a plot, make sure the colors get looped
  • List from Ingrid on 3/20/08: 1. Are mass fraction bars normalized to total measured mass instead of for total fit mass? We should add a wave for "Residual" (it could probably be black).
  • List from Ingrid on 3/20/08: 4. (Probably also lower priority) It would be nice if the "factors to plot" box was enabled. Most useful plots to enable this way are residuals and bars (though bars isn't very reliable).
  • When Selected Factor = -1 and Current Factor Profile displays stacked MS, the stacked MS should be weighted by average mass fraction (so could create something like ProfileWeightedFactor1 = ProfileFactor1 * var_PMFresultsMx4d[3][0][fpeakDex][pdex])
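Three items in the list above are small formulas: factor percentages of measured mass with an explicit Residual so each time point sums to 100%, a profile weighted by the factor's average mass fraction (the ProfileWeightedFactor1 idea), and a Gaussian scaled to the area of a residual histogram. The Python sketch below shows one way to compute each. It assumes the residual is measured minus the sum of factors and that scaled residuals are compared to a mean-0, sigma-1 Gaussian; the function names are hypothetical, and in the tool the average mass fraction would come from var_PMFresultsMx4d as noted above.

```python
import math


def factor_fractions(factor_ts, measured_total):
    """Fraction of measured mass per factor plus a residual, per time point.

    factor_ts: one list per factor, each the same length as measured_total.
    Residual is assumed to be measured minus the sum of factors, so each
    row of fractions sums to 1 (100%). Rows with zero total become NaN.
    """
    rows = []
    for i, total in enumerate(measured_total):
        parts = [ts[i] for ts in factor_ts]
        resid = total - sum(parts)
        if total == 0:
            rows.append([float("nan")] * (len(parts) + 1))
        else:
            rows.append([x / total for x in parts + [resid]])
    return rows


def weighted_profile(profile, avg_mass_fraction):
    """Scale a normalized factor profile by the factor's average mass fraction."""
    return [x * avg_mass_fraction for x in profile]


def gaussian_matched_to_histogram(bin_centers, counts, bin_width,
                                  mu=0.0, sigma=1.0):
    """Gaussian whose area equals the histogram area, sum(counts) * bin_width."""
    area = sum(counts) * bin_width
    peak = area / (sigma * math.sqrt(2.0 * math.pi))
    return [peak * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            for x in bin_centers]
```

Normalizing by measured mass rather than fitted mass is what makes the Residual term necessary; without it the factor percentages would not reach 100% wherever the fit under- or over-predicts.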

Unresolved Strangenesses

  • From Ingrid: "I don't know why I don't usually have this problem, but today I had a problem with this function in the ViewResults ipf (v2_0RC3) pmf_plot_ScaResidAllSpeciesBox(graphDest)

When I try to run the function, the line

  gen_fBoxPlot_graphAxNm(ScaResid_median, ScaResid_boxtop, ScaResid_boxbottom, ScaResid_whiskertop, ScaResid_whiskerbottom, $colDescrWvFullPath, 0.3, 0, $"", $"", 0, 0, 0,FullGraphNm, "left")

returns an error that "The wave giving the bottoms of the boxes doesn't exist"

because in the beginning of the function it doesn't find the boxBottom wave. I don't understand why it would work the rest of the time, but...

I fixed my version and it seems happy now, so I don't know what happened."

PMF Scatter Panel

For Future Releases

  • IMU added, working in TT1: warn if waveNames in ExtDF are too long for noNaNs_ prefix; currently have problems with generating textWv from list of waves from DF if are truncated in noNaNs_ naming.
  • IMU added, working in TT1: can scat_update_panel be streamlined to not redraw each object (e.g., buttons don't need to be redrawn, but maybe they need to be directed to different functions? pulldown menus need to be reloaded with different options)
  • ability to color Tseries scatter plots and overlay by all other tseries (campaign time, sulfate, nitrate, etc.; time-of-day would also be useful) Donna's note: for now, I leave this up to the user to do manually (it's only a few clicks).
  • All of the factor mass spectra (or time series) in one solution vs each other (a la a Scatter Plot Matrix, which can be found under Windows/New/Packages/Scatter Plot Matrix). This can be used for assessing rotation and I think Manjula does it.
  • When executing the "External Data Locations" panel from the main panel, remove the option "Update List" from the pulldown menus. Instead, add a button to the Scatter Panel to calculate correlations with new factors. Needs to execute
 scat_make_noNaNs_ms_or_Ts_DF(1, "ts")
 scat_calc_RcorrMx4d(1, "ts")

or

 scat_make_noNaNs_ms_or_Ts_DF(1, "ms")
 scat_calc_RcorrMx4d(1, "ms")

Then the pull-down lists for factor comparison may need to be updated. (Note that you can get the Data Folder selection panel by choosing *Step 3* from the PMF menu; but if you choose the same data folders as before, all of the R's are recalculated, I think.) -- added by Ingrid after ARI 12/08
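The first item in this list (warning when wave names are too long for the noNaNs_ prefix) amounts to a length check plus a collision check after truncation. The Python sketch below assumes the 31-character object-name limit of Igor Pro 6 and earlier (Igor Pro 8 and later allow longer names) and assumes names are truncated from the end. The constants and function names are hypothetical and are not taken from the tool.

```python
PREFIX = "noNaNs_"
MAX_NAME_LEN = 31  # assumed Igor Pro 6 object-name limit


def names_too_long(wave_names, prefix=PREFIX, max_len=MAX_NAME_LEN):
    """Names that would be truncated once the prefix is prepended."""
    return [nm for nm in wave_names if len(prefix) + len(nm) > max_len]


def truncation_collisions(wave_names, prefix=PREFIX, max_len=MAX_NAME_LEN):
    """Group original names that map to the same truncated, prefixed name.

    Collisions are what would break rebuilding a text wave of original
    names from the noNaNs_ waves, so these deserve an abort, not just a warning.
    """
    groups = {}
    for nm in wave_names:
        groups.setdefault((prefix + nm)[:max_len], []).append(nm)
    return {key: nms for key, nms in groups.items() if len(nms) > 1}
```

A panel could warn on any name from `names_too_long` and abort if `truncation_collisions` returns anything.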

Done

  • fix so that if calc MS but no TS, panel does not try to display TS first by default (I have fixed this somewhere in my own version) (done in 2.04)
  • when you run the panel for choosing external data folders, check for NaNsList_amus and NaNsList_t_series in the PMFdataFolder and abort/warn if they're not present. (done in 2.04)
  • user-editable table of FactorNmsWvs with a column to name them as "type" and sort by this. Donna's note: Kinda sorta done. This wasn't implemented as it was first envisioned, but it is clean and simple and useful, I think. We can revisit if it doesn't quite do the job.
  • Add ability to add new external mass spectra or time series without having to recalculate everything

Donna's note: This is somewhat implemented. With the addition of the ability to group external factors, it gets complicated to always keep track of which factors may be new and which may be old. I've broken up the code so that the user can recalculate only the ts or ms external correlations. I don't think it takes that much more time to redo the calcs, and it is just safer than trying to rearrange things in the master Rcorrelation matrix. Previously, redoing the RCorr calcs meant:

    • kill (currently) dependent waves pmfDFnm:RforCurrentFactor...
    • kill waves pmfDFnm:RcorrMx4d_...
    • kill noNaNs folders in MassSpecDF, TseriesDF
    • Run pmf_calcs_RcorrMx4d()
  • In a separate tab, this plot [1]

Donna's note: This is done for one PMF solution. I will leave it up to the users to glue together the plots so that different rows correspond to different factor solutions. I always run my solution space starting from 1, and it can go up to 9, so I think things would get too jumbled.

  • In function scat_plot_RzvsFPeak, the line

  appendToGraph/W=$FullGraphNm RcorrMx4d[V_Value][idex][][pdex] vs fpeak_map

needs to be changed to

  appendToGraph/W=$FullGraphNm RcorrMx4d[V_Value-1][idex][][pdex] vs fpeak_map

  • Scatter plots should be the same color as their factor label. Factor 1 profile or time series should be black (like its factor label).
  • When you change the slider in the scatter panel, the R vs Fpeak plot doesn't update.
  • Comparing the various types of Residual (time series or mass spectrum) to the factors (this is how we saw, e.g., that the residual in Pittsburgh looked like the OOA-II time series). Donna's note: For now, folks can do this by generating static waves and then moving them to their external data folder.
  • The user can't modify the R vs Fpeak plot in the panel (change wave colors, axes, etc.).
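The R vs Fpeak plot discussed above shows, for one factor, its correlation with an external tracer at each fpeak. The Python sketch below computes that quantity using Pearson R over only the points where neither series is NaN, which is the assumed purpose of the noNaNs_ waves. It is an illustration, not the tool's RcorrMx4d code, and the function names and the dictionary keyed by fpeak are hypothetical.

```python
import math


def pearson_r_no_nans(x, y):
    """Pearson R over the points where neither x nor y is NaN."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant series: R is undefined
    return sxy / math.sqrt(sxx * syy)


def r_vs_fpeak(factor_ts_by_fpeak, tracer):
    """R of one factor time series against one tracer, for each fpeak value."""
    return {fp: pearson_r_no_nans(ts, tracer)
            for fp, ts in factor_ts_by_fpeak.items()}
```

Dropping NaN pairs per comparison, rather than once for all series, keeps the most data for each pair but means different R values may rest on slightly different sets of points.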

Data and Errors from PMF

(from meeting, Donna, Pete, Alex, Ingrid, and Allison, 4-June-08)

Q-AMS

  • James' code calculates the Org data and error correctly (as of v 1.36).
  • However it does not create the matrices that can then be used for PMF.
  • We are using Qi's code to extract the matrices from James's code
  • Action items
    • Ingrid will create a stand-alone IPF for these functions and post on the web
    • After that we'll contact James and see if he can make a new version of the Q-AMS code that does this.

Squirrel (ToF-AMS Unit Rez)

  • Organic matrix can be dumped to memory and saved with the "Export Matrix" button (MS Tab)
  • Errors are currently only calculated for the MSSDiff Matrix
  • Org errors can be calculated according to the procedure in the Appendix of Ingrid's paper (same as for synthetic data), which takes MSSDiff_Err and Frag_Matrix and calculates MSSOrg_Err. The Org error matrix will be calculated automatically and saved to memory if the MSSDiff_Err matrix already exists.
  • It can be calculated with Pete's code which does it a different (but equivalent) way, which WILL be documented in Pete's paper (albeit briefly)
  • Action Item
    • Donna will implement Ingrid's calculation for species matrix errors in Squirrel before 08 Users Meeting
    • (Aside: Donna will whiten the spectra in Squirrel)
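If organic signals are linear combinations of difference signals through a fragmentation-style matrix (org = Frag · diff at each time point), one standard way to carry 1-sigma errors through is quadrature propagation, assuming the errors at different m/z are independent. The Python sketch below shows only that generic technique. It is not claimed to be the procedure in the Appendix of Ingrid's paper or Pete's equivalent method, and the function name is hypothetical.

```python
import math


def propagate_linear_errors(frag, diff_err):
    """Propagate independent 1-sigma errors through org = frag . diff.

    frag: list of rows, one per organic m/z, with one coefficient per
          difference m/z.
    diff_err: 1-sigma errors of the difference signal at one time point.
    Returns sigma_org[i] = sqrt(sum_j (frag[i][j] * diff_err[j])**2).
    """
    return [
        math.sqrt(sum((c * e) ** 2 for c, e in zip(row, diff_err)))
        for row in frag
    ]
```

Applied row by row over time, this would turn an MSSDiff_Err-like matrix into an MSSOrg_Err-like matrix; correlated errors between m/z would require covariance terms that this sketch ignores.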

PIKA (HR-ToF-AMS high Rez)

  • PIKA allows dumping the data matrix (signal for specific ions)
  • Pete has PMF Helper coding that calculates the errors
    • Applying error formula
    • Remove non-organic ions
    • Doing corrections for main "frag" interferences (29, 44)
    • Add back HxO and CO based on Allison's frag, and downscale errors
    • Remove "bad variables" (average SNR < 0.2, same as Paatero & Hopke)
    • Scale back to ug m-3
  • For the time being, people should be directed to Pete if they want the code
  • Action item
    • Pete will email the code to attendees
    • Potential future release after Pete's paper is published
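The "remove bad variables" step can be sketched as a per-column filter on average signal-to-noise. The Python below assumes average SNR means the mean of data/error over time points for each variable; Paatero & Hopke also discuss an aggregate S/N definition, so the exact formula in Pete's PMF Helper code may differ. The function name is hypothetical.

```python
def remove_bad_variables(data, err, min_snr=0.2):
    """Drop variables (columns) whose average SNR is below min_snr.

    data, err: lists of rows (time points), each a list of columns (variables).
    Average SNR per column is assumed to be mean(data/err) over time points.
    Returns (kept column indices, filtered data, filtered errors).
    """
    n_rows = len(data)
    n_cols = len(data[0])
    keep = []
    for j in range(n_cols):
        snr = sum(data[i][j] / err[i][j] for i in range(n_rows)) / n_rows
        if snr >= min_snr:
            keep.append(j)
    data_f = [[row[j] for j in keep] for row in data]
    err_f = [[row[j] for j in keep] for row in err]
    return keep, data_f, err_f
```

Returning the kept indices lets the matching column descriptions (e.g., the m/z or ion list) be filtered the same way.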