Difference between revisions of "PMF coding"

From Jimenez Group Wiki

Revision as of 14:33, 28 June 2009

PMF Analysis

To Do Priority

  • release .ini with longer line lengths for data, std dev matrices

For Next Releases

  • In bootstrapping, also calculate factor bars by EPA method; consider looking for groups of solutions?
  • Allow users to set up initial conditions with varying fpeaks while setting the seed to any value they like. Currently, if varying fpeak, seed is set to 0, and if varying seed, fpeak is set to 0.
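
One way to think about the fpeak/seed item above is as a grid of runs. The following is a minimal Python sketch of that idea, not the Igor code in the PMF ipf; build_runs is a hypothetical helper, and the two constants only mirror the DEFAULT_SEED and DEFAULT_FPEAK idea mentioned in the Done list.

```python
from itertools import product

# Hypothetical defaults, mirroring the DEFAULT_SEED / DEFAULT_FPEAK idea.
DEFAULT_SEED = 0
DEFAULT_FPEAK = 0.0


def build_runs(fpeaks=None, seeds=None):
    """Return every (fpeak, seed) pair to run; a missing axis falls back to its default."""
    fpeak_values = [DEFAULT_FPEAK] if fpeaks is None else list(fpeaks)
    seed_values = [DEFAULT_SEED] if seeds is None else list(seeds)
    return list(product(fpeak_values, seed_values))


# A range of fpeaks at seed = 2.
runs = build_runs(fpeaks=[-1.0, 0.0, 1.0], seeds=[2])
```

Treating both settings as lists means that "vary fpeak only" and "vary seed only" become special cases of the same call.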

Done

  • use convergence criteria in terms of Q/Qexp instead of absolute Q (done in 2.02RC1)
    • level 3: 1e-6*Qexp, levels 1 and 2 are twice that
  • made strconstant line for pmf2wopt.exe (currently only version released by Paatero) (done in 2.02RC1)
  • Fix incorrect SEED when varying FPEAK. Was seed = fpeak. (done in 2.02RC1)
    • made constants DEFAULT_SEED and DEFAULT_FPEAK that can be changed by user to run e.g., range of fpeaks at seed = 2.
  • Calculate Qexp "correctly", accounting for degrees of freedom of solution matrices. Had calculated Qexp as numpnts(dataMx); now use numpnts(dataMx) - p*(rows + columns of data matrix). For normalizing the Q time series and mass spectrum, Qexp was calculated as rows or columns of data matrix; now use (rows or columns of data matrix) - p (done in 2.02RC1).
  • Fix things for Vista machines - use SpecialDirPath instead of the C directory (done in 2.00B).
  • Allow ability to do bootstrapping (done in 2.00A).
  • Allow users to run PMF on partitioned drives (done in 1.04A).
  • Be sure that executeScriptText has /z so no errors pop up (done in 1.04A).
  • Allow user to find solutions ranging over many seed values instead of fpeak values (done in 1.03I)
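
The Qexp and convergence items in this list reduce to a little arithmetic. The Python sketch below only restates the formulas given in those items; the function names are hypothetical, and the actual Igor implementation may differ in detail.

```python
def q_expected(n_rows, n_cols, p):
    """Qexp for the whole data matrix: number of points minus fitted parameters."""
    return n_rows * n_cols - p * (n_rows + n_cols)


def q_expected_slice(n_points, p):
    """Qexp used to normalize the Q time series or mass spectrum.

    n_points is the rows or columns of the data matrix, as in the item above.
    """
    return n_points - p


def convergence_tolerances(qexp):
    """Map convergence level to its criterion: level 3 is 1e-6*Qexp, levels 1 and 2 twice that."""
    level3 = 1e-6 * qexp
    return {1: 2 * level3, 2: 2 * level3, 3: level3}
```

For example, a 100 x 50 data matrix with p = 3 gives Qexp = 5000 - 3*150 = 4550, so the level 3 criterion would be 0.00455.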

PMF Evaluation Panel

Priority

  • DTS at HR clinic - We should have sanity checks for step 2 (view big panel) for when the user forgets to check the row entry wave, which should always be a time series. Should also add a check for the sanity of the column labels. For example, if the first text entry is var_ (or something that would be very rare) we could add an "are you sure" prompt.
    • Could be solved by making the list of waves to choose from include only waves that match the length of the data matrix.
  • In diurnals, add monthly, day-of-week, and weekday-vs-weekend options
  • In Total Residuals plots (time series and profile) (added by Ingrid 26 May 09):
    • The label for the Q-contributions is incorrect; this is a normalized Q-contribution. The axis label needs to be changed so that this is clear to users.
    • Would it be possible to make the bottom axis go from ~5 to 100 (instead of 0 to 100) and have all the left axes cross the bottom at 0? In the current version (2.02) the values and labels get written on top of each other.
  • In Factor Profile plots, y-axes cross the left axis at 10; this is a problem for non-AMS users who might start the indexes for their data at 0. (added by Ingrid 26 May 09)
  • In species TS, full mass TS, resid TS, and the y-axis of the RR plot, make the y-axis start at 0, or use full scale if the minimum value is negative
  • Implement Pete's suggestion whereby when the user selects -1 for the factor to plot, the mass spec isn't stacked but still displayed on separate axes. (Make Pete's version -2.)
  • In bootstrapping, if noNaNs_amus and noNaNs_tseries don't exist, alert user and (if possible) ask them to select appropriate waves (added by Ingrid 26 May 09).
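
The two sanity checks in the first item of this list could look roughly like the following. This is a Python sketch of the logic only; the dictionary of waves, the function names, and the list of rare prefixes are all assumptions made for illustration, and in Igor the same filtering would be done on the wave list itself.

```python
def waves_matching_length(waves, n_rows):
    """Names of waves whose length equals the number of rows in the data matrix."""
    return sorted(name for name, values in waves.items() if len(values) == n_rows)


def first_label_suspicious(labels, rare_prefixes=("var_",)):
    """True if the first column label starts with a prefix that real labels rarely use."""
    return len(labels) > 0 and str(labels[0]).startswith(tuple(rare_prefixes))


# Hypothetical data folder contents: wave name -> values.
example_waves = {"t_series": [0, 1, 2], "amus": [12, 13], "other": [5, 6, 7]}
choices = waves_matching_length(example_waves, 3)
```

If first_label_suspicious returns True, the panel would show the "are you sure" prompt rather than abort, since the label could still be legitimate.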

For future releases

  • DTS added at HR clinic - In the diurnal calculations, the binarysearchInterp function sometimes returned NaN instead of a numeric value, for reasons not yet understood. Igor then complained when it was asked to create waves of NaN length. Whenever binarysearchInterp results are used later for creating waves, we need to make sure that NaN values are replaced with zero.
  • Ken had a memory issue when maxp is 12 and number of fpeaks is 41 during the allocation of waves for the diurnal plots. Make all the evaluation plot waves single precision to save memory.
  • The box and whisker plot for the scaled residuals for all species plots the min and max and we don't have the option to go from 5 - 95% as we do in diurnal plots. Should this be changed to be consistent with the diurnal plots?
  • On the diurnal plots, have checkboxes for "Display Mean" and "Display Median" so that you could display one or the other or both (do this for the popped graph only)
  • Mtg at ARI 12/8/08: For Total Residual Profiles plot, more ticks! For Total Residual Tseries plot, when popped also display TotalSignal_as_Tseries (organics time series from the input data)
  • List from Ingrid on 3/20/08: 2. Can we (doesn't have to be this week) make a tseries plot of % of each tseries (so stacked factors as %) including the Residual (so it adds up to 100%) (like Figure 1d in paper http://www.atmos-chem-phys.net/5/3289/2005/acp-5-3289-2005.pdf ) Note from Donna: This is easy enough for the user to do themselves.
  • List from Ingrid on 3/20/08: 8. Another longer-term thing might be to make a table of string waves where the user can name their factors, and then make bar plots with the same factors in the same color/order. Ken has done a version of that in the attached plot. This will also let us assign a name (and therefore a standard color) to factors as you look at the solutions. Note from Donna: I'm gonna hold off on this one, cuz it may be hard to implement and only advanced users will probably do this well anyway.
  • There are redimensioning issues at the creation of the Evaluation Panel when the user has selected to do only factor spaces of size one (and many fpeaks for example). In practice this would only happen if someone wanted to test the evaluation tool to see how it works, and in the interest of pmf-calcing time only chose one factor.
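
Two of the items above are simple enough to sketch: guarding against a NaN from an interpolated index before it is used as a wave length, and the stacked-percent time series including the residual. This is Python for illustration only, with hypothetical function names; it does not reproduce binarysearchInterp itself.

```python
import math


def safe_wave_length(value):
    """Turn an interpolated index into a usable wave length; NaN or negative becomes 0."""
    if value is None or math.isnan(value) or value < 0:
        return 0
    return int(value)


def stacked_percent(factors, residual):
    """Per time point, each factor and the residual as a percent of their sum."""
    series = list(factors) + [residual]
    rows = []
    for i in range(len(residual)):
        total = sum(s[i] for s in series)
        # A zero total has no meaningful percentages, so report NaN for that point.
        rows.append([100.0 * s[i] / total if total else float("nan") for s in series])
    return rows
```

Each row returned by stacked_percent sums to 100 whenever the total signal at that time point is nonzero, which is what the stacked plot needs.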

Done

  • Perfect Gaussian wave is scaled to the area of the histograms (done in 2.02).
  • Double check that the box and whisker plot has whiskers that go to 95%, not 100% (done in 2.00B).
  • Allow ability to plot diurnal residuals (done in 2.00A).
  • In the m/z spectra graph, add more tick marks to show m/z 30 etc (done in 1.3J)
  • If you select the m/z as -1 you get a very bad Igor crash (done in 1.3J)
  • Change the selected species to correspond to m/z instead of 1, 2, 3 (done in 1.3J)
  • Use Q/Qexpected instead of Q (I think done in 1.3H - ask Ingrid)
  • Typo in the popped current t series graph - "Curret" instead of "Current" (done in 1.3J)
  • In the time series residual label say Sum or original (done in 1.3J)
  • The stacked factors real measurement should be purple lines (but not summed for the stacking)
  • Colors in the diurnal plots should be applied to the median waves (when the factor is set to -1)
  • Make sure all popped graphs have labels
  • Is it possible to make the axes on the RR plot have the lower value be the min(0, wave minimum)?
  • I think it would be helpful to people to have a button somewhere for "save this solution". I do this with a function in my "notUsed" ipf called pmf_makeStaticFactorWvs (and you could remove the part where it asks for a datafolder list; I built that for some synthetic data runs). It has a naming convention and saves the waves to root:StaticFactors: in a subfolder for the number of factors in the solution (e.g., w3factors).
  • It would be nice if the status of the XaxisTo100 box for the factors plot was examined when remaking plots after moving the sliders.
  • People have asked me before if more of the axes can have labels, like Q/Qexp for the Q plots, Mass on the total Tseries plot, and some sort of # for the Current Species Histogram.
  • The total residuals plot needs to have good colors
  • The total residuals plots need ability to plot several (several factor solutions)
  • When more than 10 traces are on a plot, make sure the colors get looped
  • List from Ingrid on 3/20/08: 1. Are mass fraction bars normalized to total measured mass instead of for total fit mass? We should add a wave for "Residual" (it could probably be black).
  • List from Ingrid on 3/20/08: 4. (Probably also lower priority) It would be nice if the "factors to plot" box was enabled. Most useful plots to enable this way are residuals and bars (though bars isn't very reliable).
  • When Selected Factor = -1 and Current Factor Profile displays stacked MS, the stacked MS should be weighted by average mass fraction (so could create something like ProfileWeightedFactor1 = ProfileFactor1 * var_PMFresultsMx4d[3][0][fpeakDex][pdex])
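
A few of the plotting rules in this list (weighting a stacked profile by its average mass fraction, looping colors past 10 traces, and the min(0, wave minimum) axis rule) are shown below as a Python sketch. The function names are hypothetical, and the mass fraction is passed in directly rather than read from var_PMFresultsMx4d.

```python
def weighted_profile(profile, mass_fraction):
    """Scale a factor profile by that factor's average mass fraction."""
    return [value * mass_fraction for value in profile]


def looped_color(trace_index, palette):
    """Pick a color for a trace, wrapping around when there are more traces than colors."""
    return palette[trace_index % len(palette)]


def axis_lower_bound(values):
    """Lower axis limit: 0 unless the wave goes negative, then the wave minimum."""
    return min(0, min(values))
```

With a 10-color palette, trace 10 reuses the color of trace 0, trace 11 that of trace 1, and so on.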

Unresolved Strangenesses

  • From Ingrid: "I don't know why I don't usually have this problem, but today I had a problem with this function in the ViewResults ipf (v2_0RC3) pmf_plot_ScaResidAllSpeciesBox(graphDest)

When I try to run the function, the line

  gen_fBoxPlot_graphAxNm(ScaResid_median, ScaResid_boxtop, ScaResid_boxbottom, ScaResid_whiskertop, ScaResid_whiskerbottom, $colDescrWvFullPath, 0.3, 0, $"", $"", 0, 0, 0,FullGraphNm, "left")

returns an error that "The wave giving the bottoms of the boxes doesn't exist"

because in the beginning of the function it doesn't find the boxBottom wave. I don't understand why it would work the rest of the time, but...

I fixed my version and it seems happy now, so I don't know what happened."

PMF Scatter Panel

Priority

  • IMU added at HR Clinic: when you run the panel for choosing external data folders, check for NaNsList_amus and NaNsList_t_series in the PMFdataFolder and abort/warn if they're not present.
  • ability to color Tseries scatter plots and overlay by all other tseries (campaign time, sulfate, nitrate, etc.; time-of-day would also be useful) Donna's note: for now, I leave this up to the user to do manually (it's only a few clicks).
  • All of the factor mass spectra (or time series) in one solution vs each other (a la a Scatter Plot Matrix, which can be found under Windows/New/Packages/Scatter Plot Matrix). This can be used for assessing rotation and I think Manjula does it.
  • When executing the "External Data Locations" panel from the main panel, remove the option "Update List" from the pulldown menus. Instead, add a button to the Scatter Panel to calculate correlations with new factors. Needs to execute
 scat_make_noNaNs_ms_or_Ts_DF(1, "ts")
 scat_calc_RcorrMx4d(1, "ts")

or

 scat_make_noNaNs_ms_or_Ts_DF(1, "ms")
 scat_calc_RcorrMx4d(1, "ms")

Then may need to update the pull-down lists for factor comparison. (Note that you can get the Data Folder selection panel by choosing *Step 3* from the PMF menu, but if you choose the same data folders as before, all of the R's are recalculated, I think.) -- added by Ingrid after ARI 12/08
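
For the first item in this Priority list (checking for NaNsList_amus and NaNsList_t_series before going on), the check itself is small. A Python sketch of the logic follows; the folder is represented as a plain list of wave names, and both function names are hypothetical.

```python
REQUIRED_WAVES = ("NaNsList_amus", "NaNsList_t_series")


def missing_waves(folder_contents, required=REQUIRED_WAVES):
    """Return the required wave names that are not present in the folder."""
    present = set(folder_contents)
    return [name for name in required if name not in present]


def check_pmf_data_folder(folder_contents):
    """Abort with a readable message if any required wave is missing."""
    missing = missing_waves(folder_contents)
    if missing:
        raise ValueError("PMF data folder is missing: " + ", ".join(missing))
```

Reporting the missing names in the message lets the user fix the folder without having to dig through the code.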

Done

  • user-editable table of FactorNmsWvs with a column to name them as "type" and sort by this. Donna's note: Kinda sorta done. This wasn't implemented as it was first envisioned, but it is clean and simple and useful, I think. We can revisit if it doesn't quite do the job.
  • Add ability to add new external mass spectra or time series without having to recalculate everything

Donna's note: This is somewhat implemented. With the addition of the ability to group external factors, it gets complicated to always keep track of what factors may be new and what may be old. I've broken up the code so that the user can recalc only the ts or ms external correlations. I don't think it takes that much more time to redo the calcs. and it is just safer than trying to rearrange things in the master Rcorrelation matrix. Previously, redoing the RCorr calcs meant:

    • kill (currently) dependent waves pmfDFnm:RforCurrentFactor...
    • kill waves pmfDFnm:RcorrMx4d_...
    • kill noNaNs folders in MassSpecDF, TseriesDF
    • Run pmf_calcs_RcorrMx4d()
  • In a separate tab, this plot [1]

Donna's note: Is done for one PMF solution. I will leave it up to the users to glue together the plots so that different rows correspond to different factor solutions. I always try my solution space from 1, and this can go up to 9, and I think things will get just too jumbled.

  • In function scat_plot_RzvsFPeak, line

appendToGraph/W=$FullGraphNm RcorrMx4d[V_Value][idex][][pdex] vs fpeak_map
needs to be changed to
appendToGraph/W=$FullGraphNm RcorrMx4d[V_Value-1][idex][][pdex] vs fpeak_map

  • Scatter plots should be the same color as their factor label. Factor 1 profile or time series should be black (like its factor label).
  • When you change the slider in the scatter panel, the R vs Fpeak plot doesn't update.
  • Comparing the various types of Residual (time series or mass spectrum) to the factors (this is how we see, e.g., that the residual in Pittsburgh looked like the OOA-II time series). Donna's note: For now, folks can do this by generating static waves and then moving them to your external data folder.
  • The user can't modify the R vs Fpeak plot in the panel (change wave colors, axes, etc.).

Data and Errors from PMF

(from meeting, Donna, Pete, Alex, Ingrid, and Allison, 4-June-08)

Q-AMS

  • James' code calculates the Org data and error correctly (as of v 1.36).
  • However it does not create the matrices that can then be used for PMF.
  • We are using Qi's code to extract the matrices from James's code
  • Action items
    • Ingrid will create a stand-alone IPF for these functions and post on the web
    • After that we'll contact James and see if he can make a new version of the Q-AMS code that does this.

Squirrel (ToF-AMS Unit Rez)

  • Organic matrix can be dumped to memory and saved with the "Export Matrix" button (MS Tab)
  • Errors are currently only calculated for the MSSDiff Matrix
  • Org Errors can be calculated according to the procedure in the Appendix of Ingrid's paper (same as for synthetic data), which takes the MSSDiff_Err and Frag_Matrix and calculates MSSOrg_Err. The Org error matrix will be automatically calculated and saved to memory if the MSSDiff_Err matrix already exists.
  • It can be calculated with Pete's code which does it a different (but equivalent) way, which WILL be documented in Pete's paper (albeit briefly)
  • Action Item
    • Donna will implement Ingrid's calculation for species matrix errors in Squirrel before 08 Users Meeting
    • (Aside: Donna will whiten the spectra in Squirrel)

PIKA (HR-ToF-AMS high Rez)

  • PIKA allows dumping the data matrix (signal for specific ions)
  • Pete has PMF Helper coding that calculates the errors
    • Applying error formula
    • Remove non-organic ions
    • Doing corrections for main "frag" interferences (29, 44)
    • Add back HxO and CO based on Allison's frag, and downscale errors
    • Remove "bad variables" (average SNR < 0.2, same as Paatero & Hopke)
    • Scale back to ug m-3
  • For the time being, people should be directed to Pete if they want the code
  • Action item
    • Pete will email the code to attendees
    • Potential future release after Pete's paper is published
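
The "Remove bad variables" step in the PMF Helper list can be sketched as a column filter. The exact SNR definition used in Pete's code is not stated on this page, so the Python below assumes, purely for illustration, that average SNR means the mean of signal/error over time for each ion. The function names are hypothetical and this is not Pete's code.

```python
def average_snr(signal_col, error_col):
    """Mean of signal/error over time, skipping points with non-positive error (assumed definition)."""
    ratios = [s / e for s, e in zip(signal_col, error_col) if e > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def drop_bad_variables(data, errors, threshold=0.2):
    """Keep only the columns whose average SNR is at least the threshold.

    data and errors are lists of rows (time points), one column per ion.
    Returns the kept column indexes plus the filtered data and error matrices.
    """
    n_cols = len(data[0])
    keep = [
        j for j in range(n_cols)
        if average_snr([row[j] for row in data], [row[j] for row in errors]) >= threshold
    ]
    kept_data = [[row[j] for j in keep] for row in data]
    kept_errors = [[row[j] for j in keep] for row in errors]
    return keep, kept_data, kept_errors
```

Returning the kept column indexes alongside the matrices makes it easy to drop the same ions from the m/z or ion-label wave.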