Difference between revisions of "Pk nextGen"

From Jimenez Group Wiki
(Multiple Dimensions)
m (Software map)
 
(14 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[File:2010-09-23_MJC_HRconceptmap.jpg]]
+
'''What we know for sure:'''
 
 
=Discussion: Next-generation PIKA=
 
What we know for sure:
 
  
 
1. MikeC is to develop a PIKA-esque tool for multi-dimensional data from TofDAQ data files.
 
Line 9: Line 6:
 
<br>4. Whatever gets written should address existing desires and concerns with the PIKA code, and be fully transparent so as to accept any input data type. It should be developed with memory and speed optimisation in mind.  
 
  
What Mike initially proposed:
+
'''Who will use this software?'''
* An ion indexing scheme to allow saving/comparison of different ion fits at a given unit mass
 
* A dimension in the HR data set to save multiple copies of HR fits (this could be for different calibration, PW, PS, or simply open/closed AMS data)
 
* Fitting of subsets of unit masses within the mass spectrum (but over any dimensionality in time) to speed the process.
 
 
 
==Concerns and discussion==
 
 
 
===Isotopic constraining===
 
Summary of discussion thus far:
 
* Isotopic constraints, in practice, force the calculation of ion fits for every unit mass in the spectrum to be performed together, and in the correct order
 
* The fitting of a single unit mass at a time, whilst potentially useful for speed, is thus not practical
 
 
 
The system is thus
 
* constrained to calculate all the fits together, meaning
 
** speed optimisation must come from assessing current memory management and use of threadsafe functions
 
** a lot of complexity in the code is avoided!
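
A minimal Python sketch of the ordering constraint summarised above, assuming the isotope links between unit masses are known in advance. The <code>ISOTOPE_PARENTS</code> table and the <code>fit_order</code> name are illustrative assumptions, not part of the PIKA code. It only shows why the unit masses cannot be fit independently: every parent mass has to come before the masses it constrains.

```python
from graphlib import TopologicalSorter

# Hypothetical isotope links: unit mass of a constrained isotopic ion ->
# unit masses of the parent ions whose fitted heights fix its height.
ISOTOPE_PARENTS = {
    17: {16},      # e.g. 17O+ constrained by O+ at m/z 16
    18: {16, 17},  # e.g. 18O+ and 15NH3+ constrained by O+ and NH3+
    29: {28},      # e.g. 15NN+ constrained by N2+ at m/z 28
}

def fit_order(unit_masses, parents):
    """Return the unit masses ordered so every parent mass precedes its dependants."""
    wanted = set(unit_masses)
    # Map each mass to the parent masses (predecessors) that are also being fit.
    graph = {m: set(parents.get(m, ())) & wanted for m in unit_masses}
    return list(TopologicalSorter(graph).static_order())

order = fit_order([18, 29, 16, 28, 17], ISOTOPE_PARENTS)
print(order)
```

Since an isotopic ion normally sits at a higher unit mass than its parent, simply fitting in ascending m/z order would probably give an equally valid sequence; the explicit dependency graph just makes the constraint visible.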
 
 
 
===Multiple fitting schemes===
 
Summary of discussion thus far:
 
 
 
* Existing PIKA code acts on multiple ''datasets'' (open/closed/diff), with a single set of ''parameters'' in any given Igor experiment:
 
** Time-series values for
 
*** single ion values
 
*** m/z calibration function and parameters
 
*** instrumental transfer function (PW-PS) 
 
** Single set of values for:
 
*** baseline function parameters (all options in bsl panel) (note this is a time-series in TW)
 
**** note from Donna - all values?  what about a mass defect wave and 'alter stick integration or stick complement' regions?
 
*** set of HR ions that were fit
 
*** set of HR ions that were constrained
 
*** checkbox settings from HR_PeakHeights_Gr panel
 
* PIKA thus will:
 
** return ONE set of HR sticks PER dataset
 
** require a re-calculation of HR_Sticks if the parameters are changed
 
 
 
MJC contends that the following are limitations in the current system:
 
* No data is saved on what parameters led to your currently saved data (a ''parameter-profile'')
 
** Change a parameter, and the saved data is unaffected
 
*** In Pika 1.09 one can change the list of HR ions to fit and previous HR fits are unaffected.  This brings Pika closer to the reproducibility idea, but it is not completely there because we don't save everything.
 
** e.g. one can create baseline-subtracted MS, and then change the m/z cal and carry-on...
 
* Inability to compare fits calculated using different parameters
 
** Note that the clunky HR-Sensitivity code simply duplicates data folders to get around this, and relies on a successful execution of the code, as it changes parameters along the way.
 
 
 
DTS important point: However, if one wanted to really allow a user to re-generate the HR_Sticks from the raw spectra, a parameter-profile would also need to save all the steps that went into generating the peak width and peak shape which would be REALLY tedious.
 
 
 
thus
 
 
 
MJC accepts that
 
* It is not realistic to store a (huge) ''parameter-profile'' for a given set of HR_Sticks that would allow the user to reset their experiment to the environment that calculated them.
 
 
 
MJC and DTS disagree on the concept of
 
* Storing multiple copies of HR-Sticks from any given ''dataset'', calculated using different ''parameter-profiles''.
 
** I am warming to the idea.  My only hesitation is that it will confuse users. Imagine that 6 months after a user worked up the data they went back to their experiment to regenerate an HROrg time series.  The user may have a very hard time remembering what their 'right' answer (profile) was.  Unless we forced that at any one time there is a supreme, 'final', current profile.
 
 
 
MJC proposes another compromise solution, that
 
* The single-MS stage (HR_PeakHeights_Gr) of HR-analysis is totally separated from the time-series calculations,
 
* At the single-MS stage, users can change the ''parameter-profile'' and optimise the HR_Sticks (as currently), but only a '''single''' set of sticks is saved in the time-series,
 
* The ''parameter-profiles'', including date-time, can be "saved" in some manner during single-MS analysis, to aid in the optimisation,
 
* During single-MS analysis, we let the user store multiple sets of HR_Sticks for direct intercomparison,
 
* At the point of calculating time-series, the (advanced) option is presented to save the ''parameter-profile'' information in the HDF, and a method to display this is encoded,
 
* The user is allowed to change the name of the HR_Sticks matrix in the HDF (we need a get-around for open/closed/diff/TWappl etc).
 
 
 
DTS comments: Yes I like this idea, but I'm not clear on the very last point, I'll ignore it for now. 
 
 
 
I do think it instructive to envision this approach for the unit res sticks - flesh it out, and then become clear about what is the same or different for HR sticks. I keep thinking that the profiles contain non-time-dependent entries, like the mass defect wave and the alter stick complement region waves, and not any time-dependent waves like the m/z calibration parameters.  If the profiles do contain time-dependent values like m/z calibration parameters, then I think they need these values for all runs or time-entities.  Because while a user may choose a profile for certain todo waves, I don't think we want a single profile to ONLY work for some todo waves.  And I like the idea that we are saving only the parameters that explicitly go into the calculation of sticks, NOT how parameters were derived (i.e. not saving which ions were used in generating the m/z calibration parameters, just the final m/z calibration parameters themselves). This allows the user some reproducibility, which has been sorely lacking.  This approach of saving the values that go into the sticks, not how they were derived, is a bit of a revelation to me, a lightbulb moment, and makes more clear in my mind what should be saved in a profile. I had always been approaching it from the user side - I did x then y then z.... and this is just unworkable. You give the users what parameters were actually used, not how they were derived. I envision these profiles being stored only in memory, but this doesn't have to be the case.
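
One way to picture such a profile is as a small, read-only record holding only the values that went into the stick calculation. The Python sketch below is an illustration under that assumption: the class name, the helper functions and the parameter names (<code>integration_halfwidth</code>, <code>mass_defect_wave</code>) are hypothetical and do not correspond to existing Squirrel or PIKA waves.

```python
from dataclasses import dataclass
from datetime import datetime
from types import MappingProxyType

@dataclass(frozen=True)
class StickProfile:
    """Named, dated set of the values that went into a stick calculation."""
    name: str
    created: datetime
    values: MappingProxyType  # read-only: what was used, not how it was derived

def make_profile(name, created, **values):
    return StickProfile(name, created, MappingProxyType(dict(values)))

def diff_profiles(a, b):
    """Return {parameter: (value_in_a, value_in_b)} for every parameter that differs."""
    keys = sorted(set(a.values) | set(b.values))
    return {k: (a.values.get(k), b.values.get(k))
            for k in keys if a.values.get(k) != b.values.get(k)}

# Hypothetical parameter names, for illustration only.
profile_a = make_profile("A", datetime(2010, 10, 1, 9, 0),
                         integration_halfwidth=0.5, mass_defect_wave="md_default")
profile_b = make_profile("B", datetime(2010, 10, 1, 9, 30),
                         integration_halfwidth=0.6, mass_defect_wave="md_default")
changed = diff_profiles(profile_a, profile_b)
print(changed)
```

Keeping the record read-only means a profile that has been used for a calculation cannot silently drift afterwards, and the comparison helper gives the 'before' and 'after' view for two candidate profiles.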
 
 
 
In much of this coding discussion there is a distinction between places where a user can 'play', try out options typically for one or a few spectra, and places where a user has to commit to some choices to move forward and 'do it' for many spectra all at once. This distinction really isn't clear from a user's point of view. Maybe we can think of a clever visual interface cue for the distinction? I'm thinking of the old fashioned m/z calibration panel.  One can try out options for one spectrum at a time on the left side, but when the time comes to do it for all runs, only the current settings are used and time-dependent results are generated.  A user can go back and look at one spectrum and add or remove ions.  But by changing the current settings they can lose track of which settings were used for the real values that were calced previously.  New buttons for saving and importing a set of ion choices have been implemented, but they only save the waves; they don't save settings like 'keep the power value constant to 0.4999' for example.
 
 
 
As Jose pointed out to me today, this profile idea is somewhat similar to menus in the daq.  I do like the word profile to describe it instead of a menu option, but this analogy may be useful when we present this concept to users.
 
 
 
So far as I can tell, the points in the program where profiles are needed are in the generation of sticks (both UMR and HR).  At the moment I would prefer that this profile implementation be restricted to these tasks.  In theory one could implement this profile scheme for the generation of the m/z calibration parameters or for the generation of the peak width or peak shape, but I feel strongly that this is a second order issue; it will confuse users too much.
 
 
 
I do like the idea of a stick generation 'play place' for a single spectrum. Again, sticking to the easier UMR case, it would allow users to generate many versions of sticks for one spectrum - each version uses a whole set of stick-generation parameters.  It allows users the 'before' and 'after' glimpse that they need.  For example, I think many users get too caught up in some of their baseline parameters for UMR. They may want to tweak things ever so slightly (i.e. a slightly wider integration region) when it really doesn't have a big effect. Having the ability to see the integrated area with and without these tweaks would give users a quantitative feel for what is important, what effects are or are not negligible. 
 
  
The generation of sticks is the most significant step in the analysis. If I had to do squirrel over again I would try to make it abundantly obvious what operations are pertinent to the generation of sticks and what operations are the manipulation of sticks once they have been found.  For example I would never want a user to ask "If I change my stick integration region, why doesn't my Org time series values change?"  The preprocess button on the squirrel panel should really be renamed "Commit to Sticks" or some such (but it turns out that this step can deal with raw spectra too so it would be too confusing). Just by reading buttons etc on the panel it is not clear that all the subsequent steps use this one set of sticks. This is another reason why I am happy about this profile concept. It would reinforce what choices a user has committed to (in the generation of sticks) and where a user could play and not mess up previous calculations; it would reinforce when an operation is ''working with'' sticks not ''generating'' sticks.
+
1. PIKA users, via careful integration of new software to mate with existing PIKA construct<br>
 +
2. Aerodyne-based Tofwerk-TOF-users community<br>
 +
3. TW prototype instrument application developers<br>
 +
4. Other TW OEM customers, probably via libraries (mapped from the template this Igor code will provide)<br>
  
===Ion 'bit' indexing===
+
=Software map=
Summary of discussion so far:
+
[[File:2010-10-08_HRconceptmap.jpg]]
 +
==Step 1: Define ''MS Matrix''==
 +
The user first runs through
 +
* m/z calibration as f(t)
 +
* Baseline determination as f(t)
 +
At this point, the user generates an HDF dataset containing the raw multi-dimensional MS matrix on which the entire HR-fitting process will run. This will include
 +
* integration in secondary dimensions (not writes/runs)
 +
* saving the raw data and the baseline-subtracted data
 +
Note that it is '''not''' prudent to allow multiple copies of the mass cal. and bsl params to pervade through the HR-fitting, as these affect the HDF matrix directly and thus must be considered FIXED during HR-fitting. We thus save the date/time of creation of this matrix as an associated HDF attribute, for comparison with outputs in steps 2-5. We need '''some method to flag the m/z cal and bsl params''', such that if the user changes them, the remaining steps in this HR-fitting process are immediately aware, and force a re-start from step 1.
 +
** '''Important caveat''': Steps 1&2 can be iterated, thus we only ''recommend'' fixing m/z at step 2, and force thereafter.
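
One possible way to flag the m/z cal and bsl params is to store a fingerprint of them next to the creation date/time of the MS matrix. The Python sketch below assumes the parameters can be serialised to JSON; the attribute names, function names and parameter layout are invented for illustration and are not an existing HDF convention.

```python
import hashlib
import json
from datetime import datetime

def fingerprint(params):
    """Stable hash of a dict of m/z calibration and baseline parameters."""
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def make_matrix_attrs(params, created):
    """Attributes to store alongside the MS matrix when it is generated."""
    return {"created": created.isoformat(), "param_fingerprint": fingerprint(params)}

def must_restart(matrix_attrs, current_params):
    """True if the m/z cal or bsl params changed after the matrix was made."""
    return matrix_attrs["param_fingerprint"] != fingerprint(current_params)

def is_void(output_created, matrix_created):
    """An output of steps 2-5 that is older than the MS matrix is void."""
    return output_created < matrix_created

# Hypothetical parameter layout: one entry per run.
params = {"mz_cal": [[0.0, 1.0, 0.5], [0.0, 1.0, 0.5]], "baseline": {"order": 1}}
attrs = make_matrix_attrs(params, datetime(2010, 10, 8, 0, 11))

edited = {"mz_cal": [[0.0, 1.0, 0.4999], [0.0, 1.0, 0.5]], "baseline": {"order": 1}}
print(must_restart(attrs, params), must_restart(attrs, edited))
```

The date/time comparison covers the case where the matrix was regenerated, while the fingerprint catches the case where the user changed a parameter without regenerating it.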
  
* MJC had proposed a bit-indexing scheme that would allow essentially infinite combinations of ion-master-lists to be saved in the fits
+
==Step 2: Define model==
* This option is not going to be required, since
+
Identical process to PIKA peak width/shape determination, with option to save multiple copies of the output instrument model. In each copy:
** Only one set of time-series sticks will be allowed
+
* Peak width&shape as f(m/z, t)
** The current master-list system parses a given ion-list at run time and creates an appropriate mask after matching up all the existing selections to the new list (right, Donna?!)
+
** The model will certainly change as a f(m/z). It may change as a f(t).
*** Nothing so sophisticated. In pika 1.09 a copy of all the chosen ions is saved in memory and given a name. It does not go back to the master list of all ions and reset the mask waves to 1s and 0s based on whether the ion was fit or not.  This could be implemented. The copy of HR ions that were fit is used in the calc and display of results.
+
* Name and version (so if updated, time-series can tell)
** The development of a tool to allow comparison of multiple sets of fits using different ''parameter-profiles'' would inherently need to store the fitted ion list anyway
+
* Date/Time of creation (compare to MS-matrix... if older, the model is flagged as VOID - really IMPORTANT)
  
MJC thus proposes the solution that
 
* The existing code to generate mask waves from any given master list is retained
 
* The existing code to deal with looking for duplicate ions is retained
 
* The output from the fitting procedure is NOT tied to a master ion list, but an exact mass wave and text wave that corresponds to the ions used in the fit, for
 
** this is more explicit
 
** isotopic constraints force the calculation of abundances for every ion in the list, so fitting a subset (and thus wishing to insert/delete points from the base wave) is not an option
 
** it saves valuable memory for the multi-dimensional datasets
 
  
DTS response: I want to be clear that for the time-dependent sticks matrix I think different parameter profiles can be used for different sets of runs (one parameter profile for V mode data, another for W mode data) but each run can have only one profile. This is what pika 1.09 can do, as far as the list of HR ions to be fit goes. The exception allowing multiple profiles for a single spectrum is the 'play place', where one is looking at different profiles for the same spectrum. A user will want to be able to look at multiple profiles for multiple single spectra (i.e. Run 100 and run 101 each with profiles A,B,C), so we may just as well plan for it. I propose that all these results be kept in memory, not written to disk. I also propose that there are 2 (at the very most 3) single mass spectra, each of which has at most, say, 10 profiles.  Users can get carried away if you let them and they will only confuse themselves.
+
==Step 3: Define fit parameters (via SingleMS panel)==
 +
TW panel will be inherently more complex than PIKA, which can retain some default options for this section 'under the hood', so to speak.
 +
* Choose a slice from the integrated, calibrated, baseline-subtracted MS matrix calculated in step 1. With this single MS:
 +
** Choose an instrument model
 +
** Choose ion constraint options:
 +
*** 1. Constrain the ion list (cf PIKA). '''Directly edit''' a master list
 +
*** 2. Find peaks in the MS, perhaps using initial guesses. '''Generate''' a master list
 +
*** 3. Don't constrain the ion list in the fits. '''Reference''' a master list (which could be an exact mass finder) so as to give meaning to the fit results
 +
*** 4. Combination. Find peaks (2) but then allow the x0 to vary in the fits
 +
** Choose isotopic constraints if appropriate
 +
** Choose other fit parameters
 +
* Display results
 +
* Re-evaluate parameters and calculate new results. Intercompare.
 +
* '''Choose one set of fit parameters for time-series calculations'''
  
===Multiple Dimensions===
+
==Step 4: Get results==
Summary of discussion thus far:
+
Use parameters from step 3 to generate time-series ion fits for a todo-wave. Save, as f(t):
* TW applications will require the ability to perform ion fitting on a MS taken from a 3-D data matrix
+
* HDF: Ions fitted
* Some data may need integrating to achieve useable signal:noise ratios before fitting
+
** Exact mass
* MSConcs has the capability to
+
** Chemical formula - if not constraining the list, this is a guess!
** integrate data in a very generalised manner, including user-definable bases in all axes, eg
+
* HDF: Ion x0 - if allowed to vary
*** For AMS PToF data, one can imagine wanting to average the raw spectra for bins representing 30-100, 100-300, 300-1000nm
+
* HDF: Ion amplitude
** integrate over writes (runs), which would screw up the indexing and use of todo-waves in time-series HR fits
+
* Mem: Fit Parameters as f(runs), thru named HR_Sticks_Profile
** generate the baselines for the same integration parameters as the MS
+
** Instrument model profile name and version (so if updated, can tell)
* MSConcs cannot currently
+
** Creation date/time of this instrument model
** write to HDF
+
** Creation date/time of MS matrix used in the fits
** not write to memory
+
** Ion constraint options
 +
** Checkboxes etc from singleMS panel that influence the fits
 +
** Name of HDF waves saved
  
MJC thus proposes
+
==Step 5: Diagnostics==
* MSConcs is tweaked (easy fix) to accept an hdf_flag where the output is an intermediate-HDF-file dataset
+
Crunch point for the user. If things look bad, they have to return to a previous step:
* The integration of mass spectra is performed ''a priori'' to defining the peak model, ie that
+
* '''Return to step 1:''' oh dear
** the integration in MSConcs defines the MSOpenLessBaseL_p datasets etc
+
** User believes there is a problem with
* The integration is not allowed over writes, such that the SQ_Backbone still works as we know and love
+
*** mass calibration
with the caveat that
+
*** baseline subtraction
* Integration over greatly different bin widths could develop bin-dependent transfer functions (models)... but the alternative is S:N-limited analysis
+
*** MS integration regions (cf Donna's PTOF regions)
 +
** Altering any of these will alter the MS Matrix... voiding the instrument model. The user has to '''start all over again'''.
 +
* '''Return to step 2:''' model issues
 +
** The user believes there is a problem with
 +
*** a particular model profile used in one of the runs
 +
** Altering this voids the model, and possibly therefore the ion lists etc. The user must re-evaluate those runs calculated using this model
 +
** NOTE: If the user thinks they should instead have used a different model profile, they could just re-calculate at this point, with the caveat that this does not afford them the chance to check the ion lists again. This is essentially one of the options in the next bullet
 +
* '''Return to step 3:''' fit params look bad
 +
** The user believes there are problems with
 +
*** the ion constraint options, master lists
 +
*** the isotopic constraint options
 +
*** the fit options employed
 +
*** the instrument model profile chosen for each run
 +
** Altering any of these changes the HR sticks, and thus the users must choose again (more wisely) and re-calculate
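
The 'return to step N' rules above can be written down as a lookup table. The Python sketch below simply encodes the bullets as listed; the key strings and the function name are illustrative choices, not an interface that exists anywhere.

```python
# Earliest step that must be redone when a given item is altered,
# transcribed from the bullets of Step 5.
RESTART_STEP = {
    "mass calibration": 1,
    "baseline subtraction": 1,
    "MS integration regions": 1,
    "model profile": 2,
    "ion constraint options": 3,
    "isotopic constraint options": 3,
    "fit options": 3,
    "model profile chosen per run": 3,
}

def step_to_return_to(changed_items):
    """Return the earliest step the user has to go back to."""
    return min(RESTART_STEP[item] for item in changed_items)

print(step_to_return_to(["fit options", "baseline subtraction"]))
```

Taking the minimum reflects the cascade described above: altering anything that feeds the MS Matrix voids everything downstream, however minor the other changes are.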
  
DTS response: Yes, using the MSConc function to average raw spectra and put them into intermediate HDFs, which is useful for different dimensions, is good.  I'm not clear on what you mean by '.. not allowed over writes.. '
+
=Changes from and similarities to PIKA=
 +
==Being kept analogous to PIKA==
 +
* Pre-requisite of 'SQUIRREL'-steps '''m/z and baseline''' before HR-fitting commences
 +
* Isotopic constraints. We always require all unit masses to be fitted together, in the correct order.
 +
* Master list option.
 +
* Time-series changes in the instrument model (tho the method changes).
 +
* Stages where diagnostics are assessed. Points at which the user has to revert if changes made... even if this is not apparent currently!
  
Why don't you use the profile concept to allow users to generate multiple versions of groupings?  Again going back to the supposedly easier PToF example, one could imagine a profile that says group all PToF bins into 3 new sets representing 30-100, 100-300, 300-1000nm and another profile that says group all PToF bins into 8 separate sets evenly spaced from 20-1000nm.  As long as you don't integrate in the time dimension I think this could be easily added to the regular squirrel indexing scheme by having a new squirrel index layer correspond to each ptof-bin-grouping profile. The tougher thing to fold into the squirrel indexing scheme is the averaging in the time, or run dimension.  In the past I have written code to allow users to generate new 'fake' DAQ files by merging several runs together (i.e. current runs 100 and 101 become new run x that is the average of runs 100 and 101, and current runs 102 and 103 become new run y that is the average of 102 and 103, etc.). This is the pika partitioning code which very very few users use.  By generating brand new DAQ-like hdf files squirrel can index them - the only pitfall is that these new hdf files are processed in a new, different squirrel experiment. But in that new experiment one can refer to the entire time period from runs 100 and 101 as one entity, which is what you want.
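
A rough Python sketch of the grouping-profile idea for a single run, assuming each PToF bin is labelled by a diameter in nm and carries one spectrum. The profile names, the bin diameters and the choice to sum rather than average are assumptions made for the example.

```python
def group_bins(bin_diameters_nm, spectra, edges_nm):
    """Sum the spectra of all size bins falling inside each [lo, hi) group."""
    n_mz = len(spectra[0])
    grouped = [[0.0] * n_mz for _ in range(len(edges_nm) - 1)]
    for diameter, spectrum in zip(bin_diameters_nm, spectra):
        for g, (lo, hi) in enumerate(zip(edges_nm, edges_nm[1:])):
            if lo <= diameter < hi:  # bins outside every group are dropped
                for i, value in enumerate(spectrum):
                    grouped[g][i] += value
                break
    return grouped

# Two hypothetical grouping profiles, following the examples in the text.
GROUPING_PROFILES = {
    "three_sets": [30, 100, 300, 1000],
    "eight_even": [20 + i * (1000 - 20) / 8 for i in range(9)],
}

diameters = [50, 80, 150, 400, 900]                   # one entry per PToF bin
spectra = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]   # one short spectrum per bin

three = group_bins(diameters, spectra, GROUPING_PROFILES["three_sets"])
eight = group_bins(diameters, spectra, GROUPING_PROFILES["eight_even"])
print(three)
```

Because nothing is integrated over runs, each grouping profile yields one extra layer per run, which is what would let it sit alongside the existing run-based indexing.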
+
==Changed from PIKA==
 +
* Multi-dimensional. Although points in this discussion may not explicitly refer to it, the entire process must be capable of acting on a 3-D dataset.
 +
** Caveats: We allow the model profile and fit-parameters chosen for the time-series to vary on a run-by-run basis ONLY. This is to keep to the todo-indexing system (and force users to save appropriately with TofDAQ!!).
 +
* Handling of instrument model profiles:
 +
** Individual profiles can be saved. Time-series calculations act on profile names, not PWPS parameters.
 +
* Single MS analysis.
 +
** Intercomparison of multiple sets of fits, using different
 +
*** MS slices from the matrix
 +
*** model profiles
 +
*** fit parameters (ion constraints etc)
 +
** Additional options for ion constraints
 +
*** Master list is an option, not a given
 +
*** Peak-finding routine can be used to generate a master list from a given MS-model-param combination
 +
*** x0 can be allowed to vary in the minimisation (ie no constraints at all)... assigning found exact masses to chemical formulae after-the-fact
 +
* MSConcs writes to HDF (in step 1)
  
===DTS: Proposed step forward===
+
=Questions and discussion points=
  
A small step forward: ??
+
* How to flag the m/z calibration and baseline parameter determination? The HR-fitting routine needs to know
Perhaps we can tweak the existing HR code with a modified version of your proposal. Suppose the user
+
** when these were calculated, and
really wants to get diagnostic information for all runs about whether to fit CHNO;C3H5O;C3H7; or to leave
+
** if it was thus prior to the calculation of the MS Matrix
CHNO out. (leaving aside for the moment any isotope constraining issues). Perhaps the code could
+
** '''if not''', then we need to force the user to start over at step 1
generate two data sets: HRSticks43_7 and HRSticks43_6. In the AMS world, both data sets would contain
+
* Should we save m/z and bsl params with the MS matrix?
at least 3 columns. Version “7” would contain 3 nonzero columns and version “6” 2 nonzero columns. We
+
** Should not be entirely necessary if we keep track of date/time... plus, it will be huge
could id the columns using the HR ion bit-wise idea you have so that column 0 is always CHNO;column 1 is
+
* How to implement the single MS intercomparison analysis? Got to keep track of
always C3H5O, etc. So the ‘7’ and ‘6’ suffix in the data set name identifies the HR ions fit. This way, a
+
** MS slice used from the matrix
user could examine these multiple versions of HR ions fit at 43.
+
** instrument model
 +
** ion constraints (and/or chosen ions to fit)
 +
** therefore we need to brainstorm the interface to make this happen
 +
* Different datasets:
 +
** Have not put much in this discussion about the use of different datasets (eg open/closed). Clearly need this capability.
 +
** Instrument model should be easily able to handle these since we are coding in profiles
 +
** If, for e.g., the open m/z is altered, forcing the user to step1 for the open HR sticks, we should probably retain the ability to note that the closed HR sticks are not void, right?!
 +
* Parameter Profiles?
 +
** Much talk about instrument model profiles... what about for the fit parameters? Do we get the user to build up a (named, versioned, dated) profile containing:
 +
*** Ion constraint options
 +
*** Isotopic options
 +
*** Other fit options
 +
*** Model profile name
 +
*** Slice used from MS Matrix in the single MS analysis, and which dataset
 +
** '''Advantage would be''' that we don't save a bunch of stuff in the HR_Sticks HDF, but rather a 'parameter-profile' name for each todo-wave
 +
* Use of profiles:
 +
** Assuming that the use of the profiles arises through todo-waves... so, rather than defining which profile to use as f(runs), we just save which WAS used as f(runs).... it's a subtle but important difference.
 +
* Advanced options:
 +
** Coding in advanced option to save multiple copies of HR_Sticks for a given dataset.
 +
* Linking of ion lists
 +
** If using a master list... could flag the ions used in m/z, PW, PS, fits in analogous manner.
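
Flagging ions in this way could be done with bit flags. The Python sketch below shows two hypothetical uses: one flag per purpose for each ion in a master list, and the bit-wise naming idea from the earlier discussion, in which the suffix of HRSticks43_7 or HRSticks43_6 encodes which of CHNO, C3H5O and C3H7 were fit. The class and function names are assumptions for illustration.

```python
from enum import IntFlag

class IonUse(IntFlag):
    """What a master-list ion was used for (hypothetical flag names)."""
    MZ_CAL = 1
    PEAK_WIDTH = 2
    PEAK_SHAPE = 4
    FIT = 8

def fit_mask(master_ions, fitted):
    """Bit i is set when master_ions[i] was included in the fit."""
    return sum(1 << i for i, ion in enumerate(master_ions) if ion in fitted)

# Column 0 is always CHNO, column 1 always C3H5O, column 2 always C3H7.
IONS_43 = ["CHNO", "C3H5O", "C3H7"]
all_three = fit_mask(IONS_43, {"CHNO", "C3H5O", "C3H7"})
without_chno = fit_mask(IONS_43, {"C3H5O", "C3H7"})
names = [f"HRSticks43_{mask}" for mask in (all_three, without_chno)]

usage = {"C3H7": IonUse.MZ_CAL | IonUse.FIT}
print(names, IonUse.FIT in usage["C3H7"])
```

The mask identifies a fitted ion list compactly, but it is only meaningful relative to a fixed column order, which is one reason an explicit exact mass wave and text wave may still be the clearer record.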
  
== PIKA fitting process: steps ==
+
= Existing PIKA fitting process: steps =
  
 
[[File:2010-09-21 MJC PIKAconceptmap.jpg]]
 
  
===1. Preparation for HR fitting===
+
==1. Preparation for HR fitting==
 
The order of preparatory steps below is generally not modifiable.
 
 
''(MJC: comments added for changes required in the TW version)''
 
Line 170: Line 168:
 
(1E) Select a subset of HR ions in (1D) to be constrained.<br>Constrained means that the fitting routine does not ‘fit’ this HR ion – the peak height is fixed to a value based on the magnitude of the HR ion’s isotopic ‘parent’ that has been previously fit or determined. This has consequences for the order in which UMR sets of HR ions at one m/z are fit.
  
===2. Perform HR fitting===
+
==2. Perform HR fitting==
 
In the current AMS HR code the HR fitting is performed one UMR set of HR ions at a time (all chosen HR ions within an individual integer m/z range). For example, at nominal m/z 18 the HR ions 18O+, H2O+, 15NH3+ at 17.999161, 18.010559, 18.023581 are fit in one ‘set’. Typically isotopic HR ions are constrained, and hence in this case the magnitudes of the HR ions O+ and NH3+ would need to be determined from the sets of HR ions at nominal m/z 16 and 17 before the set of HR ions at 18 could be fit. In future ToF mass spec applications involving high-mass organic fragments (>250 m/z-ish?) with a positive mass defect, the division into nominal m/z sets of HR ions could be problematic.
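
As a small worked illustration of the m/z 18 case, the Python sketch below fixes the heights of 18O+ and 15NH3+ from parents that have already been fit. The isotope ratios are approximate natural-abundance values and the parent heights are made-up numbers; the function name is hypothetical and this is not how the PIKA code is written.

```python
# Approximate natural-abundance ratios relative to the lightest isotope
# (assumed values, for illustration only).
ISOTOPE_RATIO = {"18O/16O": 0.002055, "15N/14N": 0.003653}

def constrained_height(parent_height, ratio, n_atoms=1):
    """Height of a singly substituted isotopic ion, fixed from its fitted parent."""
    return parent_height * ratio * n_atoms

# Made-up fitted heights from the sets at nominal m/z 16 and 17.
fitted = {"O+": 1200.0, "NH3+": 800.0}

heights_18 = {
    "18O+": constrained_height(fitted["O+"], ISOTOPE_RATIO["18O/16O"]),
    "15NH3+": constrained_height(fitted["NH3+"], ISOTOPE_RATIO["15N/14N"]),
}
# Only H2O+ is left as a free parameter when the set at m/z 18 is fit.
print(heights_18)
```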
  
====(2A) Single MS high-resolution stick calculation & diagnostics====
+
===(2A) Single MS high-resolution stick calculation & diagnostics===
 
Before a user spends a lot of computation time generating HR sticks, it is beneficial to examine HR fits of a few raw spectra at periods of various concentrations and compositions. This involves visually inspecting each UMR m/z region of interest and significant signal. This visual inspection gives a user feedback on all the parameters that go into the fit: m/z calibration, baseline settings, PW, PS, selection of HR ions to fit. It is often at this stage where at least one of the input parameters of the HR fit requires fine-tuning.
 
Other diagnostics for single spectra include:
 
Line 182: Line 180:
 
* Mike’s HR ion sensitivity tool.
 
  
====(2B) High-resolution stick calculation for many spectra====
+
===(2B) High-resolution stick calculation for many spectra===
 
As the AMS HR code currently exists in version 1.09, the HR fitting results are only saved in intermediate files for future access via subsequent ‘fetch’ commands. It is beneficial for users to have a place to ‘play’ while keeping the HR fitting results from being modified.
 
  
===3. Organizing, displaying HR stick results & diagnostics===
+
==3. Organizing, displaying HR stick results & diagnostics==
====(3A) Organization (order of steps important here)====
+
===(3A) Organization (order of steps important here)===
 
(3Ai) Define HR families<br>It will always be convenient to group HR ions into families. A family can be defined explicitly by listing its members (e.g. family Cx = C+, C2+, C3+, etc.) or by an algorithm that parses the chemical formula of each HR ion. The AMS HR code currently wants families to be determined at the same time as the selection of the HR ions to fit. However, this is unnecessary and future versions will allow more flexibility.
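
The algorithmic route can be sketched in a few lines of Python. The formula parser below handles only simple formulae (no brackets, isotope labels and charge signs are ignored), and the family naming rule is an assumption made for the example rather than the families actually defined in the AMS HR code.

```python
import re

ELEMENT_ORDER = ["C", "H", "N", "O", "S"]

def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C3H5O' or 'C2+'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def family_of(formula):
    """Assign a family name from the set of elements present (hypothetical rule)."""
    elements = set(parse_formula(formula))
    if elements == {"C"}:
        return "Cx"
    known = [e for e in ELEMENT_ORDER if e in elements]
    other = sorted(elements - set(ELEMENT_ORDER))
    return "".join(known + other)

families = {ion: family_of(ion) for ion in ["C3+", "C3H7", "C3H5O", "CHNO"]}
print(families)
```

Because the family is computed from the formula alone, it can be assigned after the fit, independently of which HR ions were selected.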
  
Line 193: Line 191:
 
(3Aiii) Define HR frag entries<br>HR frag entries explicitly state the mathematical treatment of any/all specialized considerations of an HR ion, whether the HR ion was fit or not.
  
====(3B) Display====
+
===(3B) Display===
 
Users will require typical output: time series, mass spectra summed or averaged in user-defined ways, and plotted in a variety of formats. What is different from a UMR analysis is that user have an additional type of entity: HR families as well as individual HR ions, unit resolution summed values, and HR species.
 
Users will require typical output: time series, mass spectra summed or averaged in user-defined ways, and plotted in a variety of formats. What is different from a UMR analysis is that user have an additional type of entity: HR families as well as individual HR ions, unit resolution summed values, and HR species.
  
====(3C) Diagnostics====
+
===(3C) Diagnostics===
 
All the diagnostics outlined in 2A above, and with a time (or other) dimension will be required by users.
 
All the diagnostics outlined in 2A above, and with a time (or other) dimension will be required by users.

Latest revision as of 00:11, 8 October 2010

What we know for sure:

1. MikeC is to develop a PIKA-esque tool for multi-dimensional data from TofDAQ data files.
2. The non-AMS data format and the multi-dimensional-aware criteria force a re-write of the HR code rather than straight adoption of existing PIKA.
3. Other TW (non-AMS) applications will wish to use many existing PIKA features (not limited to PWPS, HRfrag, families, isotopic constraints), but others will be less applicable (obviously, the AMS-centric parts!).
4. Whatever gets written should address existing desires and concerns with the PIKA code, and be fully transparent so as to accept any input data type. It should be developed with memory and speed optimisation in mind.

Who will use this software?

1. PIKA users, via careful integration of new software to mate with existing PIKA construct
2. Aerodyne-based Tofwerk-TOF-users community
3. TW prototype instrument application developers
4. Other TW OEM customers, probably via libraries (mapped from the template this Igor code will provide)

Software map

2010-10-08 HRconceptmap.jpg

Step 1: Define MS Matrix

The user first runs through

  • m/z calibration as f(t)
  • Baseline determination as f(t)

At this point, the user generates an HDF dataset containing the raw multi-dimensional MS matrix on which the entire HR-fitting process will run. This will include

  • integration in secondary dimensions (not writes/runs)
  • saving the raw data and the baseline-subtracted data

Note that it is not prudent to allow multiple copies of the mass cal. and bsl params to pervade the HR-fitting, as these affect the HDF matrix directly and thus must be considered FIXED during HR-fitting. We thus save the date/time of creation of this matrix as an associated HDF attribute, for comparison with outputs in steps 2-5. We need some method to flag the m/z cal and bsl params, such that if the user changes them, the remaining steps in this HR-fitting process are immediately aware, and force a re-start from step 1.

    • Important caveat: Steps 1 & 2 can be iterated, so we only recommend fixing m/z at step 2, and enforce it thereafter.
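One possible way to flag the m/z cal and bsl params is sketched below, in Python rather than Igor and purely for illustration. It assumes the parameters can be represented as plain dictionaries; the function names and the idea of storing a hash alongside the creation date/time are assumptions, not something already decided in these notes.

```python
import hashlib
import json
from datetime import datetime, timezone


def params_fingerprint(mz_cal: dict, baseline: dict) -> str:
    """Return a stable hash of the m/z calibration and baseline parameters."""
    # sort_keys makes the hash independent of dictionary ordering
    payload = json.dumps({"mz_cal": mz_cal, "baseline": baseline}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_matrix_attrs(mz_cal: dict, baseline: dict) -> dict:
    """Attributes that would be saved with the MS matrix (hypothetical names)."""
    return {
        "created": datetime.now(timezone.utc).isoformat(),
        "params_fingerprint": params_fingerprint(mz_cal, baseline),
    }


def must_restart_from_step1(matrix_attrs: dict, mz_cal: dict, baseline: dict) -> bool:
    """True if the current parameters differ from those used to build the matrix."""
    return matrix_attrs["params_fingerprint"] != params_fingerprint(mz_cal, baseline)
```

The later steps would call `must_restart_from_step1` before doing any work, so a changed calibration or baseline is caught without saving the full parameter set with the matrix.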

Step 2: Define model

Identical process to PIKA peak width/shape determination, with option to save multiple copies of the output instrument model. In each copy:

  • Peak width&shape as f(m/z, t)
    • The model will certainly change as a f(m/z). It may change as a f(t).
  • Name and version (so if updated, time-series can tell)
  • Date/Time of creation (compare to MS-matrix... if older, the model is flagged as VOID - really IMPORTANT)
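The VOID rule above could look something like this minimal Python sketch. The class and field names are hypothetical; only the comparison (a model older than the MS matrix is void) comes from the notes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ModelProfile:
    """One saved copy of the instrument model (hypothetical structure)."""
    name: str
    version: int
    created: datetime


def model_is_void(model: ModelProfile, matrix_created: datetime) -> bool:
    """A model created before the current MS matrix is flagged as VOID."""
    return model.created < matrix_created
```

For example, a profile created on 1 October is void against a matrix rebuilt on 8 October, while a profile created afterwards is still usable.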


Step 3: Define fit parameters (via SingleMS panel)

TW panel will be inherently more complex than PIKA, which can retain some default options for this section 'under the hood', so to speak.

  • Choose a slice from the integrated, calibrated, baseline-subtracted MS matrix calculated in step 1. With this single MS:
    • Choose an instrument model
    • Choose ion constraint options:
      • 1. Constrain the ion list (cf PIKA). Directly edit a master list
      • 2. Find peaks in the MS, perhaps using initial guesses. Generate a master list
      • 3. Don't constrain the ion list in the fits. Reference a master list (which could be an exact mass finder) so as to give meaning to the fit results
      • 4. Combination. Find peaks (2) but then allow the x0 to vary in the fits
    • Choose isotopic constraints if appropriate
    • Choose other fit parameters
  • Display results
  • Re-evaluate parameters and calculate new results. Intercompare.
  • Choose one set of fit parameters for time-series calculations
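The four ion constraint options could be captured as an enumeration, as in this Python sketch. The member names and the two helper functions are assumptions made for illustration; the behaviour of each option follows the list above.

```python
from enum import Enum


class IonConstraint(Enum):
    """The four ion constraint options from the SingleMS panel (names hypothetical)."""
    MASTER_LIST = 1        # fit only ions from an edited master list, x0 fixed
    PEAK_FIND = 2          # find peaks first and generate a master list, x0 fixed
    UNCONSTRAINED = 3      # no ion list in the fit; a list only labels the results
    PEAK_FIND_FREE_X0 = 4  # find peaks (as in 2), then let x0 vary in the fit


def x0_is_free(mode: IonConstraint) -> bool:
    """True when the peak position x0 is a free parameter in the minimisation."""
    return mode in (IonConstraint.UNCONSTRAINED, IonConstraint.PEAK_FIND_FREE_X0)


def generates_master_list(mode: IonConstraint) -> bool:
    """True when a peak-finding pass produces the ion list."""
    return mode in (IonConstraint.PEAK_FIND, IonConstraint.PEAK_FIND_FREE_X0)
```

Saving the enum value with the fit results would also tell later steps whether an "HDF: Ion x0" wave should exist at all.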

Step 4: Get results

Use parameters from step 3 to generate time-series ion fits for a todo-wave. Save, as f(t):

  • HDF: Ions fitted
    • Exact mass
    • Chemical formula - if not constraining the list, this is a guess!
  • HDF: Ion x0 - if allowed to vary
  • HDF: Ion amplitude
  • Mem: Fit Parameters as f(runs), thru named HR_Sticks_Profile
    • Instrument model profile name and version (so if updated, can tell)
    • Creation date/time of this instrument model
    • Creation date/time of MS matrix used in the fits
    • Ion constraint options
    • Checkboxes etc from singleMS panel that influence the fits
    • Name of HDF waves saved
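A rough Python sketch of what an HR_Sticks_Profile record per run might hold is given below. The field names and types are guesses based on the list above, and `stale_runs` is a hypothetical helper showing how the saved creation date/time could be used afterwards.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HRSticksProfile:
    """What WAS used to fit one run (field names are hypothetical)."""
    model_name: str
    model_version: int
    model_created: datetime
    matrix_created: datetime
    ion_constraint: str
    panel_options: tuple = ()   # checkboxes etc. from the singleMS panel
    hdf_waves: tuple = ()       # names of the HDF waves saved


def stale_runs(profiles: dict, current_matrix_created: datetime) -> list:
    """Runs whose sticks were fitted against a different (older) MS matrix."""
    return sorted(
        run for run, profile in profiles.items()
        if profile.matrix_created != current_matrix_created
    )
```

Here `profiles` maps run number to the profile used for that run, so the record is a log of what was done rather than an instruction of what to do.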

Step 5: Diagnostics

Crunch point for the user. If things look bad, they have to return to a previous step:

  • Return to step 1: oh dear
    • The user believes there is a problem with
      • mass calibration
      • baseline subtraction
      • MS integration regions (cf Donna's PTOF regions)
    • Altering any of these will alter the MS Matrix... voiding the instrument model. The user has to start all over again.
  • Return to step 2: model issues
    • The user believes there is a problem with
      • a particular model profile used in one of the runs
    • Altering this voids the model, and possibly therefore the ion lists etc. The user must re-evaluate those runs calculated using this model
    • NOTE: If the user thinks they should instead have used a different model profile, they could just re-calculate at this point, with the caveat that this does not afford them the chance to check the ion lists again. This is essentially one of the options in the next bullet
  • Return to step 3: fit params look bad
    • The user believes there are problems with
      • the ion constraint options, master lists
      • the isotopic constraint options
      • the fit options employed
      • the instrument model profile chosen for each run
    • Altering any of these changes the HR sticks, and thus the users must choose again (more wisely) and re-calculate
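The return-to-step logic above can be sketched as a small lookup in Python. The keys and artifact names are hypothetical labels for the items in the list, and the sketch is deliberately conservative: a change at step 2 voids the fit parameters too, whereas the notes only say this is "possibly" the case.

```python
# Which step the user must return to when a given item is altered.
RESTART_STEP = {
    "mass_calibration": 1,
    "baseline": 1,
    "integration_regions": 1,
    "model_profile": 2,
    "ion_constraints": 3,
    "isotopic_constraints": 3,
    "fit_options": 3,
    "model_choice": 3,
}

# The product of each step, in the order the steps are run.
ARTIFACT_BY_STEP = {
    1: "ms_matrix",
    2: "instrument_model",
    3: "fit_parameters",
    4: "hr_sticks",
}


def restart_step(changed: list) -> int:
    """Earliest step the user has to return to for the given changes."""
    return min(RESTART_STEP[item] for item in changed)


def voided_artifacts(changed: list) -> list:
    """Everything produced at or after the restart step must be redone."""
    first = restart_step(changed)
    return [ARTIFACT_BY_STEP[step] for step in sorted(ARTIFACT_BY_STEP) if step >= first]
```

So a change to the baseline voids everything, while a change to the fit options voids only the fit parameters and the HR sticks.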

Changes from and similarities to PIKA

Being kept analogous to PIKA

  • Pre-requisite of 'SQUIRREL'-steps m/z and baseline before HR-fitting commences
  • Isotopic constraints. We always require all unit masses to be fitted together, in the correct order.
  • Master list option.
  • Time-series changes in the instrument model (tho the method changes).
  • Stages where diagnostics are assessed. Points at which the user has to revert if changes made... even if this is not apparent currently!

Changed from PIKA

  • Multi-dimensional. Although points in this discussion may not explicitly refer to it, the entire process must be capable of acting on a 3-D dataset.
    • Caveats: We allow the model profile and fit-parameters chosen for the time-series to vary on a run-by-run basis ONLY. This is to keep to the todo-indexing system (and force users to save appropriately with TofDAQ!!).
  • Handling of instrument model profiles:
    • Individual profiles can be saved. Time-series calculations act on profile names, not PWPS parameters.
  • Single MS analysis.
    • Intercomparison of multiple sets of fits, using different
      • MS slices from the matrix
      • model profiles
      • fit parameters (ion constraints etc)
    • Additional options for ion constraints
      • Master list is an option, not a given
      • Peak-finding routine can be used to generate a master list from a given MS-model-param combination
      • x0 can be allowed to vary in the minimisation (ie no constraints at all)... assigning found exact masses to chemical formulae after-the-fact
  • MSConcs writes to HDF (in step 1)

Questions and discussion points

  • How to flag the m/z calibration and baseline parameter determination? The HR-fitting routine needs to know
    • when these were calculated, and
    • if it was thus prior to the calculation of the MS Matrix
    • if not, then we need to force the user to start over at step 1
  • Should we save m/z and bsl params with the MS matrix?
    • Should not be entirely necessary if we keep track of date/time... plus, it will be huge
  • How to implement the single MS intercomparison analysis? Got to keep track of
    • MS slice used from the matrix
    • instrument model
    • ion constraints (and/or chosen ions to fit)
    • therefore we need to brainstorm the interface to make this happen
  • Different datasets:
    • Have not put much in this discussion about the use of different datasets (eg open/closed). Clearly need this capability.
    • Instrument model should be easily able to handle these since we are coding in profiles
    • If, for e.g., the open m/z is altered, forcing the user to step1 for the open HR sticks, we should probably retain the ability to note that the closed HR sticks are not void, right?!
  • Parameter Profiles?
    • Much talk about instrument model profiles... what about for the fit parameters? Do we get the user to build up a (named, versioned, dated) profile containing:
      • Ion constraint options
      • Isotopic options
      • Other fit options
      • Model profile name
      • Slice used from MS Matrix in the single MS analysis, and which dataset
    • Advantage would be that we don't save a bunch of stuff in the HR_Sticks HDF, but rather a 'parameter-profile' name for each todo-wave
  • Use of profiles:
    • Assuming that the use of the profiles arises through todo-waves... so, rather than defining which profile to use as f(runs), we just save which WAS used as f(runs).... it's a subtle but important difference.
  • Advanced options:
    • Coding in advanced option to save multiple copies of HR_Sticks for a given dataset.
  • Linking of ion lists
    • If using a master list... could flag the ions used in m/z, PW, PS, fits in analogous manner.

Existing PIKA fitting process: steps

2010-09-21 MJC PIKAconceptmap.jpg

1. Preparation for HR fitting

The order of preparatory steps below is generally not modifiable. (MJC: comments added for changes required in the TW version)

(1A) Get good m/z calibration parameters. Mike has this code in place for Tofwerk files.

(1B) Get good baseline-removed spectra.
The purpose of saving copies of the raw spectra with the baseline removed is so that the multipeak fitting is done on the ‘same’ spectra. This ensures that the calculation of any baseline isn’t dependent on settings (‘resolution’, interpolation parameters) that could be adjusted by the user and then not saved, not recorded, and hence not replicable.
MJC: Something to consider: should the dimensional averaging be performed at this stage? This would limit the user to the prescribed dimensional bases, but facilitate the sped-up analysis currently available in PIKA.??

DTS: Yes, you are correct. This is the place to generate the spectra that will be fit. I think it important that the PW and PS calcs are based on the same spectra that will be fit.

(1C) Get good peak width (PW), peak shape (PS).
Getting good values for these parameters is necessarily an iterative process. In general one looks at 100s of spectra of isolated HR ions (e.g. C4H9+) to get good statistics for PW and PS. Once a user is confident that the selected HR ions are behaving in a consistent and ‘smooth’ manner, one can set the PW and even PS on an individual run basis if needed.
MJC: TW product will require a more generalised version of the current code, which hard-wires in AMS-specific ions...

DTS: Yes the current list of HR ions fit with gaussians is pre-determined, but currently users can add to the list. An import/export list of HR ions to be fit with gaussians is necessary. While on the topic, Jose had always urged that the various lists of ions be linked/merged somehow. That is, from the main list of all ions, there would be flags for each HR ion indicating subsets used for m/z calibration, peak width, peak shape.

(1D) Select the HR ions to fit.
This is highly variable depending on the application. Similar to Mike’s m/z calibration routine, I envision a simple interface whereby a user imports settings appropriate to their type of application.
MJC: Since the UM now I'm wondering about de-constraining the list, too.... hmmm

DTS: I'm not clear what deconstraining the list means. I envision a single prompt to the user at the start of the experiment that would load in their HR ion settings (with flags for m/z calibrations, gaussian fits, peak width, peak shape).

(1E) Select a subset of HR ions in (1D) to be constrained.
Constrained means that the fitting routine does not ‘fit’ this HR ion – the peak height is fixed to a value based on the magnitude of the HR ion’s isotopic ‘parent’ that has been previously fit or determined. This has consequences for the order in which UMR sets of HR ions at one m/z are fit.

2. Perform HR fitting

In the current AMS HR code the HR fitting is performed on one UMR set of HR ions at a time (all chosen HR ions within an individual integer m/z range). For example, at nominal m/z 18 the HR ions 18O+, H2O+ and 15NH3+ at 17.999161, 18.010559 and 18.023581 are fit in one ‘set’. Typically isotopic HR ions are constrained, and hence in this case the magnitudes of the HR ions O+ and NH3+ would need to be determined from the sets of HR ions at nominal m/z 16 and 17 before the set of HR ions at 18 could be fit. In future ToF mass spec applications involving large organic fragments (>250 m/z-ish?) with a positive mass defect, the division into nominal m/z sets of HR ions could be problematic.
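The ordering requirement for the m/z 18 example can be illustrated with a short Python sketch. It assumes a constrained isotopic ion's height is simply the fitted parent height times the natural isotopic abundance ratio, and that fitting in ascending nominal m/z is enough because a heavy-isotope ion always sits above its parent. The names are hypothetical and this is not the PIKA implementation.

```python
# Natural-abundance ratios relative to the parent ion (one O or one N atom each).
ISOTOPE_PARENT = {
    "18O+": ("O+", 0.00205 / 0.99757),     # 18O / 16O
    "15NH3+": ("NH3+", 0.00364 / 0.99636),  # 15N / 14N
}


def fit_order(umr_sets: dict) -> list:
    """Nominal m/z values in the order their sets must be fitted (ascending)."""
    return sorted(umr_sets)


def constrained_heights(fitted: dict) -> dict:
    """Fixed heights of isotopic ions whose parents have already been fitted."""
    return {
        isotope: fitted[parent] * ratio
        for isotope, (parent, ratio) in ISOTOPE_PARENT.items()
        if parent in fitted
    }
```

With `umr_sets = {18: ["18O+", "H2O+", "15NH3+"], 16: ["O+"], 17: ["NH3+", "OH+"]}`, the sets at 16 and 17 are fitted first, and only then can `constrained_heights` supply the fixed 18O+ and 15NH3+ heights for the set at 18.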

(2A) Single MS high-resolution stick calculation & diagnostics

Before a user spends a lot of computation time generating HR sticks, it is beneficial to examine HR fits of a few raw spectra at periods of various concentrations and compositions. This involves visually inspecting each UMR m/z region of interest and significant signal. This visual inspection gives a user feedback on all the parameters that go into the fit: m/z calibration, baseline settings, PW, PS, selection of HR ions to fit. It is often at this stage where at least one of the input parameters of the HR fit requires fine-tuning. Other diagnostics for single spectra include:

  • Comparison of HR summed to UMR vs. UMR sticks.
  • Residuals from the HR fits.
  • “5-panel” graph... Some sort of UMR, family summarized spectra plots. This is more important for EI than for soft or other ionization techniques.
  • Tabulated results. Having easy access to the HR sticks in a table is useful for those wishing to check the math or perform their own subsequent calculations.
  • Mike’s HR ion sensitivity tool.
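The first diagnostic in the list (HR summed to UMR vs. UMR sticks) could be computed roughly as in this Python sketch. The data layout is an assumption, and rounding the exact mass to get the nominal m/z is the simple choice that the earlier caveat about large positive mass defects warns about.

```python
def hr_sum_over_umr(hr_sticks: dict, umr_sticks: dict) -> dict:
    """Ratio of summed HR sticks to the UMR stick at each nominal m/z.

    hr_sticks:  {ion name: (exact mass, amplitude)}
    umr_sticks: {nominal m/z: amplitude}
    """
    summed = {}
    for exact_mass, amplitude in hr_sticks.values():
        nominal = round(exact_mass)  # breaks down for large mass defects
        summed[nominal] = summed.get(nominal, 0.0) + amplitude
    # Skip m/z with zero UMR signal to avoid dividing by zero
    return {
        mz: summed.get(mz, 0.0) / umr
        for mz, umr in umr_sticks.items()
        if umr != 0
    }
```

A ratio well away from 1 at some m/z suggests a missing HR ion, a poor peak shape, or a baseline problem at that mass.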

(2B) High-resolution stick calculation for many spectra

As the AMS HR code currently exists in version 1.09, the HR fitting results are only saved in intermediate files for future access via subsequent ‘fetch’ commands. It is beneficial for users to have a place to ‘play’ while keeping the HR fitting results from being modified.

3. Organizing, displaying HR stick results & diagnostics

(3A) Organization (order of steps important here)

(3Ai) Define HR families
It will always be convenient to group HR ions into families. A family can be defined explicitly by listing its members (e.g. family Cx = C+, C2+, C3+, etc) or by an algorithm that parses the chemical formula of each HR ion. The AMS HR code currently wants families to be determined at the same time as the selection of the HR ions to fit. However, this is unnecessary and future versions will allow more flexibility.
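A family-by-algorithm assignment might look like the Python sketch below. The family names other than Cx and the rule set are placeholders, and the parser ignores isotope labels (e.g. the leading 18 in 18O+), so this is only a starting point.

```python
import re


def parse_formula(formula: str) -> dict:
    """Count the atoms of each element in a formula such as 'C4H9+'."""
    counts = {}
    # Each match is an element symbol followed by an optional count
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula.rstrip("+")):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def family(formula: str) -> str:
    """Assign an HR ion to a family from the set of elements it contains."""
    elements = set(parse_formula(formula))
    if elements == {"C"}:
        return "Cx"
    if elements == {"C", "H"}:
        return "CH"
    if elements == {"C", "H", "O"}:
        return "CHO"
    if elements == {"C", "H", "N"}:
        return "CHN"
    return "Other"
```

Because the assignment depends only on the formula, it can be run at any time after the fits, which is the flexibility asked for above.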

(3Aii) Define HR batch entities
In AMS parlance a batch entity is typically a species, like organics, nitrate, etc. While the grouping of HR ions into families is convenient, it will always be the case that the chemical information a user desires may require a mathematical or specialized treatment of the HR fit results beyond the simple family sorting mechanism. A common AMS example is the parsing of the OH+ signal between the water and the organic species. Every HR species is defined first by two items: (a) a list of families and (b) a frag wave which identifies any modifications based on individual HR ions (such as the OH example above).

(3Aiii) Define HR frag entries
HR frag entries explicitly state the mathematical treatment of any/all specialized considerations of an HR ion, whether the HR ion was fit or not.
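How a species could be built from (a) a list of families and (b) frag entries is sketched below in Python. The data layout, the family labels and the use of a single coefficient per ion are all assumptions, and real frag entries can be more involved than a simple multiplier.

```python
def species_signal(sticks: dict, ion_family: dict, families: set, frag: dict) -> float:
    """Sum the HR sticks belonging to one species.

    sticks:     {ion name: amplitude}
    ion_family: {ion name: family name}
    families:   families that make up this species
    frag:       {ion name: coefficient}, overriding the family rule for that ion
    """
    total = 0.0
    for ion, amplitude in sticks.items():
        if ion in frag:
            # A frag entry takes precedence over the family membership
            total += frag[ion] * amplitude
        elif ion_family.get(ion) in families:
            total += amplitude
    return total
```

For the OH+ case, the organic species and the water species would each carry a frag entry for OH+ with their own coefficient, so the OH+ stick is shared between them instead of going wholly to one family.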

(3B) Display

Users will require typical output: time series, mass spectra summed or averaged in user-defined ways, and plotted in a variety of formats. What is different from a UMR analysis is that users have additional types of entity: HR families as well as individual HR ions, unit resolution summed values, and HR species.

(3C) Diagnostics

Users will require all the diagnostics outlined in 2A above, extended with a time (or other) dimension.