TABLES AND AN ALGORITHM FOR CALCULATING FUNCTIONAL GROUPS OF ORGANIC MOLECULES IN HIGH RESOLUTION MASS SPECTROMETRY

Joshua Lederberg
Professor of Genetics
School of Medicine
Stanford University

STAR (Scientific and Technical Aerospace Reports) No. N6N -2)}4 26

Submitted April 1, 1964 to NASA as an interim report under Grant No. NsG 81-60, "Cytochemical studies of planetary microorganisms".

Introduction

The pioneering work of Beynon¹ has demonstrated the salient power of high resolution mass spectrometry for the structural analysis of organic molecules. Instruments with resolution capabilities of 1 part in 10,000 to 100,000 are becoming commercially available. There remains the problem of computing the molecular composition from the experimentally found mass. Tables for this purpose have been published² but are necessarily limited in scope by the sheer number of possible combinations. On the atomic mass scale ¹²C = 12.00000, the fractional mass is attributable exclusively to the non-carbon part of a molecule, so these parts can be tabulated separately and much more compactly than complete tables of composition. Such tables have been computed over the range of H from 0 to 124, N from 0 to 6 and O from 0 to 11, and published elsewhere³ to make them generally accessible. An alternative approach was also generated, which gives even more compact tables and an algorithm more suitable for computer-oriented analysis, and is presented here.

The CH₂=14 Algorithm

Where molecules in homologous series are in question, or the investigator has a definite combination of functional groups in mind, the CH₂=14 algorithm will more than justify some additional arithmetic on the part of the specialized user.
Furthermore, its tables are even more compact: roughly speaking, where an independent variable of the C=12 system is the number of H atoms, plausibly from 0 to 124, in the CH₂=14 system we use degrees of unsaturation from 0 to 30. This permits us to encompass most molecules and radicals, with N from 0 to 8 and O from 0 to 13, in Table 2. In addition, the entries appear in an order related to the complexity of the molecule. The topmost, simpler areas of the tables may become quite familiar with extensive use, since e.g. alkanes, monocarboxylic acids, dicarboxylic acids, monoketones, etc. each have a singular location in the tables.

The logic of extracting the formula by division and examining the remainder is similar to that of the C=12 system. However, as Kendrick⁴ has pointed out, using CH₂ as a base relates the formula to the structural concept of a fundamental hydrocarbon with functional substituents. Instead of taking C, H, O, N as the variables, we take a formula as Terminal H + CH₂, C:, O, NH parts, i.e. saturated carbons, double bonds, oxygen, and amino-functions. For computational purposes, -N= is regarded as -NH- + C:; thus -CH₂-CH=N- is treated as CH₂ + C: + NH. This leads to a mathematical fiction in treating molecules which have more double bonds than C atoms. "Double bonds" includes rings (equivalent to one double bond each) and -C≡C- functions (equivalent to two C:), i.e. C: can be read as "degrees of unsaturation" together with an equal number of C atoms.

In principle we can implement this scheme of extracting the CH₂ part by shifting to a mass scale ¹²CH₂ = 14.00000, and then dividing by 14. This would be done by multiplying the found mass by 0.99888337 = (14.00000/14.01565).
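The bookkeeping just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `decompose` and `to_ch2_scale` are hypothetical, the hydrogen mass is the ¹²C-scale value quoted in this report, and the only check applied is that the hydrogen count has the right parity. A negative CH₂ count is allowed on purpose, since it corresponds to the "mathematical fiction" mentioned above.

```python
# Mass of hydrogen on the 12C = 12 scale (value quoted in this report).
H_MASS = 1.00782522
CH2_MASS = 12.0 + 2.0 * H_MASS      # 14.01565044 on the 12C scale
SCALE = 14.0 / CH2_MASS             # multiplier that makes CH2 exactly 14


def decompose(c, h, n, o, terminal_h=2):
    """Rewrite CcHhNnOo as terminal H + CH2 + C: + NH + O parts.

    terminal_h is 2 for complete molecules and 1 for radicals.
    Every N is counted as one NH; the hydrogens left over after the
    terminal H and the NH groups are paired off into CH2 units, and
    the carbons not used by CH2 are the C: (degrees of unsaturation).
    """
    spare_h = h - terminal_h - n
    if spare_h % 2:
        raise ValueError("hydrogen count has the wrong parity for this terminal_h")
    ch2 = spare_h // 2
    unsaturation = c - ch2
    return {"H": terminal_h, "CH2": ch2, "C:": unsaturation, "NH": n, "O": o}


def to_ch2_scale(mass):
    """Convert a mass on the 12C = 12 scale to the CH2 = 14 scale."""
    return mass * SCALE


if __name__ == "__main__":
    # Benzene, C6H6: three double bonds plus one ring.
    print(decompose(6, 6, 0, 0))
    print(round(SCALE, 8))
```

For benzene the sketch reports four C: units, i.e. three double bonds plus one ring, which is the reading of "degrees of unsaturation" given in the text.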
We would then tabulate the mass defect, the amount by which the decimal value falls short of an integral mass number. There will be a characteristic defect on this scale for every class of organic compound. A few classes of compounds, exemplified by ethane, ethyl radical, and ethylamine, have a mass excess, i.e. a negative defect: -.01339, -.00669 and -.00753 respectively; i.e. their masses are 30.01339, 29.00669 and 45.00753, respectively, when expressed on the CH₂=14 scale, while higher homologues with n additional CH₂ groups will be precisely 14n larger. In these cases their functional groups fail to overbalance the mass excess of the two terminal hydrogens of the basic structure. Alkenes, CnH2n, will have a defect of zero. Other functional groups will have equally characteristic mass defects, simply the sum of the contribution of each part (see Table 3).

In actual computation the multiplier .99888 is clumsy and inefficient. Where m = exact mass on the ¹²C scale, it is better to take m (0.99888337) = m - .00111663m, especially as the factor can usually be truncated, and only the decimal corrections need be kept in hand. This will be readily seen in working through the examples. Table 1 calculates the correction from the integer part of the mass (with reasonable precision for most purposes), so the only arithmetic actually needed is to subtract the entry of Table 1 from the observed mass, and use the difference in scanning the main table (Table 2). Terminal H numbers 2 for complete molecules, 1 for radicals, the latter being signified by * in the table.

Example. An alkaloid analysed to contain about N₄ and Og to Og returned a mass reading of 718.37430 ± .00600.

                                 Quotient   Residue   Decimal
1. From Table 1, 718.37430  =   51 (x 14)  +  4     +  .37430

2. Calculate m (1 - 0.00111663). From Table 1:

      718      ->   80174*
      .37430   ->   37430
      Defect        42744 ± 600

(*The table gives the value for the integer 718.
A more precise measure of the defect by interpolation or direct multiplication would be 80216 - 37430 = 42786; with practice, the interpolation can be done by eye if necessary.)

3. From Table 2, integer residue class 4, the following entries are within the range 42144 - 43344 and are intact molecules (not *):

      Defect    C:    NH     O   (CH₂)
      42192     17     8     7     31
      42325     12     4    11     27
      42378     30     6     0     32
      42511     25     2     4     28
      42727      8     6    13     28
      42779     26     8     2     33
      42912     21     4     6     29
      43045     16     0    10     25
      43314     17     6     8     30

Of these, only 42912 satisfies the compositional requirements with respect to N and O. It may be interpreted as follows: 2H + 21C: + 4NH + 6O + (51 - 29) = 22 CH₂.

Extensions

The proper extent of the tables is a compromise between the bulk and cost of a larger table as against the effort of additional arithmetic. The most frequent extension may be in the range of NH, here limited to 8. If so indicated, 9NH can be extracted from the experimental mass by subtracting 135.09809; the entries in the table will then correspond to NH from 9 to 17 in place of 0 to 8. Further extended versions of the tables can be computed if their utility warrants. Alternatively, other special cases can be handled with the present tables with some additional arithmetic. If additional atoms are suspected, their weights should simply be subtracted from the mass numbers found. E.g., if monochloro compounds are in question, the corresponding dechloro radicals are formed by subtracting ³⁵Cl or ³⁷Cl.

The tables are presented to 5 significant figures, which is an optimistic projection of instrumental capacity. The 5th digit is subject to rounding error of computation. The values refer to the mass of the neutral molecule rather than the positive ion, as the mass of neutral reference molecules is usually set down in calibrating the mass spectrometer.

Constants

The constants used in the computation are (on the ¹²C scale): H = 1.00782522, N = 14.003074, O = 15.994915. They are taken from the IUPAC report.⁵
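The worked example can be retraced with a short Python sketch. It is a minimal illustration, not the procedure used to print the tables: it recomputes each defect directly from the constants above instead of reading Tables 1 and 2, so the fifth digit may differ by a unit from the printed entries, and it centres the search window on the directly computed defect (about 42786) rather than on the Table 1 value 42744. The names `defect`, `PARTS` and `candidates` are hypothetical, and taking the nominal mass by rounding is an assumption that holds for this example.

```python
# Atomic masses on the 12C = 12 scale, as listed under Constants.
H, N, O = 1.00782522, 14.003074, 15.994915
FACTOR = 1.0 - 14.0 / (12.0 + 2.0 * H)   # about 0.00111664


def defect(mass):
    """Nominal (integer) mass minus the mass rescaled to CH2 = 14."""
    return round(mass) - mass * (1.0 - FACTOR)


# Defect contributed by each part of the formula.
PARTS = {"2H": defect(2 * H), "C:": defect(12.0),
         "NH": defect(N + H), "O": defect(O)}


def candidates(mass, tol, max_c=30, max_nh=8, max_o=13):
    """List complete molecules whose residue class and defect match `mass`.

    Each hit is (defect in units of 1e-5, C:, NH, O, number of CH2).
    """
    nominal = round(mass)
    residue = nominal % 14
    target = defect(mass)
    hits = []
    for c in range(max_c + 1):
        for nh in range(max_nh + 1):
            for o in range(max_o + 1):
                base = 2 + 12 * c + 15 * nh + 16 * o   # nominal mass without CH2
                if base > nominal or base % 14 != residue:
                    continue
                d = (PARTS["2H"] + c * PARTS["C:"]
                     + nh * PARTS["NH"] + o * PARTS["O"])
                if abs(d - target) <= tol:
                    hits.append((round(d * 1e5), c, nh, o, (nominal - base) // 14))
    return sorted(hits)


if __name__ == "__main__":
    for row in candidates(718.37430, 0.00600):
        print(row)
```

The last element of each row is the number of CH₂ groups directly, i.e. the quotient 51 minus the (CH₂) column of Table 2, so the accepted entry appears with 22 CH₂.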

Acknowledgments

These tables represent an exercise in the application of computers to biochemical problems. Research connected with this program has been supported by grants from the National Aeronautics and Space Administration (NsG 81-60), National Science Foundation (NSF G-6411), and National Institutes of Health (NB-04270-01 and 02, and AI-5160-06). I am grateful to Professor Carl Djerassi for having challenged the computer to generate the extended tables, to which the present algorithms are a rebuttal. The programs were run under the Subalgol monitor on the IBM 7090 at Stanford University Computation Center, whose assistance to academic research is supported by an NSF grant (NSF-GP948). I am indebted to the staff of the Computation Center for their unstinting cooperation and to Mrs. Margaret Wightman for skilled and loyal assistance. The actual operation of these programs required about one second of main frame computer time per page of tabular output.

References

1. J. H. Beynon, Mass Spectrometry and Its Applications to Organic Chemistry, Elsevier, Amsterdam, 1960.
2. J. H. Beynon and A. E. Williams, Mass and Abundance Tables for Use in Mass Spectrometry, Elsevier, Amsterdam, 1963.
3. J. Lederberg, The Computation of Molecular Formulas for Mass Spectrometry, Holden-Day, San Francisco, 1964.
4. Prior to the completion of this report a similar proposal was enunciated by E. Kendrick, A mass scale based on CH₂ = 14.0000 for high resolution mass spectrometry of organic compounds, Analytical Chemistry 35:2146, 1963. His implementation, oriented towards petroleum constituents, is substantially different, but the underlying concept closely anticipates the present scheme.
5. A. E. Cameron and E. Wichers, Report of the International Commission on Atomic Weights (1961), Journal of the American Chemical Society 84:4175, 1962.

BALGOL PROGRAM
For Computation of Tables of Mass Defects by Composition

2 MIN3000 LEDERBERG MASSCORRECTIONS.».14 ALGORITHM
STANFORD UNIVERSITY COMPILER -- VERSION OF 1/27/64

COMMENT*** OUTPUT ON TAPE $ *-SYMBOL #-SPACE
COMMENT*** OUTPUT ON TAPE 648$
INTEGER OTHERWISE $
GLOBAL INTEGER COLUMNS $
ARRAY A(2000,10) $
PROCEDURE COLOUT(LINES,COLS $ VAR,V1,V2,V3 $ OUT,FORM) $
BEGIN INTEGER OTHERWISE $
   SL = V2.LINES $
   M = (COLS-1)SL + V1 $ COLUMNS = COLS $ SM = SL + V1 - V2 $
   COMMENT WRITES MIN( LINES*COLS , (V3+V2-V1)/V2 ) OUTS $
   FOR I = (V1,V2,SM) $ BEGIN
      UNTIL M LEQ V3 $ (M = M-SL $ COLUMNS = COLUMNS-1) $
      WRITE($$O, FORM) $ M = M + V2 END $
   RETURN $ OUTPUT O(FOR VAR=(I,SL,M) $ OUT())
END COLOUT() $
PROCEDURE COLIST(COLS $ I,I1,I2,I3 $ OUT, FORM) $
BEGIN INTEGER OTHERWISE $
   J = 60 COLS.I2 $
   FOR K = (I1, J, I3) $ COLOUT(60,COLS $ I,K,I2,I3 $ OUT,FORM) $
   RETURN END COLIST() $
PROCEDURE SORTS(FILL,N , VALUES() , KEY()) $
#-PRINT
#PRINT
FOR I=(1,1,1000) $ A(I,9)=I $
FOR M=(0,1,13) $ BEGIN
   FUNCTION CR(D)=134000D + 58626NH + 229454O - 67000HR $
   I=0 $
   FOR D=(0,1,30) $
   FOR NH=(0,1,8) $
   (( HR=-MOD(M+NH,2) $
      O = MOD( M + 1398 + 2D - HR - NH,14)/2 ) $
   FOR O=O,O+7 $ (
      SUM = 12D + 15NH + 16O + 2 + HR $
      J=1 $ I=I+1 $ FOR A(I,J)=CR(D),D,NH,O,HR,SUM/14 $ J=J+1)) $
   ILIM=I $
   SORTS(0,ILIM, A(,9) , A(,1)) $
   ILIM = MIN(ILIM,540) $
   COLIST(3 $ K,1,1,ILIM $ SOME,NUMBERS) $
END $
OUTPUT SOME(FOR I=A(K,9) $ FOR HR=A(I,5) $ ((A(I,1)-133950)/100,
   FOR J = 2,3,4,6 $ A(I,J))) $
FORMAT NUMBERS ( $COLUMNS $(B69L57 By 213» 149 B, $—-HRS( #8) sS1tHRS(* *)5 13) 2W)$
FINISH $

Table 3. Mass Defects (CH₂=14 scale) of simple functions
[Columns: NAME; FUNCTIONAL GROUP; INTEGER RESIDUE; MASS DEFECT; MASS CONTRIBUTION TO BASE COMPOUND, ¹²C=12 SCALE; CLASS OF COMPOUND, with its RESIDUE and MASS DEFECT. Rows: -ANE; -ANYL (RADICAL); -AMINE; -OL, or ETHER; -AL, or -ONE; -ENE or RING; -OIC ACID; AMIDE; -THIOL, or THIOETHER; AMINO ACID; -DIOL; -TRIOL; BENZENE; HYDROXY ACID; PHOSPHORIC ESTER; SULFONIC ACID; SULFATE ESTER; NITRILE; CYANATE; THIOCYANATE; CHLORIDE; FLUORIDE; BROMIDE.]
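
The main loop of the BALGOL program can be paraphrased in Python as a reading aid. This sketch rests on an interpretation of the printed listing: the integer coefficients 134000, 58626, 229454, 67000 and the offset 133950 are taken as defects in units of 10⁻⁷, the division by 100 is assumed to truncate toward zero (which reproduces the printed values -.01339 and -.00669), and the sorting and column-printing procedures are replaced by Python's `sorted`. The function names are hypothetical.

```python
def trunc_div(x, y):
    """Integer division that truncates toward zero (assumed BALGOL behaviour)."""
    q = abs(x) // y
    return q if x >= 0 else -q


def residue_table(m, max_d=30, max_nh=8):
    """Rows of the main table for integer residue class m (0 to 13).

    Each row is (defect in 1e-5 units, C:, NH, O, CH2 quotient, is_radical).
    """
    rows = []
    for d in range(max_d + 1):
        for nh in range(max_nh + 1):
            # HR is 0 for complete molecules (two terminal H) and -1 for
            # radicals (one terminal H); parity of m + NH decides which.
            hr = -((m + nh) % 2)
            # Smallest oxygen count that puts the nominal mass in class m;
            # adding 7 oxygens (7 x 16 = 8 x 14) stays in the same class.
            o_min = ((m + 1398 + 2 * d - hr - nh) % 14) // 2
            for o in (o_min, o_min + 7):
                total = 12 * d + 15 * nh + 16 * o + 2 + hr
                cr = 134000 * d + 58626 * nh + 229454 * o - 67000 * hr
                defect = trunc_div(cr - 133950, 100)
                rows.append((defect, d, nh, o, total // 14, hr != 0))
    return sorted(rows)


if __name__ == "__main__":
    table = residue_table(4)
    window = [r for r in table if 42144 <= r[0] <= 43344 and not r[5]]
    for row in window:
        print(row)
```

Run for residue class 4, the window 42144 to 43344 includes the entry (42912, 21, 4, 6, 29) selected in the worked example.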