J. Mol. Biol. (1966) 19, 548-555

Codon—Anticodon Pairing:
The Wobble Hypothesis

F. H. C. Crick
Medical Research Council, Laboratory of Molecular Biology

Hills Road, Cambridge, England

(Received 14 February 1966)

It is suggested that while the standard base pairs may be used rather strictly
in the first two positions of the triplet, there may be some wobbie in the pairing
of the third base. This hypothesis is explored systematically, and it is shown
that such a wobble could explain the general nature of the degeneracy of the
genetic code.

Now that most of the genctic code is known and the base-sequences of sRNA molecules
are coming out, it seems a proper time to consider the possible base-pairing between
codons on mRNA and the presumed anticodons on the sRNA.

The obvious assumption to adopt is that sSRNA molecules will have certain common
features, and that the ribosome will ensure that all sRNA molecules are presented
to the mRNA in the same way. In short, that the pairing between one codon-
anticodon matching pair will to a first approximation be “equivalent” to that between
any other matching pair.

As far as I know, if this condition has to be obeyed, and if all four bases must be
distinguished in any one position in the codon, then the pairing in this position is
highly likely to be the standard one; that is:T

G====C
and ====— U
or some equivalent ones such as, for example,
I ====C
and A====T

since this is the only type of pairing which allows all four bases to be distinguished
in a strictly equivalent way.

We now know enough of the genetic code to say that in the first two positions 0!
the codon the four bases are clearly distinguished; certainly in many cases. anil
probably in all of them. I thus deduce that the pairings in the first two positions ar
likely to be the standard ones,

{ Throughout this paper the sign === = is used to mean “pairs with’. If two bases a™

co
548

equivalent in their coding properties, this is written 7 a
“WOBBLE HYPOTHESIS” 549

However, what we know about the code has already suggested two generalizations
‘about the third place of the codon. These are:

(1) U)|} this already appears true in about a dozen cases out of the possible 16,
Cj and there are no data to suggest. any exceptions.

(2) A} probably true in about half of the possible 16 cases, but the evidence
GJ suggests it may perhaps be incorrect in several other cases.

.. The detailed experimental evidence is rather complicated and will rot be discussed
here. (For details of the code see, for example, Nirenberg et al., 1965; and Séll et al.,
1965.) It suffices that these rules may be true, as suggested by Eck (1963) a little
time ago. Alternatively, only the first one may be true.

. This naturally raises the question: Does one sRNA molecule recognize more than
one codon, e.g. both UUU and UUC. Some evidence for this was first presented by

Bernfield & Nirenberg (1965). They showed that all the sRNA for phenylalanine can
be bound by poly U, although this sRNA also recognizes the triplet UUC, at least in
part. More recent evidence along these lines is presented in S6ll et al. (1966) and
Kellogg et al. (1966). Again I do not wish to discuss here the evidence in detail, but
simply to ask: If one sRNA codes both XYU and XYC, how is this done?

Now if we do not know anything about the geometry of the situation, it might be
thought that almost any base pairs might be used, since it is well known that the
bases can be paired (i.e. form at least two hydrogen bonds) in many different ways.
However, it occurred to me that if the first two bases in the codon paired in the
standard way, the pairing in the third position might be close to the standard ones.

We therefore ask: How many base pairs are there in which the glycosidic bonds
occur in a position close to the standard one? Possible pairs are:

====A (1)

In my opinion this will not occur, because the NH, group of guanine cannot make
one of its hydrogen bonds, even to water (see Fig. 1),

-
-
~~

Fia. 1. The unlikely pair guanine—adenine.

Us===0C (2)

This brings the two keto groups rather close together and also the two glycosidic
bonds, but it may be possible (see Fig. 2).

+ This symbol implies that both U and C code the same amino acid.
550 F. H. C. CRICK

Fia. 2. The close pair uracil—cytosine.

U =e U
Again rather close together (see Fig. 3).

~~ ee

Fig. 3. The close pair uracil—uracil.
G====U (4)

orl ====U

These only require the bond to move about 2-5 A from the standard position (sce
Fig. 4).

-
~

————

Fie, 4. The pair guanine-uracil (the pair inosine—uracil is similar).
“WOBBLE HYPOTHESIS" 551

This is perfectly possible. Poly I and poly A will form a double helix. The distance
between the glycosidic bonds is increased (see Fig. 5),

ee

on

Fic. 5. The pair inosine—adenine.

As far as I know, these are all the possible solutions if it is assumed that; the bases
are in their usual tautomeric forms.

I now postulate that in the base-pairing of the third base of the codon there is a
certain amount of play, or wobble, such that more than one position of pairing is
possible.

As can be seen from Fig. 6, there are seven possible positions which might be reached
by wobbling. However, it by no means follows that all seven are accessible, since the
molecular structure ig very likely to impose limits to the wobble. We should there-
fore strictly consider all possible combinations of allowed positions. There are 127 of
these, but most of them are trivial. If we adopt the rule that all four bases on the codon
(in the third position) must be recognized (that is, paired with) we are left with
51 different combinations. This is too many for easy consideration, but fortunately
we can eliminate most of them by only accepting combinations which do not violate
_the broad features of the code. If we assume:

(a) that all four bases must be recognizable;

(b) that the code must in some cases distinguish between

cf and a} as it appears to do for the pairs

Phe Tyr His Asn Asp
Leu C.T.t Gln Lys Glu
(not all of which are likely to be; wrong)

then by strictly logical argument it can be shown both that the standard position
must be used, and that the three positions on the left of Fig. 6 cannot be used,

This leaves us with only four possible sites to consider one of which—the standard
one—must be included. There are therefore only seven possible combinations. I have
examined all these, but I shall restrict myself here to the case in which all four posi-
tions are used, as this is structurally the most likely and also seems to give the code
(called code 4 in the note privately circulated) which best fits the experimen:al data.

} C.T., Chain termination.
552 F. H. C. CRICK

Anticodon Codon LUI
G---U}}
\e N\
\ —_
U--U

A--~U
225} C---G

\ G---C
a I---C
a

Fis. 6. The point X represents the position of the C," atom of the glycosidic bond (shown
dotted) in the anticodon. The other points show where the C,’ atom and the glycosidic bond fall
for the various base pairs, (Pairs with inosine in the codon have been omitted for simplicity.)
The wobble code suggested uses the four positions to the right of the diagram, >ut not the threo
close positions.

¢
af
/ Standard
xd : \ \—— U---A
U--C

The rules for pairing between the third base on the codon and the corresponding
base on the anticodon are set out in Table 1. It can be seen that these rules make
several strong predictions:

(1) it is not possible to code for either C alone, or for A alone.
For example, at the moment the codon UGA has not been decisively allocated.
Wobble theory states that UGA might either:
(a) code for cysteine, which has UGU and UGC; or
(b) code for trypotophan, which has UGG; or
(c) not be recognized.

TABLE |
Pairing at the third position of the codon

 

Base on the Bases recognized

 

anticodon on the codon
A
U a
Cc G
At U
, U
G cf

U
I c}
A

ae sHIN.A-

{ It seems likely that inosine will be formed enzymically from an adenine in the nascent se .

This may mean that A in this position will be rare or absent, depending upon the exact specun’'-
of the enzyme(s) involved.

 
“WOBBLE HYPOTHESIS” 553

However it does not permit UGA to code for any amino acid other than cysteine or
tryptophan. This rule could also explain why no suppressor has yet been found which

suppresses only ochre mutants (UAA), although suppressors exist which SUPPprcss
“both ochre and amber mutants (UAQ).

(2) If an sRNA has inosine in the place at the relevant position on the anticodon
(i.e. enabling it to pair with the third base of the codon), then it must recognize U,
C and A in the third place of the codon. Conversely, those amino acids coded only
by XY (such as Phe, Tyr, His, etc.) cannot have inosine in that place on their sRNA.

(3) Wobble theory does not state exactly how many different types of sRNA will
actually be found for any amino acid. However if an amino acid is coded for by all
four bases in the third position (as are Pro, Thr, Val, etc.), then wobble theory pre-
dicts that there will be at least two sRNA’s. These can have the recognition pattern:

UL me 4
cf PS @

or U
C> plus G
A

Note that the sets actually used for any amino acid may well vary from species to
species.

The Anticodons

At this point it is useful to examine the experimental evidence for the anticodon.
Ih the sRNA for alanine from yeast, Holley et al. (1965) have the following sequences:

——— pUpUpIp Gp CpMeIp¥p ———
position ——~— 36 37 38 ——_—
Zachau and his colleagues (Diitting, Karan, Melchers & Zachau, 1965) have for

one of the serine sRNA’s from yeast:
——~ p¥pUpIpGpApAtp¥p ———
(A* stands for a modified A)

For the valine sRNA from yeast, Ingram & Sjéquist (1963) have shown that the
‘nly inosine occurs in the sequence:

——— pIpApCp ———

Holley et al. (1965) have already pointed out that IGC is a possible anticodon for
‘lanine, and the additional evidence makes it almost certain to my mind that this is
‘orrect, and that the anticodons are as given in the Table belowt:

_ 7? Note added 26 April 1966. Drs J.T. Madison, G. A. Everett and H. Kung (personal communi-

tion) have completed the sequence of the tyrosine sRNA from yeast. The sequence strongly

Nggests that the anticodon in this case is GA, corresponding to the known codons UAY. Since
can form the same base pairs as U, this is in excellent agreement with the previous data,
554 F. H. C. CRICK

 

 

Yeast sRNA
Anticodon Codon
Ala IGaGec GC?
Ser IGA UC?
Val . IAC GuU?

 

remembering that the pairing proposed between codon and anticodon is anti-parallel.
Thus I confidently predict: the anticodon is a triplet at (or very near) positions 36-37_
38 on every sRNA, and that the first two bases in the codon pair with this (in an anti.
parallel manner) using the standard base pairs,

However, inosine does not occur in every SRNA. In particular Hclley et al. (1963)
(and personal communication) have reported that the tyrosine sRNA has two peaks,
neither of which contains inosine. Moreover, Sanger (personal communication) tells
me that there is rather little inosine in the total sRNA from EZ. coli.

Testing the Theory
Two obvious tests present themselves:

(1) To find which triplets are bound by any one type of sRNA. This is being done by
Khorana and his colleagues (Séll et al., 1966), and also by Nirenberg’s group (Kellogg,
Doctor, Loebel & Nirenberg, 1966). The difficulty here is to be sure that the sRNA
used is pure, and not a mixture.

(2) To discover unambiguously the position of the anticodon on sRNA, and to find
further anticodons. This will certainly happen as our knowledge of the base sequence
of sRNA molecules develops. The absence of inosine from any anticodon is obviously
of special interest.

In conclusion it seems to me that the preliminary evidence seems rather favourable
to the theory. I shall not be surprised if it proves correct.

I thank my colleagues for many useful discussions and the following for sending me
material in advance of publication: Dr M. W. Nirenberg, Dr H. G. Khorana, Dr G.
Streisinger, Dr W. Holley, Dr J. Fresco, Dr H. G. Zachau, Dr C. Yanofsky, Dr H. G.
Wittmann, Dr H. Lehmann and Dr J. D. Watson.

REFERENCES

Bernfield, M. R. & Nirenberg, M. W. (1985). Science, 147, 479.

Diitting, D., Karan, W., Melchers, F. & Zachau, H. G. (1965). Biochim. biophys. Acta, 108,
194.

Eck, R. Y. (1963). Science, 140, 477.

Holley, R. W., Apgar, J., Everett, G. A., Madison, J. T., Marquisee, M., Merrill, S. H..
Penswick, J. R. & Zamir, A. (1965). Science, 147, 1462.

Holley, R. W., Apgar, J, Everett, G. A., Madison, J. T., Merrill, S. H. & Zamir, A. (1963).
Cold Spr. Harb. Symp. Quant. Biol. 28, 117. :

Ingram, V. M. & Sjéquist, J. A. (1963). Cold. Spr. Harb. Symp. Quant. Biol. 28, 133.
“WOBBLE HYPOTHESIS" 555

‘Kellogg, D. A., Doctor, B. P., Loebel, J. E. & Nirenberg, M. W. (1966). Proc. Nat. Acad.

x Sci., Wash. 55, 919.

‘Nirenberg, M., Leder, P., Bernfield, M., Brimacombe, R., Trupin, J., Rottman, F. &

r  O’Neal, C. (1965). Proc. Nat. Acad. Sci., Wash. 53, 1161.

S6, D., Jones, D.S., Ohtsuka, E., Faulkner, R. D., Lohrmann, R., Hayatsu, H., Khorana,

: H. G., Cherayil, J. D., Hampel, A. & Bock, R. M. (1966). J. Mol. Biol. 19, 556.

86H, D., Ohtsuka, E.. Jones, D. S., Lohrmann, R., Hayatsu, H., Nishimura, S. & Kkorana,
H. G. (1965). Proc. Nat. Acad. Sci., Wash. 54, 1378.