Proc. Natl. Acad. Sci. USA
Vol. 89, pp. 2883-2887, April 1992
Biochemistry

Hox-1.11 and Hox-4.9 homeobox genes
(Hox-4.3/Hox-4.2/homeobox nucleotide sequences)

ADIL NAZARALI, YONGSOK KIM, AND MARSHALL NIRENBERG*

Laboratory of Biochemical Genetics, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892

Contributed by Marshall Nirenberg, December 17, 1991

ABSTRACT Mouse Hox-1.11 and Hox-4.9 genes were cloned, and the nucleotide sequences of the homeobox regions were determined. In addition, nucleotide sequence analysis of the homeobox regions of cloned Hox-4.3 and Hox-4.2 genomic DNA revealed some differences in nucleotide sequences and in the deduced homeodomain amino acid sequences compared with the sequences that have been reported.

Homeobox genes code for proteins that bind to specific nucleotide sequences in DNA and either activate or inhibit the expression of the corresponding genes (for reviews, see refs. 1-5). Homeobox proteins are related to one another primarily in the sequence of the 60-amino acid residue DNA-binding-site portion of the protein, the homeodomain. The homeobox family of genes is large; more than 50 mouse homeobox genes or species of cDNA have been reported thus far, and additional homeobox genes undoubtedly will be found in the future. Many homeobox genes reside at neighboring sites in the chromosome in clusters of homeobox genes (5, 6). Whereas the Drosophila genome contains only one copy of the Antennapedia (Antp) and Ultrabithorax (Ubx) clusters of homeobox genes, mammalian genomes contain four copies of the combined Antp-Ubx cluster of homeobox genes, which presumably originated by successive duplications of an ancestral cluster of genes (7).
The amino acid sequences of the homeodomains encoded by genes that originated as copies of the same ancestral gene, which are located in different clusters of genes, are more closely related to one another than to the homeodomains encoded by other genes within the same cluster. Both the amino acid sequence of the homeodomain encoded by each gene and the order of the genes within the four mammalian Antp-Ubx clusters of genes have been highly conserved during evolution.

Why the organization of genes within each cluster has been maintained during evolution is not known, but several clues have been found. There is considerable overlap in the expression of many of the homeobox genes in the Antp-Ubx clusters of genes along the anterior-posterior axis of the embryo, but the anterior border of gene expression is successively displaced towards the posterior, starting with the second gene from the 3' end of the cluster and progressing toward the gene at the 5' end of the cluster (8, 9). Thus, different combinations of homeobox genes are expressed in different regions along the anterior-posterior axis of the embryo (10, 11). In addition, treatment of cultured human embryonal carcinoma cells with retinoic acid results in the gradual, sequential activation of many homeobox genes in each cluster over a period of days, starting with the gene at the 3' end of the cluster and proceeding towards the 5' end of the cluster (6). These results suggest that the order of homeobox genes within each cluster may be involved in determining the topographic position and/or the developmental time of initiation of expression of these homeobox genes in the embryo.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
In this report, the nucleotide sequences of the homeobox regions of Hox-1.11 and Hox-4.9 genes are described.†

METHODS AND MATERIALS

Clones of PCR-Amplified Mouse Genomic DNA. The homeobox regions of many mouse homeobox genes were amplified by PCR. Multiple species of oligodeoxynucleotides that correspond to highly conserved sequences in the homeoboxes of many mouse homeobox genes were synthesized with the aid of an Applied Biosystems DNA synthesizer model 380B and purified by OPC (Applied Biosystems) column chromatography. The (+)-oligodeoxynucleotide primers consisted of 64 species of oligodeoxynucleotides, each 28 nucleotide residues long, with a Sac I site near the 5' terminus; the (-)-oligonucleotide PCR primers consisted of 48 species of oligodeoxynucleotides, 28 nucleotide residues long, with an EcoRI site near the 5' terminus. (See Fig. 2 for the nucleotide sequences of the primers.) A programmable DNA thermal cycler (Perkin-Elmer/Cetus) was used for the amplification of DNA. A typical 25-µl reaction mixture contained 1 µg of BALB/c mouse liver genomic DNA; 50 mM KCl; 10 mM Tris-HCl (pH 8.3); 1.5 mM MgCl2; 0.01% gelatin; 15.6 nM of each species of (+)-oligodeoxynucleotide primer and 20.8 nM of each species of (-)-oligodeoxynucleotide primer; 1.0 mM each of dATP, dCTP, dGTP, and dTTP; and 2.5 units of Taq polymerase. Reaction mixtures were covered with 50 µl of mineral oil and were incubated for 35 PCR cycles; each cycle consisted of incubation for 1 min at 94°C, 2 min at 37°C, and 3 min at 65°C. After the last cycle, the reaction mixtures were incubated for an additional 10 min at 65°C. The DNA was precipitated with ethanol, incubated with EcoRI and Sac I, and subcloned in pBluescript II KS(+) (Stratagene).

RNA Probes. 32P-labeled (+)-RNA probes were prepared by using a modification of the Stratagene RNA transcription protocol.
A typical 10-µl reaction mixture contained 40 mM Tris-HCl (pH 8.0), 8 mM MgCl2, 2 mM spermidine, 50 mM NaCl, 10 mM dithiothreitol, 10 µM [α-32P]UTP (800 Ci/mmol; 1 Ci = 37 GBq), 0.5 mM ATP, 0.5 mM CTP, 0.5 mM GTP, 194 fmol of linear proteinase K-treated DNA, 10 units of RNase inhibitor, and 4 units of phage T7 RNA polymerase. Reaction mixtures were incubated at 37°C for 30 min; then RNA was precipitated with sodium acetate and ethanol.

Clones of Unamplified Homeobox Genomic DNA. A mouse genomic DNA library in λGEM-11 (Promega) was screened for some of the homeobox genes that had been found with PCR-amplified DNA. E. coli KW251 cells (2 x 10°) infected with 25,000 recombinant phages were plated on each 150-mm Petri dish. Phage DNA adsorbed to replica nytran filters (GeneScreenPlus, DuPont) was hybridized overnight at 60°C with 32P-labeled RNA (35 fmol/ml, 2 x 10° cpm/ml) synthesized from cloned PCR-amplified DNA. The hybridization buffer contained 1 M NaCl, 50 mM Tris-HCl (pH 7.6), 1% SDS, and 100 µg of yeast tRNA per ml. Filters were washed twice with 2x SSC (300 mM sodium chloride/30 mM sodium citrate, pH 7.0) at room temperature for 15 min (each wash), followed by two washes in 2x SSC/1% SDS at 60°C for 60 min (each wash) and finally by one wash in 0.1x SSC at 24°C for 30 min. Filters then were exposed to x-ray film in cassettes at -70°C. Recombinant phage with matching positive signals on autoradiograms of replica filters were cloned. DNA inserts were excised with Sac I and cleaved with various restriction enzymes; some DNA fragments were subcloned into pBluescript II SK(+).

DNA Sequencing. Both strands of cloned DNA fragments were sequenced manually by using Sequenase 2.0 (United States Biochemical) with universal phage M13 primers or specific primers by the dideoxynucleotide chain-termination method (12), and also by using an automated DNA sequencer (Applied Biosystems Model 373A) with Taq DNA polymerase at 70°C, dITP instead of dGTP, dideoxynucleotides or primers labeled with fluorescent dyes, double- or single-stranded DNA preparations, and other components according to the manufacturer's instructions.

*To whom reprint requests should be addressed at: Laboratory of Biochemical Genetics, National Heart, Lung, and Blood Institute, Building 36, Room 1C-06, 9000 Rockville Pike, Bethesda, MD 20892.
†The sequences for Hox-1.11, Hox-4.9, Hox-4.3, and Hox-4.2 have been deposited in the GenBank data base (accession nos. M87801-M87804, respectively).

[Fig. 1 graphic: diagram of the mouse Hox-1, Hox-2, Hox-3, and Hox-4 homeobox gene clusters and Hox-X, with related gene sets numbered 13 to 1 from left to right.]
Fig. 1. Mouse homeobox gene clusters. Genes that code for proteins with similar homeodomain amino acid sequences that are thought to be copies of the same ancestral gene are aligned vertically. The numbers 1 through 13 (from right to left) at the top of the figure represent the vertical sets of related genes in different clusters. Clones of homeobox genes described in this report are shown with shaded backgrounds. Homeobox nucleotide sequences shown in this report are indicated by boxes drawn with thick solid or dashed lines. The chromosomal location of Hox-X is uncertain. The boxes drawn with thin dotted lines indicate that no mouse homeobox sequence has been reported; the names of the human HOX genes (6, 13) are shown beneath these boxes. Some DNA clones described in this report also are shown beneath the appropriate box; the number of DNA clones found is enclosed within parentheses. DNA clones that begin with P are clones of PCR-amplified mouse genomic DNA; clones prefaced by λ are clones of mouse genomic DNA that were not amplified prior to cloning.

[Fig. 2 graphic: (+) and (-) primer sequences and the aligned nucleotide sequences of homeobox residues 43-162 for six PCR-derived clones and their closest relatives.]
Fig. 2. The nucleotide sequences of six homeobox DNA clones, obtained by PCR amplification of mouse genomic DNA, are shown and are compared with the sequences of the most closely related mouse homeobox genes. The numbers at the top correspond to homeobox nucleotide residues. The sequences of oligodeoxynucleotide primers for PCR amplification of DNA are shown at the top. The asterisks above the primers indicate that only one nucleotide residue is present at the position indicated; thus oligodeoxynucleotides will base-pair correctly if the codon sequence in DNA is complementary to that of the oligodeoxynucleotide but not if the DNA contains other synonym codons for the same amino acid. The symbol † indicates that (-)-oligodeoxynucleotide primers do not contain A at this position; hence, correct base pairs can form if the DNA contains five of the six arginine codons but not if the DNA contains the arginine codon CGT.

[Fig. 3 graphic: aligned deduced amino acid sequences for the same clones, with the (+) and (-) primer regions and α-helices 1-3 marked.]
Fig. 3. The amino acid sequences deduced from the nucleotide sequences of PCR-amplified cloned mouse genomic DNA (Fig. 2) are shown and are compared with the most closely related mouse homeodomain amino acid sequences. The percent homology between related amino acid sequences for the central region between the PCR primers (amino acid residues 24-47) and the percent homology between nucleotide sequences shown in Fig. 2 (nucleotide residues 69-141) are shown. The numbers at the top refer to homeodomain amino acid residues. The positions of α-helices 1-3 in the Antennapedia homeodomain (1) of Drosophila also are shown.

Percent homology values and reference numbers given in Figs. 2 and 3 (each group is headed by the PCR-derived clone):

Sequence         Amino acid (%)   Nucleotide (%)   Ref.
Hox-1.11 P8B     100              100
Hox-2.8          100              71               16
Hox-4.9 P125     100              100
Hox-1.6          83               71               24, 25
Hox-2.9          75               64               16, 17
Hox-2.9          75               63               18
Hox-X P30        100              100
Hox-2.7          100              92               28
Hox-1.5          92               95               24-26
Hox-4.1          83               93               27
Hox-4.3 P24      100              100
Hox-4.3          96               99               14
Hox-4.2 P167     100              100
Hox-4.2          96               99               15
Hox-2.9 P31      100              100
Hox-2.9          100              100              16, 17
Hox-2.9          96               96               18

RESULTS AND DISCUSSION

Clones of PCR-Amplified Homeobox DNA. Small DNA fragments that correspond to part of the homeobox of many mouse homeobox genes were amplified from mouse genomic DNA with the use of sets of primers, each set consisting of multiple species of oligodeoxynucleotides that correspond to conserved nucleotide sequences within the homeobox (nucleotide residues 43-68 and 142-162). The amplified DNA was subcloned, and the chain lengths of the DNA inserts from 93 clones were determined. The nucleotide sequences of some of the DNA inserts also were determined. Of 93 clones examined, 85 were identified as homeobox genes.
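The primer sets described above are degenerate mixtures: Methods gives 64 species for the (+) set and 48 species for the (-) set. As a rough illustration of how such counts arise, the Python sketch below expands a primer written in IUPAC ambiguity codes into its individual species. The example primer string and the function names are hypothetical and are not the primers of Fig. 2; only the species counts and per-species concentrations are taken from Methods.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_species(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    n = 1
    for code in primer.upper():
        n *= len(IUPAC[code])
    return n

def expand(primer):
    """List every individual species encoded by a degenerate primer."""
    choices = [IUPAC[code] for code in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]

# Hypothetical 18-mer with six two-fold degenerate positions (R = A/G, Y = C/T),
# chosen only because 2**6 = 64; it is NOT one of the primers in Fig. 2.
example_primer = "GARAARGARTTYCAYTTY"
species = expand(example_primer)
print(count_species(example_primer), len(set(species)))  # 64 64

# Total primer concentration implied by the per-species values in Methods.
plus_total_nM = 64 * 15.6    # (+) set: 64 species at 15.6 nM each
minus_total_nM = 48 * 20.8   # (-) set: 48 species at 20.8 nM each
print(round(plus_total_nM, 1), round(minus_total_nM, 1))  # 998.4 998.4
```

Under these figures each primer set totals roughly 1 µM. A count of 48 would be consistent with, for example, four two-fold positions and one three-fold position (2^4 x 3), but the actual primer design is given only in Fig. 2.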
Two clones, P8 and P91, which have different DNA inserts, each consist of two homeobox DNA fragments amplified from separate genes that were joined by ligation and cloned. The DNA clones found correspond to 13 mouse homeobox genes as shown in Fig. 1.

The nucleotide sequences of six amplified homeobox genomic DNA clones are shown in Fig. 2 and are compared with the sequences of the most closely related mouse homeobox genes. The percent homology also is shown between the central region of each cloned DNA insert without the primer sequences (homeobox residues 69-141) and the corresponding sequences of the most closely related mouse homeobox genes. The percent homology was calculated for only the central region of the cloned DNA between the primers because oligodeoxynucleotides that hybridize to DNA with some incorrectly paired bases can serve as PCR primers.

Clone Hox-1.11 P8B corresponds to a novel mouse homeobox gene of the Drosophila proboscipedia homeobox class. The nucleotide sequence of clone Hox-1.11 P8B is most closely related to that of the mouse Hox-2.8 gene, but only 71% of the nucleotide residues compared are identical. Clone Hox-4.9 P125 is a member of the labial class of homeobox genes, probably the first gene in the Hox-4 cluster of homeobox genes (see Fig. 1). The amino acid sequence of the Hox-4.9 homeodomain was reported recently (11), but the nucleotide sequence has not been described. The nucleotide sequence of Hox-4.9 P125 differs considerably from the sequences of other labial class mouse homeobox genes, Hox-1.6 (71% homology) and Hox-2.9 (63-64% homology). Clone Hox-X P30 may correspond to an unreported murine homeobox gene, probably the third gene in the Hox-3 cluster of homeobox genes (Fig. 1). The nucleotide sequence of the Hox-X P30 homeobox clone is most closely related to that of Hox-1.5 (95% homology).
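The percent homology values quoted here are percent identities over the central region between the primers (homeobox residues 69-141, 73 residues in all). A minimal sketch of that calculation follows; the function names and the synthetic test sequences are illustrative assumptions, not sequences from this report.

```python
def central_region(homeobox, start=69, end=141):
    """Return homeobox residues start..end (1-based, inclusive)."""
    return homeobox[start - 1:end]

def percent_identity(seq_a, seq_b):
    """Percent of aligned positions that are identical (no gaps allowed)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Synthetic 180-residue "homeobox" and a copy with one substitution at
# residue 100, which lies inside the 69-141 window (placeholder data only).
reference = "ACGT" * 45
variant = reference[:99] + "A" + reference[100:]

region_ref = central_region(reference)
region_var = central_region(variant)
pct = percent_identity(region_ref, region_var)
print(len(region_ref), f"{pct:.1f}")  # 73 98.6
```

With 73 residues compared, a single difference gives 72/73, which rounds to 99%, and six differences give 67/73, which rounds to 92%.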
The nucleotide sequence of clone Hox-4.3 P24 differs from that of Hox-4.3 (14) by only 1 of the 73 residues in the central region between the (+)- and (-)-oligonucleotide homeobox primers, which suggests that clone Hox-4.3 P24 corresponds to the Hox-4.3 gene. The nucleotide sequence of clone Hox-4.2 P167 differs from the sequence reported for Hox-4.2 (15) by only 1 nucleotide residue.

Of the 85 homeobox clones obtained, 69 were found to be only 78 nucleotide residues long because of the presence of a Sac I site within the homeobox. Sequence analysis of 9 of the 69 clones revealed one kind of DNA insert with Sac I sites at both the 5' and 3' termini. The nucleotide sequence of a representative clone, Hox-2.9 P31, shown in Fig. 2, is identical to the sequence of the Hox-2.9 gene reported by Rubock et al. (16) and Frohman et al. (17) and differs by only 2 nucleotide residues from the Hox-2.9 sequence reported by Murphy and Hill (18).

The amino acid sequences deduced from the nucleotide sequences of PCR-amplified and cloned DNA are shown in Fig. 3 and are compared with the most closely related mouse homeodomain amino acid sequences. The amino acid sequence of clone Hox-1.11 P8B is the same as that of the Hox-2.8 gene, but the nucleotide sequences differ markedly (71% homology). Clone Hox-4.9 P125 is a labial class homeobox gene with an amino acid sequence identical to the sequence recently reported for the Hox-4.9 gene (11). Hox-4.9 P125 differs from the other labial class homeobox genes, Hox-1.6 and Hox-2.9, in both nucleotide and amino acid sequences. The amino acid sequence of the homeodomain of clone Hox-4.3 P24 (residues 24-47) differs from that of the Hox-4.3 gene by 1 amino acid residue. Since the corresponding nucleotide sequences differ by only 1 residue, Hox-4.3 P24 DNA probably corresponds to the Hox-4.3 gene. Similarly, the nucleotide and deduced amino acid sequences of clone Hox-4.2 P167 differ from those of the Hox-4.2 gene by only 1 nucleotide residue and 1 amino acid residue, which suggests that clone Hox-4.2 P167 corresponds to Hox-4.2. The nucleotide and amino acid sequences of clone Hox-2.9 P31 are the same as those of the Hox-2.9 gene (16, 17). Clones of PCR-amplified DNA also were obtained and sequenced that correspond to Hox-1.1, Hox-1.2, Hox-1.3, Hox-1.6, Hox-2.1, Hox-3.4, and Hox-4.4 (data not shown).

Homeobox Genomic DNA Clones in λGEM-11. A mouse genomic DNA library in λGEM-11 with 15-kilobase (kb) DNA inserts (average size) that were not amplified prior to cloning was screened for homeobox genes with a mixture of 32P-labeled RNA probes synthesized from PCR-amplified, cloned DNA that correspond to Hox-1.11, Hox-4.9, Hox-4.3, Hox-4.2, Hox-3.4, Hox-2.9, Hox-1.2, and Hox-1.1. Two million recombinants were screened, and 29 clones of homeobox genomic DNA were obtained. Restriction site analysis revealed seven kinds of DNA inserts that were shown by nucleotide sequence analysis to correspond to seven homeobox genes (Hox-1.11, Hox-4.9, Hox-4.4, Hox-4.3, Hox-4.2, Hox-3.4, and Hox-1.1).

Hox-1.11. Two genomic DNA clones, λ16 and λ33, were found that correspond to Hox-1.11. The nucleotide sequence and deduced amino acid sequence of the Hox-1.11 λ33 homeobox and flanking regions are shown in Fig. 4 and are compared with homeobox sequences of the most closely related homeobox gene, Hox-2.8. The nucleotide sequence of Hox-1.11 λ33 DNA, which was not amplified prior to cloning, was identical to that found with Hox-1.11 P8B cloned from PCR-amplified mouse genomic DNA (nucleotide residues 69-141) shown in Fig. 2. Although only 74% of the Hox-1.11 λ33 and Hox-2.8 homeobox nucleotide residues are the same, the amino acid sequences of the homeodomains of Hox-1.11 and Hox-2.8 are identical. However, 9 of the 11 Hox-1.11 λ33 deduced amino acid residues that precede the homeodomain and 1 amino acid residue after the homeodomain differ from those of Hox-2.8. These results show that Hox-1.11 and Hox-2.8 are separate genes. The intron-exon junction shown in Fig. 4 at nucleotide residue -36 was identified by comparing the nucleotide sequences of Hox-1.11 genomic DNA and cDNA, which will be described elsewhere (D. Tan, J. Ferrante, A.N., C. Kozak, V. Guo, and M.N., unpublished data); elsewhere we also show that the Hox-1.11 gene resides in mouse chromosome 6, which suggests that the Hox-1.11 gene is a member of the Hox-1 cluster of genes.

The amino acid sequences of the Hox-1.11 and Hox-2.8 homeodomains are identical, which suggests that both species of homeobox proteins may bind to the same or similar nucleotide sequences in DNA. Another pair of homeobox proteins, Hox-1.3 (19, 20) and Hox-2.1 (21-23), also has identical homeodomains.

[Fig. 4 graphic: aligned nucleotide and deduced amino acid sequences of Hox-1.11 λ33 and Hox-2.8.]
Fig. 4. The nucleotide sequence and deduced amino acid sequence of the Hox-1.11 (clone λ33) homeobox and flanking regions are shown and are compared with the sequences of the most closely related mouse homeobox gene, Hox-2.8 (16). Dashes in the nucleotide sequence or amino acid sequence represent a Hox-2.8 nucleotide or amino acid residue that is identical to the corresponding nucleotide or residue shown for Hox-1.11. The homeobox sequence is enclosed within a box. The arrowhead represents an intron-exon junction.

[Fig. 5 graphic: nucleotide and deduced amino acid sequences of Hox-4.9 λ41 aligned with the reported Hox-4.9 amino acid sequence.]
Fig. 5. The nucleotide sequence and deduced amino acid sequence of the homeobox region of the Hox-4.9 gene (clone λ41) are shown. The amino acid sequence of Hox-4.9 λ41 is compared to the recently reported (11) Hox-4.9 amino acid sequence. The dashes represent Hox-4.9 amino acid residues that are identical to those of Hox-4.9 λ41. The homeobox region is enclosed within a box.

Hox-4.9 λ41. The nucleotide sequence and deduced amino acid sequence of the homeobox and flanking regions of clone Hox-4.9 λ41 mouse genomic DNA are shown in Fig. 5.
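A deduced amino acid sequence is obtained by reading the nucleotide sequence in frame, three residues at a time, with the standard genetic code. The Python sketch below shows one way to do this. The 36-nucleotide example is transcribed from the start of the Hox-4.9 P125 row of Fig. 2 (a stretch that lies within the primer-derived region) and should be treated as illustrative; the function name `translate` is an assumption made for this example.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # TTT ... TGG
    "LLLLPPPPHHQQRRRR"   # CTT ... CGG
    "IIIMTTTTNNKKSSRR"   # ATT ... AGG
    "VVVVAAAADDEEGGGG"   # GTT ... GGG
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    dna = dna.upper().replace(" ", "")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

# First 12 codons of the Hox-4.9 P125 homeobox sequence as read from Fig. 2.
example = "GAG CTC GAG AAG GAG TTT CAT TTT AAT AAG TAC CTA"
print(translate(example))  # ELEKEFHFNKYL
```

Because several codons specify the same amino acid, two homeoboxes can differ at many nucleotide positions and still encode identical homeodomains, as reported here for Hox-1.11 and Hox-2.8.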
The nucleotide sequence of Hox-4.9 has not been reported previously; however, the deduced amino acid sequence (λ41) is the same as the recently reported Hox-4.9 homeodomain amino acid sequence (11), which suggests that λ41 DNA is a Hox-4.9 genomic DNA clone. The nucleotide sequence of Hox-4.9 λ41 mouse genomic DNA (cloned from DNA that was not amplified) is identical to that found with PCR-amplified mouse genomic DNA (clone Hox-4.9 P125 homeobox nucleotide residues 69-141 shown in Fig. 2).

Hox-4.3 λ40. The nucleotide sequence and deduced amino acid sequence of the homeobox and surrounding regions of Hox-4.3 λ40 DNA are shown in Fig. 6 and are compared with the nucleotide and amino acid sequences of the most closely related homeobox gene, Hox-4.3 (14). The nucleotide sequence of the λ40 mouse genomic DNA fragment, cloned from DNA that was not amplified, was found to be identical to that of Hox-4.3 P24, derived from PCR-amplified mouse genomic DNA shown in Fig. 2 (homeobox nucleotide residues 69-141). Only 4 of the 209 λ40 nucleotide residues compared differ from those reported for Hox-4.3 (14); however, three of the homeodomain amino acid residues differ from those reported for Hox-4.3. The high nucleotide sequence homology between λ40 and Hox-4.3 suggests that clone λ40 DNA corresponds to the Hox-4.3 gene. However, the possibility that clone λ40 DNA corresponds to a novel homeobox gene, Hox-1.12, the eighth gene in the Hox-1 cluster of homeobox genes shown in Fig. 1, is not ruled out.

[Fig. 6 graphic: aligned nucleotide and deduced amino acid sequences of Hox-4.3 λ40 and the reported Hox-4.3 sequence.]
Fig. 6. The nucleotide sequence and deduced amino acid sequence of the homeobox and surrounding regions of Hox-4.3 λ40 mouse genomic DNA are shown and are compared with the Hox-4.3 nucleotide and amino acid sequences reported (14). Only nucleotide and amino acid residues of Hox-4.3 that differ from those of clone λ40 are shown; residues that are the same are indicated by dashes. The homeobox region is enclosed within a large box. Nucleotide and amino acid residues that differ are enclosed within small boxes. The arrowhead represents an intron-exon junction reported for Hox-4.3 (14).

[Fig. 7 graphic: aligned homeodomain amino acid sequences (residues 1-60) of Hox-1.11 λ33, Hox-4.9 λ41, Hox-X P30, Hox-4.3 λ40, and Hox-4.2 λ6 and their closest mouse or human relatives, with percent amino acid and nucleotide homology.]
Fig. 7. The amino acid sequences of the homeodomains encoded by five mouse homeobox genes deduced from the nucleotide sequences of cloned DNA are shown and are compared with the amino acid sequences of the most closely related mouse or human homeodomains. The percent homology between the amino acid sequences of related homeodomains and between the corresponding homeobox nucleotide sequences also are shown. Only differences in amino acid sequence are shown. Dashes represent identical amino acid residues.

A summary of results is shown in Fig. 7.
Homeodomain amino acid sequences deduced from the nucleotide sequences of cloned mouse genomic DNA are shown and are compared with the most closely related sequences reported for mouse or human homeobox proteins. The amino acid and nucleotide sequence homologies also are shown. The amino acid sequence of the Hox-1.11 homeodomain is identical to that of Hox-2.8; however, 9 of the 11 amino acid residues before the homeodomain and 1 amino acid residue after the homeodomain differ from those of Hox-2.8. Many differences also were observed in the nucleotide sequences of the Hox-1.11 and Hox-2.8 homeobox regions. The mouse Hox-1.11 homeodomain is the equivalent of the recently reported human HOX-1K homeodomain (6). The amino acid sequence of the Hox-4.9 λ41 homeodomain is identical to the recently reported Hox-4.9 homeodomain amino acid sequence (11). The mouse Hox-4.9 homeodomain is the equivalent of the human HOX-4G homeodomain (13).

Six of the 73 Hox-X P30 nucleotide residues differ from the corresponding sequence of the most closely related homeobox gene, Hox-2.7; however, the deduced amino acid sequence of Hox-X P30 is the same as that of Hox-2.7.

The cumulative error due to misincorporation of bases during DNA amplification was estimated by comparing the DNA sequences of 10 clones of mouse genomic DNA subjected to 35 cycles of DNA amplification (730 nucleotide residues compared) with mouse genomic DNA sequences that were not amplified prior to cloning, which correspond to Hox-1.11, -4.9, -1.1, -3.4, -4.2, -4.3, and -4.4. No DNA amplification errors were detected. Comparison of the nucleotide sequences of 13 additional clones of PCR-amplified DNA that correspond to Hox-1.2, -1.3, -1.6, -2.1, and -2.9 with published nucleotide sequences revealed only 2 residues that differ (906 nucleotide residues compared).
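The two comparisons above can be pooled into an upper bound on the PCR error rate with a few lines of arithmetic. The sketch below does this in Python and then asks, purely as an illustration, how probable 6 or more errors in a 73-residue stretch would be at that rate. The binomial model (independent errors at a uniform rate) and the variable names are simplifying assumptions introduced for this example, not part of the analysis in the text.

```python
from fractions import Fraction
from math import comb

residues_vs_unamplified = 730   # 10 clones, 0 differences found
residues_vs_published = 906     # 13 clones, 2 differences found
differences = 0 + 2

total_residues = residues_vs_unamplified + residues_vs_published
error_rate = Fraction(differences, total_residues)
print(total_residues, error_rate)  # 1636 1/818

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly with fractions."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative only: chance of 6 or more misincorporations in 73 residues
# if errors were independent and occurred at the upper-bound rate.
p_six = float(prob_at_least(6, 73, error_rate))
print(f"{p_six:.1e}")
```

Under these assumptions the probability is well below one in a million, which is why six differences in 73 residues are hard to attribute to amplification error alone.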
Hence, no more than 2 misincorporated nucleotide residues were found per 1636 residues sequenced in DNA molecules that were cloned after 35 cycles of DNA amplification; that is, the error due to PCR amplification of DNA is no more than 1 residue per 818 nucleotide residues of cloned DNA. Thus, we think it more likely that Hox-X P30 DNA corresponds to a homeobox gene that has not been reported previously, such as the third gene of the Hox-3 cluster of homeobox genes shown in Fig. 1, rather than a DNA clone with an erroneous sequence due to misincorporation of 6 nucleotide residues during DNA amplification. However, the nucleotide sequence of a Hox-X mouse genomic DNA clone that was not amplified prior to cloning is needed to confirm the nucleotide sequence of Hox-X P30.

Also, as shown in Fig. 7, the deduced amino acid sequences of the Hox-4.3 λ40 and Hox-4.2 λ6 homeodomains were found to differ from the reported sequences (14, 15) by 3 and 1 amino acid residues, respectively.

1. Gehring, W. J., Müller, M., Affolter, M., Percival-Smith, A., Billeter, M., Qian, Y. Q., Otting, G. & Wüthrich, K. (1990) Trends Genet. 6, 323-329.
2. Dessain, S. & McGinnis, W. (1991) Curr. Opin. Genet. Dev. 1, 275-282.
3. Laughon, A. (1991) Biochemistry 30, 11357-11367.
4. Scott, M. P., Tamkun, J. W. & Hartzell, G. W., III (1989) Biochim. Biophys. Acta Rev. Cancer 989, 25-48.
5. Kessel, M. & Gruss, P. (1990) Science 249, 374-379.
6. Simeone, A., Acampora, D., Nigro, V., Faiella, A., D'Esposito, M., Stornaiuolo, A., Mavilio, F. & Boncinelli, E. (1991) Mech. Dev. 33, 215-228.
7. Kappen, C., Schughart, K. & Ruddle, F. H. (1989) Proc. Natl. Acad. Sci. USA 86, 5459-5463.
8. Graham, A., Papalopulu, N. & Krumlauf, R. (1989) Cell 57, 367-378.
9. Duboule, D. & Dolle, P. (1989) EMBO J. 8, 1497-1505.
10. Lewis, E. B. (1978) Nature (London) 276, 565-570.
11. Hunt, P., Gulisano, M., Cook, M., Sham, M.-H., Faiella, A., Wilkinson, D., Boncinelli, E. & Krumlauf, R.
(1991) Nature (London) 353, 861-864.
12. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
13. Acampora, D., D'Esposito, M., Faiella, A., Pannese, M., Migliaccio, E., Morelli, F., Stornaiuolo, A., Nigro, V., Simeone, A. & Boncinelli, E. (1989) Nucleic Acids Res. 17, 10385-10402.
14. Izpisua-Belmonte, J. C., Dolle, P., Renucci, A., Zappavigna, V., Falkenstein, H. & Duboule, D. (1990) Development 110, 733-745.
15. Featherstone, M. S., Baron, A., Gaunt, S. J., Mattei, M.-G. & Duboule, D. (1988) Proc. Natl. Acad. Sci. USA 85, 4760-4764.
16. Rubock, M. J., Larin, Z., Cook, M., Papalopulu, N., Krumlauf, R. & Lehrach, H. (1990) Proc. Natl. Acad. Sci. USA 87, 4751-4755.
17. Frohman, M. A., Boyle, M. & Martin, G. R. (1990) Development 110, 589-607.
18. Murphy, P. & Hill, R. E. (1991) Development 111, 61-74.
19. Odenwald, W. F., Taylor, C. F., Palmer-Hill, F. J., Friedrich, V., Jr., Tani, M. & Lazzarini, R. A. (1987) Genes Dev. 1, 482-496.
20. Fibi, M., Zink, B., Kessel, M., Colberg-Poley, A. M., Labeit, S., Lehrach, H. & Gruss, P. (1988) Development 102, 349-359.
21. Hauser, C. A., Joyner, A. L., Klein, R. D., Learned, T. K., Martin, G. R. & Tjian, R. (1985) Cell 43, 19-28.
22. Jackson, I. J., Schofield, P. & Hogan, B. (1985) Nature (London) 317, 745-748.
23. Krumlauf, R., Holland, P. W. H., McVey, J. H. & Hogan, B. L. M. (1987) Development 99, 603-617.
24. Baron, A., Featherstone, M. S., Hill, R. E., Hall, A., Galliot, B. & Duboule, D. (1987) EMBO J. 6, 2977-2986.
25. LaRosa, G. J. & Gudas, L. J. (1988) Mol. Cell. Biol. 8, 3906-3917.
26. McGinnis, W., Hart, C. P., Gehring, W. J. & Ruddle, F. H. (1984) Cell 38, 675-680.
27. Lonai, P., Arman, E., Czosnek, H., Ruddle, F. H. & Blatt, C. (1987) DNA 6, 409-418.
28. Graham, A., Papalopulu, N., Lorimer, J., McVey, J. H., Tuddenham, E. G. D. & Krumlauf, R. (1988) Genes Dev. 2, 1424-1438.