Molecular Characterization, cDNA Cloning and Phylogenetic Analysis of Cecropin Gene Isolated from Bacterial-challenged Cotton Leafworm, Spodoptera littoralis

Experiments were designed and conducted to isolate and characterize the full-length cDNA of the cecropin gene from the cotton leafworm, Spodoptera littoralis. In addition, sequence and phylogenetic analyses of this gene were performed. Using the differential display technique, a typical cationic insect cecropin gene (SpliCec) was isolated from bacterial-challenged S. littoralis haemolymph. Five differentially displayed bands were sequenced. Based on sequence analyses of the data, specific primers for the full length of cecropin were designed and successfully amplified a 388 bp amplicon. The integration of the results revealed that the 388 bp PCR product has one open reading frame (orf) 186 bp long, including both start codon (AUG) and stop codon (UGA). The deduced amino acid sequence of SpliCec indicated that the full-length prepropeptide consists of a 22-residue signal peptide, a dipeptide prosequence and a 38-residue mature peptide. The SpliCec sequence showed significant similarity with many insect cecropin sequences, especially those of M. sexta and A. convolvuli.


INTRODUCTION
Recently, a large number of antimicrobial peptides (AMPs) have been identified and characterized from a variety of insects, including the Lepidoptera, Hymenoptera, Diptera, and Coleoptera. It has been generally accepted that insect AMPs suppress bacterial infections, and have minimal toxic and allergic side effects in host cells (Bulet et al., 1999). These AMPs are classified on the basis of their amino acid sequences and secondary structures into cecropins, defensins, and peptides with an overrepresentation of proline and/or glycine residues, e.g., lebocins and moricins (Bulet et al., 1999). Cecropin was initially isolated from bacterially challenged Hyalophora cecropia pupae (Hultmark et al., 1980), from which the term cecropin was derived. Cecropins constitute a main part of the cell-free immunity of insects. They have been given various names, such as bactericidin and sarcotoxin. All of these peptides are structurally related (Boman et al., 1991). The active mature cecropins lack cysteine residues, are typically 35 to 39 amino acids in length, and form two amphipathic α-helices connected by a hinge region (Saito et al., 2005). They show broad-spectrum activity against bacteria (Vizioli et al., 2000), as well as certain fungi (Andra et al., 2001) and metazoan parasites (DeLucca et al., 1997). They lyse bacterial cell membranes; they also inhibit proline uptake and cause leaky membranes (Wade et al., 1992).
This study reports on the cDNA cloning and characterization of a cationic cecropin gene encoding an antimicrobial peptide, SpliCec, from the haemolymph of the Egyptian cotton leafworm, S. littoralis. In addition, sequence and phylogenetic analyses of the SpliCec gene are reported.

Insects and bacterial strains
The laboratory colony of the cotton leafworm, S. littoralis, used for our experiments was originally collected from an okra field at Giza, Egypt in 1995 and maintained in the insectary of the Department of Entomology, Faculty of Science, Cairo University according to the technique described by Levinson and Navon (1969). It was kept at 25 °C, 65-70% RH and a 14L:10D photoperiod cycle.
Three gram (+) bacteria: Staphylococcus aureus, Bacillus subtilis and Streptococcus sanguinis, and three gram (-) bacterial strains: Escherichia coli (D31), Proteus vulgaris and Klebsiella pneumoniae, were obtained from the Unit for Genetic Engineering and Agricultural Biotechnology, Faculty of Agriculture, Ain Shams University and used for insect immunization. Bacteria were grown in a peptone medium (1%), supplemented with 1% meat extract and 0.5% NaCl, at 37 °C in a rotary shaker.

Bacterial challenge and haemolymph collection
Bacterial challenge was performed as described by Seufi et al. (2011). Haemolymph was collected at 24, 48 and 72 h post-infection (p.i.) at 4 °C (500 µl each), with a few crystals of phenylthiourea added to prevent melanization. Aliquots of 100 μl each were stored at -80 °C until investigated. The control group was injected with bacteria-free saline solution.

Differential display using primers corresponding to cecropin-like sequences (DD-PCR)
Total RNA of the insect haemolymph (300-500 µl) was extracted using the RNeasy kit according to the manufacturer's instructions (Qiagen, Germany). Residual genomic DNA was removed from the RNA using RNase-free DNase (Ambion, Germany). RNA integrity and purity were verified by examining the 260/280 and 260/230 absorbance ratios for protein and solvent contamination. The reverse transcription reaction was carried out according to the ABgene protocol (ABgene, Germany). The cDNA was aliquoted and stored at -80 °C until processed.
A total reaction volume of 25 μl containing 2.5 μl PCR buffer, 1.5 mM MgCl2, 200 μM dNTPs, 1 U Taq DNA polymerase (AmpliTaq, Perkin-Elmer), 2.5 μl of 10 pmol primer (Table 1) and 2.5 μl of each cDNA was cycled in a DNA thermal cycler (Eppendorf, Mastercycler 384, Germany). The amplification program was one cycle at 94 °C for 5 min (hot start), followed by 40 cycles at 94 °C for 1 min, 40 °C for 1 min and 72 °C for 1 min. The reaction was then incubated at 72 °C for 10 min for final extension. The PCR product was visualized on a 2% agarose gel and photographed using a gel documentation system. For DNA contamination assessment, a no-reverse transcription control reaction was performed.
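The cycling program above can be written down as a small data structure, which makes the total programmed hold time easy to check. This is only a sketch: the variable and function names are illustrative, and ramping time between temperatures (which depends on the instrument) is deliberately ignored.

```python
# PCR program as described in the text: (temperature in deg C, duration in min).
INITIAL_DENATURATION = (94, 5)              # one cycle, hot start
CYCLE_STEPS = [(94, 1), (40, 1), (72, 1)]   # denaturation, annealing, extension
N_CYCLES = 40
FINAL_EXTENSION = (72, 10)


def total_hold_minutes():
    """Sum the programmed hold times only; ramp times are not included."""
    per_cycle = sum(minutes for _, minutes in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


if __name__ == "__main__":
    print(total_hold_minutes())  # 135
```

Under these assumptions the program holds for 135 min in total, of which 120 min are the 40 amplification cycles.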

Full length cDNA cloning and sequence analysis
Based on the sequence and alignment data, specific primers for cecropin-related sequences were designed and tried for reverse transcription polymerase chain reaction (RT-PCR). The RT-PCR reaction was performed as previously described in this section, using the optimum annealing temperature (Ta) for each specific primer set. Positive PCR products were visualized and eluted from the gel using GenClean Kit (Invitrogen Corporation, San Diego, CA, USA) following the manufacturer's instructions. The purified PCR products were cloned into PCR-TOPO vector with TOPO TA cloning kit (Invitrogen, USA) following the manufacturer's instructions. The ligation mix was used to transform competent E. coli strain TOPO 10 provided with the cloning kit. White colonies were screened using PCR as described earlier in this section. Two positive clones of the SpliCec fragment were selected and sequenced (to exclude PCR errors) using the specific forward and reverse primers. In addition to the above-mentioned analyses, the ExPASy Proteomics Server (http://expasy.org/tools) was used to calculate physico-chemical parameters of the translated peptide (ProtParam tool). Furthermore, primary and secondary structure analyses, post-translational modifications and topology predictions were investigated using SignalP, NetGlycate, NetPhos, NetPhosK, NetSurfP, SUSI and TMpred tools.

Phylogenetic analyses
Phylogenetic analyses of the nucleotide sequence and its deduced amino acids (a.a.) were done using Mega4 software. Poorly aligned positions and divergent sequences were eliminated manually. Multiple alignment of 57 published cecropin-related nucleotide sequences was done before phylogenetic analyses to approximate sequence lengths manually. Sequences of the same species that were 100% homologous but had different accession numbers were represented by only one sequence. The cloned DNA fragment was deposited in GenBank under accession number JQ408983.

Differential display using primers corresponding to well-known cecropins
The differential display technique was used to characterize the transcript variation between bacterial-challenged and control cotton leafworm, S. littoralis. Fig. (1A) shows the results of differentially displayed cDNAs of bacterial-challenged and control insects using 8 primers corresponding to previously characterized cecropins (Table 1). Haemolymph samples were differentially displayed at 24, 48 and/or 72 h p.i. with S. aureus, S. sanguinis, E. coli, P. vulgaris and K. pneumoniae bacterial strains. It was observed that S. aureus-challenged insects died 24 h p.i., E. coli-challenged insects died 48 h p.i. and S. sanguinis-challenged insects died 72 h p.i. All insects died before sampling in the case of P. vulgaris and K. pneumoniae.
Alaa Eddeen M. Seufi

Differential display results revealed that the average number of bands per sample was 3.4 bands for each amplification reaction. The total number of bands (transcripts) resolved in 1.5% agarose gel for both control and challenged insects was 124 (molecular size ranged from >1300 to ~80 bp). Thirty-two polymorphic bands (34.2%) were differentially displayed with 5 of the used primers. Five reproducible, infection-induced bands were cloned and sequenced using the M13 universal primer. Analyses of the results revealed that a cDNA fragment of 388 bp was amplified containing the open reading frame (orf) of a cecropin gene.

Primer design, RT-PCR amplification and cloning of cecropin gene
Specific primers for the full-length cecropin gene were designed. These primers were used in the subsequent reactions of this study. Nucleotide sequences of the primers used are listed in Table (1). PCR was optimized for each primer set and the primers successfully produced positive PCR amplicons of 186 bp for the full-length cecropin sequence (Fig. 1B). The full-length fragment includes one open reading frame (orf) of the cecropin gene (positions: 34 (AUG) - 222 (UAA)). Subsequently this segment (SpliCec) was cloned into PCR-TOPO vector (Fig. 1B, lane 2) and transformed cells were tested with PCR using the same primers (Fig. 1B, lane 4). Using this screening method, clone PCR-TOPOSpliCec tested positive (Fig. 1B, lane 4).

Nucleotide sequence and sequence analyses
Nucleotide sequences of the SpliCec and its deduced amino acid sequence are shown in Fig. (2). A single orf that could encode a polypeptide of 62 amino acids was detected for SpliCec. One stop codon was found at the 3′ end of the sequence. The flanking region of the initiation codon ATG is ACAATGAAC, and the length of the 3′ untranslated region was 166 bp before the poly(A) tract (Fig. 2). Two putative polyadenylation sequences (AATAAA) were located 78 and 146 bp downstream from the stop codon (Fig. 2). The identified cecropin orf includes a signal peptide (66 bp), a propeptide (6 bp) and a mature peptide (114 bp). The deduced SpliCec polypeptide (prepropeptide) contains 9 strongly basic, 3 strongly acidic, 38 hydrophobic and 12 polar amino acids. The calculated molecular masses of the full-length and mature cecropins were 6.58 and 3.99 kDa, respectively. The calculated isoelectric points (pIs) were 11.36 and 10.93, respectively, and the net charges at pH 7.0 were 6.0 and 4.0, respectively. The cecropin prepropeptide was classified as an unstable protein (Instability Index (II): 41.72) and its mature peptide was classified as a stable protein (II: 32.61). Ratios of hydrophilic residues were 19 and 26% for the prepropeptide and mature cecropin peptides, respectively. The percentage identity ranged from 90% for M. sexta to 68% for A. pernyi. Interestingly, the SpliCec putative peptide showed significant alignment with more than 82 published insect peptides (64 cecropins, 5 cecropin-like, 2 bactericidins, a hinnavin II, a hyphancin-3E, a papiliocin, a defense protein and 7 enbocin peptides). The percentage identity ranged from 85% for the bactericidin of M. sexta (Acc# AAA29306) to 33% for CecB of D. melanogaster (Acc# BAA28722).
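The reported segment lengths can be cross-checked with simple arithmetic. The sketch below assumes that the 186 bp orf figure excludes the stop codon, that the 388 bp fragment excludes the poly(A) tract, and that the 5′ untranslated region is 33 bp (implied by the AUG at position 34); all names in the code are illustrative.

```python
# Lengths in bp, taken from the text (see assumptions in the lead-in).
FIVE_PRIME_UTR = 33     # orf starts at position 34
ORF_CODING = 186        # coding part of the orf, stop codon excluded (assumption)
STOP_CODON = 3
THREE_PRIME_UTR = 166   # up to the poly(A) tract
ORF_PARTS_BP = {"signal": 66, "propeptide": 6, "mature": 114}

# Residue classes reported for the 62-residue prepropeptide.
RESIDUE_CLASSES = {"basic": 9, "acidic": 3, "hydrophobic": 38, "polar": 12}


def residues(bp):
    """Convert a coding length in bp to a residue count; must be whole codons."""
    if bp % 3 != 0:
        raise ValueError(f"{bp} bp is not a whole number of codons")
    return bp // 3


cdna_length = FIVE_PRIME_UTR + ORF_CODING + STOP_CODON + THREE_PRIME_UTR
stop_codon_end = FIVE_PRIME_UTR + ORF_CODING + STOP_CODON
peptide_lengths = {name: residues(bp) for name, bp in ORF_PARTS_BP.items()}

if __name__ == "__main__":
    print(cdna_length)              # 388
    print(stop_codon_end)           # 222
    print(peptide_lengths)          # {'signal': 22, 'propeptide': 2, 'mature': 38}
    print(sum(RESIDUE_CLASSES.values()))  # 62
```

Under these assumptions the figures agree with one another: 33 + 186 + 3 + 166 = 388 bp, the stop codon ends at position 222, the three orf segments give 22 + 2 + 38 = 62 residues, and 9 basic minus 3 acidic residues matches the reported net charge of about +6 for the prepropeptide.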
SignalP analysis showed that the cleavage site for the potential signal peptide of the SpliCec was predicted between 22-Ala and 23-Ala. Further, a cleavage site between 24-Pro and 25-Arg was also predicted by the alignment of the amino acid sequence of the SpliCec with that of 22 lepidopteran cecropins. Thus in the SpliCec sequence, the signal peptide makes up the first 22 residues, leaving a dipeptide Ala-Pro (AP) before the start of the 38-residue mature cecropin peptide. In addition to the precise conservation of proline (P) at the N-terminal domain, the aromatic tryptophan residue (W) at position 2, the lysine residues (K) at positions 5 and 9, the arginine residue (R) at position 13 and the glycine-lysine residues (GK) at the C-terminal domain were also observed. The mature cecropin consists of two amphipathic α-helices interrupted by a hinge region composed of an Ala-Pro dipeptide (AP). These motifs were found to be conserved at equivalent positions in the sequences of some other insect cecropins (Fig. 2).
Primary and secondary structure analyses, post-translational modifications and topology predictions revealed that the amino acid sequence of the putative SpliCec peptide has two potential glycated lysine residues (positions: 29 and 62) and two kinase-specific phosphorylation sites (Ser: 1 at position 44, Thr: 1 at position 51) with highest score: 0.66 PKC at position 51. NetSurfP results for the probability for α-helix confirmed that the putative mature SpliCec peptide consists of two α-helices (first: 27-44 and second: 48-59) interrupted by a hinge dipeptide (AP). In addition, two strong transmembrane helices (one primary: 3-23 with inside to outside orientation and one secondary: 41-59 with outside to inside orientation) were predicted.

Phylogenetic analyses of the SpliCec sequence
Phylogenetic analyses have been performed on the SpliCec nucleotide sequence and its deduced polypeptide and the results of these analyses are shown in Fig. (3). In the case of the SpliCec nucleotide sequence, a phylogenetic tree was generated from 58 cecropin-related sequences (21 insect species including 11 Lepidoptera and 10 Diptera) by neighbor-joining distance analysis with maximum sequence difference 1.0 (Fig. 3A). The topology shows two distinct lineages including 21 (Lepidoptera) and 37 (Lepidoptera and Diptera) cecropin-related sequences, respectively. The maximum nucleotide sequence divergence was exhibited in lineage II (21 phylogenetic groups). Meanwhile, the cecropin sequences appear in the other lineage as less divergent clades (14 phylogenetic groups). The SpliCec was clustered with M. sexta and A. convolvuli cecropins (Acc# M23661 and GQ888768, respectively) in a monophyletic sister clade (Fig. 3A). Surprisingly, two lepidopteran cecropins were clustered with the dipteran lineage II (Fig. 3A). In the case of the SpliCec deduced amino acid sequence, a phylogenetic tree was generated from sequence data of 57 published sequences (21 insect species including 11 Lepidoptera and 10 Diptera) by neighbor-joining distance analysis with maximum sequence difference 2.0 (Fig. 3B). The topology shows two distinct lineages including 4 (Lepidoptera) and 53 (Lepidoptera and Diptera) cecropin peptides, respectively. The maximum divergence of amino acid sequences was exhibited in lineage I (33 phylogenetic groups). However, minimum divergence was observed in the other lineage (3 phylogenetic groups). The SpliCec putative peptide was clustered with M. sexta and A. convolvuli cecropins (Acc# M23661 and GQ888768, respectively) in a monophyletic sister clade (Fig. 3B). Meanwhile, the other lepidopteran sequence was grouped in a separate phylogenetic group (13 sister clades) in lineage I (Fig. 3B).
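Neighbor-joining trees are built from a matrix of pairwise distances between aligned sequences. The text does not state which distance model was used, so the sketch below shows only the simplest one, the p-distance (proportion of compared sites that differ), as an assumption for illustration. The toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every unordered pair."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy = {
        "seq1": "ATGAACTTCT",
        "seq2": "ATGAACTTTT",
        "seq3": "ATGAAATT-T",
    }
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

A matrix of this kind is the input that a neighbor-joining implementation would cluster into a tree; the tree-building step itself is left to dedicated software.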

DISCUSSION AND CONCLUSION
In the present study, the common bands revealed by DD-PCR in both control and challenged samples may represent house-keeping genes. Some bands were recorded in control insects and disappeared in challenged ones (genes were turned off). On the other hand, many bands were induced as a result of bacterial challenge at different time intervals post-infection. The DD-PCR technique is considered a powerful genetic screening tool for complicated dynamic tissue processes, to detect and compare altered gene expression in eukaryotic cells, and to screen and characterize differentially expressed mRNAs (Santana et al., 2006), because it allows for simultaneous amplification of multiple arbitrary transcripts. Many publications described the enhancement of the insect immune system and induction of AMPs due to stress and/or infection (e.g. Seufi et al., 2011). Cecropins were isolated from only four insect orders: Lepidoptera (e.g. Kim et al., 2004, Kaneko et al., 2007, Hong et al., 2008, Kim et al., 2010), Hymenoptera (e.g. Orivel et al., 2001), Coleoptera (e.g. Saito et al., 2005) and Diptera (e.g. Vizioli et al., 2000, Jin et al., 2010). The full-length cDNA of the SpliCec contained a 186 bp orf encoding 62 amino acids, preceded by a 33 bp 5′-UTR and followed by a 166 bp 3′-UTR. The putative translational start site sequence (ACAATGAAC) conformed well to the Kozak consensus sequence and retains the adenine nucleotide at position -3, a conserved feature of eukaryotic mRNAs (Kozak, 1984). Many insect cecropins also conform to the Kozak consensus sequence (ACAATGAAC for An. gambiae (Vizioli et al., 2000), ACAATGAAT for B. mori (Hong et al., 2008), AAAATGAAT for G. mellonella (Kim et al., 2004) and AAAATGAAT for P. xuthus (Kim et al., 2010)). The SpliCec cDNA harbored two putative polyadenylation sequences (AATAAA). Two polyadenylation sequences were reported in the CecB of H. cecropia (Xanthopoulos et al., 1988, Boman et al., 1991). On the other hand, only one polyadenylation sequence was reported in the following cases: cecropins A and D of H. cecropia (Gudmundsson et al., 1991), cecropin-like of Pachycondyla goeldii (Orivel et al., 2001), cecropin-like of B. mori (Kaneko et al., 2007), and cecropins of An. gambiae, G. mellonella, B. mori and P. xuthus (Vizioli et al., 2000, Kim et al., 2004, Hong et al., 2008, Kim et al., 2010). The SpliCec cDNA identified in this study encodes a protein of 62 a.a. showing a high degree of similarity to insect cecropins, particularly to the cecropins of M. sexta, A. convolvuli and H. cecropia. Comparison of the SpliCec a.a. sequence to 22 lepidopteran cecropins showed some replacements. The possibility that these come from polymorphism cannot be excluded, because cecropins were identified from different insect species (Kaneko et al., 2007). Despite the identified replacements, many conserved regions were observed throughout the 22 compared sequences. These conserved regions may be very important in designing universal primers to detect lepidopteran cecropin genes. A potential 22-residue signal peptide, a dipeptide prosequence and a 38-residue mature peptide were predicted by SignalP software and by multiple alignment of the amino acid sequence of the SpliCec peptide with that of 22 lepidopteran cecropins. As a rule, signal peptide sequences are not conserved (Von Heijne, 1985). The signal peptides of lepidopteran cecropins probably make up the first 22 a.a., leaving a 2- to 4-residue propeptide before the start of a linear, amphipathic mature peptide of 35 to 39 a.a. residues (Boman et al., 1991). The deduced amino acid sequences of two B. mori cecropin-like sequences were reported to contain 20-residue signal peptides, dipeptide prosequences and 37-residue mature peptides (Kaneko et al., 2007). Thus, the SpliCec peptide (62 a.a.) was comparable in size to that of the other lepidopteran cecropins (55-67 a.a.). Moreover, the conserved pattern of the putative SpliCec peptide was consistent with the typical features of invertebrate cecropins.
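The position -3 comparison made above can be illustrated with a short check over the start-site flanking sequences quoted in the text. This is a minimal sketch: it numbers the A of ATG as +1 (so -3 is three bases upstream) and tests only that single position, not the full Kozak consensus; the function and variable names are illustrative.

```python
# Start-site flanking sequences quoted in the text.
FLANKS = {
    "S. littoralis (SpliCec)": "ACAATGAAC",
    "An. gambiae": "ACAATGAAC",
    "B. mori": "ACAATGAAT",
    "G. mellonella": "AAAATGAAT",
    "P. xuthus": "AAAATGAAT",
}


def base_at_minus3(flank):
    """Return the base three positions upstream of the first ATG.

    The A of ATG is taken as +1, so the preceding base is -1.
    """
    start = flank.find("ATG")
    if start < 3:
        raise ValueError("need an ATG with at least 3 upstream bases")
    return flank[start - 3]


if __name__ == "__main__":
    for species, flank in FLANKS.items():
        print(species, flank, base_at_minus3(flank))
```

For all five quoted sequences the base at -3 is A, in line with the observation in the text.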
Reconstruction of the phylogenetic trees of the SpliCec nucleotide sequence and its deduced polypeptide resulted in two different topologies. In spite of the two different tree topologies, both trees clustered the SpliCec sequence with those of M. sexta and A. convolvuli, suggesting that they descend from a common ancestor. The grouping of some lepidopteran and dipteran cecropins (e.g. S. litura with An. gambiae) in one sister clade indicated that they may be homologous or share some similarity.
The potential glycated and phosphorylated residues may serve for correct folding and stability of the protein in both prokaryotic and eukaryotic organisms (Barford et al., 1998). Reversible phosphorylation (phosphorylation and dephosphorylation) is a very important mechanism in protein-protein interaction via recognition domains (Cole et al., 2003). It may also result in conformational changes in the structure of many peptides, causing them to become activated, deactivated or degraded (Olsen et al., 2006). In addition, two α-helices interrupted by a hinge dipeptide (AP), and two strong transmembrane helices were predicted. Structure prediction studies of AngCecA support an overall α-helical structure with a strikingly amphipathic N-terminal α-helix (Vizioli et al., 2000). Like many insect cecropins, SpliCec also harbors Gly-Lys residues (GK) at the C-terminal end. Contrary to our results, it was reported that the majority of insect cecropin-like peptides harbor amidated C-termini (Saito et al., 2005). Additionally, it was previously described that H. cecropia CecA activity was enhanced significantly by this C-terminal amidation (Callaway et al., 1993). On the other hand, the amidation of An. gambiae's Gly-extended cecropin did not influence the antimicrobial activity (Vizioli et al., 2000). However, the amidation of the Gly residue was previously theorized to protect the cecropin-like peptide against carboxypeptidase digestion (Liang et al., 2006). Whether the amidating enzyme works on Gly-Lys is not known. If not, a carboxypeptidase H will also have to be invoked for the removal of the terminal Lys residue.
The SpliCec precursor harbors a dipeptide (AP) between the signal and mature peptides, which is not conserved in terms of the tetrapeptides (APEP, VPEP or APSP) in cecropins of B. mori, T. ni, S. litura, H. cunea and G. mellonella or the dipeptides (SP or WD) in cecropins of P. xuthus and H. armigera (Kim et al., 2004, Hong et al., 2008, Kim et al., 2010). In addition, the precise conservation of the Pro residue (P) at the N-terminal domain, the aromatic Trp residue (W) at position 2, the Lys residues (K) at positions 5 and 9, and the Arg residue (R) at position 13 confirmed possible replacements of Lys 8 and Arg 13 with Glu 8 and Gln 13 in the case of B. mori, H. cunea and P. rapae cecropins (Kim et al., 2004, Hong et al., 2008, Kim et al., 2010).
Cecropin was initially isolated from a bacterially challenged H. cecropia pupa as a cationic antimicrobial peptide, and then a number of cecropin-like peptides have also been identified in lepidopteran, dipteran, hymenopteran and coleopteran insects.
The current results provide a typical cationic insect cecropin gene (SpliCec) with a possible replacement mutation. SpliCec plays an important immune role in S. littoralis by cooperating with other AMPs to control bacterial infection. These findings would be helpful in cecropin studies concerning ELISA, PCR and other related molecular and immunological techniques. Study of the expression profile and of the potential antimicrobial, hemolytic, apoptotic and antitumor activities of the SpliCec recombinant protein, as well as structure-function relationship studies, are required for a better understanding of its mode of action.

Fig. (1): (A): Representative 2% agarose gels of DD-PCR patterns generated from control and S. aureus, E. coli and S. sanguinis-challenged haemolymph samples using 8 primers corresponding to well-known defense genes. Lanes M: DNA marker; lanes 1 and 5: controls of different treatments; lanes 2 and 3: 24 and 48 h p.i. by E. coli; lanes 4, 6 and 7: 24, 48 and 72 h p.i. by S. sanguinis; and lane 8: 42 h p.i. by S. aureus. Arrows refer to differentially displayed sequenced bands. (B): 2% gel electrophoresis showing positive PCR and cloning products. Lanes 1, 2, 3 and 4 show empty PCR-TOPO, E. coli harbouring PCR-SpliCec, PCR-SpliCec after digestion with EcoRI and positive control (186 bp), respectively. Lane 5: PCR mix without DNA used as negative control. The size of the bands is shown in bp.

Fig. 3: Phylogenetic analysis of SpliCec nucleotide (A) and its deduced amino acid (B) sequences compared to 58 and 57 sequences, respectively. Phylogenetic trees were generated by neighbor-joining distance analysis using the Phylogeny.fr web service, One Click mode. Full sequence names and accession numbers are included in the trees.

Table 1: Key table for the primers used in the DD-PCR study, providing their names, origin and sequences.
All oligonucleotides were synthesized by Invitrogen, USA and HPLC purified.