Full Length Isolation and Phylogenetic Analysis of C-type Lectin Gene from Bacterial-challenged Cotton Leafworm, Spodoptera littoralis

Experiments were designed to investigate the molecular immune response of Spodoptera littoralis larvae to bacterial infection. In addition, sequence and phylogenetic analyses of the gene involved were carried out. Using the differential display technique, a partial insect lectin gene (SpliLec) was isolated from the haemolymph of bacterial-challenged S. littoralis. Five differentially displayed bands were sequenced. Sequence results revealed that a fragment of 640 bp was amplified within the open reading frame (orf) of a lectin gene. This fragment contained the complete 3′ end with a poly(A) tail, but lacked the start codon (AUG) at its 5′ end. The 5′ end was extended by RACE PCR, and a final reaction was performed to obtain the full length of SpliLec. Sequence analyses revealed that SpliLec consists of a single orf encoding a deduced polypeptide with an 18-residue signal peptide and a 291-residue mature peptide. The SpliLec sequence contains two CRDs: a short-form CRD1 and a long-form CRD2, stabilized by two and three highly conserved disulfide bonds, respectively. SpliLec shares homology with some dipteran lectins, suggesting a possible common ancestor. These results suggest an important role for the SpliLec gene in cell adhesion and non-self recognition. It may cooperate with other AMPs in the clearance of invaders of Spodoptera littoralis.


INTRODUCTION
After pathogens penetrate an insect's structural barriers, the insect relies solely on an efficient innate immune system, which shares many characteristics with the innate immune system of vertebrates. The insect innate immune system comprises both humoral and cellular responses (Pinheiro and Ellar, 2006, Lemaitre and Hoffmann, 2007). Insect humoral defenses include the production of a potent arsenal of antimicrobial peptides (AMPs) (Pinheiro and Ellar, 2006, Lemaitre and Hoffmann, 2007), coagulation, and melanization led by protease cascades (Kanost et al., 2004). Insect cellular defense refers to haemocyte-mediated immune responses, such as phagocytosis, nodulation, and encapsulation (Lavine and Strand, 2002). The encapsulation process involves cell adhesion and melanization (Eslin and Prevost, 2000).
Lectins are an important class of carbohydrate-binding proteins that have several distinct biological activities. They mediate cell adhesion (i.e. bind to microbial surface components), non-self recognition and immuno-protection processes in immune responses (Vasta et al., 1999). They exist in a wide variety of plants, animals, fungi, bacteria and viruses (Sharon, 1977) and play a significant role in the clearance of invaders, either as cell surface receptors for microbial carbohydrates or as soluble proteins in tissue fluids (Yu and Kanost, 2003). Such proteins are known as pattern recognition proteins (PRPs), because they bind to the pathogen-associated molecular patterns (PAMPs) present in the array of carbohydrate components on the surface of microorganisms and consequently trigger a series of protective immune responses (Medzhitov and Janeway, 2002). Proteins that display carbohydrate-binding activity in a calcium-dependent manner are classified into the C-type lectin family (Drickamer and Taylor, 1993). They share C-type carbohydrate-recognition domains (CRDs), also called C-type lectin domains (CTLDs), composed of 110-130 amino acid residues. These CRDs or CTLDs contain a characteristic double loop (loop in a loop) stabilized by two or three highly conserved disulfide bonds. The vertebrate C-type lectins are usually multi-domain lectins and fall into seven groups (I-VII) (Day, 1994). Seven new groups (VIII-XIV) were added in the revised classification in 2002 (Drickamer and Fadden, 2002) and three further groups (XV-XVII) were added more recently (Zelensky and Gready, 2004). In contrast, the invertebrate C-type lectins are mostly single-domain proteins, although C-type lectins that contain two CRDs have also been characterized. Although all C-type lectin CRDs have sequence similarity, they can be divided into two types: a "short form" approximately 115 residues long and a "long form" approximately 130 residues long, which includes two additional disulfide-bonded cysteine residues at the amino terminus (Drickamer and Taylor, 1993, Day, 1994). In recent years, more and more C-type lectins with two tandem CRDs have been identified and characterized from invertebrates, especially from insects (Yu and Kanost, 2000, Yu et al., 2005, Tian et al., 2009). Examples of C-type lectins with two tandem CRDs include the M. sexta immunolectins (IML-1, IML-2, IML-3 and IML-4), which serve as humoral PRPs (Kanost et al., 2004), and LPS-binding lectins from the silkworm, Bombyx mori (Koizumi et al., 1999) and the fall webworm, Hyphantria cunea (Shin et al., 2000).
In this paper, the full-length cDNA of a C-type lectin with two tandem CRDs was isolated from S. littoralis using differential display and RACE PCR techniques. Sequence characterization and phylogenetic analyses are also reported.

Insects and bacterial strains
A laboratory colony of the cotton leafworm, S. littoralis, was used for our experiments. It was originally collected from a private okra field at Giza, Egypt in 1995 and maintained in the insectary of the Department of Entomology, Faculty of Science, Cairo University according to the technique described by Levinson and Navon (1969), at 25 °C, 65-70% RH and a 14L:10D photoperiod.
Two Gram-positive bacteria, Staphylococcus aureus and Streptococcus sanguinis, and three Gram-negative bacteria, Escherichia coli (D31), Proteus vulgaris and Klebsiella pneumoniae, were obtained from the Unit for Genetic Engineering and Agricultural Biotechnology, Faculty of Agriculture, Ain Shams University and used for insect immunization. Bacteria were grown in a peptone medium (1%), supplemented with 1% meat extract and 0.5% NaCl, at 37 °C in a rotary shaker.

Insect immunization and haemolymph collection
Bacterial challenge was performed as described by Seufi et al. (2011). Haemolymph was collected at 24, 48 and 72 h post-infection (p.i.) at 4 °C (500 µl each) in the presence of a few crystals of phenylthiourea to prevent melanization. Aliquots of 100 μl each were stored at -80 °C until investigated. The control group was injected with bacteria-free saline solution.

RNA extraction and reverse transcription
Total RNA of the insect haemolymph (300-500 µl) was extracted using the RNeasy kit according to the manufacturer's instructions (Qiagen, Germany). Residual genomic DNA was removed from the RNA using RNase-free DNase (Ambion, Germany). RNA integrity and purity were assessed by examining the 260/280 and 260/230 absorbance ratios for protein and solvent contamination, respectively. The reverse transcription reaction was carried out according to the ABgene protocol (ABgene, Germany). The cDNA was aliquoted and stored at -80 °C until processed.
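The purity check on the 260/280 and 260/230 ratios can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names, the example readings and the acceptance thresholds (1.8-2.2 for A260/A280 and at least 1.8 for A260/A230) are assumptions based on common laboratory rules of thumb, not values reported in this study.

```python
def purity_ratios(a260, a280, a230):
    """Return (A260/A280, A260/A230) for one spectrophotometer reading."""
    if a280 <= 0 or a230 <= 0:
        raise ValueError("absorbance readings must be positive")
    return a260 / a280, a260 / a230


def looks_pure(a260, a280, a230, min_280=1.8, max_280=2.2, min_230=1.8):
    """Apply assumed rule-of-thumb thresholds; these are not from the paper."""
    r280, r230 = purity_ratios(a260, a280, a230)
    return min_280 <= r280 <= max_280 and r230 >= min_230


# Hypothetical readings for one RNA sample
print(purity_ratios(1.0, 0.5, 0.5))
print(looks_pure(1.0, 0.5, 0.5))
```

A low A260/A280 ratio would point towards protein carry-over, and a low A260/A230 ratio towards residual solvent or salt, which is how the two ratios map onto the contaminants named in the text.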

Differential display using primers corresponding to lectin sequence (DD-PCR)
A total reaction volume of 25 μl containing 2.5 μl PCR buffer, 1.5 mM MgCl2, 200 μM dNTPs, 1 U Taq DNA polymerase (AmpliTaq, Perkin-Elmer), 2.5 μl of 10 pmol primer (Table 1) and 2.5 μl of each cDNA was cycled in a DNA thermal cycler (Eppendorf, Mastercycler 384, Germany). The amplification program was one cycle at 94 °C for 5 min (hot start), followed by 40 cycles at 94 °C for 1 min, 40 °C for 1 min and 72 °C for 1 min. The reaction was then incubated at 72 °C for 10 min for final extension. The PCR product was visualized on a 1.5% agarose gel and photographed using a gel documentation system. To assess DNA contamination, a no-reverse-transcription control reaction was performed.
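As a quick check on the cycling program above, the total programmed hold time can be computed from its steps. This is only a sketch: the function name is hypothetical, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
def program_minutes(initial, cycles, steps, final):
    """Total programmed hold time in minutes (ramp times are ignored)."""
    return initial + cycles * sum(steps) + final


# DD-PCR program from the text:
# 94 °C for 5 min, then 40 x (1 min + 1 min + 1 min), then 72 °C for 10 min
dd_pcr = program_minutes(initial=5, cycles=40, steps=(1, 1, 1), final=10)
print(dd_pcr)  # 135
```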
Based on the sequence and alignment data, specific primers (LecSF1,2 and LecSR1,2) for lectin-related sequences were designed (Table 1) and tested by reverse transcription polymerase chain reaction (RT-PCR). RT-PCR was performed as described earlier in this section, using the optimum annealing temperature (Ta) for each specific primer set. Positive PCR products were visualized and eluted from the gel using the GenClean Kit (Invitrogen Corporation, San Diego, CA, USA) following the manufacturer's instructions.
The purified PCR product (SpliLec) was cloned into PCR-TOPO vector with the TOPO TA cloning kit (Invitrogen, USA) following the manufacturer's instructions. The ligation mix was used to transform the competent E. coli strain TOP10 provided with the cloning kit. White colonies were screened using PCR as described earlier in this section. Two positive clones of the SpliLec fragment were selected and sequenced (to rule out PCR errors) using their specific forward and reverse primers (Table 1). Sequencing and sequence analyses were performed as described earlier in this section.

Full-length cDNA isolation of immunolectin gene
Specific primers (sense and antisense) were designed based on the sequence of SpliLec containing the 3′ end. The 5′ end fragment was amplified using the SMART RACE cDNA Amplification kit (Clontech) following the procedure outlined in the supplied user manual. The amplified 5′ end fragment was purified, cloned into PCR-TOPO vector, and sequenced as described earlier in this section. The sequences of the 3′ and 5′ end fragments were aligned and the predicted full-length cDNA was obtained. A pair of primers, LecFLF and LecFLR (Table 1), was then designed for the amplification of full-length SpliLec cDNA. PCR was carried out in a total volume of 25 μl reaction solution containing 2.5 μl PCR buffer, 1.5 mM MgCl2, 200 μM dNTPs, 1 U Taq DNA polymerase (AmpliTaq, Perkin-Elmer), 2.5 μl of 10 pmol of each primer and 2 μl cDNA using the following protocol: 94 °C for 5 min (hot start) followed by 35 cycles of amplification (94 °C for 1 min, 60 °C for 1 min, 72 °C for 1.5 min) and a final extension step at 72 °C for 10 min. Full-length SpliLec was visualized and eluted from the gel using the GenClean Kit (Invitrogen Corporation, San Diego, CA, USA) following the manufacturer's instructions.
Moreover, phylogenetic analyses of the nucleotide sequence and its deduced amino acid sequence were performed using Mega4. Poorly aligned positions and divergent sequences were eliminated manually. Multiple alignment of the available published lectin-related nucleotide sequences was carried out before the phylogenetic analyses, and sequence lengths were approximated manually. Sequences from the same species that were 100% identical but had different accession numbers were represented by a single sequence. The cloned DNA fragment was deposited in GenBank under accession number HQ603826.
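The trees in this study were built by neighbor-joining distance analysis (Fig. 5), a method that starts from a matrix of pairwise distances between aligned sequences. A minimal sketch of one simple distance measure, the p-distance (proportion of differing sites), is given below. The toy sequences and their names are hypothetical, gapped sites are skipped pair by pair, and the distance model actually selected in Mega4 is not stated in the text, so this is an illustration rather than a reproduction of the analysis.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m, n in combinations(list(aligned), 2)}


# Hypothetical toy alignment, not real lectin sequences
toy = {"seqA": "ATGGAGCT-A", "seqB": "ATGGAACTTA", "seqC": "ATGCAACTTA"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

A neighbor-joining routine would then repeatedly join the pair of taxa that minimizes the total branch length implied by this matrix.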

Differential display using primers corresponding to well known lectins
The differential display technique was used to characterize the genetic variation (at the RNA level) between bacterial-challenged and control cotton leafworm, S. littoralis. Fig. (1) shows the differentially displayed cDNAs of bacterial-challenged and control insects using 8 primers corresponding to previously characterized lectins (Table 1). Haemolymph samples were differentially displayed at 24, 48 and/or 72 h p.i. with the S. aureus, S. sanguinis, E. coli, P. vulgaris and K. pneumoniae bacterial strains. It was observed that S. aureus-challenged insects died 24 h p.i., E. coli-challenged insects died 48 h p.i. and S. sanguinis-challenged insects died 72 h p.i. All insects died before sampling in the case of P. vulgaris and K. pneumoniae. Differential display yielded an average of 4.3 bands per sample for each amplification reaction. The total number of bands (transcripts) resolved in a 1.5% agarose gel for both control and challenged insects was 124 (molecular size ranged from >1300 to ~80 bp). Forty-seven polymorphic bands (37.9%) were differentially displayed with 6 of the primers used. Five reproducible, infection-induced bands were cloned and sequenced using the M13 universal primer. Analyses of the results revealed that a fragment of 640 bp was amplified within the open reading frame (orf) of a lectin gene. This fragment contained the complete 3′ end with a poly(A) tail, but was incomplete at the 5′ end (it lacked the start codon, AUG).
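The polymorphism percentage reported above follows directly from the band counts. A short sketch of the arithmetic (the function name is hypothetical):

```python
def percent_polymorphic(polymorphic, total):
    """Percentage of resolved bands that are polymorphic."""
    if total <= 0:
        raise ValueError("total number of bands must be positive")
    return 100.0 * polymorphic / total


# 47 polymorphic bands out of 124 resolved bands, as stated in the text
print(round(percent_polymorphic(47, 124), 1))  # 37.9
```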

RT-PCR amplification and cloning of the lectin gene
To obtain the full-length sequence, the 5′ end of the cDNA was amplified using the RACE PCR method, purified, cloned and sequenced. The full-length sequence of SpliLec cDNA was amplified using LecFLF and LecFLR. RT-PCR was optimized for the primer set and successfully amplified a fragment of ≈1150 bp (Fig. 2).
The positive PCR product was visualized, eluted and cloned into PCR-TOPO vector (Fig. 2, lane 2). Using the PCR screening method, the clone PCR-TOPOSpliLec tested positive (Fig. 2, lane 4). Two positive clones of the SpliLec fragment were selected and sequenced (to rule out PCR errors) using the LecFLF and LecFLR primers (Table 1).

Nucleotide sequence and sequence analyses
The nucleotide sequence of SpliLec and its deduced amino acid sequence are shown in Fig. (3). A single orf encoding a 309-residue polypeptide was detected in the SpliLec sequence. One stop codon was found at the 3′ end. The flanking region of the initiation codon ATG is AGTATGGAG, and the 5′ untranslated region (UTR) was 60 bp long before the start codon ATG. The 3′ UTR was 60 bp long before the poly(A) tract.
The putative polyadenylation sequence AATAAA was located 15 bp downstream from the stop codon (Fig. 3). The identified SpliLec orf includes a signal peptide (54 bp) and a mature peptide (873 bp). Analysis of the amino acid sequence deduced from the cDNA indicated that SpliLec is a member of the C-type lectin superfamily. It contains two C-type CRDs, an amino-terminal domain, CRD1 (residues 1-149), and a carboxyl-terminal domain, CRD2 (residues 160-301). The deduced SpliLec polypeptide contains 50 strongly basic, 28 strongly acidic, 127 hydrophobic and 104 polar uncharged amino acids.
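The lengths reported for the orf can be cross-checked with simple codon arithmetic. The sketch below uses only the figures given in the text (54 bp signal peptide, 873 bp mature peptide, 309-residue polypeptide and the four residue classes); the helper name and the dictionary keys are hypothetical.

```python
def residues(coding_bp):
    """Number of amino acid residues encoded by a coding stretch."""
    if coding_bp % 3 != 0:
        raise ValueError("coding length must be a multiple of 3")
    return coding_bp // 3


signal_aa = residues(54)    # signal peptide
mature_aa = residues(873)   # mature peptide
precursor_aa = signal_aa + mature_aa

# Residue classes reported for the deduced polypeptide
composition = {"basic": 50, "acidic": 28, "hydrophobic": 127, "polar_uncharged": 104}

print(signal_aa, mature_aa, precursor_aa)  # 18 291 309
print(sum(composition.values()))           # 309
```

Both totals agree with the 309-residue polypeptide, the 18-residue signal peptide and the 291-residue mature peptide stated in the text.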
The calculated molecular masses of the putative SpliLec and its mature peptide are 34.85 and 32.91 kDa, respectively. The theoretical isoelectric points (pIs) were 9.27 and 9.38 for the full-length and mature SpliLec peptides, respectively. The net charges at pH 7.0 were 15.9 and 16.9 for SpliLec and its mature peptide, respectively. Both the full-length and the mature SpliLec peptides were classified as unstable (Instability Index (II): 55.81 and 56.95, respectively). The proportions of hydrophilic residues were calculated as 37 and 38% for the full-length and mature peptides, respectively. The nucleotide sequence of SpliLec and its deduced amino acid sequence were searched with BLAST against all available sequences in the GenBank database. Alignment results revealed that the SpliLec sequence (Acc# HQ603826) aligns significantly with 9 and 14 published lepidopteran DNA and peptide sequences, respectively. Although the percentage identity ranged from 100% to 69% with the IML-A precursor (Acc# AF053131) and IML-3 (Acc# AY768811) of Manduca sexta, this did not necessarily imply full agreement, particularly when the percentage coverage of the gene was taken into account. Some insect lectins covered the forward region of the SpliLec sequence and others covered the backward segment (e.g. M. sexta and Bombyx mori immunolectins) (Fig. 4 A and B). Primary and secondary structure analyses, post-translational modification and topology predictions revealed that the amino acid sequence of the putative SpliLec peptide has one signal peptide cleavage site (between positions 18 and 19), one tyrosine-glycosylated site and two tyrosine-sulfated sites at positions 111, 31 and 33, respectively. Fifteen O-GlcNAcylated residues (8 Ser and 7 Thr) and six potentially glycated lysines were predicted. Twenty-one phosphorylation sites (Ser: 11, Thr: 6 and Tyr: 4) and 44 (24 S, 2 Y and 18 T) kinase-specific phosphorylation sites (highest score: 0.82 PKC at position 185) were also predicted. In addition, two transmembrane helices (one primary: 166-182 with outside-to-inside orientation and one secondary: 3-22 with inside-to-outside orientation) were predicted.
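Theoretical net charge and isoelectric point values such as those reported above are typically derived from the amino acid sequence with the Henderson-Hasselbalch relation. The sketch below illustrates the principle only. The pKa table is one commonly used set and is an assumption, the toy peptide is hypothetical, and the tool the authors used is not stated, so this code is not expected to reproduce the published figures.

```python
# Assumed pKa values for ionizable groups; different tools use different
# tables, so results will differ slightly between programs.
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Estimate the net charge of a peptide at a given pH."""
    seq = seq.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        n = 1 if group == "N_TERM" else seq.count(group)
        charge += n / (1.0 + 10.0 ** (ph - pka))   # protonated fraction
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "C_TERM" else seq.count(group)
        charge -= n / (1.0 + 10.0 ** (pka - ph))   # deprotonated fraction
    return charge


def isoelectric_point(seq, tol=1e-4):
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


toy_peptide = "MKRLLHDECYK"  # hypothetical sequence, not SpliLec
print(round(net_charge(toy_peptide, 7.0), 2))
print(round(isoelectric_point(toy_peptide), 2))
```

Bisection works here because the net charge decreases monotonically as pH rises, so there is a single crossing point. A peptide rich in basic residues, as reported for SpliLec, gives a positive net charge at pH 7.0 and a pI above 7.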

Phylogenetic analyses of the SpliLec sequence
Phylogenetic analyses of SpliLec were performed with 47 nucleotide sequences (including 10 insect genera from the order Lepidoptera) and 14 polypeptides (including 8 insect species: 3 lepidopterans and 5 dipterans). The results of these analyses are shown in Figs. (5 A and B).
LPS-binding proteins of the silkworm, B. mori (Koizumi et al., 1999) and the putative lectin of the fall webworm, H. cunea (Shin et al., 2000). The predicted modifications of the SpliLec protein suggest an important role for the protein in modulating a broad range of biological processes in the cell. The predicted O-GlcNAcylation suggests a possible function of the SpliLec protein in macromolecular complex assembly and intracellular transport. Glycosylation and glycation contribute to the correct folding and stability of the protein (unglycosylated proteins degrade quickly). Glycosylation of proteins also plays a role in cell-cell adhesion, a mechanism employed by cells of the immune system (Varki et al., 2009). Reversible phosphorylation of proteins (by kinases and phosphatases) is considered an important regulatory mechanism in protein-protein interaction via recognition domains (i.e. many proteins and receptors are switched "on" or "off" by phosphorylation and dephosphorylation). It also results in conformational changes in the structure of many peptides, causing them to become activated, deactivated or degraded (Olsen et al., 2006). In addition, many transmembrane proteins (TPs) function as gateways or "loading docks" that deny or permit the transport of specific substances across biological membranes (to get into or out of the cell by folding up or bending through the membrane).
Reconstruction of the phylogenetic trees of the SpliLec nucleotide sequence and its deduced polypeptide resulted in two different topologies. The two trees placed the SpliLec sequence in different groups (with Bombyx in the nucleotide-based tree and with Anopheles in the amino acid-based tree), indicating a possible evolutionary relationship among these lectins, which might have descended from a common ancestor. The grouping of some lepidopteran and dipteran lectins (e.g. M. sexta with Sarcophaga and S. littoralis with Anopheles) in one sister clade indicated that they may be homologous or share some similarity. In addition, lepidopteran lectin-like sequences diverged into many sister clades at the amino acid level, owing to differences in codon usage among species.
In short, these findings shed new light on the lectin-mediated immune system. Combining these findings with those reported by Seufi et al. (2009), Seufi et al. (2011) and Seufi (2011) suggests that the SpliLec, SpliDef and SpliCec peptides, together with other possible AMPs, may constitute the defense network of S. littoralis (Lepidoptera) against invading microorganisms.
In conclusion, the current results provide a novel insect lectin gene (SpliLec) with two tandem CRDs. SpliLec is suggested to play an important immune role in S. littoralis by cooperating with other AMPs to clear invading microorganisms. These findings should be helpful in future studies on lectins involving ELISA, PCR and other related molecular and immunological techniques. Future studies on the carbohydrate-binding and blood group specificities, and on the determination of the molecular weight and three-dimensional structure of SpliLec, will be needed to provide direct evidence and a better understanding of its mode of action.

Fig. 1: Representative 1.5% agarose gels of the DD-PCR patterns generated from control and S. aureus, E. coli and S. sanguinis-challenged haemolymph samples using 8 primers corresponding to well known lectin genes. Lane M: DNA marker 100 bp Ladder, lanes 1, 4, 8 and 10: controls of different treatments, lanes 9 and 11: 24 h post-infection with S. aureus, lanes 2, 3 and 5, 6: 24 and 48 h post-infection with E. coli and lane 7: 72 h post-infection with S. sanguinis. Arrows refer to differentially displayed sequenced bands.

Fig. (3): Nucleotide and corresponding deduced amino acid sequence of the S. littoralis immunolectin gene (SpliLec). The cleavage site between the signal and mature peptides is indicated by an arrow. Positions of cysteine residues are shaded and numbered. An asterisk indicates the stop codon. The boxed sequence represents the putative polyadenylation signal.

Fig. 5: Phylogenetic analysis of SpliLec nucleotide and deduced amino acid sequences compared to 46 and 13 sequences registered in NCBI. Phylogenetic trees were generated from 47 and 14 lectin-related sequences by neighbor-joining distance analysis using Mega4 software. Full sequence names and accession numbers are included in the tree.

Table 1: Key table for the primers used in this study providing their names, origin and sequences.