Haloferax Elongans Classification Essay

1. The CRISPR-Cas Immune System

The CRISPR-Cas system is the most elaborate defence strategy present in prokaryotic cells (for general reviews of the CRISPR-Cas system see: [1,2,3,4,5,6,7,8,9]). It confers immunity against foreign genetic elements through sequence-specific targeting and elimination of the invading nucleic acids. To this end, the cell establishes and maintains a genetic record of previously encountered viruses and plasmids within its CRISPR loci. These genomic regions are arrays of recurring repeat sequences, between which lie short variable spacer sequences that represent genetic samples of invader DNA [10,11]. CRISPR loci not only provide a genetically heritable system for specific immunity but also, through their transcription, give rise to a key player of CRISPR defence, the crRNA. In proximity to the CRISPR loci are gene cassettes encoding Cas proteins that are responsible for all parts of the defence reaction: acquisition of foreign DNA (spacer sequences), crRNA biogenesis and target degradation. The defence reaction progresses in three stages. In the first stage, new spacer sequences are acquired. Here, as shown in Figure 1, a short piece of invader nucleic acid is selected and integrated into a CRISPR locus [5,7]. For this step, type I and II systems require short sequence motifs called protospacer adjacent motifs (PAMs) [12,13]. These motifs are part of the invader DNA and are used by the adaptation machinery to select the invader DNA fragment to be integrated. In addition, they are essential for the recognition and degradation of the invader upon a recurring infection.

Figure 1. Acquisition of new spacers. The invader DNA is degraded by Cas proteins and a piece of the invader DNA is integrated as a new spacer (shown as a red rectangle) into the CRISPR locus. Repeats are shown as diamonds, spacers as grey rectangles and the leader region as a white rectangle. The leader is located at the 5' end of the CRISPR locus. The CRISPR locus including the novel spacer is shown at the right, the original CRISPR locus at the left. The invader DNA to which Cas1 and Cas2 bind is shown at the bottom.

The second stage of CRISPR activity covers the biogenesis of the crRNAs. CRISPR loci are transcribed into long precursor molecules, which are processed into much smaller, mature crRNAs, each containing a spacer sequence and parts of the flanking repeat sequences. The spacer sequence renders each crRNA specific for a particular invader. The third stage, referred to as interference, occurs when the cell is invaded by intruder DNA. If the CRISPR locus contains a spacer sequence matching this invader (i.e., captured from a previous invasion event), then the resulting crRNA will guide the CRISPR-associated complex for antiviral defence (Cascade) to recognize the intruder DNA, which ultimately leads to degradation of the foreign nucleic acid via the activity of the protein components of this complex [5].
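
The interference stage can be pictured as a lookup of each spacer in the incoming DNA. The following Python sketch illustrates that idea under strong simplifying assumptions: it reports only exact matches on either strand, ignores the PAM requirement and any tolerance for mismatches, and uses invented toy sequences. The names `reverse_complement` and `find_protospacers` are hypothetical helpers, not part of any published tool.

```python
# Toy model of spacer/invader matching; not a model of the real Cascade mechanism.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_protospacers(spacers: dict[str, str], invader: str) -> list[tuple[str, str, int]]:
    """List exact spacer matches as (spacer name, strand, 0-based start)."""
    invader = invader.upper()
    hits = []
    for name, spacer in spacers.items():
        # Search the given strand ("+") and its reverse complement ("-").
        queries = (("+", spacer.upper()), ("-", reverse_complement(spacer)))
        for strand, query in queries:
            pos = invader.find(query)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = invader.find(query, pos + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical spacer and invader sequences, chosen only for illustration.
    toy_spacers = {"toy-1": "GATTACAGG"}
    print(find_protospacers(toy_spacers, "AAGATTACAGGTT"))  # [('toy-1', '+', 2)]
```

An empty result corresponds to an invader for which the cell carries no matching spacer, in which case no crRNA-guided recognition would be expected in this simplified picture.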

CRISPR-Cas systems have been classified into three major types (I, II, III) [1] that can be further subdivided into 14 subtypes, each showing significant differences in the nature of their Cas proteins as well as in the mechanistic details of the defence reaction [1,14,15,16]. Subtype III-B systems are a clear example of this variation, as they target RNA, whereas all other currently known subtypes target DNA. To obtain a complete and comprehensive picture of this defence mechanism, it is essential to analyse CRISPR-Cas systems in a variety of species. Since very good overviews of the CRISPR-Cas system and its function in general have recently been published [1,2,3,4,5,7,8,14,17], this review focuses on the type I-B system of Haloferax volcanii.

2. The Type I-B CRISPR-Cas System of Haloferax volcanii

Hfx. volcanii is a halophilic euryarchaeon first isolated from the shores of the Dead Sea [18]. It grows best at around 45 °C, requires a salinity of approximately 2.5 M NaCl and maintains an equally high intracellular salt concentration [18,19]. Hfx. volcanii possesses a single CRISPR-Cas system of subtype I-B, with three different CRISPR loci: one on the main chromosome (locus C) and two on the large (636 kb) chromosomal plasmid pHV4 (loci P1 and P2) (Figure 2) [20,21]. The P1 and P2 loci flank the single cas gene cassette, which carries genes for eight Cas proteins (Cas1-8b). The repeat sequences of all three CRISPR loci are 30 nt in length and identical in sequence (in all but one nucleotide), whereas spacer sequences vary in length from 34 to 39 nucleotides.
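
The repeat/spacer layout described above lends itself to a simple scan. The sketch below is an illustration only: it assumes a known repeat, a toy locus, and at most one mismatch per repeat copy (mirroring the single-nucleotide difference between the loci). The repeat and spacer sequences are invented placeholders, not the Hfx. volcanii sequences, and `extract_spacers` is a hypothetical helper.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_spacers(locus: str, repeat: str, max_mismatch: int = 1) -> list[str]:
    """Return the sequences lying between consecutive repeat copies."""
    n = len(repeat)
    repeat_starts = []
    i = 0
    while i <= len(locus) - n:
        if hamming(locus[i:i + n], repeat) <= max_mismatch:
            repeat_starts.append(i)
            i += n  # skip past this repeat copy
        else:
            i += 1
    return [locus[a + n:b] for a, b in zip(repeat_starts, repeat_starts[1:])]


if __name__ == "__main__":
    # Placeholder 30-nt repeat and toy spacers of 34 and 39 nt (not real sequences).
    repeat = "ATGGCCATTGTAATGGGCCGCTGAAAGGGT"
    variant = repeat[:22] + "A" + repeat[23:]  # one-nucleotide variant at position 23
    spacers = ["AC" * 17, "GT" * 19 + "G"]
    locus = repeat + spacers[0] + variant + spacers[1] + repeat
    found = extract_spacers(locus, repeat)
    print([len(s) for s in found])  # [34, 39]
```

Allowing one mismatch is a design choice for this toy case; with `max_mismatch=0` the variant repeat would be missed and the two spacers would be merged into one long fragment.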

Figure 2. The CRISPR-Cas type I-B system of Haloferax volcanii. (A) The system consists of eight Cas proteins and three CRISPR arrays. The presence of the Cas3 protein is specific for type I systems, and the presence of a Cas8b protein defines this system as type I-B. The cas gene cluster is flanked by two of the CRISPR loci, while the third locus is encoded on the main chromosome. In comparison to the published genome sequence of Haloferax strain DS2 [22], the H119 strain has a deletion in CRISPR locus P1 (23 spacers and repeats deleted) [20]. Gene locations on pHV4 and the main chromosome are indicated (in kb) but their sizes are not to scale. (B) The repeat sequences of the three CRISPR loci are identical except for one nucleotide at position 23 (shown in red). Processing of the CRISPR RNA by Cas6b takes place between nucleotides 22 and 23 in the repeat sequence (indicated by an arrow), leaving an 8-nucleotide repeat sequence upstream of the spacer and the remaining 22 nucleotides of the repeat downstream of the spacer.
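
The cleavage geometry described in panel B can be checked with a little string arithmetic. The Python sketch below assumes a placeholder 30-nt repeat and toy spacers (not the real sequences) and models Cas6b processing as nothing more than a single cut after repeat nucleotide 22 in every exact repeat copy. The names `build_precursor` and `process_precursor` are hypothetical.

```python
REPEAT_LENGTH = 30   # repeat length stated in the text
CUT_AFTER = 22       # cleavage between repeat nucleotides 22 and 23


def build_precursor(repeat: str, spacers: list[str]) -> str:
    """Concatenate repeat-spacer units into a toy pre-crRNA, ending with a repeat."""
    return repeat + "".join(spacer + repeat for spacer in spacers)


def process_precursor(precursor: str, repeat: str, cut_after: int = CUT_AFTER) -> list[str]:
    """Cut inside every exact repeat copy and return the fragments between cuts."""
    cut_sites = []
    pos = precursor.find(repeat)
    while pos != -1:
        cut_sites.append(pos + cut_after)
        pos = precursor.find(repeat, pos + len(repeat))
    return [precursor[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


if __name__ == "__main__":
    repeat = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGU"  # placeholder, not the real repeat
    assert len(repeat) == REPEAT_LENGTH
    spacers = ["AC" * 17, "GU" * 19 + "G"]     # toy spacers of 34 and 39 nt
    for crrna in process_precursor(build_precursor(repeat, spacers), repeat):
        handle_5, handle_3 = crrna[:8], crrna[-22:]
        print(len(crrna), handle_5, handle_3)
```

In this model each fragment is 8 + spacer length + 22 nt long, so the two toy spacers give fragments of 64 and 69 nt, each starting with the last 8 nt of the repeat and ending with its first 22 nt.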

All three loci are actively transcribed and the transcripts processed, leading to a stable population of mature crRNAs [20]. In 2012, only two spacers of Hfx. volcanii (C-14 and P1-2) showed likely matches to sequences in the public databases [20], but the number of matches has since expanded considerably (Table 1) and has revealed prominent types of invader DNA. The C-14 spacer shows exact matches to the genomes of two recent isolates of Haloferax, and targets homologs of Hfx. volcanii gene HVO_0372 (Table 1). This ORF occurs in similar gene contexts in at least five different isolates of Haloferax, and appears to lie within an integrative mobile element (Hvol-IV1) of ~12 kb, most likely a temperate virus, that commonly attacks members of this genus. In the integrated (provirus) state, these elements are flanked by a tRNA-Ala gene at one end (attL), and by an integrase and partial copies of the tRNA gene (attR) at the other end (Supplementary Figure S1), a typical arrangement first described in temperate bacteriophages. The significance of this virus group (denoted as HFIV1) in the natural environment is highlighted by CRISPR spacers from other species that target the same virus: one from Hfx. denitrificans that targets the same gene but at a different position, and another from Hfx. sp. ATCC BAA-645 that targets a nearby gene (HVO_0375) (Table 1).

Spacer C-4 closely matches a gene within a previously documented (defective) provirus of Hrr. lacusprofundi, Hlac-Pro1 [23], as well as related viruses in Hfx. elongans and Hfx. mucosum (denoted HeloV2 and HmucV2, respectively). These all show relationships to halovirus BJ1, an integrative virus of Halorubrum, but HeloV2 and HmucV2 differ significantly from BJ1 in not carrying integrase or tRNA genes, and both appear (from the available sequence data) to exist in cells as circular plasmids (Supplementary Figure S2).

Spacer P1-2 matches a sequence within Htg. jeotgali ORF HL44_04258, encoding a conserved ParBc (plasmid partition) domain-containing protein. The closest known homologs of this protein (and of many other ORFs around it and elsewhere on the same contig) are bacterial or phage/plasmid related, indicating a region of mobile foreign DNA. Other spacers match metagenomic sequences from salt lakes (P1-3, P2-1), including one that targets an MCM (helicase) gene. Finally, the P2-11 spacer exactly matches CRISPR spacers found in three other species of Haloferax that were isolated in different countries (Spain, Israel and Egypt), indicating a significant and widespread invading element, and presumably a preference or selective advantage for the retention of this particular protospacer. In summary, the matches discovered so far are consistent with the spacers of Hfx. volcanii representing sequences recovered from invader (foreign) DNA, such as viruses and plasmids.

Table 1. Sequences closely similar to, or exactly matching, CRISPR spacers of Hfx. volcanii DS2.

Spacer | Alignment of spacer/matching sequence (footnote a) | Matching sequence
C-4 (nt 2386433:2386398) | | Hrr. lacusprofundi chromosome 1 (nt 759433:759468), within ORF Hlac_0754. Predicted translation of spacer is identical to protein Hlac_0754 but for one conservative (L/F) change (i.e., MPDLVRDNIVDV/MPDFVRDNIVDV). Hlac_0754 is part of a 28.7 kb region (nt 750728:779675) containing genes related to halovirus BJ1, and previously denoted by Krupovic et al. [23] as provirus Hlac-Pro1.
C-4 (nt 2386433:2386398) | | Hfx. elongans ATCC BAA-1513: AOLK01000020. Within ORF C453_12906 (nt 35718:35683), a homolog of Hlac_0754. Predicted translation of spacer exactly matches the protein sequence of C453_12906 (MPDLVRDNIVDV). Contig AOLK01000020 is likely to represent a halovirus genome, with many genes related to BJ1 or other haloviruses/plasmids. We denote this contig as HeloV2. A related virus appears to be represented by a contig (AOLN01000009) of Haloferax mucosum PA12, ATCC BAA-1512 (i.e., HmucV2, Supplementary Figure S2).
C-14 (nt 2385742:2385778) (footnote b) | | Hfx. volcanii (chromosome, nt 333928:333984), within ORF HVO_0372 (hypothetical protein). HVO_0372 occurs in a ~12 kb region of foreign DNA flanked at one end by a tRNA-Ala gene, and at the other end by an integrase and two partial repeats of the tRNA-Ala gene. This region appears to be a provirus (which we denote as Hvol-IV1). Related proviruses are found in the genomes of at least four other Haloferax species (see Supplementary Figure S1). A nearby gene, HVO_0375 (CPxCG-related zinc finger protein), is the likely target of a CRISPR spacer of Hfx. sp. ATCC BAA-645 (contig_24).
C-14 (nt 2385742:2385778) | | Line 2: Haloferax sp. ATB1 (JPES01000108.1) scaffold108 (nt 9103:9067). Line 3: Haloferax sp. BAB2207: ANPG01000768 (nt 1559:1596). Both matches occur within a homolog of HVO_0372, and are likely to be part of proviruses (Hatb-IV1 and Hbab-IV1, see Figure S1). Elsewhere in this gene (and in HVO_0372) is a target sequence matching a CRISPR spacer carried by Hfx. denitrificans (footnote d).
P1-2 (nt 205072:205108) | | Line 2: Lake Tyrrell metagenome (contig 1101968716470, library GS84-02-2-3kb, nt 851:887) (footnote b). Line 3: metavirome (assembly from SRR402046). Line 4: Haloterrigena jeotgali A29, HL44_contig00019.19 (nt 18864:18136), within locus tag HL44_04258. BLASTX predicts a COG1475 (ParBc domain) protein (plasmid partition protein). Closest relatives are bacterial (e.g., WP_021624091). The predicted aa sequences are identical, i.e., HKSIKEDGYTQP.
P1-3 (nt 205139:205173) | | Line 2: Lake Tyrrell metagenome (49037 1101497529448, library GS84-02-2-3kb, nt 190:224). Line 3: metavirome (assembly from SRR402046). BLASTX of matching contigs shows matches to Hbor_29150 of Hgm. borinquense. The adjacent gene on the genome, Hbor_29160, most closely matches M201_gp84 of halovirus HCTV-2. The predicted aa sequences over the matching region (left) are identical but for one conservative change (F/L), i.e., VLDEAGVQFGNR/VLDEAGVQLGNR.
P1-38 (nt 207450:207485) | | Lake Tyrrell metavirome (assembly from SRR402046). BLASTX of matching contig shows a match (E = 10^-13) to the integrase of halovirus HCTV-5 (M200_gp113). The predicted aa sequences of spacer and matching contig sequence differ by one conservative (D/E) change, i.e., RLDDDYFALEAR/RLDDEYFALEAR.
P2-1 (nt 217843:217879)
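
Several entries in Table 1 compare the predicted translation of a spacer with that of its matching sequence. As a hedged illustration, the Python sketch below lists the positions at which two equal-length peptides differ, using the peptide pairs quoted in the table. The helper name `substitutions` is hypothetical, and whether a change counts as conservative is not computed here.

```python
def substitutions(first: str, second: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in first, residue in second) for each difference."""
    if len(first) != len(second):
        raise ValueError("peptides must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(first, second), start=1) if a != b]


# Peptide pairs as quoted in Table 1, in the order they are written there.
TABLE_1_PAIRS = {
    "C-4": ("MPDLVRDNIVDV", "MPDFVRDNIVDV"),
    "P1-3": ("VLDEAGVQFGNR", "VLDEAGVQLGNR"),
    "P1-38": ("RLDDDYFALEAR", "RLDDEYFALEAR"),
}

if __name__ == "__main__":
    for spacer, (first, second) in TABLE_1_PAIRS.items():
        print(spacer, substitutions(first, second))
```

Each pair yields exactly one difference (L/F at position 4, F/L at position 9 and D/E at position 5), in line with the single changes reported in the table.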

TIP Sheet
WRITING A CLASSIFICATION PAPER

Classification is sorting things into groups or categories on a single basis of division. A classification paper says something meaningful about how a whole relates to parts, or parts relate to a whole. Like skimming, scanning, paraphrasing, and summarizing, classification requires the ability to group related words, ideas, and characteristics.

Prewriting and purpose
It is a rare writer, student or otherwise, who can sit down and draft a classification essay without prewriting. A classification paper requires that you create categories, so prewriting for a classification paper involves grouping things in different ways in order to discover what categories make the most sense for the purpose you intend.

An important part of creating useful categories is seeing the different ways that things can be grouped. For example, a list of United States presidents may be grouped in any number of ways, depending on your purpose. They might be classified by political party, age on taking office, or previous occupation, but you could just as well classify them by the pets they kept or how they kept physically fit. If your purpose were to analyze presidential administrations, you would group information focusing on the presidents' more public actions, say, cabinet appointments and judicial nominations. On the other hand, if you intended to write about the private lives of presidents, you might select information about personal relationships or hobbies.

Make sure the categories you create have a single basis of classification and that the group fits the categories you propose. You may not, for example, write about twentieth century presidents on the basis of the kinds of pets they kept if some of those presidents did not keep pets. The group does not fit the categories. If you intend to talk about all the presidents, you must reinvent the categories so that all the presidents fit into them. In the example below, the group is "all U.S. presidents" and the two categories are "those who kept pets" and "those who did not":

Some U.S. presidents have indulged their love of pets, keeping menageries of animals around the White House, and others have preferred the White House pet-free.

Alternatively, in the following example, the group is "twentieth century U.S. presidential pet-keepers" and the three categories are "dog lovers, cat lovers, and exotic fish enthusiasts."

Twentieth century presidents who kept pets can be classified as dog-lovers, cat-lovers, or exotic fish enthusiasts (for who can really love a fish?).

Developing a thesis
Once you have decided on your group, purpose, and categories, develop a thesis statement that does the following three things:

  • names what group of people or things you intend to classify
  • describes the basis of the classification
  • labels the categories you have developed

Here is a thesis statement for a classification paper written for a Health and Human Fitness class that includes all three of the above elements:

Our last five U.S. presidents have practiced physical fitness regimens that varied from the very formal to the informal. They have been either regular private gym-goers, disciplined public joggers, or casual active sports enthusiasts.

Ordering categories
Order is the way you arrange ideas to show how they relate to one another. For example, it is common to arrange facts and discussion points from most- to least-important or from least- to most-important, or from oldest to most recent or longest to shortest. The example thesis statement above is ordered from most- to least-formal physical fitness activities. There is no one right way; use an ordering system that seems best to suit your purpose and the type of information you are working with.

For example, suppose you are writing about the last five U.S. presidents for a psychology class. If you wish to show that these presidents' public decisions spring directly from negative issues in their personal relationships, you might order your information from most private to most public actions to clearly establish this connection. Or, if you wish to give readers the impression that they are moving into increasingly intimate knowledge of personal presidential foibles, you may choose the reverse, ordering your information from public to private.

Signal words
Signal phrases, or transitions, typically used for classification papers include the following:

  • this type of...
  • several kinds of...
  • in this category...
  • can be divided into...
  • classified according to...
  • is categorized by...

These phrases signal to the reader your intention to divide and sort things. They also contribute to the unity of the paper.

Classification requires that you invent (or discover) abstract categories, impose them on a concrete whole, and derive something new. This is a tall order, but one you can manage if you resist the temptation to skip the brainstorming steps. Remember that clinical dissection is never an aim in itself; the point of classification is to reveal and communicate something meaningful.
