From 0ba1396f709fd60f9696706fcc1b8b3fcb84ed19 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Fri, 30 Jan 2026 15:15:01 -0500 Subject: [PATCH 01/14] initial commit --- docs/src/rosalind/07-iprb.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 docs/src/rosalind/07-iprb.md diff --git a/docs/src/rosalind/07-iprb.md b/docs/src/rosalind/07-iprb.md new file mode 100644 index 0000000..d98034e --- /dev/null +++ b/docs/src/rosalind/07-iprb.md @@ -0,0 +1,17 @@ +# Mendel's First Law + +🤔 [Problem link](https://rosalind.info/problems/iprb/) + +!!! warning "The Problem" + + Probability is the mathematical study of randomly occurring phenomena. We will model such a phenomenon with a random variable, which is simply a variable that can take a number of different distinct outcomes depending on the result of an underlying random process. + + For example, say that we have a bag containing 3 red balls and 2 blue balls. If we let X represent the random variable corresponding to the color of a drawn ball, then the probability of each of the two outcomes is given by Pr(X=red)=$\frac{3}{5}$ and Pr(X=blue)=$\frac{2}{5}$. + + Random variables can be combined to yield new random variables. Returning to the ball example, let Y model the color of a second ball drawn from the bag (without replacing the first ball). The probability of Y being red depends on whether the first ball was red or blue. To represent all outcomes of X and Y, we therefore use a probability tree diagram. This branching diagram represents all possible individual probabilities for X and Y, with outcomes at the endpoints ("leaves") of the tree. The probability of any outcome is given by the product of probabilities along the path from the beginning of the tree. + + An event is simply a collection of outcomes. Because outcomes are distinct, the probability of an event can be written as the sum of the probabilities of its constituent outcomes. For our colored ball example, let A be the event "Y is blue."
Pr(A) is equal to the sum of the probabilities of two different outcomes: Pr(X=blue and Y=blue)+Pr(X=red and Y=blue), or $\frac{1}{10}+\frac{3}{10}=\frac{2}{5}$. + + Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive. + + Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate. \ No newline at end of file From 7e03286f2c51645565c7c4273421076caefcf4d3 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Fri, 30 Jan 2026 16:23:28 -0500 Subject: [PATCH 02/14] rough draft of first solution --- docs/src/rosalind/07-iprb.md | 56 +++++++++++++++++++++++++++++++++++- 1 file changed, 55 insertions(+), 1 deletion(-) diff --git a/docs/src/rosalind/07-iprb.md b/docs/src/rosalind/07-iprb.md index d98034e..0597e01 100644 --- a/docs/src/rosalind/07-iprb.md +++ b/docs/src/rosalind/07-iprb.md @@ -14,4 +14,58 @@ Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive. - Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate. \ No newline at end of file + Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate. + +There are two main ways we can solve this problem: deriving a formula or running a simulation.
+ +### Deriving an Algorithm + +Using the information above, we can derive a formula in terms of k, m, and n that gives the probability of an offspring possessing a dominant allele. We could calculate this probability directly, but it is easier to calculate the complementary probability: that an offspring is homozygous recessive (hh) and therefore displays the recessive trait. Only a few mating combinations can produce such an offspring, so this calculation is straightforward. We then subtract this probability from 1 to get the probability of an offspring displaying the dominant trait. + +To demonstrate how to derive this formula, we will use H and h to denote the dominant and recessive alleles. + +Out of all the possible combinations, only three can produce an offspring with the recessive trait: Hh x Hh, Hh x hh, and hh x hh. For each of these, we must calculate the probability of the mating combination occurring (based on k, m, and n), as well as the probability that the combination produces an offspring with the recessive trait. + +To calculate the probability of a combination, we multiply the probability of picking the first individual by the probability of picking the second individual, given that the first one has already been removed from the population. + +For the combination Hh x Hh, this is $\frac{m}{(k+m+n)}$ multiplied by $\frac{(m-1)}{(k+m+n-1)}$. The probability of selecting the second Hh individual is the number of Hh individuals left after one was already picked (m-1) divided by the total number of individuals left in the population (k+m+n-1). + +A similar calculation is performed for the rest of the combinations. However, it is important to note that the probability of selecting Hh x hh as a mating pair is $\frac{2*m*n}{(k+m+n)(k+m+n-1)}$, as there are two ways to draw this pair: Hh first and then hh, or hh first and then Hh. Order matters!
+ +| Probability of combination occurring | Hh x Hh | Hh x hh | hh x hh | +| --- |---|---|---| +| | $\frac{m(m-1)}{(k+m+n)(k+m+n-1)}$ | $\frac{2*m*n}{(k+m+n)(k+m+n-1)}$| $\frac{n(n-1)}{(k+m+n)(k+m+n-1)}$| + + + +The probability of each combination producing an offspring with the recessive trait can be calculated using Punnett squares. + +| Probability of recessive trait | Hh x Hh | Hh x hh | hh x hh | +| --- |---|---|---| +| | 0.25 | 0.50 | 1 | + +Now, we just have to multiply the probability of each combination occurring by the probability of that combination producing an offspring with the recessive trait, and then sum the results. For Hh x hh, the factor of 2 cancels with the 0.50, leaving $\frac{m*n}{(k+m+n)(k+m+n-1)}$. This leads to the following formula: + +Pr(recessive trait) = +$\frac{m(m-1)}{(k+m+n)(k+m+n-1)}$ x 0.25 + $\frac{m*n}{(k+m+n)(k+m+n-1)}$ + $\frac{n(n-1)}{(k+m+n)(k+m+n-1)}$ + +Therefore, the probability of an offspring displaying the *dominant* trait is 1 - Pr(recessive trait). + +Now that we've derived this formula, let's turn this into code! + +```julia +function mendel(k,m,n) + + total = (k+m+n)*(k+m+n-1) + return 1-( + (0.25*m*(m-1))/total + + m*n/total + + n*(n-1)/total) +end + +mendel(2,2,2) # ≈ 0.78333, the Rosalind sample output +``` + + + +### Simulation Method \ No newline at end of file From 9bbaaf5a1aa9ba907b9bca0cf0f8e5df3de2c5fb Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Sat, 31 Jan 2026 10:04:02 -0500 Subject: [PATCH 03/14] initial commit --- docs/src/rosalind/08-prot.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 docs/src/rosalind/08-prot.md diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md new file mode 100644 index 0000000..16af6f9 --- /dev/null +++ b/docs/src/rosalind/08-prot.md @@ -0,0 +1,23 @@ +# Translating RNA into Protein + +🤔 [Problem link](https://rosalind.info/problems/prot/) + +!!! warning "The Problem" + + The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols.
Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings. + + The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet. + + Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp). + + Return: The protein string encoded by s. + + Sample Dataset + ``` + AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA + ``` + + Sample Output + ``` + MAMAPRTEINSTRING + ``` \ No newline at end of file From 485bb3acdb8c5b2b06363f54b6cff15d3ac34a45 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Tue, 3 Feb 2026 14:11:20 -0500 Subject: [PATCH 04/14] adding thought process to solve problem --- docs/src/rosalind/08-prot.md | 92 +++++++++++++++++++++++++++++++++++- 1 file changed, 91 insertions(+), 1 deletion(-) diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index 16af6f9..5b442d4 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -20,4 +20,94 @@ Sample Output ``` MAMAPRTEINSTRING - ``` \ No newline at end of file + ``` + +### DIY solution +Let's first tackle this problem by writing our own solution. + +First, we will check that this is a coding region by verifying that the string starts with a start codon (`AUG`). If not, we can still convert the string to protein, but we'll throw an error. There may be a frame shift, in which case the returned translation will be incorrect. + +We'll also do a check that the string is divisible by three. If it is not, this will likely mean that there was a mutation in the string (addition or deletion). Again, we can still convert as much of the the string as possible. However, we should alert the user that this result may be incorrect! + +We need to convert this string of DNA to a string of proteins using the RNA codon table. We can convert the RNA codon table into a dictionary, which can map over our codons. 
+ +Then, we'll break the string into codons by slicing at every three characters. These codons can be matched to the strings into the RNA codon table to get the corresponding amino acid. We'll append this amino acid to a string. + +We'll need to deal with any three-character strings that don't match a codon. This likely means that there was a mutation in the input DNA string! If we get a codon that doesn't match, we can return "X" for that amino acid, and continue translating the rest of the string. However, if we get a string X's, that will definitely signal to us that there was some kind of frame shift. + +Now that we have established an approach, let's turn this into code! + +```julia + +dna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" + +# note: this can be created by hand +# or it can be accessed using +codon_table = rna_codon_table = { + # Phenylalanine (F) + 'UUU': 'F', 'UUC': 'F', + # Leucine (L) + 'UUA': 'L', 'UUG': 'L', 'CUU': 'L', 'CUC': 'L', 'CUA': 'L', 'CUG': 'L', + # Isoleucine (I) + 'AUU': 'I', 'AUC': 'I', 'AUA': 'I', + # Methionine (M) - Start Codon + 'AUG': 'M', + # Valine (V) + 'GUU': 'V', 'GUC': 'V', 'GUA': 'V', 'GUG': 'V', + # Serine (S) + 'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S', 'AGU': 'S', 'AGC': 'S', + # Proline (P) + 'CCU': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P', + # Threonine (T) + 'ACU': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', + # Alanine (A) + 'GCU': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A', + # Tyrosine (Y) + 'UAU': 'Y', 'UAC': 'Y', + # Stop Codons (*) + 'UAA': '*', 'UAG': '*', 'UGA': '*', + # Histidine (H) + 'CAU': 'H', 'CAC': 'H', + # Glutamine (Q) + 'CAA': 'Q', 'CAG': 'Q', + # Asparagine (N) + 'AAU': 'N', 'AAC': 'N', + # Lysine (K) + 'AAA': 'K', 'AAG': 'K', + # Aspartic Acid (D) + 'GAU': 'D', 'GAC': 'D', + # Glutamic Acid (E) + 'GAA': 'E', 'GAG': 'E', + # Cysteine (C) + 'UGU': 'C', 'UGC': 'C', + # Tryptophan (W) + 'UGG': 'W', + # Arginine (R) + 'CGU': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'AGA': 'R', 'AGG': 'R', 
+ # Glycine (G) + 'GGU': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G' +} + + +# check if starts with start codon + +# check if string is divisible by three + +# separate string into codons, map over with codon table + +# dealing with codons not in codon_table + +# return amino acid string + +``` + + +### Biojulia Solution + +An alternative way to approach this problem would be to leverage an already written, established function from BioJulia. + +```julia + + + +``` \ No newline at end of file From f0b421b0b2ef23a003c132a229ef735fd9a41519 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Sat, 7 Feb 2026 20:35:26 -0500 Subject: [PATCH 05/14] remove other tutorials --- docs/src/rosalind/06-hamm.md | 137 ----------------------------------- docs/src/rosalind/07-iprb.md | 71 ------------------ 2 files changed, 208 deletions(-) delete mode 100644 docs/src/rosalind/06-hamm.md delete mode 100644 docs/src/rosalind/07-iprb.md diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md deleted file mode 100644 index cd9a0c5..0000000 --- a/docs/src/rosalind/06-hamm.md +++ /dev/null @@ -1,137 +0,0 @@ -# Counting Point Mutations - -🤔 [Problem link](https://rosalind.info/problems/hamm/) - -!!! warning "The Problem" - - Given two strings s and t of equal length, the Hamming distance between s and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. - - - Given: Two DNA strings s and t of equal length (not exceeding 1 kbp). - - Return: The Hamming distance dH(s,t). - - ***Sample Dataset*** - - ``` - GAGCCTACTAACGGGAT - CATCGTAATGACGGCCT - ``` - - ***Sample Output*** - - ``` - 7 - ``` - - -To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. - -The simplest way to solve this problem is to compare the corresponding values in each string for each index and then sum the mismatches. This is the fastest and most idiomatic Julia solution, as it leverages vector math. 
- -Let's give this a try! - -```julia -ex_seq_a = "GAGCCTACTAACGGGAT" -ex_seq_b = "CATCGTAATGACGGCCT" - -count(i-> ex_seq_a[i] != ex_seq_b[i], eachindex(ex_seq_a)) -``` - - - -### For Loop - -Another way we can approach this would be to use the for-loop. This method will be a bit slower. - -We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. - - Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable. - - - -```julia -ex_seq_a = "GAGCCTACTAACGGGAT" -ex_seq_b = "CATCGTAATGACGGCCT" - -function hamming(seq_a, seq_b) - - - # check if the strings are empty - if isempty(seq_a) - throw(ErrorException("empty sequences")) - end - - # check if the strings are different lengths - if length(seq_a) != length(seq_b) - throw(ErrorException(" sequences have different lengths")) - end - - mismatches = 0 - for i in 1:length(seq_a) - if seq_a[i] != seq_b[i] - mismatches += 1 - end - end - return mismatches -end - -hamming(ex_seq_a, ex_seq_b) - -``` - - - -## BioAlignments method - -Instead of writing your own function, an alternative would be to use the readily-available Hamming Distance [function](https://github.com/BioJulia/BioAlignments.jl/blob/0f3cc5e1ac8b34fdde23cb3dca7afb9eb480322f/src/pairwise/algorithms/hamming_distance.jl#L4) in the `BioAlignments.jl` package. 
- -```julia -using BioAlignments - -ex_seq_a = "GAGCCTACTAACGGGAT" -ex_seq_b = "CATCGTAATGACGGCCT" - -bio_hamming = BioAlignments.hamming_distance(Int64, ex_seq_a, ex_seq_b) - -bio_hamming[1] - -``` - -```julia -# Double check that we got the same values from both ouputs -@assert calcHamming(ex_seq_a, ex_seq_b) == bio_hamming[1] -``` - - - The BioAlignments `hamming_distance` function requires three input variables -- the first of which allows the user to control the `type` of the returned hamming distance value. - - In the above example, `Int64` is provided as the first input variable, but `Float64` or `Int8` are also acceptable inputs. The second two input variables are the two sequences that are being compared. - - There are two outputs of this function: the actual Hamming Distance value and the Alignment Anchor. The Alignment Anchor is a one-dimensional array (vector) that is the same length as the length of the input strings. - - Each value in the vector is also an AlignmentAnchor with three fields: sequence position, reference position, and an operation code ('0' for start, '=' for match, 'X' for mismatch). - - The Alignment Anchor for the above example is: - ``` - AlignmentAnchor[AlignmentAnchor(0, 0, '0'), AlignmentAnchor(1, 1, 'X'), AlignmentAnchor(2, 2, '='), AlignmentAnchor(3, 3, 'X'), AlignmentAnchor(4, 4, '='), AlignmentAnchor(5, 5, 'X'), AlignmentAnchor(7, 7, '='), AlignmentAnchor(8, 8, 'X'), AlignmentAnchor(9, 9, '='), AlignmentAnchor(10, 10, 'X'), AlignmentAnchor(14, 14, '='), AlignmentAnchor(16, 16, 'X'), AlignmentAnchor(17, 17, '=')] - ``` - - ### Distances.Jl method - - Another package that calculates the Hamming distance is the [Distances package](https://github.com/JuliaStats/Distances.jl). 
We can call its `hamming` function on our two test sequences: - - - -```julia -using Distances - -ex_seq_a = "GAGCCTACTAACGGGAT" -ex_seq_b = "CATCGTAATGACGGCCT" - -Distances.hamming(ex_seq_a, ex_seq_b) -``` - - - - diff --git a/docs/src/rosalind/07-iprb.md b/docs/src/rosalind/07-iprb.md deleted file mode 100644 index 0597e01..0000000 --- a/docs/src/rosalind/07-iprb.md +++ /dev/null @@ -1,71 +0,0 @@ -# Counting Point Mutations - -🤔 [Problem link](https://rosalind.info/problems/iprb/) - -!!! warning "The Problem" - - Probability is the mathematical study of randomly occurring phenomena. We will model such a phenomenon with a random variable, which is simply a variable that can take a number of different distinct outcomes depending on the result of an underlying random process. - - For example, say that we have a bag containing 3 red balls and 2 blue balls. If we let X represent the random variable corresponding to the color of a drawn ball, then the probability of each of the two outcomes is given by Pr(X=red)=35 and Pr(X=blue)=25. - - Random variables can be combined to yield new random variables. Returning to the ball example, let Y model the color of a second ball drawn from the bag (without replacing the first ball). The probability of Y being red depends on whether the first ball was red or blue. To represent all outcomes of X and Y, we therefore use a probability tree diagram. This branching diagram represents all possible individual probabilities for X and Y, with outcomes at the endpoints ("leaves") of the tree. The probability of any outcome is given by the product of probabilities along the path from the beginning of the tree. - - An event is simply a collection of outcomes. Because outcomes are distinct, the probability of an event can be written as the sum of the probabilities of its constituent outcomes. For our colored ball example, let A be the event "Y is blue." 
Pr(A) is equal to the sum of the probabilities of two different outcomes: Pr(X=blue and Y=blue)+Pr(X=red and Y=blue), or 310+110=25. - - Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive. - - Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate. - -There are two main ways we can solve this problem: deriving an algorithm or simulation. - -### Deriving an Algorithm - -Using the information above, we can derive an algorithm using the variables k, m, and n that will calculate the probability of a progeny possessing a dominant allele. We could either calculate the probability of a progeny having a dominant allele, but in this case, it is easier to calculate the likelihood of a progeny having a recessive allele. This is a relatively rarer event, and the calculation will be straightforward. We just have to subtract this probability from 1 to get the overall likelihood of having a progeny with a dominant trait. - -To demonstrate how to derive this algorithm, we can use H and h to signify dominant and recessive alleles. - -Out of all the possible combinations, we will only get a progeny with a recessive trait in three situations: Hh x Hh, Hh x hh, and hh x hh. For all of these situations, we must calculate the probability of these mating combinations occuring (based on k, m, and n), as well as the probability of these events leading to a progeny with a recessive trait. - -To calculate this, we must the probability of picking the first mating pair and then the second mating pair. - -For the combination Hh x Hh, this is $\frac{m}{(k+m+n)}$ multiplied by $\frac{(m-1)}{(k+m+n-1)}$. 
Selecting the second Hh individual is equal to the number of Hh individuals left after 1 was already picked (m-1) divided by the total individuals left in the population (k+m+n-1). - -A similar calculation is performed for the rest of the combinations. However, it is important to note that the probability of selecting Hh x hh as a mating pair is $\frac{2*m*n}{(k+m+n)(k+m+n-1)}$, as there are two ways to choose this combination. Hh x hh can be selected, as well as hh x Hh. Order matters! - -| Probability of combination occuring | Hh x Hh | Hh x hh | hh x hh | -| --- |---|---|---| -| | $\frac{m(m-1)}{(k+m+n)(k+m+n-1)}$ | $\frac{2*m*n}{(k+m+n)(k+m+n-1)}$| $\frac{n(n-1)}{(k+m+n)(k+m+n-1)}$| - - - -The probability of these combinations leading to a recessive trait can be calculated using Punnet Squares. - -| Probability of recessive trait | Hh x Hh | Hh x hh | hh x hh | -| --- |---|---|---| -| | 0.25 | 0.50 | 1 | - -Now, we just have to sum the multiply the probability of each combination occuring by the probability of this combination leading to a recessive trait. This leads to the following formula: - -Pr(recessive trait) = -$\frac{m(m-1)}{(k+m+n)(k+m+n-1)}$ x 0.25 + $\frac{m*n}{(k+m+n)(k+m+n-1)}$ + $\frac{n(n-1)}{(k+m+n)(k+m+n-1)}$ - -Therefore, the probability of selecting an individual with a *dominant* trait is 1 - Pr(recessive trait). - -Now that we've derived this formula, let's turn this into code! 
- -```julia -function mendel(k,m,n) - - total = (k+m+n)*(k+m+n-1) - return 1-( - (0.25*m*(m-1))/total + - m*n/total + - n*(n-1)/total) -end - -mendel(2,2,2) -``` - - - -### Simulation Method \ No newline at end of file From 3b086dbb5f57e5e3e62ec5ca2da61300b77d0622 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Sat, 7 Feb 2026 21:49:05 -0500 Subject: [PATCH 06/14] add BioSequences translate solution --- docs/src/rosalind/08-prot.md | 144 ++++++++++++++++++++--------------- 1 file changed, 83 insertions(+), 61 deletions(-) diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index 5b442d4..fe523b4 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -4,11 +4,15 @@ !!! warning "The Problem" - The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings. + The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet. + (all letters except for B, J, O, U, X, and Z). + Protein strings are constructed from these 20 symbols. + Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings. The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet. - Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp). + Given: An RNA string s corresponding to a strand of mRNA. + (of length at most 10 kbp). Return: The protein string encoded by s. @@ -25,68 +29,63 @@ ### DIY solution Let's first tackle this problem by writing our own solution. -First, we will check that this is a coding region by verifying that the string starts with a start codon (`AUG`). 
If not, we can still convert the string to protein, but we'll throw an error. There may be a frame shift, in which case the returned translation will be incorrect. - -We'll also do a check that the string is divisible by three. If it is not, this will likely mean that there was a mutation in the string (addition or deletion). Again, we can still convert as much of the the string as possible. However, we should alert the user that this result may be incorrect! - -We need to convert this string of DNA to a string of proteins using the RNA codon table. We can convert the RNA codon table into a dictionary, which can map over our codons. - -Then, we'll break the string into codons by slicing at every three characters. These codons can be matched to the strings into the RNA codon table to get the corresponding amino acid. We'll append this amino acid to a string. - -We'll need to deal with any three-character strings that don't match a codon. This likely means that there was a mutation in the input DNA string! If we get a codon that doesn't match, we can return "X" for that amino acid, and continue translating the rest of the string. However, if we get a string X's, that will definitely signal to us that there was some kind of frame shift. - -Now that we have established an approach, let's turn this into code! +First, we will check that this is a coding region by verifying that the string starts with a start codon (`AUG`). +If not, we can still convert the string to protein, +but we'll throw an error. +There may be a frame shift, +in which case the returned translation will be incorrect. + +We'll also do a check that the string is divisible by three. +If it is not, this will likely mean that there was a mutation in the string +(addition or deletion). +Again, we can still convert as much of the the string as possible. +However, we should alert the user that this result may be incorrect! 
+ +We need to convert this RNA string into a protein string using the RNA codon table. +We can convert the RNA codon table into a dictionary, +which maps each codon to its amino acid. +Alternatively, we could import this from the BioSequences package, +as this is already defined [there](https://github.com/BioJulia/BioSequences.jl/blob/b626dbcaad76217b248449e6aa2cc1650e95660c/src/geneticcode.jl#L132). + +Then, we'll break the string into codons by slicing it every three characters. +Each codon can then be looked up in the codon table to get the corresponding amino acid. +We'll append each amino acid to the protein string. + +We'll need to deal with any three-character strings that don't match a codon in the table. +This can happen if the input contains characters other than A, C, G, and U, or if a partial codon is left over at the end of the string. +If we get a codon that doesn't match, +we can return "X" for that amino acid, +and continue translating the rest of the string. +Note that a frame shift will not show up as X's, +since every combination of three RNA bases is a valid codon; instead, it produces a garbled protein, which often contains an early stop codon (`*`). + +Now that we have established an approach, +let's turn this into code!
```julia -dna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" +rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" # note: this can be created by hand # or it can be accessed using -codon_table = rna_codon_table = { - # Phenylalanine (F) - 'UUU': 'F', 'UUC': 'F', - # Leucine (L) - 'UUA': 'L', 'UUG': 'L', 'CUU': 'L', 'CUC': 'L', 'CUA': 'L', 'CUG': 'L', - # Isoleucine (I) - 'AUU': 'I', 'AUC': 'I', 'AUA': 'I', - # Methionine (M) - Start Codon - 'AUG': 'M', - # Valine (V) - 'GUU': 'V', 'GUC': 'V', 'GUA': 'V', 'GUG': 'V', - # Serine (S) - 'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S', 'AGU': 'S', 'AGC': 'S', - # Proline (P) - 'CCU': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P', - # Threonine (T) - 'ACU': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', - # Alanine (A) - 'GCU': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A', - # Tyrosine (Y) - 'UAU': 'Y', 'UAC': 'Y', - # Stop Codons (*) - 'UAA': '*', 'UAG': '*', 'UGA': '*', - # Histidine (H) - 'CAU': 'H', 'CAC': 'H', - # Glutamine (Q) - 'CAA': 'Q', 'CAG': 'Q', - # Asparagine (N) - 'AAU': 'N', 'AAC': 'N', - # Lysine (K) - 'AAA': 'K', 'AAG': 'K', - # Aspartic Acid (D) - 'GAU': 'D', 'GAC': 'D', - # Glutamic Acid (E) - 'GAA': 'E', 'GAG': 'E', - # Cysteine (C) - 'UGU': 'C', 'UGC': 'C', - # Tryptophan (W) - 'UGG': 'W', - # Arginine (R) - 'CGU': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'AGA': 'R', 'AGG': 'R', - # Glycine (G) - 'GGU': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G' -} +codon_table = Dict{String,Char}( + "AAA" => 'K', "AAC" => 'N', "AAG" => 'K', "AAU" => 'N', + "ACA" => 'T', "ACC" => 'T', "ACG" => 'T', "ACU" => 'T', + "AGA" => 'R', "AGC" => 'S', "AGG" => 'R', "AGU" => 'S', + "AUA" => 'I', "AUC" => 'I', "AUG" => 'M', "AUU" => 'I', + "CAA" => 'Q', "CAC" => 'H', "CAG" => 'Q', "CAU" => 'H', + "CCA" => 'P', "CCC" => 'P', "CCG" => 'P', "CCU" => 'P', + "CGA" => 'R', "CGC" => 'R', "CGG" => 'R', "CGU" => 'R', + "CUA" => 'L', "CUC" => 'L', "CUG" => 'L', "CUU" => 'L', + "GAA" => 'E', "GAC" => 'D', "GAG" => 'E', "GAU" => 'D', + "GCA" 
=> 'A', "GCC" => 'A', "GCG" => 'A', "GCU" => 'A', + "GGA" => 'G', "GGC" => 'G', "GGG" => 'G', "GGU" => 'G', + "GUA" => 'V', "GUC" => 'V', "GUG" => 'V', "GUU" => 'V', + "UAA" => '*', "UAC" => 'Y', "UAG" => '*', "UAU" => 'Y', + "UCA" => 'S', "UCC" => 'S', "UCG" => 'S', "UCU" => 'S', + "UGA" => '*', "UGC" => 'C', "UGG" => 'W', "UGU" => 'C', + "UUA" => 'L', "UUC" => 'F', "UUG" => 'L', "UUU" => 'F', + ) # check if starts with start codon @@ -102,12 +101,35 @@ codon_table = rna_codon_table = { ``` -### Biojulia Solution +### BioSequences Solution -An alternative way to approach this problem would be to leverage an already written, established function from BioJulia. +An alternative way to approach this problem would be to leverage an already written, +established function from the BioSequences package in BioJulia. ```julia +using BioSequences + +rna = rna"AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" + +translate(rna) # MAMAPRTEINSTRING*: the final stop codon UGA is translated to '*' + +``` + +This function is straightforward to use. +It also accepts several optional arguments that change its behavior. + +For instance, the function defaults to using the standard genetic code. +However, if a user wishes to use another codon chart +(for example, the yeast or invertebrate mitochondrial codes), +there are others available in [BioSequences.jl](https://github.com/BioJulia/BioSequences.jl/blob/b626dbcaad76217b248449e6aa2cc1650e95660c/src/geneticcode.jl#L130) to choose from. +By default, `allow_ambiguous_codons` is `true`. +In that case, if the mRNA string contains ambiguous codons (for example, codons containing `N`), +these codons will be translated to the narrowest amino acid that covers all +non-ambiguous codons encompassed by the ambiguous codon. +If `allow_ambiguous_codons` is set to `false`, ambiguous codons will cause an error instead. +Additionally, `alternative_start` is `false` by default. +If set to `true`, the first codon will always be translated to methionine (`M`), regardless of which codon it is.
+ +-``` \ No newline at end of file +Finally, the BioSequences function throws an error if the length of the input mRNA string is not divisible by 3. \ No newline at end of file From a80c19823df316355e0ea945b7318e798fc42142 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Sun, 8 Feb 2026 21:43:36 -0500 Subject: [PATCH 07/14] add hand-written solution --- docs/src/rosalind/08-prot.md | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index fe523b4..ff14359 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -87,16 +87,21 @@ codon_table = Dict{String,Char}( "UUA" => 'L', "UUC" => 'F', "UUG" => 'L', "UUU" => 'F', ) - -# check if starts with start codon - -# check if string is divisible by three - -# separate string into codons, map over with codon table - -# dealing with codons not in codon_table - -# return amino acid string +function translate_mrna(seq) + # check if starts with start codon + if startswith(seq, "AUG") + warn("this sequence does not start with AUG") + end + # check if string is divisible by three + if seq%3!=0 + warn("this
sequence is not divisible by 3") end - # separate string into codons, map over with codon table + # separate string into codons codons = (join(chunk) for chunk in Iterators.partition(seq, 3)) - - protein = join(codon_table[c] for c in codons if haskey(codon_table, c)) - # return amino acid string + # map over codons with codon table + aa_string = join(get(codon_table, c, "X") for c in codons) + # return amino acid string + return(aa_string) ``` From 2d54bc9b0ac8c364137c964ce497188a9f8ce784 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Mon, 9 Feb 2026 12:32:41 -0500 Subject: [PATCH 09/14] fix bugs in warnings --- Project.toml | 9 --------- docs/src/rosalind/08-prot.md | 18 ++++++++++++------ 2 files changed, 12 insertions(+), 15 deletions(-) delete mode 100644 Project.toml diff --git a/Project.toml b/Project.toml deleted file mode 100644 index 0f2eea5..0000000 --- a/Project.toml +++ /dev/null @@ -1,9 +0,0 @@ -name = "BioTutorials" -uuid = "33e7be4a-8e14-4baf-892c-424bb664d307" -authors = ["Kevin Bonham (@kescobo)", "Kenta Sato (@bicycle1885)"] -title = "BioTutorials" -version = "0.1.0" - -[deps] -BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59" -Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306" diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index 6499b56..bd4bf16 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -63,6 +63,7 @@ Now that we have established an approach, let's turn this into code! ```julia +using Test rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" @@ -87,23 +88,28 @@ codon_table = Dict{String,Char}( "UUA" => 'L', "UUC" => 'F', "UUG" => 'L', "UUU" => 'F', ) -function translate_mrna(seq) +function translate_mrna(seq, codon_table) + # check if starts with start codon - if startswith(seq, "AUG") - warn("this sequence does not start with AUG") + if ! 
startswith(seq, "AUG") + @warn "this sequence does not start with AUG" end # check if string is divisible by three - if seq%3!=0 - warn("this sequence is not divisible by 3") + if rem(length(seq), 3) != 0 + @warn "this sequence is not divisible by 3" end # separate string into codons codons = (join(chunk) for chunk in Iterators.partition(seq, 3)) - # map over codons with codon table + # map over codons with codon table, return X if not in codon_table aa_string = join(get(codon_table, c, "X") for c in codons) # return amino acid string return(aa_string) + + end + +translate_mrna(rna, codon_table) ``` From 96f1be8e67665cc967980c0d980d81f66947dfbc Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Mon, 9 Feb 2026 13:44:43 -0500 Subject: [PATCH 10/14] fix typos --- docs/src/rosalind/08-prot.md | 45 ++++++++++++++++++++---------------- 1 file changed, 25 insertions(+), 20 deletions(-) diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index bd4bf16..2ef736a 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -27,37 +27,39 @@ ``` ### DIY solution -Let's first tackle this problem by writing our own solution. +Let's tackle this problem by writing our own solution, +and then seeing how we can solve it with functions already available in BioJulia. First, we will check that this is a coding region by verifying that the string starts with a start codon (`AUG`). If not, we can still convert the string to protein, -but we'll throw an error. +but we'll throw a warning to alert the user. There may be a frame shift, in which case the returned translation will be incorrect. We'll also do a check that the string is divisible by three. If it is not, this will likely mean that there was a mutation in the string (addition or deletion). -Again, we can still convert as much of the the string as possible. -However, we should alert the user that this result may be incorrect! +Again, we can still convert as much of the string as possible. 
+However, we should alert the user that the result may be incorrect! -We need to convert this string of DNA to a string of proteins using the RNA codon table. +Next, we'll need to convert this mRNA string into a protein string (a string of amino acids) using the RNA codon table. We can convert the RNA codon table into a dictionary, which can map over our codons. Alternatively, we could also import this from the BioSequences package, as this is already defined [there](https://github.com/BioJulia/BioSequences.jl/blob/b626dbcaad76217b248449e6aa2cc1650e95660c/src/geneticcode.jl#L132). -Then, we'll break the string into codons by slicing at every three characters. -These codons can be matched to the strings into the RNA codon table to get the corresponding amino acid. -We'll append this amino acid to a string. +Then, we'll break the string into codons by slicing it every three characters. +These codons can be matched against the RNA codon table to get the corresponding amino acid. +We'll join all these amino acids together to form the final string. -We'll need to deal with any three-character strings that don't match a codon. -This likely means that there was a mutation in the input DNA string! +Lastly, we'll need to deal with any three-character strings that don't match a codon. +This likely means that there was a mutation in the input mRNA string! If we get a codon that doesn't match, we can return "X" for that amino acid, and continue translating the rest of the string. -However, if we get a string X's, -that will definitely signal to us that there was some kind of frame shift. +If we get a long run of X's, +that should signal to the user that there was some kind of frame shift. + Now that we have established an approach, let's turn this into code!
@@ -68,7 +70,7 @@ using Test rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" # note: this can be created by hand -# or it can be accessed using +# or it can be accessed from the BioSequences package (see link above) codon_table = Dict{String,Char}( "AAA" => 'K', "AAC" => 'N', "AAG" => 'K', "AAU" => 'N', "ACA" => 'T', "ACC" => 'T', "ACG" => 'T', "ACU" => 'T', @@ -127,21 +129,24 @@ translate(rna"AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA") ``` -This function is straightforward to use. -However, there are also additional parameters for us to use. +This function is straightforward to use, +especially in the case where the input mRNA has no ambiguous codons +and is divisible by 3. +However, there are also additional parameters available for handling other types of strings. For instance, the function defaults to using the standard genetic code. However, if a user wishes to use another codon chart (for example, yeast or invertebrate), there are others available on [BioSequences.jl](https://github.com/BioJulia/BioSequences.jl/blob/b626dbcaad76217b248449e6aa2cc1650e95660c/src/geneticcode.jl#L130) to choose from. -By default `allow_ambiguous_codons` is `true`. -However, if a user is giving the function a mRNA string with ambiguous codons that may not be found in the standard genetic code, -these codons will be translated to the most narrow amino acid which covers all +By default, `allow_ambiguous_codons` is `true`. +If a user gives the function a mRNA string with ambiguous codons that may not be found in the standard genetic code, +these codons will be translated to the narrowest amino acid which covers all non-ambiguous codons encompassed by the ambiguous codon. -By default, ambiguous codons will cause an error. +If this option is turned off, +ambiguous codons will cause an error. Additionally, `alternative_start` is `false` by default. -If set to true, the starting codon will be Methionine regardless of the starting codon. 
+If set to true, the starting amino acid will be Methionine regardless of what the first codon is. Similar to our function, the BioSequences function also throws an error if the input mRNA string is not divisible by 3. \ No newline at end of file From 79236d3cf743d2dc2a6f11fe3e136cea32c05c8b Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Mon, 9 Feb 2026 13:57:00 -0500 Subject: [PATCH 11/14] fix indentation in translate_mrna func --- docs/src/rosalind/08-prot.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index 2ef736a..1cf72c4 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -109,7 +109,7 @@ function translate_mrna(seq, codon_table) # return amino acid string return(aa_string) - end +end translate_mrna(rna, codon_table) ``` From 48435ad307f3b97cb69103c547ce8247778507ca Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Wed, 11 Feb 2026 19:39:10 -0500 Subject: [PATCH 12/14] make small updates according to Kevin's comments --- docs/src/rosalind/08-prot.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index 1cf72c4..652889e 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -71,7 +71,7 @@ rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" # note: this can be created by hand # or it can be accessed from the BioSequences package (see link above) -codon_table = Dict{String,Char}( +codon_table = Dict( "AAA" => 'K', "AAC" => 'N', "AAG" => 'K', "AAU" => 'N', "ACA" => 'T', "ACC" => 'T', "ACG" => 'T', "ACU" => 'T', "AGA" => 'R', "AGC" => 'S', "AGG" => 'R', "AGU" => 'S', @@ -101,13 +101,14 @@ function translate_mrna(seq, codon_table) @warn "this sequence is not divisible by 3" end # separate string into codons + # this makes a generator, which allocates less memory than a vector codons = (join(chunk) for chunk in Iterators.partition(seq, 
3)) # map over codons with codon table, return X if not in codon_table aa_string = join(get(codon_table, c, "X") for c in codons) # return amino acid string - return(aa_string) + return aa_string end @@ -123,7 +124,7 @@ established function from the BioSequences package in BioJulia. ```julia using BioSequences -rna =("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA") +rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" translate(rna"AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA") From 0ff3ac6c673158a6841e1a5fb65839c078adfe41 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Wed, 18 Feb 2026 10:36:14 -0500 Subject: [PATCH 13/14] add more examples of ambiguous codons and mrna strands not divisible by 3 --- docs/src/rosalind/08-prot.md | 65 ++++++++++++++++++++++++++++++++---- 1 file changed, 59 insertions(+), 6 deletions(-) diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index 652889e..98c05d8 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -115,17 +115,33 @@ end translate_mrna(rna, codon_table) ``` +Let's test that our function correctly deals with non-conventional mRNA strings. + +If we change the input string to include a codon that is not present in the codon table, +we should not get a warning, because the function only checks the start codon and the length. +Instead, the unknown codon should be translated to the placeholder "X". +```julia +translate_mrna("AUGNCG", codon_table) +``` + +Next, let's confirm that an input mRNA strand with a length that is not divisible by 3 produces the correct warning. + +```julia +translate_mrna("AUGGC", codon_table) +``` + + + + ### BioSequences Solution -An alternative way to approach this problem would be to leverage an already written, -established function from the BioSequences package in BioJulia. +An alternative way to approach this problem would be to leverage +an established function from the BioSequences package in BioJulia.
```julia using BioSequences -rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" - translate(rna"AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA") ``` @@ -133,13 +149,22 @@ translate(rna"AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA") This function is straightforward to use, especially in the case where the input mRNA has no ambiguous codons and is divisible by 3. -However, there are also additional parameters available for handling other types of strings. +However, there are also additional parameters available for handling other types of strings. For instance, the function defaults to using the standard genetic code. However, if a user wishes to use another codon chart (for example, yeast or invertebrate), there are others available on [BioSequences.jl](https://github.com/BioJulia/BioSequences.jl/blob/b626dbcaad76217b248449e6aa2cc1650e95660c/src/geneticcode.jl#L130) to choose from. + +For example, we can translate the same input mRNA string. +using the vertebrate mitochondrial genetic code! + +```julia +translate(rna"AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA", code=BioSequences.vertebrate_mitochondrial_genetic_code) + +``` + By default, `allow_ambiguous_codons` is `true`. If a user gives the function a mRNA string with ambiguous codons that may not be found in the standard genetic code, these codons will be translated to the narrowest amino acid which covers all @@ -147,7 +172,35 @@ non-ambiguous codons encompassed by the ambiguous codon. If this option is turned off, ambiguous codons will cause an error. +For example, the input mRNA string below includes the codon `NCG`, +which is ambiguous because `N` can stand for any nucleotide. +It could therefore be `ACG` (Threonine), +`CCG` (Proline), `UCG` (Serine), or `GCG` (Alanine), +which encode four different amino acids. + +`allow_ambiguous_codons` is `true` by default, +so this mRNA strand is translated to `MX`, where `X` represents an unknown amino acid.
+ ```julia +translate(rna"AUGNCG") +``` + +However, if `allow_ambiguous_codons` is changed to `false`, +an error is thrown instead, since `NCG` cannot be resolved to a single amino acid. + +```julia +translate(rna"AUGNCG", allow_ambiguous_codons=false) +``` + Additionally, `alternative_start` is `false` by default. If set to true, the starting amino acid will be Methionine regardless of what the first codon is. -Similar to our function, the BioSequences function also throws an error if the input mRNA string is not divisible by 3. \ No newline at end of file +```julia +translate(rna"AUCGAC", alternative_start = true) +``` + +Unlike our function, which only emits a warning, the BioSequences function throws an error if the length of the input mRNA string is not divisible by 3. + +```julia +translate(rna"AUGGA") +``` \ No newline at end of file From aaef36b7b050942c838b44a8936015e652c47aab Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Wed, 18 Feb 2026 10:38:13 -0500 Subject: [PATCH 14/14] fix typos --- docs/src/rosalind/08-prot.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/src/rosalind/08-prot.md b/docs/src/rosalind/08-prot.md index 98c05d8..aedd84a 100644 --- a/docs/src/rosalind/08-prot.md +++ b/docs/src/rosalind/08-prot.md @@ -157,7 +157,7 @@ However, if a user wishes to use another codon chart there are others available on [BioSequences.jl](https://github.com/BioJulia/BioSequences.jl/blob/b626dbcaad76217b248449e6aa2cc1650e95660c/src/geneticcode.jl#L130) to choose from. -For example, we can translate the same input mRNA string. +For example, we can translate the same input mRNA string +using the vertebrate mitochondrial genetic code! ```julia @@ -167,7 +167,8 @@ translate(rna"AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA", code=BioSequ By default, `allow_ambiguous_codons` is `true`.
If a user gives the function a mRNA string with ambiguous codons that may not be found in the standard genetic code, -these codons will be translated to the narrowest amino acid which covers all +these codons will be translated to the narrowest amino acid +which covers all non-ambiguous codons encompassed by the ambiguous codon. If this option is turned off, ambiguous codons will cause an error.