From d62948cd6a549f6c59b955939b61e8366c3d48d0 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Tue, 27 Jan 2026 11:19:25 -0500 Subject: [PATCH 01/12] outline of hamming distance tutorial --- docs/src/rosalind/06-hamming.md | 92 +++++++++++++++++++++++++++++++++ 1 file changed, 92 insertions(+) create mode 100644 docs/src/rosalind/06-hamming.md diff --git a/docs/src/rosalind/06-hamming.md b/docs/src/rosalind/06-hamming.md new file mode 100644 index 0000000..7763ebf --- /dev/null +++ b/docs/src/rosalind/06-hamming.md @@ -0,0 +1,92 @@ +# Counting Point Mutations + +!!! warning "The Problem" + Problem + + Given two strings s and t of equal length, the Hamming distance between s + and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. + + Given: Two DNA strings s and t of equal length (not exceeding 1 kbp). + + Return: The Hamming distance dH(s,t). + + ***Sample Dataset*** + + ``` + GAGCCTACTAACGGGAT + CATCGTAATGACGGCCT + ``` + + ***Sample Output*** + ``` + 7 + ``` + + +To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. Therefore, we can calculate the Hamming Distance by looping over one of the strings and checking if the corresponding character in the other string matches. Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable. + +Let's give this a try! 
+ +```julia +SampleSeqA = "GAGCCTACTAACGGGAT" +SampleSeqB = "CATCGTAATGACGGCCT" + +function calcHamming(SeqA, SeqB) + SeqLength = length(SeqA) + + # check if the strings are empty + if SeqLength == 0 + return 0 + end + + mismatches = 0 + for i in 1:SeqLength + # print(i) + if SeqA[i] != SeqB[i] + mismatches += 1 + end + end + return mismatches +end + +calcHamming(SampleSeqA, SampleSeqB) + +``` + + + +## BioAlignments method + +Instead of writing your own function, an alternative would be to use the readily-available Hamming Distance [function](https://github.com/BioJulia/BioAlignments.jl/blob/0f3cc5e1ac8b34fdde23cb3dca7afb9eb480322f/src/pairwise/algorithms/hamming_distance.jl#L4) in the `BioAlignments.jl` package. + +```julia +using BioAlignments + +seqA = "GAGCCTACTAACGGGAT" +seqB = "CATCGTAATGACGGCCT" + +BioAlignmentsHamming = BioAlignments.hamming_distance(Int64, "GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT") + +BioAlignmentsHamming[1] + +``` + +```julia +# Double check that we got the same values from both ouputs +@assert calcHamming(SampleSeqA, SampleSeqB) == BioAlignmentsHamming[1] +``` + + + The BioAlignments `hamming_distance` function requires three input variables -- the first of which allows the user to control the `type` of the returned hamming distance value. In the above example, `Int64` is provided as the input variable, but `Float64` or `UInt8` are also acceptable inputs. + + The second two input variables are the two sequences that are being compared. + + There are two outputs of this function: the actual Hamming Distance value and the Alignment Anchor. The Alignment Anchor is a a one-dimensional array (vector) that is the same length as the length of the input strings. Each value in the vector is a also an AlignmentAnchor with three fields: sequence position, reference position, and an operation code ('0' for start, '=' for match, 'X' for mismatch). 
+ + The Alignment Anchor for the above example is + ``` + AlignmentAnchor[AlignmentAnchor(0, 0, '0'), AlignmentAnchor(1, 1, 'X'), AlignmentAnchor(2, 2, '='), AlignmentAnchor(3, 3, 'X'), AlignmentAnchor(4, 4, '='), AlignmentAnchor(5, 5, 'X'), AlignmentAnchor(7, 7, '='), AlignmentAnchor(8, 8, 'X'), AlignmentAnchor(9, 9, '='), AlignmentAnchor(10, 10, 'X'), AlignmentAnchor(14, 14, '='), AlignmentAnchor(16, 16, 'X'), AlignmentAnchor(17, 17, '=')] + + ``` + + From 7e2b4045be55c52a784e5b98161595ca7d528881 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Tue, 27 Jan 2026 16:20:43 -0500 Subject: [PATCH 02/12] slight text edits --- docs/src/rosalind/06-hamming.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/docs/src/rosalind/06-hamming.md b/docs/src/rosalind/06-hamming.md index 7763ebf..a8010b7 100644 --- a/docs/src/rosalind/06-hamming.md +++ b/docs/src/rosalind/06-hamming.md @@ -1,10 +1,8 @@ # Counting Point Mutations !!! warning "The Problem" - Problem - Given two strings s and t of equal length, the Hamming distance between s - and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. + Given two strings s and t of equal length, the Hamming distance between s and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. Given: Two DNA strings s and t of equal length (not exceeding 1 kbp). @@ -18,12 +16,13 @@ ``` ***Sample Output*** + ``` 7 ``` -To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. Therefore, we can calculate the Hamming Distance by looping over one of the strings and checking if the corresponding character in the other string matches. Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable. +To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. 
We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable. Let's give this a try! @@ -41,7 +40,6 @@ function calcHamming(SeqA, SeqB) mismatches = 0 for i in 1:SeqLength - # print(i) if SeqA[i] != SeqB[i] mismatches += 1 end @@ -77,11 +75,11 @@ BioAlignmentsHamming[1] ``` - The BioAlignments `hamming_distance` function requires three input variables -- the first of which allows the user to control the `type` of the returned hamming distance value. In the above example, `Int64` is provided as the input variable, but `Float64` or `UInt8` are also acceptable inputs. + The BioAlignments `hamming_distance` function requires three input variables -- the first of which allows the user to control the `type` of the returned hamming distance value. In the above example, `Int64` is provided as the input variable, but `Float64` or `Int8` are also acceptable inputs. The second two input variables are the two sequences that are being compared. - There are two outputs of this function: the actual Hamming Distance value and the Alignment Anchor. The Alignment Anchor is a a one-dimensional array (vector) that is the same length as the length of the input strings. Each value in the vector is a also an AlignmentAnchor with three fields: sequence position, reference position, and an operation code ('0' for start, '=' for match, 'X' for mismatch). + There are two outputs of this function: the actual Hamming Distance value and the Alignment Anchor. The Alignment Anchor is a a one-dimensional array (vector) that is the same length as the length of the input strings. 
Each value in the vector is also an AlignmentAnchor with three fields: sequence position, reference position, and an operation code ('0' for start, '=' for match, 'X' for mismatch). The Alignment Anchor for the above example is ``` From 4f3c4e6f4330263596e724f499150cbfe64bdb09 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Thu, 29 Jan 2026 21:42:24 -0500 Subject: [PATCH 03/12] make edits according to kevin's suggestions --- docs/src/rosalind/06-hamm.md | 104 ++++++++++++++++++++++++++++++++ docs/src/rosalind/06-hamming.md | 90 --------------------------- 2 files changed, 104 insertions(+), 90 deletions(-) create mode 100644 docs/src/rosalind/06-hamm.md delete mode 100644 docs/src/rosalind/06-hamming.md diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md new file mode 100644 index 0000000..e804910 --- /dev/null +++ b/docs/src/rosalind/06-hamm.md @@ -0,0 +1,104 @@ +# Counting Point Mutations + +🤔 [Problem link](https://rosalind.info/problems/hamm/) + +!!! warning "The Problem" + + Given two strings s and t of equal length, the Hamming distance between s and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. + + + Given: Two DNA strings s and t of equal length (not exceeding 1 kbp). + + Return: The Hamming distance dH(s,t). + + ***Sample Dataset*** + + ``` + GAGCCTACTAACGGGAT + CATCGTAATGACGGCCT + ``` + + ***Sample Output*** + + ``` + 7 + ``` + + +To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. + + We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. + + Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable. + +Let's give this a try! 
+ +```julia +ex_seq_a = "GAGCCTACTAACGGGAT" +ex_seq_b = "CATCGTAATGACGGCCT" + +function hamming(seq_a, seq_b) + + + # check if the strings are empty + if isempty(seq_a) + throw(ErrorException("empty sequences")) + end + + # check if the strings are different lengths + if length(seq_a) != length(seq_b) + throw(ErrorException(" sequences have different lengths")) + end + + mismatches = 0 + for i in 1:length(seq_a) + if seq_a[i] != seq_b[i] + mismatches += 1 + end + end + return mismatches +end + +hamming(ex_seq_a, ex_seq_b) + +``` + + + +## BioAlignments method + +Instead of writing your own function, an alternative would be to use the readily-available Hamming Distance [function](https://github.com/BioJulia/BioAlignments.jl/blob/0f3cc5e1ac8b34fdde23cb3dca7afb9eb480322f/src/pairwise/algorithms/hamming_distance.jl#L4) in the `BioAlignments.jl` package. + +```julia +using BioAlignments + +ex_seq_a = "GAGCCTACTAACGGGAT" +ex_seq_b = "CATCGTAATGACGGCCT" + +bio_hamming = BioAlignments.hamming_distance(Int64, ex_seq_a, ex_seq_b) + +bio_hamming[1] + +``` + +```julia +# Double check that we got the same values from both ouputs +@assert calcHamming(ex_seq_a, ex_seq_b) == bio_hamming[1] +``` + + + The BioAlignments `hamming_distance` function requires three input variables -- the first of which allows the user to control the `type` of the returned hamming distance value. + + In the above example, `Int64` is provided as the first input variable, but `Float64` or `Int8` are also acceptable inputs. The second two input variables are the two sequences that are being compared. + + There are two outputs of this function: the actual Hamming Distance value and the Alignment Anchor. The Alignment Anchor is a one-dimensional array (vector) that is the same length as the length of the input strings. + + Each value in the vector is also an AlignmentAnchor with three fields: sequence position, reference position, and an operation code ('0' for start, '=' for match, 'X' for mismatch). 
+ + The Alignment Anchor for the above example is: + ``` + AlignmentAnchor[AlignmentAnchor(0, 0, '0'), AlignmentAnchor(1, 1, 'X'), AlignmentAnchor(2, 2, '='), AlignmentAnchor(3, 3, 'X'), AlignmentAnchor(4, 4, '='), AlignmentAnchor(5, 5, 'X'), AlignmentAnchor(7, 7, '='), AlignmentAnchor(8, 8, 'X'), AlignmentAnchor(9, 9, '='), AlignmentAnchor(10, 10, 'X'), AlignmentAnchor(14, 14, '='), AlignmentAnchor(16, 16, 'X'), AlignmentAnchor(17, 17, '=')] + + ``` + + diff --git a/docs/src/rosalind/06-hamming.md b/docs/src/rosalind/06-hamming.md deleted file mode 100644 index a8010b7..0000000 --- a/docs/src/rosalind/06-hamming.md +++ /dev/null @@ -1,90 +0,0 @@ -# Counting Point Mutations - -!!! warning "The Problem" - - Given two strings s and t of equal length, the Hamming distance between s and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. - - Given: Two DNA strings s and t of equal length (not exceeding 1 kbp). - - Return: The Hamming distance dH(s,t). - - ***Sample Dataset*** - - ``` - GAGCCTACTAACGGGAT - CATCGTAATGACGGCCT - ``` - - ***Sample Output*** - - ``` - 7 - ``` - - -To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable. - -Let's give this a try! 
- -```julia -SampleSeqA = "GAGCCTACTAACGGGAT" -SampleSeqB = "CATCGTAATGACGGCCT" - -function calcHamming(SeqA, SeqB) - SeqLength = length(SeqA) - - # check if the strings are empty - if SeqLength == 0 - return 0 - end - - mismatches = 0 - for i in 1:SeqLength - if SeqA[i] != SeqB[i] - mismatches += 1 - end - end - return mismatches -end - -calcHamming(SampleSeqA, SampleSeqB) - -``` - - - -## BioAlignments method - -Instead of writing your own function, an alternative would be to use the readily-available Hamming Distance [function](https://github.com/BioJulia/BioAlignments.jl/blob/0f3cc5e1ac8b34fdde23cb3dca7afb9eb480322f/src/pairwise/algorithms/hamming_distance.jl#L4) in the `BioAlignments.jl` package. - -```julia -using BioAlignments - -seqA = "GAGCCTACTAACGGGAT" -seqB = "CATCGTAATGACGGCCT" - -BioAlignmentsHamming = BioAlignments.hamming_distance(Int64, "GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT") - -BioAlignmentsHamming[1] - -``` - -```julia -# Double check that we got the same values from both ouputs -@assert calcHamming(SampleSeqA, SampleSeqB) == BioAlignmentsHamming[1] -``` - - - The BioAlignments `hamming_distance` function requires three input variables -- the first of which allows the user to control the `type` of the returned hamming distance value. In the above example, `Int64` is provided as the input variable, but `Float64` or `Int8` are also acceptable inputs. - - The second two input variables are the two sequences that are being compared. - - There are two outputs of this function: the actual Hamming Distance value and the Alignment Anchor. The Alignment Anchor is a a one-dimensional array (vector) that is the same length as the length of the input strings. Each value in the vector is also an AlignmentAnchor with three fields: sequence position, reference position, and an operation code ('0' for start, '=' for match, 'X' for mismatch). 
- - The Alignment Anchor for the above example is - ``` - AlignmentAnchor[AlignmentAnchor(0, 0, '0'), AlignmentAnchor(1, 1, 'X'), AlignmentAnchor(2, 2, '='), AlignmentAnchor(3, 3, 'X'), AlignmentAnchor(4, 4, '='), AlignmentAnchor(5, 5, 'X'), AlignmentAnchor(7, 7, '='), AlignmentAnchor(8, 8, 'X'), AlignmentAnchor(9, 9, '='), AlignmentAnchor(10, 10, 'X'), AlignmentAnchor(14, 14, '='), AlignmentAnchor(16, 16, 'X'), AlignmentAnchor(17, 17, '=')] - - ``` - - From 5fa4fc0f6203364f141aaf1652a1ca8024a460d4 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Thu, 29 Jan 2026 21:54:51 -0500 Subject: [PATCH 04/12] add idiomatic solution --- docs/src/rosalind/06-hamm.md | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md index e804910..db5bbf1 100644 --- a/docs/src/rosalind/06-hamm.md +++ b/docs/src/rosalind/06-hamm.md @@ -27,11 +27,28 @@ To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. - We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. +The simplest way to solve this problem is to compare the corresponding values in each string for each index and then sum the mismatches. This is the fastest and most idiomatic Julia solution, as it leverages vector math. + +Let's give this a try! + +```julia +ex_seq_a = "GAGCCTACTAACGGGAT" +ex_seq_b = "CATCGTAATGACGGCCT" + +count(i-> ex_seq_a[i] != ex_seq_b[i], eachindex(ex_seq_a)) +``` + + + +### For Loop + +Another way we can approach this would be to use the for-loop. This method will be a bit slower. + +We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. Each mismatch will cause 1 to be added to a `counter` variable. 
At the end of the loop, we can return the total value of the `counter` variable. -Let's give this a try! + ```julia ex_seq_a = "GAGCCTACTAACGGGAT" From 3a35cc2b23a0cefc4f3625bc0186b3e5d3ffe087 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Thu, 29 Jan 2026 22:04:28 -0500 Subject: [PATCH 05/12] add distances.jl solution --- docs/src/rosalind/06-hamm.md | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md index db5bbf1..cd9a0c5 100644 --- a/docs/src/rosalind/06-hamm.md +++ b/docs/src/rosalind/06-hamm.md @@ -115,7 +115,23 @@ bio_hamming[1] The Alignment Anchor for the above example is: ``` AlignmentAnchor[AlignmentAnchor(0, 0, '0'), AlignmentAnchor(1, 1, 'X'), AlignmentAnchor(2, 2, '='), AlignmentAnchor(3, 3, 'X'), AlignmentAnchor(4, 4, '='), AlignmentAnchor(5, 5, 'X'), AlignmentAnchor(7, 7, '='), AlignmentAnchor(8, 8, 'X'), AlignmentAnchor(9, 9, '='), AlignmentAnchor(10, 10, 'X'), AlignmentAnchor(14, 14, '='), AlignmentAnchor(16, 16, 'X'), AlignmentAnchor(17, 17, '=')] - ``` + ### Distances.Jl method + + Another package that calculates the Hamming distance is the [Distances package](https://github.com/JuliaStats/Distances.jl). 
We can call its `hamming` function on our two test sequences: + + + +```julia +using Distances + +ex_seq_a = "GAGCCTACTAACGGGAT" +ex_seq_b = "CATCGTAATGACGGCCT" + +Distances.hamming(ex_seq_a, ex_seq_b) +``` + + + From 033e6b5bb06b3a28c5bb048e9a5af84d4873350e Mon Sep 17 00:00:00 2001 From: Danielle Pinto <108756057+danielle-pinto@users.noreply.github.com> Date: Fri, 6 Feb 2026 12:18:42 -0500 Subject: [PATCH 06/12] Update docs/src/rosalind/06-hamm.md Co-authored-by: Kevin Bonham --- docs/src/rosalind/06-hamm.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md index cd9a0c5..5953b02 100644 --- a/docs/src/rosalind/06-hamm.md +++ b/docs/src/rosalind/06-hamm.md @@ -81,7 +81,6 @@ hamming(ex_seq_a, ex_seq_b) ``` - ## BioAlignments method Instead of writing your own function, an alternative would be to use the readily-available Hamming Distance [function](https://github.com/BioJulia/BioAlignments.jl/blob/0f3cc5e1ac8b34fdde23cb3dca7afb9eb480322f/src/pairwise/algorithms/hamming_distance.jl#L4) in the `BioAlignments.jl` package. 
From eb58985051adf31e5f24cd31332939f97fc7481e Mon Sep 17 00:00:00 2001 From: Danielle Pinto <108756057+danielle-pinto@users.noreply.github.com> Date: Fri, 6 Feb 2026 12:18:52 -0500 Subject: [PATCH 07/12] Update docs/src/rosalind/06-hamm.md Co-authored-by: Kevin Bonham --- docs/src/rosalind/06-hamm.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md index 5953b02..b6a6e55 100644 --- a/docs/src/rosalind/06-hamm.md +++ b/docs/src/rosalind/06-hamm.md @@ -55,8 +55,6 @@ ex_seq_a = "GAGCCTACTAACGGGAT" ex_seq_b = "CATCGTAATGACGGCCT" function hamming(seq_a, seq_b) - - # check if the strings are empty if isempty(seq_a) throw(ErrorException("empty sequences")) From 82d71c857ec35e3dfd98d91a20b68acbe15b1ccb Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Fri, 6 Feb 2026 13:03:08 -0500 Subject: [PATCH 08/12] add semantic line breaks and benchmarking --- docs/src/rosalind/06-hamm.md | 83 ++++++++++++++++++++++++++++++------ 1 file changed, 69 insertions(+), 14 deletions(-) diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md index b6a6e55..e83761b 100644 --- a/docs/src/rosalind/06-hamm.md +++ b/docs/src/rosalind/06-hamm.md @@ -4,7 +4,9 @@ !!! warning "The Problem" - Given two strings s and t of equal length, the Hamming distance between s and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. + Given two strings s and t of equal length, + the Hamming distance between s and t, denoted dH(s,t), + is the number of corresponding symbols that differ in s and t. Given: Two DNA strings s and t of equal length (not exceeding 1 kbp). @@ -25,9 +27,11 @@ ``` -To calculate the Hamming Distance between two strings/sequences, the two strings/DNA sequences must be the same length. +To calculate the Hamming Distance between two strings/sequences, +the two strings/DNA sequences must be the same length. 
-The simplest way to solve this problem is to compare the corresponding values in each string for each index and then sum the mismatches. This is the fastest and most idiomatic Julia solution, as it leverages vector math. +The simplest way to solve this problem is to compare the corresponding values in each string for each index and then sum the mismatches. +This is a concise and idiomatic Julia solution: `count` applies the comparison at each index and tallies the mismatches without building an intermediate array. Let's give this a try! @@ -38,15 +42,19 @@ ex_seq_b = "CATCGTAATGACGGCCT" count(i-> ex_seq_a[i] != ex_seq_b[i], eachindex(ex_seq_a)) ``` - - ### For Loop -Another way we can approach this would be to use the for-loop. This method will be a bit slower. +Another way we can approach this would be to use the for-loop. +For loops are traditionally slower and clunkier (especially in Python). +However, Julia can often optimize for-loops like this, +which is one of the things that makes it so powerful. +It has multiple processing units that can run the same task parallelly. -We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. +We can calculate the Hamming Distance by looping over the characters in one of the strings +and checking if the corresponding character at the same index in the other string matches. - Each mismatch will cause 1 to be added to a `counter` variable. At the end of the loop, we can return the total value of the `counter` variable. +Each mismatch adds 1 to a counter variable (`mismatches` in the code below). +At the end of the loop, we return the total value of that counter. @@ -101,22 +109,43 @@ bio_hamming[1] ``` - The BioAlignments `hamming_distance` function requires three input variables -- the first of which allows the user to control the `type` of the returned hamming distance value. 
+ The BioAlignments `hamming_distance` function requires three input variables -- + the first of which allows the user to control the `type` of the returned hamming distance value. - In the above example, `Int64` is provided as the first input variable, but `Float64` or `Int8` are also acceptable inputs. The second two input variables are the two sequences that are being compared. + In the above example, `Int64` is provided as the first input variable, + but `Float64` or `Int8` are also acceptable inputs. + The second two input variables are the two sequences that are being compared. - There are two outputs of this function: the actual Hamming Distance value and the Alignment Anchor. The Alignment Anchor is a one-dimensional array (vector) that is the same length as the length of the input strings. + There are two outputs of this function: + the actual Hamming Distance value and the Alignment Anchor. + The Alignment Anchor is a one-dimensional array (vector) with a start entry followed by one entry per run of consecutive matches or mismatches, so it is usually shorter than the input strings (13 entries for the 17-character example). - Each value in the vector is also an AlignmentAnchor with three fields: sequence position, reference position, and an operation code ('0' for start, '=' for match, 'X' for mismatch). + Each value in the vector is also an AlignmentAnchor with three fields: + sequence position, reference position, and an operation code + ('0' for start, '=' for match, 'X' for mismatch). 
The Alignment Anchor for the above example is: ``` - AlignmentAnchor[AlignmentAnchor(0, 0, '0'), AlignmentAnchor(1, 1, 'X'), AlignmentAnchor(2, 2, '='), AlignmentAnchor(3, 3, 'X'), AlignmentAnchor(4, 4, '='), AlignmentAnchor(5, 5, 'X'), AlignmentAnchor(7, 7, '='), AlignmentAnchor(8, 8, 'X'), AlignmentAnchor(9, 9, '='), AlignmentAnchor(10, 10, 'X'), AlignmentAnchor(14, 14, '='), AlignmentAnchor(16, 16, 'X'), AlignmentAnchor(17, 17, '=')] + AlignmentAnchor[ + AlignmentAnchor(0, 0, '0'), + AlignmentAnchor(1, 1, 'X'), + AlignmentAnchor(2, 2, '='), + AlignmentAnchor(3, 3, 'X'), + AlignmentAnchor(4, 4, '='), + AlignmentAnchor(5, 5, 'X'), + AlignmentAnchor(7, 7, '='), + AlignmentAnchor(8, 8, 'X'), + AlignmentAnchor(9, 9, '='), + AlignmentAnchor(10, 10, 'X'), + AlignmentAnchor(14, 14, '='), + AlignmentAnchor(16, 16, 'X'), + AlignmentAnchor(17, 17, '=')] ``` ### Distances.Jl method - Another package that calculates the Hamming distance is the [Distances package](https://github.com/JuliaStats/Distances.jl). We can call its `hamming` function on our two test sequences: + Another package that calculates the Hamming distance is the [Distances package](https://github.com/JuliaStats/Distances.jl). + We can call its `hamming` function on our two test sequences: @@ -129,6 +158,32 @@ ex_seq_b = "CATCGTAATGACGGCCT" Distances.hamming(ex_seq_a, ex_seq_b) ``` +## Benchmarking + +Let's test to see which method is the most efficient! +Did the for-loop slow us down? + +```julia +using BenchmarkTools + +testseq1 = string(randdnaseq(100_000)) # this is defined in BioSequences + +testseq2 = string(randdnaseq(100_000)) + + +@benchmark hamming($testseq1, $testseq2) + +@benchmark BioAlignments.hamming_distance(Int64, $testseq1, $testseq2) + +@benchmark Distances.hamming($testseq1, $testseq2) +``` + +The BioAlignments method takes up a much larger amount of memory, +and nearly three times as long to run. 
+However, it also generates an `AlignmentAnchor` data structure each time the function is called, +so this is not a fair comparison. +The `Distances` package is the winner here,which makes sense, +as it uses a vectorized approach. From 4e5ae14d02b764216d57499cbf6d29eedb7f048a Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Fri, 6 Feb 2026 13:15:04 -0500 Subject: [PATCH 09/12] fix typos --- docs/src/rosalind/06-hamm.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md index e83761b..e985666 100644 --- a/docs/src/rosalind/06-hamm.md +++ b/docs/src/rosalind/06-hamm.md @@ -48,7 +48,7 @@ Another way we can approach this would be to use the for-loop. For loops are traditionally slower and clunkier (especially in Python). However, Julia can often optimize for-loops like this, which is one of the things that makes it so powerful. -It has multiple processing units that can run the same task parallelly. +Its just-in-time compiler turns them into efficient native code, so an explicit loop is not inherently slow. We can calculate the Hamming Distance by looping over the characters in one of the strings and checking if the corresponding character at the same index in the other string matches. @@ -104,8 +104,8 @@ bio_hamming[1] ``` ```julia -# Double check that we got the same values from both ouputs -@assert calcHamming(ex_seq_a, ex_seq_b) == bio_hamming[1] +# Double check that we got the same values from both outputs +@assert hamming(ex_seq_a, ex_seq_b) == bio_hamming[1] ``` @@ -142,7 +142,7 @@ bio_hamming[1] AlignmentAnchor(17, 17, '=')] ``` - ### Distances.Jl method + ### Distances.jl method Another package that calculates the Hamming distance is the [Distances package](https://github.com/JuliaStats/Distances.jl). We can call its `hamming` function on our two test sequences: @@ -182,7 +182,7 @@ The BioAlignments method takes up a much larger amount of memory, and nearly three times as long to run. 
However, it also generates an `AlignmentAnchor` data structure each time the function is called, so this is not a fair comparison. -The `Distances` package is the winner here,which makes sense, +The `Distances` package is the winner here, which makes sense, as it uses a vectorized approach. From 9bfbfd85164b0df1f5f70dd1447df79b671c1648 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Sat, 7 Feb 2026 20:00:13 -0500 Subject: [PATCH 10/12] escape $ due to deployment warning --- docs/src/rosalind/06-hamm.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/src/rosalind/06-hamm.md b/docs/src/rosalind/06-hamm.md index e985666..7b1f02b 100644 --- a/docs/src/rosalind/06-hamm.md +++ b/docs/src/rosalind/06-hamm.md @@ -171,11 +171,11 @@ testseq1 = string(randdnaseq(100_000)) # this is defined in BioSequences testseq2 = string(randdnaseq(100_000)) -@benchmark hamming($testseq1, $testseq2) +@benchmark hamming(\$testseq1, \$testseq2) -@benchmark BioAlignments.hamming_distance(Int64, $testseq1, $testseq2) +@benchmark BioAlignments.hamming_distance(Int64, \$testseq1, \$testseq2) -@benchmark Distances.hamming($testseq1, $testseq2) +@benchmark Distances.hamming(\$testseq1, \$testseq2) ``` The BioAlignments method takes up a much larger amount of memory, From 79444973d433bc22c024c232a44325fab3f0ba20 Mon Sep 17 00:00:00 2001 From: Danielle Pinto Date: Sat, 7 Feb 2026 20:16:01 -0500 Subject: [PATCH 11/12] pin mathjax version to 0.2.7 --- docs/Project.toml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/Project.toml b/docs/Project.toml index 459adf0..9f560c0 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -10,3 +10,5 @@ FormatSpecimens = "3372ea36-2a1a-11e9-3eb7-996970b6ffbd" JuliaFormatter = "98e50ef6-434e-11e9-1051-2b60c6c9e899" LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589" XAM = "d759349c-bcba-11e9-07c2-5b90f8f05f7c" + +DocumenterVitepress = "=0.2.7" From 45814fb6d8ab03d3f9e9e847655b2781c501fd55 Mon Sep 17 00:00:00 2001 From: 
Danielle Pinto Date: Sat, 7 Feb 2026 20:26:44 -0500 Subject: [PATCH 12/12] add documenter vitepress version pin under compat category --- docs/Project.toml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/Project.toml b/docs/Project.toml index 9f560c0..b1c5a68 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -11,4 +11,6 @@ JuliaFormatter = "98e50ef6-434e-11e9-1051-2b60c6c9e899" LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589" XAM = "d759349c-bcba-11e9-07c2-5b90f8f05f7c" +[compat] DocumenterVitepress = "=0.2.7" +
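As a quick cross-check on the tutorial added by this series, the same calculation can be sketched by iterating over character pairs with `zip` instead of indexing into the strings. The function name `hamming_zip` is hypothetical and not part of the patches; the sketch assumes only Base Julia.

```julia
# Hypothetical helper (not part of the patch series): Hamming distance via zip.
# Iterating over characters avoids integer indexing into strings,
# so this also works for non-ASCII input.
function hamming_zip(seq_a::AbstractString, seq_b::AbstractString)
    # length counts characters, so the check is valid for any string
    length(seq_a) == length(seq_b) ||
        throw(ArgumentError("sequences have different lengths"))
    # count the positions at which the paired characters differ
    return count(a != b for (a, b) in zip(seq_a, seq_b))
end

hamming_zip("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")  # 7, matching the sample output
```

Unlike the `hamming` function in the tutorial, this version returns 0 for two empty strings rather than throwing; which behaviour is preferable is a design choice.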