Potential identity calculation bias related to window position (v0.9.8) #56

@justdx

Hello,

I observed unexpected identity values in ModDotPlot (v0.9.8) that appear to depend on window position. To test this, I ran a controlled simulation.

Test Setup: 3 windows (w1, w2 and w3, 10 kb each)

Pairwise Identity in the initial run:

  • w1 vs w2 → 96.8%
  • w1 vs w3 → 85.0%

Simulated Sequence (300 windows total)

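w1 × 1 copy + w3 × 100 copies + w1 × 99 copies + w2 × 100 copies

By this layout, Window 1 is w1, Windows 2–101 are w3, Windows 102–200 are w1, and Windows 201–300 are w2.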

Parameters: -w 10000 -id 0
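
For anyone who wants to reproduce the layout, here is a minimal Python sketch of how the 300-window sequence could be assembled and how a window number maps back to its source window. The helper names (`window_labels`, `build_sequence`, `label_of`) are hypothetical and not part of ModDotPlot; the actual w1, w2 and w3 sequences are not included here and would have to be supplied by the caller.

```python
from itertools import chain

# Layout from the issue: (source window, number of copies), in sequence order.
LAYOUT = [("w1", 1), ("w3", 100), ("w1", 99), ("w2", 100)]


def window_labels(layout=LAYOUT):
    """Return the source label of every window, in order (index 0 = Window 1)."""
    return list(chain.from_iterable([label] * copies for label, copies in layout))


def build_sequence(windows, layout=LAYOUT):
    """Concatenate the window sequences; `windows` maps label -> sequence string."""
    return "".join(windows[label] * copies for label, copies in layout)


def label_of(window_number, layout=LAYOUT):
    """Map a 1-based window number to its source label."""
    labels = window_labels(layout)
    if not 1 <= window_number <= len(labels):
        raise ValueError(f"window {window_number} is outside 1..{len(labels)}")
    return labels[window_number - 1]
```

With real 10 kb sequences for each label, `build_sequence` yields a 3 Mb sequence whose windows line up with `-w 10000`, and `label_of(201)` confirms that Window 201 is the first w2 copy.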

Observations

1. Inflated Identity

Expected: 85.0% for every w1 vs w3 comparison and 96.8% for every w1 vs w2 comparison, regardless of window position

Observed:

  • Window 1 (w1) vs Window 2 (adjacent w3) → 96.8%
  • Window 1 (w1) vs Windows 3–101 (w3 copies) → 96.5%
  • Window 1 (w1) vs Window 201 (first w2) → 96.8%
  • Window 1 (w1) vs Windows 202–300 (remaining w2 copies) → 83.8%

2. Unexpected Zero Identity

Expected: 85.0% (w3 vs w1)

Observed:

  • Window 2 (w3) vs Windows 103–199 (w1 copies) → 0%

3. Filtering Behavior

When using: -id 70

ModDotPlot removes:

  • Rows with identity = 0 (expected)
  • Rows comparing Window 1 (w1) with Windows 3–101 (w3 copies) (unexpected, since these rows were reported at 96.5% with -id 0)

The results suggest that identity values may vary depending on window position or ordering, producing inconsistent identity estimates for identical sequence comparisons.
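
The inconsistency can be checked mechanically. The sketch below groups comparison rows by their underlying source pair and flags any pair whose reported identities differ by more than a tolerance. It assumes rows have already been parsed into `(window_i, window_j, identity)` tuples; that tuple format, the function names and the 0.5-point tolerance are my own assumptions, not ModDotPlot's output format or API. `OBSERVED` holds one representative row for each observation listed above.

```python
from collections import defaultdict

# Layout from the issue: (source window, number of copies), in sequence order.
LAYOUT = [("w1", 1), ("w3", 100), ("w1", 99), ("w2", 100)]


def label_of(window_number):
    """Map a 1-based window number to its source label."""
    if window_number < 1:
        raise ValueError("window numbers start at 1")
    start = 1
    for label, copies in LAYOUT:
        if window_number < start + copies:
            return label
        start += copies
    raise ValueError(f"window {window_number} is beyond the layout")


def inconsistent_pairs(rows, tolerance=0.5):
    """Return {source pair: (min, max)} where identities spread beyond tolerance.

    `rows` is an iterable of (window_i, window_j, identity_percent) tuples.
    The tolerance (in percentage points) is an arbitrary illustrative choice.
    """
    groups = defaultdict(list)
    for i, j, identity in rows:
        pair = tuple(sorted((label_of(i), label_of(j))))
        groups[pair].append(identity)
    return {
        pair: (min(values), max(values))
        for pair, values in groups.items()
        if max(values) - min(values) > tolerance
    }


# One representative row per observation reported in this issue.
OBSERVED = [
    (1, 2, 96.8),
    (1, 3, 96.5),
    (1, 201, 96.8),
    (1, 202, 83.8),
    (2, 103, 0.0),
]

if __name__ == "__main__":
    for pair, (low, high) in inconsistent_pairs(OBSERVED).items():
        print(pair, low, high)
```

If identity depended only on sequence content, every source pair would collapse to a single value and the function would return an empty dictionary. On the rows above it flags both w1 vs w3 and w1 vs w2.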

I can provide the simulated dataset and command details if needed.

Thank you for developing ModDotPlot.

Xiao
