Update url matching to use levenshtein distance #23
nuudles wants to merge 1 commit into MacPass:master
This update uses the Levenshtein distance algorithm to determine the best-matching entry, similar to how the KeePassHTTP plugin behaves. This alleviates issues such as "www.facebook.com" not matching an entry whose URL is "facebook.com".
|
In KeePassHTTP the Levenshtein distance is used to order Login entries, not to actually retrieve them, or am I reading the code wrong? You're using it to actually match, which changes the behaviour drastically. I do not intend to move away from the original implementation, and to be honest I just ported @jameshurst's implementation without any changes to the actual logic. If I'm wrong, it'll be merged promptly ;)
|
I just dug a bit deeper: the sorting is done in KeePassHTTPKit. That might be a good place to implement the Levenshtein distance to align KeePassHTTPKit with KeePassHTTP.
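To make the ordering-versus-matching distinction concrete, here is a minimal Python sketch. It is only an illustration, not code from MacPass, KeePassHTTPKit, or KeePassHTTP; the function names and the plain lists of URL strings are assumptions made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def order_by_distance(url, matched_entries):
    """Ordering: rank entries that an earlier filter already accepted."""
    return sorted(matched_entries, key=lambda e: levenshtein(url, e))


def match_by_distance(url, all_entries):
    """Matching: pick the closest entry with no prior filter (hypothetical).

    Assumes all_entries is non-empty; it always returns *some* entry,
    even when none of them is related to url.
    """
    return min(all_entries, key=lambda e: levenshtein(url, e))
```

With ordering, an unrelated URL simply yields an empty, sorted list because the filter found nothing. With matching, `match_by_distance("www.example.org", ["facebook.com", "github.com"])` still returns one of the two entries, which is the behaviour change discussed above.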
|
Hey @mstarke! Thanks for your prompt response! Looking into it further, I think I got tripped up by their README, where they state:
Looking at their code, it looks like they first filter the entries that match the scheme and URL, then further filter those entries down to only those which match the Levenshtein distance. Their initial filter is a bit more robust than the one in KeePassHTTPKit currently:

    while (listResult.Count == listCount && (origSearchHost == searchHost || searchHost.IndexOf(".") != -1))
    {
        parms.SearchString = String.Format("^{0}$|/{0}/?", searchHost);
        var listEntries = new PwObjectList<PwEntry>();
        db.RootGroup.SearchEntries(parms, listEntries);
        foreach (var le in listEntries)
        {
            listResult.Add(new PwEntryDatabase(le, db));
        }
        searchHost = searchHost.Substring(searchHost.IndexOf(".") + 1);
        //searchHost contains no dot --> prevent possible infinite loop
        if (searchHost == origSearchHost)
            break;
    }
    listCount = listResult.Count;

It looks like they do searches with each split of the "." character, so for […] That algorithm would still solve my issue where the passed in […] If it makes sense to you, I'd be happy to implement the algorithm closer to what the KeePassHTTP behavior is.
|
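The host-stripping search in the quoted loop can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not a port: `find_entries_by_host` is a hypothetical name, a plain list of URL strings stands in for the KeePass database search, the result list is assumed to start empty, and `re.escape` is an addition that the quoted C# does not perform.

```python
import re


def find_entries_by_host(entry_urls, host):
    """Return entry URLs matching host, retrying with leading labels stripped."""
    results = []
    search_host = host
    # Continue while nothing has matched and the candidate is either the
    # original host or still contains a dot, so a bare TLD is never searched.
    while not results and (search_host == host or "." in search_host):
        # Same shape as the quoted pattern "^{0}$|/{0}/?"; escaping the host
        # is an assumption added here so that "." is matched literally.
        pattern = re.compile(r"^{0}$|/{0}/?".format(re.escape(search_host)))
        results.extend(url for url in entry_urls if pattern.search(url))
        if "." not in search_host:
            break  # nothing left to strip; avoids looping forever
        # Drop the leftmost label: "www.facebook.com" -> "facebook.com"
        search_host = search_host.split(".", 1)[1]
    return results
```

For example, given entries `["facebook.com", "https://example.org/login"]`, a lookup for `"www.facebook.com"` finds nothing on the first pass, retries with `"facebook.com"`, and returns `["facebook.com"]`. A host with no related entry returns an empty list rather than a nearest guess.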
This update uses the Levenshtein distance algorithm to determine the best-matching entry, similar to how the KeePassHTTP plugin behaves. This alleviates issues such as "www.facebook.com" not matching an entry whose URL is "facebook.com".
Note that this might produce false positives, particularly if passed a URL that doesn't exist in any of the entries, but in my experiments it works quite well.