Mark Burnett, a security researcher, recently released a collection of 10 million passwords along with their usernames. My question was: how different are those 10 million usernames from their passwords? I took a little time to run a simple analysis of the Levenshtein distance between each username and its password, and plotted the results in the graph below.
What this means is: if people in this dataset started with their username as a password (e.g., user dino, password dino) but then changed it a little (password dino1), how many insertions, deletions, or substitutions did they make to get from one to the other? See for yourself.
A distance of 0 means the username and password are exactly identical (in the graph below, 213,133 passwords are the same as their usernames). A distance of 1 means one character was added, deleted, or changed. And so on...
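For anyone who wants to try this at home, here is a rough Python sketch of how such a count could be produced. This is not the code I used for the graph, just an illustration: the function names `levenshtein` and `distance_histogram` and the sample pairs are made up for the example, and the distance is computed with the standard dynamic-programming approach.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def distance_histogram(pairs):
    """Count how many (username, password) pairs fall at each distance."""
    return Counter(levenshtein(user, password) for user, password in pairs)


# Hypothetical sample pairs, not taken from the real dataset.
sample = [
    ("dino", "dino"),      # identical
    ("dino", "dino1"),     # one character added
    ("dino", "d1no"),      # one character changed
    ("dino", "password"),  # unrelated
]
histogram = distance_histogram(sample)
for distance in sorted(histogram):
    print(distance, histogram[distance])
```

On the full dataset you would feed in the real username and password pairs instead of `sample`; the count stored under distance 0 corresponds to the "identical" bar in the graph.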
The post Levenshtein distance between 10 million usernames and their passwords appeared first on Dino's Anabasis.