Use Docker to Search in 320 Million Pwned Passwords

This week Troy Hunt, a security researcher announced a freely downloadable list of pwned passwords. Troy is the creator of Have I Been Pwned? website and service that will notify you when one of your registered email addresses have been compromised by a data breach.

In his latest blog post he introduced 306 Million Freely Downloadable Pwned Passwords with an update of another 14 Million just on the following day. He also has setup a online search at https://haveibeenpwned.com/Passwords

You can enter passwords and check if they have been compromised. But do not enter actively used passwords here, even if Troy is a nice person living in sunny Australia.

Pwned Passwords online service

My recommendation is

  1. If you are in doubt if your password has been pwned, just change it first and then check the old one in the online form.
  2. Use a Password manager like 1Password to create an individual long random password for each service you use.

But the huge password list is still quite interesting to work with.

Let's build a local search

What you can do is download the list of passwords (about 5 GByte compressed) and search locally in a safe place. You won't get the cleartext passwords, but only SHA1 sums of them. But we can create SHA1 sums of the passwords we want to search in this huge list.

You can download the files that are compressed with 7-Zip. You also need a tool to create a SHA1 sum of a plain text. And then you need another tool, a database or algorithm to quickly search in that text file that has nearly 320 Million lines.

Use Docker for the task

I immediately thought of a Container that has all these tools installed. But I didn't want to add the huge password lists into that container as it would build a Docker image of about 12 GByte or probably 5-6 GB Docker image on the Docker Hub.

The password files should be persisted locally on your laptop and mounted into the container to search in them with the tools needed for the task.

And I want to use some simple tools to get the work done. A first idea was born in the comments of Troys blog post where someone showed a small bash script with grep to search in the file.

I first tried grep, but this took about 2 minutes to find the hash in the file. So I searched a little bit and found sgrep - a tool to grep in sorted files. Luckily the password files are sorted by the SHA1 hash. But I found only the source code and there is no standard package to install it. So we also need a C compiler for that.

In times before Docker you had a lot of stress installing many tools on your computer. But let's see how Docker can help us with all these steps.

Build the Docker image

I found the Sources of sgrep on GitHub and we will need Make and a C compiler to build the sgrep binary.

I will use a multi-stage build Dockerfile and explain every single line. You can build the image line by line and see the benefits of build caches while working on the Dockerfile. So after adding a line to the file you can run docker build -t pwned-passwords . to build and update the image.

For the beginning let's choose a small Linux base image. We will name the first stage as build. So the Dockerfile starts with

FROM alpine:3.6 AS build

The next step is we have to install Git, Make and the C compiler with its header files.

RUN apk update && apk add git make gcc musl-dev

Now we clone the GitHub repo with the source code of sgrep.

RUN git clone https://github.com/colinscape/sgrep

In the next line I'll create a bin folder that is needed for the build process. Then we go to the source directory and run the make command as there is a Makefile in that directory.

RUN mkdir sgrep/bin && cd sgrep/src && make

After these steps we have the sgrep binary compiled for Alpine Linux. But we also have installed a ton of other tools.

Now put all these instructions into a Dockerfile and build the Docker image.

$ docker build -t pwned-passwords .

Let's inspect all image layers we have created so far.

$ docker history --format "{{.ID}}\t{{.Size}}\t{{.CreatedBy}}" pwned-passwords
78171a118279	24.5kB	/bin/sh -c mkdir sgrep/bin && cd sgrep/src...
2323bcb14b5f	93.6kB	/bin/sh -c git clone https://github.com/co...
8ec1470030af	119MB	/bin/sh -c apk update && apk add git make ...
7328f6f8b418	0B	/bin/sh -c #(nop)  CMD ["/bin/sh"]
<missing>	3.97MB	/bin/sh -c #(nop) ADD file:4583e12bf5caec4...

As you can see we now have a Docker image of more than 120 MByte, but the sgrep binary is only 15 KByte. Yes, this is no typo. Yes, we will grep through GByte of data with a tiny 15 KByte binary.

Multi-stage build for the win

With Docker 17.05 and newer you can now add another FROM instruction and start with a new stage in your Dockerfile. The last stage will create the final Docker image. So every instruction after the last FROM defines what goes into the image you want to share eg. on Docker Hub.

So let's start our final stage of our Docker image build with

FROM alpine:3.6

The last stage does not need a name. Now we have an empty Alpine Linux again, all the 120 MByte of development environment won't make it into the final image. But if you build the Docker image more than once the temporary layers are still there and will be reused if they are unmodified. So the Docker build cache helps you speed up while working on the shell script.

In the previous build stage we have created the much faster sgrep command. What we now need is a small shell script that converts a plaintext password into a SHA1 sum and runs the sgrep command.

To create a SHA1 sum I'll use openssl command. And it would be nice if the shell script can download the huge files for us. As the files are compressed with 7-zip we also need wget to download and 7z to extract them.

In the next instruction we install OpenSSL and the 7-Zip tool.

RUN apk update && apk add openssl p7zip

The COPY instruction has an option --from where you can specify another named stage of your build. So we copy the compiled sgrep binary from the build stage into the local bin directory.

COPY --from=build /sgrep/bin/sgrep /usr/local/bin/sgrep

The complete shell script is called search and can be found in my pwned-passwords GitHub repo. Just assume we have it in the current directory. The next COPY instruction copies it from your real machine into the image layer.

COPY search /usr/local/bin/search

As the last line of the Dockerfile we define an entrypoint to run this shell script if we run the Docker container.

ENTRYPOINT ["/usr/local/bin/search"]

Now append these lines to the Dockerfile and build the complete image. You will see that the first layers are already cached and only the last stage will be built.

The search script

You can find the search script in my GitHub repo as well as the Dockerfile. You only need these two tiny files to build the Docker image yourself.

#!/bin/sh
set -e

if [ ! -d /data ]; then
  echo "Please run this container with a volume mounted at /data."
  echo "docker run --rm -v \ $(pwd):/data pwned-passwords $*"
  exit 1
fi

FILES="pwned-passwords-1.0.txt pwned-passwords-update-1.txt"
for i in $FILES
do
  if [ ! -f "/data/$i" ]; then
    echo "Downloading $i"
    wget -O "/tmp/$i.7z" "https://downloads.pwnedpasswords.com/passwords/$i.7z"
    echo "Extracting $i to /data"
    7z x -o/data "/tmp/$i.7z"
    rm "/tmp/$i.7z"
  fi
done

if [[ $1 != "" ]]
then
PWD=$1
else
PWD="password"
echo "checking $PWD"
fi

hash=`echo -n $PWD | openssl sha1 | awk '{print $2}' | awk 'BEGIN { getline; print toupper($0)  }'`
echo "Hash is $hash"
totalcount=0
for i in $(sgrep -c $hash /data/*.txt)
do
  file=$(echo "$i" | cut -f1 -d:)
  count=$(echo "$i" | cut -f2 -d:)
  if [[ $count -ne 0 ]]; then
    echo "Oh no - pwned! Found $count occurences in $file"
  fi
  totalcount=$(( $totalcount + $count ))
done
if [[ $totalcount -eq 0 ]]; then
  echo "Good news - no pwnage found!"
else
  exit 1
fi

Build the final image

Now with these two files, Dockerfile and search shell script build the small Docker image.

$ docker build -t pwned-passwords .

Let's have a look at the final image layers with

$ docker history --format "{{.ID}}\t{{.Size}}\t{{.CreatedBy}}" stefanscherer/pwned-passwords
24eca60756c8	0B	/bin/sh -c #(nop)  ENTRYPOINT ["/usr/local...
c1a9fc5fdb78	1.04kB	/bin/sh -c #(nop) COPY file:ea5f7cefd82369...
a1f4a26a50a4	15.7kB	/bin/sh -c #(nop) COPY file:bf96562251dbd1...
f99b3a9601ea	10.7MB	/bin/sh -c apk update && apk add openssl p...
7328f6f8b418	0B	/bin/sh -c #(nop)  CMD ["/bin/sh"]
<missing>	3.97MB	/bin/sh -c #(nop) ADD file:4583e12bf5caec4...

As you can see, OpenSSL and 7-Zip take about 10 MByte, the 16 KByte sgrep binary and the 1 KByte shell script are sitting on top of the 4 MByte Alpine base image.

I also have pushed this image to the Docker Hub with a compressed size of about 7 MByte. If you trust me, you can use this Docker image as well. But you will learn more how multi-stage builds feel like if you build the image yourself.

Search for pwned passwords

We now have a small 14.7 MByte Linux Docker image to search for pwned passwords.

Run the container with a folder mounted to /data. If you forgot this, the script will show you how to run it.

Running the container for the first time it will download the two password files (5 GByte) which may take some minutes depending on your internet connectivity.

After the script has downloaded everything two files should appear in the current folder. For me it looks like this:

file list

Now search for passwords by adding a plaintext password as an argument

$ docker run --rm -v $(pwd):/data pwned-passwords troyhunt
Hash is 0CCE6A0DD219810B5964369F90A94BB52B056494
Oh no - pwned! Found 1 occurences in /data/pwned-passwords-1.0.txt

If you don't trust my script or the sgrep command, the run the container without network connectivity

$ docker run --rm -v $(pwd):/data --network none pwned-passwords secret4949
Hash is 6D26C5C10FF089BFE81AB22152E2C0F31C58E132
Good news - no pwnage found!

So you have luck, you can securely check that your password secure4949 hasn't been breached. But beware this is still no good password :-)

Run pwned-passwords

Works on Windows

If you have Docker installed on your Windows machine, you can also use my Docker image or build the image yourself.

With Docker 4 Windows it only depends on the shell you use.

For PowerShell the command to run the image is

docker run --rm -v "$(pwd):/data" pwned-passwords yourpass

PowerShell

And if you prefer the classic CMD shell use this command

docker run --rm -v "%cd%:/data" pwned-passwords yourpass

CMD shell

On my Windows 7 machine I have to use Docker Machine, but even here you can easily search for pwned passwords. All you have to do is mount a directory for the password files as /data into the container.

docker run --rm -v "/c/Users/stefan.scherer/pwned:/data" stefanscherer/pwned-passwords troyhunt

Windows 7 with pwned-passwords image

Conclusion

You now know that there are Millions of passwords out there that may be used in a brute force attack to other online services.

So please use a password manager instead of predictable patterns how to modify passwords for different services.

You also have learned how Docker can keep your computer clean but still compile some open source projects from source code.

You have seen the benefits of multi-stage builds to create and share minimal Docker images without the development environment.

And you now have the possibility to search your current passwords in a save place without leaking it to the internet. Some other online service may collect all the data entered into a form. So keep your passwords secret and change

If you want to hear more about Docker, follow me on Twitter @stefscherer.

Stefan Scherer

Read more posts by this author.