Image Server and Deduplicator
What is it?
A small but notable issue which has haunted our family for some time
now is our storage solution for all of our images. Some of our hard
drives are sitting in cold storage and have multiple duplicates of
images, alongside multiple manually created backups from years ago.
On top of this, we have no centralized location for storing all of
our newer images.
In order to resolve these two issues, I have recently begun working
on two projects: A custom image server / deduplicator (this
project), and a home file server on an old HP desktop. This project
aims to resolve the duplicate image issue, and will run on the file
server. The image server software provides a simple web interface
for uploading and viewing images.
Due to the nature of this project, I do not ever plan on releasing a
publicly-facing image server website or anything big like that. It
is simply meant to be a one-off solution to our image problem.
However, the source code is publicly available should anyone be
interested in forking it.
What can it do?
The main purpose of this project is to detect duplicate images.
However, the project also lets you upload images using the web
interface to make consolidating the large number of small folders of
files easier.
This project also has a simple viewing interface to see
all of the images. This proved convenient during testing, and also
was kind of fun to make, as most of the infrastructure for it had
already been set up. It offers options to sort the images by date
taken, date uploaded, file size, etc. in ascending or descending
order. It also lets you filter the results by the date the images
were taken.
How was it made?
I decided to make this project using multiple technologies that were
new to me, since I was excited to learn them (I also thought they
fit the job well). I used HTMX and Pug in leiu of a heavy frontend
like React, since I knew the interface would be fairly simple. I
also used Tailwindcss for the project, which came with the usual
benefits of quick prototyping. As for the core web server
functionality I went with the tried and true NodeJS.
I decided to split up the project into two parts: the aforementioned
web server, and an image processor/decoder program. Both parts of
the project would essentially only interact through one shared
interface: The database. This database would contain a catalogue of
every image's original name, date taken, and most importantly, a
list of detected duplicate images. This made making a lot of the
viewer's functionality relatively trivial to implement, as the
database had to be created for duplicate detection regardless.
I decided on Go for the image decoding, since I heard that it was
built to be a performant language. One construct in particular had
me very excited to use it:
GoRoutines. They
turned out to be an amazingly simple way of parallelizing the task
of processing the images. In addition, Go surprised me with the way
errors were handled. In Go, functions return errors to a variable,
and you are forced to handle them. This made catching runtime errors
before they happened much simpler!
I developed this
project over the course of about a month in 2023, when I had some
free time between exams. I plan on running it on the file server
that I am working on right now.