When you take a photo with a digital camera or smartphone, you’re capturing more than just a beautiful image. Within that image file, you also have something called EXIF data (EXIF stands for “Exchangeable Image File Format”). This data can include camera settings, the timestamp of the photo, and GPS location information. Sometimes it’s best to scrub EXIF image data.
This metadata can be really useful, especially for avid photographers. But if you use these images in your applications (either internally sourced or uploaded by your users), then you open yourself up to privacy risks. For example, sharing photos with location data can unintentionally reveal sensitive information—like your business’s location or personal address. This can lead to privacy breaches and compliance violations if you don’t scrub EXIF image data in your DevOps pipeline.
Example: The metadata in this hero image from an eCommerce site tells the whole world that the site’s owner has a corporate account with a stock image supplier.
In this post, we’ll go over why you need to scrub EXIF image data and how to integrate this process into your DevOps pipeline. We’ll also look at some tools and methods that can help you along the way.
Let’s start with some why questions.
Why should you scrub EXIF data?
EXIF data can be incredibly useful, providing detailed information about a photo—like camera settings and GPS coordinates. But this convenience comes with risks. If your business handles images, then you need to be aware of the potential privacy issues and compliance challenges that come with EXIF data.
To mitigate these risks, scrubbing EXIF data is a smart and proactive step. Here are some reasons why:
- Meeting regulatory requirements: Many data protection regulations—such as GDPR and CCPA—require the minimization of personal data exposure. Scrubbing EXIF data helps you comply with these laws.
- Enhancing user trust and data security: When you remove EXIF data from images, you protect your users’ privacy. This builds trust, demonstrating your commitment to data security.
- Preventing information exposure: Scrubbing EXIF data ensures you don’t unintentionally share sensitive information about your organization’s operations, locations, or schedules.
Why should you use your DevOps pipeline to do it?
EXIF data scrubbing can be an automated step in your DevOps pipeline. When you do it that way, you make sure that the task is handled consistently and efficiently every time. Using your DevOps pipeline for this is a good idea for many reasons:
- Reduces manual effort: By automating EXIF data removal, you save time and reduce the workload on your team.
- Executes tasks consistently and reliably: Automated processes ensure that EXIF data scrubbing happens every time an image is processed, without relying on human intervention. DevOps pipeline automation never needs coffee, never forgets, never calls in sick, and never takes PTO.
- Reduces the potential for human error: By leveraging automation, you avoid the mistakes that can occur with manual data scrubbing, such as a missed file or a skipped step.
- Ensures privacy protection at scale: Handling EXIF data through your pipeline allows you to maintain consistent privacy protection, no matter how many images you process.
How to scrub EXIF data in your pipeline
When it comes to handling EXIF data, your first order of business is to answer an important question: Should you scrub all EXIF data from an image, or is there some benefit in retaining some of it (the non-sensitive portion)? Let’s think about this question for a bit.
Full scrubbing versus selective removal of EXIF data
When you fully scrub EXIF data from an image, no potentially sensitive EXIF fields remain. This removes the risk of unintentionally exposing sensitive details through that metadata. It’s a simple and straightforward strategy, which is why many organizations go with it.
With a selective removal strategy, you retain certain EXIF fields that might be useful for your application, and you just remove the sensitive data. For example, you might keep camera settings but strip out GPS coordinates and timestamps. This approach can be useful if certain metadata is valuable for your application’s functionality. However, you’ll need a deeper understanding of which EXIF fields pose privacy risks.
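The difference between the two strategies can be sketched as a small policy function. This is a minimal illustration in Python, assuming the metadata has already been read into a dictionary keyed by EXIF tag name. The function name `scrub_tags` and the denylist below are hypothetical; which fields count as sensitive is a decision for your own privacy review.

```python
# Hypothetical denylist for illustration: GPS fields, timestamps, and
# fields that can identify a device or its owner.
SENSITIVE_PREFIXES = ("GPS",)
SENSITIVE_TAGS = {
    "DateTime",
    "DateTimeOriginal",
    "DateTimeDigitized",
    "BodySerialNumber",
    "CameraOwnerName",
}


def is_sensitive(tag_name: str) -> bool:
    """Return True if a tag is on the (assumed) denylist."""
    return tag_name in SENSITIVE_TAGS or tag_name.startswith(SENSITIVE_PREFIXES)


def scrub_tags(tags: dict, mode: str = "full") -> dict:
    """Apply a scrubbing policy to a dictionary of EXIF tags.

    mode="full"      -> keep nothing
    mode="selective" -> keep only tags that are not on the denylist
    """
    if mode == "full":
        return {}
    if mode == "selective":
        return {name: value for name, value in tags.items() if not is_sensitive(name)}
    raise ValueError(f"unknown mode: {mode!r}")
```

With the selective mode, a tag such as `FNumber` would be kept while `GPSLatitude` and `DateTimeOriginal` would be dropped. A denylist like this only protects you against the fields you thought of, which is one reason full scrubbing is the simpler default.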
Making this process part of your development workflow
When you incorporate EXIF data processing into your development workflow, you build privacy protection into your software from the start. So, how do you do this?
Start by including EXIF data scrubbing tools in your local development environment. This helps developers test and see the impact of data scrubbing early in the process. It also brings familiarity with specific tools—along with their features, effectiveness, and quirks.
Next, integrate these tools into your CI/CD pipeline. Once you automate the scrubbing process during code builds and deployments, you ensure that all images are processed consistently. Ultimately, automation is key. It’s how you’ll achieve consistency, reliability, and scale.
Properly integrating EXIF scrubbing into your CI/CD pipeline will go a long way in helping you maintain privacy standards across your entire application, regardless of how or where images are uploaded.
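One way to enforce this in a pipeline is a check that fails the build when an image still carries EXIF data. The following is a rough sketch using only the Python standard library, not a complete implementation: it inspects JPEG files only, and the function names are assumptions made for illustration.

```python
import struct
from pathlib import Path

EXIF_HEADER = b"Exif\x00\x00"  # marks an EXIF payload inside a JPEG APP1 segment


def has_exif(data: bytes) -> bool:
    """Return True if a JPEG byte string contains an EXIF APP1 segment."""
    if data[:2] != b"\xff\xd8":
        return False  # not a JPEG; this sketch only checks JPEG files
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: no metadata segments after this
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 4 + len(EXIF_HEADER)] == EXIF_HEADER:
            return True
        pos += 2 + length  # marker (2 bytes) + segment length
    return False


def find_images_with_exif(root: str) -> list[Path]:
    """List JPEG files under root that still carry EXIF data."""
    suffixes = {".jpg", ".jpeg"}
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file()
        and path.suffix.lower() in suffixes
        and has_exif(path.read_bytes())
    )


def main(root: str) -> int:
    """Return a process exit code: 1 if any image fails the check, else 0."""
    offenders = find_images_with_exif(root)
    for path in offenders:
        print(f"EXIF data found: {path}")
    return 1 if offenders else 0
```

A CI step could call `sys.exit(main("path/to/images"))` (the path is a placeholder) so that a non-zero exit code stops the deployment. Running the check after the scrubbing step, rather than instead of it, gives you a safety net if the scrubbing step is ever skipped or misconfigured.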
Tools and methods for scrubbing EXIF data
Several EXIF data processing tools are available to help you. Here are a couple of popular options:
ExifTool
ExifTool is a powerful and versatile CLI application for reading, writing, and editing EXIF data. It supports a wide range of image formats and metadata types. To see ExifTool in action, consider the following picture:
If we examine the image properties for this file (antelope-canyon.jpg), this is what we see:
To use ExifTool to scrub all the EXIF data from this image, we would do this:
$ exiftool -EXIF= antelope-canyon.jpg
    1 image files updated
Now, when we look at the image properties, this is what we see:
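All EXIF information about the camera, camera settings, location, and timestamp has been removed. Note that -EXIF= targets only the EXIF block; if an image also carries other metadata blocks (such as XMP or IPTC), ExifTool’s -all= option removes those as well.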
Integrating ExifTool into your GitHub Actions workflows is straightforward. Here are some examples:
- Remove EXIF GPS Tags is a GitHub Action that uses ExifTool to remove GPS tags from images.
- ExifTool Scrub is a GitHub Action that spins up a Docker container with ExifTool installed, which can then be used to scrub all EXIF data from images.
ImageMagick
ImageMagick is another powerful tool for processing images, including removing EXIF data. It provides a range of functionalities and can be easily integrated into your CI/CD pipeline. It also has existing integrations through GitHub Actions:
- ImageMagick Action is a GitHub Action that leverages ImageMagick to manipulate images, including stripping EXIF data.
In addition to these CLI tools, you can use libraries written for specific programming languages to help with EXIF data scrubbing. Examples include Pillow (Python) and Sharp (JavaScript).
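To give a sense of what this kind of code does under the hood, here is a minimal sketch that uses only the Python standard library (it is not Pillow or Sharp, and not how those libraries are implemented). It assumes a well-formed JPEG and removes only the APP1 segments that hold EXIF data; the function name `strip_exif_from_jpeg` is made up for this example.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"  # marks an EXIF payload inside a JPEG APP1 segment


def strip_exif_from_jpeg(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its EXIF APP1 segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")

    out = bytearray(data[:2])  # keep the start-of-image marker
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, so copy the
            # rest of the file unchanged.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:4 + len(EXIF_HEADER)] == EXIF_HEADER
        if not is_exif:
            out += segment
        pos += 2 + length
    out += data[pos:]  # copy any trailing bytes if no scan segment was found
    return bytes(out)
```

For example, you could read `antelope-canyon.jpg` with `Path.read_bytes()`, pass the bytes to the function, and write the result to a new file. Because the image data itself is copied untouched, there is no re-encoding and no quality loss. The sketch leaves other metadata (such as XMP, which also lives in APP1 segments but under a different header) in place and does not handle other image formats, so a maintained library or one of the CLI tools above is the better choice for production use.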
Conclusion
Scrubbing EXIF image data is vital to protecting data privacy and ensuring compliance in your company. When you make this process a part of your DevOps pipeline, you can take advantage of automation, which reduces the risk of human error and brings you reliability and consistency. Tools like ExifTool and ImageMagick make it easy to remove sensitive metadata effectively.
For more information on how to implement these practices in your CI/CD pipeline, check out Akamai’s Image and Video Manager as well as Linode’s helpful guides on working with CI/CD pipelines and automation.