Google Image Search is Peeping your PDFs

What does your company use PDFs for? The answer may range from info products to reports to clients to “absolutely nothing.” But if you do have PDFs tucked away somewhere on your site, it’s important to know that Google can now take them apart and show the images in search results.

The PDF Search

Finding PDFs in search results is nothing new. If a PDF is indexed somewhere on your site, it can turn up in relevant search results. (This can be a problem with carelessly uploaded internal reports.) However, previously PDFs were an all-or-nothing game. If it was deemed relevant to a search, the whole document came up.

Now, Google goes into PDFs, identifies the images, and treats them just like any other results for its Image Search feature. That has a few consequences:

  • Google guesses at what the images are of and what searches they’re relevant to.
  • Your copyrighted images, including charts and graphs, can show up divorced from the document that identifies them as yours.
  • Users can now download these results as image files, making them easy to re-use without your permission. (Previously, a pirate would have had to strip them out of the PDF themselves, a much more intentional process.)

Of course, the fallout isn’t all bad. This also means that your content is more discoverable than ever, and that you can leverage a Google Image Search to find charts and graphics relevant to your field. But the new information may change how you handle your own PDFs.

Basic Precautions

If you want to protect your content, there are a few steps you can take:

  1. Don’t upload internal PDFs to your website. These documents should be handled on a private intranet, or shared in a controlled manner using secure services like Dropbox or Google Drive. If you put a document on your website, even away from the customer-facing portion, assume Google will find it.
  2. If you don’t want images indexed, adjust your “robots” settings. If you tell Google not to index a page, it will respect your wishes. For an HTML page, the easiest way to do this is a robots meta tag set to noindex, nofollow. A PDF has no HTML head to carry a meta tag, so for the file itself the equivalent is an X-Robots-Tag: noindex header in the server’s HTTP response. Now you can give your clients a public link to the document without it coming up in search results.
  3. Label images as yours. While the above methods work if you need absolute privacy, it really is better to get your PDFs in search results where possible. If all you’re worried about is copyright, just add your URL and a copyright notice to each image. This can be a nondescript line in the bottom corner. (It must be in the image itself, not a caption, to be effective.) Now even if your image is spotted in search results, everyone will know who it belongs to.
  4. Re-evaluate past PDFs. PDFs are the kind of thing that easily gets put on a website and forgotten. Remember, any old PDFs you once uploaded, from annual reports to how-to guides, are still out there. Re-assess this existing content and either add the copyright notices to its images or take it down.
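
Steps 2 and 4 lend themselves to a little automation. The sketch below is one possible starting point, not a finished audit tool. It assumes you have a local copy of your site's files to scan, and the function names find_pdfs and blocks_indexing are made up for illustration. X-Robots-Tag is the HTTP response header Google documents for applying robots directives to non-HTML files such as PDFs. The check here is deliberately simple and does not handle directives scoped to a specific crawler (for example "googlebot: noindex").

```python
from pathlib import Path


def find_pdfs(web_root):
    """Return every PDF under web_root as a sorted list of relative paths.

    web_root is assumed to be a local copy of the site's files.
    """
    root = Path(web_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        # Compare case-insensitively so "REPORT.PDF" is not missed.
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def blocks_indexing(x_robots_tag):
    """Return True if an X-Robots-Tag header value asks not to be indexed.

    Simplification: only plain comma-separated directives are recognized;
    crawler-specific prefixes such as "googlebot: noindex" are not parsed.
    """
    if not x_robots_tag:
        return False
    directives = {d.strip().lower() for d in x_robots_tag.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    return bool(directives & {"noindex", "none"})
```

A run over your web root gives you the list of forgotten PDFs to review. For each one, you can then fetch its URL, read the X-Robots-Tag response header, and pass the value to blocks_indexing to see whether the file is already excluded from search.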

This new feature from Google is a very minor shift, but it always pays to protect your content. Whether the new feature will end up being particularly useful, of course, is anyone’s guess. Does it seem helpful to you?