Google doesn’t just show links to standard html web pages; it also crawls and indexes non-html content, including pdf files.
There are three options for those wishing to remove pdf links from Google:
- Get the pdf taken down from the original website so you see a true “404 page not found error” (preferred)
- Ask the site owner to block Google by adding a “robots.txt” directive (acceptable)
- Keep the pdf in place but edit its contents so the privacy issue is resolved at the source (least preferred)
Let’s go through these options one-by-one:
1) Get the pdf removed from the site so you see a true 404 error
This is the best option because there is no need to wait for Google to revisit the pdf. You can expedite the removal process with the Google removal tool.
Important Note: The pdf must return a true page not found response (404). Redirects (such as 301 or 302) or a normal success response (200) may result in a denial, and the pdf may linger in Google for some time.
Expected result: Google’s automated tool will check the url and, if it returns a 404, they will remove the pdf. This may take slightly longer than for normal web pages; expect up to one week.
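Before submitting a removal request, it can help to confirm what the url really returns. The snippet below is a minimal sketch, not an official Google check: the function names fetch_status and removal_outlook are made up for illustration, and the labels it returns simply mirror the note above (404 is ready, a redirect or a 200 is likely to be denied). It sends a HEAD request without following redirects, so a 301 or 302 is reported as-is rather than hidden behind the final page.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the raw HTTP status for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the headers only, so the pdf itself is not downloaded.
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def removal_outlook(status: int) -> str:
    """Map a status code to the outcomes described in the note above (hypothetical labels)."""
    if status == 404:
        return "ready: true 404, safe to submit to the removal tool"
    if 300 <= status < 400:
        return "redirect: likely to be denied"
    if status == 200:
        return "still live: likely to be denied"
    return "other: check with the site owner"
```

To use it, call removal_outlook(fetch_status(url)) with the full address of the pdf. Some servers answer HEAD requests differently from normal page loads, so treat the result as a quick sanity check rather than a guarantee.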
2) Ask the site owner to block Google
Any site owner can add a piece of code to the site instructing Google not to visit the pdf file. Once that code is in place, you or anyone else can remove the pdf by going to the Google removal tool.
The code is called a “robots.txt” directive.
Expected result: Google’s automated tool will check the url and, if it’s blocked by robots.txt, they will remove the pdf. This may take slightly longer than for normal web pages; expect up to one week.
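To make the directive concrete, here is a rough sketch of what the site owner might add and how you could check it yourself. The file path /files/report.pdf and the example.com address are placeholders, not taken from any real site, and the helper name is_blocked_for_google is made up for this example. It uses Python’s standard urllib.robotparser module, which only approximates how Google reads robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the path is a placeholder for the real pdf.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /files/report.pdf
"""


def is_blocked_for_google(robots_txt: str, url: str) -> bool:
    """Return True if these robots.txt rules disallow Googlebot from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    print(is_blocked_for_google(ROBOTS_TXT, "https://example.com/files/report.pdf"))
    print(is_blocked_for_google(ROBOTS_TXT, "https://example.com/index.html"))
```

With the rules above, the first check reports the pdf as blocked while the rest of the site stays open to Google. On a live site, the robots.txt file sits at the top level of the domain, so you would paste its actual contents in place of ROBOTS_TXT.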
3) Edit the contents of the pdf
In theory, if you edit the contents of a pdf file so the privacy issue is resolved at the source, it is then just a case of waiting for Google to update things at their end.
And here lies the problem: Google doesn’t crawl (visit) pdf files as frequently as normal html web pages. Choosing this option may result in long delays before they update the search results pages.
Expected result: Google will update the search results only after they see the updated pdf; this could take one day or it could take weeks.