Case study: Unlock valuable content trapped in PDFs


Converting PDFs to webpages can unlock valuable content

UPDATE: This interview is creating conversation. See: http://lists.w3.org/Archives/Public/w3c-wai-ig/2012JanMar/thread.html

We were very happy to have the opportunity to talk to Mark Bryant, Systems and Technical Manager for the Victorian Government’s Department of Primary Industries, about a major web development project in which the Department made the decision: no more PDFs.

Converting PDFs to webpages results in staggering increase in page views

A striking lack of PDF files differentiates the Victorian Government’s Department of Primary Industries web site from just about every other government and business site in Australia.

Since DPI Systems and Technical Manager Mark Bryant completed a PDF purge in July 2011 the DPI site has also registered an astonishing increase in page views – up 1.6 million per annum from 4.2 million to 5.8 million.
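As a quick check on the figures quoted above, the following minimal Python sketch works out the absolute and relative growth in annual page views. The function name page_view_growth is only illustrative and does not come from the DPI project; the inputs are the two annual totals given in the article, in millions.

```python
def page_view_growth(before: float, after: float) -> tuple[float, float]:
    """Return (absolute increase, percentage increase) between two annual totals."""
    increase = after - before
    percent = increase / before * 100
    return increase, percent


# Figures quoted in the article, in millions of page views per annum.
increase, percent = page_view_growth(4.2, 5.8)
print(f"Increase: {increase:.1f} million ({percent:.0f}%)")
```

On these figures the increase of 1.6 million page views is a rise of roughly 38 per cent on the earlier annual total.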

Mark says the reason for the increase in page views is simple – the conversion to HTML and removal of thousands of PDF files has unlocked a vast wealth of useful information, and made it easily searchable and accessible.

“As we converted more and more PDFs to HTML/web format, the stats just kept going up and up until we reached around 1.6 million extra page views per year – it was fantastic.”

Today, the DPI web site has 22,000 pages and just a couple of PDF files that have made their way back onto the site – a stark change from 2009, when the site featured 6000 pages and 9000 PDF files.

Accessibility issues

Mark said although PDFs are a convenient tool for web content publishers, they present considerable accessibility challenges for users.

“I think it was part of the cultural change from print to online; PDFs allowed people to create documents as if they were going to be printed, and then save them as a PDF and put them up on the web. A complete re-think was needed.”

Mark said major disadvantages of PDFs include:

  • not showing up in search results
  • failing Australian Human Rights Commission requirements for accessibility to people with a disability, such as compatibility with screen readers
  • penalising people who have slow internet connections
  • often extremely large document sizes.

Business case for turning PDFs into HTML

In July 2009 DPI started a major web redevelopment project, focusing on technical upgrades, governance, visual identity, information architecture, and market research.

“Our users were telling us they wanted to do things in a different way, and when we converted a few PDFs to web pages we found the web pages outperformed PDFs by as much as 160 to one.

“Initially we tried to create a web page to match each PDF, but in the end we introduced a blanket rule – no PDFs, as it was far too difficult to manage both formats,” Mark said.

“There was some resistance, but the business case is pretty simple when you can show that a web page is being read around 160 times more often than a PDF.

“If you are spending money preparing content for the web, then that money is essentially being wasted if that content is locked up in a format people are unwilling to use.”

Team to convert 9000 PDFs

With PDF clearly identified as a barrier to site use and accessibility, Mark established a team of five people responsible for converting 9000 PDF files into web pages.

Starting in July 2010, the conversion team worked with content owners to ensure all relevant content for each PDF was captured – in some cases involving very large PDFs, this required creation of ‘micro sites’.

“No information was discarded, and with all content now in HTML, search results work a treat, which means our audience is more likely to find the information they need,” Mark said.

“Now if you want to use a PDF on the DPI web site, you need a pretty good business case, you accept any responsibility, and you make sure it’s WCAG 2 (Web Content Accessibility Guidelines) compliant, which means you have to have a web version anyway.”