https://gdstechnology.blog.gov.uk/2018/04/27/open-document-format-in-government-an-update/

Open Document Format in government: an update


The Open Standards team was asked 4 years ago by the Open Standards Board to help government publish documents in a more open, transparent and accessible way. We’ve since made progress in achieving these objectives but we still have more work to do.  

This blog post focuses on how far we've come in our mission to make Open Document Format (ODF) the default standard for editable documents. ODF is not intended to replace read-only documents like PDFs, so we have not included PDF usage in our statistics below.

The problem with government documents

The original problem facing government was that too many documents were published in a closed format. These documents can appear corrupted when opened in software that does not fully support them, such as a web-based editor, as shown below.

Screenshot of a closed document opened using software that does not support it. The text is blanked out and it is impossible to read
Closed format documents are not always open to users

We started by promoting the use of Open Document Format

We knew that our users wanted to read and edit documents on a wide range of operating systems without restrictions.

Our solution was to switch government from using closed format documents to using ODF. This standard allows users to open and edit documents, spreadsheets and presentations on any platform, so no user is at a disadvantage.

When the Open Standards Board adopted ODF in 2014, our intention was to end government reliance on closed document formats and to have ODF as the main source of editable document attachments. That hasn't happened yet. We recognise why this is the case and the work we’ve still got to do (detailed below). It’s a work in progress.

Government documents are getting more user friendly...

We've recorded a big drop in the number of closed format documents being uploaded to GOV.UK in the last 2 years.

Graph showing that, over the last two years, closed formats have dropped from around 8,000 attachments a month to under 2,000. Open format attachments have dropped from 4,000 to around 3,000
Number of open and closed attachments added to GOV.UK (2016-2018)

This graph shows that there are more ODF documents now being uploaded than closed formats. But there’s also been a slight drop in the number of open formats being uploaded.

This drop in the total number of attachments is because more departments are realising that HTML – the language of the web – is suitable for publishing documents. Departments now know that they can publish their content in a web-friendly and accessible way without uploading and sharing files. We see this as a positive movement. HTML is an open standard which has been selected by the Open Standards Board for viewing documents.

What documents do our users download?

The graph below shows how many times different file formats (excluding PDF as it is not editable) were downloaded from GOV.UK in the first quarter of 2018. This data helps inform some of the next actions we need to take.

Graph of total downloads in the first quarter of the year. The data is: CSV 700,000. DOC 690,000. XLSX 680,000. DOCX 390,000. ODS 300,000. XLS 280,000. ZIP 220,000. ODT 110,000. XLSM 80,000. RTF 50,000. PDFs are excluded as these are not editable.
Blue is an open format. Red is a closed format. Purple is an unknown format.

Comma-separated values (CSV) files are an open format and are the most popular data file type on GOV.UK. You can also see that ODS is slightly more popular than the older Microsoft Excel format (XLS), although the newer XLSX format is still downloaded more than twice as often as ODS.

From looking through some of the documents, we know that Microsoft Excel (XLS, XLSX, and XLSM) files are often used to gather information from users. Eventually, we hope these formats will be replaced with HTML forms.

The Microsoft Word (DOC) files are mostly older content. We need to make sure that legacy documents are refreshed so that they are accessible for everyone.

ODT is downloaded about a third as often as DOCX. We need to check whether all DOCX attachments have an equivalent ODT available.

Finally, the ZIP files sometimes contain open formats, sometimes closed. The Rich Text Format (RTF) content is mostly legacy forms.
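
The comparisons above can be checked with some quick arithmetic. This is a minimal sketch in Python, assuming the rounded figures from the graph description are close enough to the real counts. The `downloads` dictionary and the `ratio` helper are made up for illustration and are not part of any GOV.UK tooling.

```python
# Rounded download counts for the first quarter of 2018,
# as read from the graph description (approximate values).
downloads = {
    "CSV": 700_000, "DOC": 690_000, "XLSX": 680_000, "DOCX": 390_000,
    "ODS": 300_000, "XLS": 280_000, "ZIP": 220_000, "ODT": 110_000,
    "XLSM": 80_000, "RTF": 50_000,
}


def ratio(a: str, b: str) -> float:
    """Return how many downloads format `a` had for each download of `b`."""
    return downloads[a] / downloads[b]


odt_vs_docx = ratio("ODT", "DOCX")  # a little under one third
ods_vs_xls = ratio("ODS", "XLS")    # slightly above 1
ods_vs_xlsx = ratio("ODS", "XLSX")  # well below 1

if __name__ == "__main__":
    print(f"ODT downloads per DOCX download: {odt_vs_docx:.2f}")
    print(f"ODS downloads per XLS download:  {ods_vs_xls:.2f}")
    print(f"ODS downloads per XLSX download: {ods_vs_xlsx:.2f}")
```

Because the inputs are rounded, the ratios are only indicative, but they are enough to see where open formats are ahead and where they still lag.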

Barriers to using open documents

Some departments haven't updated their workflows to publish ODF documents. This is because we haven’t been able to promote the format, and departments have not had enough support to help them make the transition to an open publishing process.

We also need to improve our user guidance to help people find updated software which is compatible with ODF. Every modern office suite supports ODF, but we still hear from users who are confused about how they should open these files, especially on mobile devices.

Our plan to continue improving document publication

Based on our data, we have created a 5-step plan to get our mission back on track during 2018.

  1. We’ve updated the "How to publish on GOV.UK" guidance to make sure GOV.UK publishers are clear they must provide an open standard format version of the documents they’re uploading.
  2. Where we see a department regularly uploading closed formats, without providing an open equivalent, we will visit their publishing team and help them to understand the benefits of open formats.
  3. We will review historic documents that get lots of downloads and work with departments to republish them as ODF documents or HTML.
  4. We’re improving guidance to help users who are unsure of how to open ODF. This will include better information for Welsh speakers.
  5. Finally, we will collect statistics on what files have been uploaded and downloaded quarterly. This will enable us to track whether our actions are making a difference.
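
As an illustration of step 5, attachment filenames could be grouped by extension each quarter. The sketch below is an assumption about how such a tally might look, not a description of the tools GOV.UK uses. The `classify` and `tally` helpers are hypothetical, and the grouping of extensions into open and closed follows how this post describes each format. Anything not listed, including ZIP and RTF, is counted as unknown.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed grouping, based on how this post describes each format.
OPEN_EXTENSIONS = {".odt", ".ods", ".odp", ".csv"}
CLOSED_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".xlsm"}


def classify(filename: str) -> str:
    """Label a filename as 'open', 'closed' or 'unknown' by its extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in OPEN_EXTENSIONS:
        return "open"
    if suffix in CLOSED_EXTENSIONS:
        return "closed"
    return "unknown"


def tally(filenames) -> Counter:
    """Count how many filenames fall into each category."""
    return Counter(classify(name) for name in filenames)


# Made-up filenames, for illustration only.
sample = ["report.odt", "data.CSV", "form.xlsx", "archive.zip", "old.doc"]

if __name__ == "__main__":
    for category, count in sorted(tally(sample).items()):
        print(f"{category}: {count}")
```

Running the same tally on each quarter's uploads and downloads would show whether the share of closed formats is continuing to fall.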

We cannot have important documents published in formats which do not meet open standards. Government documents are for everyone. Whether you're using Windows, Mac, GNU/Linux, Chrome OS, iOS, Android, or any other system - you have the right to read what we have written and we will continue on our journey to make documents open and accessible.

Leave a comment below or email the Open Standards team if you need support making your organisation’s documents more open and accessible.

If this sounds like a good place to work, take a look at Working for GDS - we're usually in search of talented people to come and join the team.

14 comments

  1. Comment by Lewis Cowles posted on

    Has unoconv been considered as a way to offer Open Documents that have been converted from proprietary formats? Using automated tools lessens the human and training burden for legacy documents and can be used to encourage a healthy transitionary period.

    Existing tools could likely be modified to give the public the tools to flag documents and relieve administrative pressure on government workers to get it right before shipping (the eventual goal)

    • Replies to Lewis Cowles>

      Comment by Terence Eden posted on

      Hi Lewis, we have looked at several document converters.

      One of the problems is that converting is never a lossless process. We're worried that an automated process might mess up an important document and provide misleading or incorrect information.

      At the moment, the human publisher manually reviews everything they upload themselves - so they can see if the conversion to a different format has damaged the document. If we automate it, there's no one checking that the new document is correct.

      We're also considering a "view-on-the-web" option. We already do this for CSV. Allowing people to view any document in their browser would also be helpful for people who can't install software.

      We are looking into this - and trying to find the right way to satisfy our users' needs.

      Thanks

      • Replies to Terence Eden>

        Comment by Xisco Fauli posted on

        Hi Terence,
        In the LibreOffice QA team we use these scripts https://github.com/x1sc0/office-interoperability-tools/commits/master to identify regressions in the documents' layout automatically.
        Basically they convert the documents to the desired format (let's say from DOC/DOCX to ODT), create a PDF for each format and compare the PDFs, identifying differences in them. Currently we use it with ~5000 documents.
        I guess it could help you with your task.
        Regards

  2. Comment by Lewis Cowles posted on

    Absolutely fantastic Terence,

    The tool I mentioned (rightly not the only tool) may generate lossy output, but it does not by default remove or delete the source document.

    My reservation about human publishers is the time, effort and cost. I fully expect humans will need to be involved at an appropriate stage, and that you're in a better position than me to assess that.

    I can tell from your language that a UX process is being followed, which is great! I suppose my buy-in to the idea of lean services, balanced between meeting all needs and the cost of delivery, is maybe a divergence.

  3. Comment by Chris posted on

    Good news!
    A huge number of documents I see are PDF - that's an issue that won't go away unless you take it on now.
    Some of them have been printed out, signed, then scanned back in before uploading as PDF 'proof'.
    Authorities used to throw away what are now seen as precious documents. Badly scanned PDFs could turn out to be very valuable in future.

    Barely readable scans are really difficult to search, impossible for users in most cases as the websites don't index the contents of scans.

    Maybe central government could introduce a British-Library-like scheme where all published docs have to be lodged centrally too?

    It's clearly a training issue but also a human one - maybe you need to introduce an electronic signing standard (blockchain?) where one page is signed and there's proof that it was attached to the rest of the document?

    p.s. It's nice to be able to comment but, as almost nobody does, it seems like an afterthought and a bit 'megaphone'. Why not dignify citizen input via a discussion forum and thus engage more?

    • Replies to Chris>

      Comment by Terence Eden posted on

      Hi Chris,

      I feel your pain on PDFs. We will be discussing it at the Open Standards Board meeting next week. Our long term hope is that the majority of documents and forms will be published as HTML.

      That said, we know that there are a large number of people who either don't have access to a computer or are not confident using them. We need to make it easy for them to use our forms as well.

      All of our published documents are regularly archived by the National Archives - https://www.nationalarchives.gov.uk/ - and by the British Library - https://www.bl.uk/collection-guides/uk-national-government-publications

      Digital Signing is another complex issue which we're looking at. Again, it is a balance between user needs, suitable technology, and security. At the moment we recommend that the best way to be sure that a document is genuine is to download it over https from GOV.UK - but I think some form of signing and assurance will be necessary in the future.

      Finally, we do have a dedicated space for anyone to discuss Open Standards. Pop along to https://github.com/alphagov/open-standards/issues and take part in the conversation.

      All the best.

      • Replies to Terence Eden>

        Comment by R K Hayden posted on

        Whilst accepting the issues with scanned PDFs, and also agreeing that more should be published in HTML, there's still a case for producing PDFs, as PDF is an open standard. For some people PDFs and the tools available to access them suit their reading style better, especially for longer documents, or if they wish to make notes against them.

    • Replies to Chris>

      Comment by David Pearson posted on

      I'd only add that I have many doubts about PDFs which are scanned documents. They are usually at odds with clear requirements around accessibility - including under the EU Web Accessibility Directive. I know there are some special cases - but we refuse to publish any scanned PDFs except "in emergency" and/or there's no viable choice and there's an overwhelming need to publish something.

  4. Comment by Paul posted on

    Lots of credit for moving in the right direction.

  5. Comment by Henry Armitage posted on

    "Back on track"? As far as I'm concerned, this is amazing progress. You're moving widespread, established user habits. Many private companies would envy being able to do that. Excellent work.

    Keep up posts like this, they are great.

  6. Comment by Keith Prust posted on

    Agree that this is the right approach and lots of progress has been made, but there's still some way to go. It's not always possible to view open document formats on some mobile devices. And organisations often lock down the devices they give to staff to stop them downloading apps that could open an ODF document. Do you have plans to make sure that ODF is accessible on all official devices used by government departments?

  7. Comment by Rob Pearson posted on

    As Keith Prust says: Do you have plans to make sure that ODF is accessible on all official devices used by government departments?
    It is great to see the communication about this important topic, thank you.

  8. Comment by Terence Eden posted on

    Hi Keith and Rob,
    Next week we'll be publishing better guidance on which mobile apps support ODF. The good news is that most modern phones have "office" viewers which are usually compatible with ODF.
    This guidance is designed for government IT teams and regular users.

    If you know of departments which don't allow ODF compatible apps to be installed - please let us know using "openstandards -at- digital.cabinet-office.gov.uk"
    Thanks

