In 2013 we published a series of blog posts demystifying big data. We looked at what big data is and how it is used, and recommended some big data best practices.

Since 2013 the big data industry has continued to grow rapidly. A Gartner survey conducted in 2015 showed that over 75% of companies had already invested in big data or were planning to in the next two years, leading Gartner’s research director Nick Heudecker to state that big data analytics were becoming standard practice in business.[1]

The Australian Bureau of Statistics (ABS) has come under fire this year after it quietly announced in December 2015 that at the 2016 census in August it would, for the first time, retain all the names and addresses it has collected to “enable a richer and dynamic statistical picture of Australia”.[2]

Traditionally, it has been the ABS’s practice to destroy identifying information as soon as all other information on the census form is transcribed. The ABS argues that identifying information will be stored safely and separately from the rest of the census data, creating a firewall that protects against individual identification. But secured information is never 100% safe, and the privacy risks are obvious. If not managed well, the collection and use of big data may conflict with the Privacy Act 1988 (Cth) (Act) and the Australian Privacy Principles (APPs), as well as with individual and societal expectations about how personal information ought to be used.

So where does the law draw the line? The Office of the Australian Information Commissioner’s (OAIC’s) recent consultation draft ‘Guide to Big Data and the Australian Privacy Principles’[3] sheds some light on this.


The OAIC uses Gartner’s ‘three Vs’ to define big data: “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight, decision making, and process optimisation.”[4]

The OAIC uses the term ‘big data activities’, which covers big data analytics as well as the handling of personal information before and after analysis, such as how personal information is managed, collected, dealt with and maintained. Big data analytics is the process of applying algorithms to vast amounts of data to reveal insightful and valuable trends. When big data includes personal information, it must be handled in accordance with the APPs. Personal information is defined in section 6(1) of the Act as information or an opinion about an identified individual, or an individual who is reasonably identifiable.

Big Data Risks: De-identification and Re-identification of Personal Information

The OAIC recommends that entities undertaking big data activities first consider whether de-identified personal information could be used. Data that has been successfully de-identified is no longer personal information under the Act. This means the information may be used, shared and published without jeopardising personal privacy, and it lessens the risk that personal information will be compromised should a data breach occur.

The OAIC also recommends that entities considering de-identification of data undertake a Privacy Impact Assessment (PIA). This risk assessment should consider the nature of the personal information, the de-identification techniques, the context in which the de-identified information will be handled and the risk of re-identification during or following big data activities. PIAs are considered practical tools that facilitate ‘privacy by design’, as they encourage entities to build privacy into a project from the start rather than bolting it on afterwards as a separate compliance activity.

If de-identification is appropriate there are several ways this can be done. Some examples recommended by the OAIC are:

  • Removing or modifying personal identifiers (name, address, date of birth) and quasi-identifiers (profession, income) that are unique to an individual.
  • Reducing the specificity of personal information by using categories such as age ranges.
  • Hiding the ‘uniqueness’ of data by swapping identifying information for one person with the information for another person with similar characteristics.
  • Creating ‘synthetic data’ sets in which the trends mirror the original data sets and are thus realistic and useful.
  • Suppressing data by not releasing particular information that may enable re-identification.[5]
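
As a rough illustration of three of the techniques above (removing identifiers, reducing specificity and suppressing fields), the following Python sketch may help. It is not drawn from the OAIC guidance: the field names, the ten-year age bands and the sample record are all assumptions made up for this example, and a real data set would need its own analysis of which fields identify an individual.

```python
# Hypothetical field names, assumed for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def age_band(age: int, width: int = 10) -> str:
    """Reduce specificity: turn an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def de_identify(record: dict, suppress: frozenset = frozenset()) -> dict:
    """Return a copy of the record with direct identifiers removed,
    the exact age replaced by an age range, and any fields named in
    `suppress` withheld."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS or key in suppress:
            continue  # removal or suppression: the field is not released
        if key == "age":
            out["age_range"] = age_band(value)
        else:
            out[key] = value
    return out


# Made-up sample record, not real data.
record = {
    "name": "Jane Citizen",
    "address": "1 Example St",
    "date_of_birth": "1984-02-03",
    "age": 32,
    "profession": "surveyor",
    "postcode": "2000",
}

print(de_identify(record, suppress=frozenset({"profession"})))
# prints {'age_range': '30-39', 'postcode': '2000'}
```

Here the profession is suppressed on the assumption that it is rare enough to single the person out. Whether a field is a quasi-identifier depends on the data set and the context in which it is released.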

Unsuccessful de-identification may lead to re-identification of personal information. The OAIC recommends appropriate mitigation strategies, such as using different or additional de-identification techniques or placing restrictions on the use of the de-identified information. Combining appropriate de-identification with mitigation strategies reduces the risk of re-identification.
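
One simple way to get a feel for re-identification risk is to count how many records share the same combination of quasi-identifiers. A record whose combination is unique, or nearly so, is easier to link back to a person. The Python sketch below shows this idea, often described as a k-anonymity style check. It is our own illustration rather than a method prescribed by the OAIC, and the field names, the threshold `k` and the sample records are assumptions.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)


def risky_records(records, quasi_identifiers, k=2):
    """Return the records whose quasi-identifier combination is shared by
    fewer than k records, which are candidates for further de-identification."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r.get(q) for q in quasi_identifiers)] < k
    ]


# Made-up, already de-identified sample records.
records = [
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "40-49", "postcode": "2000"},
]

print(risky_records(records, ["age_range", "postcode"]))
# prints [{'age_range': '40-49', 'postcode': '2000'}]
```

Records flagged in this way could then be handled with the mitigation strategies described above, for example by widening the age ranges further or restricting who may access the data.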

Big Data and APPs

The OAIC also encourages entities to tailor their personal information handling practices for big data to comply with the APPs, such as:

  • Considering the management of personal information (APP 1): by planning and explaining how personal information will be handled before it is collected, and implementing the four steps outlined in the Privacy Management Framework;
  • Notice and collection of personal information (APPs 3 and 5): by mapping the information lifecycle, identifying what personal information is collected, whether it might be utilised for big data and for what purpose. PIAs can also be useful to inform of big data activities;
  • Dealing with personal information (APPs 6, 7 and 8): by ensuring personal information is only used or disclosed for the purpose for which it was collected, or for a purpose the individual has consented to or would reasonably expect; and
  • Maintaining personal information (APPs 10 and 11): by taking reasonable steps to ensure the personal information the entity holds is accurate, up-to-date, complete and secure.

If your business handles big data or you have any concerns about big data activities, we can help you to meet any compliance requirements and ensure best practice. Contact us to get advice from our experienced privacy lawyers.


[1] Gartner, Gartner survey shows more than 75 percent of companies are investing in or planning to invest in big data in the next two years, September 2015.
[2] Australian Bureau of Statistics, Retention of names and addresses collected in the 2016 census of population and housing, December 2015.
[3] Office of the Australian Information Commissioner, Guide to big data and the Australian Privacy Principles – Consultation draft, May 2016.
[4] Gartner, The importance of ‘Big data’: A Definition, cited in Department of Finance and Deregulation, The Australian Public Service Big Data Strategy, 2013, pg 8.
[5] Office of the Australian Information Commissioner, De-identification of data and information, April 2014, pg 4-5.