
A recent high-profile case has shown that cloud analytics is here to stay and that nothing is beyond its reach. Gartner defines analytics in terms of six key elements: data sources, data models, processing applications, computing power, analytic models, and the sharing or storage of results. Under that definition, any analytics initiative "in which one or more of these elements is implemented in the cloud" counts as cloud analytics. The Panama Papers leave no doubt that big data analytics helped decode the enormous amount, variety, and volume of the leaked data. At that scale, manual review was impractical if not impossible, and so was the use of traditional analytics and search software. HEXANIKA reproduced this Datanami article shortly after the publication of the Panama Papers caused a global whirlwind. Here it is:

The Panama Papers: How Cloud Analytics Made It All Possible.

In late 2014, an anonymous person offered to send a German journalist 11.5 million encrypted documents detailing the structure of offshore business entities created and managed by a Panamanian law firm in the world’s most notorious tax havens. The massive data set was simply too big for one reporter to comprehend. But thanks to the power of big data analytics running in the cloud, a large team of journalists started piecing it together.

This is the story of how the International Consortium of Investigative Journalists (ICIJ) worked with the Süddeutsche Zeitung (SZ), Germany’s largest daily newspaper, to analyze the data known as the Panama Papers. The ICIJ investigation (https://panamapapers.icij.org) surfaced unsavory connections between powerful politicians, business owners, banks, and offshore businesses, which the ICIJ alleges are used to cover up tax evasion and other financial crimes.

The Panama Papers has already led to the resignation of Iceland’s prime minister and the head of a Chilean-based anticorruption group. It’s also raised questions about the financial dealings of others, including Russian and Chinese leaders, Argentinean soccer stars, and Saudi Arabian royalty. More stories based on the ICIJ’s investigation are slated for the weeks to come, and next month the public will have access to ICIJ’s entire Panama Papers database.

The Panama Papers is essentially a full data dump from Mossack Fonseca, the Panama-based law firm said to be the fourth-largest creator of offshore businesses in the world. The treasure trove consists of 2.6TB of data, including relational database files, emails, and various types of documents about the 215,000 offshore bank accounts and shell companies that the law firm and its predecessors created for thousands of individuals between 1977 and 2015.

Setting up the systems that would enable ICIJ journalists to pore through this massive data set was the responsibility of the ICIJ’s Data and Research Unit Editor, Mar Cabra. In an interview with Datanami, the Spanish journalist discussed the technical challenges that the Panama Papers posed and the practical solutions that were implemented.

An ‘Impossible’ Challenge

As is usually the case in data analytics projects, most of the work involved preparing the data. There were two main types of data that ICIJ had to deal with: structured database files, and unstructured documents.

The anonymous source gave SZ, and subsequently ICIJ, a copy of Mossack Fonseca’s entire internal database, which contained the names of the shell companies and the names of the companies’ officers, shareholders, intermediaries, and beneficiaries. This was critical data, but without the database schema, it was difficult to piece together. One of ICIJ’s tech experts spent months essentially reverse-engineering the database to make it searchable, Cabra said.

Read more in the original Datanami article.