by Stefan Wehrmeyer

In our database of court donations, anyone can look up which organizations and associations have received over 170 million Euros from the German judiciary. This blog post explains the coding and technical aspects behind the web application. A link to the complete source code on GitHub is included at the end.

We at CORRECTIV believe that the development of a story isn’t completed with its publication. That’s why we make our work transparent, publish source codes, and release our data. We’re excited to see what else can be uncovered.

## Project course

First we had to gather all of the data. Our reporter Jonathan Sachse described the obstacles he faced in his blogpost last week (in German). After obtaining all relevant data sets from the various federal states, we organized them by state, year, and authority.

This is where the actual coding began:

The data is available in PDF, Excel, and Word formats. The aim is to develop a single spreadsheet containing all payment transactions. A quick look through the source files revealed that there was no coherent data pattern. Even though some documents include much more information, the entire set could only be broken down by the name of the institution and the allocated sum.

## Conversion into CSV

In order to merge the data, it’s advisable to convert the individual sets into the same format. The CSV format (Comma Separated Values) is ideal for automatic processing: no formatting, no hidden or combined columns — a single spreadsheet with a clearly defined header. Our spreadsheet has to at least include the columns „name“ and „sum“. Additional columns are optional but should have the same label if they include the same things.
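The header requirement above can be checked automatically before a file enters any further processing. The following is a minimal sketch, not the project's actual code: the function name `missing_columns` is hypothetical, and it only assumes that each file is a CSV whose first row is the header.

```python
import csv
import io

# The two columns every converted file has to provide, as described above.
REQUIRED_COLUMNS = {"name", "sum"}


def missing_columns(csv_file):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    # fieldnames is None for an empty file, so fall back to an empty list.
    header = {column.strip() for column in (reader.fieldnames or [])}
    return sorted(REQUIRED_COLUMNS - header)


# Usage: a file with an extra optional column passes the check.
sample = io.StringIO("name,sum,state\nVerein A,100\n")
print(missing_columns(sample))
```

A file that reports missing columns would go back to manual conversion instead of being merged.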

The actual conversion takes a lot of manual work. While Excel spreadsheets can often be exported directly as simple CSV data, PDFs usually have to be processed with OCR first in order to make the text machine-readable. Afterwards they’re run through a tool such as Tabula, which recognizes tables.

The resulting CSV sets form the foundation of our data. Mistakes or other problems in the data caused by the OCR processing can be corrected at this point. Since this has to be done by hand, it takes quite a bit of time.

## Consolidation and automatic cleansing

After the conversion process a lot of the data remains dirty: the name column often still includes consecutive numbering, the address of the organization, or other information. The sum column contains all kinds of digits, punctuation marks, blank characters, and currency notations (€, EUR, Euro, …).

A Python script (an application written in the programming language Python) reads all CSV data sets, builds the union of their column names, cleans known columns according to certain patterns (known phrases and worse), and writes all data out as one large CSV file.
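To give an idea of what such a script might look like, here is a minimal sketch using only the standard library. It is not the project's actual script. The function names are hypothetical, amounts are assumed to use German number formatting (`.` as thousands separator, `,` as decimal separator), and the numbering pattern is a guess that would also strip a legitimate leading number such as the one in "1. FC".

```python
import csv
import io
import re
from decimal import Decimal, InvalidOperation


def clean_amount(raw):
    """Parse an amount such as '1.234,56 €' into a Decimal, or None on failure."""
    # "euro" must come before "eur", otherwise "Euro" would leave a stray "o".
    text = re.sub(r"€|euro|eur", "", raw, flags=re.IGNORECASE)
    text = re.sub(r"\s+", "", text)
    text = re.sub(r"[,.]-+$", "", text)  # '500,-' becomes '500'
    if not text:
        return None
    try:
        # Assumption: German formatting, so dots are thousands separators.
        return Decimal(text.replace(".", "").replace(",", "."))
    except InvalidOperation:
        return None


def clean_name(raw):
    """Strip leading numbering such as '12. ' or '3) ' and collapse whitespace."""
    text = re.sub(r"^\s*\d+\s*[.)]\s+", "", raw)
    return re.sub(r"\s+", " ", text).strip()


def merge_csv(sources):
    """Read several CSV files and return (union of column names, cleaned rows)."""
    fieldnames, rows = [], []
    for source in sources:
        reader = csv.DictReader(source)
        for column in reader.fieldnames or []:
            if column not in fieldnames:
                fieldnames.append(column)
        for row in reader:
            row["name"] = clean_name(row.get("name") or "")
            row["sum"] = clean_amount(row.get("sum") or "")
            rows.append(row)
    return fieldnames, rows


def write_csv(fieldnames, rows, target):
    """Write all rows as one CSV; columns missing from a row stay empty."""
    writer = csv.DictWriter(target, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

Rows whose amount cannot be parsed end up with an empty sum, which makes them easy to find and fix by hand in the source file.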

## Deduplication with Open Refine

Open Refine helps to clean tabulated data sets. The roughest mistakes were already removed by the Python script so that we only need Refine for the deduplication. Refine helps to avoid including the same organization more than once with different sums in our spreadsheet — instead, one sum per organization is the goal. For this we apply the „Cluster & Edit“ mode to the name column.

Open Refine offers several possibilities to merge entries. One of the most accurate features is „nearest neighbor“, which allows Open Refine to compare each individual entry with every other one to determine their resemblance. With over 44,000 entries, this takes some time. Therefore it’s advisable to use one of several possible „key collision“ methods. „Key collision“ builds a pattern out of each line, for example by looking at every third character, and combines the entries with matching patterns. Open Refine gives you the possibility to try a whole variety of „key collision“ functions. We played around with the different parameters and found a whole lot of great duplicates to combine.

But what is an actual duplicate and which entries only seem similar? That’s a matter of interpretation. Organizations in varying indexes may have the same name without actually being identical. Regional branches aren’t the same thing as their federal association. The keying function „metaphone3“, for example, merges all organizations beginning with the word „Förderverein“ (booster club), leading to a bunch of mistakes. The fingerprint function in Open Refine is better equipped to recognize a number of typing and OCR mistakes and even normalizes them into the right spelling.
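The key collision idea can be sketched in a few lines of Python. The keying function below is a simplified approximation of a fingerprint-style key (lowercase, fold accents to ASCII, drop punctuation, sort the unique words). It is not Open Refine's exact implementation, and the function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key so that small spelling variations collide."""
    text = value.strip().lower()
    # Fold accented characters to plain ASCII, e.g. 'ö' becomes 'o'.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort the unique words so word order is ignored.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def cluster(names):
    """Group names by key and keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(set(group)) > 1]


candidates = cluster([
    "Förderverein Musterstadt e.V.",
    "Musterstadt Förderverein e.V",
    "Tierschutzverein Beispiel",
])
print(candidates)
```

Each resulting group is only a candidate: whether its members really are the same organization remains the human decision described above.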

Deduplication entails a number of human decisions. These can be extracted from Refine in the undo/redo mode as JSON-formatted commands and incorporated into the data processing pipeline. This makes the whole process transparent and reproducible. It also allows us to reproduce our decisions when working with new data sets or subsequent updates.
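Replaying such saved decisions outside of Refine could look roughly like the sketch below. It assumes the extracted history contains merge operations of type `core/mass-edit` with a `columnName` and a list of `edits`, each mapping several `from` values to one `to` value. Treat these field names as an assumption about the export format and check them against your own export. The function name `apply_mass_edits` is hypothetical, and other operation types are simply skipped.

```python
import json


def apply_mass_edits(rows, operations):
    """Apply saved cluster merges to a list of row dictionaries in place."""
    for operation in operations:
        if operation.get("op") != "core/mass-edit":
            continue  # this sketch only understands merge operations
        column = operation["columnName"]
        # Build one lookup table: every old spelling points to the chosen one.
        mapping = {}
        for edit in operation["edits"]:
            for old_value in edit["from"]:
                mapping[old_value] = edit["to"]
        for row in rows:
            value = row.get(column)
            if value in mapping:
                row[column] = mapping[value]
    return rows


# Usage with a small, made-up history (assumed export shape).
history = json.loads("""
[{"op": "core/mass-edit",
  "columnName": "name",
  "edits": [{"from": ["Forderverein Musterstadt"],
             "to": "Förderverein Musterstadt e.V."}]}]
""")
rows = [{"name": "Forderverein Musterstadt", "sum": "100"}]
print(apply_mass_edits(rows, history))
```

Because the decisions live in a plain JSON file, they can be kept under version control next to the cleaning script.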

## Pipeline

The idea behind a data pipeline is to let data flow from its source to the database in automatic processing stages so that bugs in the source don’t have to be fixed manually. We built such a pipeline for this project: the individual CSV files are merged into a single document and cleansed. Saved Refine commands are applied to the entire CSV file before uploading the result to our final database.

## The web application

The goal of this project is to empower interested citizens to use our donation database. A web interface with search and filtering options as well as individual pages for each organization is a natural solution.

The web application is a Django project. While the data is stored in a conventional database, the search runs on an index in ElasticSearch. The application uses the official ElasticSearch Python client and an ElasticSearch query with nested aggregations, filter queries, and sorting.
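For illustration, a query body combining a text search, a filter, sorting, and a nested aggregation might be assembled like this. This is a sketch under assumptions, not the query from our code base: the field names `name`, `state`, and `amount` and the function name `build_search_body` are hypothetical, and the structure follows the current form of the query DSL. The resulting dictionary would then be handed to the Python client's search call.

```python
def build_search_body(query, state=None):
    """Build a search body: full-text match, optional filter, sort, aggregations."""
    # Fall back to matching everything when no search term is given.
    must = [{"match": {"name": query}}] if query else [{"match_all": {}}]
    # Filters narrow the result set without influencing the relevance score.
    filters = [{"term": {"state": state}}] if state else []
    return {
        "query": {"bool": {"must": must, "filter": filters}},
        # Largest payments first.
        "sort": [{"amount": {"order": "desc"}}],
        "aggs": {
            # Outer aggregation: one bucket per federal state ...
            "by_state": {
                "terms": {"field": "state"},
                # ... inner aggregation: total amount within each bucket.
                "aggs": {"total_amount": {"sum": {"field": "amount"}}},
            }
        },
    }


body = build_search_body("Förderverein", state="Berlin")
print(body["query"])
```

Nesting the sum inside the terms aggregation is what allows a single request to return both the hits and the totals per state for the filter sidebar.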

The complete source code of our web application can be found on our CORRECTIV GitHub account.


Knight-Mozilla-Fellow at CORRECTIV

The Knight-Mozilla OpenNews Fellowship brings developers, data journalists and newsrooms together. Next year, CORRECT!V is one of six newsrooms to host a fellow for one year. Today, the application phase begins for fellows.

by Daniel Drepper, Stefan Wehrmeyer

This fellowship is one of the best known programs in the world to pair up developers and journalists. The Knight Foundation has sponsored the fellowship since 2011. In the past, the New York Times, the Guardian, the Washington Post and ProPublica have been partners.

Next year the hosting partners are the TV program Frontline, the Los Angeles Times, National Public Radio, the online publisher Vox, the Coral Project and CORRECTIV.

Fellows of the OpenNews program will work within the newsrooms on specific projects. In the past, these have been tools for data-journalism and visualization or impact analysis of investigative stories, among others.

The fellowship promotes global exchange between participating fellows and newsrooms. The fellows will work toward a better future of journalism not only in the newsrooms, but also by attending international conferences.

From today until August 21st, everyone who is interested in journalism and technology can apply for the fellowship. The Knight Foundation pays for accommodation and living expenses for ten months. We are looking forward to the fellowship, to the new contacts – and we are excited to meet our fellow.

© Open Road by Paul De Los Reyes, licensed under CC BY 2.0


Become an OpenNews Fellow at CORRECTIV

Our newsroom will host a Knight-Mozilla OpenNews fellow in 2016. Here's why you should apply and come work with us.

by Daniel Drepper, Stefan Wehrmeyer

Six news organisations are part of the 2016 OpenNews fellowship programme. CORRECTIV is the only non-US newsroom on the list and we want you to work with us as a fellow! Here is some information about us.

Who we are

CORRECTIV is the first non-profit investigative newsroom in Germany and heavily inspired by the model of ProPublica (although we are still much smaller). We are funded by memberships and major grants from foundations and our focus is on mid- to long-term investigations. Our team currently consists of around 15 reporters and editors, so small enough for a fellow to have real impact. Nonetheless we collaborate with media organisations in Germany and Europe and many of our publications reach an international audience.

What we are working on

A few examples of our work from the past year which illustrate our vision of open journalism:

  • We investigated what happens to monetary fines paid as a result of criminal proceedings. The result: Judges and federal prosecutors distribute these funds with almost no supervision and at their own discretion. Instead of just writing a story, we collected data from the past eight years on court donations totalling more than 350 million Euros. Everyone can now search for suspicious transactions in our database.
  • Multi-resistant bacteria are one of the largest health threats of our time. We investigated how dangerous the situation really is and visualized how often health insurers are charged for the treatment of the most relevant superbugs. Everyone can enter their post code into the database to track how their region is affected in comparison to the rest of Germany and how the number of reported superbugs has developed.
  • We are investigating the free trade agreement TTIP. We have illustrated the most important objectives with catchy graphics and are continuously publishing original documents.
  • Our investigations into subjects such as TTIP and court donations are long-term projects. We are gearing up to start a two-year investigation about multi-resistant superbugs in Europe. Climate change is also a topic we want to thoroughly tackle. We strive to give citizens information that can be continually updated since most stories simply don’t have a finite ending, but instead keep on evolving.
  • We are currently developing a virtual newsroom. Using this new platform, we want to investigate stories and subjects important to our society collectively with citizens and other journalists. In this way, we can go far beyond what we can accomplish individually. With the virtual newsroom, local investigations can be combined to discover and paint a larger picture.

Why you should become a fellow!

The OpenNews fellowship allows you to enrich a newsroom with your perspective and skills while learning about ongoing investigations and new methods in journalism in general. Come help us with our investigations and build cool tools with us! All you need to do is apply by August 21, 2015!

© Ivo Mayr

Euros für Ärzte

Doctors eagerly welcome big pharma’s money

Pharmaceutical companies have a history of being quite generous with physicians: They pay for their contracts, invite them to conferences, pay for their hotel rooms, reward them for drug monitoring. A total of 575 million euros flowed from pharmaceutical companies into the hands of more than 71,000 doctors and medical experts this past year in Germany alone. But only 20,000 of these doctors are willing to have their name and the amount of money they’ve accepted disclosed.

by Christina Elmer, Markus Grill, Stefan Wehrmeyer

Direct link to the database

It was unprecedented: In late June, 54 pharmaceutical companies disclosed the amount of money they doled out to doctors in Germany. A total of 575 million euros flowed from pharmaceutical companies into the hands of more than 71,000 doctors, experts and medical institutions. Barely a third of these physicians were willing to have their names published alongside these numbers. But a collaboration between CORRECTIV and Spiegel Online has produced the first database to include the names of 20,489 physicians and how much money they received from the pharmaceutical industry in the past year. Any internet user can search this database for doctors by name, city and zip code.

Who are the doctors at the top of the list? First is Dr. Hans Christoph Diener from Essen, who in the past year received more than 200,000 euros for lectures, consultations, trainings and other various expenses. Next comes Dr. Juergen Rockstroh from Bonn with 148,000 euros, followed by Bochum-based Dr. Albrecht Nauck, an endocrinologist who specializes in diabetes, with 128,000 euros, and another endocrinologist, Dr. Thomas Forst from Mainz, with 100,000 euros.

Although these four doctors are situated at the top, it does not necessarily mean they pocketed the most money in Germany. They are simply the top „earners“ of the doctors willing to have their names released.

Jens Schreiber, an internist from Magdeburg, accepted money from a record 11 different pharmaceutical companies last year. The highest of his 11 „subsidies“? An astonishing 24,000 euros from the pharmaceutical company Novartis.

The „Transparency Project“

„The release of this data will increase understanding and acceptance for the cooperation between pharmaceutical companies and doctors“, says Birgit Fischer, praising the data ahead of its publication.

Fischer is Head of Sales for the Research Association of the Pharmaceutical Industry (VFA), a big pharma lobby based in Berlin. „Access to these numbers can help the public understand how doctors and pharmaceutical companies work together in the healthcare system.“

Director of VfA Birgit Fischer and FSA CEO Holger Diener (left): „More trust in the pharma industry“

Markus Grill

The transparency project, however, has a major flaw: The data is nearly indecipherable for patients. The project requires the 54 participating companies to simply host documents with the names of the thousands of doctors somewhere on their websites. In many instances, the documents were unreadable PDFs or only included the first name of the doctor. In other cases the city was listed, but not the zip code. Nowhere was there a searchable and usable presentation of the combined data. And according to Holger Diener, the CEO of FSA, the self-regulatory body of the pharmaceutical industry in Germany, „no such database is planned.“

Some companies even expressly forbid the use of their data. Gruenenthal, a German pharmaceutical company, wrote on their website that „the release of this data in no way provides permission for the use or editing of this data.“ UCB Pharma also posted a warning: „The release of our data is under no condition a permission for visitors to the website to continue working with this data.“

A database with 20,000 names

Despite these warnings, CORRECTIV and Spiegel Online moved forward and used the pharmaceutical companies’ data to create a searchable database that is free to use. This is the first database in Germany that lists more than 20,000 doctors by name together with the exact sums they received.

A total of 119 million euros were paid to doctors for lectures, trainings and travel expenses in the last year. That amounts to an average of 1,646 euros for every doctor.

Add to that another 366 million euros as payment for drug monitoring and other medical studies, which the companies refuse to provide additional information about. „We don’t further differentiate in the research sector“, says Birgit Fischer as she attempts to justify the current lack of transparency when it comes to drug monitoring and application observations. „When made, this was a decision that we all agreed to.“

The pharmaceutical industry always argues that these observations are important for research, but Juergen Windeler, Head of the Institute for Quality and Economics in Health (IQWiG) finds this mildly funny. „These studies are scientifically worthless“, he says. „They do not provide any worthwhile information about the use or effectiveness of a medication. That is why we don’t even look at them.“

The payments influence doctors

For years experts have debated what influence, if any, pharmaceutical payments have on doctors. Most doctors believe they won’t be influenced, even if they are sponsored by the industry. But then came an infamous study from a California hospital. Doctors working at the hospital were asked if their selection of medication was influenced by pharmaceutical referrals. 61 per cent said „not at all.“ The study’s designers then switched things around and asked whether the doctors felt their colleagues were influenced by pharmaceutical payments. This time 84 per cent of respondents said that they believe their colleagues are influenced.

In Germany, Klaus Lieb is one of many who researches the impact of payments. He is director of the Clinic for Psychiatry and Psychotherapy at the University of Mainz. „We doctors have a blindspot when it comes to conflicts of interest. We let pharmaceutical companies pay for us and still believe we are independent.“

Klaus Lieb, Uniklinik Mainz: „Doctors who attend pharma sponsored trainings prescribe on average more expensive drugs.“

Peter Pulkowski

Lieb published a study in the scientific journal PLOS ONE showing that doctors who frequently meet with pharmaceutical reps prescribe more medications than those who do not. „Moreover, doctors who attend pharma-sponsored training sessions prescribe higher priced drugs on average.“ Additionally, doctors closely affiliated with the pharmaceutical industry emphasize the benefits of medications and downplay the risks associated with the same drugs. „In the meantime, there is a very good database with all of these findings“, Lieb says.

An example from the US

Calculations from CORRECTIV and Spiegel Online show that only 29 per cent of doctors agreed to have their names released. This low number worries Lieb. „Transparency looks different than this“, he says. „You can’t do much with this at the individual level, especially compared to the Sunshine Act.“

The Physician Payments Sunshine Act is a law enacted by the US government under President Barack Obama in 2010. It mandates that all pharmaceutical companies publish the amounts they paid and the names of the doctors who received them, regardless of whether the doctor has agreed or not.

“No laws planned“

When asked whether there will be any changes to the transparency project, since such a low percentage of doctors agreed to have this data published, Germany’s Federal Health Minister, Hermann Groehe of the CDU party, says: „There are no additional regulations planned other than already existing transparency and corruption fighting regulations.“ Obama’s Sunshine Act is not an option for Germany, he continued: „The rules in the US face criticism. Many argue that the presentation of payments to doctors and companies, without context, increases the likelihood they’ll be seen as corrupt“, the ministry responded when questioned.

Klaus Lieb from the University Clinic in Mainz views the release of money given to doctors as progress. „When these payments are released, the doctors take their own conflicts of interest to heart. [The doctor] has to look them in the face and acquires a certain perspective. This helps him see past his blind spots.“

Peter Sawicki, who was a top drug checker for many years as head of the IQWIG, sees the pharmaceutical industry’s transparency guidelines purely as a campaign.

„This is another measure to present themselves as clean to the public.“ He went on to stress that payments from the pharmaceutical industry massively influence doctors, a claim that is supported by multiple studies. „It is really time that we enact consequences in light of these findings and establish advanced training opportunities for doctors that are independent of big pharma“, he says. „But the political will is missing.“

Contributors: Hristio Boytchev, Ariel Hauptmeier, Christoph Henrichs, Simon Jockers, Ivo Mayr, Philipp Seibt, Patrick Stotz, Achim Tack.

© imago/Westend61

Super bugs

Sloppy hygiene in hospitals causes more deaths than road traffic accidents

More than a quarter of hospitals in Germany were not meeting hygiene recommendations in 2014, according to new research by CORRECTIV and the ARD magazine "Plusminus". The Berlin Ministry of Health puts the responsibility on federal states and individual clinics. The board of the health insurance company BKK has criticised what it calls "serious deficits".

by Hristio Boytchev, Stefan Wehrmeyer

Dangerous bacteria have an easy time of it in German clinics. Doctors don’t wash their hands often enough, which means that their surgical instruments become contaminated. Above all, there are too few skilled hygiene workers on the ground – in the hospitals – who understand the proper hygiene procedures which could help to fix the problem.

These are the conclusions of a joint evaluation of hospital quality reports and data from the BKK Landesverband Nordwest by CORRECTIV and the ARD magazine “Plusminus”. According to the analysis, in 2014 more than a quarter of clinics in Germany did not have the recommended number of hygiene personnel. The worst state was Bremen, where 43 per cent of hospitals didn’t meet requirements. Next was Thuringia with 42 per cent, followed by Berlin with 37 per cent. Hamburg came the closest to meeting requirements, with only 10 per cent of hospitals falling short on hygiene standards.

Dirk Janssen, vice-chancellor of the BKK-Landesverband Nordwest, considers the results alarming. They show the “serious deficiencies” of many hospitals in the handling of hygiene, he said, and if nothing changes “it costs [the lives of] thousands of patients every year”.

Quality reports from the hospitals were the starting point for the analysis. Every year, every hospital in Germany must give an account of its facilities, standards and medical procedures. However, the reports are written in-house – and the reality could be even worse.

According to the recommendations of the national infection control agency, the Robert-Koch-Institute (RKI), if a hospital has more than 400 beds it should hire at least one person from each of these four professional groups:

1. Hospital hygienists — doctors who have undergone special training. They are responsible for hygiene in the hospital. They must be up-to-date with their training, have the time in their schedules to educate the other employees, and if there are problems inform the hospital’s management and, hopefully, implement solutions.

2. Hygiene specialists — nurses who implement the hygienists’ guidelines and who work closely with hospital staff.

3. Hygiene officers — responsible for setting the hygiene guidelines in each medical profession, and act as a contact point for staff.

4. Hygiene staff at nursing homes — responsible for setting the requirements among the carers at nursing homes.

If a hospital has fewer than 400 beds a full-time hospital hygienist is not required, though staff from the three other professions must be present.

The recommendations were made in 2009 and were then incorporated into the Federal Law on Infection Protection, as well as the hygiene regulations of each individual state. At the beginning of 2015 the Federal Minister for Health, Hermann Gröhe, presented a ten-point plan to fight hospital infections and resistant pathogens, which received a lot of media attention. One of the few concrete points in it is a funding program intended to recruit additional hygiene personnel by the end of 2016. After this transitional period, the hygiene regulations were finally supposed to become binding.

But it did not go according to plan. The deadline was extended from the end of 2016 until the end of 2019. The recruitment of new staff has been sluggish. Why? The Ministry of Health says that this is the responsibility of the clinics and each federal state.

“The respective carriers or heads of the hospitals and medical facilities are responsible for implementing the plan”, writes Sebastian Gülde, Ministry of Health press officer, by email. At the request of the Länder, the deadline has been extended to better attract sufficiently qualified staff.

“The hospitals have launched the initiative,” said Walter Popp, vice president of the German Society for Hospital Hygiene. If sanctions weren’t used, the clinics could simply continue as before – and save the money meant for recruiting hospital hygienists. The salaries offered to hospital hygienists are also not adequate, Popp added.

There are also not enough training opportunities. Specialist training to be a hospital hygienist is only offered at the larger hospitals.

“Hospital hygiene has been neglected in Germany for decades,” says Popp. It’s estimated there are at the most 300 hygiene doctors in Germany – but there are more than 350 hospitals, which should each have at least one hygiene doctor. In their quality reports many hospitals say that they employ a hospital hygienist. However these are often not in the required full-time positions, but are external consultants who only work part time. According to our data, at least 16 per cent of clinics with 400 beds or more only employ an external consultant, instead of a hospital hygienist in a full time position.

The data shows the names of the external hygiene consultants, which can be used to calculate how many different hospitals a single hygienist is advising. Andreas Schwarzkopf is the top performer among these flying physicians: he works in at least ten clinics in the south of Germany. And this is only the voluntary information in the quality reports. In fact the hygienists look after even more clinics than the data shows: Walter Popp says that according to his own data, Schwarzkopf advised 12 clinics externally in 2014.

How is this achieved? Schwarzkopf is chauffeur-driven about 2,000 km a week from clinic to clinic, and writes hygiene protocols and plans in the car. He tries to visit every clinic at least twice a year, and he says that the bigger the hospital, the more often he visits. In order to try and ensure quality, he relies on the local staff, who can also reach him by mobile phone. He tells the hospitals that they are responsible if something goes wrong. His business is going well, he says. Schwarzkopf would like to hire additional hygienists, but currently can’t find the staff. “This job is not for everyone,” he says.

Popp says that hiring external consultants comes at a high cost for the hospitals. The price is negotiable, so it can vary greatly. Popp does not want to say what his rates are. But he will say that there are no big profit margins if you pay attention to quality. For some other colleagues though, this could be a lucrative business. It is not sensible for a hygienist to take on too many hospitals.

Petra Gastmeier, head of the Institute for Hygiene at the Charité Hospital in Berlin, has also worked as an external hygiene consultant in the past. But not any more. “My head has only a certain storage capacity,” she says. When a call comes from a hospital, you have to know immediately what the situation on the ground is, and who is part of the team dealing with the situation. As an external consultant this is only possible to a limited extent.

This is also the view of Franz Sitzmann, who’s worked as a hygienist for many years and has written several specialist books on the subject. “A hospital hygienist must be present and accessible,” says Sitzmann, “hygiene cannot be prescribed from above”. It’s not enough to have someone who passes through a few times a year.

Johanna Knüppel, spokeswoman of the German Professional Association for Nursing Care, says it’s important to have sufficient hygiene staff. “What do hygiene specialists do? They monitor the staff who are working with the patients. Without them ‘everyday’ conditions return.” And all too often this means a single carer in an intensive care unit that has to deal with several patients at the same time.

“Often a nurse has to cope with four or five patients,” says Knüppel. They literally have their hands full – and because of this, perhaps they might not manage to disinfect their hands in between treating different patients.

Janssen is now calling for a binding commitment on the numbers of personnel who specialise in hygiene, nursing and cleaning. In addition, he wants there to be more stringent hygiene controls. Finally there is also a need for comprehensive reporting of antibiotic-resistant hospital pathogens, which so far only exists for a few of the germs.

If these changes are not made, writes Janssen, it might mean that thousands of people will die each year in Germany because they become infected with dangerous and sometimes resistant germs while they’re in hospital. Estimates range from 500,000 to one million infections a year, with 15,000 to 30,000 deaths, of which between a third and a half would be avoidable with better hygiene in hospitals. So between 5,000 and 15,000 people’s lives could be saved every year through better hygiene. Consider how much has been done to reduce the number of traffic incidents to 3,500 per year. Meanwhile, German hospitals are scrambling to get enough hygiene staff.

Translation: Victoria Parsons