Patient text de-identification, natural language processing

Scrubber helps investigators protect patient privacy when using physician notes for clinical research. Scrubber removes HIPAA-defined Protected Health Information (PHI) identifiers by applying text-processing rules defined by both human experts and machines. It can process note text that is unformatted or stored in XML files or SQL databases. Since its launch in 2006, Scrubber has been approved for use by numerous hospital IRBs, and its quality has been validated by physician review.
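Scrubber's actual rule set is not reproduced here. As a minimal sketch of the general idea of rule-based redaction, assuming that a few hand-written regular expressions stand in for "human expert" rules, the Python below replaces each matched identifier with a category tag. The rule names, patterns, and the `scrub` function are hypothetical illustrations, not Scrubber's code or API, and they cover only a tiny fraction of the identifiers HIPAA lists.

```python
import re

# Hypothetical, hand-written rules: each maps a PHI category to a regex.
# Illustrative only; these are NOT Scrubber's actual rules.
RULES = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE),
}


def scrub(text: str) -> str:
    """Replace every span matched by a rule with a bracketed category tag."""
    for label, pattern in RULES.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen 03/14/2006, MRN: 448812. Call 617-555-0142 with results."
print(scrub(note))  # Seen [DATE], [MRN]. Call [PHONE] with results.
```

Hand-written patterns like these are precise but brittle; catching names, addresses, and unusual formats is where machine-derived rules would be expected to help.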

The Scrubber pipeline has moved to Apache cTAKES! 


For more information, see:

“Integrating Public and Private Medical Texts for Patient De-Identification with Apache cTAKES”
Presentation to the i2b2 National Center for Biomedical Computing

Improved de-identification of physician notes through integrative modeling of both identifying and non-identifying medical text
McMurry AJ, Fitch B, Savova G, Kohane IS, Reis BY. BMC Medical Informatics and Decision Making

Scrubber is open source

Past People

Andy McMurry, Informatics Team Lead and Architect
Daniela Bourges-Waldegg, Director of Informatics Technology and Architecture


The Scrubber tool has moved to the Apache cTAKES™ pipeline. Apache cTAKES™ is a natural language processing system for extracting information from the clinical free text of electronic medical records.