


News

Scrubber now uses Apache cTAKES to provide parallel concept extraction during de-identification. Apache cTAKES graciously invited us to port the Scrubber de-identification pipeline to the Apache-hosted codebase. The 2.X maintenance version will remain available, as will the 3.0 release candidate. The publication describing this work has been accepted; this site will be updated shortly to reflect the methods and results it describes.

McMurry* AJ, Fitch* B, Savova G, Kohane IS, Reis BY. "Improved de-identification of physician notes through integrative modeling of both identifying and non-identifying medical text." BMC Medical Informatics and Decision Making. Accepted with minor revisions, Jan 2013.


Reference Docs



Venn Diagram



Feature Set Construction (Text words -> Lexical Features)


3.X is a new vision for Scrubber. As improvements to regular expressions and whitelists/blacklists reached diminishing returns, we shifted toward a machine learning approach that learns from large bodies of medical text drawn from publications and UMLS dictionaries.
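To make the "text words -> lexical features" step a little more concrete, here is a minimal Python sketch of turning note text into per-token lexical features that a classifier could consume. It is an illustration under stated assumptions, not Scrubber's actual feature set: the feature names, the tokenizer regex, and the tiny hard-coded `KNOWN_MEDICAL_TERMS` vocabulary are all hypothetical. A real pipeline would derive term statistics from publications and UMLS rather than from a literal set.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL stand-in for vocabulary learned from publications / UMLS.
# A real system would load term statistics from a large corpus instead.
KNOWN_MEDICAL_TERMS = {"patient", "denies", "chest", "pain", "aspirin", "mg"}

# Split into runs of letters, runs of digits, or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


@dataclass(frozen=True)
class LexicalFeatures:
    """Hypothetical per-token features; not Scrubber's real feature table."""
    token: str
    is_capitalized: bool
    is_all_caps: bool
    is_numeric: bool
    length: int
    in_medical_vocab: bool


def tokenize(text: str) -> list[str]:
    """Return word, number, and punctuation tokens in order of appearance."""
    return TOKEN_RE.findall(text)


def lexical_features(token: str) -> LexicalFeatures:
    """Compute simple surface features for a single token."""
    return LexicalFeatures(
        token=token,
        is_capitalized=token[:1].isupper(),
        is_all_caps=token.isalpha() and token.isupper() and len(token) > 1,
        is_numeric=token.isdigit(),
        length=len(token),
        in_medical_vocab=token.lower() in KNOWN_MEDICAL_TERMS,
    )


def featurize(text: str) -> list[LexicalFeatures]:
    """Map every token in the text to its lexical feature record."""
    return [lexical_features(t) for t in tokenize(text)]


if __name__ == "__main__":
    for f in featurize("Mr Smith denies chest pain, takes aspirin 81 mg."):
        print(f)
```

The point of the sketch is the contrast with regex and whitelist rules: rather than deciding directly whether "Smith" is identifying, each token becomes a feature vector (capitalized, not in the medical vocabulary), and a model trained on large amounts of non-identifying medical text can learn which feature combinations tend to indicate PHI.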
