Mask Sensitive Data in Machine Learning
One of the biggest problems in the AI industry is how to protect data. Data is the basic raw material for building models; without it, no model is possible. Let's assume you are working for an e-commerce retailer like Amazon, in healthcare with a large hospital chain or a government hospital, or in banking with a large bank like SBI or ICICI Bank. These organizations hold huge amounts of data that can be used to develop ML models, and those models can help them serve their customers and other stakeholders better and more efficiently. To develop these models, the data needs to be handed to the data science team, but there is no guarantee that it will not fall into the wrong hands and be misused during or after the project. How do you ensure that the data you share still helps the data science team build a high-performing model while being useless in the wrong hands? There are many methods for this, and Michaela recommends four of them in this article, summarized here:
- Data Removal and Encryption
- Data Coarsening: rounding data to a lower level of precision
- Data Masking: replacing raw values with a derived value, e.g. instead of sending two addresses, send the distance between them
- Principal Components Analysis (PCA):
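
To make the first three ideas concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the method from the referenced article: the record layout, the field names, the band widths, and the helper names `coarsen`, `haversine_km`, and `mask_record` are all hypothetical. It drops a direct identifier (removal), rounds age and income down to bands (coarsening), and replaces two locations with the distance between them (masking).

```python
import math


def coarsen(value, step):
    """Round a number down to the nearest multiple of `step`."""
    return step * math.floor(value / step)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def mask_record(record):
    """Build a shareable copy of a record.

    Only derived fields are copied over, so the name and the two raw
    locations never leave the source system.
    """
    return {
        # Coarsening: exact values become 10-year and 10,000-unit bands.
        "age_band": coarsen(record["age"], 10),
        "income_band": coarsen(record["income"], 10_000),
        # Masking: two locations become a single distance.
        "distance_km": round(haversine_km(*record["home"], *record["store"]), 1),
    }


# Hypothetical customer record; all values are made up for illustration.
raw = {
    "name": "A. Customer",
    "age": 37,
    "income": 54_300,
    "home": (12.9716, 77.5946),
    "store": (13.0827, 80.2707),
}

masked = mask_record(raw)
print(masked)
```

Building the output from an explicit list of derived fields, rather than deleting sensitive keys from a copy, means a newly added sensitive column is excluded by default. How wide the bands should be is a trade-off between privacy and model accuracy that depends on the data.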