A Machine Learning Perspective on Managing Noisy Structured Data

Modern analytics depend on high-effort tasks such as data preparation and data cleaning to produce accurate results. This talk describes recent work on making routine data preparation tasks, data cleaning in particular, dramatically easier. I will first introduce a formal probabilistic framework for describing the quality of structured data and demonstrate how this framework allows us to cast data cleaning as a statistical learning and inference problem. I will then show how this connection allows us to obtain formal guarantees on automated data cleaning, and describe how it forms the basis of the HoloClean framework, a state-of-the-art ML-based solution for managing noisy structured data. I will close with additional examples of how a statistical learning view of managing noisy data can lead to new solutions to classical database problems, such as the discovery of functional dependencies in structured data.
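To make the idea of treating data cleaning as statistical inference concrete, here is a minimal Python sketch. It is not HoloClean's method or API. It swaps in a much simpler technique: estimating the conditional distribution of one attribute given another from co-occurrence counts, then repairing each cell to its most probable value. The toy table, the column names, and the helper functions (`conditional_counts`, `repair`, `fd_violation_rate`) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy table: the zip code is expected to determine the city.
rows = [
    {"zip": "60608", "city": "Chicago"},
    {"zip": "60608", "city": "Chicago"},
    {"zip": "60608", "city": "Cicago"},   # noisy cell
    {"zip": "53703", "city": "Madison"},
    {"zip": "53703", "city": "Madison"},
]


def conditional_counts(rows, given, target):
    """Count how often each target value co-occurs with each given value."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[given]][r[target]] += 1
    return counts


def repair(rows, given, target):
    """Set each target cell to its most frequent value for the same given value.

    The 'confidence' field is the empirical estimate of P(target | given)
    for the chosen value. This is a naive stand-in for a learned model.
    """
    counts = conditional_counts(rows, given, target)
    repaired = []
    for r in rows:
        dist = counts[r[given]]
        best, n = dist.most_common(1)[0]
        repaired.append({**r, target: best, "confidence": n / sum(dist.values())})
    return repaired


def fd_violation_rate(rows, lhs, rhs):
    """Fraction of rows that would have to change for lhs -> rhs to hold exactly."""
    counts = conditional_counts(rows, lhs, rhs)
    violating = sum(sum(d.values()) - max(d.values()) for d in counts.values())
    return violating / len(rows)


repaired = repair(rows, "zip", "city")
rate = fd_violation_rate(rows, "zip", "city")
```

In this sketch a low `fd_violation_rate` suggests that `zip -> city` holds approximately despite the noise, which is the flavor of signal a statistical approach to functional dependency discovery can exploit. A real system would combine many such signals in a learned model rather than rely on a single majority vote.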

AI, data platforms and analytics, data cleaning, probabilistic framework, structured data, HoloClean framework, machine learning, Microsoft Research, Theodoros Rekatsinas
