Data quality in the deep learning era: Active semi-supervised learning and text normalization for natural language understanding