Learning Without Expert Labels For Multimodal Data