Learning deep models via optimal transport distance