How to use multimodal machine learning