Zero-Shot Natural Language-Driven Video Analysis and Synthesis