Learning to Represent Temporal Dynamics and Generative Factors for Intelligent Visual Navigation