Learning Through Text-Image Pairs and Image Sequences