Temporally consistent semantic segmentation in videos