Improving instruction generation for vision-language navigation by reward designing