Speech-based real-time presentation tracking using semantic matching