Modifications Of Q-Learning To Optimize Dynamic Treatment Regimes