Accelerating Linear Algebra and Machine Learning Kernels on a Massively Parallel Reconfigurable Architecture