In this paper, we present an object detection and pose estimation algorithm that runs on low-cost, limited-processing hardware. The algorithm relies on two main steps: in the detection phase, it employs a Convolutional Neural Network (CNN), namely MobileNet SSD, to recognize and track objects of interest in the scene; in the pose estimation phase, it uses stereo correspondences to reconstruct the 3D spatial coordinates of multiple ORB features within the bounding box of the recognized object. The final position of the object in space is computed as a weighted average of the spatial coordinates of the stereo-corresponded keypoints, where the weights are exponentially proportional to the quality of the ORB stereo match. The algorithm was deployed on embedded systems and achieved results comparable to state-of-the-art, GPU-dependent, deep-learning algorithms for object pose estimation. We tested our approach in a calibrated environment where the object displacement (in either the X or Y direction) can be easily measured. We used objects from the Yale-CMU-Berkeley (YCB) dataset to compare our estimates with those of a deep-learning-based method, although our approach is not limited to this object selection. Overall, we achieved an estimation error of less than 10 mm. These results demonstrate that our approach can be used in affordable embedded systems for assistive technology, in tasks such as the control of robotic arms for pick-and-place.
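The exponentially weighted fusion of keypoint coordinates described above could be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the use of the ORB descriptor (Hamming) distance as the match-quality measure, and the decay constant `scale` are all assumptions.

```python
import math

def weighted_object_position(points_3d, match_distances, scale=25.0):
    """Fuse triangulated keypoint coordinates into one object position.

    points_3d       -- (x, y, z) coordinates of stereo-matched ORB keypoints
    match_distances -- descriptor distance per stereo match; lower is better
    scale           -- assumed decay constant for the exponential weighting
    """
    # Weight each keypoint exponentially by its stereo-matching quality:
    # a perfect match (distance 0) gets weight 1; poor matches decay toward 0.
    weights = [math.exp(-d / scale) for d in match_distances]
    total = sum(weights)
    # Weighted average along each of the three spatial axes.
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, points_3d)) / total
        for axis in range(3)
    )
```

With equal match distances this reduces to a plain centroid, while a strongly matched keypoint pulls the estimate toward its own coordinates, which is the intended effect of the exponential weighting.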