"Life is like riding a bicycle. To keep your balance you must keep moving"
Bio
I am an AI Software Architect at Intel, currently focused on full stack AI Software and Performance R&D.
I received my Master's in Electrical Engineering from Rochester Institute of Technology in 2016, where I was advised by
Amlan Ganguly and Ray Ptucha
on Multi-Core Systems with NoC Architectures and Deep Learning. Over the course of my Master's, I interned at Intel,
where I worked on creating high-performance Deep Learning models for Intel Atom-based SoCs.
I received my Bachelor's degree in Electronics and Communication Engineering from Visvesvaraya Technological University, India in 2012.
I enjoy interdisciplinary research and attending hackathons. I am a Neuroscience, Physics, and Hardware Architecture enthusiast.
Experiences
Mar 2016 - Present
Intel Corporation
Engineer
Technologies
• Deep Learning Software and Computer Architecture
• Machine Learning Algorithms and Deep Learning Data Science for Computer Vision
• Hybrid computing (Distributed + Heterogeneous)
Products Timeline
Mar 2019 – Present
Deep Learning Software Architecture for Next-Gen AI Products
Intel NPU (Formerly Intel Movidius VPU)
Products Released/Public:
2024
NPU4 in Lunar Lake
2023
NPU2.7 in Meteor Lake
2022
Discrete Keembay in Raptor Lake Surface laptops
2019
Keembay
Sep 2017 – Mar 2019
Deep Learning Graph Compiler nGraph
Intel Nervana NNP-Training
Mar 2016 – Sep 2017
Computer Vision and Deep Learning
Intel Atom+FPGA+iGPU
Oct 2015 - Dec 2015
Intel Corporation (Hillsboro, OR)
Software Engineer Intern
Performance analysis and optimization of machine learning (Deep Learning) algorithms for Computer Vision and mobile applications, in Torch, OpenCV, and TensorFlow.
Aug 2014 - Mar 2016
Rochester Institute of Technology (Rochester, NY)
Research Assistant
Aug 2014 - Mar 2016
@Multi-Core System Lab
Improved the thermal performance of multi-core, network-on-chip (NoC) based architectures through a distributed, intelligent, and proactive thermal-aware task reallocation algorithm. Optimized the neural network for faster training time.
May 2015 – Oct 2015
@Machine Intelligence Lab
Developed an improved video classification scheme using deeper convolutional neural networks for better accuracy and computation time.
Aug 2012 - May 2013
Hindustan Aeronautics Limited (Bangalore, India)
Apprentice Engineer
Worked on data analysis of the Solid State Digital Video Recording system and the Electronic Flight Instrument System for fault detection and system integration.
Research Papers / Hackathons / Academic Projects
Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning:
SysML 2018 link
The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new
topology remains challenging, as each requires some level of manual effort. This issue is compounded by the
proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires
deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs,
ASICs) and requires O(fp) effort, where f is the number of frameworks and p is the number of platforms. While optimized
kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks
(MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our
experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon to be open-sourced
C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms.
Initially supported frameworks include TensorFlow, MXNet, and the Intel neon framework. Initial backends are Intel
Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported
compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our
overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range
of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference
optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of
operations).
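As a rough illustration of the bridge-plus-IR idea described above (a hypothetical Python sketch, not the actual nGraph C++ API): each framework needs only one bridge that emits a common graph IR, and each hardware backend needs only one lowering pass over that IR, so integration effort grows roughly as O(f + p) rather than O(fp).

```python
# Hypothetical sketch of a framework-bridge / common-IR / backend-lowering flow.
# Class and function names are illustrative assumptions, not nGraph's API.

class Node:
    """One operation in the intermediate representation."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = list(inputs)

class Graph:
    """A framework-independent dataflow graph."""
    def __init__(self):
        self.nodes = []

    def add(self, op, inputs=()):
        node = Node(op, inputs)
        self.nodes.append(node)
        return node

def bridge_from_framework(framework_ops):
    """Framework bridge: translate a framework's op sequence into IR nodes."""
    graph = Graph()
    last = None
    for op in framework_ops:
        last = graph.add(op, inputs=[last] if last else [])
    return graph

def compile_for_backend(graph, backend):
    """Backend pass: lower each IR node to a backend-specific kernel name."""
    return [f"{backend}:{node.op}" for node in graph.nodes]

# One bridge per framework, one lowering pass per backend.
ir = bridge_from_framework(["MatMul", "Add", "Relu"])
print(compile_for_backend(ir, "cpu"))   # ['cpu:MatMul', 'cpu:Add', 'cpu:Relu']
```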
An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore Platform:
Thesis link
Continuous improvement in silicon process technologies has made possible the integration of hundreds of cores on a
single chip. However, power and heat have become dominant constraints in designing these massive multicore chips causing
issues with reliability, timing variations and reduced lifetime of the chips. Dynamic Thermal Management (DTM) is a
solution to avoid high temperatures on the die. Typical DTM schemes only address core level thermal issues. However, the
Network-on-chip (NoC) paradigm, which has emerged as an enabling methodology for integrating hundreds to thousands of
cores on the same die, can contribute significantly to the thermal issues. Moreover, typical DTM is triggered
reactively based on temperature measurements from on-chip thermal sensors, requiring long reaction times, whereas a
predictive DTM method estimates future temperature in advance, eliminating the chance of temperature overshoot.
Artificial Neural Networks (ANNs) have been used in various domains for modeling and prediction with high accuracy due
to their ability to learn and adapt. This thesis concentrates on designing an ANN prediction engine to predict the thermal
profile of the cores and Network-on-Chip elements of the chip. This thermal profile of the chip is then used by the
predictive DTM that combines both core-level and network-level DTM techniques. The on-chip wireless interconnect, which has
recently been envisioned to enable energy-efficient data exchange between cores in a multicore environment, will be used to
provide a broadcast-capable medium to efficiently distribute thermal control messages to trigger and manage the DTM
schemes.
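A minimal sketch of the prediction-engine idea (assumed details and synthetic data, not the thesis implementation): a small ANN regressor is trained to predict the next temperature sample of a core or NoC element from a short window of past sensor readings, so the DTM can act before an overshoot occurs.

```python
# Sketch: next-step temperature prediction from a window of past readings.
# Window size, network shape, and the synthetic trace are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for one core's thermal trace (degrees C).
t = np.arange(2000)
trace = 60 + 10 * np.sin(t / 50.0) + rng.normal(0, 0.3, t.size)

WINDOW = 8  # past samples used as features (assumed)
X = np.stack([trace[i:i + WINDOW] for i in range(trace.size - WINDOW)])
y = trace[WINDOW:]  # next-step temperature to predict

model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
model.fit(X[:1500], y[:1500])

pred = model.predict(X[1500:])
print("mean absolute error (C):", np.abs(pred - y[1500:]).mean())
```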
Identification of human finger snaps: feature extraction using cepstrum; Random Forest and PCA were used for training on the recorded data. Implemented in Python. A rough sketch of the pipeline is shown below.
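The sketch below illustrates the described pipeline under assumed details (synthetic audio frames, frame length, and coefficient count are not from the original project): real-cepstrum features per frame, PCA for dimensionality reduction, and a Random Forest classifier.

```python
# Sketch of a cepstrum + PCA + Random Forest classification pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

def real_cepstrum(frame):
    """Real cepstrum of one audio frame: IFFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)
    return np.fft.irfft(log_mag)

def features(frames, n_coeffs=32):
    """Keep the first few cepstral coefficients of each frame as features."""
    return np.stack([real_cepstrum(f)[:n_coeffs] for f in frames])

# Synthetic stand-ins for recorded snap / non-snap frames (1024 samples each).
rng = np.random.default_rng(0)
snaps = rng.normal(0, 1.0, (50, 1024)) * np.hanning(1024)  # impulsive-like bursts
noise = rng.normal(0, 0.2, (50, 1024))                     # background noise

X = features(np.vstack([snaps, noise]))
y = np.array([1] * 50 + [0] * 50)

clf = make_pipeline(PCA(n_components=10), RandomForestClassifier(n_estimators=100))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```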
SketchitUp @BrickHack-2015 | Rochester Institute of Technology:
A web app that helps draw uploaded images by joining points (tracing the image). Built using Machine Learning, Python, OpenCV, HTML5, JS, and CSS.
Literade @HackMIT-2015 | Massachusetts Institute of Technology:
A web app that converts boring articles into colorful images for children to read. Built using Machine Learning, Python, HTML, CSS, and JS.
Other hackathons attended
Tinder4Food (Android App) @HackNY-2015 | New York University.
HackBU-2016 | Binghamton University.
BrickHack2-2016 | Rochester Institute of Technology
Moving Target Detection and Aiming - 2013 :
Designed an arm robot that can aim at moving targets in real time using a cat-toy laser.
A regression technique, implemented in MATLAB, was used for the arm movement.
e-Data Analysis - 2014 :
Descriptive statistics were performed on quantitative stock data to understand a company's stock performance.
Weather Prediction - 2014 :
Historical temperature analysis using statistics, and temperature prediction using machine learning algorithms such as PCA, SVD, and Bayesian Networks.
Multi-Channel ADPCM CODEC (MCAC) - 2013 :
Designed the RTL and the verification model for the Adaptive Quantizer and the Tone & Transition model, and some of their sub-models, for the pipelined design of the MCAC.
Memory Access Bus Arbiter (ARB) of DTMF receiver - 2013 :
Designed the RTL model and performed verification, logic synthesis, test insertion and detailed timing analysis using Verilog HDL.
Boundary Scan Sum - 2013 :
Hierarchically designed a Boundary Scan Sum with optimal sizing and clean DRC and LVS in Cadence Virtuoso, using 0.6-micron technology.
Deep Learning for Image Classification - 2013 :
Deep Neural Networks with feature extraction were implemented for image classification on the "Caltech-101" dataset.