Hi, I'm Harsh

B. Tech in Data Science and Minor in Finance from NIIT University

Like this line, which ChatGPT wrote for me: fueled by data and dreams, I want to create my own ChatGPT one day.

Contact Me

About Me

My Introduction

A final-year Data Science student passionate about pursuing AI/ML, with 29,000+ views on technical and mathematical articles on GeeksForGeeks.

9 Data Projects Completed
12 Articles Written
2 Published Papers

Skills

My Technical Level

Development

All About the Core

Python

90%

Java

80%

PySpark

75%

R

70%

C++

40%

JavaScript

70%

Android

85%

MS Excel

70%

Photoshop

70%

InDesign

90%

Frameworks

Everyone Needs Support

NumPy

80%

pandas

90%

matplotlib

70%

scikit-learn

85%

Spark MLlib

70%

PyTorch

85%

Deep Graph Library

55%

OpenCV

65%

Pillow

65%

NLTK

60%

streamlit

80%

seaborn

70%

Flask

40%

Machine Learning

Theory, theory!

Linear and Logistic Regression

95%

Decision Trees

95%

Ensemble Models

90%

Clustering

65%

Convolutional Neural Networks

80%

Graph Neural Networks

60%

Recommender Systems

75%

Natural Language Processing

65%

Exploratory Data Analysis

90%

Multi-modal Learning

70%

Time Series

55%

Cloud and Engineering

Fly Fast & High!

AWS SageMaker

65%

AWS EMR

75%

AWS Lambda

70%

BigQuery

40%

Docker

60%

Apache Airflow

40%

Kafka

40%

Databases and Viz

Wow! Factor

MySQL

85%

AWS Redshift

75%

Amazon RDS

70%

Tableau

50%

Power BI

50%

Looker

60%

My Background

Education
Work

Minor in Finance

NIIT University, India
2021-2024

B. Tech in Data Science with Specialization in Data Science

NIIT University, India
2020-2024

Higher Secondary in Science

Velammal International School, India
2018-2020

Full Stack Web Development

Freelance at NIIT University
Jan 2022 - Ongoing
What I did here

  • Developing a Placement Portal for our college using React, Express, Node.js, and SQL Server. Created pages from designs made in Visily and developed API endpoints.

  • Conducted schema analysis and mapping of the database, and designed the UI for the portal using React and Visily.

  • Once deployed, the placement portal is expected to reduce our college placement team's workload by 40%.

  • Tools and languages used - React, JavaScript, Tailwind CSS, Git, SQL Server, daisyUI, Node.js, Express

Artificial Intelligence Researcher

Internship at Qodeit
Dec 2022 - Feb 2023
What I did here

  • Developed a resume parser that extracts data from PDF and Word files with a high success rate, a 70% improvement over the previous version. Integrated the ChatGPT API to give clients AI-generated suggestions for tailoring and improving their resumes.

  • Built ML and NLP models that parse and extract information with over 92% accuracy, enabling text classification and authorship attribution.

  • Contributed to both the product and service sides of the company, including service-oriented tasks such as image classification.

  • Tools and languages used - OpenCV, NLTK, scikit-learn, Flask, Python, Google Colab

Data Scientist

Intern at GlobalCert
Jan 2022 - Mar 2022
What I did here

  • Organized a team of 6 data science interns to work on a marketing problem using EDA, implemented a method to increase sales by 14%, and maintained documentation of our progress.

  • Devised and programmed a redesign of GlobalCert's hiring process to help the business scale, proposing a separate hiring process for each level of work and a method for defining those levels.

Technical Content Writer

Intern at GeeksForGeeks
Feb 2021 - Jan 2022
What I did here

  • Authored and revised 10 articles on Python and various calculus topics, writing 50-150 lines of code for each programming article.

  • Guided 1 content writer on how to write and research articles.

Portfolio

My Projects

Food AI

Cross-Modal Representation Learning

  • Built a system for retrieval of food recipes given images of corresponding food

  • Beat the CCA baseline top-10 recall for recipe retrieval in the original im2recipe paper by 20 percentage points by using ResNet and BERT feature extractors and introducing cross-modality through a shared embedding layer

  • Implemented a second approach using neural networks trained with triplet loss and attained a median retrieval rank of 1 and a top-10 recall of 82.49% for 1,000 random food images

  • Tech Stack


    Research Papers Referred

    View Code View Report View Presentation
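For readers unfamiliar with the retrieval metrics quoted above, here is a minimal sketch of how median retrieval rank and top-k recall are commonly computed. It is not the project's evaluation code: the function names and the toy similarity matrix are assumptions made for illustration, and it assumes recipe i is the single correct match for image i.

```python
from statistics import median


def retrieval_ranks(similarity):
    """Return the 1-based rank of the true recipe for each image.

    similarity[i][j] is a score between image i and recipe j.
    Assumption: recipe i is the one correct match for image i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        true_score = row[i]
        # Rank is 1 plus the number of recipes scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > true_score))
    return ranks


def recall_at_k(ranks, k):
    """Fraction of queries whose true match appears in the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median position of the true match across all queries."""
    return median(ranks)


# Hypothetical 3x3 similarity matrix (rows: images, columns: recipes).
toy_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
ranks = retrieval_ranks(toy_similarity)
print(ranks, median_rank(ranks), recall_at_k(ranks, 1))
```

A median rank of 1 means that, for at least half of the query images, the correct recipe was the top result.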

    Ensembling Large-scale Object Detectors

    Computer Vision

  • Developed algorithms for building YOLO model bagging and boosting ensembles

  • Obtained an average precision of 87.5% on the Flickr-32 dataset using a generic logo detector system of two boosted YOLO models

  • Tech Stack


    Research Papers Referred

    View Report

    Movie Recommendation from Conversational Data

    Natural Language Processing

  • Built a movie recommendation system leveraging user conversations, critics data and domain adaptation techniques, which is a re-implementation of this paper

  • Tuned hyperparameters for three CF approaches: KNN, SVD and SVDpp to obtain a 3% improvement in results

  • Experimented with neural MF and obtained comparable results of RMSE=1.232 and MAE=0.9569

  • Tech Stack


    Research Papers Referred

    View Code View Report View Presentation
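As a quick reference for the RMSE and MAE figures above, the following is a minimal sketch of both error metrics on rating predictions. The ratings are made-up values for illustration only, not data from the project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: average size of the error, in rating units."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical true ratings and model predictions.
actual_ratings = [4, 3, 5, 1]
predicted_ratings = [3, 3, 4, 3]

print(rmse(actual_ratings, predicted_ratings))
print(mae(actual_ratings, predicted_ratings))
```

RMSE is never smaller than MAE on the same predictions, and the gap between the two grows when a few predictions are far off.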

    Logo Detection

    Convolutional Neural Networks

  • Reproduced results for open set logo detection from the paper here achieving a 24 percentage point increase in mean average precision (mAP) compared to the original using YOLOv5

  • Focused on classifying textual logos and obtained a classification accuracy of 22.56% against 47 classes of the Flickr-47 dataset using a logo classification architecture consisting of YOLOv5 and template matching

  • Tech Stack


    Research Papers Referred

    View Code View Report View Presentation

    Autoencoder Image Colorization

    Convolutional Neural Networks

  • Built an 11-layer deep autoencoder neural network using residual connections that colorizes black and white images

  • Trained the network on 10,000 images from FloydHub and deployed online via Streamlit

  • Tech Stack

    View Code

    New York Taxi Fare Prediction

    Big Data

  • Analyzed a taxi fare dataset of 55 million records to determine how fares vary across location and time

  • Performed feature engineering and zoomed in on trips to and from airports and across different boroughs of NYC

  • Predicted taxi fares to an RMSE score of 4.28 by training a Random Forest model on the augmented dataset

  • Tech Stack

    View Code

    FPL Team-Maker

    Exploratory Data Analysis

  • Developed and deployed a customizable application that uses pandas and Exploratory Data Analysis to suggest an optimal team to be entered into the Fantasy Premier League fantasy soccer game

  • 50+ monthly active users. Ranked in the top 2% worldwide among 8.2 million players in 2020

  • Tech Stack

    View Code

    Undergrad Final Year Project

    Natural Language Processing

  • Built a text simplification system that simplifies input text by removing difficult-to-understand words

  • Designed and trained Transformer models that learned the semantics of the input and recognized its complex words

  • Improved the performance of the application by preceding the transformer architecture with a Complex Word Identification (90.23% accuracy) model that flagged the complex words beforehand

  • Tech Stack

    View Code

    Abalone Age Prediction

    Machine Learning - Regression

  • Determined the ages of abalones (snails) using classification techniques and leveraging their physical characteristics

  • Improved the accuracy of determining age using regression techniques and obtained a MAE of 0.936

  • Concluded that the dataset is not large enough to reach the desired MAE of 0.5, which would imply correct age prediction

  • Tech Stack

    View Code

    Alien Shooter

    Python Game Development

  • Expanded the ‘Space Invader’ game to include three modes of play: Arcade, Timed and Survival

  • Tech Stack

    View Code

    Reminder - Todo List

    Android Development

  • Developed an Android application that acts as a combination of a reminder app and a notes app

  • Published the app on the Google Play Store, where it currently has 50+ installs and a rating of 4.6

  • Tech Stack

    View Code

    Research

    My Publications

    International Journal of Computer Applications

    Vol. 178, No. 50 (43-49)

    Abstract

    Abalones are sea snails or molluscs, commonly called ear shells or sea ears. Because of the economic importance of the age of the abalone and the cumbersome process that is involved in calculating it, much research has been done to solve the problem of abalone age prediction using its physical measurements available in the UCI dataset. This paper reviews the various methods, such as decision trees, clustering, SVM using Tomek links, CGANs and CasCor, used in attempts to solve it. Furthermore, in contrast to previous research that saw this as a classification problem, this paper approaches it as a linear regression problem and analyses the results.

    Read it!

    International Journal of Computer Sciences and Engineering

    Vol. 8, Issue 6 (1-5)

    Abstract

    Natural Language Processing is an active and emerging field of research in the computer sciences. Within it is the subfield of text simplification, which is aimed at teaching the computer to efficiently perform the so far primarily manual task of simplifying text. While handcrafted systems using syntactic techniques were the first simplification systems, Recurrent Neural Networks and Long Short Term Memory networks employed in seq2seq models with attention were considered state-of-the-art until very recently, when the transformer architecture did away with the computational problems that plagued them. This paper presents our work on simplification using the transformer architecture in the process of making an end-to-end simplification system for linguistically complex reference books written in English, and our findings on the drawbacks/limitations of the transformer during the same. We call these drawbacks the Fact Illusion Induction, Named Entity Problem and Deep Network Problem and try to theorize the possible reasons for them.

    Read it!

    Certifications

    Extra Courses I have Undertaken

    Certified Cloud Practitioner

    Expiry Date: July 17, 2024

    View Certificate

    LookML Developer

    Expiry Date: March 28, 2022

    View Certificate

    AWS Machine Learning Engineer Nanodegree

    Expiry Date: Does not expire

    View Certificate

    Applied Data Science with Python Specialization

    Expiry Date: Does not expire

    View Certificate

    Machine Learning

    Expiry Date: Does not expire

    View Certificate

    Deep Learning Specialization

    Expiry Date: Does not expire

    View Certificate

    Blog

    My Technical Articles

    5-Minute Paper Explanations: Food AI Part I

    Intuitive deep dive of the im2recipe paper “Learning Cross-modal Embeddings for Cooking Recipes and Food Images”

    Read it!

    5-Minute Paper Explanations: Food AI Part II

    Intuitive deep dive of im2recipe related paper “Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA”

    Read it!

    5-Minute Paper Explanations: Food AI Part III

    Intuitive deep dive of im2recipe related paper “Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning”

    Read it!

    5-Minute Paper Explanations: Food AI Part IV

    Intuitive deep dive of im2recipe related paper “Transformer Decoders with Multimodal Regularization for Cross-Modal Food Retrieval”

    Read it!

    Why Gradient Descent Works?

    Everybody knows what Gradient Descent is and how it works. Ever wondered why it works? Here’s a mathematical explanation

    Read it!

    A Philosophical Look at Climate Change

    … And why it's here to stay

    Read it!

    10 Points to Make it Big in the Data Industry

    People want to make careers here. But they are often deafened by the noise that surrounds them.

    Read it!

    What Mainstream AI is (Not) Doing

    The pandemic accelerated AI adoption — and made Big Tech richer — but did AI adoption happen in the places where it was needed?

    Read it!

    Introduction to PySpark via AWS EMR and Hands-on EDA

    Performing EDA on NY Taxi Fare Dataset to see PySpark in action — because cloud computing is the next big thing!

    Read it!

    Fantasy Premier League x Data Analysis: Being Among the Top 2%

    A brief overview of the application I built, in which I have employed data analysis to power my FPL team up the charts

    Read it!

    Kernel Regression from Scratch in Python

    Everyone knows Linear Regression, but do you know Kernel Regression?

    Read it!

    Intro to Machine Learning via the Abalone Age Prediction Problem

    The best way to dive into ML is to see it in action. Here it is!

    Read it!

    Let's Get in Touch

    Hit me up!

    +91-8999791682

    Location

    New Delhi, India