Hi, I'm Ketan

Data Scientist
from Champaign, IL 




Aesthetics Market Model
Guidepoint • Python, SQL | Snowflake, Tableau | pandas, re, scikit-learn, Matplotlib

The Aesthetics Market Model is a proprietary model built to provide insight into the aesthetics market, covering trends, demand, patient preferences, the competitive landscape, and more. It draws on proprietary data sources, such as point-of-sale and survey data, and incorporates demographic data, economic indicators, and regulatory changes to give a comprehensive analysis of the market.


Aesthetics Directory
Guidepoint • Python, SQL | Snowflake, Tableau | pandas, re, BeautifulSoup

The Aesthetics Directory is a comprehensive database of aesthetics facilities across the US. It records facility names, addresses, websites, contact details, treatments offered, affiliated doctors, facility types, and more. The directory serves as a tool for market analysis, providing insights into trends, competitive landscapes, and high-demand areas. With this data, companies can optimize marketing, tailor sales approaches, and plan expansions in the competitive aesthetics industry.



COVID-19 Literature Clustering
Python | pandas, re, scikit-learn, Matplotlib, seaborn, spaCy, Bokeh

This project utilizes a comprehensive approach to organize and visualize COVID-19 literature. By employing k-means clustering, t-SNE dimensionality reduction, and Latent Dirichlet Allocation (LDA) for topic modeling, the dataset's dimensionality is reduced, and thematic clusters are identified. K-means and t-SNE independently reveal relationships between papers, while LDA enhances the understanding of each cluster by identifying prevalent keywords. The evaluation involves plot examination, paper review, and classification model testing.


TasteAI: Restaurant Review Analyzer and Recommender
Python | pandas, gensim, nltk, numpy, pyLDAvis, seaborn, Matplotlib

The project focuses on categorizing and analyzing restaurant reviews using Latent Dirichlet Allocation (LDA) for topic modeling. By extracting underlying themes, the aim is to provide actionable insights for restaurants to enhance customer experience, self-evaluate, and stay competitive. This approach not only improves overall restaurant performance but also enables personalized recommendations for customers, contributing to a more satisfying dining experience.


Maple Syrup Production Predictor
Python | statsmodels, numpy, pandas, seaborn, Matplotlib

The goal of the project was to build a multivariate time series forecaster that predicts the next year's maple syrup production from the factors that affect it: daily precipitation, daily soil moisture, daily temperature, and eight-day NDVI (Normalized Difference Vegetation Index). Several tests and metrics were used to assess the causality and stationarity of the time series and to select the most suitable vector autoregressive (VAR) model.


Credit Card Fraud Detector
Python | pandas, re, scikit-learn, imblearn, seaborn, Matplotlib

The goal of the project was to develop an accurate and efficient machine learning model for detecting fraudulent credit card transactions in real time. The project also has broader implications for the financial industry, where fraud detection is a critical part of financial security and risk management. It addresses several challenges posed by imbalanced datasets, including bias toward the majority class, misleadingly high accuracy, poor generalization, and inappropriate evaluation metrics.
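The accuracy problem on imbalanced data can be shown with a small sketch. This is not the project's model: the synthetic dataset, the 2% fraud rate, and the use of logistic regression with class weighting (instead of imblearn resampling) are assumptions chosen to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced data: roughly 2% of samples are "fraud" (label 1).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# A baseline that always predicts "not fraud" scores high accuracy
# while catching no fraud at all.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)
baseline_accuracy = accuracy_score(y_test, baseline_pred)
baseline_recall = recall_score(y_test, baseline_pred)

# Class weighting makes errors on the rare class cost more during training.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)
model_recall = recall_score(y_test, model.predict(X_test))
model_ap = average_precision_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"baseline accuracy={baseline_accuracy:.3f}, recall={baseline_recall:.3f}")
print(f"weighted model recall={model_recall:.3f}, average precision={model_ap:.3f}")
```

Recall and average precision describe how well the rare class is detected, which is why they are more informative here than plain accuracy.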


Python | TensorFlow, Keras, Matplotlib

The project uses an LSTM (Long Short-Term Memory) network to analyze a large dataset of sentences and assign an appropriate emoji to each one, taking word order into account. The model is trained to pick up the context and sentiment of a sentence and then output an emoji that reflects its meaning. The project illustrates how LSTMs capture long-term dependencies in input sequences.



Data Scientist

Sep 2022 - Dec 2023

Graduate Research Assistant
The Center for Health Informatics

Jan 2022 - May 2022

Data Analyst Intern
Alliant Infotech

Mar 2020 - Aug 2020


MS, Information Management
University of Illinois at Urbana-Champaign 

Aug 2021 - Dec 2023

B. Tech, Electronics & Communication
Shri G.S. Institute of Technology & Science

Aug 2017 - Aug 2021