Recent Krenicki Center Projects

A CRM system with predictive capabilities to decipher when a lead will be in the market for a product

In collaboration with a multinational capital goods manufacturer with more than a thousand dealerships in the U.S., we developed an enhanced CRM (customer relationship management) system that predicts when a customer will be in the market for a specific product.

Here is a poster summarizing the problem statement, the approach, and the outcome of this project.


Determining the right demographics to deliver educational campaigns

Business Problem: This organization was not the usual kind of client that comes to the Krenicki Center. It was the Indiana Poison Center (IPC), and the goal was to identify the areas where educational programs should be delivered to curtail poisonings from Type 2 diabetes medicines, which were widely used and often within children's reach.

Approach: Sulfonylureas (a class of Type 2 diabetes drugs) cause poisoning when children get hold of their parents' medications. The idea was to build a visual showing where this occurs most often in the state of Indiana. The steps were to identify where incidents were concentrated and to standardize for population, since highly populated areas would otherwise dominate the counts. In areas showing a large spike, educational programs and materials were distributed to get the word out: in an area where households tend to keep more of these diabetes medications, for instance, the message was to keep them out of children's hands and to store them in a specific place out of reach. Since this was an education-support campaign, IPC needed to know whom to target to run these initiatives and reduce poisonings due to sulfonylureas.

Outcome: The client still deems this work useful, and we can proudly see the visualization on their website.

Tools used: Tableau

“Even though I am Tableau-certified, this project gave me hands-on practice, which further helped me in my interviews and job hunt.” - Mohinder Goyal, MSBAIM Class of 2020

Interpretation of Forecasting Models

An American exercise equipment and media company had received record-breaking order volumes alongside a reduction in staff due to COVID-19.

The supply chain was disrupted, and delivery times stretched from 30 to 60 days due to anchoring delays and port congestion. Our objectives were to:

  • Identify the factors affecting ocean travel time, i.e., from point A to point B.
  • Improve the current forecasting methodology using external factors such as marine traffic and port congestion.

Methodology: The client presented their forecasting model, in which ocean lead time was forecast using simple exponential smoothing with no trend, seasonality, or external factors.
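Simple exponential smoothing of this kind can be sketched in a few lines. The smoothing parameter and the weekly lead times below are illustrative, not the client's figures:

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing with no trend or seasonality:
    each step blends the latest observation with the previous level."""
    level = series[0]          # initialize the level at the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level               # the forecast for the next period

# Illustrative weekly ocean lead times (days)
lead_times = [32, 35, 40, 38, 45, 52, 48]
print(round(ses_forecast(lead_times), 2))
```

Because the model carries no trend or seasonal component, every future week gets this same flat forecast, which is exactly the limitation the external factors were meant to address.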

A validation WAPE (Weighted Absolute Percentage Error) of 11.35% had been achieved on data from the week of February 1, 2021 through the week of May 17, 2021 by the time our team was brought in.
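WAPE, as used throughout this project, is total absolute error divided by total actual volume; a minimal sketch with toy numbers:

```python
def wape(actual, forecast):
    """Weighted Absolute Percentage Error: the sum of absolute errors
    divided by the sum of absolute actuals."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(abs(a) for a in actual)

actual   = [40, 50, 60, 50]   # toy weekly lead times
forecast = [44, 45, 57, 52]
print(f"{wape(actual, forecast):.2%}")
```

Unlike MAPE, WAPE weights each period by its actual value, so large-volume weeks dominate the score rather than small-volume weeks with inflated percentage errors.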

The approach to problem solving involved the following steps:

Data Collection: Collect the Anchorage Time, Port Time, No. of Vessels, and No. of Calls at the port of arrival for each week.

Data Exploration: Remove “Calls” cluster of variables using correlation analysis.

Feature Engineering: Create lagged derivatives for Ocean, Anchor, Port & Vessels clusters.

Data Partition: Split the Train & Test data using rolling forecasting origin techniques.

Forecasting: Build regression, SVM, Random Forest, Gradient Boosting models.
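The rolling forecasting origin split named in the Data Partition step can be sketched as follows; the window sizes here are illustrative:

```python
def rolling_origin_splits(n, initial_train=4, horizon=1):
    """Return (train_indices, test_indices) pairs, expanding the training
    window one period at a time so each test fold always lies strictly
    after its training data (no leakage from the future)."""
    splits = []
    origin = initial_train
    while origin + horizon <= n:
        train = list(range(origin))
        test = list(range(origin, origin + horizon))
        splits.append((train, test))
        origin += 1
    return splits

# For a 7-week series: three folds, each testing one step past the origin
for train, test in rolling_origin_splits(7):
    print(train, "->", test)
```

Each model in the Forecasting step would be refit on every fold's training window and scored on the held-out week, and the fold errors averaged into the validation WAPE.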

Outcome: We achieved a validation WAPE of 11.13% and demonstrated that, among the external factor clusters (Anchor, Port, Vessels, and Lead Times), the Anchor variables are not significant in the presence of the other three clusters.

Apart from the historical ocean lead times, the previous week's port time and the number of vessels two weeks earlier affect the ocean lead time for the current week.

Tools used: Python & Tableau

Optimizing Ticket Handling in the Medical Diagnostics Team: Leveraging Analytics to Reduce Costs and Improve Efficiency

An American pharmaceutical company was spending a lot of time on support tickets raised by various users. The goal was to find out which tickets could be automated and which should be handled in house versus outsourced to the firm's third-party support vendor, thereby reducing the operating cost of the firm's medical diagnostics team.

Extensive descriptive analytics was carried out to determine whether the ticket categories consuming the most time were anomalies or expected behavior according to the client. Keyword modelling was then used to gauge how often certain keywords recurred across tickets and so identify likely duplicates; keywords were also used for categorization in the internal system, to fill in missing data and draw conclusions from it. Next, diagnostic analysis in the form of time modelling pinpointed where in the process tickets stayed open the longest and whether long durations were attributable to the internal team waiting on a vendor to resolve the ticket. Tickets allocated to the wrong owner, and the time required to reassign them, also needed to be identified. Finally, metrics were established for the tickets handled by the third party to check whether it was meeting its service-level agreements.
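The project's keyword modelling was done in R; a minimal Python sketch of the underlying idea, flagging likely duplicate tickets by keyword overlap (the tickets, stopword list, and similarity threshold are all invented for illustration):

```python
def keywords(text, stopwords={"the", "a", "is", "to", "in", "on", "for", "not"}):
    """Lowercase, split, strip punctuation, and drop stopwords."""
    return {w.strip(".,!?") for w in text.lower().split()} - stopwords

def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

tickets = [
    "Printer on floor 3 is not responding",
    "Floor 3 printer not responding to print jobs",
    "Password reset for lab workstation",
]
sets = [keywords(t) for t in tickets]
# Flag pairs above an (assumed) similarity threshold as likely duplicates
for i in range(len(sets)):
    for j in range(i + 1, len(sets)):
        sim = jaccard(sets[i], sets[j])
        if sim >= 0.4:
            print(f"tickets {i} and {j} look similar: {sim:.2f}")
```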

Tools used: Power BI for descriptive analytics and diagnostic analytics; R for keyword modelling

Outcome: On implementation, the team's recommendations were predicted to reduce ticket volume by 100 tickets per day.

Determining the Customer Lifetime Value & Expected Profit for a Multilevel Marketing Firm

Context: Customer Lifetime Value (CLTV) is generally defined by the following formula:

CLTV = Σ_{t=1}^{T} [ m_t · r_t / (1 + d)^t ] - AC

where m_t is the margin or profit from year t; r_t is the retention rate in year t; d is the discounting rate to account for the net present value of money; and AC is the acquisition cost.
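A small worked example of the CLTV calculation, CLTV = Σ m_t · r_t / (1 + d)^t - AC; all figures here are illustrative:

```python
def cltv(margins, retention, discount_rate, acquisition_cost):
    """CLTV = sum_t m_t * r_t / (1 + d)^t  -  AC
    margins[t-1] and retention[t-1] are the margin and retention rate
    for year t (t = 1, 2, ...)."""
    value = 0.0
    for t, (m, r) in enumerate(zip(margins, retention), start=1):
        value += m * r / (1 + discount_rate) ** t
    return value - acquisition_cost

# Invented customer: $100/year margin, retention decaying 0.80 -> 0.64 -> 0.512,
# 10% discount rate, $50 acquisition cost
print(round(cltv([100, 100, 100], [0.80, 0.64, 0.512], 0.10, 50), 2))
```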

Approach: The first model built was for calculating CLTV. A challenge faced from the get-go was that the retention rate (r_t) carried little information, because no renewal period was captured for a customer: once in the database, a customer remained there indefinitely regardless of whether they ever purchased again. This was tackled by calculating a 'Probability of Purchase' (PoP), the probability that a customer will ever make a purchase again, and using it as a proxy for retention rate: a higher PoP implies a higher retention rate. The second model was built to predict the future purchase amount, thereby determining the profit associated with the customer. With this scope defined, there were two models to build for two different entities.

Outcome: We realized our goal by sorting the registered customers by score (the predicted probability) and designing marketing interventions based on that ranking. For example, the most engaged customers could be sent smaller discounts, while unengaged customers received bigger discounts to encourage them to start shopping again.

Tools used: S3 storage on AWS was used to access the company's vast data. AWS EMR, a cloud big-data platform, ran the large-scale distributed data-processing jobs, interactive SQL queries, and machine learning (ML) applications. On these cloud computing clusters, PySpark (the Python API for Apache Spark) was used to query and preprocess the dataset, cleaning it and creating the variables needed for the analysis. Multiple machine learning tools were then applied to predict purchase probability and to interpret the results.

Models used: One of the models was a LightGBM classifier, a gradient boosting framework that uses tree-based learning algorithms. Iterating multiple times with validation to tune the hyperparameters produced the best prediction results, which were then compared against baseline algorithms such as a Linear Probability Model, Logistic Regression, Random Forest, and Gradient Boosting. Since ML algorithms are designed for prediction, they do not provide feature importance by themselves. SHAP (SHapley Additive exPlanations), a game-theoretic approach to explaining the output of any machine learning model, was therefore incorporated to identify which features contributed most to the prediction and in which direction (positive vs. negative).
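The Shapley idea behind SHAP can be illustrated exactly on a toy additive model using the classical formula. The model, features, and baseline below are invented for illustration; SHAP itself computes these values efficiently for tree ensembles rather than by brute-force enumeration:

```python
from itertools import combinations
from math import factorial

def model(x):
    """Toy additive scoring 'model' over three invented features."""
    return 2.0 * x["recency"] + 0.5 * x["frequency"] - 1.0 * x["tenure"]

BASELINE = {"recency": 0.0, "frequency": 0.0, "tenure": 0.0}

def value(subset, x):
    """Model output when only features in `subset` take x's values;
    the rest are held at the baseline."""
    mixed = {f: (x[f] if f in subset else BASELINE[f]) for f in x}
    return model(mixed)

def shapley(x):
    """Exact Shapley value per feature: the weighted average of each
    feature's marginal contribution over all feature subsets."""
    feats = list(x)
    n = len(feats)
    phi = {}
    for f in feats:
        others = [g for g in feats if g != f]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(s) | {f}, x) - value(set(s), x))
        phi[f] = total
    return phi

x = {"recency": 1.0, "frequency": 4.0, "tenure": 2.0}
print(shapley(x))  # each feature's signed contribution to the prediction
```

For an additive model like this one, each feature's Shapley value is exactly its own term's contribution, and the values sum to the prediction minus the baseline prediction, which is the "efficiency" property SHAP inherits.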

Past Krenicki Center Projects

Optimal Clustering of Products for Regression-Type and Classification-Type Predictive Modeling for Assortment Planning

In collaboration with a national retailer, this study assessed the impact on sales prediction accuracy of clustering sparse-demand products in various ways, while also trying to identify when framing the problem as a regression problem or a classification problem leads to the best demand decision support. The problem was motivated by the fact that modeling very sparse-demand products can be extremely difficult. Some retailers frame the prediction as a classification problem, obtaining the propensity that a product will sell or not sell within a specified planning horizon; likewise, they might model it in a regression setting that is plagued by many zeros in the response. In our study, we clustered products with k-means, SOM, and HDBSCAN algorithms using lifecycle, failure-rate, product-usability, and market-type features. We found a consistent story behind the generated clusters, which were primarily distinguished by particular demand patterns. Next, we aggregated the clustering results into a single input feature, which improved the prediction accuracy of the models we examined. When forecasting sales, we investigated a variety of regression- and classification-type models and reported a short list of those that performed best in each case. Lastly, we identified certain scenarios observed when modeling the problem as classification versus regression, so that our partner could forecast their assortment decisions more strategically.
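A minimal sketch of the cluster-then-use-the-label idea, using a tiny one-dimensional k-means on an invented demand-sparsity feature (the study itself used k-means, SOMs, and HDBSCAN on much richer features):

```python
def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means (k >= 2): assign each value to its nearest center,
    then move each center to the mean of its members."""
    srt = sorted(values)
    centers = [srt[i * (len(srt) - 1) // (k - 1)] for i in range(k)]  # spread inits
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]

# Invented "weeks with zero sales" feature per product
sparsity = [48, 50, 47, 51, 5, 3, 6, 4]
labels = kmeans_1d(sparsity, k=2)
print(labels)  # one cluster id per product, usable as a single model input feature
```

The returned label column is the "single input feature" a downstream regression or classification model would consume alongside the product's other attributes.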

Effect of Forecast Accuracy on Inventory Optimization Model

This study described an optimization solution to minimize costs in the retailer's inventory system. Previously, all demand was forecast yearly, information about each item's demand distribution was not used, and weekly and monthly forecasts were simply derived from the yearly ones. As a result, the retailer purchased items in bulk from vendors to prepare for unexpected demand, which generated huge holding costs. By modeling the distribution of each item's demand, a dynamic economic order quantity model became possible. We solved this problem using a different distribution for each item, built formulas to calculate costs and service levels, and optimized the model to minimize cost while also meeting several business requirements, such as a minimum service level for each item. We demonstrated the impact that the quality of the demand forecast had on the client's business.
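At the heart of any economic order quantity model is the classic trade-off between fixed ordering cost and holding cost, with the closed form Q* = sqrt(2DK/h). A worked example with invented figures:

```python
from math import sqrt

def eoq(annual_demand, order_cost, holding_cost):
    """Economic order quantity Q* = sqrt(2DK/h): the order size that
    balances fixed ordering cost against per-unit holding cost."""
    return sqrt(2 * annual_demand * order_cost / holding_cost)

def total_cost(q, annual_demand, order_cost, holding_cost):
    """Annual ordering cost plus average holding cost at order size q."""
    return annual_demand / q * order_cost + q / 2 * holding_cost

# Invented item: 1200 units/yr demand, $75 per order, $2/unit/yr holding cost
q_star = eoq(1200, 75, 2)
print(round(q_star), round(total_cost(q_star, 1200, 75, 2), 2))
```

A dynamic version, as in the study, replaces the single yearly demand figure with per-period demand drawn from each item's fitted distribution and re-solves as the forecast updates.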

A Comparative Study of Machine Learning Frameworks for Demand Forecasting

While working with a national consulting company, our study took a two-pronged approach. First, we asked which machine learning approaches perform best at predicting demand for grocery items. Second, we asked what performance one could expect from an open-source workflow versus proprietary in-house machine learning software. Our main motivation was that consulting companies regularly assist their retail clients in understanding demand as accurately as possible: efficient, accurate demand forecasts enable retailers to anticipate demand and plan better. In addition to delivering accurate results, data science teams must also continue to develop and improve their workflow so that experiments can be performed with greater ease and speed. We found that, using open-source technologies such as scikit-learn, PostgreSQL, and R, a well-performing workflow could be developed that trains and scores forecasts for thousands of products and stores accurately at various aggregation levels (e.g., day/week/month) using deep-learning algorithms. While the performance of our solution has yet to be compared to the data science team's commercial platform, we will add that data soon. We have already learned how they achieve performance gains in model accuracy and runtime, making this collaboration a great learning experience.

A Retrospective Investigation of Test & Learn Business Experiments & Lift Analysis

This study provided an analysis to retrospectively investigate how various promotional activities (e.g., discount rates and bundling) affect a firm's KPIs such as sales, traffic, and margins. The motivation for this study is that in the retail industry, a small change in price has significant business implications. The Fortune 500 retailer we collaborated with thrives on low price margins and had historically run many promotions; however, until this study they had limited ability to estimate the impact of these promotions on the business. The solution employs a traditional log-log model of demand versus price to obtain a baseline measure of price sensitivity, followed by an efficient dynamic time-series intermittent forecast to estimate the promotional lift. We believe our approach is a novel and practical way to retrospectively understand the promotional effects of test-and-learn experiments that any retailer could implement to help improve revenue management.
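The log-log demand model can be sketched directly: after taking logs of both sides of Q = a · P^b, the OLS slope of ln(quantity) on ln(price) is the price elasticity b. The data below are invented and generated with elasticity -2, so the fit recovers it exactly:

```python
from math import log

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Invented price/quantity pairs following Q = 1000 * P^(-2) exactly
prices = [2.0, 2.5, 3.0, 4.0, 5.0]
quantities = [1000 * p ** -2 for p in prices]

log_p = [log(p) for p in prices]
log_q = [log(q) for q in quantities]
elasticity = ols_slope(log_p, log_q)
print(round(elasticity, 3))  # slope of the log-log fit = price elasticity
```

With real sales data the fit is noisy and the residual, after removing this baseline price sensitivity, is what the study's intermittent time-series forecast attributes to promotional lift.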

An Analytical Approach for Understanding Promotion Effects on Demand and Improving Profits

The objective of this study was to design and develop a better revenue management system, focused on leveraging an understanding of price elasticity and promotional effects to predict demand for grocery items. This study was important because the use of sales promotions in grocery retailing has intensified over the last decade as competition between retailers has increased, and category managers constantly face the challenge of maximizing sales and profits for each category. Price elasticities of demand play a major role in selecting products for promotion and are a major lever retailers use to drive sales, not only of the products on sale. We modeled price sensitivity and developed highly accurate predictive demand models based on product, discount, and other promotional attributes using machine learning approaches, and compared the performance of those models against time-series forecasts.