
Data Analysis for Evaluating Sales Performance (Retail Store Case)

After working in retail sales for a full year, I wanted to reflect on my performance using data. I used my personal weekly sales records to create a dataset and analyze it using basic data analysis techniques. This project had three goals:

  1. Evaluate my performance over one year

  2. Practice data analysis tools and techniques

  3. Build and publish a dataset and notebook on Kaggle — one of the most widely used platforms for data science learning and sharing

⚠️ Note: The values in this dataset have been rescaled to protect company privacy. Although the exact numbers were changed, the patterns and relationships remain true to the original data. The goal is to focus on trends and insights rather than raw values.

🔍 Key Insights from My Analysis

  • Minimum working hours are essential: In commission-based retail jobs, part-time hours may not be enough to earn consistent commission.

  • Sales matter more than hours: Earning commission depends more on how much you sell than how long you work — though there is still a positive correlation with hours.

  • Seasonal trends are real: Sales and commissions increase significantly during high seasons like year-end and graduation time.

  • Performance needs context: A single salesperson’s data isn’t enough to judge performance. Comparing with team averages and trends over time gives better insight.

  • More KPIs needed: Metrics like units per sale or average transaction size would help paint a clearer picture of performance, though they may not always be accessible due to company confidentiality.

This project helped me not only understand my own progress but also grow my practical skills in dataset creation, analysis, and sharing insights with the wider data community.

🛠️ 1. Building the Dataset and Notebook in Kaggle

Kaggle is a well-known platform for data scientists and analysts to learn, explore datasets, practice coding, and share notebooks publicly. It provides access to real and simulated datasets, public kernels (notebooks), and competitions that help users sharpen their skills in machine learning, data visualization, and data cleaning.

For this project, I started by collecting my personal weekly sales data from the past year in a retail clothing store. I first created a dataset in Excel, including weekly sales, income, commission, and other related variables. Then, I loaded this dataset into a pandas DataFrame using Python in PyCharm, a desktop IDE. To comply with company privacy rules, I rescaled the values with a consistent scaling method so that the ratios, trends, and proportions stayed the same, but the raw values no longer reflected the actual figures.
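
The exact scaling method is not spelled out here, so the following is only a minimal sketch of the idea, assuming a single multiplicative factor applied to every monetary column. The factor, the column names, and the values are all made up for illustration.

```python
import pandas as pd

# Hypothetical example values; these are NOT the real figures.
raw = pd.DataFrame({
    "sales_total": [1200.0, 1800.0, 2400.0],
    "commission_total": [60.0, 90.0, 120.0],
    "hours_total": [40.0, 55.0, 70.0],
})

# Assumption: one secret factor is applied to all monetary columns.
SCALE_FACTOR = 0.73
money_columns = ["sales_total", "commission_total"]

anonymized = raw.copy()
anonymized[money_columns] = raw[money_columns] * SCALE_FACTOR

# A common factor cancels out in ratios between monetary columns,
# so the commission rate looks the same before and after scaling.
rate_before = raw["commission_total"] / raw["sales_total"]
rate_after = anonymized["commission_total"] / anonymized["sales_total"]

# Pearson correlation is also unaffected by multiplying a column by a constant.
corr_before = raw["sales_total"].corr(raw["hours_total"])
corr_after = anonymized["sales_total"].corr(anonymized["hours_total"])
```

Because every monetary column shares the same factor, ratios and correlations survive while the raw amounts do not. Hours are left unscaled in this sketch, which is also an assumption.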

Once the data was cleaned and anonymized, I exported it as a .csv file and uploaded it to Kaggle Datasets. After that, I created a Kaggle Notebook, connected the dataset to it, and started the analysis.

In the notebook, I:

  • Used pandas to read the CSV and load it into a DataFrame

  • Explored the structure of the data using commands like .head(), .info(), and .describe()

  • Formatted the data by converting strings to numbers (e.g., removing commas (,), extra spaces, and decimals)

  • Conducted exploratory data analysis (EDA) by using bar charts, scatter plots, and basic regression techniques
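
The loading, inspection, and cleaning steps above could look roughly like the sketch below. The CSV content and column names are hypothetical stand-ins for the uploaded file, and an in-memory string replaces the real file path so the example runs on its own. This version keeps the decimal part of each number; rounding would be a separate step.

```python
import io
import pandas as pd

# Hypothetical CSV content; in a Kaggle Notebook this would be a file path instead.
csv_text = '''Period,Sales Total,Commission Total,Working Hours Total
1," 1,250.50 ",62.50,48
2,"2,100.00"," 105.00",61
3,980.75,0,35
'''

df = pd.read_csv(io.StringIO(csv_text))

# Quick look at the structure of the data.
print(df.head())
df.info()
print(df.describe())

def to_number(series):
    """Strip surrounding spaces and thousands-separator commas, then convert to numbers."""
    cleaned = (
        series.astype(str)
        .str.strip()
        .str.replace(",", "", regex=False)
    )
    # Values that still cannot be parsed become NaN instead of raising an error.
    return pd.to_numeric(cleaned, errors="coerce")

for column in ["Sales Total", "Commission Total", "Working Hours Total"]:
    df[column] = to_number(df[column])
```

Using `errors="coerce"` is a design choice: unparseable cells show up as missing values that can be reviewed later, rather than stopping the notebook.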

Although I’m still completing beginner courses like Google Data Analytics and IBM Data Analyst, this project helped me apply what I’ve learned so far, and I plan to improve and update the notebook as I learn more advanced techniques.


📈 2. Dataset Overview—Variables and Graphs

The dataset is organized around biweekly pay periods and includes several variables related to work hours, sales, and commission. Here are the main variables:

  • Net Income: My take-home pay after taxes

  • Deduction: Deductions from gross income (used to calculate net)

  • Working Hours (W1, W2): Hours worked during each of the two weeks in the biweekly pay period

  • Working Hours Total: Sum of the two weeks

  • Sales (Week 1, Week 2, Total, YTD): Sales I made each week, total per period, and Year-To-Date (YTD) cumulative sales

  • Supercrease & Supercrease YTD: A special bonus/income received for providing a particular service

  • Commission (Week 1, Week 2, Total, YTD): Commission earned based on sales

  • Vac Each Pay: Vacation pay added in each period

  • Sat Pay: Additional pay received for working holidays or weekends
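
Several of these variables are derived from others, which makes simple consistency checks possible. The sketch below assumes that each total is the sum of its two weekly values and that YTD is a running sum within one calendar year. The column names are shortened, hypothetical stand-ins, and the numbers are invented.

```python
import pandas as pd

# Hypothetical pay periods; values are illustrative only.
periods = pd.DataFrame({
    "hours_w1": [20, 25, 30],
    "hours_w2": [22, 28, 18],
    "sales_w1": [500.0, 900.0, 1100.0],
    "sales_w2": [700.0, 650.0, 400.0],
})

# Totals per biweekly period: week 1 plus week 2.
periods["hours_total"] = periods["hours_w1"] + periods["hours_w2"]
periods["sales_total"] = periods["sales_w1"] + periods["sales_w2"]

# Year-to-date sales: cumulative sum of the period totals (assumes a single year).
periods["sales_ytd"] = periods["sales_total"].cumsum()

print(periods)
```

Recomputing totals and YTD columns this way and comparing them with the recorded ones is a cheap way to catch typos made while entering the data by hand.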

In the analysis, I created visualizations to understand how these variables interact:

  • Scatter plots to show the relationship between working hours and commissions

  • Bar charts to visualize sales and income over time

  • Trend lines and simple regression to explore correlations
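
As an illustration of the scatter plot with a trend line, here is a minimal sketch that fits a straight line with NumPy and plots it with Matplotlib. The data points and column names are invented, and the notebook may well use a different library or method for its regression.

```python
import numpy as np
import pandas as pd

# Hypothetical pay periods; values are illustrative only.
data = pd.DataFrame({
    "hours_total": [30, 42, 48, 53, 60, 66],
    "commission_total": [0.0, 35.0, 50.0, 80.0, 95.0, 130.0],
})

# Least-squares straight line: commission = slope * hours + intercept.
slope, intercept = np.polyfit(data["hours_total"], data["commission_total"], deg=1)

# Pearson correlation between hours and commission.
r = data["hours_total"].corr(data["commission_total"])

def plot_hours_vs_commission(frame):
    """Scatter plot of hours against commission with the fitted line on top."""
    import matplotlib.pyplot as plt  # imported here so the fit works without Matplotlib

    fig, ax = plt.subplots()
    ax.scatter(frame["hours_total"], frame["commission_total"], label="pay periods")
    xs = np.linspace(frame["hours_total"].min(), frame["hours_total"].max(), 50)
    ax.plot(xs, slope * xs + intercept, label="fitted line")
    ax.set_xlabel("Working hours (biweekly total)")
    ax.set_ylabel("Commission (biweekly total)")
    ax.legend()
    return fig
```

Calling `plot_hours_vs_commission(data)` in a notebook cell returns the figure. The slope estimates how much commission changes per extra hour, while `r` summarizes how tightly the points follow the line.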

You can view the full dataset and notebook here:
👉 Check the Kaggle Notebook


📌 3. Conclusion

Based on my year-long data and analysis, here are a few key takeaways:

  1. Minimum working hours are crucial in commission-based sales jobs. Part-time work may not generate enough sales to reach commission thresholds.

  2. Sales performance drives commission more than working hours alone, although longer hours still show a positive (but weaker) correlation with commission earnings.

  3. Seasonal effects are significant — sales and commissions rise during certain times of the year like graduation season or the end-of-year holidays.

  4. Individual data alone isn’t enough to assess performance fairly. Comparing trends and progress over time is useful, but without knowing team averages or benchmarks, conclusions about “good” or “bad” performance remain limited.

  5. Other performance metrics (like units per sale or average transaction value) could offer deeper insights into sales efficiency but may not always be accessible due to confidentiality.
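
For readers who want to check takeaways 2 and 3 on their own data, the comparison could be sketched as follows. Everything here is hypothetical: the dates, the column names, and the numbers are invented to show the technique, not to reproduce the results above.

```python
import pandas as pd

# Hypothetical pay periods; values are illustrative only.
periods = pd.DataFrame({
    "period_end": pd.to_datetime([
        "2023-03-15", "2023-03-29", "2023-06-07",
        "2023-06-21", "2023-09-13", "2023-12-20",
    ]),
    "hours_total": [40, 60, 45, 70, 50, 65],
    "sales_total": [1000.0, 1200.0, 1500.0, 2600.0, 1100.0, 3000.0],
    "commission_total": [0.0, 10.0, 30.0, 85.0, 5.0, 95.0],
})

# Takeaway 2: compare how strongly hours and sales each correlate with commission.
correlations = periods[["hours_total", "sales_total"]].corrwith(
    periods["commission_total"]
)
print(correlations)

# Takeaway 3: average sales per pay period, grouped by calendar month.
monthly_mean_sales = periods.groupby(periods["period_end"].dt.month)["sales_total"].mean()
print(monthly_mean_sales)
```

With only one year of data, each month contains just two or three pay periods, so a seasonal pattern found this way is suggestive rather than conclusive.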

This personal project was a great way to start applying data analysis in a real-world context. I hope it encourages others to look into their own data and use platforms like Kaggle to grow their skills and share their learning journey.