Tools and Technologies Used
- Basic Python
- Pandas
- Matplotlib & Seaborn
- Jupyter Notebook
- Basic Data Analysis
Dataset Details
- Dataset Name: Black Friday Sales Data
- Size: 23 MB CSV file, 537,000+ rows, 12 columns
- Columns: User ID, Product ID, Gender, Age, Occupation, City Category, Stay in Current City Years, Marital Status, three product-category columns (Product_Category_1, Product_Category_2, Product_Category_3), and Purchase amount
Project Steps Overview
1. Walkthrough of the Dataset
- Goal: Load and inspect the dataset to understand its structure.
- Actions:
- Load the dataset into a pandas DataFrame.
- Examine structure and check for missing values.
- Use df.info() to check column types and non-null counts.
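A minimal sketch of this step, assuming the CSV uses underscore-separated column names such as Marital_Status and Product_Category_1 (as the later steps do). The inline CSV text stands in for the real file and its rows are hypothetical; with the actual data you would pass a path (for example a hypothetical "BlackFriday.csv") to the same function.

```python
import io

import pandas as pd


def load_and_inspect(source):
    """Load the CSV and report its structure and missing values.

    `source` can be a file path (e.g. a hypothetical "BlackFriday.csv")
    or any file-like object that pandas.read_csv accepts.
    """
    df = pd.read_csv(source)
    df.info()                    # column dtypes and non-null count per column
    missing = df.isnull().sum()  # number of missing values per column
    print(missing[missing > 0])  # show only the columns that have gaps
    return df, missing


# Hypothetical three-row sample (not real data) with the dataset's 12 columns.
csv_text = """User_ID,Product_ID,Gender,Age,Occupation,City_Category,Stay_In_Current_City_Years,Marital_Status,Product_Category_1,Product_Category_2,Product_Category_3,Purchase
1000001,P00000001,F,0-17,10,A,2,0,3,,,8370
1000001,P00000002,F,0-17,10,A,2,0,1,6,14,15200
1000002,P00000003,M,55+,16,C,4+,0,5,,,7969
"""

df, missing = load_and_inspect(io.StringIO(csv_text))
```

Empty fields in the CSV are read as NaN, so df.isnull().sum() complements df.info() by giving the exact number of gaps per column.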
2. Analyzing Columns
- Goal: Understand the distribution and relevance of each column.
- Actions:
- Focus on Gender, Age, Marital_Status, Product_Category_1, and Purchase.
- Drop the sparse columns Product_Category_2 and Product_Category_3, which are largely empty (many missing values).
- Use unique() and nunique() to get an overview of each column's distinct values.
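The column overview and the removal of the two sparse category columns might look like this. The DataFrame is a small hypothetical sample (not rows from the dataset) limited to the columns this step focuses on.

```python
import pandas as pd

# Hypothetical sample rows (not real data) using the dataset's column names.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "M"],
    "Age": ["0-17", "26-35", "26-35", "55+"],
    "Marital_Status": [0, 1, 0, 1],
    "Product_Category_1": [3, 1, 1, 5],
    "Product_Category_2": [None, 6.0, None, None],
    "Product_Category_3": [None, 14.0, None, None],
    "Purchase": [8370, 15200, 1422, 7969],
})

# Drop the two mostly-empty category columns.
df = df.drop(columns=["Product_Category_2", "Product_Category_3"])

# unique(): the distinct values of one column, in order of first appearance.
print(df["Age"].unique())

# nunique(): how many distinct values each remaining column has.
print(df.nunique())
```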
3. Analyzing Gender
- Goal: Explore gender distribution and purchasing trends.
- Actions:
- Identify the gender imbalance in the data (there are more male customers than female ones).
- Use groupby() to compare total purchase amounts by gender.
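A sketch of the gender comparison on hypothetical rows. It assumes each row records one purchase, so the same User_ID can appear many times; under that assumption value_counts() on Gender counts transactions, while counting distinct User_ID values per gender gives the number of customers.

```python
import pandas as pd

# Hypothetical sample rows (not real data); users 1 and 2 each appear twice.
df = pd.DataFrame({
    "User_ID": [1, 2, 2, 3, 1],
    "Gender": ["F", "M", "M", "M", "F"],
    "Purchase": [8370, 15200, 1422, 7969, 5000],
})

rows_per_gender = df["Gender"].value_counts()                      # transactions per gender
customers_per_gender = df.groupby("Gender")["User_ID"].nunique()   # distinct shoppers
total_by_gender = df.groupby("Gender")["Purchase"].sum()           # total spend
mean_by_gender = df.groupby("Gender")["Purchase"].mean()           # average spend per purchase

print(pd.DataFrame({
    "rows": rows_per_gender,
    "customers": customers_per_gender,
    "total": total_by_gender,
    "mean": mean_by_gender,
}))
```

Comparing the mean alongside the total helps separate "one group buys more often" from "one group spends more per purchase", which matters when the groups are imbalanced.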
4. Analyzing Age & Marital Status
- Goal: Examine how age and marital status affect purchases.
- Actions:
- Categorize users by age and marital status.
- Analyze spending patterns across groups.
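Per-group spending for age and marital status can be summarized with groupby() and agg(). The sample is hypothetical, and Marital_Status is assumed to be stored as 0/1 integers.

```python
import pandas as pd

# Hypothetical sample rows (not real data).
df = pd.DataFrame({
    "Age": ["0-17", "26-35", "26-35", "55+", "26-35"],
    "Marital_Status": [0, 1, 0, 1, 1],
    "Purchase": [8370, 15200, 1422, 7969, 3000],
})

# Number of purchases, total spend, and average spend for each group.
by_age = df.groupby("Age")["Purchase"].agg(["count", "sum", "mean"])
by_status = df.groupby("Marital_Status")["Purchase"].agg(["count", "sum", "mean"])

print(by_age)
print(by_status)
```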
5. Multi-Column Analysis
- Goal: Analyze relationships between multiple factors.
- Actions:
- Combine Age, Marital Status, and Gender.
- Visualize the combined groups with pie and bar charts.
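One way to combine the three factors and chart them with Matplotlib through pandas' plotting API. The rows are hypothetical; the pie shows each age group's share of total spend and the bar chart splits that spend by gender. In a plain script, call plt.show() at the end; in Jupyter the figure is normally displayed automatically.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows (not real data).
df = pd.DataFrame({
    "Age": ["0-17", "26-35", "26-35", "55+", "26-35", "0-17"],
    "Marital_Status": [0, 1, 0, 1, 1, 0],
    "Gender": ["F", "M", "M", "M", "F", "M"],
    "Purchase": [8370, 15200, 1422, 7969, 3000, 2000],
})

# Total spend for every Age x Marital_Status x Gender combination.
combined = df.groupby(["Age", "Marital_Status", "Gender"])["Purchase"].sum()
print(combined)

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(12, 5))

# Pie chart: each age group's share of the total purchase amount.
df.groupby("Age")["Purchase"].sum().plot.pie(ax=ax_pie, autopct="%1.1f%%")
ax_pie.set_ylabel("")

# Bar chart: total spend per age group, one bar per gender (missing combos -> 0).
(df.groupby(["Age", "Gender"])["Purchase"].sum()
   .unstack("Gender", fill_value=0)
   .plot.bar(ax=ax_bar))
ax_bar.set_ylabel("Total purchase")

fig.tight_layout()
```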
6. Occupation and Products Analysis
- Goal: Explore how occupation influences purchases and product popularity.
- Actions:
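- Analyze Occupation, Product_ID, and Product_Category_1 to compare spending across occupations and find the most frequently purchased products and categories.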
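A sketch of the occupation and product analysis. The occupation codes, product IDs ("P1", "P2", "P3"), and amounts below are hypothetical placeholders, not values from the dataset.

```python
import pandas as pd

# Hypothetical sample rows (not real data).
df = pd.DataFrame({
    "Occupation": [10, 16, 16, 7, 10, 16],
    "Product_ID": ["P1", "P2", "P1", "P3", "P1", "P2"],
    "Product_Category_1": [3, 1, 3, 5, 3, 1],
    "Purchase": [8370, 15200, 1422, 7969, 5000, 3000],
})

# Total spend per occupation code, highest first.
spend_by_occupation = (
    df.groupby("Occupation")["Purchase"].sum().sort_values(ascending=False)
)

# Most frequently purchased products and categories (by number of rows).
top_products = df["Product_ID"].value_counts().head(2)
top_categories = df["Product_Category_1"].value_counts()

print(spend_by_occupation)
print(top_products)
print(top_categories)
```

Popularity here is measured by purchase count; ranking by df.groupby("Product_ID")["Purchase"].sum() instead would favor expensive items.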
7. Combining Gender & Marital Status
- Goal: Investigate combined effects on purchasing behavior.
- Actions:
- Visualize with Seaborn count plots.
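A possible version of this step: merge the two columns into one label and pass it to Seaborn's countplot(). The rows are hypothetical, the Gender_Marital column name is our own, and reading 1 as "married" is an assumption about the encoding. Outside Jupyter, call matplotlib.pyplot.show() to display the figure.

```python
import pandas as pd
import seaborn as sns

# Hypothetical sample rows (not real data).
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "M", "F", "M"],
    "Marital_Status": [0, 1, 0, 1, 1, 0],
})

# Combined label, e.g. "M_1" = male with Marital_Status 1 (assumed: married).
df["Gender_Marital"] = df["Gender"] + "_" + df["Marital_Status"].astype(str)

counts = df["Gender_Marital"].value_counts()
print(counts)

# One bar per combined group, in a fixed alphabetical order.
ax = sns.countplot(data=df, x="Gender_Marital",
                   order=sorted(df["Gender_Marital"].unique()))
ax.set_title("Rows by gender and marital status")
```

An equivalent chart without the helper column is sns.countplot(data=df, x="Gender", hue="Marital_Status"); the combined label simply makes each group its own bar.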
Conclusion
This project provides insights into Black Friday shoppers' purchasing behavior and the influence of demographic factors such as gender, age, marital status, and occupation. The workflow covers data cleaning, individual column analysis, and multi-feature analysis to uncover hidden patterns in the dataset.