Sugarcane Data Analysis Project
Project Overview
This project analyzes sugarcane production data across different countries and continents using Python data analysis tools.
Technologies Used
- Python
- Pandas for data manipulation
- Matplotlib and Seaborn for visualization
- ydata_profiling for automated reporting
Analysis Steps
Data Loading and Cleaning
- Loaded the sugarcane production dataset
- Removed the redundant index column
- Cleaned the numeric columns by removing the dots used as thousands separators and fixing the decimal separators
- Handled missing values
- Converted the cleaned columns to numeric data types
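A minimal pandas sketch of the cleaning steps above. It assumes the raw numbers are stored as strings with `.` as the thousands separator and `,` as the decimal separator, and that the redundant index column is the `Unnamed: 0` column pandas creates when re-reading a CSV saved with its index. The toy rows, `NUMERIC_COLS`, and the `clean` helper are hypothetical, and dropping incomplete rows is only one way to handle missing values.

```python
import pandas as pd

# Hypothetical toy rows in the assumed raw format: numbers stored as strings,
# "." as the thousands separator and "," as the decimal separator.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Country": ["Country A", "Country B", "Country C"],
    "Continent": ["Asia", "Africa", "Asia"],
    "Production (Tons)": ["1.234.567", "89.000", None],
    "Production per Person (Kg)": ["12,5", "3,75", "0,5"],
})

NUMERIC_COLS = ["Production (Tons)", "Production per Person (Kg)"]


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop the index column, fix separators, cast, drop NaNs."""
    # Leftover index column that pandas names "Unnamed: 0" on re-read
    df = df.drop(columns=["Unnamed: 0"])
    for col in NUMERIC_COLS:
        df[col] = (
            df[col]
            .str.replace(".", "", regex=False)   # remove thousands separators
            .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
            .astype(float)                       # missing entries become NaN
        )
    # Drop rows that still contain missing values after conversion
    return df.dropna()


cleaned = clean(raw)
print(cleaned.dtypes)
```

If the file really uses that number format, `pd.read_csv(path, thousands=".", decimal=",")` can parse it at load time instead, with `path` standing in for the project's own file.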
Exploratory Data Analysis
- Checked the dataset shape (number of rows and columns)
- Analyzed how sugarcane production is distributed across continents
- Created visualizations for the key metrics:
  - Production (Tons)
  - Production per Person (Kg)
  - Acreage (Hectare)
  - Yield (Kg/Hectare)
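The shape check and the continental breakdown described above come down to a few pandas calls. This sketch runs them on a hypothetical cleaned frame; the toy values are invented, and the column names follow the metric names listed above (the real dataset's headers may differ slightly).

```python
import pandas as pd

# Hypothetical cleaned data; column names follow the metrics listed above
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "Continent": ["Asia", "Africa", "Asia", "South America"],
    "Production (Tons)": [1_000_000.0, 250_000.0, 500_000.0, 2_000_000.0],
    "Production per Person (Kg)": [20.0, 5.0, 10.0, 40.0],
    "Acreage (Hectare)": [20_000.0, 10_000.0, 8_000.0, 25_000.0],
    "Yield (Kg/Hectare)": [50_000.0, 25_000.0, 62_500.0, 80_000.0],
})

# Dataset shape: (number of rows, number of columns)
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# Number of sugarcane-growing countries per continent
countries_per_continent = df["Continent"].value_counts()
print(countries_per_continent)

# Total production per continent, largest first
production_by_continent = (
    df.groupby("Continent")["Production (Tons)"].sum().sort_values(ascending=False)
)
print(production_by_continent)

# Summary statistics (count, mean, std, quartiles) for the four key metrics
summary = df.drop(columns=["Country", "Continent"]).describe()
print(summary)
```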
Key Visualizations
- Box plots showing the distribution of the main metrics
- Histograms with kernel density estimate (KDE) curves for the numeric variables
- Pie chart showing the top producers' share of total production
- Bar plots of the top-producing countries
- Heatmap of the correlation matrix
Key Findings
Continental Distribution
- Visualized the number of sugarcane-growing countries per continent
Production Analysis
- Calculated the top producers' percentage shares of total production
- Identified the leading producing countries
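The percentage analysis of top producers can be sketched as each top country's share of the global total, with the remaining countries grouped into an "Other" slice so the pie chart sums to 100%. The `top_producer_shares` helper, the choice of `n`, and the toy totals are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical production totals in tons (toy values)
production = pd.Series(
    {"Country A": 700.0, "Country B": 150.0, "Country C": 100.0, "Country D": 50.0},
    name="Production (Tons)",
)


def top_producer_shares(s: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: percent of total for the n largest entries, plus 'Other'."""
    total = s.sum()
    top = s.nlargest(n)
    shares = top / total * 100
    # Everything outside the top n is grouped so the slices sum to 100%
    shares["Other"] = (total - top.sum()) / total * 100
    return shares


shares = top_producer_shares(production, n=2)
print(shares.round(1))

fig, ax = plt.subplots()
ax.pie(shares, labels=shares.index, autopct="%1.1f%%")
ax.set_title("Share of total production: top producers vs. the rest")
```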
Land Usage
- Identified the countries with the largest acreage
- Compared land use with production
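A sketch of the land-use comparison above: rank countries by acreage, then set each country's share of total land beside its share of total production. The gap column (in percentage points) is an assumed way to express "land use vs production", not necessarily the project's method, and the data are toy values.

```python
import pandas as pd

# Hypothetical cleaned data: production in tons, land in hectares
df = pd.DataFrame(
    {
        "Production (Tons)": [600.0, 300.0, 100.0],
        "Acreage (Hectare)": [10.0, 25.0, 5.0],
    },
    index=["Country A", "Country B", "Country C"],
)

# Countries with the largest harvested area
top_acreage = df["Acreage (Hectare)"].nlargest(2)
print(top_acreage)

# Each country's share of total land next to its share of total production
comparison = pd.DataFrame({
    "land_share_pct": df["Acreage (Hectare)"] / df["Acreage (Hectare)"].sum() * 100,
    "production_share_pct": df["Production (Tons)"] / df["Production (Tons)"].sum() * 100,
})
# Positive gap: the country's production share exceeds its land share
comparison["gap_pp"] = comparison["production_share_pct"] - comparison["land_share_pct"]
print(comparison.sort_values("gap_pp", ascending=False))
```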
Yield Analysis
- Identified the countries with the highest yield per hectare
- Examined the relationship between land area and production
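For the yield analysis above, `nlargest` ranks countries by yield and `Series.corr` gives the Pearson correlation between land area and production. The data are toy values, and the project may have used a scatter plot or a different correlation method instead.

```python
import pandas as pd

# Hypothetical cleaned data (yield = production in kg / acreage for these toy rows)
df = pd.DataFrame(
    {
        "Production (Tons)": [100.0, 300.0, 500.0, 400.0],
        "Acreage (Hectare)": [2.0, 4.0, 5.0, 8.0],
        "Yield (Kg/Hectare)": [50_000.0, 75_000.0, 100_000.0, 50_000.0],
    },
    index=["Country A", "Country B", "Country C", "Country D"],
)

# Countries with the highest yield per hectare
top_yield = df["Yield (Kg/Hectare)"].nlargest(2)
print(top_yield)

# Pearson correlation between land area and production, in [-1, 1]
r = df["Acreage (Hectare)"].corr(df["Production (Tons)"])
print(f"acreage vs production: r = {r:.2f}")
```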
Tools for Further Analysis
- Generated a comprehensive profile report with ydata_profiling
- Built a correlation matrix to examine relationships between the numeric variables
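The correlation matrix and its heatmap can be sketched with `DataFrame.corr(numeric_only=True)`, which skips text columns such as `Continent`, and `seaborn.heatmap`. The toy values are hypothetical; `numeric_only` requires pandas 1.5 or later.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy values standing in for the cleaned dataset
df = pd.DataFrame({
    "Continent": ["Asia", "Africa", "Asia", "South America", "Africa"],
    "Production (Tons)": [1e6, 2.5e5, 5e5, 2e6, 7.5e5],
    "Production per Person (Kg)": [20.0, 5.0, 10.0, 40.0, 15.0],
    "Acreage (Hectare)": [2e4, 1e4, 8e3, 2.5e4, 1.5e4],
    "Yield (Kg/Hectare)": [5e4, 2.5e4, 6.25e4, 8e4, 5e4],
})

# Pairwise Pearson correlations between the numeric columns only
corr = df.corr(numeric_only=True)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix of key metrics")
fig.tight_layout()
```

The profile report itself is typically produced with `ProfileReport(df, title="Sugarcane Report").to_file("report.html")` from `ydata_profiling`; the title and file name here are placeholders.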