Sugarcane Data Analysis Project

Project Overview

This project analyzes sugarcane production data across countries and continents using Python data analysis tools.

Technologies Used

  • Python
  • Pandas
  • Matplotlib
  • Seaborn
  • ydata_profiling

Analysis Steps

  1. Data Loading and Cleaning
    • Loaded sugarcane production dataset
    • Removed unnecessary index column
    • Cleaned numeric data by removing thousands-separator dots and normalizing decimal separators
    • Handled missing values
    • Converted data types to appropriate numeric formats
  2. Exploratory Data Analysis
    • Inspected the dataset shape (number of rows and columns)
    • Analyzed continental distribution of sugarcane production
    • Created visualizations for key metrics:
      • Production (Tons)
      • Production per Person (Kg)
      • Acreage (Hectare)
      • Yield (Kg/Hectare)
  3. Key Visualizations
    • Box plots showing distribution of main metrics
    • Histograms with KDE for numeric variables
    • Pie chart showing top producers' percentage
    • Bar plots for top producing countries
    • Heat map for correlation analysis
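
The loading and cleaning step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the `Country` and `Continent` column names are assumptions, and so is the number format (dots as thousands separators, commas as decimal marks). Dropping incomplete rows is only one possible way to handle missing values.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real dataset; all values are invented.
RAW_CSV = """,Country,Continent,Production (Tons),Yield (Kg/Hectare)
0,Aland,Asia,1.234.567,"75.167,5"
1,Borduria,Europe,98.765,"60.321,2"
2,Cascara,Africa,,"41.000,0"
"""

# Assumed names of the non-numeric columns.
TEXT_COLUMNS = ("Country", "Continent")


def clean_number(series):
    """Remove thousands dots, turn decimal commas into points, cast to float."""
    cleaned = (
        series.str.replace(".", "", regex=False)
        .str.replace(",", ".", regex=False)
    )
    return pd.to_numeric(cleaned, errors="coerce").astype("float64")


def load_and_clean(source):
    # Read everything as text so the separators can be fixed before conversion.
    df = pd.read_csv(source, dtype=str)
    # The first column is the leftover index written by an earlier export.
    df = df.drop(columns=df.columns[0])
    for col in df.columns:
        if col not in TEXT_COLUMNS:
            df[col] = clean_number(df[col])
    # Assumption: rows with any missing value are dropped.
    return df.dropna().reset_index(drop=True)


df = load_and_clean(io.StringIO(RAW_CSV))
print(df.shape)
print(df.dtypes)
```

With a real file, `load_and_clean` would take the CSV path instead of the in-memory sample.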

Key Findings

  1. Continental Distribution
    • Visualized number of sugarcane-growing countries per continent
  2. Production Analysis
    • Created percentage analysis of top producers
    • Identified leading countries in production
  3. Land Usage
    • Analyzed countries with highest acreage
    • Compared land use vs production
  4. Yield Analysis
    • Identified countries with highest yield per hectare
    • Examined relationship between land area and production
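
The four analyses above could be computed along these lines. This is a hedged sketch on invented figures, not the project's results: the country names and numbers are placeholders, the `Country` and `Continent` column names are assumptions, and the yield column is derived from production and acreage only to keep the toy data consistent.

```python
import pandas as pd

# Hypothetical, made-up figures used only to make the sketch runnable.
df = pd.DataFrame({
    "Country": ["Aland", "Borduria", "Cascara", "Deltora", "Elbonia"],
    "Continent": ["Asia", "Europe", "Africa", "Asia", "Africa"],
    "Production (Tons)": [500.0, 100.0, 250.0, 120.0, 30.0],
    "Acreage (Hectare)": [10.0, 4.0, 8.0, 2.0, 1.0],
})
# Derived here so the sample stays internally consistent (tons -> kg per hectare).
df["Yield (Kg/Hectare)"] = df["Production (Tons)"] * 1000 / df["Acreage (Hectare)"]

by_country = df.set_index("Country")
top_n = 3

# 1. Continental distribution: number of producing countries per continent.
countries_per_continent = df["Continent"].value_counts()

# 2. Production analysis: top producers and their share of the total, in percent.
top_producers = by_country["Production (Tons)"].nlargest(top_n)
top_share_pct = top_producers / by_country["Production (Tons)"].sum() * 100

# 3. Land usage: countries with the largest acreage.
top_acreage = by_country["Acreage (Hectare)"].nlargest(top_n)

# 4. Yield analysis: highest yield per hectare, plus pairwise correlations
#    between the numeric columns (e.g. acreage against production).
top_yield = by_country["Yield (Kg/Hectare)"].nlargest(top_n)
correlations = df.corr(numeric_only=True)

print(countries_per_continent)
print(top_share_pct.round(1))
print(correlations.round(2))
```

Each result is a plain Series or DataFrame, so it can be handed to a plotting call such as Matplotlib's `pie` or Seaborn's `barplot` and `heatmap` to produce the charts listed under Key Visualizations.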

Tools for Further Analysis