This project presents a comprehensive exploratory data analysis (EDA) of COVID-19 data sourced from Kaggle, carried out in Python. The objective is to uncover patterns in the dataset and to predict outcomes such as survival and mortality rates.
The analysis employs several Python libraries:
The first step in the analysis is to determine the population of each country in the dataset. India is the most populous country recorded there. Population figures matter for the rest of the analysis because they give context to metrics such as survival and mortality rates.
Testing figures are also examined, with the United States recording the highest number of tests performed. This metric is vital because more testing identifies more COVID-19 cases, which changes how widespread the virus appears to be.
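The population and testing comparisons above can be sketched with pandas. This is a minimal illustration, not the notebook's code: pandas itself, the column names `country`, `population`, and `total_tests`, the derived `tests_per_thousand` column, and every number in the frame are assumptions made for the example.

```python
import pandas as pd

# Placeholder rows: country names and figures are invented for illustration,
# not taken from the Kaggle dataset. Column names are assumptions too.
df = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C"],
        "population": [1_000_000, 5_000_000, 2_500_000],
        "total_tests": [400_000, 300_000, 900_000],
    }
)

# idxmax returns the index label of the row holding the column's largest value.
most_populous = df.loc[df["population"].idxmax(), "country"]
most_tested = df.loc[df["total_tests"].idxmax(), "country"]

# Tests per 1,000 people puts testing volume on a comparable footing
# across countries of different sizes.
df["tests_per_thousand"] = df["total_tests"] / df["population"] * 1000

print(most_populous, most_tested)
```

Normalizing by population is optional here, but it separates "most tests in absolute terms" from "most tests relative to the number of people", which can point to different countries.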
The analysis calculates survival rates, defined as the probability of a person surviving COVID-19. The survival rate is the complement of the mortality rate (100% minus the mortality rate), where the mortality rate reflects the likelihood of death due to the virus. The analysis finds the lowest survival rate in Yemen and the highest in Western Sahara.
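The relationship between the two rates can be sketched in plain Python. The source does not state how the mortality rate is computed, so this sketch assumes deaths divided by confirmed cases, expressed as a percentage. The function names, the `records` dictionary, and all figures are hypothetical.

```python
def mortality_rate(deaths: int, cases: int) -> float:
    """Share of confirmed cases that ended in death, as a percentage.

    Assumption: the denominator is confirmed cases; the source does not
    say which denominator the notebook uses.
    """
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases * 100


def survival_rate(deaths: int, cases: int) -> float:
    """Complement of the mortality rate, as a percentage."""
    return 100 - mortality_rate(deaths, cases)


# Placeholder figures, invented for illustration only.
records = {
    "Country A": {"cases": 1_000, "deaths": 20},
    "Country B": {"cases": 2_000, "deaths": 300},
    "Country C": {"cases": 500, "deaths": 5},
}

rates = {name: survival_rate(r["deaths"], r["cases"]) for name, r in records.items()}

# Countries with the lowest and highest survival rate in the sample.
lowest = min(rates, key=rates.get)
highest = max(rates, key=rates.get)
print(lowest, highest)
```

Because the two rates always sum to 100%, ranking countries by survival rate is the same as ranking them by mortality rate in reverse order.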
Visualizations are generated to represent these metrics clearly. For instance, charts depict the total deaths across countries, highlighting the United States as the country with the highest fatalities, followed by Iran and India.
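A chart of total deaths by country might look like the following sketch. Matplotlib is an assumption, since the source does not show which plotting library the notebook uses. The `total_deaths` values and the output file name are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder totals, invented for illustration only.
total_deaths = {"Country A": 120, "Country B": 450, "Country C": 80}

# Sort in descending order so the country with the most deaths comes first.
ranked = sorted(total_deaths.items(), key=lambda item: item[1], reverse=True)
countries = [name for name, _ in ranked]
deaths = [count for _, count in ranked]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(countries, deaths)
ax.set_xlabel("Country")
ax.set_ylabel("Total deaths")
ax.set_title("Total COVID-19 deaths by country (placeholder data)")
fig.tight_layout()
fig.savefig("total_deaths.png")  # hypothetical output path
```

Sorting before plotting makes the ranking readable at a glance, which is the point of a chart like the one described above.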
Visualization plays a key role in making the data comprehensible, especially for non-technical audiences. The video showcases several graphs, including:
The analysis concludes by emphasizing the importance of understanding both the mortality rate and the survival rate in the context of the pandemic. The findings illustrate significant disparities between countries, reflecting the varied impact of COVID-19 globally.
For those interested in further exploration, a GitHub link is provided for accessing the Jupyter notebook and dataset used in the analysis.