Project Overview
This is a machine learning API project that predicts a user's insurance premium category from demographic and lifestyle factors. It implements the full ML pipeline, from input validation and feature engineering to model inference, with a FastAPI backend and a Streamlit frontend.
Tech Stack
- Backend:
  - FastAPI - Modern, fast web framework for building APIs
  - Pydantic - Data validation and settings management
  - scikit-learn - Machine learning library
  - pandas - Data manipulation and analysis
  - pickle - Model serialization
  - uvicorn - ASGI server
- Frontend:
  - Streamlit - Web app framework for ML applications
- Deployment:
  - Docker - Containerization
  - Python 3.10 - Base runtime environment
Machine Learning Model
- Algorithm: Random Forest Classifier - Ensemble learning method for classification
- Input Features:
  - age: User's age (1-119 years)
  - weight: Weight in kg
  - height: Height in meters (max 2.5 m)
  - income_lpa: Annual income in lakhs per annum
  - smoker: Boolean smoking status
  - city: City name, used to derive the city tier
  - occupation: One of 7 categories (retired, freelancer, student, government_job, business_owner, unemployed, private_job)
- Computed Features:
  - BMI: weight / height²
  - Lifestyle Risk: Based on smoking status and BMI
    - High: smoker AND BMI > 30
    - Medium: smoker OR BMI > 27 (when not High)
    - Low: neither condition
  - Age Group: young (<25), adult (25-44), middle_aged (45-59), senior (60+)
  - City Tier: 1 (metro), 2 (major cities), 3 (others)
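The computed-feature rules above can be sketched as plain Python functions. This is a minimal sketch, not the project's code: the function names (`compute_bmi`, `lifestyle_risk`, `age_group`) and the lowercase string labels are assumptions; the thresholds come from the list above.

```python
def compute_bmi(weight_kg: float, height_m: float) -> float:
    """Body-mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def lifestyle_risk(smoker: bool, bmi: float) -> str:
    """Classify lifestyle risk; High is checked first because it also satisfies Medium."""
    if smoker and bmi > 30:
        return "high"
    if smoker or bmi > 27:
        return "medium"
    return "low"


def age_group(age: int) -> str:
    """Bucket age into the four groups listed in the README."""
    if age < 25:
        return "young"
    if age < 45:
        return "adult"
    if age < 60:
        return "middle_aged"
    return "senior"


if __name__ == "__main__":
    bmi = compute_bmi(weight_kg=92.0, height_m=1.70)
    print(round(bmi, 1), lifestyle_risk(smoker=True, bmi=bmi), age_group(52))
    # prints: 31.8 high middle_aged
```

The order of the checks in `lifestyle_risk` matters: a smoker with a BMI of 31 matches both the High and the Medium rule, so High must be tested first.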
API Endpoints
- POST /predict: Validates the user's input and returns the predicted insurance premium category.
Features
- Data Validation: Pydantic models validate every request, enforcing type safety through Annotated types and custom field constraints (e.g., the age and height limits above).
- City Classification:
  - Tier 1 Cities: 7 major metros (Mumbai, Delhi, Bangalore, etc.)
  - Tier 2 Cities: 47 major cities
  - Tier 3 Cities: All other locations
- Frontend Interface:
  - Interactive form with input validation
  - Real-time predictions via calls to the API
  - Error handling for connection issues
  - User-friendly display of results
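The data-validation constraints can be expressed with Pydantic `Annotated` fields. This sketch assumes Pydantic v2; the model name `UserInput` and the `is_valid` helper are hypothetical, the `weight > 0` and `income_lpa > 0` bounds are assumptions not stated in this README, and 2.5 m is treated as an inclusive height maximum.

```python
from typing import Annotated, Literal

from pydantic import BaseModel, Field, ValidationError

# The seven occupation categories listed in the README.
Occupation = Literal["retired", "freelancer", "student", "government_job",
                     "business_owner", "unemployed", "private_job"]


class UserInput(BaseModel):
    age: Annotated[int, Field(gt=0, lt=120, description="Age in years (1-119)")]
    weight: Annotated[float, Field(gt=0, description="Weight in kg")]
    height: Annotated[float, Field(gt=0, le=2.5, description="Height in meters")]
    income_lpa: Annotated[float, Field(gt=0, description="Annual income in lakhs per annum")]
    smoker: bool
    city: str
    occupation: Occupation


def is_valid(**fields) -> bool:
    """Hypothetical helper: True if the fields pass every constraint on UserInput."""
    try:
        UserInput(**fields)
    except ValidationError:
        return False
    return True
```

Passing `age=120` or `occupation="pilot"` raises `pydantic.ValidationError`; when the same model is used as a FastAPI request body, that failure is returned to the client as an HTTP 422 response.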
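The three-tier city lookup can be sketched as set membership after normalizing the city name. The `city_tier` and `normalize_city` names and the strip/title-case normalization are hypothetical; `TIER_1_CITIES` holds only the three metros named above, and `TIER_2_CITIES` is an empty placeholder for the project's 47 tier-2 cities, which this README does not list.

```python
# Assumed, partial data: the README names only three of the seven tier-1 metros.
TIER_1_CITIES: frozenset[str] = frozenset({"Mumbai", "Delhi", "Bangalore"})
# Placeholder: the project's 47 tier-2 cities are not listed in the README.
TIER_2_CITIES: frozenset[str] = frozenset()


def normalize_city(name: str) -> str:
    """Hypothetical normalization so ' mumbai ' and 'Mumbai' match the same entry."""
    return name.strip().title()


def city_tier(
    city: str,
    tier1: frozenset[str] = TIER_1_CITIES,
    tier2: frozenset[str] = TIER_2_CITIES,
) -> int:
    """Return 1 for listed metros, 2 for listed major cities, 3 for everything else."""
    name = normalize_city(city)
    if name in tier1:
        return 1
    if name in tier2:
        return 2
    return 3
```

Storing the lists as frozensets gives average O(1) lookups and prevents accidental mutation at runtime; passing `tier1`/`tier2` explicitly makes the function easy to test with other city lists.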
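On the frontend side, calling the API comes down to POSTing JSON and handling failures. This sketch uses only the standard library (`urllib.request`) and does not reproduce the Streamlit app's actual HTTP code; the URL (uvicorn's default port 8000) and the function names are assumptions.

```python
import json
import urllib.error
import urllib.request

API_URL = "http://localhost:8000/predict"  # assumed host and port


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Encode the form values as a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(payload: dict, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Return the decoded JSON response, or a dict with an "error" key on failure."""
    try:
        with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Server reached but rejected the request (e.g. 422 when validation fails).
        # HTTPError subclasses URLError, so it must be caught first.
        return {"error": f"HTTP {exc.code}", "detail": exc.read().decode("utf-8", "replace")}
    except (urllib.error.URLError, TimeoutError) as exc:
        # Connection refused, DNS failure, or timeout: the API is unreachable.
        return {"error": f"Could not reach the API: {exc}"}
    except json.JSONDecodeError:
        return {"error": "The API returned a response that is not valid JSON"}
```

In a Streamlit app, the returned dict could be checked for an `"error"` key and shown with `st.error`, and otherwise rendered as the prediction result.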
Model Information
- Source: CampusX FastAPI Demo
- Algorithm: Random Forest Classifier
- File: insurance_model.pkl (serialized model)
Key Benefits
- Scalable Architecture - The model is served through a standalone API, separate from the frontend, so each service can be deployed and scaled independently
- Data Validation - Robust input validation with Pydantic
- Containerized - Easy deployment with Docker
- Interactive UI - User-friendly Streamlit interface
- RESTful API - Any HTTP client can call the prediction endpoint, so it can be integrated with other applications