Innovative Data Science Solutions & Machine Learning Applications
Showcasing innovative solutions that demonstrate expertise in machine learning and data analytics
Python toolkit for exploratory data analysis that simplifies EDA through statistical summaries, data quality checks, and visualizations. Built with a modular architecture and constants-driven design, it runs in Jupyter notebooks, IPython, and terminal environments. Published on PyPI with 10 releases and extensive documentation.
Multi-environment CSV data analysis orchestrator that resolves dependency conflicts between profiling engines by running each one in an isolated conda environment behind a unified interface. Runs pandas-profiling, ydata-profiling, sweetviz, autoviz, and D-Tale in parallel, with automated workflow management, YAML configuration, and comprehensive error handling.
Desktop application for managing multiple AI chat services simultaneously with automated window control and synchronized prompt distribution. Features unified AI window management, grid layouts, and YAML configuration. Documentation-only demo with Electron frontend and Python backend concepts for Windows 10/11, providing intelligent workspace management and seamless multi-service coordination for enhanced AI workflows.
Classification, regression, and predictive analytics
Home loan default prediction analyzing 58.44M records across 7 datasets. An XGBoost classifier with 143 engineered features achieves 83% accuracy, 52.8% recall, and 0.785 ROC AUC at a decision threshold of 0.60. The pipeline addresses class imbalance (8% default rate), memory usage (68.5% reduction), and multicollinearity, and saves 1,200+ hours of manual review through automated risk assessment.
XGBoost
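The 0.60 decision threshold quoted for the home loan model can be illustrated with a short sketch. This is not the project's code: the function names and the toy scores below are hypothetical, and the only detail taken from the description is the threshold value itself. It shows how raising the cutoff above the usual 0.50 changes which cases are flagged and, with it, the recall.

```python
def classify_at_threshold(probabilities, threshold=0.60):
    """Flag a case as a predicted default (1) when its score reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(y_true, y_pred):
    """Share of actual positives that were flagged as positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Toy scores for illustration only (hypothetical, not project data).
scores = [0.10, 0.65, 0.55, 0.90, 0.30]
labels = [0, 1, 1, 1, 0]

preds_at_060 = classify_at_threshold(scores, threshold=0.60)
preds_at_050 = classify_at_threshold(scores, threshold=0.50)

print("recall at 0.60:", recall(labels, preds_at_060))
print("recall at 0.50:", recall(labels, preds_at_050))
```

In this toy data the case scored 0.55 is flagged at 0.50 but not at 0.60, so the higher threshold trades some recall for fewer flagged cases.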
Employee performance prediction using XGBoost multiclass classification (92.5% accuracy, 93.3% CV F1-score) with SHAP interpretability. Analyzes 1,200 employee records across 28 features, identifies top 3 performance drivers, and provides HR recommendations. Full pipeline includes EDA, feature engineering, model comparison, and deployment-ready inference.
XGBoost
B2B sales lead quality prediction using XGBoost classifier. Achieves 81.06% ROC AUC and 84.74% recall on 7,420 IT sales leads. Handles class imbalance, high-cardinality categoricals, and missing data through frequency encoding and threshold optimization. Includes statistical analysis, cross-validation, feature importance, and actionable business insights.
XGBoost
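Frequency encoding, named above as the approach for high-cardinality categoricals, replaces each category with the share of training rows in which it appears. The sketch below is a minimal standard-library version under stated assumptions: the helper names are hypothetical, missing values are represented as `None`, and unseen or missing categories fall back to 0.0. The project's actual implementation may differ.

```python
from collections import Counter


def fit_frequency_map(values):
    """Learn category -> relative frequency from training values, ignoring None."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    total = len(observed)
    return {category: n / total for category, n in counts.items()}


def frequency_encode(values, freq_map, default=0.0):
    """Replace each category with its learned frequency; unseen or missing -> default."""
    return [freq_map.get(v, default) for v in values]


# Toy example (hypothetical values, not project data).
train_sources = ["web", "web", "referral", "event", None]
freq_map = fit_frequency_map(train_sources)
encoded = frequency_encode(["web", "referral", "cold_call", None], freq_map)
print(encoded)
```

Fitting the map on training data only and reusing it on validation or test data keeps information from held-out rows out of the encoding.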
Binary classification model for Portuguese bank term deposit prediction using LightGBM on the UCI ML dataset (41,188 records). Handles class imbalance (7.87:1), multicollinearity (VIF > 26k), and data leakage. Test-set results: 87.8% accuracy, 57.3% recall, and 81.1% ROC AUC, with cross-validated tuning aimed at banking campaign targeting.
Daily bike rental demand prediction using Ridge Regression on Capital Bikeshare data (2011-2012). Addresses multicollinearity, zero-inflated features, and non-normal distributions. Test R²=0.832, CV R²=0.815±0.032. Includes statistical analysis, VIF removal, and comparison of 10 regression algorithms for accurate demand forecasting.
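VIF-based feature removal, as mentioned for the bike rental model, can be sketched with NumPy. The variance inflation factor of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the others. The function names and the cutoff of 10 are assumptions here (10 is a common rule of thumb, not a value taken from the project), and the data is synthetic.

```python
import numpy as np


def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2) from regressing it on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    if ss_tot == 0.0:  # constant column: treat as fully redundant
        return float("inf")
    r2 = 1.0 - ss_res / ss_tot
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)


def drop_high_vif(X, names, limit=10.0):
    """Repeatedly drop the feature with the largest VIF until all are at or below limit."""
    names = list(names)
    while X.shape[1] > 1:
        vifs = [vif(X, j) for j in range(X.shape[1])]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= limit:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Synthetic data: column "c" is almost exactly a + b, so the three are collinear.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + 0.01 * rng.normal(size=200)
X_kept, kept = drop_high_vif(np.column_stack([a, b, c]), ["a", "b", "c"])
print(kept)
```

Dropping one feature at a time and recomputing matters because removing a single collinear column often brings the remaining VIFs back to acceptable levels.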
Machine learning regression model predicting 1985 automobile prices. A Lasso model achieves an R² of 0.917 and generalizes better than XGBoost. Handles extreme multicollinearity (VIF reduced from 16,676 to 8.36), data leakage detection, and outlier treatment through PCA and domain-driven feature engineering, with a comparison of 10 regression algorithms.
CLI tools, desktop apps, and automation frameworks
Desktop application management tool with intelligent window orchestration. Manage and arrange multiple applications across monitors with customizable grid and side-by-side layouts. Windows 10/11. Enhances productivity through smart workspace organization, automated application control, and multi-monitor support.
Automated CSV data analysis with statistical profiling and visualization. Features interactive CLI, automatic delimiter detection, memory-efficient processing, statistical profiling, and visualization generation. Supports Python 3.8-3.13. Published on PyPI v2.0.0 with comprehensive documentation.
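Automatic delimiter detection of the kind described above can be approximated with Python's standard `csv.Sniffer`. This is a minimal sketch, not the tool's actual implementation: the function name, the candidate delimiter set, and the comma fallback are all assumptions.

```python
import csv


def detect_delimiter(sample, candidates=",;\t|"):
    """Guess the delimiter from a text sample; fall back to a comma if sniffing fails."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return ","


# Toy samples (hypothetical data).
semicolon_sample = "name;age;city\nAnn;31;Lisbon\nBob;45;Porto\n"
tab_sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

print(repr(detect_delimiter(semicolon_sample)))
print(repr(detect_delimiter(tab_sample)))
```

In practice the sample would be the first few kilobytes of the file, so the delimiter can be detected without loading the whole dataset into memory.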
Portfolio sites and web applications
Static portfolio website using vanilla JavaScript ES6 modules and Bootstrap 5. Component-based architecture with dynamic nav/footer loading, modular CSS organization, Intersection Observer for animations, ESLint, Stylelint, Prettier for code quality, Lighthouse CI for performance monitoring, and optimized GitHub Pages deployment.
Dive deeper into my data science journey by exploring my Jupyter notebooks, Kaggle competitions, and open-source contributions. Each repository tells a story of problem-solving, innovation, and continuous learning in the field of data science and machine learning.
Complete source code, detailed documentation, and comprehensive project implementations with step-by-step analysis
View GitHub Profile
51+ data science notebooks covering machine learning, deep learning, geospatial analysis, time series, and AI fairness, with comprehensive exercises and implementations
Explore Kaggle Profile
Interactive data analysis notebooks with visualizations, insights, and detailed methodology explanations
View Notebooks