Innovative Data Science Solutions & Machine Learning Applications
Showcasing innovative solutions that demonstrate expertise in machine learning and data analytics
Python toolkit for exploratory data analysis with 30+ functions across 11 files for statistical summaries, data quality checks, and batch visualizations. Built with modular architecture (10 specialized modules) and constants-driven design. Works in Jupyter, IPython, and terminal. Features environment detection, built-in help system (help(), quick_start(), examples(), list_all()), and multi-dataset comparison. Published on PyPI.
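Environment detection of this kind can be approximated with a common IPython heuristic. This is a minimal sketch, not the toolkit's published code. The function name `detect_environment` is hypothetical. The sketch assumes that a Jupyter kernel reports a `ZMQInteractiveShell` and that terminal IPython reports a `TerminalInteractiveShell`. Other front ends may use different class names and will fall through to `"terminal"`.

```python
def detect_environment() -> str:
    """Return "jupyter", "ipython", or "terminal" using a common IPython heuristic (hypothetical helper)."""
    try:
        from IPython import get_ipython  # importable only if IPython is installed
    except ImportError:
        return "terminal"
    shell = get_ipython()  # None under a plain `python` interpreter
    if shell is None:
        return "terminal"
    name = type(shell).__name__
    if name == "ZMQInteractiveShell":  # Jupyter notebook/lab kernel
        return "jupyter"
    if name == "TerminalInteractiveShell":  # IPython REPL in a terminal
        return "ipython"
    return "terminal"  # unknown front end: assume plain-text output
```

A caller could, for instance, render rich HTML output only when the result is `"jupyter"` and print plain text otherwise.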
Multi-environment CSV orchestrator resolving numpy/pandas/scipy conflicts through isolated conda environments. Unified interface for YData Profiling, SweetViz, DataPrep engines with subprocess isolation. Features Rich CLI, memory chunking (1GB default), parallel environment setup, lazy loading, and graceful degradation. 28 tests, cross-platform support.
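Memory chunking can be sketched with the standard `csv` module. Rows are buffered until an approximate size budget would be exceeded, and then they are flushed as a batch. Several details here are assumptions: the helper name `iter_row_chunks`, the use of character counts as a size proxy, and reading "1GB" as 2**30. The orchestrator's actual chunking logic is not shown here.

```python
import csv
import io
from typing import Iterator, List, TextIO, Tuple

# Assumption: "1GB" read as 2**30; real memory use differs from this character-count proxy.
DEFAULT_CHUNK_BYTES = 1 << 30

Row = List[str]


def iter_row_chunks(stream: TextIO, max_bytes: int = DEFAULT_CHUNK_BYTES) -> Iterator[Tuple[Row, List[Row]]]:
    """Yield (header, rows) batches whose approximate size stays within max_bytes (hypothetical helper)."""
    reader = csv.reader(stream)
    header = next(reader)
    batch: List[Row] = []
    size = 0
    for row in reader:
        row_size = sum(len(field) for field in row)  # rough proxy: characters, not true bytes in memory
        if batch and size + row_size > max_bytes:
            yield header, batch  # flush before the budget would be exceeded
            batch, size = [], 0
        batch.append(row)  # a single oversized row still forms its own batch
        size += row_size
    if batch:
        yield header, batch  # final partial batch


if __name__ == "__main__":
    demo = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
    for header, rows in iter_row_chunks(demo, max_bytes=4):
        print(header, rows)
```

Each batch could then be handed to a profiling engine in turn, so that the full file never needs to sit in memory at once.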
Hybrid Python + Electron app for 15+ AI platforms (ChatGPT, Claude, Gemini, Perplexity, Grok, DeepSeek). Features browser extension for simultaneous prompt distribution, grid/side-by-side layouts, multi-display support, Groq API prompt enhancement, profile switching, and HTTP polling server with JSON-RPC IPC communication.
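The JSON-RPC IPC layer can be illustrated with a minimal JSON-RPC 2.0 dispatcher for single (non-batch) requests. The method table (`ping`, `add`) and the function name `handle_request` are hypothetical. Only the envelope fields and the error codes (-32700, -32600, -32601, -32602) come from the JSON-RPC 2.0 specification.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical method table; the app's real RPC methods are not documented here.
METHODS: Dict[str, Callable[..., Any]] = {
    "ping": lambda: "pong",
    "add": lambda a, b: a + b,
}


def _error(code: int, message: str, req_id: Any = None) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": req_id}


def handle_request(raw: str) -> Optional[str]:
    """Dispatch one JSON-RPC 2.0 request; return the JSON response, or None for a notification."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps(_error(-32700, "Parse error"))
    if not isinstance(req, dict) or req.get("jsonrpc") != "2.0":
        return json.dumps(_error(-32600, "Invalid Request"))

    req_id = req.get("id")
    name = req.get("method")
    method = METHODS.get(name) if isinstance(name, str) else None
    params = req.get("params", [])
    if method is None:
        resp = _error(-32601, "Method not found", req_id)
    else:
        try:
            # Positional params arrive as a list, named params as an object.
            result = method(**params) if isinstance(params, dict) else method(*params)
            resp = {"jsonrpc": "2.0", "result": result, "id": req_id}
        except TypeError:
            # Sketch simplification: any TypeError is reported as bad params.
            resp = _error(-32602, "Invalid params", req_id)

    # Requests without an "id" are notifications: the server must not reply.
    return None if "id" not in req else json.dumps(resp)
```

The transport is left out of the sketch. The same dispatcher works whether request strings arrive over stdio, a socket, or an HTTP endpoint.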
Classification, regression, and predictive analytics
Home loan default prediction analyzing 58.44M records across 7 datasets. XGBoost classifier with 143 engineered features achieves 83% accuracy, 52.8% recall, and 0.785 ROC AUC at threshold 0.60. Addresses class imbalance (8% default rate), memory optimization (68.5% reduction), multicollinearity, and saves 1200+ hours of manual review through intelligent risk assessment. Grade A evaluation.
XGBoost
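With an 8% default rate, one standard XGBoost lever for class imbalance is `scale_pos_weight`. The XGBoost parameter documentation suggests setting it to roughly sum(negative instances) / sum(positive instances). The project does not say whether it used this parameter, class weights, or resampling. The sketch below only shows the arithmetic and assumes binary 0/1 labels.

```python
from typing import Sequence


def scale_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negative (0) to positive (1) labels, the heuristic XGBoost suggests for scale_pos_weight."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives  # assumes every non-1 label is a negative
    if positives == 0:
        raise ValueError("no positive labels; ratio undefined")
    return negatives / positives


# An 8% default rate: 8 defaults for every 92 non-defaults.
labels = [1] * 8 + [0] * 92
print(scale_pos_weight(labels))  # 11.5
```

The resulting value would be passed as `XGBClassifier(scale_pos_weight=...)`.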
Multiclass classification (3 classes) for employee performance using XGBoost with 92.5% accuracy, 0.9756 ROC AUC, and SHAP interpretability. Analyzes 1,200 records across 11 features with class weights for imbalance. Top drivers: environment satisfaction, salary hike, promotion history. Includes department-wise analysis and HR recommendations.
XGBoost
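Class weights for a three-class target are often derived with the "balanced" formula used by scikit-learn, w_c = n_samples / (n_classes * count_c). These weights are then expanded to per-row weights for `XGBClassifier.fit(..., sample_weight=...)`. The project's exact weighting scheme is not stated, so this stdlib sketch assumes the balanced formula. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """w_c = n_samples / (n_classes * count_c): rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def to_sample_weights(labels: Sequence[Hashable], weights: Dict[Hashable, float]) -> List[float]:
    """Expand per-class weights into one weight per row."""
    return [weights[y] for y in labels]


labels = ["low", "low", "low", "mid", "high", "high"]  # toy labels for illustration
weights = balanced_class_weights(labels)  # "mid" (1 row) gets 2.0, "high" (2 rows) gets 1.0
row_weights = to_sample_weights(labels, weights)
```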
Sales lead quality prediction using XGBoost with 81.06% ROC AUC and 84.74% recall on 7,420 leads. Handles 24% missing data, 26-category high cardinality via frequency encoding, and 1.6:1 class imbalance. Business impact: $142K cost savings, $380K revenue gains, 45% junk lead reduction. Grade A evaluation.
XGBoost
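Frequency encoding replaces each category with its share of training rows. This keeps a 26-category column as a single numeric feature instead of 26 one-hot columns. The following is a minimal sketch with two assumptions. First, missing values (`None`) are kept as their own category. Second, categories unseen at inference time map to 0.0. The project does not state how it actually treated missing values, and the category names below are made up.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

Category = Optional[str]  # None stands for a missing value


def fit_frequencies(train_values: Sequence[Category]) -> Dict[Category, float]:
    """Map each training category (including None) to its share of training rows."""
    counts = Counter(train_values)
    total = len(train_values)
    return {cat: n / total for cat, n in counts.items()}


def frequency_encode(values: Sequence[Category], freqs: Dict[Category, float], unseen: float = 0.0) -> List[float]:
    """Replace categories with their training frequency; unseen ones get `unseen`."""
    return [freqs.get(v, unseen) for v in values]


freqs = fit_frequencies(["Google", None, "Google", "Referral"])
print(frequency_encode(["Referral", "Facebook", "Google"], freqs))  # [0.25, 0.0, 0.5]
```

Fitting on the training split only, then reusing the same mapping for validation and test data, avoids leaking test-set frequencies into the features.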
Term deposit prediction using LightGBM on UCI dataset (41,188 records). Addresses 7.87:1 class imbalance, VIF >26K multicollinearity, and data leakage prevention (4 temporal features removed). Achieves 87.8% accuracy, 60.9% recall, 20% lower CV variance than XGBoost. 5 models compared, threshold optimization 0.1-0.9. Grade A evaluation.
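Threshold optimization over 0.1-0.9 can be sketched as a grid search over decision thresholds applied to predicted probabilities. Two choices here are assumptions: the F1 objective and the 0.1 step. The project may have optimized recall or another metric. In practice the sweep would run on validation-set probabilities, for example from LightGBM's `predict_proba`. Labels are assumed to be 0/1.

```python
from typing import List, Sequence, Tuple


def f1_at(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """F1 score when predicting positive for probabilities >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Sweep thresholds 0.1, 0.2, ..., 0.9 and return (threshold, F1) with the highest F1."""
    candidates: List[float] = [round(0.1 * i, 1) for i in range(1, 10)]
    # On equal F1, max() compares the thresholds, so the higher threshold wins.
    best_f1, best_t = max((f1_at(y_true, y_prob, t), t) for t in candidates)
    return best_t, best_f1


print(best_threshold([0, 0, 1, 1], [0.1, 0.35, 0.45, 0.8]))  # (0.4, 1.0)
```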
Bike rental demand prediction using Ridge Regression on Capital Bikeshare (731 days). R²=0.832, CV R²=0.815±0.032 with only 0.5% overfitting gap (vs XGBoost 9.4%). Resolves VIF 662→37 multicollinearity. 10 algorithms compared, 72x faster than XGBoost. Provides fleet rebalancing recommendations. Grade A+ evaluation.
1985 automobile price prediction using Lasso on UCI dataset (200 samples, 42 features). 91.7% R² with 29 sparse coefficients, 600x faster inference than XGBoost. Resolves extreme multicollinearity (VIF 16,676→8.36) via PCA. 10 algorithms compared with 2.3-point CV-test gap vs XGBoost's 8.3-point gap. Grade A+ evaluation.
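Lasso's sparsity comes from the soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0), which coordinate-descent solvers apply to each coefficient. Under the simplifying assumption of an orthonormal design (X^T X = I) and the objective (1/2)||y - X b||^2 + lam * ||b||_1, the Lasso solution is exactly S applied to the OLS coefficients. Any coefficient smaller than lam in magnitude therefore becomes zero. The OLS values below are made up for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # |z| <= lam: coefficient is shrunk exactly to zero


def lasso_orthonormal(ols_coefs: Sequence[float], lam: float) -> List[float]:
    """Lasso solution when X^T X = I: soft-threshold each OLS coefficient."""
    return [soft_threshold(b, lam) for b in ols_coefs]


# Illustrative (made-up) OLS coefficients; two of four fall below lam and are zeroed.
print(lasso_orthonormal([2.5, -0.25, 0.125, -1.5], lam=0.5))  # [2.0, 0.0, 0.0, -1.0]
```

With correlated real features, solvers iterate this update feature by feature until convergence rather than applying it once.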
CLI tools, desktop apps, and automation frameworks
Hybrid Python + Electron app for intelligent window orchestration on Windows. Features grid/side-by-side layouts, multi-monitor support (span, distribute, overflow modes), profile-based management, keyboard shortcuts, dark/light themes, and first-run onboarding with JSON-RPC IPC, two-tier caching, and live config reload.
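Grid and side-by-side layouts reduce to splitting a monitor's work area into rectangles. This sketch only computes those rectangles. Actually moving the windows (for example through Win32 APIs) is out of scope. Several details are assumptions rather than the app's documented behavior: the names `grid_layout` and `side_by_side`, the near-square column choice, and the (x0, y0) work-area origin.

```python
import math
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def grid_layout(n: int, width: int, height: int, x0: int = 0, y0: int = 0) -> List[Rect]:
    """Place n windows in a near-square grid filled row by row."""
    if n <= 0:
        return []
    cols = math.isqrt(n - 1) + 1  # equals ceil(sqrt(n)) for n >= 1, without float rounding
    rows = -(-n // cols)  # ceiling division
    cell_w, cell_h = width // cols, height // rows
    return [
        (x0 + (i % cols) * cell_w, y0 + (i // cols) * cell_h, cell_w, cell_h)
        for i in range(n)
    ]


def side_by_side(n: int, width: int, height: int, x0: int = 0, y0: int = 0) -> List[Rect]:
    """Place n windows in equal-width columns spanning the full height."""
    if n <= 0:
        return []
    cell_w = width // n
    return [(x0 + i * cell_w, y0, cell_w, height) for i in range(n)]


print(grid_layout(3, 1920, 1080))  # [(0, 0, 960, 540), (960, 0, 960, 540), (0, 540, 960, 540)]
```

Passing a second monitor's origin as `x0`/`y0` places windows on that display. Integer division leaves any leftover pixels unused.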
Automated CSV profiling with interactive CLI, automatic delimiter detection, and memory-efficient chunked processing. Features Rich console interface, exception hierarchy, public Python API with analyze() function, and TableOne/ResearchPy integration. 20 tests, supports Python 3.8-3.13. PyPI v2.0.0.
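Automatic delimiter detection can lean on the standard library's `csv.Sniffer`, which infers a dialect from a text sample. The candidate set, the sample size in the usage note, and the comma fallback are assumptions. The tool's own detection may work differently.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter from a text sample, falling back to a comma (hypothetical helper)."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        return dialect.delimiter
    except csv.Error:
        # Sniffer raises csv.Error when no consistent delimiter is found.
        return ","
```

For a file, the sample would typically be the first block of text, e.g. `with open(path, newline="") as f: sample = f.read(65536)`.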
Python CLI tool for extracting and organizing content from Jupyter notebooks. Extracts code cells, function/class definitions, imports, markdown, and Base64 embedded images. Features structured output directories, section-based outlines from headers, and execution count preservation. 37 tests with pytest. v0.0.1 Alpha.
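Notebook extraction is possible with only `json` and `base64`, because an nbformat v4 `.ipynb` file is plain JSON. Its `cells` carry a `cell_type`, a `source` (a string or a list of lines), and, for code cells, an `execution_count`. Code cells also hold `outputs` whose `data` maps MIME types such as `image/png` to Base64 text. This minimal sketch covers only code cells and PNG outputs. The function names are hypothetical, and the CLI's real output layout, outlines, and import/definition extraction are not reproduced.

```python
import base64
import json
from typing import Any, Dict, List, Optional, Tuple, Union


def _join(text: Union[str, List[str]]) -> str:
    # nbformat v4 stores multiline strings either as one string or as a list of lines.
    return "".join(text) if isinstance(text, list) else text


def load_notebook(path: str) -> Dict[str, Any]:
    """Read an .ipynb file, which is plain JSON."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def extract(nb: Dict[str, Any]) -> Tuple[List[Tuple[Optional[int], str]], List[bytes]]:
    """Return (execution_count, source) for code cells plus decoded PNG outputs."""
    code_cells: List[Tuple[Optional[int], str]] = []
    images: List[bytes] = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown/raw cells are skipped in this sketch
        code_cells.append((cell.get("execution_count"), _join(cell.get("source", ""))))
        for output in cell.get("outputs", []):
            png = output.get("data", {}).get("image/png")
            if png:
                # b64decode discards non-alphabet characters such as line breaks by default.
                images.append(base64.b64decode(_join(png)))
    return code_cells, images
```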
Personal financial data extraction system for processing PDF statements from Indian banks, credit cards, and UPI platforms. Supports 10+ institutions. Features institution-specific extractors, unified transaction schema, FY-based consolidation, and CSV outputs covering 5 years of financial data.
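FY-based consolidation for Indian statements means bucketing transactions by the April-to-March financial year. This is a minimal sketch, assuming a `FY2023-24` label format and `Decimal` amounts. The project's unified transaction schema has more fields than the (date, amount) pairs used here, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def financial_year(d: date) -> str:
    """Indian FY runs 1 April to 31 March; e.g. 2024-03-31 falls in FY2023-24."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"FY{start}-{(start + 1) % 100:02d}"


def totals_by_fy(transactions: Iterable[Tuple[date, Decimal]]) -> Dict[str, Decimal]:
    """Sum transaction amounts per financial year."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() starts at 0
    for d, amount in transactions:
        totals[financial_year(d)] += amount
    return dict(totals)
```

`Decimal` avoids the rounding drift that binary floats can introduce when summing currency amounts.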
Portfolio sites and web applications
Static portfolio with vanilla JS ES6 modules and Bootstrap 5. Features typing animation, 3D card tilt effects, infinite scroll carousels, light/dark theme with FOUC prevention, custom PDF viewer, and certificate viewer. Lighthouse CI, ESLint/Stylelint/Prettier, GitHub Pages deployment.
Job search workflow app with Next.js 15, React 19, Prisma, and shadcn/ui. Features application tracking, AI document analysis (LangChain with Ollama/OpenAI/Vertex AI), Google Calendar/Tasks sync, RSS job aggregation, and browser extension for 50+ portals. Rate limiting, CSRF, optional SQLite encryption.
Dive deeper into my data science journey by exploring my Jupyter notebooks, Kaggle competitions, and open-source contributions. Each repository tells a story of problem-solving, innovation, and continuous learning in the field of data science and machine learning.
Complete source code, detailed documentation, and comprehensive project implementations with step-by-step analysis
51+ data science notebooks covering machine learning, deep learning, geospatial analysis, time series, and AI fairness with comprehensive exercises and implementations
Interactive data analysis notebooks with visualizations, insights, and detailed methodology explanations