Featured Projects

Showcasing innovative solutions that demonstrate expertise in machine learning and data analytics

Machine Learning Projects

Classification, regression, and predictive analytics

Home Loan Default Predictor

Predicts home loan defaults from 58.44M records across 7 datasets. An XGBoost classifier with 143 engineered features achieves 83% accuracy, 52.8% recall, and 0.785 ROC AUC at a decision threshold of 0.60. Addresses class imbalance (8% default rate), multicollinearity, and memory use (68.5% reduction), and saves 1,200+ hours of manual review through automated risk assessment. Grade A evaluation.

Python XGBoost Scikit-learn Jupyter
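
As an illustration of what a decision threshold of 0.60 means in practice, here is a minimal sketch in plain Python. The helper names and the scores are hypothetical and made up for the example; they are not the project's code or data.

```python
from typing import List, Sequence


def classify_at_threshold(probabilities: Sequence[float], threshold: float = 0.60) -> List[int]:
    """Flag a loan as a predicted default (1) when its score meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual defaults that the model caught."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0


# Hypothetical predicted default probabilities and true labels (illustration only).
scores = [0.92, 0.61, 0.58, 0.30, 0.75, 0.10]
labels = [1, 1, 1, 0, 0, 0]

preds = classify_at_threshold(scores, threshold=0.60)
print(preds, recall(labels, preds))
```

Raising the threshold generally trades recall for fewer false alarms, which is why the metrics above are reported at a specific threshold.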

INX Employee Performance Analysis

Multiclass classification (3 classes) for employee performance using XGBoost with 92.5% accuracy, 0.9756 ROC AUC, and SHAP interpretability. Analyzes 1,200 records across 11 features with class weights for imbalance. Top drivers: environment satisfaction, salary hike, promotion history. Includes department-wise analysis and HR recommendations.

Python XGBoost SHAP Scikit-learn
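
One common way to derive class weights for an imbalanced multiclass target is the "balanced" heuristic, n_samples / (n_classes * class_count). The sketch below assumes that heuristic; the project description does not say which weighting scheme was actually used, and the labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical performance ratings (illustration only, not the INX data).
example_labels = ["low"] * 2 + ["good"] * 6 + ["excellent"] * 2
weights = balanced_class_weights(example_labels)
print(weights)
```

Rarer classes receive larger weights, so misclassifying them costs more during training.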

FicZon Sales Effectiveness

Sales lead quality prediction using XGBoost with 0.8106 ROC AUC and 84.74% recall on 7,420 leads. Handles 24% missing data, a 26-category high-cardinality feature via frequency encoding, and a 1.6:1 class imbalance. Business impact: $142K cost savings, $380K revenue gains, 45% junk lead reduction. Grade A evaluation.

Python XGBoost Scikit-learn Jupyter
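
Frequency encoding replaces each category with the share of rows in which it appears, which keeps a high-cardinality column as a single numeric feature. A minimal sketch, assuming a hypothetical lead-source column (the values are invented for illustration):

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def frequency_encode(values: Sequence[str]) -> Tuple[List[float], Dict[str, float]]:
    """Map each category to its relative frequency in the given column."""
    counts = Counter(values)
    total = len(values)
    mapping = {category: n / total for category, n in counts.items()}
    return [mapping[v] for v in values], mapping


# Hypothetical lead sources (illustration only).
sources = ["web", "referral", "web", "call", "web", "referral"]
encoded, mapping = frequency_encode(sources)
print(encoded)
```

In a real pipeline the mapping would normally be learned on the training split only and then applied to validation and test data, to avoid leaking information.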

Portuguese Bank Campaign Predictor

Term deposit prediction using LightGBM on the UCI dataset (41,188 records). Addresses a 7.87:1 class imbalance, severe multicollinearity (VIF > 26,000), and data leakage (4 temporal features removed). Achieves 87.8% accuracy and 60.9% recall, with 20% lower CV variance than XGBoost. 5 models compared, with decision thresholds tuned over the range 0.1 to 0.9. Grade A evaluation.

Python LightGBM Scikit-learn Jupyter

Capital Bikeshare Demand Forecaster

Bike rental demand prediction using Ridge Regression on Capital Bikeshare data (731 days). R²=0.832, CV R²=0.815±0.032, with an overfitting gap of only 0.5% (vs. 9.4% for XGBoost). Reduces multicollinearity from VIF 662 to 37. 10 algorithms compared; Ridge is 72x faster than XGBoost. Provides fleet rebalancing recommendations. Grade A+ evaluation.

Python Ridge Regression Scikit-learn Jupyter
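
For readers unfamiliar with VIF (variance inflation factor): for a predictor j it is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the others. The sketch below covers only the simplest case of two predictors, where R²_j equals the squared Pearson correlation. The function names and numbers are hypothetical, and this is not the project's code.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def vif_two_predictors(x: Sequence[float], y: Sequence[float]) -> float:
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r_squared = pearson_r(x, y) ** 2
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictor values (illustration only).
temperature = [1.0, 2.0, 3.0, 4.0, 5.0]
feels_like = [2.0, 1.0, 4.0, 3.0, 5.0]
vif = vif_two_predictors(temperature, feels_like)
print(round(vif, 3))
```

With more than two predictors, each VIF requires a full auxiliary regression, which is what library routines for VIF compute.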

Automobile Price Predictor

1985 automobile price prediction using Lasso on a UCI dataset (200 samples, 42 features). Achieves an R² of 0.917 with 29 sparse coefficients and 600x faster inference than XGBoost. Reduces extreme multicollinearity (VIF 16,676 to 8.36) via PCA. 10 algorithms compared, with a 2.3-point CV-test gap vs. XGBoost's 8.3-point gap. Grade A+ evaluation.

Python Lasso PCA Scikit-learn

Development Tools & Automation

CLI tools, desktop apps, and automation frameworks

SpectraTact

Hybrid Python + Electron app for intelligent window orchestration on Windows. Features grid/side-by-side layouts, multi-monitor support (span, distribute, overflow modes), profile-based management, keyboard shortcuts, dark/light themes, and first-run onboarding. Uses JSON-RPC IPC, two-tier caching, and live config reload.

Electron Python Windows Automation
PyPI

AutoCSV Profiler

Automated CSV profiling with an interactive CLI, automatic delimiter detection, and memory-efficient chunked processing. Features a Rich console interface, an exception hierarchy, a public Python API with an analyze() function, and TableOne/ResearchPy integration. 20 tests; supports Python 3.8-3.13. PyPI v2.0.0.

Python 3.8+ CLI Pandas Rich
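
To show the general idea behind delimiter detection and chunked reading, here is a minimal standard-library sketch. It is not AutoCSV Profiler's implementation or API; the helper names, the candidate delimiter set, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter from a text sample using csv.Sniffer."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


def iter_chunks(
    handle: TextIO, delimiter: str, chunk_size: int = 2
) -> Iterator[Tuple[List[str], List[List[str]]]]:
    """Yield (header, rows) pairs so only chunk_size rows are held in memory at once."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield header, chunk


# Hypothetical semicolon-separated data (illustration only).
data = "id;name;score\n1;Ana;9.5\n2;Ben;7.0\n3;Cy;8.25\n"
delimiter = detect_delimiter(data)
chunks = [rows for _, rows in iter_chunks(io.StringIO(data), delimiter)]
print(repr(delimiter), [len(rows) for rows in chunks])
```

For real files, the sample would typically be the first few kilobytes read from disk, and the chunk size would be in the thousands of rows rather than two.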
Private

Notebook Extractor

Python CLI tool for extracting and organizing content from Jupyter notebooks. Extracts code cells, function/class definitions, imports, markdown, and Base64 embedded images. Features structured output directories, section-based outlines from headers, and execution count preservation. 37 tests with pytest. v0.0.1 Alpha.

Python nbformat CLI pytest
Private

StatementForge

Personal financial data extraction system for processing PDF statements from Indian banks, credit cards, and UPI platforms. Supports 10+ institutions. Features institution-specific extractors, unified transaction schema, FY-based consolidation, and CSV outputs covering 5 years of financial data.

Python pdfplumber Pandas ETL

Web Development

Portfolio sites and web applications

Portfolio Website (dhaneshbb.github.io)

Static portfolio with vanilla JS ES6 modules and Bootstrap 5. Features typing animation, 3D card tilt effects, infinite scroll carousels, light/dark theme with FOUC prevention, custom PDF viewer, and certificate viewer. Lighthouse CI, ESLint/Stylelint/Prettier, GitHub Pages deployment.

JavaScript Bootstrap 5 GitHub Pages ESLint
Private

JobSync

Job search workflow app with Next.js 15, React 19, Prisma, and shadcn/ui. Features application tracking, AI document analysis (LangChain with Ollama/OpenAI/Vertex AI), Google Calendar/Tasks sync, RSS job aggregation, and browser extension for 50+ portals. Rate limiting, CSRF, optional SQLite encryption.

Next.js 15 TypeScript Prisma LangChain
View My Code

Explore More Work

Dive deeper into my data science journey by exploring my Jupyter notebooks, Kaggle competitions, and open-source contributions. Each repository tells a story of problem-solving, innovation, and continuous learning in data science and machine learning.

GitHub Repositories

Complete source code, detailed documentation, and comprehensive project implementations with step-by-step analysis

View GitHub Profile

Kaggle Notebooks

51+ data science notebooks covering machine learning, deep learning, geospatial analysis, time series, and AI fairness with comprehensive exercises and implementations

Explore Kaggle Profile

Jupyter Notebooks

Interactive data analysis notebooks with visualizations, insights, and detailed methodology explanations

View Notebooks