Data Science Portfolio

A collection of machine learning projects implementing classification, clustering, and association rule mining algorithms on real-world datasets.

Coursework projects for CSCI 5523 (Introduction to Data Mining), covering the machine learning pipeline from exploratory analysis through model evaluation.

Project 1: Classification & Analysis

Exploratory Data Analysis

  • Telecom customer churn dataset analysis
  • Feature distribution visualization
  • Correlation analysis and data quality assessment

Decision Trees & kNN

  • Implementation of classification algorithms
  • Hyperparameter tuning and cross-validation
  • Performance comparison across algorithms
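
The tuning and comparison steps above can be sketched as follows. This is a minimal illustration rather than the project's actual notebook code: scikit-learn's built-in iris data stands in for the project's datasets, and the parameter grids are assumptions chosen only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the built-in iris dataset.
X, y = load_iris(return_X_y=True)

# Illustrative hyperparameter grids (assumptions, not the project's settings).
candidates = {
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 3, 5, None]},
    ),
    "knn": (
        KNeighborsClassifier(),
        {"n_neighbors": [1, 3, 5, 7, 9]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validated grid search over each model's parameters.
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)

for name, (params, score) in results.items():
    print(f"{name}: best params={params}, mean CV accuracy={score:.3f}")
```

Scoring both models with the same folds and metric keeps the comparison fair. Any other scikit-learn scoring string could replace "accuracy".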

Naive Bayes Spam Classification

  • Text preprocessing and feature extraction
  • Probabilistic classification for spam detection
  • Precision/recall analysis
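
A bag-of-words Naive Bayes spam filter along these lines might look like the sketch below. The messages are invented toy data, and the choice of CountVectorizer with MultinomialNB is an assumption about a typical setup, not a record of the project's preprocessing.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training messages (invented for illustration): 1 = spam, 0 = ham.
train_texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free reward today",
    "meeting moved to monday morning",
    "please review the attached report",
    "lunch at noon tomorrow",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Word counts feed a multinomial Naive Bayes model with Laplace smoothing.
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(train_texts, train_labels)

# Toy held-out messages (also invented).
test_texts = [
    "free prize offer",
    "review the report before the meeting",
    "claim free cash",
    "lunch on monday",
]
test_labels = [1, 0, 1, 0]

predicted = model.predict(test_texts)
precision = precision_score(test_labels, predicted)  # share of flagged messages that are spam
recall = recall_score(test_labels, predicted)        # share of spam messages that were flagged
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision and recall are reported separately because a spam filter's two error types have different costs: a false positive hides a legitimate message, while a false negative only lets one spam message through.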

Multi-Dataset ML Analysis

  • Applied ML pipelines to iris, diabetes, and thyroid datasets
  • Comparative model performance evaluation
  • ROC curve analysis and model selection
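
ROC-based model selection can be illustrated as below. This is a sketch under stated assumptions: the data is synthetic (make_classification) rather than the diabetes or thyroid data, and the two candidate models are picked only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption, for illustration only).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class is the ranking score for the ROC curve.
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, scores)  # points to plot
    auc_by_model[name] = roc_auc_score(y_test, scores)

# Select the model with the largest area under the ROC curve.
best_model = max(auc_by_model, key=auc_by_model.get)
print(auc_by_model, "->", best_model)
```

The fpr and tpr arrays can be passed to matplotlib's plot function to draw each curve. AUC condenses a curve into one number, which is convenient for selection but hides where along the curve the models differ.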

Project 2: Advanced Analytics

Apriori Algorithm

  • Market basket analysis implementation
  • Association rule mining with support/confidence metrics
  • Frequent itemset discovery
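
The level-wise search behind Apriori, and the support and confidence metrics used to filter rules, can be sketched in plain Python. The function names apriori and association_rules and the five toy baskets are hypothetical, introduced only for this illustration; the project's own implementation may differ.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


# Toy market baskets (invented for illustration).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.8)
for antecedent, consequent, sup, conf in rules:
    print(set(antecedent), "->", set(consequent), f"support={sup:.2f} confidence={conf:.2f}")
```

The prune step relies on downward closure: an itemset can only be frequent if all of its subsets are frequent. That property keeps the candidate set manageable as k grows.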

Instacart Transaction Analysis

  • Large-scale retail transaction data
  • Customer purchase pattern identification
  • Product association recommendations

Cluster Analysis

  • K-means and hierarchical clustering
  • Cluster quality evaluation (silhouette scores)
  • Dendrogram visualization
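
One way to compare K-means and hierarchical clustering by silhouette score is sketched below. The data is synthetic (three well-separated blobs), and the range of k values and the Ward linkage are assumptions made for the example.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with three well-separated groups (assumption).
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [5, 5], [0, 5]],
    cluster_std=0.5,
    random_state=0,
)

# Score K-means for several values of k and keep the best silhouette.
kmeans_scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    kmeans_scores[k] = silhouette_score(X, labels)
best_k = max(kmeans_scores, key=kmeans_scores.get)

# Agglomerative (hierarchical) clustering with the same number of clusters.
hier_labels = AgglomerativeClustering(n_clusters=best_k, linkage="ward").fit_predict(X)
hier_score = silhouette_score(X, hier_labels)

print(f"best k={best_k}, k-means silhouette={kmeans_scores[best_k]:.3f}, "
      f"hierarchical silhouette={hier_score:.3f}")
```

Silhouette scores range from -1 to 1, with higher values indicating tighter, better-separated clusters. For the dendrogram itself, SciPy's scipy.cluster.hierarchy.linkage and dendrogram functions are a common choice.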

COVID-19 Literature Clustering

  • CORD-19 research paper analysis
  • Text embedding and similarity measures
  • Research topic discovery through unsupervised learning
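
A small-scale version of this workflow might look like the sketch below. TF-IDF vectors stand in for whatever text embedding the project used, which is an assumption, and the four short "abstracts" are invented rather than drawn from CORD-19.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented stand-ins for paper abstracts (not CORD-19 text).
docs = [
    "vaccine trial shows strong immune response",
    "antibody levels and immune response after vaccine dose",
    "hospital ventilator capacity during the outbreak",
    "intensive care capacity and ventilator shortages in hospital wards",
]

# TF-IDF turns each document into a sparse, length-normalized vector.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Pairwise cosine similarity: documents sharing vocabulary score higher.
similarity = cosine_similarity(tfidf)

# Group documents into two candidate topics with K-means.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)

for doc, label in zip(docs, labels):
    print(label, doc)
```

On a corpus the size of CORD-19, the same pattern would usually be combined with dimensionality reduction before clustering, since raw TF-IDF matrices are very wide.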

Technical Skills Demonstrated

  • Data Preprocessing: Handling missing values, normalization, encoding
  • Supervised Learning: Classification, regression, model evaluation
  • Unsupervised Learning: Clustering, dimensionality reduction
  • Association Mining: Apriori, frequent pattern discovery
  • Visualization: matplotlib, seaborn, dendrograms

Tools & Libraries

  • Python (Jupyter Notebooks)
  • pandas, NumPy for data manipulation
  • scikit-learn for ML algorithms
  • matplotlib, seaborn for visualization