Cursor Boston
Public showcase

Cursor Boston × PyData Data Science Hack

Evening data-science hack at Moderna HQ — talk from Eric Ma, then build

Date
2026-05-13 · 6:30 PM – 9:30 PM ET
Venue
Moderna HQ, 325 Binney St, Cambridge, MA 02142
Submission branch
pydata-2026-submissions

The challenge

Use Cursor and Marimo to build a notebook that uncovers one compelling insight from the dataset(s) below.

That's it. One notebook. One insight. Make it interesting.

Prizes

  • $200 Cursor Credits × 3

    Best presentations

    Awarded to the top 3 presentations as scored by a panel of human judges on data storytelling.

  • $200 Cursor Credits × 3

    Best submissions

    Awarded to the top 3 notebooks as picked by an AI agent acting as a blind judge.

  • $20 Cursor Credits

    For submitting

    Every attendee who submits a notebook gets $20 in Cursor Credits — no judging required.

Prizes are awarded to individuals, not teams. You can only win a single $200 voucher during the event — you can't win both “best presentation” and “best submission” at the same time.

Competition datasets

These are the “competition datasets” for this hackathon. Use any one (or several) of them in your submission. You must use at least one of these to be eligible for the prizes.

Timeline

  1. 7:00 Welcome · Sebastian Wallkoetter, Benjamin Batorsky
  2. 7:15 Marimo + Cursor for Data Science · Eric Ma
  3. 8:15 Q&A with Cursor · virtual, parallel
  4. 9:00 Presentations
  5. 9:25 Winners announced
  6. 9:30 End

Submissions close at 9:00 PM ET. You must have opened a PR with your notebook before then to be eligible for either prize.

Submit your notebook

One PR per attendee, targeting the pydata-2026-submissions branch. The event is over, but the exercise stays open: anyone can do the notebook later and open a PR to be listed with the others. Once a maintainer merges it through to main, your card appears in the grid below.

Let Cursor do it for you

Copy this self-contained prompt, paste it into a fresh Cursor chat, and answer its questions about your notebook + title + description. The agent forks the repo, creates the right files, and opens the PR — you don't have to read the manual steps below.

Or follow the manual steps

  1. Fork the repo

    Click Fork on cursor-boston so you can push without write access to the upstream repo.

  2. Add your folder

    Create pydata-2026-submissions/<your-gh-handle>/ with submission.py and meta.json inside.

  3. Open PR into pydata-2026-submissions

    Set the base branch to pydata-2026-submissions, not main. PRs to main will be redirected.

  4. Wait for the deploy

    A maintainer batches merges into pydata-2026-submissions, then ships develop to main. Your card appears below after the Vercel deploy.

meta.json template

{
  "title": "Short, specific title for your notebook",
  "description": "1–3 sentences. What does the notebook do? What did you find?",
  "displayName": "Your name as you want it on the page",
  "tags": ["healthcare", "embeddings"],
  "collaborators": [
    { "displayName": "Pat Collaborator", "githubHandle": "pat-collab" }
  ]
}

Full rules + field reference: pydata-2026-submissions/README.md
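Step 2 of the manual steps and the meta.json template can be sketched as a small Python helper. This is a minimal sketch, not part of the repo's tooling: the names `check_meta` and `scaffold_submission` are hypothetical, and treating `title`, `description`, and `displayName` as required is an assumption. The README remains the authoritative field reference.

```python
import json
from pathlib import Path

# Assumption: these keys are treated as required here; the README decides.
REQUIRED_KEYS = ("title", "description", "displayName")


def check_meta(meta: dict) -> list:
    """Return a list of problems found in a meta.json payload (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = meta.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    tags = meta.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags must be a list of strings")
    for person in meta.get("collaborators", []):
        if not isinstance(person, dict) or not {"displayName", "githubHandle"} <= set(person):
            problems.append("each collaborator needs displayName and githubHandle")
    return problems


def scaffold_submission(repo_root: Path, handle: str, meta: dict) -> Path:
    """Create pydata-2026-submissions/<handle>/ with submission.py and meta.json."""
    problems = check_meta(meta)
    if problems:
        raise ValueError("; ".join(problems))
    folder = repo_root / "pydata-2026-submissions" / handle
    folder.mkdir(parents=True, exist_ok=True)
    text = json.dumps(meta, indent=2, ensure_ascii=False) + "\n"
    (folder / "meta.json").write_text(text, encoding="utf-8")
    notebook = folder / "submission.py"
    if not notebook.exists():
        notebook.touch()  # placeholder only; replace with your Marimo notebook
    return folder
```

Calling `scaffold_submission(Path("."), "your-gh-handle", meta)` from the root of your fork would create the folder. The placeholder `submission.py` still has to be replaced with a real notebook before you open the PR.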

How the prizes are decided

Two prize tracks, judged independently. Submit your notebook for the AI-judged track; sign up at the event and pitch for the human-judged track.

Best submission

An AI agent reviews every notebook as a blind judge and picks the top 3.

Process

  1. Submit your final notebook before 9:00.
  2. Wait for winners to be announced.

Eligibility

  • Submit your final notebook (see Submit your notebook above).
  • Use at least one of the competition datasets.

Best presentation

25 slots, first come, first served. 1-minute hard limit. Sign up at the event by entering your name and GitHub handle into the presentation list; if it's full, it's full.

Process

  1. Sign up on the pitch list at the event (25 slots).
  2. Submit your final notebook before 9:00.
  3. Line up to present.
  4. Present — 1 minute, notebook as backdrop.
  5. Wait for winners to be announced.

Eligibility

  • Submit your final notebook before you present.
  • Use at least one of the competition datasets.
  • Actually present.

Eval criteria

Data storytelling. How clearly do you communicate your insight and its significance using the available dataset(s)? A panel of human judges scores each presentation; scores are combined and the top 3 win. Ties are broken at random.
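The combine-then-rank rule with random tie-breaks can be sketched as follows. The page only says scores are "combined", so summing each presenter's scores is an assumption, and `pick_winners` is a hypothetical name rather than the organizers' actual tooling.

```python
import random
from typing import Dict, List, Optional


def pick_winners(
    scores_by_presenter: Dict[str, List[float]],
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Rank presenters by combined score and return the top k names."""
    rng = rng or random.Random()
    # Assumption: "combined" means summed across judges.
    totals = {name: sum(scores) for name, scores in scores_by_presenter.items()}
    # Pair each total with a random number so equal totals are ordered by chance.
    ranked = sorted(totals, key=lambda name: (totals[name], rng.random()), reverse=True)
    return ranked[:k]
```

Passing a seeded `random.Random` makes a tie-break reproducible, which can help if the draw needs to be audited afterwards.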

Final judging board

Merged submissions (27)

Cards are sorted by AI judge score. Only PRs opened before 9:00 PM Eastern on May 13 are winner eligible; later PRs are shown separately for showcase visibility.
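The eligibility cutoff can be expressed as a timezone-aware comparison. This is a minimal sketch under stated assumptions: `is_winner_eligible` and `parse_utc` are hypothetical names, and the PR-opened timestamp is assumed to arrive as an ISO 8601 UTC string ending in `Z`. "Before 9:00 PM" is read as strictly earlier than the cutoff.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Deadline stated on the page: 9:00 PM Eastern on May 13, 2026.
CUTOFF = datetime(2026, 5, 13, 21, 0, tzinfo=ZoneInfo("America/New_York"))


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2026-05-14T00:59:00Z'."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def is_winner_eligible(pr_opened_at: datetime) -> bool:
    """True if the PR was opened strictly before the cutoff."""
    if pr_opened_at.tzinfo is None:
        raise ValueError("pr_opened_at must be timezone-aware")
    return pr_opened_at < CUTOFF
```

Because Eastern time is on daylight saving time in May, the cutoff corresponds to 01:00 UTC on May 14, so a PR opened at 00:59 UTC that day would still count as eligible.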

Eligible: 13

After 9 PM: 14

Winner eligible (13)

PR opened before 9:00 PM Eastern on May 13.

  • Speculative Decoding for Robot Diffusion Policies

    by Aarav Raina · @aaravraina3

    Ports speculative decoding (vLLM/Medusa/EAGLE) from LLM serving to robot diffusion policies. A 4.5k-param MLP draft and the 263M-param lerobot/diffusion_pusht verifier are gated by an action-difference threshold tau. Across 3 episodes of pusht: mean 2.2x speedup, mean 51% MLP acceptance, +8.9% action MSE -- consistently hitting the 10 Hz / 100 ms deadline that pure diffusion misses. Includes a serving Pareto, adaptive denoising (EAGLE analogue), cross-episode validation, and an interactive tau s

    • robotics
    • lerobot
    • inference-infra
    • speculative-decoding
    • diffusion-policy
    • marimo
    Winner eligible
    AI judge score: 9/10

    This is a sharp and technically ambitious notebook: it adapts speculative decoding from LLM serving to robot diffusion policies and backs the claim with latency, acceptance-rate, MSE, and cross-episode validation. The narrative is clear, the 10 Hz deadline makes the result actionable, and the interactive tau/Pareto exploration uses Marimo well. The main caveat is that the methodology is complex enough that a reader has to trust several implementation details, but the evidence presented is unusually strong for a single-evening submission.

  • The Bank's Recontact Calendar Was Upside-Down

    by Brad Egan · @bradagi

    Marimo analysis of the UCI Bank Marketing dataset surfacing two campaign-management levers the bank ignored: a 31-90 day recontact sweet spot that converts at 42% (the bank placed only 6% of recontacts there), and a dial-count exhaustion curve where calls past attempt #6 burn 56 minutes per subscription vs 35 on the first three. Includes a within-month sanity check that controls for the obvious seasonality confound.

    Winner eligible
    AI judge score: 8.8/10

    Refined tie-break score: this notebook delivers the strongest winner-eligible 8-range result because it turns the UCI Bank Marketing data into a crisp, operational recommendation: the bank underused the 31-90 day recontact window where conversion was strongest while over-dialing weaker late windows. It adds strong visual contrasts, checks the obvious seasonality confound within May, estimates missed opportunity, and separately analyzes dial-count exhaustion. It is less technically ambitious than the 9/10 entries, but it has the clearest actionable insight among the tied 8s.

  • Which Neighborhoods Are Transit-Privileged?

    by Orijit · @ori98

    This notebook joins BPDA neighborhood boundaries with MBTA rapid-transit access, a real-time service-alert exposure index, and MassDOT Blue Book reliability ratios (ArcGIS) to compare which Boston neighborhoods look most and least transit-privileged when access, disruption messaging, and chronic wait-time performance are combined.

    • transit
    • mbta
    • geospatial
    • boston
    • exploratory
    Winner eligible
    AI judge score: 8.5/10

    Refined tie-break score: this notebook builds a thoughtful composite measure of transit privilege by combining rapid-transit access, live alert exposure, and chronic Blue Book wait-time reliability. The geospatial joins, ethical framing, sensitivity controls, and factor breakdown make it more rigorous than a simple map. It ranks just below the top 8-range entry because the live-alert component is time-sensitive and the composite weighting remains exploratory, so the finding is useful but less cleanly decisive.

  • Exoplanet Explorer

    by Ryan Spangler · @rdspangler

    Every spike on the exoplanet discovery chart is a telescope, not a planet. This Marimo notebook walks through 5,700+ confirmed worlds from NASA's archive — annotated discovery timeline, planet-class radius histogram, and a transparent Earth-likeness shortlist of the closest cousins JWST could observe next.

    • astronomy
    • nasa
    • exoplanets
    • interactive
    • marimo
    Winner eligible
    AI judge score: 8.3/10

    Refined tie-break score: this is a polished, well-explained exoplanet explorer that combines discovery-history storytelling with interactive Earth-like candidate filtering and transparent ranking. The controls and candidate cards are strong for public exploration. It ranks below the higher 8-range entries because the main result is broader educational exploration rather than a single sharply defended quantitative claim.

  • The Valley Between Worlds — Finding the Fulton Gap

    by Amir Molavi · @amirmolavi

    An interactive marimo notebook that reveals the Fulton Gap: a striking valley in the size distribution of small exoplanets discovered by Kepler. The gap, near 1.8 Earth radii, hints at atmospheric stripping of close-in planets — visualized with public NASA Exoplanet Archive data.

    • exoplanets
    • kepler
    • astronomy
    • visualization
    • marimo
    Winner eligible
    AI judge score: 8.1/10

    Refined tie-break score: this notebook is a polished Marimo narrative around the Fulton Gap, with careful sample cuts, effective visualizations, and an interactive irradiation filter that demonstrates the expected radius-valley shift. It is elegant and accessible, but among the tied 8s it is least novel because it primarily recreates a known published result rather than surfacing a new actionable or original finding.

  • MBTA Reliability Oracle

    by Harry Joshi · @harryj12

    AI-powered transit reliability and infrastructure anomaly intelligence for the MBTA. This Marimo notebook builds a scalable reliability foundation from MBTA 2026 rapid transit travel-time data, including route-pair reliability metrics, anomaly timelines, and an infrastructure failure detector based on joint travel-time spikes and service-volume drops.

    • mbta
    • transit
    • reliability
    • anomaly-detection
    • data-science
    Winner eligible
    AI judge score: 7/10

    The notebook builds a practical MBTA reliability baseline from heavy-rail travel-time data, with route-pair variability metrics, unstable segment rankings, and anomaly-day detection. The operational framing is clear and the Polars-based workflow is credible for larger transit files. It falls short of the top tier because it depends on local CSV availability and the key insight is more of a reliability tooling foundation than one surprising, tightly defended finding.

  • Mutation Radar — Spike RBD triage from ProteinGym ACE2-binding DMS

    by Pierre · @buenogrande

    Marimo notebook for the May 13, 2026 Cursor Boston × PyData hack: loads the ProteinGym SARS-CoV-2 Spike RBD ACE2-binding deep mutational scanning assay, summarizes per-site tolerance, and surfaces flexible vs brittle positions with heatmaps and rankings. Public data only (Hugging Face URLs). Final submission.

    Winner eligible
    AI judge score: 7/10

    This final notebook has a clear early-triage story for Spike RBD mutations and turns ProteinGym DMS measurements into flexible-site, brittle-site, and high-binding mutation rankings. The narrative framing is strong and the heatmap/watchlist format is useful for quickly prioritizing variants. It is less rigorous than the top entries because it mostly summarizes single-assay fitness rather than validating whether the watchlist predicts real surveillance or immune-escape risk, but it is a solid and relevant analysis.

  • Boston Liquor License Explorer

    by Gavin Sadler · @gavinsadler

    Interactive analysis of ~1,500 active Boston liquor licenses from the city's open data portal. Includes a clustered marker map, a kernel density heatmap showing license concentration across neighborhoods, and a time-of-day heatmap with a slider to explore which establishments are still open at any given hour.

    • geospatial
    • folium
    • open-data
    • boston
    • heatmap
    Winner eligible
    AI judge score: 6/10

    The notebook provides a useful interactive geospatial explorer for Boston liquor licenses, with clustered markers, density heatmap, and time-of-day filtering. It is polished as a map interface and uses city open data effectively, but the insight is mostly descriptive concentration/exploration rather than a surprising or rigorously defended finding. The work is solid, especially visually, but less analytical than the top submissions.

  • Boston Economic Indicator

    by Johnny Wang · @johnnywang1998

    What indicates Boston's Economy?

    Winner eligible
    AI judge score: 6/10

    The notebook provides a polished narrative tour of Boston economic indicators, with useful attention to missingness/zero-filled legacy fields and multi-series context around jobs, mobility, hospitality, housing, and development. It is visually and rhetorically engaging, but the core insight is diffuse: it reads more like an exploratory dashboard than one sharply defended economic indicator or surprising finding. The analysis is solid enough to be useful, but not as focused or rigorous as the strongest entries.

  • Customer Segmentation Dashboard — UCI Online Retail

    by Vishal Prasanna · @vishalprasanna11

    An interactive marimo notebook that segments e-commerce customers using RFM (Recency, Frequency, Monetary) analysis on the UCI Online Retail dataset. Visualizes customer groups like Champions, Loyal, At Risk, and Lost — and surfaces AI-generated marketing recommendations for each segment.

    • retail
    • customer-segmentation
    • rfm
    • ecommerce
    • dashboard
    Winner eligible
    AI judge score: 6/10

    The notebook builds a useful RFM customer segmentation dashboard on UCI Online Retail, with cleaned transaction data, segment labels, and practical marketing playbooks. It is business-friendly and interactive, but the core RFM approach is standard and the AI recommendations are prewritten playbook copy rather than model-derived evidence. It is a solid applied dashboard rather than a surprising analytical insight.

  • MBTA x Weather

    by Paramjeet Singh · @paramjeet-singh-neu

    Shows how weather affects the MBTA and how each line behaves as a different system.

    Winner eligible
    AI judge score: 5/10

    The notebook has a coherent idea: compare MBTA route headway unpredictability and relate it to weather exposure, with a concrete Green Line vs Blue Line framing. The live CV calculation and chart make the finding approachable, but the analysis is fragile because it depends on a live API snapshot and a local weather CSV that may be absent, and the weather connection is more suggestive than rigorously supported. It is a reasonable exploratory submission but not a fully defended insight.

  • UCI Bank Marketing: more calls, more loans?

    by Jonathan Littel · @jonathanlittel

    Looks at marketing data by a Portuguese bank, and compares the number of marketing touches with the average loan size taken by customers.

    • exploratory
    • histogram
    • pandas
    Winner eligible
    AI judge score: 3/10

    The notebook loads the UCI Bank Marketing data, gives summary statistics, and explores whether more campaign contacts are associated with customer balance. It is honest about missing customer/loan timing limitations, but the analysis is narrow, mostly descriptive, and does not establish a strong or actionable insight. The result is a small exploratory check rather than a compelling hackathon finding.

  • Ed's One Big Insight

    by Ed Ruiz · @ed2uiz

    PyData x Cursor 2026 hackathon submission.

    • ai
    • marimo
    Winner eligible
    AI judge score: 1/10

    The PR has a metadata file but the submitted `submission.py` is empty, so there is no notebook, analysis, evidence, or insight to evaluate. This is therefore a broken/empty submission under the rubric.

After deadline showcase (14)

PR opened after 9:00 PM Eastern. Included on the page, but not eligible for winner selection.

  • The Hidden Axes of Primary Ciliary Dyskinesia in Public Molecular Data

    by Trevor Campbell · @trevordcampbell

    A Marimo public-data atlas showing how GTEx tissue expression and ProteinGym benchmark coverage reveal hidden PCD axes, then using ClinVar, PubMed, Open Targets, gnomAD, Human Protein Atlas, and cited clinical context to qualify where evidence is strong, sparse, robust, or only visible outside benchmarks.

    • marimo
    • biology
    • gtex
    • proteingym
    • clinvar
    • healthcare
    After deadline
    AI judge score: 9/10

    This is an unusually ambitious and well-framed public-data atlas: it combines GTEx tissue expression, ProteinGym coverage, ClinVar, PubMed, Open Targets, and other evidence layers to ask which PCD mechanisms are visible or underrepresented in public molecular data. The notebook is careful about provenance, limitations, and clinical boundaries while still producing a useful evidence-gap map. It is broad rather than a single narrow statistical punchline, but the scope, rigor, and storytelling make it one of the strongest submissions.

  • bank data

    by dave gogi · @pocketp1ck

    bank marketing

    After deadline
    AI judge score: 8/10

    The notebook extracts a sharp story from the UCI Bank Marketing data: conversion rises dramatically across the inferred 2008-2010 campaign timeline, and the analysis connects that lift to warmer targeting and prior-success signals while excluding post-call duration from predictive modeling. It is well structured, uses sanity-check assertions, and balances descriptive trend analysis with realistic modeling. The main caveat is that the inferred year reconstruction depends on row ordering, but the author is explicit about that assumption.

  • TERRA: Project Hail Mary–style exoplanet explorer (NASA archive + mission mode)

    by pjsk02 · @pjsk02

    Loads confirmed planets from the NASA Exoplanet Archive (with a synthetic fallback if TAP is unreachable), scores candidates with an Earth Similarity Index and optional distance-weighted viability, and lets you filter and explore top systems in interactive Plotly charts—plus a “Mission Mode” for nearer-star prioritization.

    • exoplanets
    • nasa
    • marimo
    • visualization
    • astronomy
    • interactive

    With: Gnana Shishir Kumar

    After deadline
    AI judge score: 8/10

    The notebook is a polished and imaginative exoplanet mission-planner interface with NASA Exoplanet Archive integration, Earth-similarity scoring, interactive controls, Plotly candidate ranking, and a clear mission-mode framing for nearer-star prioritization. The metadata update now gives the submission a complete package. It is still more of an exploratory decision-support tool than a rigorously validated scientific analysis, but the presentation, interactivity, and coherent scoring workflow make it one of the stronger submissions.

  • Second Earth Scout: NASA Exoplanet Habitability Analysis

    by Ankit Tej Yadav · @ankittejyadav

    An advanced exoplanet intelligence system using Random Forest classifiers and anomaly detection to identify 'Dark Horse Earths'—habitable worlds that defy standard planetary archetypes.

    • astronomy
    • machine-learning
    • exoplanets
    • habitability
    After deadline
    AI judge score: 7/10

    This replacement notebook improves the original Second Earth Scout concept with a clearer habitability-intelligence narrative, NASA Exoplanet Archive data, Earth-similarity and habitable-zone scoring, clustering, anomaly detection, and Random Forest ranking. The dark-horse framing is engaging and the analysis is more polished than the first version, but the model still relies on synthetic labels derived from handcrafted habitability scores, so the evidence is more exploratory than independently validated. Overall it is a strong interactive exploration with moderate methodological rigor.

  • Who Converts and When: A Campaign Timing Analysis of Bank Marketing Data

    by Mallika Gaikwad · @mallikagaikwad

    Analyzes 45,000 bank outreach records to uncover that retired and student customers contacted in March, September, or October convert at 5-10x the rate of the same segments in May. Includes call fatigue analysis and a data leakage warning around call duration — framed as three concrete pipeline design rules.

    • business
    • marketing
    • campaign-optimization
    After deadline
    AI judge score: 7/10

    The notebook presents a clear campaign timing insight from the UCI Bank Marketing data: conversion varies sharply by job segment and month, with retired and student customers performing especially well in March, September, and October. It also includes useful call-fatigue analysis and correctly warns about duration leakage for predictive pipelines. The analysis is focused and actionable, though it is mostly descriptive segmentation rather than a deeper causal or validated modeling study.

  • MBTA Transit Equity & Civic Improvement Dashboard

    by manisha002307735 · @manisha002307735

    A multilingual, AI-powered civic tool that exposes transit inequity in Boston using live MBTA data. Features ghost bus detection, wait time Pain Index, RAG-based policy fact-checking, a conversational transit agent (Ask the T), equity scorecards with Census demographics, and an auto-generated open letter to the MBTA Board. Supports 7 languages including Haitian Creole and Arabic. Built with Claude API, LangGraph, MCP, RAG, plotly, and marimo.

    • transit-equity
    • mbta
    • boston
    • civic-tech
    • llm
    • rag
    After deadline
    AI judge score: 7/10

    The notebook is ambitious and user-facing: it combines MBTA live data, equity scorecards, ghost-bus heuristics, multilingual copy, RAG-style policy snippets, and advocacy outputs into a civic dashboard. The scope and practical framing are strong, but some demographics/policy inputs are illustrative or hardcoded and the LLM features depend on optional API keys, so the evidence is less rigorous than the interface suggests. It is a compelling prototype with meaningful civic value, but not a fully validated transit equity analysis.

  • Wall of Shame — where protein fitness predictors collectively fail

    by Mohd Mujtaba Shoeb · @shuaeb6

    Joins the ProteinGym benchmark summary (217 deep mutational scanning assays scored by ~80 models) with per-assay metadata to find proteins every state-of-the-art model fails on. Stability assays are easiest (median best-model Spearman 0.72), activity is the frontier. Interactive py3Dmol viewer fetches AlphaFold structures by UniProt ID, colored by pLDDT — the hardest proteins overlap with regions AlphaFold itself isn't confident in.

    • proteingym
    • alphafold
    • protein-engineering
    • marimo
    • 3d-viewer
    After deadline
    AI judge score: 7/10

    The notebook asks a strong benchmark question: where do all ProteinGym models fail collectively, rather than which model wins on average. Joining model-level Spearman summaries to assay metadata and linking hard proteins to AlphaFold structure confidence is an interesting and useful failure-mode analysis. It is held back by a few unfinished interpretation notes and a relatively light final synthesis, but the core idea and implementation are solid.

  • From UCI Wall-Following to LeRobot-Shaped Episodes

    by Saad Ali · @saadai113

    Compares classic UCI wall-following robot navigation features with modern LeRobot-style episodic data, including static and lagged classifiers plus a schema peek at LeRobot data.

    • robotics
    • machine-learning
    • classification
    • lerobot

    With: Saad Ali

    After deadline
    AI judge score: 7/10

    This is a coherent Marimo notebook with a strong technical through-line: it connects a classic UCI robot wall-following dataset to modern LeRobot episode structure, compares logistic regression and MLP baselines, and shows how lagged observations approximate short-term memory. The framing is clear and the code is compact. It loses points because the PR packaging was messy, the original metadata was empty, and the analysis is mostly a compact model comparison/schema sketch rather than a deeper diagnostic study with visualizations or extensive interpretation.

  • Cartography of 24 Minds — TF-IDF, PCA, and clustering of historical-thinker personas

    by AaronGrace978 · @aarongrace978

    Marimo notebook for the May 13, 2026 Cursor Boston × PyData hack. Treats 24 historical-thinker persona prompts as a text corpus, then maps them with TF-IDF, cosine similarity, PCA, KMeans clusters, and distinctive-term tables. Self-contained: embedded corpus, no API keys, no internet at run-time.

    • nlp
    • tf-idf
    • pca
    • clustering
    • exploratory
    • humanities
    After deadline
    AI judge score: 6/10

    The notebook is a polished self-contained NLP exploration that maps 24 historical-thinker persona prompts with TF-IDF, cosine similarity, PCA, KMeans, and distinctive-term tables. It is creative and readable, but the corpus is hand-authored and small, so the analysis is more of a thoughtful visualization exercise than a strongly evidenced external-data insight. The presentation is strong, while the empirical stakes are modest.

  • Boston Crime & Safety Analysis — District Risk Scoring

    by Abdul Saif Mohammed · @saiff7

    Analyzes Boston crime incidents and Vision Zero crash data to compute a district-level safety score. Finds that districts B2, B3, and C11 are highest risk while A1 and A7 are safest, with an interactive gesture-controlled map.

    • boston
    • crime
    • safety
    • geospatial
    • folium
    After deadline
    AI judge score: 6/10

    The submission builds a clear Boston crime and safety analysis with district-level risk scoring, heatmaps, crash concern overlays, and a concrete finding that B2, B3, and C11 are highest risk while A1 and A7 are safer. The interactive map is useful and the civic-data framing is strong. It loses points because the notebook is an exported Jupyter script rather than a polished Marimo notebook, installs packages at runtime, relies on browser/camera gesture controls that are not central to the analysis, and uses a simple severity heuristic without much validation or uncertainty analysis.

  • Revenue Concentration Risk Analysis in E-Commerce Retail

    by Vimalesh · @vimaleshraja

    Analysis and churn prediction of the revenue

    After deadline
    AI judge score: 6/10

    The notebook gives a broad and practical Online Retail analysis: cleaning, time/geography/product/customer EDA, RFM segmentation, revenue concentration, and churn-style simulations. It is useful as a business dashboard and covers the dataset well, but the core finding is a fairly standard concentration/RFM story rather than a surprising single insight. The work is solid, but the narrative and rigor are more walkthrough-like than exceptional.

  • Probably Don't Drive your Bike at Night

    by Amen · @mahtaraatwit

    I used a bike share dataset to find out if people bike more at night than during the day.

    • bikes
    • morebikes
    • streetlights

    With: Pat Collaborator

    After deadline
    AI judge score: 5/10

    The notebook has an interesting descriptive geospatial question: whether nighttime Bluebikes trip ends are closer to Boston streetlights than daytime trip ends, with nearest-neighbor distances and map-style visualizations. The one-line takeaway is clear and it acknowledges confounders, but the submission depends on local data files not shown in the PR and originally had a syntax typo, so reproducibility and polish are weak. It is a reasonable exploratory analysis but not a fully robust insight.

  • Will make it on time? MBTA Transport Analysis

    by Kumari Simran · @simrankumari30

    I wanted to find the historical trends of MBTA Transports on time. Which modes are most reliable v/s which are the least reliable.

    • government
    • data
    • exploratory

    With: Kumari Simran

    After deadline
    AI judge score: 4/10

    The notebook explores MBTA GTFS and reliability data with route-type counts, bus shape visualizations, and bus reliability mapping. It has a relevant civic-data theme, but the analysis is mostly exploratory infrastructure plotting, includes a weak opening/metadata package, and does not clearly answer the stated on-time reliability question with a crisp finding. It is a useful start, but less polished and less insight-focused than stronger submissions.

  • MA Public Salary Data - Sanaz

    by Sanaz Agarwal · @sanaz01

    Publicly available data for salaries paid to Govt. employees in MA. A detailed trend analysis based on Zip Codes and Titles

    After deadline
    AI judge score: 4/10

    The notebook outlines a useful interactive dashboard for public salary/employee earnings data, including department, title, ZIP, and map-based breakdowns over 2011-2025. However, it depends on local data files that are not included in the PR, starts with placeholder cells, and the analysis is mostly descriptive distribution plotting rather than a sharp finding. The dashboard idea is valuable but the submitted package is only partially reproducible.