Evaluating and Analyzing the LACCTiC API and Algorithm
Disclaimer: This is the source code meant to accompany the main report; all detailed analysis is explained in the report. Some packages (pandas and matplotlib) are loaded twice: once inside a function and once outside.
When given the key for the API, the following instructions are also given:
```
Here is the main link to my API (the rest of this message is copy-pasted) https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/
The endpoints with "_page" roughly correspond to the pages on LACCTiC. You can look up "REST" frameworks to get a sense of how to navigate everything.
For a little example, https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/runner_page/ gives me a list of runners and their information. Because I have not specified a runner, I get a list of them. 75,000 would be a bit much all at once, so the results are "paginated." The attribute "next" in this class takes you to the link to the next page (which has a "?page=2" at the end of the URL).
Every runner, team, league, race etc has an "id" which is unique. To see a specific runner, you go to the same link as before but with the ID at the end of the URL. For example, https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/runner_page/12758/ gives the runner_page information for just Conner Mantz.
```
The data is stored in JSON format and there are three main data sources, each formatted with a different structure:
1. Runner pages - these pages show runner information and their results from each race they participated in
2. Race pages - these pages show results from a particular race
3. Team pages - these pages show roster information for any runner who has EVER raced for a particular team. (Team pages are ignored in this analysis because the team members needed for a race are already included in that race's dataset.)
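The pagination described in the API instructions can be sketched as a small generator that keeps following the "next" link. This is only a sketch under assumptions: the "next" attribute comes from the instructions above, but the "results" key holding each page's items is assumed (it is a common REST convention and is not confirmed here), and `iter_paginated`, `fetch_json`, and `max_pages` are hypothetical names introduced for illustration.

```python
import json
import urllib.request


def fetch_json(url):
    """Download one URL and parse its body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_paginated(first_url, fetch=fetch_json, max_pages=3):
    """Yield items page by page, following each page's "next" link.

    ASSUMPTION: every page is a dict with a "results" list. Only the
    "next" attribute is described in the API instructions.
    """
    url = first_url
    pages_read = 0
    while url and pages_read < max_pages:
        page = fetch(url)
        for item in page.get("results", []):
            yield item
        url = page.get("next")  # None (or missing) on the last page
        pages_read += 1
```

The `max_pages` cap keeps an exploratory run from requesting every page of a 75,000-runner list, and passing a different `fetch` function makes the loop easy to try on stand-in data before calling the real API.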
Creating Functions to read each source type
Since reading in each source type may become repetitive, functions are created so that each runner and race can be loaded into a pandas dataframe by simply inputting the runner or race ID. Functions are used throughout the notebook so that the analysis can be replicated for a different race or performance in the future.
def LaccticRunnerToDF(runner_code): #creates a function which uses the LACCTiC runner code
    import urllib.request #for reading web data
    import json #for reading and converting JSON data
    import pandas as pd #for creating the pandas dataframe
    import io #for wrapping a JSON string in a file-like object
    #takes the runner_code and inputs it into the url string and saves the link as runner_url
    runner_url = "https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/runner_page/"+str(runner_code)+"/?format=json"
    response = urllib.request.urlopen(runner_url) #opens the request and saves it as response
    json_string = response.read().decode('utf-8') #decodes the response bytes into a JSON string
    eq_parsed_json = json.loads(json_string) #parses the JSON into a dictionary
    runnerName = str(eq_parsed_json['firstname'] + ' ' + eq_parsed_json['lastname']) #saves the name of the runner
    seasonList = eq_parsed_json['season_ratings'] #creates a list of the season ratings from the dictionary
    races_df = pd.DataFrame() #creates an empty data frame
    for season in seasonList: #iterates through the list of season ratings, goes through season by season
        races = season['season_xc_performances'] #selects the season xc performances
        s = json.dumps(races) #serializes the season xc performances back into a JSON string
        season_df=pd.read_json(io.StringIO(s)) #loads the races from one season into a dataframe
        races_df = pd.concat([races_df, season_df], ignore_index=True) #appends that dataframe to the dataframe of all seasons
    #The race column in a dataframe is further nested and needs to be analyzed
    for index, row in races_df.iterrows():
        races_df.loc[index, 'date'] = row['race']['date'] #creates a date column from the race information
        races_df.loc[index, 'race_id'] = row['race']['id'] #creates a race ID column from the race information
        races_df.loc[index, 'race'] = row['race']['meet_name'] #creates a race name column from the race information
        races_df.loc[index, 'runner_name'] = runnerName #adds a runner name column, all the same, but can be useful for JOIN later
    #races_df.drop('significant', axis=1, inplace=True) #drops the dummy variable 1 if race has any significance
    print("Returning a Panda's DF for: " + runnerName) #prints to the console what the function is returning
    return(races_df) #returns a complete data frame of all races a runner has run.
def LaccticRaceToDF(race_code): #creates a function which uses the LACCTiC race code
    import urllib.request #for reading web data
    import json #for reading and converting JSON data
    import pandas as pd #for creating the pandas dataframe
    import io #for wrapping a JSON string in a file-like object
    #takes the race_code and inputs it into the url string and saves the link as race_url
    race_url = "https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/race_page/"+str(race_code)+"/?format=json"
    response = urllib.request.urlopen(race_url) #opens the request and saves it as response
    json_string = response.read().decode('utf-8') #decodes the response bytes into a JSON string
    eq_parsed_json = json.loads(json_string) #parses the JSON into a dictionary
    results = eq_parsed_json['xc_results'] #selects the results from the xc race
    s = json.dumps(results) #serializes the results back into a JSON string
    race_df = pd.read_json(io.StringIO(s)) #converts the results of the race to a pandas dataframe
    for index, row in race_df.iterrows(): #the runner column needs to be further parsed to get data
        race_df.loc[index, 'runner_id'] = row['runner']['id'] #gets the runner_id from the runner section
        race_df.loc[index, 'team'] = row['runner']['team']['name'] #gets the team name from the runner section
        race_df.loc[index, 'runner_name'] = row['runner']['firstname'] + " " + row['runner']['lastname'] #gets the runner name from the runner section
    race_df.drop('runner', axis=1, inplace=True) #drops the runner column after it is iterated over
    raceName = str(eq_parsed_json['meet_name']) #saves the name of the race
    print("Returning a Pandas DF of race results for: " + raceName) #prints to the console what the function is returning
    return(race_df) #returns a complete data frame of all results from the race.
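The row-by-row loops above unpack the nested "runner" field one result at a time. As a possible alternative, `pd.json_normalize` can flatten nested records in a single call. The sketch below is not the notebook's method; it runs on two invented records that merely mimic the shape of the `xc_results` entries (the names, times, and teams are hypothetical).

```python
import pandas as pd

# Hypothetical records shaped like the 'xc_results' entries (values invented).
sample_results = [
    {"time": 1700.0, "place": 1,
     "runner": {"id": 1, "firstname": "Ada", "lastname": "Runner",
                "team": {"name": "Team X"}}},
    {"time": 1710.5, "place": 2,
     "runner": {"id": 2, "firstname": "Ben", "lastname": "Racer",
                "team": {"name": "Team Y"}}},
]

# Nested keys become dotted column names such as 'runner.team.name'.
flat = pd.json_normalize(sample_results)
flat = flat.rename(columns={"runner.id": "runner_id",
                            "runner.team.name": "team"})
flat["runner_name"] = flat["runner.firstname"] + " " + flat["runner.lastname"]
flat = flat.drop(columns=["runner.firstname", "runner.lastname"])
print(flat)
```

One side effect worth noting is that `runner_id` stays an integer here, whereas assigning it cell by cell with `.loc` produces floats such as 53652.0.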
Exploratory Research Questions
Using these functions to quickly load race and runner data, three exploratory research questions will be answered. These questions will be answered based off of the 2023 NCAA Cross Country Championship, for reasons outlined in the report.
**Research Question 1** (Race Source Based): Which teams performed the best and worst relative to their predicted place from the LACCTiC algorithm?
**Research Question 2** (Runner Source Based): After reading an individual's source page, can we examine whether they have had significant improvement from year to year?
**Research Question 3** (*Joining* Runner and Race Sources): Which individuals at the national championship were the most consistent racers, and which individuals had extreme (fast or slow) performance TiCs?
Research Question 1: Which teams performed the best and worst based on their predicted place by the LACCTiC Algorithm
First the function for races can be called to load the race data into a dataframe.
NCAA2023Champs = LaccticRaceToDF(8187)
NCAA2023Champs
Returning a Pandas DF of race results for: NCAA Division I Cross Country Championships
| | time | modern_tic | predicted_place | place | runner_id | team | runner_name |
|---|---|---|---|---|---|---|---|
| 0 | 1717.7001 | 6.677224 | 9 | 1 | 53652.0 | Harvard | Graham Blanks |
| 1 | 1720.7002 | 6.678969 | 8 | 2 | 89976.0 | New Mexico | Habtom Samuel |
| 2 | 1735.7003 | 6.687649 | 5 | 3 | 38094.0 | Stanford | Ky Robinson |
| 3 | 1739.7004 | 6.689951 | 2 | 4 | 90518.0 | Oklahoma State | Denis Kipngetich |
| 4 | 1743.8005 | 6.692305 | 12 | 5 | 24322.0 | Northern Arizona | Drew Bosley |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 231 | 1982.7232 | 6.820709 | 171 | 232 | 61608.0 | Virginia | Justin Wachtel |
| 232 | 1982.8233 | 6.820760 | 199 | 233 | 12417.0 | Eastern Kentucky | Keeton Thornsberry |
| 233 | 1983.9234 | 6.821314 | 175 | 234 | 90187.0 | Villanova | Xian Shively |
| 234 | 1986.9235 | 6.822825 | 177 | 235 | 38113.0 | Butler | Jack McMahon |
| 235 | 2056.9236 | 6.857449 | 234 | 236 | 84563.0 | North Carolina | Luke Wiley |
236 rows × 7 columns
A performance_score column will be added to the data frame, where a positive score indicates a better-than-predicted performance. (For example, Habtom Samuel's performance_score of 6 means he finished 6 places better than predicted.) A negative performance_score means a runner finished worse than predicted.
NCAA2023Champs['performance_score'] = NCAA2023Champs['predicted_place'] - NCAA2023Champs['place']
NCAA2023Champs
| | time | modern_tic | predicted_place | place | runner_id | team | runner_name | performance_score |
|---|---|---|---|---|---|---|---|---|
| 0 | 1717.7001 | 6.677224 | 9 | 1 | 53652.0 | Harvard | Graham Blanks | 8 |
| 1 | 1720.7002 | 6.678969 | 8 | 2 | 89976.0 | New Mexico | Habtom Samuel | 6 |
| 2 | 1735.7003 | 6.687649 | 5 | 3 | 38094.0 | Stanford | Ky Robinson | 2 |
| 3 | 1739.7004 | 6.689951 | 2 | 4 | 90518.0 | Oklahoma State | Denis Kipngetich | -2 |
| 4 | 1743.8005 | 6.692305 | 12 | 5 | 24322.0 | Northern Arizona | Drew Bosley | 7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 231 | 1982.7232 | 6.820709 | 171 | 232 | 61608.0 | Virginia | Justin Wachtel | -61 |
| 232 | 1982.8233 | 6.820760 | 199 | 233 | 12417.0 | Eastern Kentucky | Keeton Thornsberry | -34 |
| 233 | 1983.9234 | 6.821314 | 175 | 234 | 90187.0 | Villanova | Xian Shively | -59 |
| 234 | 1986.9235 | 6.822825 | 177 | 235 | 38113.0 | Butler | Jack McMahon | -58 |
| 235 | 2056.9236 | 6.857449 | 234 | 236 | 84563.0 | North Carolina | Luke Wiley | -2 |
236 rows × 8 columns
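As a quick check of the sign convention, the subtraction can be reproduced by hand for two rows of the table above. The predicted and actual places are copied from that table; the small `check` data frame itself is just an illustrative stand-in.

```python
import pandas as pd

# Two rows copied from the race table above.
check = pd.DataFrame({
    "runner_name": ["Habtom Samuel", "Denis Kipngetich"],
    "predicted_place": [8, 2],
    "place": [2, 4],
})

# Positive = finished ahead of the prediction, negative = finished behind it.
check["performance_score"] = check["predicted_place"] - check["place"]
print(check)
```

Samuel was predicted 8th and finished 2nd (8 - 2 = 6), while Kipngetich was predicted 2nd and finished 4th (2 - 4 = -2), matching the performance_score column.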
A trace plot of performance_score by finishing place (drawn here as a bar chart) gives a sense of how well the predictions held up across the field; the shape of the plot is analyzed further in the report. A function is created to make the trace plot so that the same plot can be produced for other races if needed.
def raceDFtoTracePlot(raceDF): #creates a function which requires a race dataframe
    import matplotlib.pyplot as plt
    plt.bar(raceDF['place'], raceDF['performance_score'], color='black') #plotting the trace plot
    #Adding Titles:
    plt.title('Performance Score by Place')
    plt.xlabel('Place')
    plt.ylabel('Performance Score')
raceDFtoTracePlot(NCAA2023Champs)
A pandas data frame can be "grouped by" or pivoted so that teams become the focus of this race data frame. The minimum, mean, median, and maximum performance score for each team will be described by this pivot. The output will be saved as a csv and further analyzed in the report.
def raceGroupByTeamPS(raceDF):
    import pandas as pd
    groupedDF = pd.DataFrame() #declares an empty dataframe
    betaDF = raceDF.groupby(['team']).min(numeric_only=True) #grouping by the min
    groupedDF['Min'] = betaDF['performance_score']
    betaDF = raceDF.groupby(['team']).median(numeric_only=True) #grouping by the median
    groupedDF['Median'] = betaDF['performance_score']
    betaDF = raceDF.groupby(['team']).mean(numeric_only=True) #grouping by the mean
    groupedDF['Mean'] = betaDF['performance_score']
    betaDF = raceDF.groupby(['team']).max(numeric_only=True) #grouping by the max
    groupedDF['Max'] = betaDF['performance_score']
    return groupedDF
NCAA2023Groupteams = raceGroupByTeamPS(NCAA2023Champs)
NCAA2023Groupteams
| team | Min | Median | Mean | Max |
|---|---|---|---|---|
| Air Force | -2 | 25.0 | 25.333333 | 60 |
| Alabama | -87 | -53.0 | -53.000000 | -19 |
| Arkansas | -170 | -7.0 | -20.714286 | 58 |
| BYU | -35 | 2.0 | 4.166667 | 48 |
| Boise State | 32 | 32.0 | 32.000000 | 32 |
| ... | ... | ... | ... | ... |
| Virginia | -151 | -37.0 | -32.857143 | 38 |
| Wake Forest | -94 | 40.0 | 19.714286 | 51 |
| Weber State | -14 | -14.0 | -14.000000 | -14 |
| Wingate | 87 | 87.0 | 87.000000 | 87 |
| Wisconsin | -114 | -0.5 | -5.666667 | 119 |
61 rows × 4 columns
NCAA2023Groupteams.to_csv('ResearchQuestion1.csv', index=True)
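The four separate group-by passes can also be written as a single named aggregation. This is a sketch of an equivalent form rather than the function used above, and it runs on a small invented data frame (the teams "A" and "B" and their scores are hypothetical).

```python
import pandas as pd

# Invented scores for two hypothetical teams.
toy = pd.DataFrame({
    "team": ["A", "A", "B"],
    "performance_score": [-2, 10, 5],
})

# One pass over the groups; each keyword names an output column.
summary = toy.groupby("team")["performance_score"].agg(
    Min="min", Median="median", Mean="mean", Max="max"
)
print(summary)
```

Applied to the real data, the same call on `NCAA2023Champs` should give the same four columns as `raceGroupByTeamPS`, since both summarize `performance_score` per team.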
Research Question 2: After reading an individual's source page, can we examine if they have had significant improvement from year to year
With most runners spending three to five years in the NCAA system, the goal is to get faster and perform better over the course of a college career. The functions for this research question allow the analysis to be replicated easily for any athlete. The results for the top 3 athletes in the 2023 NCAA cross country meet will be saved as output to be analyzed in the report.
def RunnerDFAnalyzeByYear(runner_df): #defining the function, taking in the "runner_df" variable
    import pandas as pd
    runner_df['date'] = pd.to_datetime(runner_df['date'], errors='coerce') #make sure the date column is date type
    runner_df['date'] = runner_df['date'].dt.year #converts the date to year
    runner_df = runner_df.groupby(['date']).min() #groups by year, keeping the minimum of each column
    #eliminating unnecessary columns:
    runner_df.drop('time', axis=1, inplace=True)
    runner_df.drop('race_weight_sig', axis=1, inplace=True)
    runner_df.drop('significant', axis=1, inplace=True)
    return runner_df
This function can be nested with the original LaccticRunnerToDF() function to quickly read, group by year, and save the results for the top 3 finishers:
ResearchQuestion2ABlanks = RunnerDFAnalyzeByYear(LaccticRunnerToDF(53652))
ResearchQuestion2ABlanks.to_csv('ResearchQuestion2ABlanks.csv', index=True)
ResearchQuestion2ASamuel = RunnerDFAnalyzeByYear(LaccticRunnerToDF(89976))
ResearchQuestion2ASamuel.to_csv('ResearchQuestion2ASamuel.csv', index=True)
ResearchQuestion2ARobinson = RunnerDFAnalyzeByYear(LaccticRunnerToDF(38094))
ResearchQuestion2ARobinson.to_csv('ResearchQuestion2ARobinson.csv', index=True)
Returning a Panda's DF for: Graham Blanks
Returning a Panda's DF for: Habtom Samuel
Returning a Panda's DF for: Ky Robinson
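Once the best value per year is available, the year-to-year change can be read off with `diff()`. The sketch below uses invented dates and scores, and it assumes that a lower modern_tic indicates a faster performance, which the race table suggests (the winner has the lowest value) but which is not stated explicitly here.

```python
import pandas as pd

# Invented results for one hypothetical runner across two seasons.
toy = pd.DataFrame({
    "date": ["2022-10-14", "2022-11-19", "2023-10-13", "2023-11-18"],
    "modern_tic": [6.72, 6.70, 6.71, 6.68],
})

toy["year"] = pd.to_datetime(toy["date"]).dt.year
best_by_year = toy.groupby("year")["modern_tic"].min()  # best (lowest) score each year
change = best_by_year.diff()  # negative = improved, under the lower-is-faster assumption
print(change)
```

The first year has no earlier season to compare against, so its change is NaN.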
Research Question 3: Which individuals at the national championship were the most consistent racers, and which individuals had extreme (fast or slow) performance TiCs
This final research question is the most complex, as it requires joining the two different types of source documents. This is why runner_id is included as a column in the race data frame.
NCAA2023ChampsRunnerID = NCAA2023Champs['runner_id'] #saving only the runner_ids column
Using the LaccticRunnerToDF() function and a for loop, the entire column can be iterated over and every race each runner has run can be combined into one dataframe.
import pandas as pd
AllRacesForNCAA2023Champs = pd.DataFrame() #creating an empty dataframe
for runner in NCAA2023ChampsRunnerID: #iterating through the column
    iteratedRunner = pd.DataFrame() #creating a second empty dataframe, which gets cleared before every iteration starts
    iteratedRunner = LaccticRunnerToDF(int(runner)) #calls the LaccticRunnerToDF function and makes a dataframe
    AllRacesForNCAA2023Champs = pd.concat([AllRacesForNCAA2023Champs, iteratedRunner], ignore_index=True) #combines the two dataframes
AllRacesForNCAA2023Champs
Returning a Panda's DF for: Graham Blanks Returning a Panda's DF for: Habtom Samuel Returning a Panda's DF for: Ky Robinson Returning a Panda's DF for: Denis Kipngetich Returning a Panda's DF for: Drew Bosley Returning a Panda's DF for: Nico Young Returning a Panda's DF for: Patrick Kiprop Returning a Panda's DF for: Brian Musau Returning a Panda's DF for: Parker Wolfe Returning a Panda's DF for: Fouad Messaoudi Returning a Panda's DF for: Devin Hart Returning a Panda's DF for: Victor Shitsama Returning a Panda's DF for: Kirami Yego Returning a Panda's DF for: Liam Murphy Returning a Panda's DF for: Alex Maier Returning a Panda's DF for: Sanele Masondo Returning a Panda's DF for: Alex Phillip Returning a Panda's DF for: Aaron Las-Heras Returning a Panda's DF for: Perry Mackinnon Returning a Panda's DF for: Victor Kiprop Returning a Panda's DF for: Santiago Prosser Returning a Panda's DF for: Jason Bowers Returning a Panda's DF for: Rodger Rivera Returning a Panda's DF for: Dylan Schubert Returning a Panda's DF for: Brodey Hasty Returning a Panda's DF for: Matthew Richtman Returning a Panda's DF for: Tom Brady Returning a Panda's DF for: Luke Combs Returning a Panda's DF for: Nicholas Bendtsen Returning a Panda's DF for: Nickolas Scudder Returning a Panda's DF for: Chris Devaney Returning a Panda's DF for: James Corrigan Returning a Panda's DF for: Evans Kiplagat Returning a Panda's DF for: Sam Lawler Returning a Panda's DF for: Kenneth Rooks Returning a Panda's DF for: Timothy Chesondin Returning a Panda's DF for: Rodgers Kiplimo Returning a Panda's DF for: Haftu Strintzos Returning a Panda's DF for: Austin Vancil Returning a Panda's DF for: David Mullarkey Returning a Panda's DF for: Jackson Sharp Returning a Panda's DF for: Ben Shearer Returning a Panda's DF for: Gable Sieperda Returning a Panda's DF for: Victor Kibiego Returning a Panda's DF for: Ethan Strand Returning a Panda's DF for: Creed Thompson Returning a Panda's DF for: Ben Rosa Returning a Panda's DF 
for: Ethan Coleman Returning a Panda's DF for: Robert DiDonato Returning a Panda's DF for: Adisu Guadia Returning a Panda's DF for: Brett Gardner Returning a Panda's DF for: Arturs Medveds Returning a Panda's DF for: Ben Perrin Returning a Panda's DF for: Jack Jennings Returning a Panda's DF for: Said Mechaal Returning a Panda's DF for: Joey Nokes Returning a Panda's DF for: Nicholas Russell Returning a Panda's DF for: Titus Cheruiyot Returning a Panda's DF for: Connor Nisbet Returning a Panda's DF for: Joe Hudson Returning a Panda's DF for: Will Anthony Returning a Panda's DF for: Corey Gorgas Returning a Panda's DF for: Lucas Bons Returning a Panda's DF for: Sean Maison Returning a Panda's DF for: Florian LePallec Returning a Panda's DF for: Cruz Gomez Returning a Panda's DF for: Eli Bennett Returning a Panda's DF for: Vincent Mauri Returning a Panda's DF for: Max Murphy Returning a Panda's DF for: Adam Spencer Returning a Panda's DF for: Hannes Burger Returning a Panda's DF for: Owen Smith Returning a Panda's DF for: Yasin Sado Returning a Panda's DF for: Lex Young Returning a Panda's DF for: Josh Truchon Returning a Panda's DF for: Matt Strangio Returning a Panda's DF for: Chandler Gibbens Returning a Panda's DF for: Jake Gebhardt Returning a Panda's DF for: Haftu Knight Returning a Panda's DF for: Matthew Forrester Returning a Panda's DF for: Joseph O'Brien Returning a Panda's DF for: Bradley Makuvire Returning a Panda's DF for: Bob Liking Returning a Panda's DF for: Theo Quax Returning a Panda's DF for: Will Muirhead Returning a Panda's DF for: Valentin Soca Returning a Panda's DF for: Myles Richter Returning a Panda's DF for: Assaf Harari Returning a Panda's DF for: Jesse Hamlin Returning a Panda's DF for: Jacob Lewis Returning a Panda's DF for: Joshua DeSouza Returning a Panda's DF for: Tyler Berg Returning a Panda's DF for: Isaac Alonzo Returning a Panda's DF for: Daniel O'Brien Returning a Panda's DF for: Cole Sprout Returning a Panda's DF for: Giedrius 
Valincius Returning a Panda's DF for: Acer Iverson Returning a Panda's DF for: Hillary Cheruiyot Returning a Panda's DF for: Michael Morgan Returning a Panda's DF for: Jacob McLeod Returning a Panda's DF for: Anthony Monte Returning a Panda's DF for: Toby Gualter Returning a Panda's DF for: Lachlan Wellington Returning a Panda's DF for: Emmanuel Sgouros Returning a Panda's DF for: Murphy Smith Returning a Panda's DF for: Abdel Laadjel Returning a Panda's DF for: Alex Comerford Returning a Panda's DF for: Davis Bove Returning a Panda's DF for: Abdelhakim Abouzouhir Returning a Panda's DF for: Nathan Lawler Returning a Panda's DF for: Wil Smith Returning a Panda's DF for: Daniel McGoey Returning a Panda's DF for: Nicholas Kiprotich Returning a Panda's DF for: Elliott Cook Returning a Panda's DF for: Taha Er-Raouy Returning a Panda's DF for: Owen MacKenzie Returning a Panda's DF for: Daniel Abdala Returning a Panda's DF for: Wes Porter Returning a Panda's DF for: Robert Cozean Returning a Panda's DF for: Bryce Cerkowniak Returning a Panda's DF for: Timothy Sindt Returning a Panda's DF for: Jason Renze Returning a Panda's DF for: Benjamin Godish Returning a Panda's DF for: Jona Bodirsky Returning a Panda's DF for: Charlie Sprott Returning a Panda's DF for: Gabriel Sanchez Returning a Panda's DF for: Parker Stokes Returning a Panda's DF for: Taonga Mbambo Returning a Panda's DF for: Evan Burke Returning a Panda's DF for: Eric Casarez Returning a Panda's DF for: Birhanu Harriman Returning a Panda's DF for: Matthew Farrell Returning a Panda's DF for: Micah Wilson Returning a Panda's DF for: Cooper Schroeder Returning a Panda's DF for: Nick Foster Returning a Panda's DF for: Andrew Nolan Returning a Panda's DF for: Luke Tewalt Returning a Panda's DF for: Levi Taylor Returning a Panda's DF for: Shane Brosnan Returning a Panda's DF for: Matias Reynaga Returning a Panda's DF for: Nathan Lopez Returning a Panda's DF for: Noah Hibbard Returning a Panda's DF for: Gavin Ehlers 
Returning a Panda's DF for: Jonas Gertsen Returning a Panda's DF for: Joaquin Campos Returning a Panda's DF for: Marco Langon Returning a Panda's DF for: Gitch Hayes Returning a Panda's DF for: Lukas Kiprop Returning a Panda's DF for: Michael Maiorano Returning a Panda's DF for: Peter Visser Returning a Panda's DF for: Dean Casey Returning a Panda's DF for: Paul Stafford Returning a Panda's DF for: Tyler Wirth Returning a Panda's DF for: Brandon Olden Returning a Panda's DF for: Quinn Gallagher Returning a Panda's DF for: Eli Nahom Returning a Panda's DF for: Jack Roberts Returning a Panda's DF for: Caleb Jarema Returning a Panda's DF for: Abel Teffra Returning a Panda's DF for: MacCallum Rowe Returning a Panda's DF for: William Zegarski Returning a Panda's DF for: Baidy Ba Returning a Panda's DF for: Matthew Neill Returning a Panda's DF for: Isaiah Givens Returning a Panda's DF for: Rob McManus Returning a Panda's DF for: Jayden Nats Returning a Panda's DF for: Nikodem Dworczak Returning a Panda's DF for: Connor Livingston Returning a Panda's DF for: Victor Neiva Returning a Panda's DF for: Carter Solomon Returning a Panda's DF for: Hunter Jones Returning a Panda's DF for: Nathan Mountain Returning a Panda's DF for: Luke Henseler Returning a Panda's DF for: Thomas Termote Returning a Panda's DF for: Jarrett Kirk Returning a Panda's DF for: Ezekiel Rop Returning a Panda's DF for: Sean Kay Returning a Panda's DF for: James Overberg Returning a Panda's DF for: Will Allen Returning a Panda's DF for: Luke Venhuizen Returning a Panda's DF for: Isaac Hedengren Returning a Panda's DF for: Henry Myers Returning a Panda's DF for: Sam Ells Returning a Panda's DF for: Aidan Ross Returning a Panda's DF for: Leo Young Returning a Panda's DF for: Teddy Buckley Returning a Panda's DF for: Yaseen Abdalla Returning a Panda's DF for: Rowen Ellenberg Returning a Panda's DF for: Damien Dilcher Returning a Panda's DF for: Ian Harrison Returning a Panda's DF for: Logan Law Returning a 
Panda's DF for: Zach Hughes Returning a Panda's DF for: David Slapak Returning a Panda's DF for: Nolan Hosbein Returning a Panda's DF for: Zach Stewart Returning a Panda's DF for: Sam Burgess Returning a Panda's DF for: Ryan Kredell Returning a Panda's DF for: Brian Kiptoo Returning a Panda's DF for: Lucas Guerra Returning a Panda's DF for: Rynard Swanepoel Returning a Panda's DF for: Jacob Hunter Returning a Panda's DF for: Jake Derouin Returning a Panda's DF for: Abdirizak Ibrahim Returning a Panda's DF for: Abraham Avila-Martinez Returning a Panda's DF for: Will Minnette Returning a Panda's DF for: Nick Soldevere Returning a Panda's DF for: Matthew Larkin Returning a Panda's DF for: Hudson Heikkinen Returning a Panda's DF for: Joe Ewing Returning a Panda's DF for: Samuel Field Returning a Panda's DF for: Pedro Marin Returning a Panda's DF for: Zachary Cloud Returning a Panda's DF for: Cooper Laird Returning a Panda's DF for: Paul Talens Returning a Panda's DF for: Jacob Nenow Returning a Panda's DF for: Ahmed Ibrahim Returning a Panda's DF for: Charlie North Returning a Panda's DF for: CJ Singleton Returning a Panda's DF for: Gary Martin Returning a Panda's DF for: Devon Comber Returning a Panda's DF for: Luke Ondracek Returning a Panda's DF for: Jonathan DeSouza Returning a Panda's DF for: Enock Kipchumba Returning a Panda's DF for: Thomas Chaston Returning a Panda's DF for: Silas Winders Returning a Panda's DF for: Zane Bergen Returning a Panda's DF for: Jonathan Carmin Returning a Panda's DF for: Silas Derfel Returning a Panda's DF for: Colton Sands Returning a Panda's DF for: Caleb Niednagel Returning a Panda's DF for: Harvey Cramb Returning a Panda's DF for: Justin Wachtel Returning a Panda's DF for: Keeton Thornsberry Returning a Panda's DF for: Xian Shively Returning a Panda's DF for: Jack McMahon Returning a Panda's DF for: Luke Wiley
| | time | modern_tic | race_weight_sig | significant | race | date | race_id | runner_name |
|---|---|---|---|---|---|---|---|---|
| 0 | 1717.2001 | 6.687363 | 0.565449 | 1 | NCAA DI Championships | 2024-11-23 | 10175.0 | Graham Blanks |
| 1 | 1774.7001 | 6.719916 | 0.000000 | 0 | NCAA Division I Northeast Region Cross Country... | 2024-11-15 | 10112.0 | Graham Blanks |
| 2 | 1334.6001 | 6.693729 | 0.000000 | 0 | Ivy League Heptagonal Cross Country Championships | 2024-11-02 | 9900.0 | Graham Blanks |
| 3 | 1360.5002 | 6.692466 | 0.434551 | 1 | Wisconsin Pre-Nationals | 2024-10-19 | 9563.0 | Graham Blanks |
| 4 | 1717.7001 | 6.677224 | 0.402916 | 1 | NCAA Division I Cross Country Championships | 2023-11-18 | 8187.0 | Graham Blanks |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3731 | 1916.8133 | 6.816961 | 0.000000 | 0 | NCAA Division I Southeast Region Cross Country... | 2023-11-10 | 8125.0 | Luke Wiley |
| 3732 | 1497.2091 | 6.792148 | 0.000000 | 0 | ACC Cross Country Championships | 2023-10-27 | 7822.0 | Luke Wiley |
| 3733 | 1511.9222 | 6.780028 | 0.527606 | 1 | Nuttycombe Invite | 2023-10-13 | 7593.0 | Luke Wiley |
| 3734 | 1487.9146 | 6.775412 | 0.472394 | 1 | Virginia Invitational | 2023-09-23 | 7154.0 | Luke Wiley |
| 3735 | 906.8015 | 6.783454 | 0.000000 | 0 | Charlotte Opener | 2023-09-01 | 6664.0 | Luke Wiley |
3736 rows × 8 columns
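Because the loop above makes one web request per runner, a single failed request would stop the whole run. One possible variation, sketched below, collects the per-runner frames in a list, records any runner whose page could not be loaded, and concatenates once at the end. `collect_runner_frames` is a hypothetical helper and is not part of the notebook; the `loader` argument is meant to be a function such as `LaccticRunnerToDF`.

```python
import pandas as pd


def collect_runner_frames(runner_ids, loader):
    """Load one data frame per runner and combine them.

    Hypothetical helper: `loader` is any function that maps a runner ID
    to a data frame. Runners whose load raises are recorded, not fatal.
    """
    frames = []
    failed = []
    for runner_id in runner_ids:
        try:
            frames.append(loader(int(runner_id)))
        except Exception as error:  # broad on purpose: keep going on any failure
            failed.append((runner_id, error))
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed
```

A call would look like `AllRaces, failed = collect_runner_frames(NCAA2023ChampsRunnerID, LaccticRunnerToDF)`, after which `failed` can be inspected to see whether any runner needs to be retried.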
Next the dataframe will be filtered so that only races that took place in the 2023 season, the target sample, will be used.
print(AllRacesForNCAA2023Champs['date'].dtype) #check the type of the date column
AllRacesForNCAA2023Champs['date'] = pd.to_datetime(AllRacesForNCAA2023Champs['date'], errors='coerce') #change the type to date
print(AllRacesForNCAA2023Champs['date'].dtype) #re-check the type of the date column
RacesForNCAA2023ChampsIn2023 = AllRacesForNCAA2023Champs[AllRacesForNCAA2023Champs['date'].dt.year == 2023] #only select 2023 races
RacesForNCAA2023ChampsIn2023
object
datetime64[ns]
| | time | modern_tic | race_weight_sig | significant | race | date | race_id | runner_name |
|---|---|---|---|---|---|---|---|---|
| 4 | 1717.7001 | 6.677224 | 0.402916 | 1 | NCAA Division I Cross Country Championships | 2023-11-18 | 8187.0 | Graham Blanks |
| 5 | 1771.6001 | 6.734465 | 0.000000 | 0 | NCAA Division I Northeast Region Cross Country... | 2023-11-10 | 8113.0 | Graham Blanks |
| 6 | 1427.4001 | 6.716947 | 0.000000 | 0 | Ivy League Heptagonal Cross Country Championships | 2023-10-28 | 7901.0 | Graham Blanks |
| 7 | 1403.4001 | 6.705545 | 0.330531 | 1 | Nuttycombe Invite | 2023-10-13 | 7593.0 | Graham Blanks |
| 8 | 1406.0001 | 6.722057 | 0.000000 | 0 | Battle in Beantown | 2023-09-29 | 7260.0 | Graham Blanks |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3731 | 1916.8133 | 6.816961 | 0.000000 | 0 | NCAA Division I Southeast Region Cross Country... | 2023-11-10 | 8125.0 | Luke Wiley |
| 3732 | 1497.2091 | 6.792148 | 0.000000 | 0 | ACC Cross Country Championships | 2023-10-27 | 7822.0 | Luke Wiley |
| 3733 | 1511.9222 | 6.780028 | 0.527606 | 1 | Nuttycombe Invite | 2023-10-13 | 7593.0 | Luke Wiley |
| 3734 | 1487.9146 | 6.775412 | 0.472394 | 1 | Virginia Invitational | 2023-09-23 | 7154.0 | Luke Wiley |
| 3735 | 906.8015 | 6.783454 | 0.000000 | 0 | Charlotte Opener | 2023-09-01 | 6664.0 | Luke Wiley |
1234 rows × 8 columns
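One detail of the filtering step is that `errors='coerce'` turns any value that cannot be parsed as a date into `NaT`, and such rows then drop out of the year filter silently. The following sketch shows that behavior on three invented values.

```python
import pandas as pd

# Two valid dates and one invalid value (all invented for illustration).
dates = pd.Series(["2023-11-18", "2024-11-23", "not a date"])

parsed = pd.to_datetime(dates, errors="coerce")  # the invalid value becomes NaT
in_2023 = parsed.dt.year == 2023                 # NaT never equals 2023
print(parsed.isna().sum(), "unparseable value(s)")
print(in_2023.tolist())
```

Checking `parsed.isna().sum()` before filtering is a simple way to confirm that no races were lost to unparseable dates.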
There are now 1234 race results to join with the dataframe from the 2023 national championship. Before joining by runner name, it should be verified that all runners have unique names.
NCAA2023Champs['runner_name'].describe()
count               236
unique              236
top       Graham Blanks
freq                  1
Name: runner_name, dtype: object
All names are unique, so the race data frame can be joined with the compilation of every race that these individuals ran in 2023.
JoinedNCAA2023Results = NCAA2023Champs.merge(RacesForNCAA2023ChampsIn2023, how='right') #right join on the columns both dataframes share (time, modern_tic, runner_name)
JoinedNCAA2023Results
| | time | modern_tic | predicted_place | place | runner_id | team | runner_name | performance_score | race_weight_sig | significant | race | date | race_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1717.7001 | 6.677224 | 9.0 | 1.0 | 53652.0 | Harvard | Graham Blanks | 8.0 | 0.402916 | 1 | NCAA Division I Cross Country Championships | 2023-11-18 | 8187.0 |
| 1 | 1771.6001 | 6.734465 | NaN | NaN | NaN | NaN | Graham Blanks | NaN | 0.000000 | 0 | NCAA Division I Northeast Region Cross Country... | 2023-11-10 | 8113.0 |
| 2 | 1427.4001 | 6.716947 | NaN | NaN | NaN | NaN | Graham Blanks | NaN | 0.000000 | 0 | Ivy League Heptagonal Cross Country Championships | 2023-10-28 | 7901.0 |
| 3 | 1403.4001 | 6.705545 | NaN | NaN | NaN | NaN | Graham Blanks | NaN | 0.330531 | 1 | Nuttycombe Invite | 2023-10-13 | 7593.0 |
| 4 | 1406.0001 | 6.722057 | NaN | NaN | NaN | NaN | Graham Blanks | NaN | 0.000000 | 0 | Battle in Beantown | 2023-09-29 | 7260.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1229 | 1916.8133 | 6.816961 | NaN | NaN | NaN | NaN | Luke Wiley | NaN | 0.000000 | 0 | NCAA Division I Southeast Region Cross Country... | 2023-11-10 | 8125.0 |
| 1230 | 1497.2091 | 6.792148 | NaN | NaN | NaN | NaN | Luke Wiley | NaN | 0.000000 | 0 | ACC Cross Country Championships | 2023-10-27 | 7822.0 |
| 1231 | 1511.9222 | 6.780028 | NaN | NaN | NaN | NaN | Luke Wiley | NaN | 0.527606 | 1 | Nuttycombe Invite | 2023-10-13 | 7593.0 |
| 1232 | 1487.9146 | 6.775412 | NaN | NaN | NaN | NaN | Luke Wiley | NaN | 0.472394 | 1 | Virginia Invitational | 2023-09-23 | 7154.0 |
| 1233 | 906.8015 | 6.783454 | NaN | NaN | NaN | NaN | Luke Wiley | NaN | 0.000000 | 0 | Charlotte Opener | 2023-09-01 | 6664.0 |
1234 rows × 13 columns
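The NaN values in the joined table follow from how `merge` picks its keys: when no `on` argument is given, pandas joins on every column name the two frames share, so only the championship row of each runner finds a match. The sketch below illustrates this on invented data and, for contrast, shows a join on the name alone; the frames, names, and values are hypothetical.

```python
import pandas as pd

# Invented stand-ins for the championship results and the season results.
champs = pd.DataFrame({"runner_name": ["A", "B"],
                       "time": [100.0, 110.0],
                       "place": [1, 2]})
season = pd.DataFrame({"runner_name": ["A", "A", "B"],
                       "time": [100.0, 95.0, 110.0],
                       "race": ["Champs", "Opener", "Champs"]})

# No `on`: the keys are the shared columns (runner_name AND time).
joined = champs.merge(season, how="right", indicator=True)

# Explicit key: every season row of a runner receives that runner's place.
by_name = champs.merge(season, how="right", on="runner_name",
                       suffixes=("_champs", ""))
print(joined)
print(by_name)
```

In the first result, runner A's "Opener" row has a missing place and is flagged `right_only`; in the second, every row carries the championship place. Either form works for the later group-by, since `max` skips missing values.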
The "JoinedNCAA2023Results" data frame can now be grouped, ordered and sorted so that there is a strong indicator of each competitor's season via a quick glance.
FinalOutput = pd.DataFrame() #creating an empty dataframe
#grabbing the min LACCTiC score
minDF = JoinedNCAA2023Results.groupby(['runner_name']).min(numeric_only=True)
FinalOutput['min_tic'] = minDF['modern_tic']
#grabbing the median LACCTiC score
medDF = JoinedNCAA2023Results.groupby(['runner_name']).median(numeric_only=True)
FinalOutput['median_tic'] = medDF['modern_tic']
#grabbing the max LACCTiC score
maxDF = JoinedNCAA2023Results.groupby(['runner_name']).max(numeric_only=True)
FinalOutput['max_tic'] = maxDF['modern_tic']
#Finding the tic range.
FinalOutput['tic_range'] = FinalOutput['max_tic'] - FinalOutput['min_tic']
#adding the NCAA place of each runner
FinalOutput['NCAA_Place'] = maxDF['place']
#adding the tic from the NCAA championship
FinalOutput['NCAA_tic']=NCAA2023Champs.set_index('runner_name', inplace=False)['modern_tic']
FinalOutput
| runner_name | min_tic | median_tic | max_tic | tic_range | NCAA_Place | NCAA_tic |
|---|---|---|---|---|---|---|
| Aaron Las-Heras | 6.706710 | 6.713347 | 6.723592 | 0.016882 | 18.0 | 6.706710 |
| Abdel Laadjel | 6.735604 | 6.744722 | 6.748714 | 0.013110 | 106.0 | 6.743779 |
| Abdelhakim Abouzouhir | 6.738011 | 6.742990 | 6.750157 | 0.012145 | 109.0 | 6.744323 |
| Abdirizak Ibrahim | 6.733002 | 6.735203 | 6.783142 | 0.050141 | 203.0 | 6.783142 |
| Abel Teffra | 6.753284 | 6.761864 | 6.791276 | 0.037992 | 159.0 | 6.761864 |
| ... | ... | ... | ... | ... | ... | ... |
| Yasin Sado | 6.734332 | 6.746884 | 6.763103 | 0.028771 | 73.0 | 6.734419 |
| Zach Hughes | 6.752287 | 6.774722 | 6.785640 | 0.033353 | 192.0 | 6.777365 |
| Zach Stewart | 6.771251 | 6.778628 | 6.796119 | 0.024869 | 195.0 | 6.778628 |
| Zachary Cloud | 6.760277 | 6.765889 | 6.795584 | 0.035307 | 212.0 | 6.789355 |
| Zane Bergen | 6.756685 | 6.789057 | 6.817309 | 0.060624 | 226.0 | 6.811538 |
236 rows × 6 columns
This new dataframe can now be sorted by tic_range to see which racers were most consistent. This output will be saved.
FinalOutput.sort_values(by='tic_range', axis=0, ascending=True, inplace=False)
| runner_name | min_tic | median_tic | max_tic | tic_range | NCAA_Place | NCAA_tic |
|---|---|---|---|---|---|---|
| Corey Gorgas | 6.728850 | 6.730626 | 6.732070 | 0.003220 | 62.0 | 6.728850 |
| Adam Spencer | 6.729663 | 6.732603 | 6.735698 | 0.006035 | 70.0 | 6.732603 |
| Giedrius Valincius | 6.736004 | 6.741542 | 6.742293 | 0.006288 | 96.0 | 6.741542 |
| Evans Kiplagat | 6.713864 | 6.721170 | 6.723162 | 0.009298 | 33.0 | 6.713864 |
| Michael Morgan | 6.742252 | 6.746253 | 6.751858 | 0.009606 | 99.0 | 6.742252 |
| ... | ... | ... | ... | ... | ... | ... |
| Robert DiDonato | 6.723638 | 6.740148 | 6.800521 | 0.076883 | 49.0 | 6.723638 |
| Victor Shitsama | 6.699334 | 6.710528 | 6.780174 | 0.080840 | 12.0 | 6.699334 |
| Luke Wiley | 6.775412 | 6.787801 | 6.857449 | 0.082037 | 236.0 | 6.857449 |
| Jonas Gertsen | 6.748716 | 6.755286 | 6.832435 | 0.083719 | 144.0 | 6.755800 |
| Daniel McGoey | 6.739662 | 6.746786 | 6.848128 | 0.108466 | 112.0 | 6.745792 |
236 rows × 6 columns
ConsistentRunners = FinalOutput.sort_values(by='tic_range', axis=0, ascending=True, inplace=False)
ConsistentRunners.to_csv('ResearchQuestion3.csv', index=True)
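When only the extremes are of interest, `nsmallest` and `nlargest` give the most and least consistent runners without sorting the whole table. The sketch below uses three invented tic_range values for hypothetical runners.

```python
import pandas as pd

# Invented tic_range values for three hypothetical runners.
toy_ranges = pd.Series(
    {"Runner A": 0.003, "Runner B": 0.108, "Runner C": 0.020},
    name="tic_range",
)

most_consistent = toy_ranges.nsmallest(2)   # smallest spread first
least_consistent = toy_ranges.nlargest(1)   # largest spread first
print(most_consistent)
print(least_consistent)
```

On the real output, `FinalOutput['tic_range'].nsmallest(5)` would list the same five runners that appear at the top of the sorted table.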
Finally, a plot will be made which allows this tic_range to be better visualized, to accompany the findings from the output:
import matplotlib.pyplot as plt
sorted_by_place = JoinedNCAA2023Results.sort_values(by='place') #sorting the columns by the NCAA place
runner_names_sorted_by_place = sorted_by_place['runner_name'].unique() #getting the names of each runner
#arranging the data for box plots:
data = [sorted_by_place[sorted_by_place['runner_name'] == runner]['modern_tic'] for runner in runner_names_sorted_by_place]
#making the plot:
plt.figure(figsize=(16, 6)) #adjusting size to be appropriate
plt.boxplot(data, tick_labels=runner_names_sorted_by_place)
# adding labels:
plt.title('Box Plot of Modern TIC Range at NCAA 2023 XC Championship')
plt.xlabel('Runner Name (Ordered by Place)')
plt.ylabel('Modern TIC')
plt.xticks(rotation=90) #rotating x-axis
plt.show()
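With 236 runners on one axis the labels become hard to read, so one option is to draw the box plot for only the top few finishers. The sketch below is an assumption-laden illustration on invented data: `top_n`, the runner names, and all values are hypothetical, and the non-interactive "Agg" backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

nan = float("nan")
# Invented season results; 'place' is only known for the championship row.
toy = pd.DataFrame({
    "runner_name": ["A Runner", "A Runner", "B Runner", "B Runner", "C Runner", "C Runner"],
    "place": [2, nan, 1, nan, 3, nan],
    "modern_tic": [6.70, 6.72, 6.68, 6.71, 6.74, 6.73],
})

top_n = 2  # hypothetical cutoff
places = toy.groupby("runner_name")["place"].max().sort_values()  # max skips NaN
top_names = places.index[:top_n].tolist()
data = [toy.loc[toy["runner_name"] == name, "modern_tic"] for name in top_names]

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(data)
ax.set_xticks(list(range(1, len(top_names) + 1)))
ax.set_xticklabels(top_names, rotation=90)
ax.set_xlabel("Runner Name (Ordered by Place)")
ax.set_ylabel("Modern TIC")
```

Setting the tick labels after drawing avoids relying on the `tick_labels` argument, which is only available in recent matplotlib releases.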