Evaluating and Analyzing the LACCTiC API and Algorithm

Disclaimer: This is the source code meant to accompany the main report; all detailed analyses are explained in the report. Some packages (pandas and matplotlib) are loaded twice: once inside functions and once outside them.


When given the key for the API, the following instructions are also given:

Here is the main link to my API (the rest of this message is copy-pasted): https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/

The endpoints with "_page" roughly correspond to the pages on LACCTiC. You can look up "REST" frameworks to get a sense of how to navigate everything.

For a little example, https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/runner_page/ gives me a list of runners and their information. Because I have not specified a runner, I get a list of them. 75,000 would be a bit much all at once, so the results are "paginated." The attribute "next" in this class takes you to the link to the next page (which has a "?page=2" at the end of the URL).

Every runner, team, league, race etc has an "id" which is unique. To see a specific runner, you go to the same link as before but with the ID at the end of the URL. For example, https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/runner_page/12758/ gives the runner_page information for just Conner Mantz.
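The URL patterns described in this message (a list endpoint, a detail page with the ID appended, and "next" links for pagination) can be captured in two small helpers. The sketch below assumes each paginated page is a JSON object with a "results" list and a "next" URL that is null on the last page, and that the "next" URL keeps the format=json query parameter. These are the usual Django REST Framework conventions, but the message above does not confirm them. lacctic_url, fetch_json, and iter_paginated are hypothetical names. The fetch function is passed in as a parameter so the loop can be tried without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/"


def lacctic_url(endpoint: str, item_id: Optional[int] = None,
                page: Optional[int] = None) -> str:
    """Build e.g. .../runner_page/12758/?format=json or .../runner_page/?format=json&page=2."""
    url = BASE_URL + endpoint.strip("/") + "/"
    if item_id is not None:
        url += f"{int(item_id)}/"  # int() also accepts IDs stored as floats, e.g. 53652.0
    params = {"format": "json"}
    if page is not None:
        params["page"] = page
    return url + "?" + urlencode(params)


def fetch_json(url: str) -> dict:
    """Download a URL and parse its body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_paginated(first_url: str,
                   fetch: Callable[[str], dict] = fetch_json,
                   max_pages: Optional[int] = None) -> Iterator[dict]:
    """Yield items page by page, following each page's "next" link.

    Assumption: every page looks like {"results": [...], "next": <url or null>}.
    """
    url: Optional[str] = first_url
    pages_read = 0
    while url is not None and (max_pages is None or pages_read < max_pages):
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")  # None on the last page
        pages_read += 1
```

For example, list(iter_paginated(lacctic_url("runner_page"), max_pages=2)) would collect the runners on the first two pages. Setting max_pages is worthwhile here, since the message notes there are roughly 75,000 runners.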

The data is stored in JSON format and there are three main data sources, each formatted with a different structure:
1. Runner pages - these pages show a runner's information and their results from every race they participated in
2. Race pages - these pages show results from a particular race
3. Team pages - these pages show roster information for every runner who has ever raced for a particular team. (Team pages are ignored in this analysis because the team members needed for a race already appear in that race's dataset.)

Creating Functions to Read Each Source Type

Because reading each source type would otherwise be repetitive, functions are created so that any runner or race can be loaded into a pandas DataFrame simply by passing its ID. Functions are used throughout the notebook so that the analysis can be replicated for a different race or performance in the future.

In [3]:
def LaccticRunnerToDF(runner_code): #creates a function which uses the LACCTiC runner code
    import urllib.request #for reading web data
    import json #for reading and converting JSON data
    import pandas as pd #for creating the pandas dataframe
    import io

    #takes the runner_code and inputs it into the url string and saves the link as runner_url
    runner_url = "https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/runner_page/"+str(runner_code)+"/?format=json"
    response = urllib.request.urlopen(runner_url) #opens the request and saves it as response
    json_string = response.read().decode('utf-8') #reads the response bytes and decodes them as UTF-8 text
    eq_parsed_json = json.loads(json_string) #parses the JSON into a dictionary
    runnerName = str(eq_parsed_json['firstname'] + ' ' + eq_parsed_json['lastname']) #saves the name of the runner
    seasonList = eq_parsed_json['season_ratings'] #creates a list of the season ratings from the dictionary
    
    races_df = pd.DataFrame() #creates an empty data frame
    
    for season in seasonList: #iterates through the list of season ratings, goes through season by season
        races = season['season_xc_performances'] #selects the season xc performances
        s = json.dumps(races) #re-serializes the season xc performances to a JSON string for read_json
        season_df=pd.read_json(io.StringIO(s)) #loads the races from one season into a dataframe
        races_df = pd.concat([races_df, season_df], ignore_index=True) #appends that dataframe to the dataframe of all seasons

    #The race column holds a nested dictionary, so its fields are unpacked into separate columns
    for index, row in races_df.iterrows():
        races_df.loc[index, 'date'] = row['race']['date'] #creates a date column from the race information
        races_df.loc[index, 'race_id'] = row['race']['id'] #creates a race ID column from the race information
        races_df.loc[index, 'race'] = row['race']['meet_name'] #replaces the nested race dictionary with the meet name
        races_df.loc[index, 'runner_name'] = runnerName #adds a runner name column, all the same, but can be useful for JOIN later

    #races_df.drop('significant', axis=1, inplace=True) #drops the dummy variable 1 if race has any significance
        

    print("Returning a Panda's DF for: " + runnerName) #prints to the console what the function is returning
    return(races_df) #returns a complete data frame of all races a runner has run.
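The iterrows loop above unpacks the nested race fields one row at a time. As an alternative (not the notebook's method), pandas' json_normalize can flatten the same nesting in one call. This sketch takes an already-parsed runner_page dictionary and assumes only the keys LaccticRunnerToDF reads: firstname, lastname, season_ratings, season_xc_performances, and race with date, id, and meet_name. runner_page_to_df is a hypothetical name.

```python
import pandas as pd


def runner_page_to_df(page: dict) -> pd.DataFrame:
    """Flatten a parsed runner_page dictionary into one row per race."""
    # Each season's performances become rows; nested race fields become "race.date", "race.id", ...
    df = pd.json_normalize(page["season_ratings"], record_path="season_xc_performances")
    if df.empty:
        return df
    df = df.rename(columns={"race.date": "date", "race.id": "race_id", "race.meet_name": "race"})
    df["runner_name"] = page["firstname"] + " " + page["lastname"]
    # Drop any other flattened race.* fields that the analysis does not use
    return df.drop(columns=[c for c in df.columns if c.startswith("race.")])
```

Usage would be runner_page_to_df(eq_parsed_json) on the dictionary parsed inside LaccticRunnerToDF. One difference: race_id keeps the API's integer type instead of becoming a float.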
In [4]:
def LaccticRaceToDF(race_code): #creates a function which uses the LACCTiC race code
    import urllib.request #for reading web data
    import json #for reading and converting JSON data
    import pandas as pd #for creating the pandas dataframe
    import io

    #takes the race_code and inputs it into the url string and saves the link as race_url
    race_url = "https://c03mmwsf5i.execute-api.us-east-2.amazonaws.com/production/api_ranking/race_page/"+str(race_code)+"/?format=json"
    response = urllib.request.urlopen(race_url) #opens the request and saves it as response
    json_string = response.read().decode('utf-8') #reads the response bytes and decodes them as UTF-8 text
    eq_parsed_json = json.loads(json_string) #parses the JSON into a dictionary
    results = eq_parsed_json['xc_results'] #selects the results from the xc race
    s = json.dumps(results) #saves the results to s
    race_df = pd.read_json(io.StringIO(s)) #converts the results of the race to a pandas dataframe

    for index, row in race_df.iterrows(): #the runner column needs to be further parsed to get data
        race_df.loc[index, 'runner_id'] = row['runner']['id'] #gets the runner_id from the runner section
        race_df.loc[index, 'team'] = row['runner']['team']['name'] #gets the team name from the runner section
        race_df.loc[index, 'runner_name'] = row['runner']['firstname'] + " " + row['runner']['lastname'] #gets the runner name from the runner section

    race_df.drop('runner', axis=1, inplace=True) #drops the runner column after it is iterated over

    raceName = str(eq_parsed_json['meet_name']) #saves the name of the race
    print("Returning a Pandas DF of race results for: " + raceName) #prints to the console what the function is returning
    return(race_df) #returns a complete data frame of the race's results
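The same json_normalize approach can replace the iterrows loop for race pages. This is a sketch, not the notebook's method: it assumes a parsed race_page dictionary with the keys LaccticRaceToDF reads (xc_results, each with a nested runner holding id, firstname, lastname, and team with name). race_page_to_df is a hypothetical name.

```python
import pandas as pd


def race_page_to_df(page: dict) -> pd.DataFrame:
    """Flatten a parsed race_page dictionary into one row per finisher."""
    # Nested keys become dotted columns such as "runner.id" and "runner.team.name"
    df = pd.json_normalize(page["xc_results"])
    if df.empty:
        return df
    df = df.rename(columns={"runner.id": "runner_id", "runner.team.name": "team"})
    df["runner_name"] = df["runner.firstname"] + " " + df["runner.lastname"]
    # Drop the remaining runner.* fields, mirroring the original drop of the 'runner' column
    return df.drop(columns=[c for c in df.columns if c.startswith("runner.")])
```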

Exploratory Research Questions

With these functions available to load race and runner data quickly, three exploratory research questions are answered below. All three are based on the 2023 NCAA Division I Cross Country Championship, for reasons outlined in the report.


**Research Question 1** (Race Source Based): Which teams performed the best and worst relative to their predicted place from the LACCTiC algorithm?
**Research Question 2** (Runner Source Based): After reading an individual's runner page, can we determine whether they improved significantly from year to year?
**Research Question 3** (*Joining* Runner and Race Sources): Which individuals at the national championship were the most consistent racers, and which individuals had extreme (fast or slow) performance TiCs?

Research Question 1: Which teams performed the best and worst relative to their predicted place from the LACCTiC algorithm?

First, the race function is called to load the 2023 championship results (race ID 8187) into a DataFrame.

In [6]:
NCAA2023Champs = LaccticRaceToDF(8187)
NCAA2023Champs
Returning a Pandas DF of race results for: NCAA Division I Cross Country Championships
Out[6]:
          time  modern_tic  predicted_place  place  runner_id  team              runner_name
0    1717.7001    6.677224                9      1    53652.0  Harvard           Graham Blanks
1    1720.7002    6.678969                8      2    89976.0  New Mexico        Habtom Samuel
2    1735.7003    6.687649                5      3    38094.0  Stanford          Ky Robinson
3    1739.7004    6.689951                2      4    90518.0  Oklahoma State    Denis Kipngetich
4    1743.8005    6.692305               12      5    24322.0  Northern Arizona  Drew Bosley
...        ...         ...              ...    ...        ...  ...               ...
231  1982.7232    6.820709              171    232    61608.0  Virginia          Justin Wachtel
232  1982.8233    6.820760              199    233    12417.0  Eastern Kentucky  Keeton Thornsberry
233  1983.9234    6.821314              175    234    90187.0  Villanova         Xian Shively
234  1986.9235    6.822825              177    235    38113.0  Butler            Jack McMahon
235  2056.9236    6.857449              234    236    84563.0  North Carolina    Luke Wiley

236 rows × 7 columns

A performance_score column is added to the DataFrame, computed as predicted_place minus place, so a positive performance_score indicates a better-than-predicted performance. For example, Habtom Samuel's performance_score of 6 means he finished 6 places better than predicted (predicted 8th, finished 2nd). A negative performance_score means a runner performed worse than predicted.

In [8]:
NCAA2023Champs['performance_score'] = NCAA2023Champs['predicted_place'] - NCAA2023Champs['place']
NCAA2023Champs
Out[8]:
          time  modern_tic  predicted_place  place  runner_id  team              runner_name         performance_score
0    1717.7001    6.677224                9      1    53652.0  Harvard           Graham Blanks                       8
1    1720.7002    6.678969                8      2    89976.0  New Mexico        Habtom Samuel                       6
2    1735.7003    6.687649                5      3    38094.0  Stanford          Ky Robinson                         2
3    1739.7004    6.689951                2      4    90518.0  Oklahoma State    Denis Kipngetich                   -2
4    1743.8005    6.692305               12      5    24322.0  Northern Arizona  Drew Bosley                         7
...        ...         ...              ...    ...        ...  ...               ...                               ...
231  1982.7232    6.820709              171    232    61608.0  Virginia          Justin Wachtel                    -61
232  1982.8233    6.820760              199    233    12417.0  Eastern Kentucky  Keeton Thornsberry                -34
233  1983.9234    6.821314              175    234    90187.0  Villanova         Xian Shively                      -59
234  1986.9235    6.822825              177    235    38113.0  Butler            Jack McMahon                      -58
235  2056.9236    6.857449              234    236    84563.0  North Carolina    Luke Wiley                         -2

236 rows × 8 columns

A trace plot of performance_score by finishing place shows how well the predictions held up throughout the field; the shape of this plot is analyzed further in the report. The plotting code is wrapped in a function so that trace plots for other races can be made later if needed.

In [10]:
def raceDFtoTracePlot(raceDF): #creates a function which requires a race dataframe with 'place' and 'performance_score' columns
    import matplotlib.pyplot as plt
    plt.figure() #starts a new figure so repeated calls do not draw over each other
    plt.bar(raceDF['place'], raceDF['performance_score'], color='black') #one bar per finishing place
    #Adding Titles:
    plt.title('Performance Score by Place')
    plt.xlabel('Place')
    plt.ylabel('Performance Score')
    plt.show() #displays the figure

raceDFtoTracePlot(NCAA2023Champs)
[Figure: Performance Score by Place, a bar chart of performance_score (y-axis) against finishing place (x-axis)]

The race DataFrame can be grouped by team (similar to a pivot) so that teams, rather than individual runners, become the focus. For each team, the grouping reports the minimum, median, mean, and maximum performance_score. The output is saved as a CSV and analyzed further in the report.
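The four separate groupby calls in raceGroupByTeamPS can also be expressed as a single aggregation: passing a list of statistic names to agg returns one column per statistic. This is a sketch, assuming a DataFrame with team and performance_score columns as built above; team_summary is a hypothetical name.

```python
import pandas as pd


def team_summary(race_df: pd.DataFrame) -> pd.DataFrame:
    """Min, median, mean and max performance_score per team, in one groupby pass."""
    summary = race_df.groupby("team")["performance_score"].agg(["min", "median", "mean", "max"])
    # Rename to match the columns produced by raceGroupByTeamPS
    summary.columns = ["Min", "Median", "Mean", "Max"]
    return summary
```

Calling team_summary(NCAA2023Champs) should give the same four statistics per team as the function in the cell below.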

In [12]:
def raceGroupByTeamPS(raceDF):
    import pandas as pd
    groupedDF = pd.DataFrame() #declares an empty dataframe

    betaDF = raceDF.groupby(['team']).min(numeric_only=True) #grouping by the min
    groupedDF['Min'] = betaDF['performance_score']
    betaDF = raceDF.groupby(['team']).median(numeric_only=True) #grouping by the median
    groupedDF['Median'] = betaDF['performance_score'] 
    betaDF = raceDF.groupby(['team']).mean(numeric_only=True) #grouping by the mean
    groupedDF['Mean'] = betaDF['performance_score']
    betaDF = raceDF.groupby(['team']).max(numeric_only=True) #grouping by the max
    groupedDF['Max'] = betaDF['performance_score']
    return groupedDF

NCAA2023Groupteams = raceGroupByTeamPS(NCAA2023Champs)
NCAA2023Groupteams
Out[12]:
              Min  Median        Mean  Max
team
Air Force      -2    25.0   25.333333   60
Alabama       -87   -53.0  -53.000000  -19
Arkansas     -170    -7.0  -20.714286   58
BYU           -35     2.0    4.166667   48
Boise State    32    32.0   32.000000   32
...           ...     ...         ...  ...
Virginia     -151   -37.0  -32.857143   38
Wake Forest   -94    40.0   19.714286   51
Weber State   -14   -14.0  -14.000000  -14
Wingate        87    87.0   87.000000   87
Wisconsin    -114    -0.5   -5.666667  119

61 rows × 4 columns

In [13]:
NCAA2023Groupteams.to_csv('ResearchQuestion1.csv', index=True)

Research Question 2: After reading an individual's runner page, can we determine whether they improved significantly from year to year?

Most runners spend three to five years in the NCAA system, and the goal is to get faster and perform better over the course of a college career. This research question uses a function that can easily be reapplied to any athlete. Results for the top 3 finishers at the 2023 NCAA Cross Country Championship are saved as output and analyzed in the report.

In [15]:
def RunnerDFAnalyzeByYear(runner_df): #defining the function, taking in the "runner_df" variable
    import pandas as pd
    runner_df['date'] = pd.to_datetime(runner_df['date'], errors='coerce') #make sure the date column is date type
    runner_df['date'] = runner_df['date'].dt.year #converts the date to year
    runner_df = runner_df.groupby(['date']).min() #groups by year, taking the minimum of each column separately
    #eliminating unnecessary columns:
    runner_df.drop('time', axis=1, inplace=True)
    runner_df.drop('race_weight_sig', axis=1, inplace=True)
    runner_df.drop('significant', axis=1, inplace=True)
    return runner_df
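One caveat: groupby(['date']).min() takes the minimum of each column independently, so the race name and race_id shown for a year (the alphabetically and numerically smallest) need not belong to the race with that year's lowest modern_tic. If the goal is each year's single best race, the sketch below keeps whole rows by using idxmin. It assumes a lower modern_tic is better, which is consistent with the championship winner having the lowest value (6.677224) in the race output above. best_race_per_year is a hypothetical name.

```python
import pandas as pd


def best_race_per_year(runner_df: pd.DataFrame) -> pd.DataFrame:
    """Return, for each calendar year, the full row of the race with the lowest modern_tic."""
    df = runner_df.copy()  # avoid modifying the caller's DataFrame
    df["year"] = pd.to_datetime(df["date"], errors="coerce").dt.year
    df = df.dropna(subset=["year", "modern_tic"])
    best_idx = df.groupby("year")["modern_tic"].idxmin()  # index label of each year's best row
    return df.loc[best_idx].set_index("year")
```

Usage would mirror the cell below, for example best_race_per_year(LaccticRunnerToDF(53652)).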

This function can be nested with the original LaccticRunnerToDF() function to quickly read, summarize, and save the results for the top 3 finishers.

In [17]:
ResearchQuestion2ABlanks = RunnerDFAnalyzeByYear(LaccticRunnerToDF(53652))
ResearchQuestion2ABlanks.to_csv('ResearchQuestion2ABlanks.csv', index=True)

ResearchQuestion2ASamuel = RunnerDFAnalyzeByYear(LaccticRunnerToDF(89976))
ResearchQuestion2ASamuel.to_csv('ResearchQuestion2ASamuel.csv', index=True)

ResearchQuestion2ARobinson = RunnerDFAnalyzeByYear(LaccticRunnerToDF(38094))
ResearchQuestion2ARobinson.to_csv('ResearchQuestion2ARobinson.csv', index=True)
Returning a Panda's DF for: Graham Blanks
Returning a Panda's DF for: Habtom Samuel
Returning a Panda's DF for: Ky Robinson

Research Question 3: Which individuals at the national championship were the most consistent racers, and which individuals had extreme (fast or slow) performance TiCs?

This final research question is the most complex, because it requires joining the two types of source documents. This is why runner_id is extracted as a column in the race DataFrame.

In [20]:
NCAA2023ChampsRunnerID = NCAA2023Champs['runner_id'] #saving only the runner_ids column

Using LaccticRunnerToDF() inside a for loop, the runner_id column is iterated over and every race each runner has run is combined into one DataFrame.

In [22]:
import pandas as pd
runnerFrames = [] #collects one dataframe per runner

for runner in NCAA2023ChampsRunnerID: #iterating through the runner_id column
    runnerFrames.append(LaccticRunnerToDF(int(runner))) #loads every race this runner has run (runner_id is stored as a float)

AllRacesForNCAA2023Champs = pd.concat(runnerFrames, ignore_index=True) #combines all runners at once instead of re-copying inside the loop
    
AllRacesForNCAA2023Champs
Returning a Panda's DF for: Graham Blanks
Returning a Panda's DF for: Habtom Samuel
Returning a Panda's DF for: Ky Robinson
Returning a Panda's DF for: Denis Kipngetich
Returning a Panda's DF for: Drew Bosley
Returning a Panda's DF for: Nico Young
Returning a Panda's DF for: Patrick Kiprop
Returning a Panda's DF for: Brian Musau
Returning a Panda's DF for: Parker Wolfe
Returning a Panda's DF for: Fouad Messaoudi
Returning a Panda's DF for: Devin Hart
Returning a Panda's DF for: Victor Shitsama
Returning a Panda's DF for: Kirami Yego
Returning a Panda's DF for: Liam Murphy
Returning a Panda's DF for: Alex Maier
Returning a Panda's DF for: Sanele Masondo
Returning a Panda's DF for: Alex Phillip
Returning a Panda's DF for: Aaron Las-Heras
Returning a Panda's DF for: Perry Mackinnon
Returning a Panda's DF for: Victor Kiprop
Returning a Panda's DF for: Santiago Prosser
Returning a Panda's DF for: Jason Bowers
Returning a Panda's DF for: Rodger Rivera
Returning a Panda's DF for: Dylan Schubert
Returning a Panda's DF for: Brodey Hasty
Returning a Panda's DF for: Matthew Richtman
Returning a Panda's DF for: Tom Brady
Returning a Panda's DF for: Luke Combs
Returning a Panda's DF for: Nicholas Bendtsen
Returning a Panda's DF for: Nickolas Scudder
Returning a Panda's DF for: Chris Devaney
Returning a Panda's DF for: James Corrigan
Returning a Panda's DF for: Evans Kiplagat
Returning a Panda's DF for: Sam Lawler
Returning a Panda's DF for: Kenneth Rooks
Returning a Panda's DF for: Timothy Chesondin
Returning a Panda's DF for: Rodgers Kiplimo
Returning a Panda's DF for: Haftu Strintzos
Returning a Panda's DF for: Austin Vancil
Returning a Panda's DF for: David Mullarkey
Returning a Panda's DF for: Jackson Sharp
Returning a Panda's DF for: Ben Shearer
Returning a Panda's DF for: Gable Sieperda
Returning a Panda's DF for: Victor Kibiego
Returning a Panda's DF for: Ethan Strand
Returning a Panda's DF for: Creed Thompson
Returning a Panda's DF for: Ben Rosa
Returning a Panda's DF for: Ethan Coleman
Returning a Panda's DF for: Robert DiDonato
Returning a Panda's DF for: Adisu Guadia
Returning a Panda's DF for: Brett Gardner
Returning a Panda's DF for: Arturs Medveds
Returning a Panda's DF for: Ben Perrin
Returning a Panda's DF for: Jack Jennings
Returning a Panda's DF for: Said Mechaal
Returning a Panda's DF for: Joey Nokes
Returning a Panda's DF for: Nicholas Russell
Returning a Panda's DF for: Titus Cheruiyot
Returning a Panda's DF for: Connor Nisbet
Returning a Panda's DF for: Joe Hudson
Returning a Panda's DF for: Will Anthony
Returning a Panda's DF for: Corey Gorgas
Returning a Panda's DF for: Lucas Bons
Returning a Panda's DF for: Sean Maison
Returning a Panda's DF for: Florian LePallec
Returning a Panda's DF for: Cruz Gomez
Returning a Panda's DF for: Eli Bennett
Returning a Panda's DF for: Vincent Mauri
Returning a Panda's DF for: Max Murphy
Returning a Panda's DF for: Adam Spencer
Returning a Panda's DF for: Hannes Burger
Returning a Panda's DF for: Owen Smith
Returning a Panda's DF for: Yasin Sado
Returning a Panda's DF for: Lex Young
Returning a Panda's DF for: Josh Truchon
Returning a Panda's DF for: Matt Strangio
Returning a Panda's DF for: Chandler Gibbens
Returning a Panda's DF for: Jake Gebhardt
Returning a Panda's DF for: Haftu Knight
Returning a Panda's DF for: Matthew Forrester
Returning a Panda's DF for: Joseph O'Brien
Returning a Panda's DF for: Bradley Makuvire
Returning a Panda's DF for: Bob Liking
Returning a Panda's DF for: Theo Quax
Returning a Panda's DF for: Will Muirhead
Returning a Panda's DF for: Valentin Soca
Returning a Panda's DF for: Myles Richter
Returning a Panda's DF for: Assaf Harari
Returning a Panda's DF for: Jesse Hamlin
Returning a Panda's DF for: Jacob Lewis
Returning a Panda's DF for: Joshua DeSouza
Returning a Panda's DF for: Tyler Berg
Returning a Panda's DF for: Isaac Alonzo
Returning a Panda's DF for: Daniel O'Brien
Returning a Panda's DF for: Cole Sprout
Returning a Panda's DF for: Giedrius Valincius
Returning a Panda's DF for: Acer Iverson
Returning a Panda's DF for: Hillary Cheruiyot
Returning a Panda's DF for: Michael Morgan
Returning a Panda's DF for: Jacob McLeod
Returning a Panda's DF for: Anthony Monte
Returning a Panda's DF for: Toby Gualter
Returning a Panda's DF for: Lachlan Wellington
Returning a Panda's DF for: Emmanuel Sgouros
Returning a Panda's DF for: Murphy Smith
Returning a Panda's DF for: Abdel Laadjel
Returning a Panda's DF for: Alex Comerford
Returning a Panda's DF for: Davis Bove
Returning a Panda's DF for: Abdelhakim Abouzouhir
Returning a Panda's DF for: Nathan Lawler
Returning a Panda's DF for: Wil Smith
Returning a Panda's DF for: Daniel McGoey
Returning a Panda's DF for: Nicholas Kiprotich
Returning a Panda's DF for: Elliott Cook
Returning a Panda's DF for: Taha Er-Raouy
Returning a Panda's DF for: Owen MacKenzie
Returning a Panda's DF for: Daniel Abdala
Returning a Panda's DF for: Wes Porter
Returning a Panda's DF for: Robert Cozean
Returning a Panda's DF for: Bryce Cerkowniak
Returning a Panda's DF for: Timothy Sindt
Returning a Panda's DF for: Jason Renze
Returning a Panda's DF for: Benjamin Godish
Returning a Panda's DF for: Jona Bodirsky
Returning a Panda's DF for: Charlie Sprott
Returning a Panda's DF for: Gabriel Sanchez
Returning a Panda's DF for: Parker Stokes
Returning a Panda's DF for: Taonga Mbambo
Returning a Panda's DF for: Evan Burke
Returning a Panda's DF for: Eric Casarez
Returning a Panda's DF for: Birhanu Harriman
Returning a Panda's DF for: Matthew Farrell
Returning a Panda's DF for: Micah Wilson
Returning a Panda's DF for: Cooper Schroeder
Returning a Panda's DF for: Nick Foster
Returning a Panda's DF for: Andrew Nolan
Returning a Panda's DF for: Luke Tewalt
Returning a Panda's DF for: Levi Taylor
Returning a Panda's DF for: Shane Brosnan
Returning a Panda's DF for: Matias Reynaga
Returning a Panda's DF for: Nathan Lopez
Returning a Panda's DF for: Noah Hibbard
Returning a Panda's DF for: Gavin Ehlers
Returning a Panda's DF for: Jonas Gertsen
Returning a Panda's DF for: Joaquin Campos
Returning a Panda's DF for: Marco Langon
Returning a Panda's DF for: Gitch Hayes
Returning a Panda's DF for: Lukas Kiprop
Returning a Panda's DF for: Michael Maiorano
Returning a Panda's DF for: Peter Visser
Returning a Panda's DF for: Dean Casey
Returning a Panda's DF for: Paul Stafford
Returning a Panda's DF for: Tyler Wirth
Returning a Panda's DF for: Brandon Olden
Returning a Panda's DF for: Quinn Gallagher
Returning a Panda's DF for: Eli Nahom
Returning a Panda's DF for: Jack Roberts
Returning a Panda's DF for: Caleb Jarema
Returning a Panda's DF for: Abel Teffra
Returning a Panda's DF for: MacCallum Rowe
Returning a Panda's DF for: William Zegarski
Returning a Panda's DF for: Baidy Ba
Returning a Panda's DF for: Matthew Neill
Returning a Panda's DF for: Isaiah Givens
Returning a Panda's DF for: Rob McManus
Returning a Panda's DF for: Jayden Nats
Returning a Panda's DF for: Nikodem Dworczak
Returning a Panda's DF for: Connor Livingston
Returning a Panda's DF for: Victor Neiva
Returning a Panda's DF for: Carter Solomon
Returning a Panda's DF for: Hunter Jones
Returning a Panda's DF for: Nathan Mountain
Returning a Panda's DF for: Luke Henseler
Returning a Panda's DF for: Thomas Termote
Returning a Panda's DF for: Jarrett Kirk
Returning a Panda's DF for: Ezekiel Rop
Returning a Panda's DF for: Sean Kay
Returning a Panda's DF for: James Overberg
Returning a Panda's DF for: Will Allen
Returning a Panda's DF for: Luke Venhuizen
Returning a Panda's DF for: Isaac Hedengren
Returning a Panda's DF for: Henry Myers
Returning a Panda's DF for: Sam Ells
Returning a Panda's DF for: Aidan Ross
Returning a Panda's DF for: Leo Young
Returning a Panda's DF for: Teddy Buckley
Returning a Panda's DF for: Yaseen Abdalla
Returning a Panda's DF for: Rowen Ellenberg
Returning a Panda's DF for: Damien Dilcher
Returning a Panda's DF for: Ian Harrison
Returning a Panda's DF for: Logan Law
Returning a Panda's DF for: Zach Hughes
Returning a Panda's DF for: David Slapak
Returning a Panda's DF for: Nolan Hosbein
Returning a Panda's DF for: Zach Stewart
Returning a Panda's DF for: Sam Burgess
Returning a Panda's DF for: Ryan Kredell
Returning a Panda's DF for: Brian Kiptoo
Returning a Panda's DF for: Lucas Guerra
Returning a Panda's DF for: Rynard Swanepoel
Returning a Panda's DF for: Jacob Hunter
Returning a Panda's DF for: Jake Derouin
Returning a Panda's DF for: Abdirizak Ibrahim
Returning a Panda's DF for: Abraham Avila-Martinez
Returning a Panda's DF for: Will Minnette
Returning a Panda's DF for: Nick Soldevere
Returning a Panda's DF for: Matthew Larkin
Returning a Panda's DF for: Hudson Heikkinen
Returning a Panda's DF for: Joe Ewing
Returning a Panda's DF for: Samuel Field
Returning a Panda's DF for: Pedro Marin
Returning a Panda's DF for: Zachary Cloud
Returning a Panda's DF for: Cooper Laird
Returning a Panda's DF for: Paul Talens
Returning a Panda's DF for: Jacob Nenow
Returning a Panda's DF for: Ahmed Ibrahim
Returning a Panda's DF for: Charlie North
Returning a Panda's DF for: CJ Singleton
Returning a Panda's DF for: Gary Martin
Returning a Panda's DF for: Devon Comber
Returning a Panda's DF for: Luke Ondracek
Returning a Panda's DF for: Jonathan DeSouza
Returning a Panda's DF for: Enock Kipchumba
Returning a Panda's DF for: Thomas Chaston
Returning a Panda's DF for: Silas Winders
Returning a Panda's DF for: Zane Bergen
Returning a Panda's DF for: Jonathan Carmin
Returning a Panda's DF for: Silas Derfel
Returning a Panda's DF for: Colton Sands
Returning a Panda's DF for: Caleb Niednagel
Returning a Panda's DF for: Harvey Cramb
Returning a Panda's DF for: Justin Wachtel
Returning a Panda's DF for: Keeton Thornsberry
Returning a Panda's DF for: Xian Shively
Returning a Panda's DF for: Jack McMahon
Returning a Panda's DF for: Luke Wiley
Out[22]:
           time  modern_tic  race_weight_sig  significant  race                                               date        race_id  runner_name
0     1717.2001    6.687363         0.565449            1  NCAA DI Championships                              2024-11-23  10175.0  Graham Blanks
1     1774.7001    6.719916         0.000000            0  NCAA Division I Northeast Region Cross Country...  2024-11-15  10112.0  Graham Blanks
2     1334.6001    6.693729         0.000000            0  Ivy League Heptagonal Cross Country Championships  2024-11-02   9900.0  Graham Blanks
3     1360.5002    6.692466         0.434551            1  Wisconsin Pre-Nationals                            2024-10-19   9563.0  Graham Blanks
4     1717.7001    6.677224         0.402916            1  NCAA Division I Cross Country Championships        2023-11-18   8187.0  Graham Blanks
...         ...         ...              ...          ...  ...                                                ...             ...  ...
3731  1916.8133    6.816961         0.000000            0  NCAA Division I Southeast Region Cross Country...  2023-11-10   8125.0  Luke Wiley
3732  1497.2091    6.792148         0.000000            0  ACC Cross Country Championships                    2023-10-27   7822.0  Luke Wiley
3733  1511.9222    6.780028         0.527606            1  Nuttycombe Invite                                  2023-10-13   7593.0  Luke Wiley
3734  1487.9146    6.775412         0.472394            1  Virginia Invitational                              2023-09-23   7154.0  Luke Wiley
3735   906.8015    6.783454         0.000000            0  Charlotte Opener                                   2023-09-01   6664.0  Luke Wiley

3736 rows × 8 columns
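The loop above makes one API request per runner (236 here), so a single failed request stops the whole cell. The sketch below is a more forgiving variant, assuming it is acceptable to skip a runner whose page cannot be fetched and to pause briefly between requests. load_all_runners, the 0.5-second default pause, and the returned list of skipped IDs are illustrative choices, not part of the notebook; the loader argument would be LaccticRunnerToDF.

```python
import time
import urllib.error
from typing import Callable, Iterable, List, Tuple

import pandas as pd


def load_all_runners(runner_ids: Iterable,
                     loader: Callable[[int], pd.DataFrame],
                     pause_seconds: float = 0.5) -> Tuple[pd.DataFrame, List[int]]:
    """Load every runner with `loader`, skipping IDs whose request fails.

    Returns the combined DataFrame and the list of IDs that were skipped.
    """
    frames, skipped = [], []
    for runner_id in runner_ids:
        rid = int(runner_id)  # runner_id is stored as a float in the race DataFrame
        try:
            frames.append(loader(rid))
        except urllib.error.URLError:  # HTTPError is a subclass, so 404/500 responses land here too
            skipped.append(rid)
        time.sleep(pause_seconds)  # spaces out requests to the API
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, skipped
```

A call such as AllRaces, failed = load_all_runners(NCAA2023ChampsRunnerID, LaccticRunnerToDF) would return the same combined DataFrame when every request succeeds, plus the IDs to retry otherwise.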

Next, the DataFrame is filtered so that only races from the 2023 season, the target sample, are used.

In [24]:
print(AllRacesForNCAA2023Champs['date'].dtype) #check the type of the date column

AllRacesForNCAA2023Champs['date'] = pd.to_datetime(AllRacesForNCAA2023Champs['date'], errors='coerce') #change the type to date
print(AllRacesForNCAA2023Champs['date'].dtype) #re-check the type of the date column
RacesForNCAA2023ChampsIn2023 = AllRacesForNCAA2023Champs[AllRacesForNCAA2023Champs['date'].dt.year == 2023] #only select 2023 races

RacesForNCAA2023ChampsIn2023
object
datetime64[ns]
Out[24]:
           time  modern_tic  race_weight_sig  significant  race                                               date        race_id  runner_name
4     1717.7001    6.677224         0.402916            1  NCAA Division I Cross Country Championships        2023-11-18   8187.0  Graham Blanks
5     1771.6001    6.734465         0.000000            0  NCAA Division I Northeast Region Cross Country...  2023-11-10   8113.0  Graham Blanks
6     1427.4001    6.716947         0.000000            0  Ivy League Heptagonal Cross Country Championships  2023-10-28   7901.0  Graham Blanks
7     1403.4001    6.705545         0.330531            1  Nuttycombe Invite                                  2023-10-13   7593.0  Graham Blanks
8     1406.0001    6.722057         0.000000            0  Battle in Beantown                                 2023-09-29   7260.0  Graham Blanks
...         ...         ...              ...          ...  ...                                                ...             ...  ...
3731  1916.8133    6.816961         0.000000            0  NCAA Division I Southeast Region Cross Country...  2023-11-10   8125.0  Luke Wiley
3732  1497.2091    6.792148         0.000000            0  ACC Cross Country Championships                    2023-10-27   7822.0  Luke Wiley
3733  1511.9222    6.780028         0.527606            1  Nuttycombe Invite                                  2023-10-13   7593.0  Luke Wiley
3734  1487.9146    6.775412         0.472394            1  Virginia Invitational                              2023-09-23   7154.0  Luke Wiley
3735   906.8015    6.783454         0.000000            0  Charlotte Opener                                   2023-09-01   6664.0  Luke Wiley

1234 rows × 8 columns

There are now 1,234 races to join with the DataFrame from the 2023 national championship. Because the join matches on runner names, it is worth first confirming that every runner's name is unique.

In [26]:
NCAA2023Champs['runner_name'].describe()
Out[26]:
count               236
unique              236
top       Graham Blanks
freq                  1
Name: runner_name, dtype: object

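All 236 names are unique, so the race DataFrame can be joined with the compilation of every race these individuals ran in 2023. The two DataFrames share three columns (time, modern_tic, and runner_name) and the join matches on all three, so championship-only fields such as place, predicted_place, and team are filled on each runner's championship row and are NaN for their other 2023 races.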
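A sketch of the same right join that names the keys explicitly and lets pandas check the uniqueness assumption: validate="one_to_many" raises a MergeError if any key combination is duplicated on the championship side, and indicator=True adds a _merge column showing whether each season race matched a championship row. merge_season_with_champs and JOIN_KEYS are hypothetical names.

```python
import pandas as pd

JOIN_KEYS = ["time", "modern_tic", "runner_name"]  # the columns both DataFrames share


def merge_season_with_champs(champs: pd.DataFrame, season: pd.DataFrame) -> pd.DataFrame:
    """Right-join every season race onto the championship results."""
    if not champs["runner_name"].is_unique:
        raise ValueError("runner_name is not unique in the championship results")
    return champs.merge(season, how="right", on=JOIN_KEYS,
                        validate="one_to_many", indicator=True)
```

Here merge_season_with_champs(NCAA2023Champs, RacesForNCAA2023ChampsIn2023) would reproduce the joined table plus the _merge column, where "both" marks championship rows.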

In [28]:
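JoinedNCAA2023Results = NCAA2023Champs.merge(RacesForNCAA2023ChampsIn2023, how='right', on=['time', 'modern_tic', 'runner_name']) #right join on the three shared columns, so championship fields only fill the championship row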
JoinedNCAA2023Results
Out[28]:
           time  modern_tic  predicted_place  place  runner_id  team     runner_name    performance_score  race_weight_sig  significant  race                                               date        race_id
0     1717.7001    6.677224              9.0    1.0    53652.0  Harvard  Graham Blanks                8.0         0.402916            1  NCAA Division I Cross Country Championships        2023-11-18   8187.0
1     1771.6001    6.734465              NaN    NaN        NaN  NaN      Graham Blanks                NaN         0.000000            0  NCAA Division I Northeast Region Cross Country...  2023-11-10   8113.0
2     1427.4001    6.716947              NaN    NaN        NaN  NaN      Graham Blanks                NaN         0.000000            0  Ivy League Heptagonal Cross Country Championships  2023-10-28   7901.0
3     1403.4001    6.705545              NaN    NaN        NaN  NaN      Graham Blanks                NaN         0.330531            1  Nuttycombe Invite                                  2023-10-13   7593.0
4     1406.0001    6.722057              NaN    NaN        NaN  NaN      Graham Blanks                NaN         0.000000            0  Battle in Beantown                                 2023-09-29   7260.0
...         ...         ...              ...    ...        ...  ...      ...                          ...              ...          ...  ...                                                ...             ...
1229  1916.8133    6.816961              NaN    NaN        NaN  NaN      Luke Wiley                   NaN         0.000000            0  NCAA Division I Southeast Region Cross Country...  2023-11-10   8125.0
1230  1497.2091    6.792148              NaN    NaN        NaN  NaN      Luke Wiley                   NaN         0.000000            0  ACC Cross Country Championships                    2023-10-27   7822.0
1231  1511.9222    6.780028              NaN    NaN        NaN  NaN      Luke Wiley                   NaN         0.527606            1  Nuttycombe Invite                                  2023-10-13   7593.0
1232  1487.9146    6.775412              NaN    NaN        NaN  NaN      Luke Wiley                   NaN         0.472394            1  Virginia Invitational                              2023-09-23   7154.0
1233   906.8015    6.783454              NaN    NaN        NaN  NaN      Luke Wiley                   NaN         0.000000            0  Charlotte Opener                                   2023-09-01   6664.0

1234 rows × 13 columns

The JoinedNCAA2023Results DataFrame can now be grouped by runner and summarized so that each competitor's season can be read at a glance.

In [30]:
FinalOutput = pd.DataFrame() #creating an empty dataframe

#grabbing the min LACCTiC score
minDF = JoinedNCAA2023Results.groupby(['runner_name']).min(numeric_only=True) 
FinalOutput['min_tic'] = minDF['modern_tic']

#grabbing the median LACCTiC score
medDF = JoinedNCAA2023Results.groupby(['runner_name']).median(numeric_only=True) 
FinalOutput['median_tic'] = medDF['modern_tic']

#grabbing the max LACCTiC score
maxDF = JoinedNCAA2023Results.groupby(['runner_name']).max(numeric_only=True) 
FinalOutput['max_tic'] = maxDF['modern_tic']

#Finding the tic range.
FinalOutput['tic_range'] = FinalOutput['max_tic'] - FinalOutput['min_tic']

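#adding the NCAA place of each runner (place is NaN except on the championship row, so max() returns that single value)
FinalOutput['NCAA_Place'] = maxDF['place']

#adding each runner's modern_tic from the NCAA championship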
FinalOutput['NCAA_tic']=NCAA2023Champs.set_index('runner_name', inplace=False)['modern_tic']

FinalOutput 
Out[30]:
                        min_tic  median_tic   max_tic  tic_range  NCAA_Place  NCAA_tic
runner_name
Aaron Las-Heras        6.706710    6.713347  6.723592   0.016882        18.0  6.706710
Abdel Laadjel          6.735604    6.744722  6.748714   0.013110       106.0  6.743779
Abdelhakim Abouzouhir  6.738011    6.742990  6.750157   0.012145       109.0  6.744323
Abdirizak Ibrahim      6.733002    6.735203  6.783142   0.050141       203.0  6.783142
Abel Teffra            6.753284    6.761864  6.791276   0.037992       159.0  6.761864
...                         ...         ...       ...        ...         ...       ...
Yasin Sado             6.734332    6.746884  6.763103   0.028771        73.0  6.734419
Zach Hughes            6.752287    6.774722  6.785640   0.033353       192.0  6.777365
Zach Stewart           6.771251    6.778628  6.796119   0.024869       195.0  6.778628
Zachary Cloud          6.760277    6.765889  6.795584   0.035307       212.0  6.789355
Zane Bergen            6.756685    6.789057  6.817309   0.060624       226.0  6.811538

236 rows × 6 columns
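The three groupby passes in the cell above can also be collapsed into one call with pandas named aggregation, where each keyword becomes an output column. This is a sketch, assuming the joined DataFrame has runner_name, modern_tic, and place columns; season_summary is a hypothetical name. As in the original cell, NCAA_Place uses the maximum of place because place is filled only on each runner's championship row. NCAA_tic is left out here, and tic_range ends up as the last column.

```python
import pandas as pd


def season_summary(joined: pd.DataFrame) -> pd.DataFrame:
    """Per-runner min/median/max modern_tic, championship place, and tic range."""
    out = joined.groupby("runner_name").agg(
        min_tic=("modern_tic", "min"),
        median_tic=("modern_tic", "median"),
        max_tic=("modern_tic", "max"),
        NCAA_Place=("place", "max"),  # NaN on non-championship rows is skipped
    )
    out["tic_range"] = out["max_tic"] - out["min_tic"]
    return out
```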

This new DataFrame can now be sorted by tic_range to see which racers were most consistent: a smaller range means the runner's modern_tic varied less across the 2023 season. This output will be saved.
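Besides sorting the whole table, DataFrame.nsmallest and DataFrame.nlargest pull out just the two ends of the ranking. This is a sketch, assuming a table with a tic_range column like FinalOutput; consistency_extremes and the default n=5 are hypothetical choices.

```python
import pandas as pd


def consistency_extremes(summary: pd.DataFrame, n: int = 5):
    """Return the n most consistent (smallest tic_range) and n least consistent runners."""
    most_consistent = summary.nsmallest(n, "tic_range")
    least_consistent = summary.nlargest(n, "tic_range")
    return most_consistent, least_consistent
```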

In [32]:
FinalOutput.sort_values(by='tic_range', axis=0, ascending=True, inplace=False)
Out[32]:
                     min_tic  median_tic   max_tic  tic_range  NCAA_Place  NCAA_tic
runner_name
Corey Gorgas        6.728850    6.730626  6.732070   0.003220        62.0  6.728850
Adam Spencer        6.729663    6.732603  6.735698   0.006035        70.0  6.732603
Giedrius Valincius  6.736004    6.741542  6.742293   0.006288        96.0  6.741542
Evans Kiplagat      6.713864    6.721170  6.723162   0.009298        33.0  6.713864
Michael Morgan      6.742252    6.746253  6.751858   0.009606        99.0  6.742252
...                      ...         ...       ...        ...         ...       ...
Robert DiDonato     6.723638    6.740148  6.800521   0.076883        49.0  6.723638
Victor Shitsama     6.699334    6.710528  6.780174   0.080840        12.0  6.699334
Luke Wiley          6.775412    6.787801  6.857449   0.082037       236.0  6.857449
Jonas Gertsen       6.748716    6.755286  6.832435   0.083719       144.0  6.755800
Daniel McGoey       6.739662    6.746786  6.848128   0.108466       112.0  6.745792

236 rows × 6 columns

In [33]:
ConsistentRunners = FinalOutput.sort_values(by='tic_range', axis=0, ascending=True, inplace=False)
ConsistentRunners.to_csv('ResearchQuestion3.csv', index=True)

Finally, a box plot of each runner's 2023 modern_tic values, ordered by NCAA place, visualizes the spread summarized by tic_range and accompanies the findings from the output:

In [35]:
import matplotlib.pyplot as plt

sorted_by_place = JoinedNCAA2023Results.sort_values(by='place') #sorting the rows by NCAA place (NaN places sort last)
runner_names_sorted_by_place = sorted_by_place['runner_name'].unique() #runner names in order of NCAA finish
#arranging the data for box plots:
data = [sorted_by_place[sorted_by_place['runner_name'] == runner]['modern_tic'] for runner in runner_names_sorted_by_place] 

#making the plot:
plt.figure(figsize=(16, 6)) #widening the figure so all 236 runners fit
plt.boxplot(data, tick_labels=runner_names_sorted_by_place) #tick_labels requires Matplotlib 3.9+; older versions call this argument labels
# adding labels:
plt.title('Box Plot of Modern TIC Range at NCAA 2023 XC Championship')
plt.xlabel('Runner Name (Ordered by Place)')
plt.ylabel('Modern TIC')
plt.xticks(rotation=90) #rotating x-axis
plt.show()
[Figure: Box Plot of Modern TIC Range at NCAA 2023 XC Championship, modern_tic (y-axis) for each runner ordered by NCAA place (x-axis)]