A Computer Science graduate and U.S. Navy Veteran based in the Dallas, TX area.
It’s Nice To Meet You!
Skills
Statistical Analysis. A/B testing. Predictive Analytics. Descriptive Statistics. Data Cleansing. Data Visualization. Data Exploration.
Software/Tools
Python (Pandas/NumPy/Seaborn). SQL. PostgreSQL. Microsoft Excel. Microsoft PowerPoint. Tableau. R.
Education
B.A. Computer Science, University of Rhode Island.
Data Analytics Certificate, Thinkful (Chegg Skills).
This dataset comes from The Center for World University Rankings (Kaggle), which provides information for top Universities in the World. There are 14 columns and 2,220 rows of data.
This dataset revolves around the assessment and comparison of universities globally throughout the years 2012-2015.
The data contains multiple columns, both qualitative and quantitative, ranging from nominal, discrete, and continuous variable types. Tools used: Jupyter Notebook, Python (Numpy, Pandas, Matplotlib, Seaborn), Excel, and PowerPoint.
There are two main questions I would like to answer using this dataset:
Hypothesis 1 - (Quality of Education):
Null Hypothesis (H0): There is no significant difference in the quality of education provided by higher-ranked and lower-ranked universities.
Alternative Hypothesis (H1): Higher-ranked universities provide better quality education than lower-ranked universities.
Hypothesis 2 - (Number of Publications):
Null Hypothesis (H0): There is no significant difference in the number of publications between higher-ranked and lower-ranked universities.
Alternative Hypothesis (H1): Higher-ranked universities have a greater number of publications than lower-ranked universities.
The data will be used to test the hypotheses in the following ways:
Using a heatmap with seaborn, we can determine the relationships between variables. We can see there are strong relationships between the publication rank and the world rank (0, 92)
Given these findings, both the Pearson correlation coefficient and the difference in means at the confidence interval provide evidence that supports the alternative hypothesis (H1). Higher-ranked universities appear to provide better quality education than lower-ranked universities, and there is a negative relationship between the quality of education and the ranking scores. Overall, we can reject the null hypothesis and conclude that there is a statistically significant difference between the two variables being tested.
Given that the calculated difference in means and the correlation coefficient both suggest a negative relationship between university ranking and the number of publications, the results do not support the alternative hypothesis (H1). Higher-ranked universities tend to have a lower number of publications than lower-ranked universities, and there is a negative relationship between the publications and the ranking scores. This conclusion is consistent with both the Pearson correlation coefficient and the calculated difference in means. Overall, we can accept the null hypothesis and conclude that there is a statistically significant difference between the two variables being tested.
For Hypothesis #1, the investigation into the relationship between university ranking scores and the quality of education has shown evidence in favor of the alternative hypothesis (H1).
Conversely, for Hypothesis #2, an analysis between university ranking scores and the number of publications has revealed that from both the calculated difference in means and the Pearson correlation coefficient, evidence reinforces the null hypothesis.
Focus on Quality of Education Alongside Other Factors: Although the publications are essential, universities should prioritize providing high-quality education. The analysis showed that world rank and education quality rank are not always directly related. This can be achieved by maintaining rigorous academic standards, investing in faculty development, promoting innovative teaching methods, and continuously assessing and improving educational programs.
Emphasize the importance of Publications: The correlation between publication rank and world rank implies that universities should focus on improving their research output and publications. This can be achieved by investing in research facilities, supporting faculty members in their research, and collaborating with other institutions.
This presentation analyzes the factors that drive home prices by providing data-driven insights for an investment bank. Variables that provide impact on prices will be identified in order to provide informed decisions on how to allocate dollars earmarked for investment into mortgage-backed securities.
This data set was a sample of 1,460 houses that were sold between 2006 and 2010 in Ames, Iowa.
There are 81 columns with a good distribution between numerical and categorical data.
This data was retrieved from Kaggle.
Find correlations between numerical data and the housing sale price
Used descriptive statistics, A/B testing with t-tests, PivotTables, and PivotCharts to provide insights into central tendencies and relationships between variables.
Assess the quality of the data, handling missing values, and detecting outliers.
Incorporate visualizations to identify patterns and relationships with data.
This project was done using Excel, splitting data into treatment and control groups, while utilizing pivot tables and the Data Analysis ToolPak to manipulate data.
Based on insights from using Exploratory Data Analysis, new columns of data were created in order to get the proper data needed for t-tests.
Confidence intervals were also calculated for the hypotheses in order to accept or reject the null.
The top neighborhood to bring in the highest average sales for homes is: North Ames, which accounted for 14.42% of total sales and generated a revenue of $32,815,593.
College Creek and Northridge Heights are other neighborhoods that could be of interest.
3 bedroom homes were sold the most, comprising of 55.07% of total home sales with a revenue of $145,569,724.
With 95% confidence, the difference in means is between $32,101 and $13,617.
Rejected the null that there is no significant difference in means between houses that have 2 bedrooms and those that have 3.
2 car garages sold the most consisting of 56.44% of homes sold with the revenue at $151,493,771.
With 95% confidence, the difference in means is between $73,817 and $90,507.
Rejected the null that there is no significant difference in means between houses with and without garages.
The average sale price was higher for homes with 2-3 fireplaces inside, but the most sales came from homes with 0-1 fireplace, with 47.25% (0) and 44.52% (1) of total home sales along with $97,518,723 (0) and $137,698,541 (1) in revenue.
With 95% confidence, the difference in means is between $68,087 and $82,044.
Rejected the null that there is no significant difference in means between houses that have fireplaces and those that don’t.
Higher quality homes (6-10 rating) sold more than lower quality homes, accounting for 63.15% of total home sales, and brought in $196,554,295 in revenue.
With 95% confidence, the difference in means is between $93,420 and $81,678.
Rejected the null that there is no significant difference in means between houses with a quality rating of 1-5 (low) and those with a quality rating of 6-10 (high)
Homes with Central Air sold more than homes without it, accounting for 93.49% of total home sales and bringing in $254,144,859 in revenue.
With 95% confidence, the difference in means is between $71,662 and $90,182.
Rejected the null that there is no significant difference in means between houses that have central air and those that do that.
Based on these findings, specific factors exert a significant influence on the prices of homes in the market. When making informed decisions about investing in mortgage-backed securities, it is crucial to take into account these factors and their impact on home prices.
Give priority to three-bedroom homes, as they have the highest sales volume and generate substantial revenue.
Pay close attention to neighborhoods like North Ames, College Creek, and Northridge Heights, which have shown promising sales performance.
Give preference to homes with Central Air Conditioning, as they are in high demand among buyers.
Consider properties with two-car garages, as they have demonstrated strong sales performance.
Evaluate homes with an overall quality rating of 6 to 10, as they attract a significant number of buyers.
Lariat Car Rental Company is a fictitious car rental company that has been featured in a few well-known television series, such as: Breaking Bad, Surpernatural, and Veronica Mars.
Lariat is committed to driving sustainable growth and optimizing its financial performance. With a strategic focus on profitability and cost efficiency, the company has established the following goals to align its operations:
Lariat Car Rental Company aims to achieve sustainable profitability, maintain a competitive edge in the market, and deliver exceptional value to its customers while lowering business costs.
Three strategies were created to meet these goals, they are listed below:
To deliver an interactive dashboard for the shareholders that demonstrates the strategies and their predictive results. This dashboard formulates profit optimization strategies using data analysis.
Below is an image that shows the completed Excel Dashboard.
Here is a simple visual comparing the profits of each of the calculated strategies compared to the current data pulled from 2018.
This visual shows the breakdown of the profits including the Total Revenue, Total Costs, including insurance, and the Total Profits made. Strategy 1 and 2 are close together in numbers, but you can see that Strategy 1 is just above Strategy 2. And although Strategy 3 made a lower revenue, the costs were significantly lower compared to the other strategies and almost the same as the baseline data from 2018, which results in Strategy 3 having a higher profit altogether.
This visual is a Scenario Summary created in Excel to show a simple table view of the numbers for each strategy compared to the baseline data in 2018.
All calculations were done using 25% of the current vehicle inventory or 25% of the current number of rentals to ensure a more accurate comparison.
The better recommendation would be Strategy 3, to remove (x) number of vehicles that bring in the lowest net revenue and replace them with an (x) number of vehicles that bring in the highest net revenue.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'naep';
SELECT *
FROM naep
LIMIT 50;
SELECT state, COUNT(avg_math_4_score) AS count_4,
AVG(avg_math_4_score) AS avg_4,
MIN(avg_math_4_score) AS min_4,
MAX(avg_math_4_score) AS max_4
FROM naep
GROUP BY state
ORDER BY state;
SELECT state, COUNT(avg_math_4_score) AS count_4,
AVG(avg_math_4_score) AS avg_4,
MIN(avg_math_4_score) AS min_4,
MAX(avg_math_4_score) AS max_4
FROM naep
GROUP BY state
HAVING (MAX(avg_math_4_score) - MIN(avg_math_4_score)) > 30
ORDER BY state;
SELECT avg_math_4_score::float AS avg_score, state AS bottom_10_states
FROM naep
WHERE year = '2000'
ORDER BY avg_score
LIMIT 10;
SELECT ROUND(AVG(avg_math_4_score), 2) AS avg_math_4_score
FROM naep
WHERE year = '2000';
SELECT state AS below_250
FROM naep
WHERE avg_math_4_score < 250
AND year = '2000';
SELECT state AS scores_missing_y2000
FROM naep
WHERE avg_math_4_score IS NULL
AND year = '2000';
SELECT n.state, ROUND(n.avg_math_4_score, 2) AS avg_math_4_score, f.total_expenditure
FROM naep AS n
LEFT OUTER JOIN finance AS f ON n.id = f.id
WHERE n.year = '2000'
AND avg_math_4_score IS NOT NULL
ORDER BY f.total_expenditure DESC;
SELECT DISTINCT yearbuilt
FROM houseprices;
SELECT COUNT(DISTINCT mszoning) AS count_mszoning,
COUNT(DISTINCT mssubclass) AS count_mssubclass
FROM houseprices;
SELECT DISTINCT street, lotshape
FROM houseprices
ORDER BY street;
SELECT neighborhood, lotconfig, COUNT(lotconfig) AS count_lotconfig
FROM houseprices
GROUP BY neighborhood, lotconfig
ORDER BY neighborhood, count_lotconfig, lotconfig;
SELECT yearbuilt, ROUND(AVG(saleprice), 0)::float AS avg_saleprice
FROM houseprices
GROUP BY yearbuilt
ORDER BY yearbuilt DESC;
SELECT yearbuilt, ROUND(AVG(garagecars), 0)::float AS avg_garage
FROM houseprices
WHERE garagecars >= 1
GROUP BY yearbuilt;
SELECT yearbuilt, MAX(lotarea), COUNT(*)
FROM houseprices
WHERE garagecars = 0
GROUP BY yearbuilt;
SELECT yearbuilt, AVG(lotarea)::float AS avg_lot_per_year
FROM houseprices
GROUP BY yearbuilt
HAVING AVG(lotarea)::float < 10000
ORDER BY avg_lot_per_year DESC;
SELECT yearbuilt, COUNT(*)
FROM houseprices
WHERE lotarea BETWEEN 10000 AND 15000
GROUP BY yearbuilt
ORDER BY yearbuilt;
SELECT ROUND(AVG(overallqual), 0)::integer AS avg_quality,
COUNT(DISTINCT garagetype) AS garage_count, neighborhood
FROM houseprices
GROUP BY neighborhood
ORDER BY neighborhood;
SELECT ROUND(AVG(lotarea), 2)::float, yearbuilt
FROM houseprices
WHERE street != 'Grvl' AND lotconfig = 'Corner'
-- (or) WHERE street NOT IN('Grvl')
GROUP BY yearbuilt
HAVING AVG(lotarea) > 1000
ORDER BY yearbuilt;
SELECT owner_id, transaction_id, service
FROM owners AS o
INNER JOIN transactions AS t ON o.pet_id = t.pet_id
ORDER BY owner_id;
SELECT owner_id, o.pet_id, transaction_id, visits_count
FROM owners AS o
INNER JOIN transactions AS t ON o.pet_id = t.pet_id
INNER JOIN visits AS v ON t.pet_id = v.pet_id
ORDER BY transaction_id;
SELECT o.pet_id, size, visits_count AS num_visits
FROM owners AS o
INNER JOIN visits AS v ON o.pet_id = v.pet_id
WHERE visits_count >= 10
ORDER BY num_visits DESC;
SELECT owner_id
FROM owners
UNION
SELECT owner_id
FROM owners_2
ORDER BY owner_id;
SELECT owner_id, SUM(visits_count) AS num_visits
FROM owners AS o
INNER JOIN visits AS v ON o.pet_id = v.pet_id
GROUP BY owner_id
ORDER BY num_visits DESC
LIMIT 3;
SELECT o1.owner_id, transaction_id, date, service
FROM owners AS o1
LEFT OUTER JOIN transactions AS t ON o1.pet_id = t.pet_id
WHERE transaction_id IS NOT NULL
UNION ALL
SELECT o2.owner_id, transaction_id, date, service
FROM owners_2 AS o2
LEFT OUTER JOIN transactions AS t ON o2.pet_id = t.pet_id
WHERE transaction_id IS NOT NULL
ORDER BY date, transaction_id, owner_id DESC;
SELECT CONCAT(owner_id, ', ', o.pet_id) AS owner_pet, visits_count
FROM owners AS o
LEFT OUTER JOIN visits AS v ON o.pet_id = v.pet_id
WHERE visits_count >= 3
UNION ALL
SELECT CONCAT(owner_id, ', ', o2.pet_id) AS owner_pet, visits_count
FROM owners_2 AS o2
LEFT OUTER JOIN visits AS v ON o2.pet_id = v.pet_id
WHERE visits_count >= 3
ORDER BY visits_count DESC, owner_pet;
SELECT owner_id, v.pet_id, visits_count
FROM visits AS v
LEFT OUTER JOIN owners AS o ON v.pet_id = o.pet_id
ORDER BY visits_count DESC;
SELECT owner_id, t.pet_id, date, service
FROM transactions AS t
LEFT OUTER JOIN owners AS o ON t.pet_id = o.pet_id
WHERE date = '2019-06-17'
OR service = 'haircut'
ORDER BY date;
SELECT pet_id, service,
CASE
WHEN service = 'nails' THEN 'gift'
ELSE 'no gift'
END AS get_gift
FROM transactions
ORDER BY get_gift;
SELECT date, COUNT(transaction_id)
FROM transactions
WHERE date IN('2019-06-17', '2019-06-18')
GROUP BY date;
WITH all_owners AS
(
SELECT *
FROM owners
UNION ALL
SELECT *
FROM owners_2
)
SELECT size, COUNT(size) AS size_count
FROM all_owners
GROUP BY size
ORDER BY size DESC;
SELECT COUNT(*)
FROM purchases
SELECT COUNT(*) AS num_after_may
FROM purchases
WHERE EXTRACT(month FROM created_at) > 5;
SELECT name, state
FROM purchases
WHERE state = 'FL';
SELECT COUNT(*)
FROM purchases
WHERE zipcode = 11065;
SELECT COUNT(*)
FROM purchases
WHERE user_id BETWEEN 1 AND 10
AND state NOT IN('FL', 'GA');
SELECT COUNT(*)
FROM purchases
WHERE zipcode IN(99652, 11065, 66513);
SELECT COUNT(*)
FROM purchases
WHERE user_id BETWEEN 10 AND 50
AND user_id NOT IN(20, 30);
SELECT LEFT(UPPER(name), 5) AS upper, address, state, zipcode
FROM purchases
ORDER BY name;
SELECT CONCAT('name: ', name, '; date: ', DATE(created_at)) AS name_date
FROM purchases;
SELECT zipcode
FROM purchases
WHERE zipcode IS NULL
ORDER BY name;
SELECT user_id, created_at, name, address
FROM purchases
WHERE state = 'GA' AND user_id = 18
OR user_id = 20
ORDER BY created_at DESC;
SELECT name, address, state, zipcode
FROM purchases
WHERE state = 'IL'
OR (zipcode BETWEEN 30000 AND 80000
AND (LEFT(address, 2) = '23' OR LEFT(address, 2) = '12'))
AND state IN ('CO', 'TX', 'WY');
SELECT *
FROM purchases
ORDER BY zipcode DESC
LIMIT 10;
SELECT *
FROM purchases
WHERE LEFT(name, 1) = 'S'
-- (or) WHERE name LIKE 'S%'
ORDER BY name;
SELECT *
FROM purchases
WHERE LEFT(name, 1) = 'S' OR LEFT(name, 1) = 'T'
-- (or) WHERE name LIKE 'S%' OR name LIKE 'T%'
ORDER BY name;
SELECT COUNT(*)
FROM purchases
WHERE LENGTH(name) >= 15;
SELECT COUNT(*)
FROM purchases
WHERE LEFT(state, 1) > 'M'
AND zipcode > 50000
AND user_id > 10;
SELECT *
FROM purchases
WHERE name LIKE '__e%';
I am currently looking for opportunities as a Data Analyst.
With proficiency in dashboarding, performance tracking, Python (Pandas, NumPy, Seaborn), Excel, SQL, Tableau, and R to transform data into meaningful and easily understood visualizations and presentations, I enjoy leveraging data-driven insights to uncover meaningful patterns, optimize business strategies, and drive informed decision-making.
I grew up around the New England area in the U.S. but have traveled all over since my service in the U.S. Navy. Outside of data, I enjoy traveling, history, video games, and food!
View my Resume here!
Experience
2019-2022
Surveillance Officer
Reviewed security footage using Genetek and iTrak systems to report Daily Logs and notate incidents throughout the property.
Conducted routine audits on employees spanning 114 live table games to ensure compliance with regulations and protocols.
Analyzed visual information for over 4,200 slot machines and virtual table gaming machines with attention to detail to identify patterns and anomalies.
2017-2019
Security Officer
Collaborated with cross-functional teams to provide insights based on surveillance data on three different properties.
Prepared written reports of daily activities, observations, and incidents to ensure accurate documentation.
Monitored CCTV security footage for suspicious activity, trespassers, and potential threats.
2012-2016
AME2 - Aviation Structural Mechanic, Safety Equipment
Leveraged complex problem-solving skills to review relevant information and develop solutions on 12 F/A-18E/F Super Hornet naval aircraft.
Identified the cause of operating errors and provided efficient troubleshooting while using pertinent data and individual judgment to ensure compliance with naval laws, regulations, and standards.
Executed over 1,894 maintenance actions specializing in Environmental Control Systems, Life Support Systems, Egress / Ejection Seat maintenance, and basic corrosion control.
Contact