Resistance, Mean, Median, 5 Number Summary and BoxPlots

Hello, Mr. Tarrou again. We are going to now
start learning how to describe distributions with numbers as opposed to just describing
their shape, center, spread, and outliers with a visual inspection. We are going to
put some concrete numbers on those, go over the 5 number summary more thoroughly, and, especially
after this video, talk about what standard deviation means and break it down into very, very
small parts. So please, if you would like an organized set of notes for your class and you
don’t already have some, hit pause and copy these.

Ok, so Mr. Tarrou back. I have written the notes for the next section of the textbook.
Because of how you get my notes a lot of these vocabulary words, and even some of the formulas
and the descriptions, you would have seen in a previous video. But, to rehash, we are
going to describe distributions with numbers. One vocabulary word I have not emphasized
very much yet, even though I have said it, is this idea of resistant. Now, a resistant
measure is a measurement that is not strongly affected by outliers. And my example is here.
Very simple example. There are four numbers. Numbers 1, 2, 3, and 4 are one, two, two, and three. If
we look at these four numbers we have a mean of two and a median of two. GREAT! Two measures
of center. They are equal because the distribution is symmetric…of my four simple numbers.
Now if I bring in the dark evil outlier of 100. Well, now my mean jumps up greatly. When
I take these five numbers and add them up and have to divide by 5, my mean changes very,
very much. It goes from 2 to about 21.6. That is a huge difference. The reason why is that
when you calculate the mean, you have to use all of the data points. Well, when you add in
that 100, it gets thrown into the calculation. Whereas the median still stays at a value of
2. So the mean is not resistant; it changed greatly due to the outlier. The
median, excuse me, now that is a resistant measure. It does not have to stay exactly the same. The definition
of resistant is that it is not strongly affected by outliers. Now in intro to statistics we
actually have just a very small number of measurements that are resistant. We have median,
we have the IQR and it is calculated through the median process so the IQR is resistant,
and if you want to say the mode is resistant as well you can. Though that is not something
we normally worry too much about. But really there are only two measurements, or if you want
to include mode there are 3 measurements, in intro statistics that are resistant… median, the
IQR which uses the median in its calculation, and maybe if you want to consider mode. Everything
else in the book, all of the other formulas you are going to learn this year, are not going
to be resistant. They are going to be heavily influenced and affected by outliers.
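
To see resistance in numbers, here is a minimal Python sketch of the example above, using the standard library's `statistics` module. The variable names are made up for illustration; only the values 1, 2, 2, 3 and the outlier 100 come from the video.

```python
from statistics import mean, median

# The four values from the example, then the same values plus the outlier.
data = [1, 2, 2, 3]
with_outlier = data + [100]

# The mean adds up every value and divides by how many there are,
# so the outlier of 100 gets thrown into the calculation.
mean_before = mean(data)
mean_after = mean(with_outlier)

# The median only looks at the middle of the sorted values.
median_before = median(data)
median_after = median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean moves from 2 to 21.6 while the median stays at 2, which is what "not strongly affected by outliers" looks like in practice.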

So the mean. The mean, again, is the measure of center closest to the tail in a skewed distribution. Here is
the formula for calculating the mean: x̄ = (1/n) Σ xᵢ. You have got this really scary looking Greek symbol in
it, a summation symbol. Don’t fret. You have known how to find the mean for
years now. You know to find the mean, you just add all your numbers and divide by how
many you have. That is what this formula says. Add all of your individual, with the x sub
i, add all of your individual x’s and then when you are done, multiply that by 1/n or
divide by how many numbers you have. So statistics formulas look a lot scarier than they really
are. That Greek summation symbol (Sigma) just means add over, and over, and over, and over,
and over again, which is why we love graphing calculators and computers now to help us with
our statistics. We just learn the concepts and learn how to analyze and let the computers
do the grunt work. x bar is the mean of a sample. Mu is
the mean of a population. This will be given to us, basically never calculated by us in
intro to statistics. The mean is the measure of center for bell curves. If you have a bell curve distribution,
a unimodal symmetric distribution you want to use the mean. That is the best measure
of center for symmetric unimodal distributions. Uh, means are not resistant. I seem to have
somebody at the door so I will be right back. Ok, so he came for a textbook. And we are
back. So the mean is closest to the tail if you have a skewed distribution, the formula,
the measure of center, the best choice if you have a bell curve, and again it is not
resistant. Median and IQR are the only resistant measures that we have and if you want to throw
in mode, that is fine.

Ok. Median, a measure of center that we have talked about already, quite a bit.
It is used for non-normal data. It is the middle number or the average of
the two middle numbers. And the median is resistant like I just said twice now which
is why it is the best measure of center if you have non-normal data. Let’s key in on
this idea that the median is the middle number or the middle two numbers. We have had some
problems in the textbook about… if you have a group of people, say 10 people and their
average income is $40,000. Well, if I tell you that out of those ten people the average
income is $40,000, I could just take the $40,000, multiply it by 10 and say ok, together they make
$400,000 in total. I can do that because when I say that the average is $40,000 I have
added all ten of those numbers up and divided by 10. The median value
is calculated from only the middle number or the middle two numbers. So, if I told you that
ten people had a median income of $40,000… how much do the ten people make
in total? I don’t know. I only used one or two numbers to calculate the median. I don’t
know how small the other numbers are, the other what… 4 or 5 numbers. And I don’t
know how large the other 4 or 5 values are. I say 4 or 5 because you can vary a little
bit in how you find the median and what you do from there. So, you need to realize the difference
in the calculation of means and medians and how that can affect some of your questions
from the textbook. Again, the median is resistant.

Quartiles. I gave you an example of that in
a previous video. If you want to find quartiles, sort your data from low to high and find the median.
Then take the lower half of your numbers and find its median, that is Q1. Take the upper
half of your numbers and find its median, that is Q3. Again if you want an example of
that look at the previous video. Alright. So going over to some more summaries, we have
got the 5 number summary which I have just maybe mentioned in passing. I kind of forgot
what I have said by now. And range and percentiles. Some of these values again I have already
mentioned. We are moving into a graphical display of a box plot in a minute. So here
are some other concepts in section 1.2 of my textbook. You will also maybe have seen
these in other videos because I repeat myself a lot in the beginning of the year trying
to… You can’t give notes when no one has a clue about what you are talking about. Five
number summary. That is min, Q1, median, Q3, and max. The graphical display of the five
number summary is the box plot. The left whisker starts at the min. The box starts at the Q1
value, the line inside the box is the median, the box ends at Q3, and the right whisker ends at
the max. We are going to break that apart some more here in a minute.
Trying to make sure that I don’t run out of time. Your range, again I have said that a
number of times, that is the max minus the min value. The range is one single value.
So if your low is 20 and your highest is 100, your range is 80. Don’t say that your range
is between 20 and 100. I used to do that and got corrected eventually after a few years
of teaching. That was a little bit late, but anyway. Percentiles. You know if you take
the SAT and you score in the 95th percentile, then you scored better than 95 percent of
your fellow students. The graphical display for percentiles, we just covered that. It is an ogive, also called
a relative cumulative frequency graph. So make sure you go and look those up.

Let’s talk a little bit about box plots. They are our graphical display of the five number summary.
They display it. Ok, great. And one more time, your five number summary is what we use to
describe the center and spread of non-normal data. It is not written up here because I have not
talked about standard deviation a lot, but if it is a bell curve… a symmetric unimodal
distribution, your best description is not the 5 number summary but mean and standard
deviation. We will talk about that later again. Box plots can show skewness, but not gaps.
So let me explain what I am talking about there. uh… Here we go. So here is a box
and whisker plot. Ok. We have talked about left skewed and how if it was left skewed
the data would just sort of taper off to the left and become more spread out. Does this
box plot look left skewed? The answer is no. Because these three quarters are all very
evenly spaced with just one long whisker. That is probably due to an outlier. To show
skewness, it would need to… and it will not be perfect. We always deal with real life
data so this will vary. But you are looking for your segments to kind of increase in width
as you go to the left; the opposite would be right skewed. This would be more
like a left skewed distribution with a tight upper 25 percent and as I work my way down
towards the minimum my gaps get farther and farther apart. This is skewness. One thing
that I have not written up here though is one of the very best uses of box plots. They
are not to analyze the shape of the data. That is what we have histograms for. No. If
you go into your calculator you can put data in one, two, or three lists and you can display
up to three of those box plots at once. You will also see examples in your textbook where
there are multiple box plots together, maybe up to four or five of them. You want to use
these box plots for comparing multiple distributions. You put those on the same scale, usually
the x axis, though sometimes your book will put them vertically. When they are side by side then you can talk
about any comparisons or contrasts you see between these. Uh… With these being the
medians, maybe the data values that these two box plots now represent… that I made
up that don’t have any numbers… but the median values are the same but this grouping
of data has a much smaller range or a much smaller spread. Later when we are talking about standard
deviation, it has a much smaller standard deviation. All this box plot shows is that
the data is more spread out. This might actually have more numbers here than up in this graph,
because you cannot see how much data is displayed, just how far it is spread out. This minimum
is smaller than this minimum. The maximum is larger than this maximum value. The medians
are about the same. Remember if you are describing distributions… shape, center, spread, and
outliers. I am just doing a quick overview. Well, I just repeated here what this says.
And I think I am running off screen. Let me see if I have any time left. Yes I do.

So one last concept before we call this video a wrap is modified box plots. Now we are going
to have a video… actually we have already done a video of how to make histograms and
box plots with our graphing calculators. So we definitely want to make sure we know how
to do that. But um, I have not introduced it in an official set of notes yet. Modified
box plots. Those are awesome because they take care of the IQR test for us which means
that they will show us the outliers with individual points on the side. I just talked about how
you do that IQR test. You take the IQR, which is Q3 - Q1, and multiply it by 1.5. You add that number
to Q3, you subtract that number from Q1 and you get these boundaries… these imaginary boundaries
you don’t want your data to fall outside of. Well, your calculator will do that for you
and when you see this box and whisker plot or box plot your whiskers will start and end
at the first or the last piece of data that actually passes the IQR test. So if the lower
bound is maybe 40 but the first piece of data that is greater than that lower bound is 50,
then that whisker is going to be at 50 not 40… not that calculated lower bound that
we set up by doing the IQR test. You definitely want to choose this from your calculator because
why not get more information from your graph automatically if you can. Well, we are going
to erase all of this and talk about standard deviation in a second and talk about properties
of standard deviation which is the most important measure of spread that we are going to talk
about this year in Intro to Statistics. BAM!
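
As a recap of the IQR test behind modified box plots, here is a rough Python sketch. The data set is made up so that the lower bound lands at 40 and the smallest value that passes the test is 50, like the example in the video. The function name `iqr_fences` is hypothetical, and the quartiles use the median-of-halves method with the middle value left out for odd counts, which is an assumption; a calculator or software package may compute quartiles slightly differently.

```python
from statistics import median

def iqr_fences(values):
    """Return the (lower, upper) bounds from the 1.5 * IQR test."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[n // 2 + n % 2 :]   # skip the middle value when n is odd
    q1, q3 = median(lower_half), median(upper_half)
    step = 1.5 * (q3 - q1)                # IQR times 1.5
    return q1 - step, q3 + step

# Made-up data, for illustration only.
data = [10, 50, 60, 62, 64, 64, 66, 70]
low_fence, high_fence = iqr_fences(data)

# Values outside the bounds are drawn as individual points.
outliers = [x for x in data if x < low_fence or x > high_fence]

# The whiskers stop at the first and last values that pass the test,
# not at the calculated bounds themselves.
inside = [x for x in data if low_fence <= x <= high_fence]
whiskers = (min(inside), max(inside))

print("bounds:", low_fence, high_fence)
print("outliers:", outliers)
print("whiskers:", whiskers)
```

Here the bounds are 40 and 80, the value 10 is flagged as an outlier, and the left whisker ends at 50 rather than at the calculated lower bound of 40.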

27 thoughts on “Resistance, Mean, Median, 5 Number Summary and BoxPlots”

  • September 24, 2011 at 5:48 am

    Very nice bro. Good explanation A++++

  • September 24, 2011 at 12:04 pm

    @Evousa Thank you very much. I am really enjoying making these videos this year. There will be many more to follow.

  • December 15, 2011 at 8:04 pm

    @linday1ful Thank you! I hope you do great in your class. Please spread the word about my new math channel I am building:)

  • January 30, 2012 at 8:49 pm

    I like the quality of this video

  • January 30, 2012 at 9:01 pm

    @Starr169 Thanks.

  • February 18, 2012 at 8:09 pm

    if i have non-normal data, which one of the below that i am able to use to perform a parametric statistic test?
    a) anova
    b) box-plots
    c) Correlation
    d) paired t-test of coefficients

    please help me!!! Thanks!!!

  • February 18, 2012 at 9:23 pm

    @mypostedtestvideo321 I have only taught statistics from three editions of the same book over the years of teaching AP Statistics…and it does not use the vocabulary of parametric statistic test. So I don't really know how to answer your question. Anova is also not language used in my book. I will say the only thing that I teach that allows you to work around non-normality is the Central Limit Theorem…which states that with a large enough sample the distribution of averages will be…

  • February 18, 2012 at 9:30 pm

    @mypostedtestvideo321 …approximately normal even if the population is not. Box-plots just show shape…Correlation measures strength and direction of a linear relationship…t-tests are a test for when you are working with averages and you are working with the sample standard deviation.

  • February 22, 2012 at 8:13 pm

    @Angeliniwini Thanks for the positive feedback. I really appreciate it:) My book does not have the concept of extreme outliers. It sounds like you do Q3+3(IQR) for the upper bound and Q1-3(IQR) for the lower bound. I will look it up tomorrow and get back to you. Keep up the good work!

  • April 4, 2012 at 7:40 pm

    Thanks. The necklace and then the pendant were 2 separate gifts:)

  • January 25, 2014 at 11:28 am

    First thx for all the great work u are doing but i am wondering about which book you are talking? Thx 🙂

  • May 30, 2014 at 7:38 pm

    "Scary looking Greek symbol" – I like your humor. 😀 I've learned some useful things in this video including that the mean is the closest to the tail of a skewed distribution. 🙂

  • September 10, 2014 at 3:05 am

    I'm so glad you have Statistics! I used your videos for Algebra, Algebra 2, and now Stats! Thanks to you, math doesn't seem as terrifying anymore. A sincere Thank You, Mr. Tarrou (:

  • January 23, 2015 at 5:30 am

    You are an amazing teacher!  Simplifying what many complicate!  You make math enjoyable, as it should be!  Kudos to you Sir! 🙂  

  • February 17, 2015 at 7:26 pm

    I just want to say that if not for your videos I would have really struggled in my precalc and trig classes. Thanks to you I passed both of those classes and now I'm taking Stats I and I have my first exam today and I feel like I will do well. Just by uploading these videos you have helped so many people. Thank YOU so much!

  • February 19, 2015 at 3:49 pm

    I am so excited to see you do Statistics videos, too!  I watched your videos all through Pre Calculus, and I love your style of teaching!  Thank you!!

  • February 22, 2015 at 5:19 am

    Thank you so much for making all of your videos. They are so helpful and clear! I was so confused about a lot of the statistics information on my upcoming test and your videos really clarified a lot for me. So thank you very much.
    Is there any possibility you could please do a video on frequency tables? (about marginal frequencies for two way tables, two-way contingency tables,  and conditional relative frequencies and association) ? It would be so greatly appreciated!
    Thanks again Professor! 

  • March 31, 2015 at 10:08 pm

    I wish I knew you taught Stats; otherwise, I would have watched your videos when I had the class the first semester. I am glad I saw your channel for Statistics as I can easily prepare for the AP exam with your assistance as well as my prep books.
    Thank you so much 🙂

  • August 25, 2015 at 4:30 am

    Can you be more specific with "Non-Normal" data?? Thanks!

  • September 24, 2015 at 9:04 pm

    hello professor,Suppose we made 7 trips in day and calculated the mean value from given dataset of 7 trips, after the eighth trip of the day, the mean distance traveled remains unchanged. What is the distance of the eighth trip? and what about standard deviation of distance ,When we add the eighth trip to the data set, Does it increase, decrease or stay the same?

  • July 19, 2016 at 3:17 pm

    what textbook do you use for your AP stat class?

  • September 20, 2016 at 4:12 am

    I love this teacher .damn great lecture.

  • May 7, 2017 at 6:28 pm

    subscribed to your channel! I wish I discovered them earlier. Will definitely recommend to my friends, thank you!

  • May 11, 2017 at 12:19 am

    Watching your videos to study last minute for the AP exam tomorrow – such a big help!

  • September 20, 2017 at 3:37 am

    Dear Prof Rob, I'm a new stats teacher. My book asks the students to consider if events are independent based on the numbers in a marginal distribution chart. One example was looking at the people who died on the titanic. We thought that if the relative frequencies were different, the data was independent. the author of the text said that was wrong. Because the percentages were different there was some interaction or a hidden influence that we couldn't see. Well, we all don't get it. Can you shed some light on this apparent backwards thinking?

  • August 30, 2018 at 6:27 pm

    Prof Rob bob you didn’t jump to the front of the board in the beginning?? 😮

  • September 10, 2018 at 1:39 am

    The wonderfully clear way by which you relay your knowledge is immensely helpful and the positive impact / motivation your videos produce feels priceless, and cannot be said enough.

    Thank you for doing soo much!!
    I felt like only saying ‘thank you’ wouldn’t quite cut it ^_^

    BAM!
