Hello, Mr. Tarrou again. We are going to now

start learning how to describe distributions with numbers as opposed to just describing

their shape, center, spread, and outliers with a visual inspection. We are going to

put some concrete numbers on those, more thoroughly go over the 5 number summary, and especially

after this video talk about how to find standard deviation and what it means, and break it

down into very very small parts. So please if you would, if you would like an organized

set of notes for your class, if you don’t already have some hit pause and copy these.

Ok, so Mr. Tarrou back. I have written the next notes for the next section of the textbook.

Because of how you get my notes a lot of these vocabulary words, and even some of the formulas

and the descriptions, you would have seen in a previous video. But, to rehash, we are

going to describe distributions with numbers. One vocabulary word I have not emphasized

very much yet, even though I have said it, is this idea of resistant. Now, a resistant

measure is a measurement that is not strongly affected by outliers. And my example is here.

Very simple example. There are four numbers: one, two, two, and three. If

we look at these four numbers we have a mean of two and a median of two. GREAT! Two measures

of center. They are equal because the distribution is symmetric…of my four simple numbers.

Now if I bring in the dark evil outlier of 100. Well, now my mean jumps up greatly. When

I take these five numbers and add them up and have to divide by 5, my mean varies very,

very much. It goes from 2 to about 21.6. That is a huge difference. The reason is that

when you calculate mean, you have to use all of the data points. Well, when you add in

that 100, it gets thrown into the calculation. Whereas the median still stays a value of

2. So the mean is not resistant; it changed greatly due to the outlier. The

median, excuse me, now is a resistant measure. It does not have to stay the same. The definition

of resistant is that it is not strongly affected by outliers. Now in intro to statistics we

actually have just a very small number of measurements that are resistant. We have median,

we have the IQR and it is calculated through the median process so the IQR is resistant,

and if you want to say the mode is resistant as well you can. Though that is not something

we normally worry too much about. But really there are only two measurements, if you want

to include mode there are 3 measurements in statistics that are resistant… median, the

IQR which uses the median in its calculation, and maybe if you want to consider mode. Everything

else in the book, all of the other formulas you are going to learn this year are going

to be not resistant. They are going to be heavily influenced and affected by outliers.
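
As a quick check of the resistance example above, here is a minimal Python sketch using the standard library's `statistics` module. The variable names are only for illustration; the numbers are the four values from the notes plus the outlier of 100.

```python
from statistics import mean, median

# The four simple numbers from the notes, then the same data with the outlier added.
data = [1, 2, 2, 3]
with_outlier = data + [100]

mean_before = mean(data)              # uses every data point
median_before = median(data)          # uses only the middle two values
mean_after = mean(with_outlier)       # the 100 gets thrown into the sum
median_after = median(with_outlier)   # still just the middle value

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

The mean moves from 2 to 108/5, about 21.6, while the median stays at 2, which is what "resistant" is describing.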

So the mean. The mean again is closest to the tail in a skewed distribution. Here is

the formula for calculating mean. You got this really scary looking Greek symbol in

it, a summation symbol. Don’t fret. You have learned and have known how to find mean for

years now. You know to find mean, you just add all your numbers and divide by how

many you have. That is what this formula says. Add all of your individual, with the x sub

i, add all of your individual x’s and then when you are done, multiply that by 1/n or

divide by how many numbers you have. So statistics formulas look a lot scarier than they really

are. That Greek summation symbol (Sigma) just means add over, and over, and over, and over,

and over, again which is why we love graphing calculators and computers now to help us with

our statistics. We just learn the concepts and learn how to analyze and let the computers

do the grunt work. x bar is the mean of a sample. Mu is

the mean of a population. This will be given to us, basically never calculated by us in

intro to statistics. The measure of center for bell curves. If you have a bell curve distribution,

a unimodal symmetric distribution you want to use the mean. That is the best measure

of center for symmetric unimodal distributions. Uh, means are not resistant. I seem to have

somebody at the door so I will be right back. Ok, so he came for a textbook. And we are

back. So the mean is closest to the tail if you have a skewed distribution, the formula,

the measure of center, the best choice if you have a bell curve, and again it is not

resistant. Median and IQR are the only resistant measures that we have and if you want to throw

in mode, that is fine. Ok. Median, a measure of center that we have talked about already.

Quite a bit. It is used for non-normal data. It is the middle number or the average of

the two middle numbers. And the median is resistant like I just said twice now which

is why it is the best measure of center if you have non-normal data. Let’s key in on

this idea that the median is the middle number or the middle two numbers. We have had some

problems in the textbook about… if you have a group of people, say 10 people and their

average income is $40,000. Well, if I could tell you out of those ten people the average

income is $40,000 I could just take the $40,000, multiply it by 10 and say ok, together they make

$400,000 total. I can do that because when I say that the average is $40,000 I have

added all ten of those numbers up and divided by 10. The middle number or the median value

is calculated from only the middle number or the middle two numbers. So, if I told you that

ten people had made a median of 40,000 dollars of income… how much do the ten people make

in total? I don’t know. I only used one or two numbers to calculate the median. I don’t

know how small the other numbers are, the other what… 4 or 5 numbers. And I don’t

know how large the other 4 or 5 values are. I say 4 or 5 because you can vary a little

bit in how you find the median and what you do from there. So, you need to realize the difference

in the calculation in means and medians and how that can affect some of your questions

from the textbook. Again it is resistant. Quartiles. I gave you an example of that in

a previous video. If you want to find quartiles, put the data in order from low to high and find the median.

Then take the first half of your numbers and find the median, that is Q1. Take the second

half of your numbers and find the median, that is Q3. Again if you want an example of

that look at the previous video. Alright. So going over to some more summaries, we have

got the 5 number summary which I have just maybe mentioned in passing. I kind of forgot

what I have said by now. And range and percentiles. Some of these values again I have already

mentioned. We are moving into a graphical display of a box plot in a minute. So here

are some other concepts in section 1.2 of my textbook. You will also maybe have seen

these in other videos because I repeat myself a lot in the beginning of the year trying

to… You can’t give notes when no one has a clue about what you are talking about. Five

number summary. That is min, Q1, median, Q3, and max. The graphical display of the five

number summary is the box plot. The left whisker is the min. Where the box starts is the Q1

value, median, Q3, and the max. We are going to break that apart some more here in a minute.

Trying to make sure that I don’t run out of time. Your range, again I have said that a

number of times, that is the max minus the min value. The range is one single value.

So if your low is 20 and your highest is 100, your range is 80. Don’t say that your range

is between 20 and 100. I used to do that and got corrected eventually after a few years

of teaching. That was a little bit late, but anyway. Percentiles. You know if you take

the SAT’s and you score in the 95th percentile, then you scored better than 95 percent of

your fellow students. Graphical display, we just covered that. It is an Ogive graph or

a Relative Cumulative Frequency Distribution. So make sure you go and look those up. Let’s

talk a little bit about box plots. They are our graphical display of the five number summary.
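
As a rough sketch of the five number summary and range from these notes, the Python below follows the quartile recipe described earlier (median of the lower half for Q1, median of the upper half for Q3). The function name `five_number_summary`, the made-up `scores`, and the choice to leave the middle value out of both halves when the count is odd are assumptions for illustration; textbooks and calculators do not all split the halves the same way.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves recipe."""
    ordered = sorted(values)            # low to high
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = ordered[:half]              # first half of the numbers
    upper = ordered[half + n % 2:]      # second half; skips the middle value when n is odd (assumption)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Made-up data with a low of 20 and a high of 100, as in the range example.
scores = [20, 35, 40, 50, 55, 60, 70, 85, 100]
minimum, q1, med, q3, maximum = five_number_summary(scores)
data_range = maximum - minimum          # one single value, not "between 20 and 100"

print(minimum, q1, med, q3, maximum)
print("range:", data_range)
```

With this data the range comes out as the single value 80, matching the low of 20 and high of 100 mentioned above.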

They display it. Ok, great. One more time, your five number summary is what we use to

describe the shape of non-normal data. It is not written up here because I have not

talked about standard deviation a lot, but if it is bell curve… a symmetric unimodal

distribution your best description for shape is not the 5 number summary but mean and standard

deviation. We will talk about that later again. Box plots can show skewness, but not gaps.

So let me explain what I am talking about there. uh… Here we go. So here is a box

and whisker plot. Ok. We have talked about left skewed and how if it was left skewed

the data would just sort of taper off to the left and become more spread out. Does this

box plot look left skewed? The answer is no. Because these three quarters are all very

evenly spaced with just one long whisker. That is probably due to an outlier. To show

skewness, it would need to… and it will not be perfect. We always deal with real life

data so this will vary. But you are looking for your segments to kind of increase in width

as you go to the left; the opposite would be right skewed. This would be more

like a left skewed distribution with a tight upper 25 percent and as I work my way down

towards the minimum my gaps get farther and farther apart. This is skewness. One thing

that I have not written up here though is one of the very best uses of box plots. They

are not to analyze the shape of the data. That is what we have histograms for. No. If

you go into your calculator you can put data in one, two, or three lists and you can display

up to three of those box plots at once. You will also see examples in your textbook where

there are multiple box plots together, maybe up to four or five of them. You want to use

these box plots for comparing multiple distributions. You put those on the same x scale, usually

x, sometimes your book will put them vertically. When they are side by side then you can talk

about any comparisons or contrasts you see between these. Uh… With these being the

medians, maybe the data values that these two box plots now represent… that I made

up that don’t have any numbers… but the median values are the same but this grouping

of data has a much smaller range or a much smaller spread. Later when we are talking about standard

deviation, it has a much smaller standard deviation. All this box plot shows is that

the data is more spread out. This might actually have more numbers here than up in this graph.

Because you cannot see how much data is displayed, just how far it is spread out. This minimum

is smaller than this minimum. The maximum is larger than this maximum value. The medians

are about the same. Remember if you are describing distributions… shape, center, spread, and

outliers. I am just doing a quick overview. Well, I just repeated here what this says.

And I think I am running off screen. Let me see if I have any time left. Yes I do. So

one last concept before we call this video a wrap is modified box plots. Now we are going

to have a video… actually we have already done a video of how to find histograms and

box plots with our graphing calculators. So we definitely want to make sure we know how

to do that. But um, I have not introduced it in an official set of notes yet. Modified

box plots. Those are awesome because they take care of the IQR test for us which means

that they will show us the outliers with individual points on the side. I just talked about how

you do that IQR test. It is the IQR, which is Q3 - Q1, times 1.5. You add that number

to Q3, you subtract that number from Q1 and you get this boundary… this imaginary boundary

you don’t want your data to fall outside of. Well, your calculator will do that for you

and when you see this box and whisker plot or box plot your whiskers will start and end

at the first or the last piece of data that actually passes the IQR test. So if the lower

bound is maybe 40 but the first piece of data that is greater than that lower bound is 50,

then that whisker is going to be at 50 not 40… not that calculated lower bound that

we set up by doing the IQR test. You definitely want to choose this from your calculator because

why not get more information from your graph automatically if you can. Well, we are going

to erase all of this and talk about standard deviation in a second and talk about properties

of standard deviation which is the most important measure of spread that we are going to talk

about this year in Intro to Statistics. BAM!
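
The IQR test and the whisker rule for modified box plots described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `quartiles` and `iqr_test` and the data set are made up, and the quartiles use the median-of-halves recipe from these notes, which may differ slightly from what a particular calculator does.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumed method)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return median(ordered[:half]), median(ordered[half + n % 2:])

def iqr_test(values):
    """Compute the 1.5 * IQR boundaries, the outliers, and where the whiskers end."""
    q1, q3 = quartiles(values)
    step = 1.5 * (q3 - q1)                      # IQR times 1.5
    lower_bound = q1 - step                     # subtract from Q1
    upper_bound = q3 + step                     # add to Q3
    inside = [v for v in values if lower_bound <= v <= upper_bound]
    outliers = sorted(v for v in values if v < lower_bound or v > upper_bound)
    return {
        "lower_bound": lower_bound,
        "upper_bound": upper_bound,
        "whisker_low": min(inside),             # first data value that passes the test
        "whisker_high": max(inside),            # last data value that passes the test
        "outliers": outliers,                   # drawn as individual points
    }

# Made-up data chosen so the lower bound is 40 and the first value above it is 50.
data = [10, 50, 68, 72, 74, 76, 80, 84, 88, 92, 96, 100]
result = iqr_test(data)
print(result)
```

Here the calculated lower bound is 40, but the left whisker stops at 50 because that is the smallest actual data value inside the boundary; the 10 is plotted on its own as an outlier.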

Very nice bro. Good explanation A++++

@Evousa Thank you very much. I am really enjoying making these videos this year. There will be many more to follow.

@linday1ful Thank you! I hope you do great in your class. Please spread the word about my new math channel I am building:)

I like the quality of this video

@Starr169 Thanks.

if i have non-normal data, which one of the below that i am able to use to perform a parametric statistic test?

a) anova

b) box-plots

c) Correlation

d) paired t-test of coefficients

please help me!!! Thanks!!!

@mypostedtestvideo321 I have only taught statistics from three editions of the same book over the years of teaching AP Statistics…and it does not use the vocabulary of parametric statistic test. So I don't really know how to answer your question. Anova is also not language used in my book. I will say the only thing that I teach that allows you to work around non-normality is the Central Limit Theorem…which states that with a large enough sample the distribution of averages will be…

@mypostedtestvideo321 …approximately normal even if the population is not. Box-plots just show shape…Correlation measures strength and direction of a linear relationship…t-tests are a test for when you are working with averages and you are working with the sample standard deviation.

@Angeliniwini Thanks for the positive feedback. I really appreciate it:) My book does not have the concept of extreme outliers. It sounds like you do Q3+3(IQR) for the upper bound and Q1-3(IQR) for the lower bound. I will look it up tomorrow and get back to you. Keep up the good work!

Thanks. The necklace and then the pendant were 2 separate gifts:)

First thx for all the great work u are doing but i am wondering about which book you are talking? Thx 🙂

"Scary looking Greek symbol" – I like your humor. 😀 I've learned some useful things in this video including that the mean is the closest to the tail of a skewed distribution. 🙂

I'm so glad you have Statistics! I used your videos for Algebra, Algebra 2, and now Stats! Thanks to you, math doesn't seem as terrifying anymore. A sincere Thank You, Mr. Tarrou (:

You are an amazing teacher! Simplifying what many complicate! You make math enjoyable, as it should be! Kudos to you Sir! 🙂

I just want to say that if not for your videos I would have really struggled in my precalc and trig classes. Thanks to you I passed both of those classes and now I'm taking Stats I and I have my first exam today and I feel like I will do well. Just by uploading these videos you have helped so many people. Thank YOU so much!

I am so excited to see you do Statistics videos, too! I watched your videos all through Pre Calculus, and I love your style of teaching! Thank you!!

Thank you so much for making all of your videos. They are so helpful and clear! I was so confused about a lot of the statistics information on my upcoming test and your videos really clarified a lot for me. So thank you very much.

Is there any possibility you could please do a video on frequency tables? (about marginal frequencies for two way tables, two-way contingency tables, and conditional relative frequencies and association) ? It would be so greatly appreciated!

Thanks again Professor!

I wish I knew you taught Stats; otherwise, I would have watched your videos when I had the class the first semester. I am glad I saw your channel for Statistics as I can easily prepare for the AP exam with your assistance as well as my prep books.

Thank you so much 🙂

Can you be more specific with "Non-Normal" data?? Thanks!

hello professor,Suppose we made 7 trips in day and calculated the mean value from given dataset of 7 trips, after the eighth trip of the day, the mean distance traveled remains unchanged. What is the distance of the eighth trip? and what about standard deviation of distance ,When we add the eighth trip to the data set, Does it increase, decrease or stay the same?

what textbook do you use for your AP stat class?

I love this teacher .damn great lecture.

subscribed to your channel! I wish I discovered them earlier. Will definitely recommend to my friends, thank you!

Watching your videos to study last minute for the AP exam tomorrow – such a big help!

Dear Prof Rob, I'm a new stats teacher. My book asks the students to consider if events are independent based on the numbers in a marginal distribution chart. One example was looking at the people who died on the titanic. We thought that if the relative frequencies were different, the data was independent. the author of the text said that was wrong. Because the percentages were different there was some interaction or a hidden influence that we couldn't see. Well, we all don't get it. Can you shed some light on this apparent backwards thinking?

Prof Rob bob you didn’t jump to the front of the board in the beginning?? 😮

The wonderfully clear way by which you relay your knowledge is immensely helpful and the positive impact / motivation your videos produce feels priceless, and cannot be said enough.

Thank you for doing soo much!!

I felt like only saying ‘thank you’ wouldn’t quite cut it ^_^

BAM!