• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Sumproduct

Ufoo

Member
Hello: the following formula is for answering the question
Is there a way of getting the second sales of Josh without using the helper column?: =INDEX($E$5:$E$17,SUMPRODUCT(SMALL(($C$5:$C$17="Josh")*(ROW($C$5:$C$17)-4),2+COUNTIF($C$5:$C$17,"<>Josh")))).

I have failed to understand the sumproduct part of the formula. Someone please help. I am attaching the excel sheet with the question and the answer. The problem was posted and solved by Chandoo. Thanks a lot Chandoo.
 

Attachments

Hi ,

I do not know from where you have got the posted formula , but it seems to me unnecessarily complicated.

1. Getting the maximum in any array is possible using the MAX function ; the LARGE function performs a similar function (!) , with a second parameter of 1.

2. Getting the second , third , fourth ,... values in an array is possible using the LARGE function , with the second parameter assuming the values 2 , 3 , 4 ,...

Let us take a few examples.

What is the second largest sale ?

=LARGE(NetSales,2)

returns 2277.

What is Josh's largest sale ?

=LARGE(IF(SalesPerson = "Josh", NetSales),1)

Because of the IF function inside the LARGE function , the above formula has to be entered using CTRL SHIFT ENTER.

Naturally , what is Josh's second largest sale is obtained by using 2 as the second parameter in the above formula.

It is a different matter if we want Josh's second sale : here , we need to go down the rows starting from the top , which will be the smallest row , and go downwards in increasing order of rows. For this , we will need to use the SMALL function , instead of the LARGE function.

The first sale of Josh will occur in the smallest row ; the second sale of Josh will occur in the next bigger row.

Getting the second row is the first step ; once we have this , we use it to index within the Net Sales column. The formula to achieve this will be :

=INDEX(NetSales, SMALL(IF(SalesPerson = "Josh", ROW(NetSales) - MIN(ROW(NetSales)) + 1), 2))

entered as an array formula , using CTRL SHIFT ENTER.

The INDEX function will have to be used only when the criteria is something other than the parameter that we want to retrieve ; what I mean is we want to retrieve the Net Sales , where the criterion is not the Net Sales itself , but the Profit , or the number of customers or the occurrences of any parameter.

Narayan
 
T
Hi ,

I do not know from where you have got the posted formula , but it seems to me unnecessarily complicated.

1. Getting the maximum in any array is possible using the MAX function ; the LARGE function performs a similar function (!) , with a second parameter of 1.

2. Getting the second , third , fourth ,... values in an array is possible using the LARGE function , with the second parameter assuming the values 2 , 3 , 4 ,...

Let us take a few examples.

What is the second largest sale ?

=LARGE(NetSales,2)

returns 2277.

What is Josh's largest sale ?

=LARGE(IF(SalesPerson = "Josh", NetSales),1)

Because of the IF function inside the LARGE function , the above formula has to be entered using CTRL SHIFT ENTER.

Naturally , what is Josh's second largest sale is obtained by using 2 as the second parameter in the above formula.

It is a different matter if we want Josh's second sale : here , we need to go down the rows starting from the top , which will be the smallest row , and go downwards in increasing order of rows. For this , we will need to use the SMALL function , instead of the LARGE function.

The first sale of Josh will occur in the smallest row ; the second sale of Josh will occur in the next bigger row.

Getting the second row is the first step ; once we have this , we use it to index within the Net Sales column. The formula to achieve this will be :

=INDEX(NetSales, SMALL(IF(SalesPerson = "Josh", ROW(NetSales) - MIN(ROW(NetSales)) + 1), 2))

entered as an array formula , using CTRL SHIFT ENTER.

The INDEX function will have to be used only when the criteria is something other than the parameter that we want to retrieve ; what I mean is we want to retrieve the Net Sales , where the criterion is not the Net Sales itself , but the Profit , or the number of customers or the occurrences of any parameter.

Narayan
Thanks for this. You are right about the sumproduct part of the formula which I posted being complicated. But I love sumproduct. Whenever I see it I like to get to unravel it.
 
Leila Gharani (youtube) has helped me figure out the reason behind using COUNTIF in the k in the above formula (=INDEX($E$5:$E$17,SUMPRODUCT(SMALL(($C$5:$C$7="Josh")*(ROW($C$5:$C$17)-4),2+COUNTIF($C$5:$C$17,"<>Josh")))).

Our interest is to find the second sales of Josh, which means k for SMALL function would be 2. However, if you put 2 you get =INDEX($E$5:$E$17,0 (the first sales of Joseph) because the second smallest number in the array resulting from $C$5:$C$7="Josh")*(ROW($C$5:$C$17)-4 is zero. By adding the COUNTIF portion k becomes 12. This k ignores the 1st to 10th smallest numbers which are zeros (in the array $C$5:$C$7="Josh")*(ROW($C$5:$C$17)-4. The 12th smallest value is picked by the SMALL function. Please note that these 10 zeros represent cells which do not have the name Josh. Happy to share.
 
Leila Gharani (youtube) has helped me figure out the reason behind using COUNTIF in the k in the above formula (=INDEX($E$5:$E$17,SUMPRODUCT(SMALL(($C$5:$C$7="Josh")*(ROW($C$5:$C$17)-4),2+COUNTIF($C$5:$C$17,"<>Josh")))).
Hi Ufoo ,

It is good that you have understood how the formula works.

I would like you to understand why a simpler formula can be used.

1. The INDEX function , on its own , will display a correct result only if the index value is a scalar value , a single value. If multiple values are going to be used , then the only way it will work correctly is if some other function is wrapped around it. One such function can be the SUM function.

2. The SUMPRODUCT function has been used , which is basically going to sum up all the results that are made available to it ; if the entire formula has to work correctly , it is absolutely essential that the SUMPRODUCT function is passed exactly one value ; if two results such as 3 and 7 are passed to it , what it will return will be 10 , and in this case , such a return value will give a wrong output.

3. The reason for using the COUNTIF function is as you have mentioned , but what your source has not mentioned is that using the COUNTIF is necessitated by the construction of the formula ; if the formula had been constructed differently , the COUNTIF would not be required.

If the formula is revised as follows :

=INDEX($E$5:$E$17,SUM(SMALL(IF($C$5:$C$17="Josh", ROW($C$5:$C$17)-4),2)))

entered as an array formula , using CTRL SHIFT ENTER , then the COUNTIF is eliminated.

The reason for this is that in the first version , we are using the multiplication operator , which will result in zeros ; since we are using the SMALL function , zeros will be taken into consideration , and hence need to be counted.

In the second version which I have posted , we are using the IF function , which will result in numeric values where the condition is satisfied , and FALSE values where the condition is not satisfied. The SMALL function ignores FALSE values , and hence there is no necessity for counting them.

Note that because an IF function is used , even if the SUM function were to be replaced by the SUMPRODUCT function , you would still have to enter the formula as an array formula , using CTRL SHIFT ENTER.

4. The use of the following construct should be avoided at all times.

ROW($C$5:$C$17)-4

The recommended way would be as follows :

=INDEX(NetSales, SMALL(IF(SalesPerson = "Josh", ROW(NetSales) - MIN(ROW(NetSales)) + 1), 2))

If you wish to replace the named ranges by their actual references , the formula would look like this :

=INDEX($E$5:$E$17, SMALL(IF($C$5:$C$17 = "Josh", ROW($E$5:$E$17) - MIN(ROW($E$5:$E$17)) + 1), 2))

The first method would work only if the range starts on row 5 ; if it starts on any other row , the 4 would need to be replaced by the appropriate number.

The second method will work every time , what ever be the range reference.

Note that the formula I have posted avoids the SUMPRODUCT altogether , because , as I mentioned earlier , the use of the SUMPRODUCT function is not warranted when only one index value is to be used. The SUMPRODUCT function has been used only to avoid using CTRL SHIFT ENTER to enter the formula.

Narayan
 
Hi!

Excellent discussions so far.

The formula can be further simplified to:

=INDEX($E$5:$E$17,MATCH(2,COUNTIF(OFFSET($C$5,0,0,ROW($C$5:$C$17)-ROW($C$5)+1,1),"Josh"),0))

Find the row where the countif value is 2 and display the result using index.

with best regards,

Arun N
 
Hi Ufoo ,

It is good that you have understood how the formula works.

I would like you to understand why a simpler formula can be used.

1. The INDEX function , on its own , will display a correct result only if the index value is a scalar value , a single value. If multiple values are going to be used , then the only way it will work correctly is if some other function is wrapped around it. One such function can be the SUM function.

2. The SUMPRODUCT function has been used , which is basically going to sum up all the results that are made available to it ; if the entire formula has to work correctly , it is absolutely essential that the SUMPRODUCT function is passed exactly one value ; if two results such as 3 and 7 are passed to it , what it will return will be 10 , and in this case , such a return value will give a wrong output.

3. The reason for using the COUNTIF function is as you have mentioned , but what your source has not mentioned is that using the COUNTIF is necessitated by the construction of the formula ; if the formula had been constructed differently , the COUNTIF would not be required.

If the formula is revised as follows :

=INDEX($E$5:$E$17,SUM(SMALL(IF($C$5:$C$17="Josh", ROW($C$5:$C$17)-4),2)))

entered as an array formula , using CTRL SHIFT ENTER , then the COUNTIF is eliminated.

The reason for this is that in the first version , we are using the multiplication operator , which will result in zeros ; since we are using the SMALL function , zeros will be taken into consideration , and hence need to be counted.

In the second version which I have posted , we are using the IF function , which will result in numeric values where the condition is satisfied , and FALSE values where the condition is not satisfied. The SMALL function ignores FALSE values , and hence there is no necessity for counting them.

Note that because an IF function is used , even if the SUM function were to be replaced by the SUMPRODUCT function , you would still have to enter the formula as an array formula , using CTRL SHIFT ENTER.

4. The use of the following construct should be avoided at all times.

ROW($C$5:$C$17)-4

The recommended way would be as follows :

=INDEX(NetSales, SMALL(IF(SalesPerson = "Josh", ROW(NetSales) - MIN(ROW(NetSales)) + 1), 2))

If you wish to replace the named ranges by their actual references , the formula would look like this :

=INDEX($E$5:$E$17, SMALL(IF($C$5:$C$17 = "Josh", ROW($E$5:$E$17) - MIN(ROW($E$5:$E$17)) + 1), 2))

The first method would work only if the range starts on row 5 ; if it starts on any other row , the 4 would need to be replaced by the appropriate number.

The second method will work every time , what ever be the range reference.

Note that the formula I have posted avoids the SUMPRODUCT altogether , because , as I mentioned earlier , the use of the SUMPRODUCT function is not warranted when only one index value is to be used. The SUMPRODUCT function has been used only to avoid using CTRL SHIFT ENTER to enter the formula.

Narayan
I can't thank you enough @NARAYANK991
 
Back
Top