Stastics

End of Semester POL Reflection

We've done quite a bit of work this semester in statistics, covering quite a few topics, and while I have a multitude of work that I could reflect upon from this year, I think that if I were to choose one piece of work I would have to choose my work in excel. Pretty much our first large(ish) scale project we did this year, we were looking at income inequality. There are many different rations that measure inequality, including 20/20, interquartile range, and a few others. Our class was tasked with trying to find a way to find these measurements using coding in excel. Personally, I love technology, so the opportunity to merge two of my favorite things, math and tech, was pretty exciting to me. So, that night, I went home and spent upwards of 3 hours working on a code that would work for the 20/20 (later adjusted for other measurements) measurements. Kyle had already created a working program, but I felt that it was somewhat dysfunctional and clunky, and I wanted to compress it a bit and make it a little easier to use overall.

My initial frustrations came with trying to find a specific command within google sheets that would let me fast-track through the process. After quite a bit of searching, it became quite apparent that I would have to apply the little that I did know about coding in sheets to get the job done. It took, as mentioned before, around three hours to get working unconditionally, through quite a bit of trial and error, and then a little more time to adjust it for other measurements, but the final product looked something like:

To put it simply, what is going on in is that is going through each data point and finding where the top quintile in the data is. Once that is done, it finds the number in the row where the quintile is spits out the number of the amount of numbers left in that row up to the quintile mark multiplied by the value number in that row. Once that is done, a separate, much less complex equation goes through and finds all the “TRUE” statements, signifying that that row is still within the quintile, adds up the sum of these row’s frequencies*values (f*v), adds them up, and adds the row that hits the quintile’s max cutoff for f*v. The actual code that parses the numbers is in two parts, the very first being a simpler

=if(F2-D2>0,True,if(D2=F2,C2,(((B2-(D2-F2))*A2))))

(KEY: “F” - Quintile, “B” - Frequency, “D” - Frequency Sums, “C” - f*v, “A” - Value)

This code is broken up into three parts. It starts by saying if the amount of numbers counted this far, subtracted from the quintile “cut-off”, is greater than zero, the output will be zero. But, in the case that the first set of numbers is greater than the quintile, then it will first check to see of they are the same. If they are, then it will put out the f*v of that line. If the numbers are not, however, the same, then it will go to the final step of the code, which subtracts the frequency from sum of the frequencies subtracted from the quintile, multiplied by the value. To put this a little more simply, it finds where, in that row, the counted numbers hit the value of the quintile, and then multiplies that number by the value of the numbers on that row. These last two steps are very important to the rest of the code, as the rest of the cells of code depend on the previous statement to be true. If the previous statement is anything other than true, then it will instinctively spit out false, forcing all of the other cells below it in that column to false as well. This allows the second (or third, technically) code to parse only the rows that are true, and add, since true and false aren’t numbers, the number in the row where the quintile cut-off hits. The next code adds one more step to the process:

=if(G2=True,if(D3<F2,True,if(D3=F2,C3,(((B3-(D3-F2))*A3))))) - Same key as before, except that “G” is the cell above the current cell.

This part of the code just adds the aforementioned “if the previous cell in the column is anything besides true, then it’ll spit out false”. Other than that, the only real change is it checks to see of the sum of the frequencies is more than the quintile. It’s almost the exact same thing, except for this form made this type of code a little easier to execute.

Creating this code did take a lot of trial-and-error. The only real coding ability I had when I started with this code was “=if(expression, true, false)”, so that’s what I worked with. I really just had to ask myself “What do I need this code to do?”, and start getting to work at it. I basically just worked from left to right with the code in a linear fashion, starting out with a very basic code that I can’t specifically remember at this point, and figure out “what else does this code need?” After working and tweaking, I finally had come with a rough, but full-functioning-given-the-conditions-I-had-created code. After playing around with some numbers and checking to see if the code worked the way it was supposed to, I was finally able to come with a code that would work unconditionally that one could simply tack on to the side of a similar set of data with ease. The final part of the code, which actually takes what the first code(s) are finding, and turns them into the 20/20 measurement, was a little more crafty in a slightly different way. Thanks to the nature of the code that I created, and the previous combing through different types of commands for sheets, I had the perfect type of command to parse through the data and add all the data points within quintile. After that, making it divide the top and bottom 20% (the bottom being found by some simple inversion in the code’s drag-and-drop system), was child’s play in light of what I had accomplished already.

Now for the real question: so what? What does it matter that you did an arguably simple (compared to the types of crazy things that can be accomplished with enough time and knowledge) code in sheets? Well, I think it really highlights the way I approach math; in a very linear fashion. While there are many different methods and mindsets to troubleshooting, I think this is a very important thing for me to keep in mind for my future. I think this reflects on the some of the personality tests that I’ve taken over the past few years. From what they’ve told me, I work in a pretty similar way; more alone, and able to think things out very logically. If given the resources, I can find the solution to the problem in a logical fashion. I think this will be very helpful for me in the future, as I am hoping to pursue a career in engineering. With this linear mindset available, I think I will be able to achieve great things.

Links to Spreadsheets:
Original - https://docs.google.com/spreadsheets/d/1_zZe0ynxOKzkwDGbhWhnZZtxjKLRdQARKa6R2b4Yx4o/edit?usp=sharing
Re-coded - https://docs.google.com/a/animashighschool.com/spreadsheets/d/1z98b3Zd2Q_-N-Ohkq0bFNPmgvWZ3CXpOnvp1Qd7mqXE/edit?usp=sharing