|
Back to Legal Ramblings |
|
[ Saturday, January 29, 2005 (12:31 AM) ] ( link ) Histograms in Microsoft Excel 2002: For a long time, I've been utterly baffled by how to create a histogram graph in Microsoft Excel 2002. For those who haven't installed the Data Analysis Toolpack, Excel is frustratingly obscure. So here's a brief guide, for my own reference and the reference of others. (And also because I have nothing else fun to talk about.) The goal is to create a bar graph histogram.
b. Next, you need to enter an array function. I don't know exactly what that means, but trust me, that's what you want to do. Select the top of the column right next to your bin lengths: here, that's Cell D1. Then highlight the whole length of the column next to your bin lengths: here, that's D1 to D12. Is that space still highlighted? Good. Now, just start typing: =FREQUENCY(A1:A63,C1:C12) and don't hit anything else, including ENTER. Ok, a brief explanation of that formula. A1:A63 is the range of the data you want to draw the histogram around. C1:C12 is the list of bin lengths that you had (here, intervals of 5 from 5 to 60). You don't have to type these in directly of course; you can also just select both columns, but that's more basic Excel fun. c. Anyway, I hope you did not hit ENTER yet. Because, you see, you can't just hit ENTER: If you just hit ENTER, all that will happen is that D1 will display "20," and that's it. Instead, while D1 through D12 are still highlighted, AND you've just typed in the formula above, hit CTRL-SHIFT-ENTER. Like magic, the entire highlighted portion of Column D will fill up with numbers. Here's what I have on my own screen:
Here's how to interpret these data. Cell D1's figure of 20 means that there are 20 numbers in Set S (as typed in A1:A63) that are less than or equal to 5 (the corresponding bin size). D2's figure of 10 means that there are 10 numbers in Set S that are greater than 5 but less than or equal to 10. Etc. And, voila, you've got the density figures you need for a histogram. 4. But we're still some ways away from an actual histogram. In fact, we're quite a shocking distance away. So now we go through the horrible, arbitrary steps to actually create a histogram-like graph (which won't even be perfect, by the way). a. Select BOTH columns' data, so everything between C1 and D12. Go to the Chart Wizard (either on the toolbar or by going to Insert > Chart). Under Chart Type, choose "XY (scatter)", then choose the "Chart sub-type" that does NOT have lines connecting the dots. Click "Next." b. The next window is "Chart Source Data." I'm sure you can do fancy stuff here, but it's not necessary. Click "Next." c. Here you can prettify your chart. Again, not strictly necessary, but go ahead and put in a Chart Title ("Random Numbers"), a Value (X) Axis ("Bin Lengths"), and a Value (Y) Axis ("Frequency"). Play with whatever else you want. Click "Next." d. On the next window, click "Finish." 5. Ok! Now you have a chart. Here are some clean-up operations that you may have to do, because Excel is a hideously stupid program. a. Your X-Axis may go to a range way above your maximum (which is 60 here). My own chart, for instance, goes to 80. Why? I don't know. At any rate, to fix this problem, click on the very bottom line of the chart (right above the X-axis values). That should highlight the bottom line (two boxes will be at the ends of that line--and that's it; if any more little black boxes show up, you're highlighting the wrong thing). Right click on that highlighted line, and select "Format Axis." Go to the "Scale" tab. Change the "Maximum" to something more reasonable--like 60. If the graph also has stupid intervals--like intervals of 20 rather than 5--change the Major Unit and Minor Unit to 5, which is probably the most reasonable interval here. b. You can do a similar thing with the Y-Axis, although mine is refreshingly in intervals of 5, going up to 25. Here's what the graph looks like now: 6. Now, we'll make this LOOK like a histogram. a. There should be a series of dots. Right-click on one of the dots. Select "Chart Type." Under "Chart type," select "Column." Click "OK." Now you have a bunch of columns that go up to the right figure. b. Right click on one of the bars. Click on "Format Data Series." Under the "Patterns" tab, you can change the color of the bar. (You also probably want to make "Borders" "Automatic.") Then, under the "Options" tab, change "Gap width" to 0. c. [You can also go into "Format Data Series" and go to the "Data Labels" tab, and then click on "Label Contains > Value." This'll put the value of each column at the top of the column.] Here's what the graph looks like now: And there you go! Astonishingly, a histogram. It requires a little bit of explanation though, that you'll probably have to provide somewhere. The column of "20" above "5" means that there are 20 values of less than or equal to 5; the column of "10" above "10" means there are 10 values of greater than 5 and less than or equal to 10, etc. But you actually now have something you can do some work on. Alternatively, you can just install the Data Analysis Toolpack, which (I understand) simplifies this entire process, but not that much.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
|
Back to Steven Wu's Home Page |