CS 205 - Programming for the Sciences
Spring 2008 - Final Project
100 points


Out: April 24, 2008
Due: May 7, 2008 (Wednesday of finals week), no later than 10am, no late work accepted


Honor Code

This final project is to be your own work. Any assistance you receive must be from one of the instructors. Assistance may be "purchased" for a penalty of 0-5 points depending on the type and complexity of the assistance. Questions regarding interpretation and clarification of the assignment or provided code will not incur penalties and are encouraged. Questions regarding previous assignments and examples also will not incur penalties.


The last class period (Tuesday, April 29) will be an open lab, project work day. Assistance received during the class period will be heavily "discounted" and likely will not incur any penalty. This is to encourage students to start early on the project and to answer questions of interpretation and clarification that can then be conveyed to the entire class.


Note: Example usages of almost all of the code you are asked to write for this project are contained in the in-class exercises, programming assignments, and (practice) exams given out throughout the entire term.


Logistics

Because this project uses a data file, the solution folder for this project must reside on a local disk. For those with their own computers, this likely will be somewhere on the C: drive. For those using school computers, probably the best thing to do is to put the solution folder on a USB drive so that you can take it to any computer. (Please make sure you do not remove a USB drive before invoking the Safely Remove Hardware utility.)


Use a Web browser to go to the course webpage http://csserver.evansville.edu/~hwang/s08-courses/cs205.html. Under today's date, save the compressed folder FinalProjectNameSurfer.zip. Extract the solution folder. Double-click into the folder NameSurfer, then double-click on NameSurfer.sln (the Visual Studio solution file). This will launch Visual Studio with the solution loaded.


When you have finished your project or on Wednesday, May 7, at 10am, whichever comes first, please make sure your name is in the comments as indicated in both Form1.cs and NameData.cs, and submit a compressed folder of your project solution folder as an attachment to an email message to Dr. Hwang (hwang@evansville.edu). Final project scores will be posted to Blackboard and final course grades will be posted to WebAdvisor no later than 5pm on Friday, May 9.


Note: As this project is in place of the final exam, it is worth 25% of the final course grade. Final grades are based on the final weighted score percentage. The grading scale will be no higher than 90/80/70/60 and may be lower depending on overall class performance.



Background

The Social Security Administration provides a neat web site showing the distribution of names chosen for children in the US (http://www.ssa.gov/OACT/babynames/). Among the statistics presented is data giving the 1000 most popular boy and girl names for children born in the US for each decade starting with the 1880s. For this project, we use the data starting with the 1900s. The data can be boiled down to a single text file with a format as shown below. On each line we have the name, followed by the rank of that name in the decades starting in 1900, 1910, 1920, ..., 2000 (11 numbers). A rank of 1 was the most popular name that decade, while a rank of 997 was not very popular. A 0 means the name did not appear in the top 1000 that decade at all. The elements on each line are separated from each other by a single space. The lines are in alphabetical order, although we will not depend on that.


...
Sam 58 69 99 131 168 236 278 380 467 408 466
Samantha 0 0 0 0 0 0 272 107 26 5 7
Samara 0 0 0 0 0 0 0 0 0 0 886
Samir 0 0 0 0 0 0 0 0 920 0 798
Sammie 537 545 351 325 333 396 565 772 930 0 0
Sammy 0 887 544 299 202 262 321 395 575 639 755
Samson 0 0 0 0 0 0 0 0 0 0 915
Samuel 31 41 46 60 61 71 83 61 52 35 28
Sandi 0 0 0 0 704 864 621 695 0 0 0
Sandra 0 942 606 50 6 12 11 39 94 168 257
...


We see that "Sam" was #58 in 1900 and is slowly moving down. "Samantha" popped on the scene in 1960 and moved up strongly, reaching #5 in 1990 and #7 in 2000. "Samir" barely appears in 1980, but by 2000 is up to #798. The database is for children born in the US, so ethnic trends show up when immigrants have kids.
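To make the line format concrete, here is a minimal sketch of splitting one line into its name and its 11 ranks, and of converting a position in the rank list to its decade. The class NameLineSketch and its method names are hypothetical and are not part of the provided project code; your NameData class may organize this differently.

```csharp
using System;

// Hypothetical helper for illustration; it is not part of the provided project code.
public static class NameLineSketch
{
    public const int NUM_DECADES = 11;     // 1900, 1910, ..., 2000
    public const int START_DECADE = 1900;

    // Returns the name, which is the first element on the line.
    public static string ParseName(string line)
    {
        return line.Split(' ')[0];
    }

    // Returns the 11 ranks that follow the name on the line.
    public static int[] ParseRanks(string line)
    {
        string[] parts = line.Split(' ');
        if (parts.Length != NUM_DECADES + 1)
        {
            throw new FormatException("Expected a name and 11 ranks: " + line);
        }
        int[] ranks = new int[NUM_DECADES];
        for (int i = 0; i < NUM_DECADES; i++)
        {
            ranks[i] = int.Parse(parts[i + 1]);
        }
        return ranks;
    }

    // Converts a position in the rank array (0 to 10) to its decade (1900 to 2000).
    public static int DecadeOf(int index)
    {
        return START_DECADE + 10 * index;
    }
}
```

For the line "Samantha 0 0 0 0 0 0 272 107 26 5 7", ParseRanks gives 272 at position 6, and DecadeOf(6) is 1960, which matches the reading of the data given above.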


Ultimately, we want to organize the data to graph it as shown below (with the names Sam and Samantha - the figures are shrunk from the actual interface so they are a little fuzzy). There are around 4500 names in the database. The data just records literally what people put on the forms, so there are things like "A" and "Baby" recorded as names (the data is more cleaned up in the later years). We will not worry about that, and we will not combine names that are similar in some sense - "Cathy" and "Catherine" and "Kathryn" and "Katie" and "Kati" will all count as different names.


While this project is large, it consists of pieces that are similar to programs that we have written before. It is laid out below as a series of parts to be completed. It is suggested that you do the parts in the order given as generally an earlier part must be completed to make a later part work. After each part, you should be able to test your program and see that what you have written works.


Interface Notes

The area where the graph goes is a panel named gridPanel. The textbox where a name is input is named aName. The application window has its minimum and maximum size set to the current size so that its size cannot be changed while it is running.





Part 1a (30 points): NameData class

We will use a NameData class to encapsulate the data for one name - the name and its rank over the decades. This is essentially the data of one line from the file shown above. The start of the NameData class is contained in NameData.cs. It currently contains the following items:



You are to implement the following for the NameData class (i.e., all of this code goes in NameData.cs):



Part 1b (10 points): Reading from data file

The data for this program is in a file named "names-data.txt". We will store the data from this file in an ArrayList where each element is a NameData object containing data from one line of the data file. We will call this the database list.


The code to read in the data goes in the Form1 constructor where indicated in the comments. The code to open the file and attach it to StreamReader object inputFile is provided. As the comments explain, this code assumes that the file is in the same folder as the executable. Since it is possible for the program to be run either with or without debugging, a copy of the data file has been put in the appropriate places for both types of executable. If you move the executable to a different folder, you need to move the data file as well.
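As a reminder of the general read-until-end-of-file pattern, here is a minimal sketch that builds a list with one object per line. NameRecord is a hypothetical stand-in for NameData, and DatabaseLoader.Load is a hypothetical helper; the real NameData constructor and the provided Form1 code may differ. The sketch takes a TextReader so that it works with the StreamReader inputFile as well as with a StringReader for testing.

```csharp
using System;
using System.Collections;
using System.IO;

// Hypothetical stand-in for the project's NameData class; the real class's
// constructor and members may differ.
public class NameRecord
{
    public string Name;
    public int[] Ranks;

    // Assumes the line is a name followed by ranks, separated by single spaces.
    public NameRecord(string line)
    {
        string[] parts = line.Split(' ');
        Name = parts[0];
        Ranks = new int[parts.Length - 1];
        for (int i = 0; i < Ranks.Length; i++)
        {
            Ranks[i] = int.Parse(parts[i + 1]);
        }
    }
}

// Hypothetical helper for illustration only.
public static class DatabaseLoader
{
    // Reads lines until ReadLine returns null (end of file) and
    // returns a list holding one NameRecord per non-empty line.
    public static ArrayList Load(TextReader inputFile)
    {
        ArrayList database = new ArrayList();
        string line = inputFile.ReadLine();
        while (line != null)
        {
            if (line.Length > 0)
            {
                database.Add(new NameRecord(line));
            }
            line = inputFile.ReadLine();
        }
        return database;
    }
}
```

Because ArrayList stores its elements as object, an element must be cast back when it is retrieved, for example (NameRecord)database[0].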


For this part, you are to:



When you have finished both parts of Part 1, you can run the program with debugging and use the debugger to look at the database list. To do this, set a breakpoint at the end of the Form1 constructor by clicking in the left margin next to the last closing curly brace of the constructor. This will put a red dot in the left margin. Run the program using Start Debugging, and the program will stop at the red dot. In the bottom left corner should be a window for viewing variable values. Click on the Watch tab, then type in the name of your database list variable. The plus signs to the left of the variable allow you to "open" up the object and see the values of the individual parts of the variable. Check the first few NameData objects in the database list to see that they have the correct name and rank data in them from the file. When you are done with debugging, choose Stop Debugging under the Debug menu and delete the breakpoint by clicking on the red dot.



Part 2 (10 points): Best Decade button handler

Implement the handler for the Best Decade button. It should do the following:



Be sure to hand check the results with the data file to make sure the BestDecade and BestRank methods are working correctly.
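When hand checking, it may help to see one possible reading of "best" worked out. The sketch below assumes that the best rank is the smallest nonzero rank, that ties go to the earliest decade, and that a name with no nonzero rank reports 0 for both values. These are assumptions for illustration, as is the class BestRankSketch; follow the specification given for your BestDecade and BestRank methods if it differs.

```csharp
using System;

// Hypothetical helper for illustration; the required NameData methods may
// have different signatures and different rules for ties and unranked names.
public static class BestRankSketch
{
    public const int START_DECADE = 1900;

    // Returns the position of the smallest nonzero rank, or -1 if every rank is 0.
    // On ties, the earliest position wins because the comparison is strict.
    public static int BestIndex(int[] ranks)
    {
        int best = -1;
        for (int i = 0; i < ranks.Length; i++)
        {
            if (ranks[i] != 0 && (best < 0 || ranks[i] < ranks[best]))
            {
                best = i;
            }
        }
        return best;
    }

    // Returns the smallest nonzero rank, or 0 if the name was never ranked.
    public static int BestRank(int[] ranks)
    {
        int i = BestIndex(ranks);
        return (i < 0) ? 0 : ranks[i];
    }

    // Returns the decade of the best rank, or 0 if the name was never ranked.
    public static int BestDecade(int[] ranks)
    {
        int i = BestIndex(ranks);
        return (i < 0) ? 0 : START_DECADE + 10 * i;
    }
}
```

For the Sandra line in the data excerpt (0 942 606 50 6 12 11 39 94 168 257), this reading gives a best rank of 6 in 1940. For Samuel, it gives a best rank of 28 in 2000.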


Part 3a (10 points): Graph button handler

Graphing the rank data for a name is a two-step process involving the handler for the Graph button and the Paint event for the gridPanel. To keep track of the names to be graphed, declare and create another ArrayList in Form1.cs where indicated in the comments. We will call this ArrayList the name list, and it will store the NameData objects of the names to be graphed. The handler for the Graph button should do the following:



Part 3b (30 points): gridPanel Paint handler

As discussed in class, when dealing with graphics, there is a world coordinate system and a screen coordinate system. For this project, the x-axis of the two coordinate systems is the same, with a range of 0 to gridPanel.Width. For the y-axis, the world coordinate system has a range of 1 to 1000 (the possible rank values). The x-coordinates of the vertical grid lines are evenly spaced across the panel.


The y-axis of the screen coordinate system is a bit tricky, because the graph area is not the entire panel. The horizontal lines drawn by DrawGrid are at 20 pixels and (gridPanel.Height - 20) pixels. A constant PANEL_OFFSET has been defined with value 20 so that if we change where we want the graph area to be, we only need to change the constant's value. Thus the graph area height is (gridPanel.Height - 2*PANEL_OFFSET). The computed y-coordinate of each rank point should place the point at a position proportional to its rank. For example, a rank of 1 is at the top of the graph, a rank of 475 is near the middle of the graph, and a rank of 989 is very near the bottom of the graph. A rank value of 0 also should be plotted at the bottom of the graph.
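One way to work out the arithmetic is sketched below. It assumes a straight-line mapping in which rank 1 lands on the top grid line and rank 1000 lands on the bottom grid line, with rank 0 treated as rank 1000. The class RankToScreenSketch, the method RankToY, and the constant MAX_RANK are hypothetical names for illustration; other proportional mappings would also satisfy the description above.

```csharp
using System;

// Hypothetical helper for illustration; not part of the provided project code.
public static class RankToScreenSketch
{
    public const int PANEL_OFFSET = 20;   // same value as the provided constant
    public const int MAX_RANK = 1000;     // assumed name for the largest rank

    // Maps a rank (0, or 1 to 1000) to a screen y-coordinate for a panel
    // of the given height. Screen y grows downward, so rank 1 is the smallest y.
    public static int RankToY(int rank, int panelHeight)
    {
        int graphHeight = panelHeight - 2 * PANEL_OFFSET;
        if (rank == 0)
        {
            rank = MAX_RANK;   // an unranked decade is plotted at the bottom
        }
        // Multiply before dividing so integer division does not lose the fraction early.
        return PANEL_OFFSET + (rank - 1) * graphHeight / (MAX_RANK - 1);
    }
}
```

With a panel height of 440 (a graph height of 400), rank 1 maps to y = 20, ranks 1000 and 0 both map to y = 420, and rank 475 maps to y = 209, a little above the midpoint of 220.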


The start of the Paint handler (gridPanel_Paint) is given. It obtains a Graphics object from the gridPanel and calls the DrawGrid method to draw the grid lines of the graph. This method shows examples of drawing lines (given two point locations) and strings (location given is the upper-left corner).


For this part, you are to complete the implementation of gridPanel_Paint to graph the rank data for the names in the name list by doing the following:



For now, use the black pen (pens[0]) or black brush (brushes[0]) as appropriate. Note that DrawString interprets the location given to it as the upper left corner of the box around the text to be drawn. We would like these strings to be drawn above the point rather than below the point, so you will need to adjust the y-coordinate given to DrawString.


At this point, the program should graph names as they are added using the Graph button. As we add names, they will tend to draw on top of each other, especially at the very top and very bottom. Since the name string is repeated each decade, it is still possible to figure out which line is which. However, it would be nicer if the graph lines were in a few different colors. The arrays pens and brushes contain NUM_PENS pens/brushes of different colors. Instead of always using 0 as the index to the arrays, we can rotate through the colors by keeping track of a currentPenIndex that is initialized to 0, then is incremented after each name is graphed. (After currentPenIndex reaches NUM_PENS-1, the next increment rolls it back around to 0.)
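The roll-around can be written with the remainder operator, as in the minimal sketch below. The class PenRotationSketch and its method NextIndex are hypothetical, and the value 4 for NUM_PENS is assumed only for the example; use the value defined in the provided code.

```csharp
using System;

// Hypothetical helper for illustration; in the project, currentPenIndex
// would live alongside the provided pens and brushes arrays.
public class PenRotationSketch
{
    public const int NUM_PENS = 4;   // assumed value for this example only

    private int currentPenIndex = 0;

    // Returns the index to use for the current name, then advances
    // currentPenIndex so that it wraps from NUM_PENS-1 back to 0.
    public int NextIndex()
    {
        int index = currentPenIndex;
        currentPenIndex = (currentPenIndex + 1) % NUM_PENS;
        return index;
    }
}
```

With four pens, successive calls to NextIndex return 0, 1, 2, 3, 0, and so on. The same index would be used for both pens and brushes so that a name's line and its labels share a color.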



The more pens/brushes in the arrays, the more names are graphed before a color repeats itself. If you want to add more colors, change the value of NUM_PENS and add initialization code in the Form1 constructor for the new pens/brushes as shown.


In the figure below, we see the names "A" and "Wendy". "A" starts strong in 1900 and trails off to 0 in 1990. Wendy is at 0 until 1940.




In the next figure below, we add "John" who is very near 1 the whole time, and "Samir" who comes on the scene only starting in 1980. Both Wendy and Samir are 0's in 1900, 1910, ... so they draw on top of each other there. That is fine - we draw what we can and if they draw on top of each other, so be it.




Part 4 (10 points): Clear All and Clear One button handlers

Finally, after graphing a few names, the graph gets very messy. Implement the handlers for the last two buttons as follows:



Acknowledgments

This assignment is based on a similar Java assignment developed by Nick Parlante at Stanford University that was presented during a Nifty Assignment session at the 2005 SIGCSE Conference.

Revised: 04/24/08