CS 205 - Programming for the Sciences
Spring 2008 - Final Project
100 points
Out: April 24, 2008
Due: May 7, 2008 (Wednesday of finals week), no later than 10am, no late work accepted
Honor Code
This final project is to be your own work. Any assistance you receive must be from one of the instructors. Assistance may be "purchased" for a penalty of 0-5 points depending on the type and complexity of the assistance. Questions regarding interpretation and clarification of the assignment or provided code will not incur penalties and are encouraged. Questions regarding previous assignments and examples also will not incur penalties.
The last class period (Tuesday, April 29) will be an open lab, project work day. Assistance received during the class period will be heavily "discounted" and likely will not incur any penalty. This is to encourage students to start early on the project and to ask questions of interpretation and clarification whose answers can then be conveyed to the entire class.
Note: Example usages of almost all of the code you are asked to write for this project are contained in the in-class exercises, programming assignments, and (practice) exams given out throughout the entire term.
Logistics
Because this project uses a data file, the solution folder for this project must reside on a local disk. For those with their own computers, this likely will be somewhere on the C: drive. For those using school computers, probably the best thing to do is to put the solution folder on a USB drive so that you can take it to any computer. (Please make sure you do not remove a USB drive before invoking the Safely Remove Hardware utility.)
Use a Web browser to go to the course webpage http://csserver.evansville.edu/~hwang/s08-courses/cs205.html. Under today's date, save the compressed folder FinalProjectNameSurfer.zip. Extract the solution folder. Double-click into the folder NameSurfer, then double-click on NameSurfer.sln (the Visual Studio solution file). This will launch Visual Studio with the solution loaded.
When you have finished your project or on Wednesday, May 7, at 10am, whichever comes first, please make sure your name is in the comments as indicated in both Form1.cs and NameData.cs, and submit a compressed folder of your project solution folder as an attachment to an email message to Dr. Hwang (hwang@evansville.edu). Final project scores will be posted to Blackboard and final course grades will be posted to WebAdvisor no later than 5pm on Friday, May 9.
Note: As this project is in place of the final exam, it is worth 25% of the final course grade. Final grades are based on the final weighted score percentage. The grading scale will be no higher than 90/80/70/60 and may be lower depending on overall class performance.
Background
The Social Security Administration provides a neat web site showing the distribution of names chosen for children in the US (http://www.ssa.gov/OACT/babynames/). Among the statistics presented is data giving the 1000 most popular boy and girl names for children born in the US for each decade starting with the 1880s. For this project, we use the data starting with the 1900s. The data can be boiled down to a single text file with a format as shown below. On each line we have the name, followed by the rank of that name in the decades starting in 1900, 1910, 1920, ..., 2000 (11 numbers). A rank of 1 was the most popular name that decade, while a rank of 997 was not very popular. A 0 means the name did not appear in the top 1000 that decade at all. The elements on each line are separated from each other by a single space. The lines are in alphabetical order, although we will not depend on that.
...
Sam 58 69 99 131 168 236 278 380 467 408 466
Samantha 0 0 0 0 0 0 272 107 26 5 7
Samara 0 0 0 0 0 0 0 0 0 0 886
Samir 0 0 0 0 0 0 0 0 920 0 798
Sammie 537 545 351 325 333 396 565 772 930 0 0
Sammy 0 887 544 299 202 262 321 395 575 639 755
Samson 0 0 0 0 0 0 0 0 0 0 915
Samuel 31 41 46 60 61 71 83 61 52 35 28
Sandi 0 0 0 0 704 864 621 695 0 0 0
Sandra 0 942 606 50 6 12 11 39 94 168 257
...
We see that "Sam" was #58 in 1900 and is slowly moving down. "Samantha" popped on the scene in 1960 and is moving up strong to #7. "Samir" barely appears in 1980, but by 2000 is up to #798. The database is for children born in the US, so ethnic trends show up when immigrants have kids.
Ultimately, we want to organize the data to graph it as shown below (with the names Sam and Samantha - the figures are shrunk from the actual interface so they are a little fuzzy). There are around 4500 names in the database. The data just records literally what people put on the forms, so there are things like "A" and "Baby" recorded as names (the data is more cleaned up in the later years). We will not worry about that, and we will not combine names that are similar in some sense - "Cathy" and "Catherine" and "Kathryn" and "Katie" and "Kati" will all count as different names.
While this project is large, it consists of pieces that are similar to programs that we have written before. It is laid out below as a series of parts to be completed. It is suggested that you do the parts in the order given as generally an earlier part must be completed to make a later part work. After each part, you should be able to test your program and see that what you have written works.
Interface Notes
The area where the graph goes is a panel named gridPanel. The textbox where a name is input is named aName. The application window has its minimum and maximum size set to the current size so that its size cannot be changed while it is running.

Part 1a (30 points): NameData class
We will use a NameData class to encapsulate the data for one name - the name and its rank over the decades. This is essentially the data of one line from the file shown above. The start of the NameData class is contained in NameData.cs. It currently contains the following items:
Definition of public int constants DECADES and START, so that if we were to change the number of decades or the year of the first decade, respectively, we can do so easily by changing the constants. These constants are currently defined with values 11 and 1900, respectively, to match the data file format.
The start of an explicit-value constructor that takes a string argument. The argument must be in the format of a line of the data file (a name followed by 11 numbers). The code for splitting this string at the spaces into an array of strings, where each element is one of the "words" of the input string, has been done for you. (I.e., data[0] will be the name, data[1] will be a string with the digits of the rank of the first decade, etc.)
You are to implement the following for the NameData class (i.e., all of this code goes in NameData.cs):
Declare two private attributes where indicated. One is a string for the name; the other is an int array of DECADES elements to store the rank numbers.
Complete the explicit value constructor to store the data into the attributes. You will need to convert the number strings into actual integers.
A public property Name with only a get operation that returns the name attribute
A public method with header int RankInDecade (int decade) that returns the rank of the name in the given decade. We will use the convention that decade=0 is the START year (currently 1900), decade=1 is the next decade (currently 1910), and so on.
A public method with header int BestDecade() that returns the decade where the name was most popular (i.e., lowest rank number that is not 0), using the earliest decade in the event of a tie. For example, from the data above Sam's best decade is 1900, while Samantha's best decade is 1990. This method should return the actual year, for example 1920, so the caller does not need to adjust for START. It is safe to assume that every name has at least one year with a non-zero rank.
A public method with header int BestRank() that returns the rank of the decade where the name was most popular, using the earliest decade in the event of a tie. For example, from the data above Sam's rank when it was most popular (1900) is 58 and Samantha's is 5 (from 1990).
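The pieces listed above can be sketched as follows. This is only one possible shape, not the required solution: the attribute names name and ranks and the private helper BestIndex are hypothetical, and the Split call stands in for the splitting code already provided in NameData.cs.

```csharp
using System;

public class NameData
{
    public const int DECADES = 11;   // number of decades on each data line
    public const int START = 1900;   // year of the first decade

    private string name;             // hypothetical attribute name
    private int[] ranks;             // hypothetical attribute name

    // Explicit-value constructor: line is "Name r0 r1 ... r10".
    public NameData(string line)
    {
        string[] data = line.Split(' ');     // stands in for the provided code
        name = data[0];
        ranks = new int[DECADES];
        for (int i = 0; i < DECADES; i++)
            ranks[i] = int.Parse(data[i + 1]);   // number string -> int
    }

    public string Name { get { return name; } }

    // decade = 0 is START, decade = 1 is START + 10, and so on.
    public int RankInDecade(int decade) { return ranks[decade]; }

    // Hypothetical helper: index of the lowest non-zero rank.
    // The strict < keeps the earliest decade when there is a tie.
    // Assumes at least one non-zero rank, as the handout allows.
    private int BestIndex()
    {
        int best = -1;
        for (int i = 0; i < DECADES; i++)
        {
            if (ranks[i] != 0 && (best == -1 || ranks[i] < ranks[best]))
                best = i;
        }
        return best;
    }

    public int BestDecade() { return START + 10 * BestIndex(); }

    public int BestRank() { return ranks[BestIndex()]; }
}
```

With the sample data, a NameData built from the "Sam" line should report a best decade of 1900 with rank 58, and one built from the "Samantha" line should report 1990 with rank 5, matching the examples given above.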
Part 1b (10 points): Reading from data file
The data for this program is in a file named "names-data.txt". We will store the data from this file in an ArrayList where each element is a NameData object containing data from one line of the data file. We will call this the database list.
The code to read in the data goes in the Form1 constructor where indicated in the comments. The code to open the file and attach it to StreamReader object inputFile is provided. As the comments explain, this code assumes that the file is in the same folder as the executable. Since it is possible for the program to be run either with or without debugging, a copy of the data file has been put in the appropriate places for both types of executable. If you move the executable to a different folder, you need to move the data file as well.
For this part, you are to:
Declare and create the database ArrayList variable in Form1.cs where indicated in the comments. Recall that you also will need to add using System.Collections; to the beginning of the file.
Implement the loop that will read each line of the file, create a NameData object with the line, and add the NameData object to the database list. The place where this code goes is indicated in the comments.
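A minimal sketch of the read loop is shown here, under some assumptions. In the project the loop sits directly in the Form1 constructor and reads from the provided inputFile; the method ReadAll and the class DatabaseLoader are hypothetical wrappers used only so the sketch stands on its own, and the tiny NameData class is a stand-in for the real one in NameData.cs. Skipping blank lines is also an assumption, since the handout does not say whether the file contains any.

```csharp
using System.Collections;
using System.IO;

// Minimal stand-in so this sketch compiles by itself;
// the real class lives in NameData.cs.
public class NameData
{
    private string name;
    public NameData(string line) { name = line.Split(' ')[0]; }
    public string Name { get { return name; } }
}

public static class DatabaseLoader   // hypothetical wrapper class
{
    // Reads every line from inputFile and returns the database list.
    // StreamReader is a TextReader, so the provided inputFile fits here.
    public static ArrayList ReadAll(TextReader inputFile)
    {
        ArrayList database = new ArrayList();
        string line = inputFile.ReadLine();
        while (line != null)                 // ReadLine returns null at end of file
        {
            if (line.Length > 0)             // assumption: ignore blank lines
                database.Add(new NameData(line));
            line = inputFile.ReadLine();
        }
        return database;
    }
}
```

Because ReadAll accepts any TextReader, it can be tried on a StringReader holding a couple of sample lines before pointing it at the real file.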
When you have finished both parts of Part 1, you can run the program with debugging and use the debugger to look at the database list. To do this, set a breakpoint at the end of the Form1 constructor by clicking in the left margin next to the last closing curly brace of the constructor. This will put a red dot in the left margin. Run the program using Start Debugging, and the program will stop at the red dot. In the bottom left corner should be a window for viewing variable values. Click on the Watch tab, then type in the name of your database list variable. The plus sign to the left of the variable allows you to "open" up the object and see the values of the individual parts of the variable. Check the first few NameData objects in the database list to see that they have the correct name and rank data in them from the file. When you are done with debugging, choose Stop Debugging under the Debug menu and delete the breakpoint by clicking on the red dot.
Part 2 (10 points): Best Decade button handler
Implement the handler for the Best Decade button. It should do the following:
Use a loop to search through the database list for the name in the aName textbox by comparing it to the Name property of the list element
If it finds the name, it should display the following in the results listbox: the name, the decade of the name's highest rank (obtained by calling the BestDecade method of the list element) and the name's rank for its best decade (obtained by calling the BestRank method).
If it does not find the name, it should display an error message in the results listbox saying that the name was not found in the database list.
Be sure to hand check the results with the data file to make sure the BestDecade and BestRank methods are working correctly.
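The search step might look roughly like the sketch below. The class NameSearch and the method Find are hypothetical names, and the small NameData class is again a stand-in so the block compiles alone. In the actual handler, the target would come from the aName textbox, and the found or not-found outcome would be reported in the results listbox.

```csharp
using System.Collections;

// Minimal stand-in so this sketch compiles by itself;
// the real class lives in NameData.cs.
public class NameData
{
    private string name;
    public NameData(string line) { name = line.Split(' ')[0]; }
    public string Name { get { return name; } }
}

public static class NameSearch   // hypothetical helper class
{
    // Linear search of the database list.
    // Returns the matching element, or null when the name is absent.
    public static NameData Find(ArrayList database, string target)
    {
        foreach (NameData entry in database)
        {
            if (entry.Name == target)   // case-sensitive comparison
                return entry;
        }
        return null;
    }
}
```

Note that == on strings is case-sensitive, so with this sketch "sam" would not match "Sam". Whether to accept other capitalizations is a design choice the handout leaves open.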
Part 3a (10 points): Graph button handler
Graphing the rank data for a name is a two-step process involving the handler for the Graph button and the Paint event for the gridPanel. To keep track of the names to be graphed, declare and create another ArrayList in Form1.cs where indicated in the comments. We will call this ArrayList the name list, and it will store the NameData objects of the names to be graphed. The handler for the Graph button should do the following:
Search through the database list for the name in the aName textbox
If it finds the name, it should add the NameData object to the name list, then invalidate the gridPanel (to force it to be redrawn)
If it does not find the name, it should display an error message in the results listbox saying that the name was not found in the database list.
Part 3b (30 points): gridPanel Paint handler
As discussed in class, when dealing with graphics, there is a world coordinate system and a screen coordinate system. For this project, the x-axis of the two coordinate systems is the same with range of 0 to gridPanel.Width. For the y-axis, the world coordinate system has range 1 to 1000 (the possible rank values). The x-coordinates of the vertical grid lines are evenly spaced across the panel.
The y-axis of the screen coordinate system is a bit tricky, because the graph area is not the entire panel. The horizontal lines drawn by DrawGrid are at 20 pixels and (gridPanel.Height - 20) pixels. A constant PANEL_OFFSET has been defined with value 20 so that if we change where we want the graph area to be, we only need to change the constant's value. Thus the graph area height is (gridPanel.Height - 2*PANEL_OFFSET). The computed y-coordinates of the rank points should result in a placement of the point that is proportional to its rank. For example, a name with a rank of 1 is at the top of the graph, a rank of 475 would be near the middle of the graph, and a rank of 989 is at the bottom of the graph. A rank value of 0 also should be plotted at the bottom of the graph.
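One way to express this mapping is sketched below, with assumptions stated up front. The class GraphCoords, the constant MAX_RANK, and both method names are hypothetical. The y formula places rank 1 exactly on the top grid line and rank 1000 (or 0) exactly on the bottom one. The x formula assumes the vertical grid lines are spaced Width / DECADES apart starting at 0; DrawGrid is the authority on the actual spacing, so check it before relying on this.

```csharp
public static class GraphCoords   // hypothetical helper class
{
    public const int PANEL_OFFSET = 20;   // as defined in the project
    public const int MAX_RANK = 1000;     // hypothetical name for the worst rank

    // Maps a rank (world y, 1..1000, or 0 for "unranked") to a screen y.
    public static int RankToY(int rank, int panelHeight)
    {
        if (rank == 0)
            rank = MAX_RANK;              // unranked names plot at the bottom
        int graphHeight = panelHeight - 2 * PANEL_OFFSET;
        // Proportional placement: rank 1 -> top line, rank 1000 -> bottom line.
        return PANEL_OFFSET + (rank - 1) * graphHeight / (MAX_RANK - 1);
    }

    // Maps a decade index to a screen x.
    // Assumption: grid lines are panelWidth / decades apart, starting at 0.
    public static int DecadeToX(int decade, int panelWidth, int decades)
    {
        return decade * panelWidth / decades;
    }
}
```

For a panel 440 pixels high, the graph area is 400 pixels, so RankToY gives 20 for rank 1, 219 for rank 500, and 420 for both rank 1000 and rank 0.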
The start of the Paint handler (gridPanel_Paint) is given. It obtains a Graphics object from the gridPanel and calls the DrawGrid method to draw the grid lines of the graph. This method shows examples of drawing lines (given two point locations) and strings (location given is the upper-left corner).
For this part, you are to complete the implementation of gridPanel_Paint to graph the rank data for the names in the name list by doing the following:
Loop through each NameData object in the name list and do the following:
Compute the (x, y) location of the first decade rank (obtained by calling the RankInDecade method)
Draw the name and first decade rank next to the point
Loop through the rest of the decade ranks and do the following:
Make (xOld, yOld) be the previous (x, y) location
Compute the (x, y) location of the current decade rank
Draw a line from (xOld, yOld) to (x, y)
Draw the name and current decade rank next to (x, y).
For now, use the black pen (pens[0]) or black brush (brushes[0]) as appropriate. Note that DrawString interprets the location given to it as the upper left corner of the box around the text to be drawn. We would like these strings to be drawn above the point rather than below the point, so you will need to adjust the y-coordinate given to DrawString.
At this point, the program should graph names as they are added using the Graph button. As we add names, they will tend to draw on top of each other, especially at the very top and very bottom. Since the name string is repeated each decade, it is still possible to figure out which line is which. However, it would be nicer if the graph lines were in a few different colors. The arrays pens and brushes contain NUM_PENS pens/brushes of different colors. Instead of always using 0 as the index to the arrays, we can rotate through the colors by keeping track of a currentPenIndex that is initialized to 0 and then incremented after each name is graphed. (When currentPenIndex would go past NUM_PENS-1, it rolls back around to 0.)
Add code to the handler to rotate pen/brush color as it graphs the data.
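The rollover described above is commonly written with the remainder operator. The sketch below shows just that arithmetic; the class PenRotation and the method NextPenIndex are hypothetical names, and in the handler the same expression could simply be applied to currentPenIndex after each name is drawn.

```csharp
public static class PenRotation   // hypothetical helper class
{
    // Returns the index to use for the next name:
    // 0, 1, ..., numPens - 1, then back to 0.
    public static int NextPenIndex(int currentPenIndex, int numPens)
    {
        return (currentPenIndex + 1) % numPens;
    }
}
```

With four pens, for instance, the index after 0 is 1 and the index after 3 is 0 again. An if statement that resets the index to 0 once it reaches NUM_PENS would work equally well.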
The more pens/brushes in the arrays, the more names are graphed before a color repeats itself. If you want to add more colors, change the value of NUM_PENS and add initialization code in the Form1 constructor for the new pens/brushes as shown.
In the figure below, we see the names "A" and "Wendy". "A" starts strong in 1900 and trails off to 0 in 1990. Wendy is at 0 until 1940.

In the next figure below, we add "John" who is very near 1 the whole time, and "Samir" who comes on the scene only starting in 1980. Both Wendy and Samir are 0's in 1900, 1910, ... so they draw on top of each other there. That is fine - we draw what we can and if they draw on top of each other, so be it.

Part 4 (10 points): Clear All and Clear One button handlers
Finally, after graphing a few names, the graph gets very messy. Implement the handlers for the last two buttons as follows:
The Clear All handler should erase the name list (using the Clear method) and invalidate the gridPanel (to cause a repaint).
The Clear One handler should remove the earliest added name (i.e., the first one at index 0, using the RemoveAt method) and invalidate the gridPanel (to cause a repaint). For example, if the names Samantha, Wendy, and John are added to the name list, then clicking the Clear One button removes Samantha. It is fine that which color goes with which name changes as names are added and removed.
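The list operations behind these two handlers can be sketched as below. The class NameListOps and its method names are hypothetical, and the call that invalidates the gridPanel is left out because it needs the form. The check for an empty list is an addition not required by the handout: RemoveAt(0) on an empty ArrayList throws an ArgumentOutOfRangeException, so clicking Clear One with nothing graphed would otherwise crash the program.

```csharp
using System.Collections;

public static class NameListOps   // hypothetical helper class
{
    // Removes the earliest added name, if there is one.
    public static void ClearOne(ArrayList nameList)
    {
        if (nameList.Count > 0)       // guard is an addition, not in the handout
            nameList.RemoveAt(0);
    }

    // Removes every name from the list.
    public static void ClearAll(ArrayList nameList)
    {
        nameList.Clear();
    }
}
```

After adding Samantha, Wendy, and John, one call to ClearOne leaves Wendy at index 0 followed by John, which matches the example above.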
Acknowledgments
This assignment is based on a similar Java assignment developed by Nick Parlante at Stanford University that was presented during a Nifty Assignment session at the 2005 SIGCSE Conference.
Revised: 04/24/08