A Titanic Travesty
A number of data sets are commonly used for data science instructional exercises. The one we are working on now is a list of the passengers of the Titanic. We’re being asked to analyze whether any of the supplied data could inform who survived (e.g., which class a passenger traveled in, or whether they were part of a family).
While looking through the data I had a thought: “Are the crew accounted for here?” I was really shocked to find out that they weren’t.
And it’s not as if the information isn’t available. I found it in three places:
I then discovered that there were also 462 passengers who aren’t represented in the data. For other analyses that might be okay, but when we’re being asked to examine the fatalities of those aboard the ship, it feels like a travesty to include only 891 names. That omits the majority of those aboard: 855 crew members (38% of those aboard) and 462 passengers (21%), when their records could so easily be made available.
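As a rough check on those proportions, here is a minimal Python sketch that recomputes each group's share from the three counts quoted above. It assumes the total aboard is simply the sum of those three counts (2,208), which is my assumption for illustration; a source that uses a slightly different total will give slightly different percentages. The variable and function names are hypothetical.

```python
# Counts quoted in the post.
included_passengers = 891   # names present in the exercise data set
omitted_passengers = 462    # passengers missing from the data set
crew = 855                  # crew members missing from the data set

# Assumption: everyone aboard falls into exactly one of the three groups.
total_aboard = included_passengers + omitted_passengers + crew


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {
    "included passengers": share(included_passengers, total_aboard),
    "omitted passengers": share(omitted_passengers, total_aboard),
    "crew": share(crew, total_aboard),
    "all omitted": share(omitted_passengers + crew, total_aboard),
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Under that assumption, the 891 included names come to roughly 40% of those aboard, so about 60% of the people on the ship are missing from the exercise.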