My Data Scientist Salary Scraping Experience


This week’s project was to use web scraping and logistic regression to build a model that predicts whether a job is high-paying or low-paying. I was excited about it because web scraping seemed like fun and I’ve been wanting more experience with regression.
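For anyone who wants to picture the modeling half, here is a minimal sketch of the idea, not the code the team actually wrote. It assumes "high-paying" means above the median salary (the cutoff is my assumption), and the feature columns, salaries, and function names are all invented toy values for illustration. It uses only the standard library; in practice a library such as scikit-learn would do the fitting.

```python
import math
from statistics import median


def label_high_paying(salaries):
    """Return 1 for salaries above the median, else 0 (median cutoff is an assumption)."""
    cutoff = median(salaries)
    return [1 if s > cutoff else 0 for s in salaries]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # error = predicted probability minus the true label
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Classify as high-paying (1) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5 else 0


# Toy data, invented for illustration only.
# Feature columns: [title mentions "senior", title mentions "intern"]
rows = [[1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [1, 0]]
salaries = [150000, 140000, 90000, 40000, 45000, 160000]

labels = label_high_paying(salaries)
weights, bias = fit_logistic(rows, labels)
predictions = [predict(weights, bias, x) for x in rows]
```

Turning salary into a yes/no label is what makes this a classification problem, which is why logistic regression fits better than a plain linear regression on the dollar amount.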

The first challenge was finding an appropriate website. After reviewing several, I found that careerbuilder.com was ideal. Not only did it have salary data for many of its posts, but its job links led to internal URLs. This mattered because getting real detail to work with means seeing more than just the front-page snippet.

I got very excited about this and was eager to write code that would compile links for every job posting, then go through each link to scrape the content we were interested in. In hindsight, my eyes were bigger than my stomach. Like so many things in life, I think I could have done it with a little more time, and I feel a bit let down over not getting to include things in the model like degree and experience requirements, industry, or full job descriptions.
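The two-stage plan described above (collect the posting links first, then visit each one) can be sketched roughly like this. It is a hedged illustration using only the Python standard library, not the code I was attempting. The `/job/` path prefix, the example.com URLs, and the function names are hypothetical and do not reflect the real site's layout. The `fetch` and `parse` arguments are placeholders for whatever downloads a page and pulls fields out of it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_job_links(listing_html, base_url, path_prefix="/job/"):
    """Stage 1: return absolute, de-duplicated posting links on the same host.

    path_prefix is a hypothetical pattern for posting URLs.
    """
    parser = LinkCollector()
    parser.feed(listing_html)
    host = urlparse(base_url).netloc
    seen, links = set(), []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(url)
        is_internal = parts.netloc == host
        if is_internal and parts.path.startswith(path_prefix) and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def scrape_postings(links, fetch, parse):
    """Stage 2: visit each link and parse the full posting."""
    return [parse(fetch(url)) for url in links]


# Made-up listing page used only to exercise stage 1.
SAMPLE = """
<a href="/job/123">Data Scientist</a>
<a href="/job/123">Data Scientist (duplicate)</a>
<a href="https://example.org/ad">Sponsored</a>
<a href="/about">About</a>
<a href="/job/456">Analyst</a>
"""

links = internal_job_links(SAMPLE, "https://www.example.com/jobs?q=data")
```

Keeping the two stages separate means the link list can be saved and inspected before any posting is fetched, which helps when time is short and the second stage is the part most likely to break.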

Luckily, others on Team Beefy (BFY: Bentley/Federman/Yass) had more feasible goals, so we made it to the finish line.

I’m happy that we got the job done, but I’m not thrilled with the results, to be honest. In retrospect, I think that we would have been better off with less compartmentalization and more communication. Hard lessons are best had when you’re not getting paid for the results.
