The unexpected gain of ‘soft’ skills: Not the experience I envisioned, but the experience I needed.

Keino Baird
5 min read · Mar 5, 2021

The team I worked with on this product (CitySpire) consisted of four data scientists, four web developers, and a team project leader. Additionally, there was a stakeholder for the product. The team was 100% remote and met regularly for standups for the duration of the eight-week project. I had previously worked with one of the data scientists, who was part of the same cohort and program track, and I was also familiar with the technical abilities of another data scientist on the team.

The problem we were trying to solve was the creation of a web-based application for people who are looking to move from one city to another and need information on rental prices in possible new cities. The application included various metrics such as a walk score, a livability score, and a crime score. It also recommended cities using the k-nearest neighbors algorithm.
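To make the recommendation idea concrete, here is a minimal sketch of k-nearest neighbors over city metrics. It is not the CitySpire implementation: the city names, the three features, their values, and the `recommend_cities` helper are all made up for illustration, and the features are assumed to be scaled to the same 0 to 1 range already.

```python
import math

# Hypothetical, made-up feature vectors: (rent index, walk score, crime score),
# each assumed to be scaled to the 0-1 range. These are NOT CitySpire's data.
CITY_FEATURES = {
    "City A": (0.90, 0.85, 0.40),
    "City B": (0.88, 0.80, 0.45),
    "City C": (0.30, 0.35, 0.20),
    "City D": (0.35, 0.30, 0.25),
    "City E": (0.60, 0.90, 0.70),
}


def recommend_cities(city, features, k=2):
    """Return the k cities whose feature vectors are closest to `city`."""
    target = features[city]
    # Euclidean distance from the chosen city to every other city.
    distances = [
        (math.dist(target, vector), name)
        for name, vector in features.items()
        if name != city
    ]
    distances.sort()
    return [name for _, name in distances[:k]]


recommendations = recommend_cities("City A", CITY_FEATURES, k=2)
print(recommendations)
```

Scaling matters here: if rent were left in dollars and the scores in the 0 to 100 range, rent would dominate the distance and the other metrics would barely influence the result.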

On the second day of Lambda Labs, my grandmother passed away. I chose not to flex (withdraw). No one in my family was able to physically travel to Tobago for the funeral due to Covid-19 travel restrictions. The alternative was a funeral streamed on YouTube, which was awfully difficult. I was concerned about my immediate mental health while processing the scope of this loss. My teammates were empathetic and accommodating. For the most part, they carried me and the bulk of the project's weight. However, missing those early days and a few standups did affect how much I contributed to the overall scale of the product.

Technically, I was able to keep up and follow the development of the product, mainly thanks to the communication threads in Slack and Trello. The important takeaway for me is that ‘life’ stuff will happen, not only to oneself but also to the people with whom you will be working. The ‘empathic people skills’ shown to me throughout the project are among the experiences I valued most from working with a small team of data scientists. Clearly, experiences of this nature will happen when working on teams, and there may be PTO for dealing with things like grief and loss. Having supportive colleagues who are willing to spend time explaining things and bringing you up to speed is a plus.

So much for data scientist being the sexiest job of the decade; data is dirty, and you are nothing more than a custodian who codes. Data is not pretty, and it can be downright overwhelming. Sourcing, cleaning, and exploratory data analysis are the time-consuming aspects of data science. First, we must source the data, which at times may include scraping. In this case, the US Census site provided national data about population demographics, and the Zillow API provided information about rental prices.

A major decision had to be made about the granularity of the data, and it affected the scope of the project. Initially, we thought that we were only going to focus on five cities. This would have allowed us to go deep instead of going wide with over 400 cities: we would have had time to gather robust data about five cities instead of limited data about 400. It would also have allowed us to aggregate rental data at the zip code level, instead of using the aggregated average the web-based application currently relies on.
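The difference between the two levels of granularity can be sketched in a few lines. The listings, the city name, the zip codes, and the `average_rent` helper below are all invented for illustration; they are not the project's data or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical listings: (city, zip code, monthly rent in USD). Made-up values.
LISTINGS = [
    ("Springfield", "00001", 1200),
    ("Springfield", "00001", 1400),
    ("Springfield", "00002", 2600),
    ("Springfield", "00002", 2800),
]


def average_rent(listings, key):
    """Average rent grouped by whatever `key` extracts from a listing."""
    groups = defaultdict(list)
    for listing in listings:
        groups[key(listing)].append(listing[2])
    return {group: mean(rents) for group, rents in groups.items()}


# Going wide: one number per city.
by_city = average_rent(LISTINGS, key=lambda row: row[0])
# Going deep: one number per (city, zip code) pair.
by_zip = average_rent(LISTINGS, key=lambda row: (row[0], row[1]))
print(by_city)
print(by_zip)
```

In this toy data the city-wide average hides the fact that one zip code costs roughly twice as much as the other, which is the kind of detail a zip code level view would surface.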

Before I talk about possible future features for this application, some background: my first independent project at Lambda, during the first unit, was about New York City. I found several datasets, which included demographics, performance ratings for all the public schools in the city, and georeferenced points for mapping. Certainly, I can do more with these datasets, including integrating them into larger projects. However, a major technical challenge can arise here.

The problem is that combining data from different sources, whether structured or unstructured, can be computationally expensive. How are you going to solve that problem? Data warehousing and ETL (Extract, Transform, and Load) are tools and technologies capable of serving various user needs. A data scientist can build a great model; however, it is the analytics and visualizations that bring the data story to life for individual users based on their needs.
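As a rough illustration of the ETL idea, the sketch below extracts two differently shaped sources, transforms their city names into a common form, and loads them into an in-memory SQLite database so they can be joined. Everything here is assumed for the example: the sample data, the table names, and the `normalize` and `run_etl` helpers are hypothetical and use only Python's standard library.

```python
import csv
import io
import sqlite3

# Two made-up sources with different shapes, standing in for real downloads.
RENT_CSV = "city,avg_rent\nCity A,2100\nCity B,1500\n"
POPULATION_ROWS = [
    {"name": "city a ", "population": "500000"},
    {"name": "CITY B", "population": "120000"},
]


def normalize(name):
    """Transform step: make city names comparable across sources."""
    return name.strip().title()


def run_etl(conn):
    # Extract: read the CSV text into dictionaries.
    rents = list(csv.DictReader(io.StringIO(RENT_CSV)))
    # Transform: clean names and convert strings to numbers.
    rent_rows = [(normalize(r["city"]), float(r["avg_rent"])) for r in rents]
    pop_rows = [
        (normalize(p["name"]), int(p["population"])) for p in POPULATION_ROWS
    ]
    # Load: write the cleaned rows into tables.
    conn.execute("CREATE TABLE rent (city TEXT PRIMARY KEY, avg_rent REAL)")
    conn.execute(
        "CREATE TABLE population (city TEXT PRIMARY KEY, population INTEGER)"
    )
    conn.executemany("INSERT INTO rent VALUES (?, ?)", rent_rows)
    conn.executemany("INSERT INTO population VALUES (?, ?)", pop_rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(conn)
rows = conn.execute(
    "SELECT r.city, r.avg_rent, p.population "
    "FROM rent r JOIN population p ON p.city = r.city ORDER BY r.city"
).fetchall()
print(rows)
```

Once the data is loaded, the join is a single query, and the cleaning work does not have to be repeated every time someone asks a new question of the data.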

While working on that first Lambda project more than a year ago, I also ran into crime and police complaint datasets about New York City. As I thought about possible future features for this project, I considered visualizing crime hotspots or displaying dynamic maps of historic criminal activity to end users. For example, after data exploration and analysis, I might determine that on Wednesdays and Fridays there is a spike in car thefts between 10 AM and 1 PM. Resources could then be made readily available for possible interventions, even if this means ramping up street patrols. The idea is that this is business intelligence: whether it arrives in real time or at spaced intervals, the information can only be acted on after it is extracted, transformed, and loaded.
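The kind of question in that example (which weekdays see the most car thefts in a given time window?) can be sketched as a simple count. The incident records below are invented, not real New York City data, and `count_by_weekday` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up incident records: (ISO timestamp, offense). Not real NYC data.
INCIDENTS = [
    ("2021-02-03T10:15:00", "car theft"),  # a Wednesday morning
    ("2021-02-03T11:40:00", "car theft"),  # same Wednesday
    ("2021-02-05T12:05:00", "car theft"),  # a Friday at noon
    ("2021-02-06T22:30:00", "car theft"),  # a Saturday night, outside the window
    ("2021-02-03T10:50:00", "burglary"),   # different offense, ignored
]


def count_by_weekday(incidents, offense, start_hour, end_hour):
    """Count `offense` incidents per weekday with start_hour <= hour < end_hour."""
    counts = Counter()
    for timestamp, kind in incidents:
        when = datetime.fromisoformat(timestamp)
        if kind == offense and start_hour <= when.hour < end_hour:
            counts[WEEKDAYS[when.weekday()]] += 1
    return counts


# Car thefts between 10 AM and 1 PM, grouped by day of the week.
counts = count_by_weekday(INCIDENTS, "car theft", start_hour=10, end_hour=13)
print(counts.most_common())
```

With real data, the same counts could feed a chart or a map layer, but only after the records have been extracted, cleaned, and loaded somewhere they can be queried.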

Future Features

Future features can include school ratings and crime hotspots. This would mean covering a smaller number of cities. Additionally, the data for these features should be aggregated at the zip code level. This would give users relevant information they may find useful when considering a move.

Where are the nearest schools? How neighboring schools perform may be a consideration for someone moving into New York City but less of a consideration when moving to a rural county with one elementary school. In many towns, property taxes or residential taxes are assessed taking into consideration the overall quality of resources allocated to the school by means of funding.

Another feature would be crime hotspots. Who wants to move into a crime-infested neighborhood? Mapping hotspots based on the prevalence of crime is a useful future feature. Technically, the challenge I foresee with these features is the time consumed by sourcing, cleaning, and joining these different geospatial datasets. Additionally, from a data engineering perspective, relying on a database instead of pipelining all the data into one data frame for the endpoint calls would have been a different engineering decision.

More than a capstone project, working on this cross-functional team to develop and create this product was a valuable experience. It was not the experience I envisioned, but it was an experience that enriched my scope of knowledge on various technologies.

I gained a deeper appreciation for collaboration, being transparent about what is going on, and being courageous enough to ask for help. I am also more comfortable approaching data engineering roles, even though those roles may be hidden away in the back of the house, behind the visually appealing frontends we see on web applications.

Keino Baird

Keino is a data nerd, a data science student at Lambda School and an educational consultant.