Before settling on this specific topic, I already knew that I wanted to explore data that dealt with language, in particular data that showcased the linguistic diversity of New York City. As a self-professed language nerd, I often find myself doing my own research on the role that language plays in the US and as of recent how non-English speakers navigate an English-dominant country and workforce. Initially I opted to visualize in a chart the percentage of languages spoken by New Yorkers, but then decided that I wanted to dig a bit deeper. I’m more so interested in how the presence of these myriad languages is showcased in different situations, whether that be an everyday interaction or something a bit more complex such as filing for bankruptcy.
After searching for language related datasets on data.world, I discovered that the Language Assistance Program released monthly reports of US citizens who filed for bankruptcy and requested translators for their meeting with creditors due to limited English proficiency. Upon request, these individuals were provided a phone call with an interpreter that spoke their preferred language. The most recent month available on data.world was for the month of May in 2017 so that is the dataset that I selected. Additionally, this dataset recorded individuals from nearly every major US city but for this assignment I wanted to focus solely on New Yorkers. As the Language Assistance Program was releasing monthly datasets, the languages listed tend to differ with each month. In this specific month, the languages that were requested by individuals filing for bankruptcy were Albanian, Bengali, Cantonese, Korean, Mandarin, Polish, Punjabi, Romanian, Russian and Spanish.
To create my chart, I transferred my chosen data -the requested languages, collective call duration with interpreter for each language, and whether the calls took place in what the dataset listed as either New York (Manhattan) or Brooklyn – into an Excel sheet, and from there transferred it to Tableau. I ultimately selected this chart as I thought that it best showcased the information. Based on the chart, it’s evident that some languages like Spanish are more recurrent and are present in both locations, while other languages like Cantonese only appear one or two times and only appear in one location. However, as I have only visualized a single month for this assignment, I feel that I do not have sufficient information to make any hypotheses or have any solid takeaways based on the dataset. The next steps I’m going to take is gather more data from previous posts by the Language Assistance Program, visualize them into charts and compare the information, specifically looking for patterns in recurring languages and call durations.