Datasets

The datasets I will use to answer the question come from SourceStack. I queried the API on June 9, 2023 and on April 2, 2024, pulling a sample of 50_000 records each day, which gives a combined dataset of 100_000 records. The offers are mostly engineering jobs.

image
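
A minimal sketch of how the two pulls could be combined into one dataset, assuming each pull is already available as a list of dict records. The `combine_snapshots` helper and the `snapshot_date` field are hypothetical names used for illustration; they are not part of SourceStack's API.

```python
from datetime import date


def combine_snapshots(snapshots):
    """Merge several API pulls into one list, tagging each record with its pull date.

    `snapshots` maps a pull date to the list of records fetched on that day.
    """
    combined = []
    for pulled_on, records in sorted(snapshots.items()):
        for record in records:
            # Build a new dict so the original records stay untouched.
            combined.append({**record, "snapshot_date": pulled_on.isoformat()})
    return combined


# Tiny stand-in records; the real pulls held 50_000 records each.
june_pull = [{"job_name": "Data Engineer"}, {"job_name": "ML Engineer"}]
april_pull = [{"job_name": "Backend Developer"}]

dataset = combine_snapshots({
    date(2023, 6, 9): june_pull,
    date(2024, 4, 2): april_pull,
})
print(len(dataset))
```

Keeping the pull date on every record makes it possible to compare the 2023 and 2024 samples later without storing them separately.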

The dataset has 16 features:


job_name [str] - job title
job_location [str] - information about job location
hours [str] - type of employment: full-time, part-time, contract, or gig
remote [bool] - whether the role is remote
company_name [str] - the name of the company
education [str] - what education is required
tags_matched / tag / categories [list[str]] - tags describing the job
seniority [str] - information about seniority
comp_est [str] - estimated compensation offered for this vacancy
language [str] - language required
city [str] - city location of the offered job
country [str] - country location of the offered job
job_published_at [str] - the date on which the job was published
last_indexed [str] - the most recent date on which the job was indexed

The dataset contains a significant number of null values and only one duplicate record.

image
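
The null and duplicate checks can be sketched as follows, assuming the records are plain dicts and that a missing value is stored as `None`. The helper names `null_counts` and `duplicate_count` are hypothetical, and the sample records are made up.

```python
import json


def null_counts(records, columns):
    """Count how many records have a missing (None) value in each column."""
    counts = {column: 0 for column in columns}
    for record in records:
        for column in columns:
            if record.get(column) is None:
                counts[column] += 1
    return counts


def duplicate_count(records):
    """Count records that exactly repeat an earlier record."""
    seen = set()
    duplicates = 0
    for record in records:
        # Serialise to a string so list-valued fields (e.g. tags) become hashable.
        key = json.dumps(record, sort_keys=True)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


sample = [
    {"job_name": "Data Engineer", "education": None, "tag": ["python"]},
    {"job_name": "Data Engineer", "education": None, "tag": ["python"]},
    {"job_name": "ML Engineer", "education": "Bachelor", "tag": []},
]
print(null_counts(sample, ["job_name", "education"]))
print(duplicate_count(sample))
```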

The job offers span 183 countries, 44 languages, and 7_795 distinct cities.

Countries:

image

image

The top countries are:


United States - 44_581 job offers
India - 10_275
United Kingdom - 4_191
Germany - 3_560
Canada - 2_362

Cities:

image

The top cities are:


Bengaluru - 1_942 job offers
Bangalore - 1_512
San Francisco - 961
London - 956
Singapore - 863
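
Bengaluru and Bangalore are two names for the same city, so the first two rows of the list above could be merged before ranking. A minimal sketch, assuming a hand-maintained alias map (`CITY_ALIASES` and `merge_city_counts` are hypothetical names, not something the dataset provides):

```python
from collections import Counter

# Hypothetical alias map: variant spelling -> canonical name.
CITY_ALIASES = {"Bangalore": "Bengaluru"}


def merge_city_counts(city_counts, aliases=CITY_ALIASES):
    """Fold the counts of aliased city names into their canonical name."""
    merged = Counter()
    for city, count in city_counts.items():
        merged[aliases.get(city, city)] += count
    return merged


# Counts taken from the top-cities list above.
top_cities = {
    "Bengaluru": 1942,
    "Bangalore": 1512,
    "San Francisco": 961,
    "London": 956,
    "Singapore": 863,
}
merged = merge_city_counts(top_cities)
print(merged.most_common(3))
```

With the two spellings merged, the city accounts for 1942 + 1512 = 3454 job offers.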


The job offers come from 28_781 different companies.
The top 20 companies posted 9_502 job offers in total, which is 9.5% of all job offers.

image
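
The share held by the top companies is the sum of the largest counts divided by the number of offers (here 9502 / 100_000 = 9.5%). A small sketch of that calculation, assuming the company column is a non-empty list of names; `top_n_share` is a hypothetical helper and the company names are placeholders:

```python
from collections import Counter


def top_n_share(values, n):
    """Return (count, fraction) of items covered by the n most frequent values."""
    counts = Counter(values)
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total, top_total / len(values)


# Stand-in company column; the real one has 100_000 entries.
companies = ["Acme"] * 5 + ["Globex"] * 3 + ["Initech"] * 2
total, share = top_n_share(companies, n=2)
print(total, f"{share:.1%}")
```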

Every job offer comes with a number of tags, ranging from 0 to 150. The most popular tags are:

download
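
The tag range and the tag ranking can be computed along these lines. This is a sketch under the assumption that each offer carries a list of tag strings and that a missing list is stored as `None`; `tag_stats` is a hypothetical helper and the tags are made up.

```python
from collections import Counter


def tag_stats(tag_lists):
    """Return (fewest tags per offer, most tags per offer, Counter of tag frequency)."""
    # Treat a missing tag list (None) as zero tags.
    cleaned = [tags or [] for tags in tag_lists]
    sizes = [len(tags) for tags in cleaned]
    frequency = Counter(tag for tags in cleaned for tag in tags)
    return min(sizes), max(sizes), frequency


# Stand-in tag column with made-up tags.
tag_column = [["python", "sql"], None, ["python", "aws", "docker"], []]
fewest, most, frequency = tag_stats(tag_column)
print(fewest, most, frequency.most_common(1))
```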

Last but not least are the job names:

download