What drives the rental price of homes and rooms for guest accommodation?

Aravind Brahmadevara
May 28, 2021 · 6 min read
Holiday

Hello everyone! We know that different types of homes and rooms for holiday/guest accommodation have different prices. Can we use a data-driven approach to understand what impacts the price of a property?

If you are just starting out in the business of hosting and renting homes/rooms, this blog post will gently guide you towards investing in the appropriate property to optimize your rental yields. If you are looking for a property, it should also help you choose one that fits your budget/needs.

Note: This is an observational study, not a formal (controlled) study. Therefore, the outcomes serve only as guidelines, since external and other economic factors often weigh heavily in decision-making.

If you are reading this post as part of a case study, it should help you carry out a systematic, step-by-step analysis of the data using the right tools.

Questions:

Let us try to answer the following questions:

  1. What are the key factors that drive the rental price? Are there any interesting findings?
  2. For the same set of key factors, is there a price difference between cities?
  3. Does review score/number of reviews impact the price or bookings?

Dataset choice?

Choosing the right dataset is important in our study. Let’s use the Airbnb dataset, which is publicly available as Seattle Airbnb Open Data on Kaggle. (Boston listings are also used later for the city comparison.)

Can we guess the factors?

There are many factors contributing to the rental price of a property. A common perception is that property size, number of bedrooms, number of guests, positive reviews, etc. drive the price up. Let us validate these.

Data Analysis on numerical data

Let’s get the numerical factors. Hold on! Our dataset is not clean.

For technical users: the dollar amounts are stored as strings in the format ‘$125.00’, so we can use a regex to identify the price columns and convert them to numbers. You can follow my GitHub code in the references.
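A minimal sketch of that cleaning step, assuming pandas. The helper names `find_dollar_columns` and `dollars_to_float` and the toy rows are hypothetical, not taken from the author's notebook. A column is treated as a price column when every non-null value matches a `$1,234.56`-style pattern; the `$` and thousands separators are then stripped and the column is cast to float, so missing values stay NaN.

```python
import pandas as pd

# Matches strings such as "$125.00" or "$1,250.00".
DOLLAR_PATTERN = r"^\$[\d,]+(?:\.\d+)?$"


def find_dollar_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: non-numeric columns whose non-null values all look like dollars."""
    cols = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # already numeric, nothing to clean
        values = df[col].dropna().astype(str)
        if len(values) > 0 and values.str.match(DOLLAR_PATTERN).all():
            cols.append(col)
    return cols


def dollars_to_float(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical helper: strip '$' and ',' from the given columns and cast to float."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].str.replace(r"[$,]", "", regex=True).astype(float)
    return out


# Toy rows for illustration only (not from the dataset).
toy = pd.DataFrame({
    "price": ["$125.00", "$1,250.00", "$80.00"],
    "cleaning_fee": ["$40.00", None, "$25.00"],
    "room_type": ["Entire home/apt", "Private room", "Shared room"],
})
price_cols = find_dollar_columns(toy)
clean = dollars_to_float(toy, price_cols)
print(price_cols)                # ['price', 'cleaning_fee']
print(clean["price"].tolist())   # [125.0, 1250.0, 80.0]
```

Matching the whole value (rather than searching for a `$` anywhere) keeps free-text columns that merely mention a price from being converted by accident.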

Picking the most important numerical features

  1. Let us find the correlation between price and each of the other numerical columns, and keep only those with a correlation above 0.2 or below -0.2.
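One way to express that filter in pandas, as a sketch: `select_correlated` is a hypothetical helper, and the toy frame is made up purely to exercise it (the real run would use the Kaggle listings after the price columns have been cleaned).

```python
import pandas as pd


def select_correlated(df: pd.DataFrame, target: str = "price",
                      threshold: float = 0.2) -> pd.Series:
    """Hypothetical helper: correlation of each numeric column with `target`,
    keeping only r > threshold or r < -threshold."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    keep = corr[(corr > threshold) | (corr < -threshold)]
    return keep.sort_values(ascending=False)


# Made-up toy data; host_listings_count is deliberately unrelated to price.
toy = pd.DataFrame({
    "price": [50.0, 80.0, 120.0, 200.0, 260.0],
    "accommodates": [1, 2, 4, 6, 8],
    "host_listings_count": [3, 1, 4, 1, 3],
    "reviews_per_month": [4.0, 3.5, 2.0, 1.0, 0.5],
})
selected = select_correlated(toy)
print(selected)
```

The weakly related column is dropped, while both a strong positive and a strong negative relationship survive, which is why the threshold is applied on both sides of zero.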

Note: the ‘reviews_per_month’ column is discussed in the summary.

Focus on the price row in the correlation matrix. We see that ‘accommodates’ has the strongest correlation (0.63) with price, along with the other features shown below:

['price', 'accommodates', 'bedrooms', 'beds', 'square_feet', 'bathrooms','guests_included', 'reviews_per_month']
Price correlation with numerical features

Find relevant non-numeric features

dict(listings.select_dtypes(include=['object']).nunique().sort_values(ascending=True))
{'last_scraped': 1,
'jurisdiction_names': 1,
'requires_license': 1,
'experiences_offered': 1,
'calendar_last_scraped': 1,
'has_availability': 1,
'market': 1,
'country_code': 1,
'country': 1,
'require_guest_phone_verification': 2,
'host_identity_verified': 2,
'host_has_profile_pic': 2,
'state': 2,
'host_is_superhost': 2,
'is_location_exact': 2,
'require_guest_profile_picture': 2,
'instant_bookable': 2,
'room_type': 3,
'cancellation_policy': 3,
'host_response_time': 4,
'bed_type': 5,
'smart_location': 7,
'city': 7,
'property_type': 16,
'neighbourhood_group_cleansed': 17,
'zipcode': 28,
'calendar_updated': 34,
'extra_people': 45,
'security_deposit': 46,
'neighbourhood': 81,
'neighbourhood_cleansed': 87,
'host_neighbourhood': 102,
'host_verifications': 116,
'cleaning_fee': 118,
'host_location': 120,
'last_review': 321,
'first_review': 984,
'host_since': 1380,
'street': 1442,
'host_name': 1466,
'notes': 1999,
'host_about': 2011,
'neighborhood_overview': 2506,
'transit': 2574,
'host_thumbnail_url': 2743,
'host_picture_url': 2743,
'host_url': 2751,
'space': 3119,
'amenities': 3284,
'summary': 3478,
'medium_url': 3498,
'xl_picture_url': 3498,
'thumbnail_url': 3498,
'description': 3742,
'name': 3792,
'picture_url': 3818,
'listing_url': 3818}
  1. Some of these are purely descriptive columns (for example URLs, free text, or columns with a single unique value) that can be removed. After a bit of domain/business analysis, we can choose the following:
['price', 'room_type_Entire home/apt', 'room_type_Private room',
'zipcode_98101', 'bed_type_Real Bed', 'cancellation_policy_strict']

2. Skipping amenities. Amenities could play a vital role too (e.g. a ‘Pets are allowed’ amenity), but the column has to be cleaned first: the raw values mix quoted and unquoted items (see the sample below), and spellings and typos need to be standardized. Also, if the number of features becomes too high compared to the number of rows, the model is likely to overfit and become inaccurate. So we skip amenities for now and can come back to them later if we need further insight into their relationship to price.

{"Cable TV","Wireless Internet",Kitchen,"Free Parking on Premises",Breakfast,"Pets live on this property",Dog(s),Cat(s),Heating,Washer,Dryer,"Smoke Detector","First Aid Kit","Safety Card","Fire Extinguisher",Essentials,Hangers,"Hair Dryer",Iron,"Laptop Friendly Workspace"}
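As a sketch of what that cleaning would involve: the raw value is a brace-delimited list mixing quoted and unquoted items, which Python's `csv` module can split. The function name `parse_amenities` and the alias mapping are hypothetical; a real alias table would have to be built by inspecting the spelling variants that actually occur in the data.

```python
import csv
from typing import Dict, Optional, Set


def parse_amenities(raw: str, aliases: Optional[Dict[str, str]] = None) -> Set[str]:
    """Hypothetical helper: split a '{...}' amenities string into normalized names."""
    inner = raw.strip().strip("{}")
    if not inner:
        return set()
    # csv copes with the mix of quoted ("Cable TV") and bare (Kitchen) items.
    items = next(csv.reader([inner]))
    aliases = aliases or {}
    names = set()
    for item in items:
        name = " ".join(item.split()).lower()  # collapse whitespace, lowercase
        if name:
            names.add(aliases.get(name, name))  # map known variants to one spelling
    return names


sample = '{"Cable TV","Wireless Internet",Kitchen,"Pets live on this property",Dog(s)}'
# Hypothetical alias table: treat "wireless internet" as the canonical "internet".
parsed = parse_amenities(sample, aliases={"wireless internet": "internet"})
print(sorted(parsed))
# ['cable tv', 'dog(s)', 'internet', 'kitchen', 'pets live on this property']
```

Each parsed set could later become one 0/1 indicator column per amenity, which is exactly where the concern about the number of features relative to rows comes in.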

Correlation with price: all of the most important features

Price correlation with all factors
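The dummy column names used above (for example ‘room_type_Entire home/apt’) match the default naming of `pandas.get_dummies`, so a sketch of this step, assuming that is how the categorical columns were encoded, could look like this. The helper name and the toy rows are illustrative only.

```python
import pandas as pd


def price_corr_with_dummies(df: pd.DataFrame, numeric: list, categorical: list,
                            target: str = "price") -> pd.Series:
    """Hypothetical helper: one-hot encode `categorical`, then correlate every column with `target`."""
    encoded = pd.get_dummies(df[[target] + numeric + categorical],
                             columns=categorical, dtype=float)
    return encoded.corr()[target].drop(target).sort_values(ascending=False)


# Toy rows for illustration only.
toy = pd.DataFrame({
    "price": [60.0, 75.0, 150.0, 220.0, 90.0, 250.0],
    "accommodates": [1, 2, 4, 6, 2, 6],
    "room_type": ["Private room", "Private room", "Entire home/apt",
                  "Entire home/apt", "Private room", "Entire home/apt"],
})
corr = price_corr_with_dummies(toy, numeric=["accommodates"], categorical=["room_type"])
print(corr)
```

With a two-level column the two dummies are mirror images (their correlations with price have equal size and opposite sign), so for modelling one of them is redundant; `get_dummies` accepts `drop_first=True` for that case.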

Let’s average the price, grouping by the important features.

The data suggests that the number of guests accommodated (‘accommodates’), the number of bedrooms, and the room type determine the price.

Pivot table: Seattle
Pivot table: Boston
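A sketch of the group-and-average step with `DataFrame.pivot_table`, assuming the tables average price by ‘accommodates’ and ‘bedrooms’ with one column per room type; the exact rows and columns in the pictured tables may differ, and the toy data is made up.

```python
import pandas as pd


def average_price_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean price per (accommodates, bedrooms), one column per room type."""
    return df.pivot_table(values="price", index=["accommodates", "bedrooms"],
                          columns="room_type", aggfunc="mean")


# Toy rows for illustration only.
toy = pd.DataFrame({
    "price": [60.0, 80.0, 150.0, 170.0, 300.0],
    "accommodates": [2, 2, 4, 4, 8],
    "bedrooms": [1, 1, 2, 2, 4],
    "room_type": ["Private room", "Private room", "Entire home/apt",
                  "Entire home/apt", "Entire home/apt"],
})
table = average_price_table(toy)
print(table)
```

Running the same function once on each city's cleaned listings would give one table per city; combinations with no listings show up as NaN rather than zero.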

Compare Boston vs. Seattle prices using the most important feature

On average, Boston properties are somewhat more expensive (about $50) than Seattle properties. This should help in choosing the right city to invest in. The same results can be extrapolated to weekly and monthly prices. However, in some exceptional cases, Seattle properties cost more.

Bar: Average price difference between cities
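The per-city comparison can be sketched by averaging price per ‘accommodates’ value in each city and subtracting. The helper name and the toy frames are hypothetical; a positive value means Boston is more expensive for that group.

```python
import pandas as pd


def city_price_gap(boston: pd.DataFrame, seattle: pd.DataFrame,
                   key: str = "accommodates") -> pd.Series:
    """Hypothetical helper: mean Boston price minus mean Seattle price per value of `key`."""
    boston_mean = boston.groupby(key)["price"].mean()
    seattle_mean = seattle.groupby(key)["price"].mean()
    # Subtraction aligns on `key`; groups missing from either city become NaN and are dropped.
    return (boston_mean - seattle_mean).dropna()


# Toy frames for illustration only.
boston = pd.DataFrame({"accommodates": [2, 2, 4, 8], "price": [100.0, 120.0, 200.0, 350.0]})
seattle = pd.DataFrame({"accommodates": [2, 4, 4, 6], "price": [70.0, 140.0, 160.0, 250.0]})
gap = city_price_gap(boston, seattle)
print(gap)
print(gap.mean())
```

Keeping only group values present in both cities avoids comparing against a missing average; `gap.plot.bar()` (which needs matplotlib) would draw a bar chart of the per-group differences.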

Merge both datasets for a final analysis

The average price of a property that accommodates 12 guests and has 4–5 bathrooms ranges from $500 to $700. We can also treat the properties with the highest number of reviews as economical accommodation and guesstimate prices for those.

Pivot table: both cities combined
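Merging here most plausibly means stacking the two listing tables row-wise with a city label, rather than a key-based join. Under that assumption, a sketch with `pd.concat` follows; the helper name is hypothetical and the toy prices are invented, not the article's results.

```python
import pandas as pd


def combine_cities(frames: dict) -> pd.DataFrame:
    """Hypothetical helper: stack per-city listing tables, tagging each row with its city."""
    parts = [df.assign(city=name) for name, df in frames.items()]
    return pd.concat(parts, ignore_index=True)


# Toy frames for illustration only; real inputs would be the two cleaned listings tables.
boston = pd.DataFrame({"accommodates": [12, 4], "bathrooms": [4.5, 1.0], "price": [600.0, 150.0]})
seattle = pd.DataFrame({"accommodates": [12, 4], "bathrooms": [4.5, 1.0], "price": [400.0, 110.0]})

combined = combine_cities({"Boston": boston, "Seattle": seattle})
table = combined.pivot_table(values="price", index="accommodates",
                             columns="bathrooms", aggfunc="mean")
print(table)
```

`pd.concat` keeps the union of columns by default, so both tables should go through the same cleaning first; a column present in only one city would otherwise be NaN for every row of the other.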

Summary

Let us answer the questions we posed at the start:

  1. What are the key factors that drive the rental price? Also, are there any interesting findings?

Usually, for leased-out/tenanted properties, square footage (‘square_feet’) is a good indicator of price. However, for holiday/guest hosting, the size of the property is not the most important factor when it comes to price: we see listings of 3000 square feet that are priced comparatively low.

The key factors are: ‘accommodates’, ‘bedrooms’, ‘beds’, ‘square_feet’, ‘bathrooms’, ‘guests_included’, ‘room_type_Entire home/apt’, ‘room_type_Private room’ and ‘cancellation_policy_strict’.

2. For the same set of key factors, is there a price difference between cities?

Yes. On average, Boston listings are priced about $50 higher than Seattle listings. There are exceptions for higher values of ‘accommodates’.

3. Does review score/number of reviews impact the price or bookings?

The average review score has little or no relationship with price. Well! It makes sense that reviews don’t drive the price :-); rather, they might impact the number of bookings (which can be analyzed with the bookings dataset).

If you observed ‘reviews_per_month’: the more reviews per month, the lower the price of the property. This is probably NOT a causal relationship. One possible explanation is that guests who had negative experiences tend to leave more reviews than those with positive/neutral experiences. Alternatively, it may suggest that the properties with the highest number of reviews are economical accommodation. Domain expertise and business understanding matter here!
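To check claims like these, here is a sketch that reports both Pearson and Spearman (rank) correlation of price with the review columns; Spearman is computed as the Pearson correlation of the ranks, which avoids a SciPy dependency. The column names assume the Kaggle listings schema, the helper name and toy data are hypothetical, and as noted above, neither coefficient says anything about causation.

```python
import pandas as pd


def review_price_correlations(df: pd.DataFrame, target: str = "price",
                              cols=("review_scores_rating", "reviews_per_month")) -> pd.DataFrame:
    """Hypothetical helper: Pearson and Spearman correlation of `target` with each review column."""
    sub = df[[target, *cols]].dropna()  # listings without reviews have NaN here
    pearson = sub.corr()[target].drop(target)
    spearman = sub.rank().corr()[target].drop(target)  # Spearman = Pearson on ranks
    return pd.DataFrame({"pearson": pearson, "spearman": spearman})


# Made-up toy data for illustration only.
toy = pd.DataFrame({
    "price": [50.0, 80.0, 120.0, 200.0, 260.0],
    "review_scores_rating": [95, 90, 97, 92, 96],
    "reviews_per_month": [4.0, 3.5, 2.0, 1.0, 0.5],
})
result = review_price_correlations(toy)
print(result)
```

Spearman is worth reporting alongside Pearson here because price distributions are usually skewed by a few expensive listings, and a rank-based measure is less sensitive to them.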

In future articles, we will assess the impact of amenities. Thank you for your time :)

References:

GitHub:

Data-Science/BlogPost.ipynb at main · aravind-deva/Data-Science (github.com)

Data-Science/Numeric_correlation_price.png at main · aravind-deva/Data-Science (github.com)
