Architecture

[Figure: architecture diagram]

Dataset and Preprocessing

We use the Yelp Dataset, which provides a comprehensive subset of Yelp's business, review, and user data in JSON format, intended for personal, educational, and academic use. It consists of several JSON files containing information relevant for analysis and research. The files we use are:

Business.json – 150,346 businesses

Business information such as name, location (state, address, zip code), geographic location, and categories.

User.json – 1,987,897 users

User data such as name, review_count, photo, friends, etc.

Reviews.json – 6,990,280 reviews

User-business review information with fields such as stars, comments, tags, etc.

The Yelp dataset contains a vast amount of data, so its JSON files are very large. To handle them, we developed a parser that converts the JSON files to CSV. The CSV files served as the starting point for our recommendation algorithms and gave us efficient access to the data we needed. We used the Pandas library to read and manipulate the data, which made it easier to build more complex algorithms for extracting insights from the dataset. Together, the CSV format and Pandas gave us a streamlined, efficient way to manage and process the Yelp data.
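The JSON-to-CSV step can be sketched with Pandas alone. This is a minimal sketch, not the Travelix parser: it assumes each Yelp file stores one JSON object per line (as the public Yelp files do), and the function name json_lines_to_csv, the chunk size, and the optional column filter are illustrative. Nested fields such as a business's attributes are written out in their string form.

```python
from pathlib import Path

import pandas as pd


def json_lines_to_csv(json_path, csv_path, chunksize=100_000, columns=None):
    """Stream a JSON-lines file into a CSV file, one chunk at a time.

    Hypothetical helper: assumes one JSON object per line, so pandas can read
    the file lazily with lines=True and chunksize, keeping memory bounded.
    """
    json_path, csv_path = Path(json_path), Path(csv_path)
    first_chunk = True
    with pd.read_json(json_path, lines=True, chunksize=chunksize) as reader:
        for chunk in reader:
            if columns is not None:
                chunk = chunk[columns]  # keep only the fields we need
            # Write the header with the first chunk, then append the rest.
            chunk.to_csv(csv_path, mode="w" if first_chunk else "a",
                         header=first_chunk, index=False)
            first_chunk = False
    return csv_path
```

Reading in chunks keeps memory use independent of file size, which matters most for the review file with its roughly seven million lines.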
At Travelix, our recommendation system relied on three primary datasets: Business, Users, and Reviews. Because these datasets are large, we added a preprocessing step that splits the Business data by state; we identified the ten states with the most businesses and generated recommendations only for them. We further categorized the Business data into the three main categories of interest: Hotels & Travel, Restaurants, and Nightlife. This let us create more focused and relevant recommendations based on our users' interests and preferences. Some example subcategories:

Hotels & Travel

Ski Resorts, Hotels, Airports, Hostels, Motorcycle and RV Rental

Restaurants

Cafes, Caribbean, Chinese, Dinner Theater, Hawaiian, Indian, Sushi Bars

Nightlife

Bars, Comedy Clubs, Jazz & Blues, Karaoke, Music Venues, Pool Halls
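A minimal Pandas sketch of the state and category split described above. It assumes the business table has Yelp's state and categories columns, with categories stored as one comma-separated string per business; the function name split_businesses is hypothetical, and a business is placed in every group whose name appears among its categories.

```python
import pandas as pd

# The three top-level Yelp categories Travelix focuses on.
GROUPS = ("Hotels & Travel", "Restaurants", "Nightlife")


def split_businesses(business, n_states=10, groups=GROUPS):
    """Return {state: {group: DataFrame}} for the n_states states with the most businesses."""
    top_states = business["state"].value_counts().head(n_states).index
    # Categories arrive as "Restaurants, Sushi Bars"; turn each into a set.
    # Assumption: a missing categories value is treated as an empty set.
    cat_sets = business["categories"].fillna("").map(
        lambda s: {c.strip() for c in s.split(",")})
    result = {}
    for state in top_states:
        in_state = business["state"] == state
        result[state] = {
            group: business[in_state & cat_sets.map(lambda cs: group in cs)]
            for group in groups
        }
    return result
```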

Recommendation Engine

Travelix is a platform designed to provide highly personalized travel recommendations to people who want to get the most out of their trips. We recognized a gap in the market for a product that not only streamlines trip planning but also tailors its recommendations to each user's unique preferences. To achieve this, we built a robust recommendation engine that provides three distinct types of recommendations: non-personalized, personalized based on matrix factorization, and personalized based on an autoencoder combined with collaborative filtering. The engine also uses user-user collaborative filtering and clustering to analyze large amounts of data and generate targeted recommendations.
At Travelix, we believe that travel should be accessible and enjoyable for everyone, and we are committed to giving our users the most comprehensive and customized travel recommendations possible. Travelix provides three kinds of recommendations:

Non-personalized recommendations

Recommends places that are trending right now in a selected state.

Personalized recommendations using Matrix Factorization

Personalized recommendations using Autoencoders and Collaborative Filtering

Recommendations based on similar users using User-User Collaborative Filtering and Clustering algorithms.

Non-Personalized Recommendation

Non-personalized recommendations (NPR) are a class of recommendation systems that leverage the collective wisdom of a user community to identify the most popular items. Each item's popularity score is computed by aggregating the ratings that all users have given it. Items a user has not yet explored are then ranked by popularity score in descending order, and the top-ranked items are recommended to that user.
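The popularity ranking can be sketched as follows. The aggregation function is not specified above, so this sketch assumes the popularity score is the sum of a business's star ratings (which rewards both many ratings and high ratings); a mean with a minimum review count would be an equally reasonable choice. The function name popular_items is hypothetical, and the column names user_id, business_id, and stars are assumed to follow Yelp's review fields.

```python
import pandas as pd


def popular_items(reviews, user_id, k=10):
    """Return the k most popular businesses that user_id has not reviewed yet."""
    # Assumption: popularity = sum of all star ratings a business received.
    popularity = (reviews.groupby("business_id")["stars"].sum()
                  .sort_values(ascending=False, kind="mergesort"))
    # Businesses this user has already reviewed are excluded.
    seen = set(reviews.loc[reviews["user_id"] == user_id, "business_id"])
    return [b for b in popularity.index if b not in seen][:k]
```

To recommend for a selected state, the reviews can first be filtered to that state's businesses.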

Matrix Factorization

Matrix Factorization is a technique used in machine learning and recommendation systems to approximate a matrix as the product of two lower-rank matrices. In a recommendation system, the matrix holds the user-item interactions: rows represent users, columns represent items, and values represent the users' ratings of or preferences for the items. By factorizing this matrix, Matrix Factorization identifies latent factors that explain the observed interactions, such as user preferences or item characteristics, and these factors can be used to make personalized recommendations based on each user's preferences and past behavior. The benefits of Matrix Factorization in recommendation systems include improved accuracy, scalability, and interpretability. It also handles sparse data well, which matters because in most recommendation systems each user has rated only a small fraction of the available items.
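For reference, the standard regularized formulation is shown below, where P holds the user factor vectors p_u, Q holds the item factor vectors q_i, η is the learning rate, and λ is the regularization weight. In the implicit-feedback setting, the training set D contains the observed pairs with r_ui = 1 plus sampled negatives with r_ui = 0. We assume, without having confirmed it against the code, that MF_implicit optimizes this objective; the constant factor 2 from the gradient is absorbed into η.

```latex
\begin{aligned}
R &\approx P Q^\top, \qquad \hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i \\
\mathcal{L} &= \sum_{(u,i)\in\mathcal{D}} \left(r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i\right)^2
  + \lambda\left(\lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2\right) \\
e_{ui} &= r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i \\
\mathbf{p}_u &\leftarrow \mathbf{p}_u + \eta\left(e_{ui}\,\mathbf{q}_i - \lambda\,\mathbf{p}_u\right) \\
\mathbf{q}_i &\leftarrow \mathbf{q}_i + \eta\left(e_{ui}\,\mathbf{p}_u - \lambda\,\mathbf{q}_i\right)
\end{aligned}
```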
We implement Matrix Factorization for implicit feedback in the MF_implicit class, which takes as input the training rating matrix train_mat, the number of latent factors latent, the learning rate lr, and the regularization weight reg. The class provides three methods: negative_sampling() samples negative (unobserved) user-item pairs for training, train() trains the model for a given number of epochs, and predict() generates the recommendation list for each user. train() uses stochastic gradient descent to minimize the regularized difference between the predicted and actual ratings of the user-item pairs. predict() returns a ranked list of the top 50 recommendations for each user, scored with the learned latent factors.
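A minimal NumPy sketch of a class with the interface described above. Only the class name, the constructor arguments, and the three method names come from the text; the number of negatives per positive (neg_ratio), the initialization scale, the seed, and the exclusion of training items in predict() are assumptions.

```python
import numpy as np


class MF_implicit:
    """Implicit-feedback matrix factorization trained with SGD (sketch).

    Assumptions, not from the original: neg_ratio sampled negatives per
    positive, N(0, 0.1) initialization, and a fixed random seed.
    """

    def __init__(self, train_mat, latent=16, lr=0.01, reg=0.01,
                 neg_ratio=3, seed=0):
        self.train_mat = np.asarray(train_mat, dtype=float)
        self.n_users, self.n_items = self.train_mat.shape
        self.lr, self.reg, self.neg_ratio = lr, reg, neg_ratio
        self.rng = np.random.default_rng(seed)
        # User (P) and item (Q) latent-factor matrices.
        self.P = self.rng.normal(0.0, 0.1, (self.n_users, latent))
        self.Q = self.rng.normal(0.0, 0.1, (self.n_items, latent))

    def negative_sampling(self):
        """Return (user, item, target) triples: observed pairs -> 1, sampled unobserved pairs -> 0."""
        users, items = np.nonzero(self.train_mat > 0)
        samples = [(u, i, 1.0) for u, i in zip(users, items)]
        for u in users:  # draw neg_ratio candidate items per positive
            for j in self.rng.integers(0, self.n_items, size=self.neg_ratio):
                if self.train_mat[u, j] == 0:  # skip accidental positives
                    samples.append((u, j, 0.0))
        return samples

    def train(self, epochs=20):
        for _ in range(epochs):
            samples = self.negative_sampling()  # fresh negatives every epoch
            for idx in self.rng.permutation(len(samples)):
                u, i, r = samples[idx]
                p_u = self.P[u].copy()  # keep the old p_u for the q_i update
                err = r - p_u @ self.Q[i]
                # SGD step on the regularized squared error.
                self.P[u] += self.lr * (err * self.Q[i] - self.reg * p_u)
                self.Q[i] += self.lr * (err * p_u - self.reg * self.Q[i])
        return self

    def predict(self, k=50):
        """Top-k item indices per user, ranked by p_u . q_i, excluding training items."""
        scores = self.P @ self.Q.T
        scores[self.train_mat > 0] = -np.inf
        ranked = np.argsort(-scores, axis=1, kind="stable")
        return [[int(i) for i in row if np.isfinite(scores[u, i])][:k]
                for u, row in enumerate(ranked)]
```

Usage: `MF_implicit(train_mat, latent=32, lr=0.01, reg=0.01).train(epochs=20).predict()` returns one list of up to 50 item indices per user. Resampling negatives every epoch exposes the model to more of the unobserved pairs than a single fixed sample would.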

Autoencoder and Collaborative Filtering

Autoencoders and collaborative filtering are both widely used in machine learning and in the design of recommendation engines. An autoencoder is a neural network that learns to compress high-dimensional data, such as images, into a compact representation and to reconstruct it while minimizing the reconstruction error. Collaborative filtering, on the other hand, is a recommendation technique that predicts a user's preferences from their past behavior and their similarity to other users. By analyzing user behavior and preferences, collaborative filtering can suggest products, services, or content that users are likely to enjoy.
We implemented an autoencoder-based collaborative filtering algorithm for personalized recommendations in Travelix. The autoencoder is trained on a sparse matrix of user ratings to learn embeddings that capture user preferences. The learned embeddings are then used to compute similarity scores between users and items, which are in turn used to predict each user's ratings for the items. The model is evaluated with the root mean squared error (RMSE) on a held-out test set, and the top recommended items for a given user are returned based on the predicted ratings.
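The similarity and evaluation steps can be sketched in NumPy. The text does not say exactly how the embeddings become rating predictions, so this sketch assumes one common reading: cosine similarity between user embeddings (for example, the autoencoder's 16-unit bottleneck output), with each rating predicted as a similarity-weighted mean over the k most similar users who rated the item. The function names and the neighborhood size k are hypothetical.

```python
import numpy as np


def cosine_similarity(emb):
    """Pairwise cosine similarity between rows (users) of an embedding matrix."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T


def predict_ratings(ratings, emb, k=20):
    """Predict ratings as a similarity-weighted mean over the k most similar users.

    ratings: (n_users, n_items) array with 0 for missing entries.
    emb:     (n_users, d) user embeddings (assumed: autoencoder bottleneck output).
    """
    sim = cosine_similarity(emb)
    np.fill_diagonal(sim, 0.0)  # a user is not their own neighbour
    preds = np.zeros(ratings.shape, dtype=float)
    for u in range(ratings.shape[0]):
        nbrs = np.argsort(-sim[u])[:k]            # k most similar users
        w = np.clip(sim[u, nbrs], 0.0, None)      # ignore negative similarity
        r = ratings[nbrs]
        num = w @ r                               # weighted sum of observed ratings
        den = w @ (r > 0).astype(float)           # weights of neighbours who rated
        preds[u] = np.divide(num, den, out=np.zeros(ratings.shape[1]), where=den > 0)
    return preds


def rmse(preds, test):
    """RMSE over the non-zero entries of a held-out test matrix."""
    mask = test > 0
    return float(np.sqrt(np.mean((preds[mask] - test[mask]) ** 2)))
```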
The autoencoder's encoder has three hidden dense layers with 64, 32, and 16 neurons, and its decoder has two hidden layers with 32 and 64 neurons. The hidden layers use the ReLU activation function, and a dropout layer sits between every pair of consecutive hidden layers as well as between the last hidden layer and the output layer. The output layer uses the sigmoid activation function. We train the autoencoder with the Adam optimizer and the binary cross-entropy loss, using a batch size of 32 for 200 epochs. A learning rate scheduler adjusts the learning rate during training. The data is shuffled before each epoch, and a validation set is used to monitor training.
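The architecture and training setup above map naturally onto Keras. This is a sketch under assumptions: the dropout rate (0.2), the initial learning rate, the decay schedule, and the 10% validation split are not given in the text, and ratings are assumed to be scaled to [0, 1] (for example, stars divided by 5) so they are valid targets for a sigmoid output with binary cross-entropy. The layer name "bottleneck" and the helper user_embeddings are hypothetical.

```python
import numpy as np
import tensorflow as tf


def build_autoencoder(n_items, dropout=0.2):
    """Dense autoencoder: 64-32-16 encoder, 32-64 decoder, sigmoid output.

    Assumption: dropout=0.2 and Adam(1e-3) are illustrative values.
    """
    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_items,)),
        layers.Dense(64, activation="relu"), layers.Dropout(dropout),
        layers.Dense(32, activation="relu"), layers.Dropout(dropout),
        layers.Dense(16, activation="relu", name="bottleneck"), layers.Dropout(dropout),
        layers.Dense(32, activation="relu"), layers.Dropout(dropout),
        layers.Dense(64, activation="relu"), layers.Dropout(dropout),
        layers.Dense(n_items, activation="sigmoid"),  # ratings scaled to [0, 1]
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy")
    return model


def decay_schedule(epoch, lr):
    """Hypothetical schedule: keep the LR for 20 epochs, then decay 5% per epoch."""
    return lr if epoch < 20 else lr * 0.95


def train_autoencoder(model, x, epochs=200, batch_size=32):
    # Input and target are the same matrix: the model reconstructs each user's row.
    return model.fit(
        x, x, batch_size=batch_size, epochs=epochs, shuffle=True,
        validation_split=0.1,  # assumption: 10% of users held out for validation
        callbacks=[tf.keras.callbacks.LearningRateScheduler(decay_schedule)],
        verbose=0,
    )


def user_embeddings(model, x):
    """Return the 16-dimensional bottleneck activations for each user row."""
    encoder = tf.keras.Model(model.inputs, model.get_layer("bottleneck").output)
    return encoder.predict(x, verbose=0)
```

Dropout layers are active only during fit(), so user_embeddings() returns deterministic activations that can feed a similarity step.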
Autoencoders help address common problems in recommendation systems by learning a low-dimensional representation that captures the underlying structure and patterns in user-item interactions. This helps handle the sparsity and high dimensionality of the data and can improve the accuracy and relevance of the recommendations. Autoencoders can also be combined with other techniques, such as collaborative filtering, content-based filtering, and hybrid methods, to further improve the performance and robustness of a recommendation system.

Address

Texas A&M University

College Station, TX 77802

Phone

Reception: +1 123 4567

Office: +1 123 4567

Email

Office: bassirishabh@gmail.com

Site: https://travelix2.herokuapp.com/

Made by RADS · All rights reserved.