This repository uses Python and Scrapy to scrape book data from GoodReads, for uses such as a book recommendation system with pre-built assets.

Overview

Scrapy-GoodReads


Description 📝

This repository marks my first attempt at web scraping with Scrapy, and what better way to try it than on GoodReads? The crawler yields the details of the books listed in the start_urls of the /Learning/Spiders file.

The crawler retrieves three fields for each book: the image URL, the title, and the description.
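As a rough illustration of the record such a crawler might yield, here is a minimal sketch that tidies the three scraped fields into one dictionary. The function name build_book_record, the field names (title, image_url, description) and the example.com URL are assumptions made for illustration; they are not taken from the spider in this repository.

```python
from urllib.parse import urljoin


def build_book_record(page_url, title, image_src, description):
    """Combine the three scraped fields into one dictionary (hypothetical helper)."""

    def clean(text):
        # Collapse runs of whitespace left over from HTML formatting.
        return " ".join(text.split()) if text else None

    return {
        "title": clean(title),
        # Resolve a relative image path against the page it was found on.
        "image_url": urljoin(page_url, image_src) if image_src else None,
        "description": clean(description),
    }


record = build_book_record(
    "https://example.com/book/show/1",  # placeholder page URL, not a real book
    "  Some Title \n",
    "/img/1.jpg",
    "A  short\ndescription.",
)
print(record)
```

Inside a Scrapy callback, a dictionary like this would typically be yielded so that the feed exporter can write it to the output file.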


To run the code 👨🏽‍💻

pip install -r requirements.txt

Change directory to Learning/spider


scrapy crawl GoodReads -o BooksData.json
(stores the output in BooksData.json; note that -o appends to the file if it already exists, while newer Scrapy releases also accept -O to overwrite it)

scrapy crawl GoodReads
(runs the crawler normally and displays the output)
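Because the data is appended, running the first command twice against the same BooksData.json leaves two JSON arrays back to back, which a plain json.load call rejects. The sketch below is one possible way to read such a file anyway; the function name load_appended_json is hypothetical, and the approach assumes each run wrote a complete JSON array.

```python
import json


def load_appended_json(text):
    """Parse one or more JSON documents written back to back and merge their items."""
    decoder = json.JSONDecoder()
    items = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        # Each crawl writes a list of records; tolerate a single object too.
        items.extend(value if isinstance(value, list) else [value])
    return items
```

To use it, read the file as text (for example with open("BooksData.json", encoding="utf-8")) and pass the contents to load_appended_json. Deleting the file before each crawl, or exporting to a .jsonl file instead, avoids the problem altogether.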


Future prospects

As of now, we need to manually enter the links in the scraper.py file, which I would like to change to a command-line argument.
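One possible route is Scrapy's -a option, which passes NAME=VALUE pairs from the command line to the spider's constructor. The sketch below shows a small helper that turns a comma-separated string into a list of links, with the spider wiring given as comments. The helper name split_start_urls, the argument name urls and the class name GoodReadsSpider are assumptions for illustration, not the repository's current code.

```python
def split_start_urls(urls_arg):
    """Turn a comma-separated string of links into a clean list of URLs."""
    if not urls_arg:
        return []
    return [url.strip() for url in urls_arg.split(",") if url.strip()]


# Hypothetical wiring inside the spider (needs Scrapy, so shown as comments):
#
# class GoodReadsSpider(scrapy.Spider):
#     name = "GoodReads"
#
#     def __init__(self, urls=None, *args, **kwargs):
#         super().__init__(*args, **kwargs)
#         # Use the links given on the command line as the start URLs.
#         self.start_urls = split_start_urls(urls)
```

With that in place, a crawl could be started as scrapy crawl GoodReads -a urls="<first link>,<second link>", where the placeholders stand for the GoodReads book pages to scrape.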

Owner
Immanuel Vivek
Crappy attempt of a student/enthusiast to make programming projects while trying to maintain other passions.