theenglishway-corp/twhatter: A simple scraper for Twitter (requires Python >= 3.6)



		
========
Twhatter
========

A simple Python scraper for Twitter.

Motivation
----------

Twitter's API terms and conditions became much more demanding in May 2018.
Inspired by other attempts, I have put together yet another Twitter scraper
that uses a simple HTTP client instead
of the developer API, and allows retrieving any data that can be accessed in an
anonymous browsing session.

This is mostly an attempt to write clean, functional and maintainable
Python code. I have focused especially on a clean separation between data
retrieval, exploration of Twitter pages, and output, which makes it easy to
define and combine various crawling strategies and data formats.

And why that terrible name? Simple: "WHAT's going on TWITTER?" => TWHATTER!

Features
--------

At the moment, this utility only provides a command-line interface.

Anonymous client
****************

- Get any user's full timeline.
- Get any user's profile data.

Data output
***********

All scraped information can be:

* displayed on the terminal,
* stored in a JSON or YAML file,
* stored in a local database.
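
As an illustration of keeping output independent from retrieval, the sketch
below renders the same records in two of the formats listed above. The
``render`` function and the ``FORMATTERS`` table are hypothetical names, not
part of twhatter, and YAML is left out because it would need a third-party
library. ::

    import json
    from datetime import datetime


    def to_terminal(records):
        # One repr per line, similar in spirit to printing on the terminal.
        return "\n".join(repr(r) for r in records)


    def to_json(records):
        # default=str makes non-JSON types such as datetime serialisable.
        return json.dumps(records, indent=2, default=str)


    # Hypothetical registry mapping a format name to its formatter.
    FORMATTERS = {"terminal": to_terminal, "json": to_json}


    def render(records, fmt):
        try:
            formatter = FORMATTERS[fmt]
        except KeyError:
            raise ValueError(f"unknown output format: {fmt!r}") from None
        return formatter(records)


    # Made-up record, only there to exercise the two formatters.
    records = [{"id": 1, "text": "hello", "date": datetime(2018, 5, 1)}]
    print(render(records, "json"))

Adding a format then only means registering one more function, without
touching the code that fetches the data.
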

Installation
------------

Installation requires Python >= 3.6. ::

    $ pip install --user git+https://code.theenglishway.eu/theenglishway-corp/twhatter

You then have to ensure that ``~/.local/bin`` is in your ``$PATH``, or call
``~/.local/bin/twhatter`` instead of ``twhatter`` in the following examples.

Usage
-----

Display some user's tweets ::

    $ twhatter timeline realDonaldTrump --limit 40
    
    ...
    
Display their profile information ::

    $ twhatter profile realDonaldTrump
    User(id=25073877, fullname='Donald J. Trump', join_date=datetime.datetime(2009, 3, 18, 0, 0), tweets_nb=40183, following_nb=45, followers_nb=57144827, likes_nb=7)
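
The ``User(...)`` line above has the shape of a Python dataclass ``repr``. As
a rough illustration only, and assuming field names and types inferred from
that output (this is not twhatter's actual class definition), such a record
could be modelled as follows. Dataclasses are in the standard library from
Python 3.7; Python 3.6 would need the ``dataclasses`` backport. ::

    from dataclasses import asdict, dataclass
    from datetime import datetime


    @dataclass
    class User:
        # Field names and types are guessed from the repr shown above.
        id: int
        fullname: str
        join_date: datetime
        tweets_nb: int
        following_nb: int
        followers_nb: int
        likes_nb: int


    user = User(
        id=25073877,
        fullname="Donald J. Trump",
        join_date=datetime(2009, 3, 18),
        tweets_nb=40183,
        following_nb=45,
        followers_nb=57144827,
        likes_nb=7,
    )

    # The generated __repr__ lists every field in declaration order.
    print(user)

    # asdict() turns the record into a plain dict, handy for serialisation.
    print(asdict(user)["fullname"])
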

Put them into a JSON/YAML file ::

    $ twhatter json timeline realDonaldTrump
    $ twhatter yaml profile realDonaldTrump

Put them into a local database (by default in ``/tmp/db.sqlite``) ::

    $ twhatter db timeline realDonaldTrump
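
To check what the command above wrote, the database file can be inspected
with Python's standard ``sqlite3`` module. The sketch below only lists table
names, because the schema twhatter creates is not documented here;
``list_tables`` is a hypothetical helper, and the path is the default
mentioned above. ::

    import os
    import sqlite3


    def list_tables(path):
        # Open read-only so a missing file raises instead of being created.
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
        return [name for (name,) in rows]


    # Only query the default location if a database has been written there.
    if os.path.exists("/tmp/db.sqlite"):
        print(list_tables("/tmp/db.sqlite"))
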

Open a session on the local database and make queries with SQLAlchemy ::

    $ twhatter db shell

    In [1]: session.query(Tweet).all()
    Out[1]:
    [

In all cases, help is available with ::

    $ twhatter --help

Tests
-----

Unit tests are provided. From the root directory of a freshly cloned repository
and with a clean virtual environment, they can be run with ::

    $ pytest

Useful links
------------

* Raymond Hettinger's excellent talk about Python 3.7's dataclasses

Other scrapers that might fit your needs
****************************************

In Python:

* twint
* twitterscraper
* twitter-scraper

In JavaScript:

* scrape-twitter