A simple scraper for Twitter (requires Python >= 3.6)

theenglishway (time) 00d6fb9ce0 Move HTTP client code into "exploration" 7 years ago
.github 4935b1edbf Initial commit 7 years ago
docs 4935b1edbf Initial commit 7 years ago
tests 00d6fb9ce0 Move HTTP client code into "exploration" 7 years ago
twhatter 00d6fb9ce0 Move HTTP client code into "exploration" 7 years ago
.editorconfig 4935b1edbf Initial commit 7 years ago
.gitignore 4935b1edbf Initial commit 7 years ago
.travis.yml 4935b1edbf Initial commit 7 years ago
AUTHORS.rst 4935b1edbf Initial commit 7 years ago
CONTRIBUTING.rst 4935b1edbf Initial commit 7 years ago
HISTORY.rst 4935b1edbf Initial commit 7 years ago
LICENSE 4935b1edbf Initial commit 7 years ago
MANIFEST.in 4935b1edbf Initial commit 7 years ago
Makefile 4935b1edbf Initial commit 7 years ago
Pipfile 9539167374 Use library for user-agent 7 years ago
Pipfile.lock 9539167374 Use library for user-agent 7 years ago
README.rst 37ced8cd34 Change "screen_name" into "fullname" & "username" 7 years ago
setup.cfg f22591146f Fix installation on Python < 3.7 7 years ago
setup.py 4935b1edbf Initial commit 7 years ago
tox.ini 4935b1edbf Initial commit 7 years ago

README.rst

========
Twhatter
========

A simple Python scraper for Twitter.

Motivation
----------

Twitter's API `terms and conditions `_
became much more demanding in May 2018. Inspired by other attempts, I have
put together yet another Twitter scraper that uses a simple HTTP client instead
of the developer API and can retrieve any data that is accessible in an
anonymous browsing session.

This is mostly an attempt on my part to write clean, functional and
maintainable Python code. I have focused especially on a clean separation
between data retrieval and output, which should make it possible to export
data in any format.

And why that terrible name? Simple: "WHAT's going on TWITTER?" => TWHATTER!

Features
--------

At the moment, this utility only provides a command-line interface.

Anonymous client
****************

- Get any user's full timeline.
- Get any user's profile data.

Data output
***********

All scraped information can either be displayed on the terminal or stored in
a local database.
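
A minimal sketch of how two interchangeable outputs could sit behind the same
retrieval code, using only the standard library. The names `Tweet`,
`TerminalOutput`, `SqliteOutput` and `export`, as well as the table layout, are
hypothetical and are not taken from twhatter's code base:

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Iterable, List


@dataclass
class Tweet:
    """Hypothetical record; twhatter's real model may differ."""
    id: int
    username: str
    text: str


class TerminalOutput:
    """Prints one repr per tweet and remembers what was printed."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def write(self, tweets: Iterable[Tweet]) -> None:
        for tweet in tweets:
            line = repr(tweet)
            self.lines.append(line)
            print(line)


class SqliteOutput:
    """Stores tweets in a SQLite table (assumed schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tweets "
            "(id INTEGER PRIMARY KEY, username TEXT, text TEXT)"
        )

    def write(self, tweets: Iterable[Tweet]) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR REPLACE INTO tweets VALUES (?, ?, ?)",
                [astuple(t) for t in tweets],
            )


def export(tweets: Iterable[Tweet], output) -> None:
    """Hand records to any object with a write() method."""
    output.write(list(tweets))


# The same records go to both outputs without touching retrieval code.
sample = [Tweet(1, "someone", "hello"), Tweet(2, "someone", "world")]
terminal = TerminalOutput()
database = SqliteOutput()
export(sample, terminal)
export(sample, database)
```

Adding another export format in this sketch only means writing one more class
with a `write` method.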

Installation
------------

Installation requires Python >= 3.6. ::

    $ pip install --user git+https://code.theenglishway.eu/theenglishway-corp/twhatter

You then have to ensure that `~/.local/bin` is in your `$PATH`, or call
`~/.local/bin/twhatter` instead of `twhatter` in the following examples.
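
One way to check which of the two cases applies is sketched below with the
standard library. `is_on_path` is a hypothetical helper written for this
example, not part of twhatter:

```python
import os


def is_on_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the entries of a PATH-style string."""
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry  # skip empty entries such as a trailing separator
    ]
    return target in entries


if is_on_path("~/.local/bin", os.environ.get("PATH", "")):
    print("twhatter should be callable directly")
else:
    print("call ~/.local/bin/twhatter, or add ~/.local/bin to PATH")
```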

Usage
-----

Display some user's tweets ::

    $ twhatter timeline realDonaldTrump --limit 10

Display their profile information ::

    $ twhatter profile realDonaldTrump
    User(id=25073877, fullname='Donald J. Trump', join_date=datetime.datetime(2009, 3, 18, 0, 0), tweets_nb=40183, following_nb=45, followers_nb=57144827, likes_nb=7)
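
The line printed by this command has the shape of a dataclass `repr`. A minimal
sketch of a record that prints the same way, assuming the field names seen in
that output and types inferred from the values (the actual class in twhatter
may be defined differently):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class User:
    # Field names copied from the printed output; types are inferred.
    id: int
    fullname: str
    join_date: datetime
    tweets_nb: int
    following_nb: int
    followers_nb: int
    likes_nb: int


user = User(
    id=25073877,
    fullname="Donald J. Trump",
    join_date=datetime(2009, 3, 18),
    tweets_nb=40183,
    following_nb=45,
    followers_nb=57144827,
    likes_nb=7,
)
print(user)  # the generated __repr__ lists every field in declaration order
```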

Store their tweets in a local database (by default in /tmp/db.sqlite) ::

    $ twhatter db timeline realDonaldTrump
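
Since the default database is a SQLite file, its contents can also be inspected
with the standard library alone. The sketch below makes no assumption about the
table names twhatter creates and reads them from `sqlite_master` instead;
`table_row_counts` is a hypothetical helper name:

```python
import sqlite3
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Map each table in a SQLite file to its number of rows."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Table names cannot be bound as parameters, so quote them instead.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


# Example call (path taken from the default mentioned above):
# print(table_row_counts("/tmp/db.sqlite"))
```

Note that `sqlite3.connect` creates an empty file when the path does not exist
yet, in which case the function returns an empty dictionary.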

Open a session on the local database and make queries with SQLAlchemy ::

    $ twhatter db shell

    In [1]: session.query(Tweet).all()
    Out[1]:
    [

Tests
-----

Unit tests are provided. From the root directory in a freshly-cloned repository
and with a clean virtual environment, they can be run with ::

    $ pytest

Useful links
------------

* `Raymond Hettinger's excellent talk about Python 3.7's Dataclasses `_

Other scrapers that might fit your needs
****************************************

In Python:

* `twint `_
* `twitterscraper `_
* `twitter-scraper `_

In JavaScript:

* `scrape-twitter `_