========
Twhatter
========

A simple Python scraper for Twitter.

Motivation
----------

Twitter's API `terms and conditions <https://developer.twitter.com/en/developer-terms/agreement-and-policy.html>`_
became very demanding in May 2018. Inspired by other attempts, I have
put together yet another Twitter scraper. It uses a simple HTTP client instead
of the developer API, and it can retrieve any data that is accessible in an
anonymous browsing session.

This is mostly an attempt for me to produce clean, functional and
maintainable Python code. I have especially focused on a clean separation
between data retrieval, Twitter page exploration and output. This makes it
easy to define and combine various crawling strategies and data formats.

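Here is a minimal sketch of that separation. It assumes a paginated timeline in which each page returns a batch of tweets plus a cursor pointing to the next page. All names (``iter_timeline``, ``fake_fetch``, the cursor values) are hypothetical and do not reflect Twhatter's actual internals. The retrieval stage is injected, so the exploration stage never needs to know how pages are downloaded.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page shape: a batch of tweet ids plus the next cursor (None when exhausted).
Page = Tuple[List[int], Optional[str]]


def iter_timeline(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[int]:
    """Exploration stage: follow pagination cursors and yield tweets lazily."""
    cursor: Optional[str] = None
    while True:
        tweets, cursor = fetch_page(cursor)
        yield from tweets
        if cursor is None:
            return


def print_output(items) -> None:
    """Output stage: one line per item, as on the terminal."""
    for item in items:
        print(item)


# Hypothetical retrieval stage standing in for the real HTTP client.
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: ([1, 2, 3], "page-2"),
    "page-2": ([4, 5], None),
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return FAKE_PAGES[cursor]


# Combine the stages; islice() plays the role of a --limit option.
print_output(islice(iter_timeline(fake_fetch), 4))
```

Because every stage is a generator, a limit stops the crawl as soon as enough tweets have been produced. Swapping ``print_output`` for another writer does not affect the crawling strategy.
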
And why that terrible name? Simple: "WHAT's going on, TWITTER?" => TWHATTER!

Features
--------

At the moment, this utility can only be used through its command-line interface.

Anonymous client
****************

- Get any user's full timeline.
- Get any user's profile data.

Data output
***********

All scraped information can be:

* displayed in the terminal,
* stored in a JSON or YAML file, or
* stored in a local database.

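Below is a minimal sketch of interchangeable output formats, assuming scraped records are dataclasses. The ``Tweet`` record, its fields and the ``WRITERS`` mapping are hypothetical illustrations, not Twhatter's actual classes. A YAML writer could be added to the same mapping with a YAML library such as PyYAML; it is left out here to keep the example to the standard library.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical minimal record; real tweet objects likely carry more fields.
    id: int
    date: datetime
    likes: int


def _json_default(obj):
    """Render datetimes as ISO 8601 strings; reject anything else."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def to_terminal(records) -> str:
    """One repr per line, as displayed in the terminal."""
    return "\n".join(repr(r) for r in records)


def to_json(records) -> str:
    """A JSON array with one object per record."""
    return json.dumps([asdict(r) for r in records], default=_json_default, indent=2)


# Output formats are plain functions selected by name.
WRITERS = {"terminal": to_terminal, "json": to_json}

tweets = [Tweet(id=1083404900862545920, date=datetime(2019, 1, 10, 16, 47, 11), likes=32033)]
print(WRITERS["json"](tweets))
```
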
Installation
------------

Installation requires Python >= 3.6. ::

    $ pip install --user git+https://code.theenglishway.eu/theenglishway-corp/twhatter

You then have to ensure that ``~/.local/bin`` is in your ``$PATH``, or call
``~/.local/bin/twhatter`` instead of ``twhatter`` in the following examples.

Usage
-----

Display some user's tweets ::

    $ twhatter timeline realDonaldTrump --limit 40
    <TweetTextOnly (id=1083404900862545920, date=2019-01-10 16:47:11, likes=32033, likes=11087, likes=6935)>
    <TweetTextOnly (id=1083358775925460992, date=2019-01-10 13:43:54, likes=96565, likes=22596, likes=26802)>
    <TweetTextOnly (id=1083358611315789826, date=2019-01-10 13:43:15, likes=52849, likes=9344, likes=9571)>
    <TweetTextOnly (id=1083358150214979585, date=2019-01-10 13:41:25, likes=48808, likes=11096, likes=11499)>
    <TweetTextOnly (id=1083356326833602561, date=2019-01-10 13:34:10, likes=50695, likes=11743, likes=11045)>
    ...
    <TweetTextOnly (id=1083353895030702080, date=2019-01-10 13:24:30, likes=85184, likes=19686, likes=27751)>
    <TweetRetweet (id=1083121283645272064, date=2019-01-09 22:00:12, likes=42640, likes=13189, likes=10242)>
    <TweetRetweet (id=1082774275390693376, date=2019-01-08 23:01:18, likes=52776, likes=14459, likes=2403)>
    <TweetRetweet (id=1083049664021233664, date=2019-01-09 17:15:36, likes=64770, likes=21099, likes=7818)>
    <TweetRetweet (id=1083148367184781312, date=2019-01-09 23:47:49, likes=75514, likes=21966, likes=6145)>

Display their profile information ::

    $ twhatter profile realDonaldTrump
    User(id=25073877, fullname='Donald J. Trump', join_date=datetime.datetime(2009, 3, 18, 0, 0), tweets_nb=40183, following_nb=45, followers_nb=57144827, likes_nb=7)

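The profile line above has the shape of a dataclass's auto-generated ``repr``. That fits the dataclasses talk listed under "Useful links", but it is an assumption. Below is a minimal sketch of a hypothetical ``User`` dataclass whose field names are copied from that output. Its generated ``__repr__`` prints the same line.

```python
import datetime
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical profile record; field names mirror the `twhatter profile` output."""
    id: int
    fullname: str
    join_date: datetime.datetime
    tweets_nb: int
    following_nb: int
    followers_nb: int
    likes_nb: int


user = User(
    id=25073877,
    fullname="Donald J. Trump",
    join_date=datetime.datetime(2009, 3, 18, 0, 0),
    tweets_nb=40183,
    following_nb=45,
    followers_nb=57144827,
    likes_nb=7,
)
# The dataclass-generated __repr__ lists every field as name=value.
print(user)
```
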
Put them into a JSON or YAML file ::

    $ twhatter json timeline realDonaldTrump
    $ twhatter yaml profile realDonaldTrump

Put them into a local database (``/tmp/db.sqlite`` by default) ::

    $ twhatter db timeline realDonaldTrump

Open a session on the local database and query it with SQLAlchemy ::

    $ twhatter db shell
    In [1]: session.query(Tweet).all()
    Out[1]:
    [<Tweet (id=1020561192849412096),
     <Tweet (id=1021305708908818433),
     <Tweet (id=1024699386528505856),
     <Tweet (id=1026373195790802949),
     <Tweet (id=1026482814164844544),
     <Tweet (id=1027797734613504001)]

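The README says the tool itself goes through SQLAlchemy. The sketch below uses the standard ``sqlite3`` module instead so that it stays self-contained. It illustrates one way repeated scrapes can be stored without duplicates: the tweet id serves as primary key and ``INSERT OR IGNORE`` skips rows already present. The ``tweets`` table, its columns and ``store_tweets`` are hypothetical; Twhatter's actual schema may differ.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row layout: (tweet id, date, likes).
Row = Tuple[int, str, int]


def store_tweets(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Insert tweets, silently skipping ids that are already stored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, date, likes) VALUES (?, ?, ?)", rows
        )


# In-memory database for the example; a file path such as /tmp/db.sqlite works too.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, date TEXT, likes INTEGER)"
)

rows = [
    (1083404900862545920, "2019-01-10 16:47:11", 32033),
    (1083358775925460992, "2019-01-10 13:43:54", 96565),
]
store_tweets(conn, rows)
store_tweets(conn, rows)  # re-running the same scrape adds nothing

for (tweet_id,) in conn.execute("SELECT id FROM tweets ORDER BY id"):
    print(tweet_id)
```
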
In all cases, help is available with ::

    $ twhatter --help

Tests
-----

Unit tests are provided. From the root directory of a freshly cloned repository,
and within a clean virtual environment, they can be run with ::

    $ pytest

Useful links
------------

* `Raymond Hettinger's excellent talk about Python 3.7's dataclasses <https://www.youtube.com/watch?v=T-TwcmT6Rcw>`_

Other scrapers that might fit your needs
****************************************

In Python:

* `twint <https://github.com/twintproject/twint>`_
* `twitterscraper <https://github.com/taspinar/twitterscraper>`_
* `twitter-scraper <https://github.com/kennethreitz/twitter-scraper>`_

In JavaScript:

* `scrape-twitter <https://github.com/sebinsua/scrape-twitter>`_