jherve 1 year ago
parent
commit
274e896947
1 changed file with 84 additions and 8 deletions
README.md

@@ -1,15 +1,91 @@
-# Exemple of Firefox extension with PureScript
+# JobSearch, a Firefox extension to boost your job search
 
-Basically just a stripped out version of : https://github.com/Kazy/PureTabs
+This extension helps you keep track of the job offers you stumble upon on LinkedIn, automagically saving all of them into a human-editable database file.
 
-Tests can be run with `spago test` [TODO : for some reason setting npm "test" script to this value won't work]
+Here are some of its features:
 
-Javascript source maps won't work out-of-the-box if JS source files are in e.g. `extension/src` instead of `extension`. This can surely be solved using this issue : https://github.com/parcel-bundler/parcel/issues/3750
+* Extract data from job offer pages (e.g. job position, company name, link, company domain, location, ...)
+* Save it in a plain text database format ([recfile](https://www.gnu.org/software/recutils/))
+* Add a colored overlay on the job offers depending on their status (seen/applied to/dismissed/rejected/...)
+* Display the offers you're interested in applying to in a sidebar
 
-## Native backend
+## How it runs
 
-The backend requires `pdm` for installation.
+From the settings of the extension, you can choose the directory where the database file will be stored, e.g. `/home/me/job_search/`. A `jobs.rec` file is then created in this directory.
 
-It can be installed with `native/install.sh`.
+From then on, every time you visit a page that contains a job offer, `/home/me/job_search/jobs.rec` is updated with data extracted from the page.
 
-Tests can be run with `(cd native && pdm run pytest)`.
+E.g. if you visit https://www.linkedin.com/jobs/view/3765452342/, you will get a record about the job offer itself:
+
+```
+first_seen_date: Mon, 19 Feb 2024 13:31:00 +0100
+url: https://www.linkedin.com/jobs/view/3765452342/
+title: Data Engineer
+origin: linked_in
+location: Amérique latine
+id: linked_in_3765452342
+flexibility: full_remote
+company: Mentor Talent Acquisition
+application_process: regular
+```
+
+... and another with info about the company:
+
+```
+url: https://www.linkedin.com/company/mentor-talent-acquisition/life
+name: Mentor Talent Acquisition
+domain: Recrutement et placement de personnel
+```
+
+Because the database is just a plain text file, you can then update those records with other information that is harder to extract automatically (e.g. required experience, skills, ...) or with information about a potential application. You can also version it with `git`. Data integrity can be ensured via `recutils` utilities.
+
+## Installation
+
+0. Install external dependencies:
+    * [recutils](https://www.gnu.org/software/recutils/) to read/write the database file
+    * [pdm](https://pdm-project.org/) to install the Python environment
+    * [npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm) to install the JavaScript environment
+1. Clone this repository
+1. Install the native backend: `native/install.sh`
+1. Build the extension: `npm install && npm run build`
+1. Install the extension as a [temporary add-on](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Your_first_WebExtension#installing) by pointing to the file `extension/manifest.json` (**NOT** the `manifest.json` located at the repository root)
+1. Set up the location of the job offers file
+
+## Tech stack / general tech info
+
+* [Firefox WebExtensions](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions)
+* Frontend code in [PureScript](https://www.purescript.org/), a pure functional language very similar to Haskell
+* Native application code is a basic [Python](https://www.python.org/) app
+* [Recutils](https://www.gnu.org/software/recutils/), an ingenious piece of free software that brings database-like capabilities to a human-readable file format
+
+Overall, the extension architecture is not too complex, even though the WebExtensions standard mandates a lot of message passing between parts that run in isolation from each other and can only communicate via JSON messages. For example, only "content scripts" can read/write a web page's content; only a native application launched by the browser is allowed to interact with the local file system; and only a background script can interact with the native application.
+
+The main difficulty was parsing LinkedIn pages to extract meaningful information. The HTML structure is not very semantic (lots of nested `div` and `span` elements with few identifiable class names), quite hard to reverse-engineer in a reliable way, and changes with UI updates.
+
+For this task especially, PureScript's type system proved incredibly useful.
+
+## Caveats
+
+This extension was mostly written to:
+
+* help me with my current job search (come and say [hello](https://www.linkedin.com/in/julien-herve/))
+* have an excuse to dive into PureScript
+* experiment with methods to properly extract data from unfriendly HTML code
+
+Therefore, it has the following caveats:
+
+* Very poor documentation
+* Unlikely to run on Windows without pain (recutils doesn't work there)
+* Works only on Firefox
+* Poor packaging
+* Minimal UI
+
+But it works on my machine 🤷!
+
+I will likely improve on this, e.g. by extracting the LinkedIn parsing code into a standalone PureScript/JavaScript library, but don't hold your breath!
+
+## Tests
+
+Frontend tests can be run with `npm run test`.
+
+Native application tests can be run with `(cd native && pdm run pytest)`.
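
Reader's note on the record format shown in the README above: because `jobs.rec` is plain text, its records can also be read without the recutils tools. The following is a minimal sketch in Python, not the extension's actual code. It assumes that the job and company records live in the same text, separated by a blank line, and it handles only a subset of the recfile format (`name: value` fields, `#` comments, `+` continuation lines). The names `parse_records` and `SAMPLE` are hypothetical, and repeated field names in one record are not supported (the last value wins).

```python
from typing import Dict, List

# Shortened copy of the two records shown in the README (assumed to be
# stored together, separated by a blank line).
SAMPLE = """\
first_seen_date: Mon, 19 Feb 2024 13:31:00 +0100
url: https://www.linkedin.com/jobs/view/3765452342/
title: Data Engineer
id: linked_in_3765452342
company: Mentor Talent Acquisition

url: https://www.linkedin.com/company/mentor-talent-acquisition/life
name: Mentor Talent Acquisition
"""


def parse_records(text: str) -> List[Dict[str, str]]:
    """Parse a simplified subset of the recfile format into dictionaries."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_name = None  # name of the most recent field, for continuation lines

    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record, if any.
            if current:
                records.append(current)
            current, last_name = {}, None
            continue
        if line.startswith("#") or line.startswith("%"):
            # Skip comments and record-descriptor fields (e.g. "%rec: Job").
            continue
        if line.startswith("+") and last_name is not None:
            # Continuation line: append to the previous field's value.
            extra = line[2:] if line.startswith("+ ") else line[1:]
            current[last_name] += "\n" + extra
            continue
        # Split on the first colon only, so URLs and dates stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'name: value' field: {line!r}")
        last_name = name.strip()
        current[last_name] = value.strip()

    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record.get("title") or record.get("name"), "->", record["url"])
```

For anything beyond quick inspection, the recutils command-line tools remain the safer choice, since they implement the full format, including typed fields and integrity constraints.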