jordan

https://scrapbook-into-the-redwoods.s3.amazonaws.com/bd85b494-f947-47d5-91ec-edc1a8316daf-image.png
Published the dataset. I'm on the front page of Kaggle!
https://scrapbook-into-the-redwoods.s3.amazonaws.com/c7141809-1694-411c-a46f-fecbecb49755-image.png
Update! It's going super, super well, and the error logging I have in place works perfectly. 2014! 60 MB file, ~400k headlines... roughly 1/8 of the way there. Expected to finish in maybe 40 hours or so?
https://scrapbook-into-the-redwoods.s3.amazonaws.com/eae44085-ae5e-47ed-b766-da2ee459daac-image.png
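The post doesn't show the scraper itself. A minimal sketch of the two ideas in that update, assuming a month-by-month loop: failures get logged (with traceback) and skipped so one bad page doesn't kill a multi-day run, and the ETA is a simple linear extrapolation from the fraction done. `scrape_months`, `eta_hours`, and the `fetch_month` callback are hypothetical names, not the author's code.

```python
import logging
from typing import Callable, Iterable

# Logs go to stderr here; pass filename="..." to basicConfig to keep a log file instead.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("scraper")


def scrape_months(
    months: Iterable[tuple[int, int]],
    fetch_month: Callable[[int, int], list[str]],  # hypothetical: returns headlines for one month
) -> tuple[list[str], list[tuple[int, int]]]:
    """Scrape each (year, month), logging failures instead of stopping the whole run."""
    headlines: list[str] = []
    failed: list[tuple[int, int]] = []
    for year, month in months:
        try:
            headlines.extend(fetch_month(year, month))
        except Exception:
            # log.exception records the full traceback, so the month can be retried later.
            log.exception("failed to scrape %04d-%02d", year, month)
            failed.append((year, month))
    return headlines, failed


def eta_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear ETA: assumes the remaining work goes at the same average rate as so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours * (1 - fraction_done) / fraction_done
```

At 1/8 done, the linear estimate puts the remaining time at seven times the elapsed time. Headline counts vary a lot from month to month, though, so treat that as a rough guess rather than a schedule.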
Started scraping today, and it's going pretty well! (3/120 months done...)
https://scrapbook-into-the-redwoods.s3.amazonaws.com/03d93ca4-d6b3-4a6e-91ea-e67d99e3501e-image.png
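The 3/120 counter suggests ten years of monthly archive pages. A minimal sketch of building that work list, assuming the scraper walks calendar months in order. `month_range` and any start date you feed it are hypothetical, not taken from the post.

```python
def month_range(start_year: int, start_month: int, count: int) -> list[tuple[int, int]]:
    """Return `count` consecutive (year, month) pairs, starting at start_year/start_month."""
    months = []
    # Use one running "months since year 0" index so rolling past December is just divmod.
    index = start_year * 12 + (start_month - 1)
    for offset in range(count):
        year, zero_based_month = divmod(index + offset, 12)
        months.append((year, zero_based_month + 1))
    return months
```

For example, `month_range(2014, 11, 3)` gives `[(2014, 11), (2014, 12), (2015, 1)]`, and a count of 120 covers exactly ten years.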
the ramblings of a madman
https://scrapbook-into-the-redwoods.s3.amazonaws.com/2e8f4db0-db3e-45bd-9490-c28fd3b30c1e-image.png
Fun thing I've noticed: the most-viewed news sites have the sanest web design to parse (NYTimes, CNN, etc.), and the further down the list you go, the worse the markup gets (CNBC, Daily Mail, etc.). WITH THE EXCEPTION OF THE GUARDIAN: THEY'RE THE ONLY ONES WHO THOUGHT OF USING <li>. IT'S SO CLEAN AND CONCISE!
https://scrapbook-into-the-redwoods.s3.amazonaws.com/7faa7c54-02fc-486f-baac-43c63db21b8c-image.png
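Part of why `<li>` markup is so pleasant: each headline is its own element, so extraction boils down to "collect the text of every list item." Here is a minimal stdlib sketch. It assumes headlines sit in `<li>` elements that have explicit closing tags. The class name, function name, and sample HTML are hypothetical, and this is not the scraper from the post or the Guardian's actual page structure.

```python
from html.parser import HTMLParser


class ListItemTextParser(HTMLParser):
    """Collect the whitespace-normalized text of every outermost <li> element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.depth = 0                 # > 0 while inside at least one <li>
        self.current: list[str] = []   # text fragments of the <li> being read
        self.items: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            if self.depth == 0:
                self.current = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace/newlines into single spaces.
                text = " ".join("".join(self.current).split())
                if text:
                    self.items.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_list_items(html: str) -> list[str]:
    parser = ListItemTextParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Text inside nested tags like `<a>` is picked up automatically, and anything outside a list item is ignored. A page that omits `</li>` (which HTML allows) would need extra handling, because `HTMLParser` does not auto-close tags.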