This expression became famous with Newton in the 1600s, but it had already been in use as early as the 1100s¹. Here at Bits4Waves we usually don’t immediately dismiss ideas that linger for 1000 years or so; we try to learn from them, if possible! That’s why today’s activity is so gratifying…
We’ve been collecting the shortcodes of the posts with the hashtag #100daysofpractice. There are 600k in total, but we could get only 50k (12 times fewer!).
To obtain them we used the Python library instaloader, and it kept breaking at the 50k mark.
After I shared the issue on instaloader’s GitHub, one of the developers was kind enough to help. Applying some advanced wizardry, he cooked up a new script using ideas and code from related open issues and SHAZAM: we now have 250k shortcodes! It breaks at this point, and I have reported that as well. Let’s hope it’s solvable!
Meanwhile, we have work to do:
[X] update the code with the new script
As for the code, I had created a new branch for the new script and also given the script a different name. Since it works better than the previous version, it can simply replace it. Let’s do this:
SRC=~/sci/100daysofpractice-dataset/src
pushd $SRC
git -C $SRC rm get-shortcodes.py
git -C $SRC mv get-hashtag.py get-shortcodes.py
This takes care of the renaming. Now we have to check whether everything still works with the new script. Let’s start from the beginning, the Makefile:
ifndef IG_USER
$(error IG_USER is not set)
endif

PYTHON=python
SRC=../src
SHORTCODES_ORIG=shortcodes-orig.txt
SHORTCODES_TEST=shortcodes-test.txt
SHORTCODES_SORT=shortcodes-sort.txt
SHORTCODES_UNIQ=shortcodes-uniq.txt

all: shortcodes-orig shortcodes-test shortcodes-sort shortcodes-uniq

shortcodes-orig:
	$(PYTHON) $(SRC)/get-shortcodes.py

shortcodes-test: $(SHORTCODES_ORIG)
	head --lines=10 $(SHORTCODES_ORIG) > $(SHORTCODES_TEST)

shortcodes-sort: $(SHORTCODES_ORIG)
	sort $(SHORTCODES_ORIG) > $(SHORTCODES_SORT)

shortcodes-uniq: $(SHORTCODES_SORT)
	uniq $(SHORTCODES_SORT) > $(SHORTCODES_UNIQ)

clean:
	rm -rf $(SHORTCODES_ORIG) $(SHORTCODES_TEST) $(SHORTCODES_SORT) $(SHORTCODES_UNIQ)
First, let’s fix some issues with it:
[X] a fundamental problem with the Makefile: the targets must have the file extension!
[X] fix: typos in targets’ names
[X] create a link for the final file at the end
[X] add variable for link to final file
ifndef IG_USER
$(error IG_USER is not set)
endif

PYTHON=python
SRC=../src
GET_SHORTCODES_PY=$(SRC)/get-shortcodes.py
SHORTCODES_ORIG=shortcodes-orig.txt
SHORTCODES_TEST=shortcodes-test.txt
SHORTCODES_SORT=shortcodes-sort.txt
SHORTCODES_UNIQ=shortcodes-uniq.txt
SHORTCODES_LINK=shortcodes.txt
OBJECTS = $(SHORTCODES_ORIG) $(SHORTCODES_TEST) $(SHORTCODES_SORT) $(SHORTCODES_UNIQ)

all: $(OBJECTS)

$(SHORTCODES_ORIG): $(GET_SHORTCODES_PY)
	$(PYTHON) $(GET_SHORTCODES_PY)

$(SHORTCODES_TEST): $(SHORTCODES_ORIG)
	head --lines=10 $(SHORTCODES_ORIG) > $(SHORTCODES_TEST)

$(SHORTCODES_SORT): $(SHORTCODES_ORIG)
	sort $(SHORTCODES_ORIG) > $(SHORTCODES_SORT)

$(SHORTCODES_UNIQ): $(SHORTCODES_SORT)
	uniq $(SHORTCODES_SORT) > $(SHORTCODES_UNIQ)
	ln --symbolic $(SHORTCODES_UNIQ) $(SHORTCODES_LINK)

clean:
	rm -rf $(OBJECTS) $(SHORTCODES_LINK)
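Why does the file extension matter so much? make decides whether to run a recipe by looking for a file named exactly like the target and comparing modification times. A target called shortcodes-orig never exists as a file, so its recipe would run on every single make. Here is a rough Python sketch of that rule. The function name needs_rebuild is made up for illustration (it is not part of make or of this project), and it leaves out phony targets, implicit rules and the case of missing prerequisites:

```python
import os


def needs_rebuild(target, prerequisites):
    """Rough model of make's out-of-date check for a single rule.

    Assumes every prerequisite already exists on disk.
    """
    # A target with no file on disk is always considered out of date.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Otherwise rebuild only if some prerequisite is newer than the target.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

With the old target names, the first branch would always be taken; with the real file names, the download step only runs again when the output is missing or get-shortcodes.py has changed.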
Now, it would be nice to
[X] unify the old and new shortcodes into a single file
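One possible way to do the unification, sketched in Python under the assumption that each file holds one shortcode per line. The function name merge_shortcodes and the file names in the note below are placeholders, not the ones used in the project:

```python
def merge_shortcodes(input_paths, output_path):
    """Merge shortcode files into one sorted, duplicate-free file.

    Returns the number of unique shortcodes written.
    """
    unique = set()
    for path in input_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                code = line.strip()
                if code:  # skip blank lines
                    unique.add(code)
    with open(output_path, "w", encoding="utf-8") as f:
        for code in sorted(unique):
            f.write(code + "\n")
    return len(unique)
```

Calling merge_shortcodes(["shortcodes-old.txt", "shortcodes-new.txt"], "shortcodes-all.txt") should leave the same set of lines as running both files through sort and uniq, although the ordering may differ from sort’s depending on the locale.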
Finally, let’s make use of all the wizardry we got access to, and try to continue downloading from 250k onwards.
We’ll manually change the session file to make total_index point to 250k. OK, that’s done! Now let’s make it and wait for the results!