Here at Bits4Waves, things took a special detour yesterday. Dealing with all this data turned out to be a real brain-teaser! It produced a lot of ideas (literally dozens) and just as many questions. So yesterday there was a “pause-and-assess” moment to develop some tools to deal with all this info. (If you like Emacs and Org, you may love what happened!)
But back to the matter at hand: the #100daysofpractice dataset! After acquiring the shortcodes, the next natural step would be to explore the data. We would like to know, for instance, how the posts are spread through time, and some information about the practitioners, along with a bunch of important details about music practice! But before that, we need to actually use the shortcodes to gather this data. This is the objective for today!
The idea is simple: we will go through the list of shortcodes and give them, one at a time, to instaloader, asking it to look up the given shortcode and return the corresponding data (instaloader is a Python library and command-line tool that interfaces with Instagram).
Let’s start! First, let’s open the file and read the first ten shortcodes:
    #!/usr/bin/env python
    import fileinput

    # Print the first 10 shortcodes, one per line
    for line in fileinput.input('../shortcodes/shortcodes-uniq.txt'):
        print(line, end='')
        if fileinput.lineno() == 10:
            break
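The loop above prints the raw lines. When the shortcodes are needed as values rather than printed text, a small helper can strip the trailing newlines and skip blank lines. This is only a minimal sketch: the name `read_shortcodes` and its `limit` parameter are made up for illustration, and it assumes the file holds one shortcode per line.

```python
from itertools import islice
from pathlib import Path


def read_shortcodes(path, limit=None):
    """Yield shortcodes from a text file, one per line (hypothetical helper).

    Blank lines are skipped; `limit` caps how many shortcodes are yielded
    (None means no cap).
    """
    with Path(path).open(encoding="utf-8") as handle:
        stripped = (line.strip() for line in handle)
        non_blank = (code for code in stripped if code)
        yield from islice(non_blank, limit)
```

For example, `list(read_shortcodes('../shortcodes/shortcodes-uniq.txt', limit=10))` would give the same ten shortcodes as the loop above, as a list of strings.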
Now, let’s fetch the username for the post. To do this for the first shortcode we’ll run:
    #!/usr/bin/env python
    import instaloader

    shortcode = '008-CMh_h-'

    # Look up the post by its shortcode and print the owner's username
    I = instaloader.Instaloader()
    post = instaloader.Post.from_shortcode(I.context, shortcode)
    print(post.owner_profile.username)
OK, now that we know how to grab the profile info, we can create a simple Python script that receives a shortcode and returns the corresponding username. This script will be called from a shell script, which will then fetch the posts from the profile. This approach may seem counterintuitive at first, because we could do everything from inside the Python script. Still, calling instaloader from the shell script is what has worked best in the past in terms of reliability, so that is the way we’ll go. Let’s get to it, then:
    #!/usr/bin/env python
    import argparse

    import instaloader

    # Read the shortcode from the command line
    parser = argparse.ArgumentParser()
    parser.add_argument('shortcode')
    args = parser.parse_args()

    # Look up the post and print the owner's username
    I = instaloader.Instaloader()
    post = instaloader.Post.from_shortcode(I.context, args.shortcode)
    print(post.owner_profile.username)
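A lookup like this may fail for some shortcodes, for instance if a post is no longer available. One possible way to keep a long run going is to catch the error and report `None` instead. The sketch below is an assumption about how that could look, not part of the script above: `lookup_username` and `make_instaloader_fetcher` are hypothetical names, and the `except` clause is deliberately broad.

```python
import sys


def lookup_username(shortcode, fetch_post):
    """Return the post owner's username, or None if the lookup fails.

    `fetch_post` is any callable that takes a shortcode and returns an
    object with an `owner_profile.username` attribute.
    """
    try:
        post = fetch_post(shortcode)
        return post.owner_profile.username
    except Exception as err:  # deliberately broad: any failure means "skip it"
        print(f"{shortcode}: lookup failed ({err})", file=sys.stderr)
        return None


def make_instaloader_fetcher():
    """Build a fetch_post callable backed by instaloader (imported lazily)."""
    import instaloader

    loader = instaloader.Instaloader()
    return lambda shortcode: instaloader.Post.from_shortcode(loader.context, shortcode)
```

With instaloader installed, `lookup_username('008-CMh_h-', make_instaloader_fetcher())` would perform the same lookup as the script above. Passing the fetcher in as a parameter also lets the error handling be tried out with a fake fetcher, without touching the network.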
Now, we’ll create a shell script to fetch the username for each one of the shortcodes, and then fetch the data:
    #!/bin/bash

    PROJECT=~/sci/100daysofpractice-dataset
    PYTHON=$PROJECT/venv/bin/python
    SRC=$PROJECT/src
    GET_USERNAME="$PYTHON $SRC/get-username.py"
    SHORTCODES=$PROJECT/shortcodes/shortcodes-uniq.txt
    PROFILES=$PROJECT/profiles
    CSV=$PROFILES/shortcode-username.csv

    # For each shortcode: look up the username, log the pair, fetch the profile
    while read -r SHORTCODE; do
        USERNAME=$($GET_USERNAME "$SHORTCODE")
        PAIR=$SHORTCODE,$USERNAME
        echo "$PAIR"
        echo "$PAIR" >> "$CSV"
        instaloader "$USERNAME"
    done < "$SHORTCODES"
Running this script records each shortcode and username pair in the CSV file and downloads the data for each profile!
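One practitioner can show up under many shortcodes, so the same username may appear on many lines of the CSV. A short Python sketch can list each username only once, which is handy for seeing how many distinct profiles are involved. It assumes the CSV has no header and holds `shortcode,username` rows as written by the shell script; the function name `unique_usernames` is made up for illustration.

```python
import csv


def unique_usernames(csv_path):
    """Return the usernames found in a shortcode,username CSV, without repeats.

    Usernames are kept in first-seen order. Rows with a missing or empty
    username (for example, from a failed lookup) are skipped.
    """
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2 or not row[1]:
                continue
            seen.setdefault(row[1], None)
    return list(seen)
```

Calling `unique_usernames` on `shortcode-username.csv` would then give the list of distinct profiles collected so far.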