Fetch the data from the shortcodes

Hello!

Here at Bits4Waves things took a special detour yesterday. Dealing with all this data turned out to be really brain-teasing! It produced a lot of ideas (literally dozens) and just as many questions. So yesterday there was a “pause-and-assess” moment to develop some tools to deal with all this info. (If you like Emacs and Org, you may love what happened!)

But back to the matter at hand: the #100daysofpractice dataset! After acquiring the shortcodes, the next natural step would be to explore the data. We would like to know, for instance, how the posts are spread through time, and some information about the practitioners, along with a bunch of important details about music practice! But before that, we need to actually use the shortcodes to gather this data. This is the objective for today!

The idea is simple: we will go through the list of shortcodes and give them to instaloader one at a time, asking it to look up each shortcode and return the corresponding data (instaloader is a Python tool and library for downloading data from Instagram).

Let’s start! First, let’s open the file and print the first ten shortcodes:

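#!/usr/bin/env python

import fileinput

# Print the first 10 shortcodes as a quick sanity check.
for line in fileinput.input('../shortcodes/shortcodes-uniq.txt'):
    # Each line already ends with a newline, so suppress print's own.
    print(line, end='')
    if fileinput.lineno() == 10:
        break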
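
For the later steps the shortcodes are needed as clean values rather than printed lines. Here is a minimal sketch of a helper for that; the name `read_shortcodes` and its `limit` parameter are made up for this example, and it assumes the file holds one shortcode per line.

```python
from itertools import islice


def read_shortcodes(path, limit=None):
    """Return the shortcodes in a text file, one per line, without newlines."""
    with open(path, encoding='utf-8') as handle:
        # Strip the trailing newline and skip blank lines.
        codes = (line.strip() for line in handle if line.strip())
        # islice stops after `limit` items; limit=None reads the whole file.
        return list(islice(codes, limit))
```

Calling `read_shortcodes('../shortcodes/shortcodes-uniq.txt', limit=10)` would return the same ten shortcodes as a list of strings.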

Now, let’s fetch the username for the post. To do this for the first shortcode we’ll run:

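#!/usr/bin/env python

import instaloader

shortcode = '008-CMh_h-'
# An Instaloader instance provides the context used to query Instagram.
I = instaloader.Instaloader()
# Look up the post that this shortcode identifies.
post = instaloader.Post.from_shortcode(I.context, shortcode)
# Print the username of the profile that owns the post.
print(post.owner_profile.username)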
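
The same post object also carries the date we will need to see how posts are spread through time. Below is a small sketch that collects a few fields into a plain dict. The function name `summarize_post` is made up for this example, and it assumes the `shortcode`, `owner_username` and `date_utc` attributes of instaloader’s `Post`. The stand-in object holds made-up values so the sketch runs without touching Instagram.

```python
from datetime import datetime
from types import SimpleNamespace


def summarize_post(post):
    """Collect a few fields from a Post-like object into a plain dict."""
    return {
        'shortcode': post.shortcode,
        'username': post.owner_username,
        # date_utc is a datetime; ISO 8601 text is easy to store and sort.
        'date_utc': post.date_utc.isoformat(),
    }


# Stand-in object with made-up values, so the sketch runs offline.
# With instaloader, pass the result of Post.from_shortcode(...) instead.
fake_post = SimpleNamespace(
    shortcode='EXAMPLE',
    owner_username='example_user',
    date_utc=datetime(2021, 1, 1, 12, 0, 0),
)
print(summarize_post(fake_post))
```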

OK, now that we know how to grab the profile info, we can create a simple Python script that receives a shortcode and returns the corresponding username. This script will be called from a shell script, which will then use instaloader to fetch the posts from that profile. This approach may seem counterintuitive at first, because we could do everything from inside the Python script. Still, calling instaloader from the shell script is what worked best in the past in terms of reliability, so that is the way we will go. Let’s get to it, then:

#!/usr/bin/env python

import argparse
import instaloader

# The shortcode is the only (positional) command-line argument.
parser = argparse.ArgumentParser()
parser.add_argument('shortcode')
args = parser.parse_args()

I = instaloader.Instaloader()
post = instaloader.Post.from_shortcode(I.context, args.shortcode)
# Print only the username, so the shell script can capture it.
print(post.owner_profile.username)

Now, we’ll create a shell script to fetch the username for each one of the shortcodes, and then fetch the data:

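#!/bin/bash

PROJECT=~/sci/100daysofpractice-dataset
PYTHON=$PROJECT/venv/bin/python
SRC=$PROJECT/src
GET_USERNAME="$PYTHON $SRC/get-username.py"
SHORTCODES=$PROJECT/shortcodes/shortcodes-uniq.txt
PROFILES=$PROJECT/profiles
CSV=$PROFILES/shortcode-username.csv

# Read one shortcode per line; -r keeps backslashes literal.
while read -r SHORTCODE; do
    # "--" keeps a shortcode that starts with "-" from being parsed as an option.
    USERNAME=$($GET_USERNAME -- "$SHORTCODE")
    # Skip failed lookups, so instaloader is never called without a profile.
    if [ -z "$USERNAME" ]; then
        echo "no username for $SHORTCODE, skipping" >&2
        continue
    fi
    PAIR=$SHORTCODE,$USERNAME
    echo "$PAIR"
    echo "$PAIR" >> "$CSV"
    # Download the posts of this profile.
    instaloader "$USERNAME"
done < "$SHORTCODES"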
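
Since the loop appends to the CSV, a run that gets interrupted leaves a partial file behind. Here is a minimal sketch of how that file could be read back to see which shortcodes are still missing. The helper names `load_pairs` and `pending` are made up for this example, and it assumes the CSV has no header and holds one `shortcode,username` pair per line.

```python
import csv


def load_pairs(csv_file):
    """Map each shortcode to its username from headerless CSV rows."""
    pairs = {}
    for row in csv.reader(csv_file):
        # Ignore malformed rows and rows where the username is empty.
        if len(row) == 2 and row[1]:
            shortcode, username = row
            pairs[shortcode] = username
    return pairs


def pending(shortcodes, pairs):
    """Return the shortcodes that have no username recorded yet, in order."""
    return [code for code in shortcodes if code not in pairs]
```

Opening `shortcode-username.csv` with `open(..., newline='')` and passing the file object to `load_pairs` gives the finished pairs; `pending` then lists what a second run would still have to fetch.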

This script will then download the necessary data!

Published by bits4waves

