Kaggle is a fantastic place to find practice datasets to learn with, both by putting your skills into practice and by seeing the techniques that others use with different types of data. Kaggle hosts datasets, competitions and analyses on a huge range of topics, with the aim of providing both data science support to groups and analysis education to learners.
This Extra Time tutorial will take you through using the command line/terminal (not a Python script!) to search for and download Kaggle dataset files. Of course, you may find it easier to just browse for them on the website!
Before starting the command line process, you will need to set up a free Kaggle account and download your API key from the bottom of your account page (https://www.kaggle.com/YOUR-ACCOUNT-NAME/account). The key downloads as a file called kaggle.json.
Once in the command line/terminal, the first thing you need to do is install the Kaggle module, which provides the kaggle command, with ‘pip install kaggle’.
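The kaggle tool keeps its settings in a folder called ‘.kaggle’ in your user folder (~/.kaggle on macOS and Linux, C:\Users\YOUR-USER-NAME\.kaggle on Windows). The folder is normally created the first time you run the kaggle command; if you can't find it, create it yourself. Drop the kaggle.json key file that you downloaded earlier into this folder. This authenticates you with the API, so you don't have to log in each time.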
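If you'd rather do the copying from the terminal as well, the steps above can be wrapped in a small shell function. This is a sketch assuming a bash-style shell on macOS or Linux; install_kaggle_key is a hypothetical name (not part of the kaggle tool), and ~/Downloads in the example is only a guess at where your browser saved the key. The chmod 600 step makes the key readable only by you, which the kaggle tool may otherwise warn you about.

```shell
# install_kaggle_key is a hypothetical helper, not part of the kaggle tool.
# Usage: install_kaggle_key PATH_TO_KAGGLE_JSON [CONFIG_DIR]
# CONFIG_DIR defaults to $KAGGLE_CONFIG_DIR if set, otherwise ~/.kaggle
install_kaggle_key() {
  local src="$1"
  local dest="${2:-${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}}"
  # Create the settings folder if it does not exist yet
  mkdir -p "$dest" || return 1
  # Copy the downloaded key into place under the name the tool looks for
  cp "$src" "$dest/kaggle.json" || return 1
  # Restrict the key so only your user can read it
  chmod 600 "$dest/kaggle.json"
}

# Example, assuming your browser saved the key to ~/Downloads:
# install_kaggle_key ~/Downloads/kaggle.json
```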
With the module installed and authenticated, we can now search through Kaggle competitions and datasets. Let’s look for datasets related to FIFA and see what comes up. We do this with the command ‘kaggle datasets list -s fifa’. Let’s break that command down a bit:
- kaggle – runs the Kaggle command-line tool that the module installed
- datasets – searches through datasets rather than competitions (swap in ‘competitions’ to give those a try!)
- list – lists the matching datasets rather than downloading them
- -s – short for --search
- fifa – the topic that we’re searching for
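The table is easy to read, but if you only want the dataset references to paste into a download command, a sketch like the one below asks the tool for CSV output with its --csv option and keeps just the first column. search_refs and KAGGLE_BIN are hypothetical names, not part of the kaggle tool, and the sketch assumes the first line of the CSV output is a header row.

```shell
# search_refs is a hypothetical helper, not part of the kaggle tool.
# Usage: search_refs SEARCH_TERM
# Prints one dataset reference (owner/dataset-name) per line.
# KAGGLE_BIN lets you substitute another command; it defaults to kaggle.
search_refs() {
  # --csv prints comma-separated results instead of the aligned table
  "${KAGGLE_BIN:-kaggle}" datasets list -s "$1" --csv |
    tail -n +2 |   # drop the header row
    cut -d, -f1    # keep only the first column, the dataset reference
}

# Example:
# search_refs fifa
```

Cutting on commas is safe here because the reference is the first column and references do not contain commas, even if later columns such as titles do.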
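This returns a table of matching datasets, with each dataset's reference in the first column followed by details such as its title, size and when it was last updated.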
Awesome: plenty of datasets to learn from and make use of. One that is particularly helpful is the European Soccer Database, a dataset with over 25,000 entries covering matches, teams and players, accompanied by some great notebooks analysing the data that you can learn from.
Downloading the dataset for our own analysis is easy: the command ‘kaggle datasets download -d hugomathien/soccer’ downloads the file for us. Again, let’s take a look at this command:
- kaggle datasets – We’ve seen this already
- download – simple enough!
- -d – short for --dataset, as we are downloading a dataset rather than a competition
- hugomathien/soccer – the reference of the dataset that we want, in the form owner/dataset-name. You can copy this from the first column of the search results
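To keep downloads tidy, you could wrap the download command in a small function that puts the files in a folder of your choice. This sketch relies on two options of the download command, -p to choose the destination folder and --unzip to extract the archive; kaggle_fetch and KAGGLE_BIN are hypothetical names, not part of the kaggle tool. Setting KAGGLE_BIN=echo prints the command instead of running it, which is a quick way to check it before downloading anything.

```shell
# kaggle_fetch is a hypothetical helper, not part of the kaggle tool.
# Usage: kaggle_fetch OWNER/DATASET [FOLDER]
# Downloads the dataset into FOLDER (default: current folder) and unzips it.
# Set KAGGLE_BIN=echo to print the command instead of running it.
kaggle_fetch() {
  local dest="${2:-.}"
  # Make sure the destination folder exists
  mkdir -p "$dest" || return 1
  # -p sets the download folder, --unzip extracts the downloaded archive
  "${KAGGLE_BIN:-kaggle}" datasets download -d "$1" -p "$dest" --unzip
}

# Example: fetch the European Soccer Database into a folder called soccer
# kaggle_fetch hugomathien/soccer soccer
```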
And there you go! The dataset arrives as a zip file in your current folder; unzip it and you can recreate some of the notebooks on Kaggle, develop your skills and eventually submit your own analyses for others to learn from!
Enjoy learning with Kaggle datasets, and get in touch to let us know what you come up with!