I’ve been asked a few times lately about whether one should learn R or Python.
Channeling David Robinson’s post, I’m writing a blog post about it.
When you’ve written the same code 3 times, write a function
When you’ve given the same in-person advice 3 times, write a blog post
- David Robinson (@drob), November 9, 2017
I personally prefer R for dashboarding (Shiny) and publication (R Markdown + knitr), but that’s mainly because back in 2014, when I was first learning data science skills, I was taught in R, and there were no good equivalents in the Python world (other than what is now Jupyter).
Also, most data scientists in industry are working in SQL anyway :)
For data science, data cleaning, and data analysis programming, the best way to learn is to practice working with data. Having said that, it’s really hard to find projects, or sometimes your own projects are “too big and complicated”, so you might be at a loss for where to begin.
My first recommendation is to take a look at Kaggle. You’ll see a lot of datasets there and you might find something interesting to play around with. The forums on the competitions are also a good place to get ideas on the machine learning and model fitting side. One of my first exposures to data science was one of those competitions, and I learned a lot of web scraping and string parsing skills there.
Another great resource where you can practice some of your data skills comes from the R community. It’s called Tidy Tuesday. Every Tuesday a new dataset gets released, and people (all over Twitter) share their findings. It does not need to be a fully worked out machine learning pipeline. You are just getting a new dataset to explore. David Robinson does a 1-hour livestream every week. Even though it’s in R, you can do it all in Python as well (maybe try replicating someone’s R work?). After you get used to exploring data, and when you get more practice in either R or Python, it’s much easier to see how you can apply it in your daily life. And that motivation will help you practice and learn more.
Rachael Tatman, from Kaggle, also does weekly livestreams, where she hosts a journal club to discuss an academic paper about a machine learning method.
If you need some more of the basic programming knowledge, take a look at the Software Carpentry and Data Carpentry R and Python lessons. Jake VanderPlas’ book “Python Data Science Handbook” is free online, as is Garrett Grolemund and Hadley Wickham’s “R for Data Science”. Hadley Wickham also has a free book, “Advanced R”, but that’s not necessary when getting started. I also try to keep a list of free (Python) resources online.
Since I work on the data side of most things, it’s easy for me to suggest learning things from that point of view. All I can say is that if you stick with one of the learning paths, you will pick up bits of knowledge that will help you in the others. I first learned Python as a normal scripting language, and learned how to do data analysis in R. I only started to do data manipulation in Python when I understood the concept of tidy data.
Tidy data is probably the most important topic when working with data. So many of the skills you need to clean and tidy your data involve other aspects of the language (e.g., writing vectorized functions) that you’ll learn basic programming concepts by making your data tidy.
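To make that a bit more concrete, here is a minimal sketch of tidying a small table in pandas. The table, its column names, and the numbers are all made up for illustration; the idea is just to go from one column per year to one row per observation, and then work on whole columns at once instead of looping.

```python
import pandas as pd

# Hypothetical "wide" table: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [10, 20],
    "2020": [12, 25],
})

# Tidy form: one row per (country, year) observation.
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")

# Vectorized operations: each line works on the whole column, no explicit loop.
tidy["year"] = tidy["year"].astype(int)
tidy["cases_per_thousand"] = tidy["cases"] / 1000

print(tidy)
```

Nothing here is specific to this toy dataset. Reshaping, changing column types, and computing new columns are the kind of steps that tend to show up in most data cleaning work.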
For example, learning pandas and data manipulation will get you working with Python classes, which is an object-oriented programming concept. That will all translate if you need to do more software work in Python or if you want to learn Django. In R, you learn how to work with data frames and functions, and how to write your own.
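As a rough illustration of how pandas quietly teaches you about classes, the sketch below pokes at a DataFrame as an object and then wraps it in a small class of its own. The `ScoreBook` class and the data are hypothetical, invented only for this example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})

# A DataFrame is an instance of a class...
is_dataframe = isinstance(df, pd.DataFrame)

# ...with attributes (data stored on the object)...
n_rows, n_cols = df.shape

# ...and methods (functions attached to the object).
mean_score = df["score"].mean()


# The same ideas apply to classes you write yourself (hypothetical example).
class ScoreBook:
    def __init__(self, frame):
        self.frame = frame

    def top(self, n=1):
        """Return the n rows with the highest score."""
        return self.frame.nlargest(n, "score")


book = ScoreBook(df)
best = book.top(1)
print(best)
```

Once attributes and methods feel familiar from everyday pandas use, reading or writing classes in other Python projects is much less of a jump.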
In the end, find somewhere to start. Because of my background, I say start with loading data and playing with it. The more you do it, the more questions you will have about your data, and the more skills you’ll acquire to answer those questions. Those skills will carry over to other aspects of the language, and even to other languages (e.g., R, Python, even Julia!).
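If “loading data and playing with it” sounds vague, a first session might look something like the sketch below. The CSV contents are made up and read from a string so the example runs on its own; in practice you would pass a file path or URL from Kaggle or Tidy Tuesday to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would be a file path or URL.
raw = io.StringIO(
    "species,weight_g\n"
    "cat,4000\n"
    "cat,4500\n"
    "dog,20000\n"
)

df = pd.read_csv(raw)

# First look: a few rows, the column types, and summary statistics.
print(df.head())
print(df.dtypes)
print(df.describe())

# One first question: what is the average weight per species?
avg_weight = df.groupby("species")["weight_g"].mean()
print(avg_weight)
```

Each answer usually raises the next question, and that is where most of the learning happens.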
You’ll always be learning; it never ends, so don’t worry about trying to “know it all”. I’m constantly finding new things and new ways to do things in R and pandas; I am always googling and searching Stack Overflow… Even though I have a book about pandas and worked as an intern for RStudio! :)