Elasticiti’s Rob Tsai attended PyGotham in New York City and here’s his recap.
It’s not easy to sacrifice a summer weekend to anything other than the beach or a beer garden, but I’ve always been impressed by the quality of the sessions at PyGotham, so I figured it would be worth the tradeoff.
The talk schedule is available online, and videos of the talks will most likely be posted on YouTube soon.
Here are a few of the talks I enjoyed:
Playing with Python Bytecode:
This was a really cool talk by Scott Sanderson and Joe Jevnik that explored the internals of CPython’s code representation. If you were ever curious to know what a function looks like as raw bytes, this was the talk for you. Many of us use Python day to day to make API calls, build web scrapers, build API endpoints, parse CSVs, write data loaders for our databases, write visualization scripts, and so on, and we might never need to ‘hack’ the CPython bytecode. We benefit so much from the work of the core Python team that we don’t always know (or need to know) what’s going on under the hood. But I found it truly fascinating. It’s kind of like the guy who made a sandwich for $1,500 by making everything from scratch. Witnessing the amount of dedication that goes into each part of the codebase, you develop a newfound appreciation for all the hard work that had to happen for your Python code to run.
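If you want to peek at this yourself, the standard library is enough. The snippet below is my own minimal sketch, not code from the talk: add_one is a made-up function, and the exact opcodes you see will vary between CPython versions.

```python
import dis


def add_one(x):
    # A deliberately tiny, hypothetical function to inspect.
    return x + 1


code = add_one.__code__

# The compiled function body is stored as a plain bytes object.
raw_bytecode = code.co_code
print(type(raw_bytecode), len(raw_bytecode))

# Local variable names referenced by the bytecode.
print(code.co_varnames)

# Human-readable view of the same bytes; opcode names differ across versions.
opnames = [instr.opname for instr in dis.get_instructions(add_one)]
for instr in dis.get_instructions(add_one):
    print(instr.offset, instr.opname, instr.argrepr)
```

Calling dis.dis(add_one) prints a similar listing in one step; the loop above just makes it clearer that each instruction is an opcode plus an argument.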
Probabilistic Graphical Models in Python:
One day I will complete Daphne Koller’s Probabilistic Graphical Models course on Coursera. In the meantime, Aileen Nielsen’s talk was an excellent overview of how PGMs work and how they are implemented in Python. I really liked how she used a concrete, very simple PGM to cover Bayesian networks. Her example was a person trying to make it to the Olympics: the network is built in layers, where making the Olympics depends on how you perform at the Trials, which in turn depends on whether you practice or have good genes. Most of the classes I’ve seen on PGMs make you calculate the probabilities by hand (brutal, but probably worth doing once or twice), so it was fun to see all of that implemented in a Python library so that you can build your own networks quickly and see the probabilities calculated on the fly.
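To make the Olympics example concrete, here is a rough sketch of that network in plain Python. It is not the library or the code from the talk: every probability below is a number I made up for illustration, and the inference is brute-force enumeration over all sixteen possible worlds, which only works because the network is tiny.

```python
from itertools import product

# Hypothetical probabilities, invented for illustration only.
P_GENES = 0.2        # P(good genes)
P_PRACTICE = 0.7     # P(practices)
# P(does well at Trials | genes, practice), keyed by (genes, practice).
P_TRIALS = {
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.05,
}
# P(makes the Olympics | trials), keyed by the Trials outcome.
P_OLYMPICS = {True: 0.6, False: 0.01}

NAMES = ("genes", "practice", "trials", "olympics")


def bernoulli(p, value):
    """Probability of a True/False outcome given P(True) = p."""
    return p if value else 1.0 - p


def joint(genes, practice, trials, olympics):
    """Joint probability, factorized along the edges of the network."""
    return (
        bernoulli(P_GENES, genes)
        * bernoulli(P_PRACTICE, practice)
        * bernoulli(P_TRIALS[(genes, practice)], trials)
        * bernoulli(P_OLYMPICS[trials], olympics)
    )


def query(target, evidence=None):
    """P(target is True | evidence), by summing over every possible world."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts what we observed
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


print(query("olympics"))
print(query("olympics", {"practice": True}))
print(query("genes", {"olympics": True}))
```

The last query is the fun part: observing the outcome at the bottom of the network updates your belief about the causes at the top, which is exactly the by-hand calculation the classes make you do.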
Spark Dataframes for the Pandas Pro:
I enjoyed Alfred Lee’s talk on Spark and Pandas. We use Pandas quite heavily, and while I’ve mostly been writing native HiveQL for my big data querying, it’s pretty seamless to move to Spark using Spark’s SQLContext class if you like thinking and writing in SQL.
If you think like a Python/Pandas developer who works with DataFrames, it was pretty interesting to see the equivalent Pandas and Spark functions and methods side by side for querying, slicing, joining and subsetting data. Short answer: use Pandas if you’re modeling data that fits in memory on your dev machine. Use Spark DataFrames if you have a distributed cluster with data stored in HDFS and compute nodes that can make use of shared cluster memory.
Simple Serverless ETL in AWS:
Ryan Tuck’s live demo of building a Pokemon API service using AWS Lambda was pretty amazing. Setting up a microservice by simply deploying to AWS Lambda means you don’t need to deal with spinning up virtual machines, loading your libraries and dependencies, or thinking about scale-out, load balancing and deployment scripts. It’s a new way of thinking, and I think it’s going to be a hugely important trend moving forward as people migrate their monolithic applications towards microservices. The serverless approach makes sense, as you offload all the scaling and DevOps challenges to vendors like AWS. For batch processing, this may not make the most sense, as you are limited to, I believe, a maximum of 300 seconds per request. You have memory and disk space limitations as well. But it will be interesting to see which parts of an analytics data pipeline could be migrated over to Lambda, and which parts stay outside of Lambda.
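For a sense of how little code such a service needs, here is a minimal sketch of a Python Lambda handler. It is not the code from the demo: the POKEMON table and the name path parameter are hypothetical, and I am assuming the function sits behind an API Gateway proxy integration, which passes path parameters in the event and expects a dict with statusCode and body in return.

```python
import json

# Hypothetical in-memory data; a real service would read from a datastore.
POKEMON = {
    "pikachu": {"type": "electric"},
    "bulbasaur": {"type": "grass"},
}


def lambda_handler(event, context):
    """Entry point that Lambda calls with the request event and a context object."""
    # Assumes an API Gateway proxy event; "name" is an illustrative path parameter.
    params = event.get("pathParameters") or {}
    name = (params.get("name") or "").lower()

    record = POKEMON.get(name)
    if record is None:
        status, payload = 404, {"error": "pokemon not found"}
    else:
        status, payload = 200, {"name": name, **record}

    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


# Local smoke test: the handler is just a function, so no AWS account is needed.
print(lambda_handler({"pathParameters": {"name": "Pikachu"}}, None))
```

Everything else (provisioning, scaling, routing requests to the function) is the part you hand over to AWS, which is the whole appeal.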
This year’s PyGotham featured a great diversity of speakers, each of them bringing high-quality content. I recommend that developers of any background and skill level attend next year’s conference: it is a great opportunity to listen to and meet some visionary developers.