Data scientists are often described as hybrids: part statistician, part computer scientist; part analyst, part strategist. But while we focus on the myriad technical skills that a data scientist should possess, we often overlook one of the foundational skills, without which the whole edifice falls apart: communication. Being able to build models is all well and good, but if you can’t succinctly express the assumptions, takeaways, and next steps to the rest of the company, then your model isn’t worth the bits it’s made up of.
Today we’ll do some operator overloading: we’ll change the behavior of some infix operators (+ and *) for our own handmade classes. There are legitimate reasons for doing so: if adding a behavior for + makes sense for your class, you can improve the readability of your code and make your classes easier to use. Of course, it’s also a power that can be abused (the creator of Java left operator overloading out of that language because, he said, he had seen it abused in C++, which is probably a reflection of the character of C++ programmers). For example, let’s look at the behavior of + for built-in tuples and lists. Using + on lists concatenates them! This is a problem if we were hoping to add the elements of the lists together. What if I want + to do vector addition and * to do vector multiplication?! NumPy arrays already do element-wise arithmetic, but using NumPy isn’t so much fun, so let’s build our own vector
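To make the idea concrete, here is a minimal sketch of what such a class could look like. The class name `Vector`, its `components` attribute, and the choice to treat `*` between two vectors as the dot product (and `*` with a number as scaling) are all assumptions made for illustration; the post itself may define things differently.

```python
# Built-in behavior: + concatenates sequences rather than adding elements.
print([1, 2] + [3, 4])   # [1, 2, 3, 4]
print((1, 2) + (3, 4))   # (1, 2, 3, 4)


class Vector:
    """A hypothetical, minimal vector class (illustration only)."""

    def __init__(self, *components):
        self.components = tuple(components)

    def __repr__(self):
        return f"Vector{self.components}"

    def __eq__(self, other):
        return isinstance(other, Vector) and self.components == other.components

    def __add__(self, other):
        # Called for `self + other`: add matching components.
        if not isinstance(other, Vector) or len(other.components) != len(self.components):
            return NotImplemented
        return Vector(*(a + b for a, b in zip(self.components, other.components)))

    def __mul__(self, other):
        # Called for `self * other`.
        # Assumption: Vector * Vector is the dot product, Vector * number scales.
        if isinstance(other, Vector):
            if len(other.components) != len(self.components):
                return NotImplemented
            return sum(a * b for a, b in zip(self.components, other.components))
        if isinstance(other, (int, float)):
            return Vector(*(a * other for a in self.components))
        return NotImplemented


v = Vector(1, 2)
w = Vector(3, 4)
print(v + w)   # Vector(4, 6)
print(v * w)   # 11
print(v * 2)   # Vector(2, 4)
```

Returning `NotImplemented` (rather than raising directly) lets Python try the other operand's reflected method before it gives up with a `TypeError`, which is the conventional way to signal "I don't know how to combine with that".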
Writing technically proficient code on a line-by-line basis is all well and good, but paying attention only to the micro aspect of our code is a recipe for disaster. But oftentimes that’s what we do, especially as beginners. When we’re just starting out, we struggle with learning the tools, jargon, and syntax so that we can start coding, dammit! But once we’ve mastered variable declarations, conditionals, and loops, it’s time to start thinking about our code more holistically. Why do we need to think about the architecture of our code? It’s an oft-stated truism that our code will be read by other humans more often than it will be read by machines. This observation is meant to nudge us towards writing code that’s friendlier for human comprehension, even at the cost of leaving some efficiency on the table. You can probably find edge cases where this generalization doesn’t hold, but for our purposes, we’ll assume that you’re convinced that it’s important for your code to be comprehensible to your human colleagues
Before I even tell you about today’s Python tip, I’ll warn you that there are a lot of opinions on the Internet about whether it’s even a good idea to use, so be sure to pick a side and dig in! Anyway, YMMV, and if you’re writing code that needs to be maintained by a lot of other people who may be confused by the syntax, take that into account. Onwards! I’m assuming you’re familiar with Python’s for-loops and while-loops. Sometimes you’ll write a loop with a conditional inside that causes the loop to break. Let’s review the break statement: it terminates a loop early, and execution picks up at the first statement after the end of the loop. For example, if you have a for-loop whose counter runs over [0, 10), you can have a break statement that will cause you to leave the loop at, say, counter = 5. Or if you have a while-loop, you can
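As a quick refresher, the for-loop case described above might look like the sketch below. The while-loop half is only an assumed illustration of the same idea (stop once a running total passes a threshold); the variable names are made up for this example.

```python
# Leave a for-loop early: range(10) covers 0 through 9, but we stop at 5.
visited = []
for counter in range(10):
    if counter == 5:
        break              # jump to the first statement after the loop
    visited.append(counter)
print(visited)             # [0, 1, 2, 3, 4]

# Same idea in a while-loop: keep adding 1, 2, 3, ... until the total passes 20.
total = 0
n = 0
while True:
    n += 1
    total += n
    if total > 20:
        break              # the only way out of this loop
print(n, total)            # 6 21
```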
Students, even those with minimal computer science training, don’t often have issues with the concept of a data type when learning Python. A string variable holds a string, and an integer variable holds an integer; so far, so good. But the primitive data types pose few issues for understanding because students are relying on intuitions from the physical world, intuitions that may not be as reliable when we increase the complexity of our data structures or the work we’re asking those data structures to do. Alright, I’ll be honest: an understanding of abstract data types (ADTs) is not essential for a beginning data scientist. But it’s unquestionably a fun topic (do not question me on this!). Additionally, the point of the code we write is to store and manipulate data, so it’s important to know how to handle this data efficiently. As we progress in our data science journey, knowing how data storage is handled will help us write more efficient code.
Let’s get the obvious out of the way: You don’t actually want to overflow a stack. But if, like me, you get most of your knowledge these days from stackoverflow.com, you may have wondered about the provenance of that website’s name. While I can’t tell you why they called it Stack Overflow per se, I can tell you what I know about stacks and how to royally screw up your day through a few lines of terrible code.

Stack versus heap

Before we get to the issue of overflow, we need to nail down the concept of a stack. I currently teach Python for my day job, so we’re blessed by having memory management hidden from us. And that means no stack for us to manage (all Python objects live on a private heap, at least in the CPython implementation). So beware: The concept of stack and heap may not exist or may be implemented differently in your favorite programming language! Warnings aside, we can get a
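Even with memory management hidden, Python still tracks nested function calls on a call stack, and CPython refuses to let that stack grow past a configurable depth. A minimal sketch of the "few lines of terrible code" idea, assuming runaway recursion is the kind of mistake in question (the function names here are hypothetical):

```python
import sys


def countdown_forever(n):
    # Bug on purpose: there is no base case, so every call makes another call.
    return countdown_forever(n - 1)


def try_to_overflow():
    try:
        countdown_forever(10)
    except RecursionError as err:
        # CPython stops the runaway recursion before the real stack overflows.
        return f"stopped by Python: {err}"
    return "finished normally"


print(sys.getrecursionlimit())   # the current depth limit for this interpreter
print(try_to_overflow())
```

In a language without this guard rail, the same mistake would keep pushing frames until the process ran out of stack space and crashed.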