
Data Science at OSCON (Interview)
The Changelog: Software Development, Open Source
37min 21sec Nov 10, 2017
We went back into the archives to revisit conversations we had about data science at OSCON 2017. We talked with Vida Williams (Data Scientist) and Michelle Casbon (Director of Data Science at Qordoba) about the social impact of open data, personal data and transparency, privacy, the big data problem of public surveillance, electronic fingerprinting, the rift between data scientists and computer scientists, natural language processing, machine learning, and more.
Sponsors:
- Bugsnag – Mission control for software quality! Monitor website or mobile app errors that impact your customers. Our listeners can try all the features free for 60 days ($118 value).
- Linode – Our cloud server of choice. Get one of the fastest, most efficient SSD cloud servers for only $5/mo. Use the code changelog2017 to get 4 months free!
- GoCD – GoCD is an on-premise open source continuous delivery server created by ThoughtWorks that lets you automate and streamline your build-test-release cycle for reliable, continuous delivery of your product.
- Toptal – Hire the top 3% of freelance software developers, designers, and finance experts. Email adam@changelog.com for a personal introduction.
Featuring:
- Vida Williams – Website, GitHub, X
- Michelle Casbon – GitHub, X
- Adam Stacoviak – Website, GitHub, LinkedIn, Mastodon, X
- Jerod Santo – Website, GitHub, LinkedIn, Mastodon, X
Show Notes:
- Citizen Data by Vida Williams at TEDxRVA
- Qordoba - Localization Software for Global Brands
Vida Williams
Unless you're a data practitioner in the world of open source developers, it's not really on the core of everything.
Jerod Santo
True.
Vida Williams
I have to make a compelling case to be interesting.
Jerod Santo
I see data science and I get excited. And I'm an open source developer. So maybe I'm the outlier.
Vida Williams
Well, it was interesting, because one of the things I talk about is open data; that's specifically what I'm interested in, the social impact of open data, like how do we come together --
Jerod Santo
That's what we wanna talk to you about.
Vida Williams
\[04:01\] So that's my thing, and there's just now burgeoning conversation around it. I think we tried to have it (interestingly enough) twenty years ago, but there wasn't an infrastructure for open data at the time.
Jerod Santo
Who's "we"?
Vida Williams
Data practitioners. I mean, my first big project was a DPA data project, so that was big data before big data was big. We were doing something stupid that 15 years later we knew not to do, and that's move from mainframe into relational. You probably don't wanna do that to that volume of that.
That being said, at the time there were discussions around transparency and open data and who should have access to it, but there were no standardizations, there were no protocols, there were no accesses, there were no platforms. Now we're finally in a place where we can have this discussion, because especially in the open source sphere, all that stuff exists. So now it's regathering the vendors, if you will, all the data superheroes and going "Hey, we can now hold everybody accountable for privacy, for standardization, for protocols on access, in order to actually make a difference, so why don't we do that?" So anyway, that's what the talk was about.
Jerod Santo
Cool.
Adam Stacoviak
Interesting. We've actually had some shows -- we've been around for a while; in 2009 we started this show, and we've talked about open data, mostly in the government space a couple times... I'm looking for some older shows... It's been a while. This is like the first one - Civic Hacking, with Luigi Montanez and Jeremy Carbaugh; that was when they were both working with...
Jerod Santo
Sunlight Labs?
Adam Stacoviak
Yeah, Sunlight Labs...
Jerod Santo
Sunlight Foundation?
Adam Stacoviak
Yeah.
Vida Williams
Well, now you have the Presidential Innovation Fellows (the PIFs) who are in that whole White House-sponsored open data platform... But an interesting question came up in my session about if this conversation was before, and what do we do about the question of privacy? It was really like, "Okay, so if everybody is supposed to have this personal data, then how do we accomplish this around privacy?"
My response was we as data practitioners need to challenge the hypocrisy of privacy. We want to put a camera everywhere and be able to develop in reality TV, and there's no privacy communication there, but all of a sudden you're a data point, and there's all of a sudden a need for privacy. So we as practitioners need to actually challenge the definition of data as though image is somehow not data, and thus exempted from privacy, but if you're a number or some type of codified information, then all of a sudden there's privacy rules.
Adam Stacoviak
That's interesting, I've never really considered the idea of cameras being somewhere, and considering that, I hate that, too. I may be somewhat of a devil's advocate, but I'm not sure of your perspective... It kind of bugs me that you can take six data points and figure out exactly who I am - male, color, where I originated from, how much money I probably make, if I had kids... You could take six data points and pretty much figure out roughly everything about me besides my name. That's the world we live in, but should we accept that? Is it okay to have all that -- and I'm born in '79, so I'm 38 years old. People born in today's age, they're like -- it's second nature.
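Adam's point, that a handful of attributes can narrow things down to a single person, can be sketched in code. This is a minimal illustration in Python, assuming made-up records and hypothetical field names (`sex`, `birth_year`, `zip`, `children`); none of it comes from the conversation. It measures what fraction of records are the only one with their combination of values for a chosen set of attributes.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are the only one with their combination
    of values for the given attributes (and so are potentially
    re-identifiable from those attributes alone)."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical, invented records; they describe no real people.
people = [
    {"sex": "M", "birth_year": 1979, "zip": "23220", "children": 2},
    {"sex": "M", "birth_year": 1979, "zip": "23220", "children": 0},
    {"sex": "F", "birth_year": 1979, "zip": "23220", "children": 2},
    {"sex": "M", "birth_year": 1985, "zip": "23221", "children": 2},
]

# One attribute alone singles out few records...
one_attribute = unique_fraction(people, ["birth_year"])
# ...but combining several attributes singles out many more.
all_attributes = unique_fraction(
    people, ["sex", "birth_year", "zip", "children"]
)
```

In this toy data, birth year alone isolates one record in four, while the four attributes together isolate every record. Real datasets are far larger, but the same effect is why combinations of innocuous-looking fields are treated as identifying.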
Vida Williams
They have no expectation of privacy.
Adam Stacoviak
Right.
Vida Williams
Okay, so where I sit on it - I'm an introverted data geek, so I don't want anybody to know anything. \[laughter\]
Adam Stacoviak
Okay, so maybe I'm not devil's advocate.
Vida Williams
No, no, no. I don't want anybody -- I'm one of the first ones to say "I'm falling off the grid for a set period of time and you can't get me." But I also, having been in technology for so long, strike a cool balance between the fact that in order for us to have this technological infrastructure and the innovation revolution that we're currently in, we have already as a country, at minimum - world, a little bit less, but equally made a decision to forego privacy.
So now when we discuss privacy, we're only talking about it really in the realm of making you feel comfortable at having you as a citizen, for having given it up.
Adam Stacoviak
\[08:11\] So it's already out there. It's reversing it.
Vida Williams
Right, it's already gone. Now, the problem that I have from a data sciences perspective is the definition of data. We will refuse to call image information data, and it is equally data.
Adam Stacoviak
Who's "we"?
Vida Williams
We is when we start talking about privacy laws, we do not consider image, video etc. with the same standard as we do your credit card number, your social security number... Except for now we have technology where if I put your picture up, I can equally find everything about you on the internet that's associated with that image, right?
Adam Stacoviak
You're scaring me, Vida... Come on now.
Vida Williams
I'm just saying...
Jerod Santo
It's true.
Adam Stacoviak
It's like catfish - you just throw that image in Google or whatever, this magic machine, and...
Vida Williams
Look, if you're trying to prevent catfish from happening, you might wanna put the image out. I'm just saying.
Adam Stacoviak
Okay. Yeah, that's true. \[laughter\]
Vida Williams
But we don't have the same protocols and expectation around privacy, and I'm saying there's a bit of hypocrisy there. In my space, when we're talking about making an actual difference in the world, we will not at all disclose the information of a youth who is in trouble at all. But as soon as he's in a fight or as soon as he's in some police exchange or as soon as he's in whatever, all privacy goes out the window, because there's an image, there's a video, and now we know everything, right?
Jerod Santo
Yeah.
Vida Williams
But if we could have just -- and this is my... So one of my course bases is child welfare; I work a lot in education, I'm planning a lot of impact investing and a lot of those things where I feel like we make community safer. How about if we just identified at the point in time that he became a foster youth, and all of a sudden his environment is unstable? Why couldn't we de-privacy, de-new some of that data events, so that we could provide services that could have helped him? But now that is a privacy issue.
I don't know where the lines are, I just know that we don't -- I don't know where the lines are, but I know that we do not have a rational way of discussing privacy via data in a way that is actually gonna be beneficial for a community. That's what I know. So my thing is issuing a call to action to those who deal with data to begin the process of discussing "How do we templatize it? How do we standardize it? What protocols do we put into place in order to make data more available and more consumable for impact?" That's my goal, and I don't know if you're recording any of this...
Adam Stacoviak
We've recorded all of it.
Vida Williams
Did you really?
Adam Stacoviak
Yeah, we've already started, basically... It was like a soft opening here... \[laughter\] Unless you wanna resume differently. I was about to say that "By the way, we've been recording this whole thing and this is a good riff, so let's keep going..." \[laughter\]
Jerod Santo
Well, speaking to your privacy here... You know, we've been recording everything you just said... \[laughter\] It's funny, because we normally will do like an intro thing and then we'll start, but like --
Adam Stacoviak
She was glad we already had it going, and I was like, "We'll just keep talking."
Jerod Santo
I was over here thinking "This is better than the show is gonna be..."
Adam Stacoviak
This is the show, y'all...
Jerod Santo
This is the show, yeah. So, Vida Williams...
Adam Stacoviak
Vida Williams...
Jerod Santo
Lots to say... From my perspective, I didn't realize this. I've always considered it -- but because I'm just like a nerdy developer person, images are data, the video is data, my phone number is data... I always saw it the same; I didn't realize that the classification from the data practitioners or from the governmental bodies or people making the decisions - they see imagery and video as completely distinct things.
Vida Williams
Well, think about it this way - when you had the huge push for police to wear cams, right? That was the answer to the interactions between police and youth, right? The answer was "Everybody wear a cam."
Adam Stacoviak
Body cam, yeah.
Vida Williams
\[11:56\] So my response was "Who is managing all that data? How are you exactly organizing the fact that, well, we need to pick up this cam, from this person, at this time...? And who has the space? Who's managing the space constraints for calling all of that data at once?"
Adam Stacoviak
Is it archived? Is it archived well? Could it be used in the court?
Vida Williams
Absolutely.
Adam Stacoviak
All these things, I've never even thought about that. Nobody does.
Jerod Santo
Nobody did.
Adam Stacoviak
We do! We should!
Vida Williams
Right, and that is where the data people come in, and we were nowhere in that conversation, so yes, it's a social justice question, because the legislators wanna say "Yes, wear a body cam", and the data people are like "Wait a minute, that's like a yes-no", because that's a "Yes, we should do it", but a "No, we can't."
Adam Stacoviak
Right.
Vida Williams
And then how do you play that out later in the courts? And then where is the question of privacy then? The people in the video are under 18; how much can you show? You can't even tell a child's name if there's been any type of sexual violence in the newspaper, and yet you can show an entire video of a young person in some type of exchange with the police? Talk to me about privacy again. But because the data people are missing from those types of conversations, those points are only discussions in our rooms, behind our little screens, because we don't really like talking to people.
Adam Stacoviak
So what are they doing then with these cameras? How are they dealing with the data, do you know?
Vida Williams
I have no idea. I honestly have no idea. I have talked to a couple...
Adam Stacoviak
What's your best guess?
Vida Williams
My best guess is they're not.
Jerod Santo
They just lose it.
Adam Stacoviak
So maybe it's around for a week, until the SD card is formatted?
Vida Williams
What will happen is we'll have some case that will challenge it, where the data will need to be there - the data they filmed, the metadata and the images will all need to be there, and the (we'll just call them the) legislators of the day will come up and say "You know what, our policy at that point in time was to archive it seven days because of the volume of the data, and unfortunately that was cut before we could get there."
It will be some answer like that, because then that enables the legislators to vote yes, and then the execution of it to fall defunct, and it'll be nobody's fault.
Jerod Santo
Yeah... I'm starting to think of chain of custody and issues like that as well...
Vida Williams
Exactly.
Jerod Santo
Because who's the one who's maintaining the data? Is it the same people who are called in or questioned by \[unintelligible 00:14:06.22\]
Vida Williams
That's why I said the metadata becomes very important, like "Who picked it up? Who cataloged it? Where did they move it? When did they move it?"
We have electronic fingerprints - that's all a data issue, that's a development issue, that's an infrastructure issue, but we don't have the practices in place, nor do we have the protocols in place to deal with issues such as privacy. So now, if you had a routine traffic stop, I was stopped, he's got a camera on, he's taking a picture of me. But later I go running for office, what if I cursed him out during that traffic stop? Well, that video can resurface; where's the privacy of that? That was a state-sanctioned video.
So there's all kinds of questions of privacy that never come up when you're dealing with data from an image perspective.
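The metadata questions Vida lists ("Who picked it up? Who cataloged it? Where did they move it? When did they move it?") map naturally onto an append-only custody log. What follows is a minimal sketch in Python, not a description of any real body-camera system: the `CustodyEvent` fields, the actor names, and the timestamps are all hypothetical. Each entry stores the SHA-256 hash of the previous entry, so altering an earlier entry breaks the chain for every entry after it.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class CustodyEvent:
    actor: str       # who handled the footage
    action: str      # e.g. "picked_up", "cataloged", "moved"
    location: str    # where the footage was at that point
    timestamp: str   # when it happened (ISO 8601 string)
    prev_hash: str   # digest of the previous event in the log

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_event(log, actor, action, location, timestamp):
    """Return a new log with one more event linked to the last one."""
    prev = log[-1].digest() if log else GENESIS
    return log + [CustodyEvent(actor, action, location, timestamp, prev)]


def verify(log):
    """Check that every event points at the digest of its predecessor."""
    expected = GENESIS
    for event in log:
        if event.prev_hash != expected:
            return False
        expected = event.digest()
    return True


# Hypothetical usage with invented actors and times.
log = []
log = append_event(log, "officer_17", "picked_up", "vehicle_4", "2017-05-10T14:02:00Z")
log = append_event(log, "clerk_3", "cataloged", "evidence_room", "2017-05-10T16:30:00Z")
log = append_event(log, "clerk_3", "moved", "archive_server", "2017-05-11T09:00:00Z")

# Rewriting who cataloged the footage invalidates the link held by the next entry.
tampered = list(log)
tampered[1] = CustodyEvent(
    "someone_else", "cataloged", "evidence_room",
    "2017-05-10T16:30:00Z", log[1].prev_hash,
)
intact_ok = verify(log)
tampered_ok = verify(tampered)
```

One limitation of this sketch: a change to the final entry is not detected, because nothing after it records its digest. A fuller design would also publish or sign the latest digest somewhere outside the log.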
Adam Stacoviak
They always say you never have something to hide, until you have something to hide, right? \[laughter\] That's the truth, though.
Vida Williams
It is! But in the era of data, you have everything to hide, or nothing to hide. That's where we are now. You don't even know what's out there to hide.
Jerod Santo
I'm going off grid, I'm out.
Adam Stacoviak
I'm out. We're done here. \[laughter\]
Jerod Santo
We want privacy back.
Adam Stacoviak
Oh, boy...
Jerod Santo
Do you kind of feel like that? Do you throw your hands up and you're like "What are we gonna do?"
Vida Williams
I did that years ago when I knew that we gave up privacy. It was just one of those things where I literally would fall off the grid... For a moment, because I know I'm never really off the grid, right? I just don't wanna talk to anybody.
Jerod Santo
Right.
Vida Williams
I think we're in the era of transparency. I think the best opportunity we have is citizenry, and on our side of the house as developers, as infrastructure planners, as data, is to begin to influence the legislation around it, is to begin to have some expectation that we be at the table as they're defining what are the rights and the wrongs of people, as it has to do with information that we're calling.
I think that's where we need to be, and I don't think that we're in the conversation at all. I don't think that people are thinking about "Let's bring the geeks to the table to discuss how this can happen."
Jerod Santo
Right, I agree with that.
Adam Stacoviak
They want us there last.
Jerod Santo
When it's too late...
Adam Stacoviak
\[16:00\] "We've made the solution, go make it. We've designed how it should be..." Yeah, exactly. "All the decisions are made, here's the spec. Can you do this in two weeks?" or "We're gonna need this tomorrow."
Vida Williams
Exactly. "Really, we needed this last week, so we're gonna pay you a hell of a lot of money to maybe get it wrong, but we're gonna roll it out anyway, and then \[unintelligible 00:16:19.07\]"
Jerod Santo
Oh, man... That's how it's gonna go down. That's how it goes down.
Vida Williams
That's how it goes down... But we can change that. That's why you're doing this podcast; we're calling awareness to it, a call-to-action... Bring the geek avengers out, we can change this.
Adam Stacoviak
What's your biggest call-to-action for developers, data scientists, geeks out there? What's your biggest call to action?
Jerod Santo
Yeah, actionable steps, what can we do?
Vida Williams
My biggest call-to-action is really get engaged with social justice issues. There are not enough of us that apply our talents into spaces where our impact can be readily felt. Three years ago I went from working high corp, enterprise architecture and data, to deciding that if I was so good at what I do, that I can drive corporate missions forward, Department of Defense missions forward, that if I used that same talent and applied it to child welfare and applied it into these other places, that I can drive those missions forward just as fast. And I would think that that would be true for all of us, that if we reapply all of our skillsets in these areas and look at that as a donation as much as we look at dollar donations, then maybe we can start affecting change in our communities.
Adam Stacoviak
Any low-hanging fruit in particular that you could mention?
Vida Williams
Absolutely. Probably education is the biggest one right now - like, how do we standardize education data so that we can actually show where our students are successful, where they're struggling, which communities can benefit from what types of actions.
We just need data, we need platforms to be able to nationalize some of the results that we're getting from the education systems. If there's already a mandate to produce education data, why isn't it standardized across the nation, and who's holding them accountable for doing that, and then who's doing that type of reporting that is accessible to educational practitioners, whether that's pre-school programs or extra-curricular education programs, or social workers or counselors?
That's low-hanging fruit that's really easy, but has the biggest impact for our next decade.
Adam Stacoviak
We always have to take care of our future generation, right?
Vida Williams
It would seem to be.
Adam Stacoviak
It's the best place to invest.
Vida Williams
They don't even know that they're not supposed to tell you this information, so... \[laughter\]
Jerod Santo
Yeah, really...
Vida Williams
So that's probably my biggest call-to-action in the first industry that I would say we could be the most impactful.
Adam Stacoviak
So if people are listening to this and they're like "I love Vida, she's awesome" and they wanna learn more about you - where do they go to find out more about you and what you're doing?
Vida Williams
Well, the first thing I would have to do is tell you my name is not Vida \[veeda\], but Vida \[vyda\]...
Jerod Santo
Oh, my goodness...
Vida Williams
...which is fine!
Adam Stacoviak
Come on now, you let me say it 15 times and I messed it up?
Jerod Santo
You waited this long... \[laughter\]
Adam Stacoviak
I even said, "Are you Vida Williams?" and she said "Yes, I am!"
Jerod Santo
I'm not even embarrassed, I'm just mad now...
Vida Williams
Well, she CAN be... \[laughter\]
Adam Stacoviak
Oh, man... The audience knows that I mess a lot of names up.
Vida Williams
And I'll just say it's not a big deal, because in Europe they told me I say my name wrong anyway.
Adam Stacoviak
Okay, what is it then?
Vida Williams
It is Vida Williams...
Jerod Santo
Okay. I was thinking Vida like life in Spanish...
Adam Stacoviak
Yeah, me too...
Jerod Santo
Livin' la vida loca was what I said to Adam, and he rolled his eyes at me...
Vida Williams
That's it, that's the thing... \[laughter\]
Jerod Santo
Livin' la Vyda loca...
Vida Williams
Yes, that's it, and I am @vidachristy everywhere - on Twitter, on Google, via email on Gmail... You can always get me at @vidachristy.
Adam Stacoviak
We'll put the links in the show notes to you, and make sure everybody knows about you.
Vida Williams
Awesome.
Adam Stacoviak
Any closing thoughts?
Vida Williams
I'd just thank you for the opportunity to ramble for about 15 minutes... I mean, I don't get that too often, so it's pretty awesome.
Jerod Santo
Awesome. We're happy to...
Adam Stacoviak
Happy to talk to you, very much.
Vida Williams
Thank you!
Jerod Santo
We're here with Michelle Casbon, Director of Data Science at Qordoba. Michelle, you as well as Vida Williams and other data scientists that we spoke to at this show, and I guess maybe other -- we're sensing a thing which I didn't know existed... We were talking about it before we started recording, but I wanted to get your explanation, because this is a social construct that I've never experienced, which is there seems to be a bit of a divide between data scientists, maybe with quotes around that, and computer scientists with quotes around that (or programmer)... What's up with that?
Michelle Casbon
Yeah, that's a great question. I think it stems from a lot of -- so data science didn't really exist until 5-10 years ago; it's a new thing, and I think when companies started to bring data scientists on, they sort of created these organizational structures that put a wall in between them, and they have different skill sets for the most part. So there's definitely some overlap. Engineering - you need a really strong programming background... But data science - you need strong engineering, and strong math... All of these other things in addition. So I feel like engineering kind of thought "Well, their programming skills aren't as strong, because they're really good at math", and then the data scientists are like "Well, they don't know anything about modeling, so they're no good."
I think it really boils down to organizational structures and having that wall in between, because a lot of times data science will do some really amazing things with math, and then they'll sort of like "Hey, go implement that, go put it into production", and an engineer is like "This library - it doesn't exist in Java. I don't know what kind of magic you expect me to do...", but that sort of throwing things over the fence, that kind of tension I think has caused a lot of problems.
Jerod Santo
I see. And that seems to have moved beyond the walls of the corporations to even events like this, where I think yourself as well as Vida, both responded to us in different terms... Like, "Are you sure you wanna talk to me? I'm a data scientist" or "I'm not a developer." \[laughter\] And our response to that was like "Yes, we do wanna talk to you!"
Adam Stacoviak
Yes, of course!
Jerod Santo
I have never been aware...
Adam Stacoviak
What was my response to that question...? "That's okay."
Jerod Santo
That's okay. \[laughter\] A little pat on the head, "That's okay..."
Adam Stacoviak
It wasn't that kind of "That's okay."
Michelle Casbon
I didn't say I'm not a developer, because data scientists are definitely developers.
Jerod Santo
Right, you didn't say you're not a -- well, Vida said she wasn't a developer... You just said "What's your audience? Because I'm a ..."
Adam Stacoviak
Do you think it's just like the community hasn't gotten to know you well enough? Like, maybe not hanging out...? Since it's newish, so to speak, maybe you all haven't gotten that time to congeal or hang out in the same rooms and realize that you're all human beings and you all have smarts and can bring something to a changing landscape of things?
Michelle Casbon
Yeah, I mean logically that makes sense...
Jerod Santo
It's making a lot of logical sense... Humans aren't logical.
Michelle Casbon
Right.
Adam Stacoviak
That's true.
Jerod Santo
We're emotional.
Adam Stacoviak
Very judgmental, very picky...
Michelle Casbon
I don't know, I guess it seems like there are these two focuses. One is just on sort of production code, writing things that don't break, and then there's the "No, but machine learning..." and "The math is the most important part..." I just think that like with any two organizations, just like between engineering and DevOps, there's a lot of tension because the goals are a bit different.
Jerod Santo
Right, and in a certain sense because there's overlapping skillsets, but not identical skillsets, both sides feel threatened by the other one...
Michelle Casbon
That's a strong word, but...
Jerod Santo
Was that too strong?
Michelle Casbon
I mean, "threatened" is like... That's just a strong word.
Jerod Santo
Okay. I'm gonna back it off...
Michelle Casbon
I'm not saying it's wrong...
Adam Stacoviak
How do you mean threatened? Just curious...
Jerod Santo
Well, I said it...
Adam Stacoviak
No, but she thinks it's strong - why is it strong?
Jerod Santo
Yeah, because I thought it was an apropos...
Adam Stacoviak
I feel like it's right on, too...
Jerod Santo
Yeah, but different reaction here, so please, tell us.
Michelle Casbon
\[27:01\] I think because we understand enough of what the other side does... It's easy to be critical of how other people are doing things. I think the best way to -- what I've seen to make the problem go away the best is really just to take down those walls... Organizationally, you're not two different people...
Jerod Santo
As you're saying, just sit together, work together... There's even job descriptions--
Michelle Casbon
Yes, and sharing titles. I consider myself a data science engineer, because I feel like that better describes what I do. Because I do have a background in engineering, and now I do a lot of machine learning, and my official title is Director of Data Science, but I don't feel like that's distinct from engineering anymore.
NLP is what I focus on, and in order to do that, I have to be able to understand distributed computing. That didn't necessarily exist in traditional NLP, and so now to be able to do machine learning, I really have to understand so much of it... And vice-versa, if anyone wants to implement any of these models, any of this NLP stuff, they really kind of have to understand what the libraries are doing...
I guess what I'm saying is just that the more you can merge the roles and the everyday tasks, whether that starts with calling people data science engineers, or merging titles somehow, or giving people the same sort of social status in the engineering hierarchy - either way, I think the more those can merge and the more you can align those goals...
Jerod Santo
The better off they will all be.
Michelle Casbon
Yeah, then the better people will work together.
Adam Stacoviak
It's a form of segregation, right? Titles... Wouldn't you say?
Jerod Santo
Well, you're literally segregating. You're actually drawing lines.
Adam Stacoviak
It's not a racial segregation; maybe that term is normally associated with... But it's a segregation; you're separating by roles and distinctions, when you should be melding more and considering yourselves more of a cohesive unit. That's what you learn in the military, that's what you learn working with teams, and the more you operate as a team, a fluid team, the better you are in the end result.
Jerod Santo
Well, in the military you have titles; you have the medic, you have the engineer, you have the...
Adam Stacoviak
Well, I didn't say that the authority and structure isn't required, because you have to respect those above you who've had the experience a bit down the road... So that's still there, I think... I mean, military is maybe a little different to compare perfectly, it's not a one to one, but you still have structure, you still have hierarchy, but that doesn't mean that you can't be on the same team.
Michelle Casbon
I agree. And that also helps with the whole common goal thing. We're all working towards the same thing.
Jerod Santo
Right.
Michelle Casbon
You don't have to be nailed down to a certain thing.
Jerod Santo
We've just gotta quit putting each other in boxes, man...
Adam Stacoviak
That's right, man. No boxes, okay?
Jerod Santo
Don't put me in a box, alright?
Adam Stacoviak
Box, not boxes.
Michelle Casbon
I'm really encouraged by the fact that you guys didn't even know that there was this tension... That is definitely a good sign for the future.
Adam Stacoviak
I'm starting to get a hint of it, though. I've been working with...
Jerod Santo
Daniel Whitenack?
Adam Stacoviak
No, Pete Soderling from DataEngConf.
Michelle Casbon
Oh, he's great!
Adam Stacoviak
Yeah, Pete's great. So I've kind of caught some edge that there's this divide, because like, okay, why is it DataEngConf and not DataScienceConf...? Why are there those nuances? So I didn't know the animosity or the divide, but I could sense that something was not perfect, not a cohesive world. There was a distinction between the different roles.
Michelle Casbon
Yeah. And his conference is I think part of the solution, because he addresses it, and it's all about working together as data science engineers and not as engineering and data science.
Adam Stacoviak
Those individuals, yeah. That's cool.
Jerod Santo
Let's talk about your talk, what you're here to talk about. You said your focus is on natural language processing, speech recognition, stuff like that. Is that what your talk was about?
Michelle Casbon
\[30:55\] So it was about how we use NLP at Qordoba. We have a platform that helps people localize their products... It doesn't really matter what the product is, but most everyone has a website or a mobile app, anything like that... We have a platform that helps people release that product in different markets. So not just English-speaking ones, but really across the globe. My role within the engineering team is to work on the machine learning.
My talk really set the stage for "Okay, why is localization important? Why should you even care about it? Because these are the disasters that happen when you don't care about it." I went down into a few of the details about which tools we're using...
We've built a lot of this on open source software. I really couldn't imagine building it on anything else. Open source really did enable us to even create this platform.
Jerod Santo
Because of the costs, or why?
Michelle Casbon
No, capabilities.
Jerod Santo
It's just better software...?
Michelle Casbon
Well, there's so many different components... I don't think any one vendor provides that entire stack, and even if I wanted to cobble all that together, it would be extremely difficult. It's much, much easier using open source tools, and they have gotten better so much faster.
Jerod Santo
What are some of the tools that you're using?
Michelle Casbon
The heart of our machine learning - we're using Spark's MLlib; we use their LogisticRegression, random forests libraries, stuff like that. And PredictionIO is what does a lot of the NLP stuff.
We're running that in Docker containers, on Kubernetes... It's all in Scala. Our storage layer is, we're using MariaDB and Cassandra.
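For readers unfamiliar with the logistic regression Michelle mentions, the idea can be shown in a few lines. This is a standalone, pure-Python sketch of the technique trained with batch gradient descent. It is not Spark MLlib's `LogisticRegression`, it is not Qordoba's code, and the two numeric features per example are invented for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(weights, bias, x):
    # Predicted probability that x belongs to class 1.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = score(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) >= 0.5 else 0


# Invented toy data: two features per example, binary label.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.9, 0.8], [1.0, 0.7], [0.8, 1.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train(X, y)
predictions = [predict(weights, bias, x) for x in X]
```

In a pipeline like the one described, a distributed library would take the place of the hand-written training loop; the sketch only shows what the model computes.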
Jerod Santo
Lots of things.
Michelle Casbon
There's a lot of stuff, yeah. So I talked a little bit about that.
Adam Stacoviak
That's interesting.
Jerod Santo
A laundry list of...
Adam Stacoviak
Yeah, and it's all open source.
Michelle Casbon
It's basically a dream.
Adam Stacoviak
That's good.
Michelle Casbon
Almost all open source.
Jerod Santo
It's basically a dream?
Michelle Casbon
Yeah, like as an engineer, to be able to work with such amazing tools, it's really, really fun.
Jerod Santo
That's cool.
Michelle Casbon
They didn't have to work too hard to recruit me, because... The mission - changing the world, being able to give people products that feel native to them, even if they don't speak English, can really do so much good in the world by building that kind of platform. And then using the best tools out there to do it, the tools that engineers really want to use... That's a big plus.
Adam Stacoviak
Yeah. I love the branding.
Michelle Casbon
Yeah?
Adam Stacoviak
The branding is phenomenal, Qordoba. Have you seen the site?
Jerod Santo
No, I haven't.
Adam Stacoviak
It's beautiful.
Michelle Casbon
We have a great designer.
Adam Stacoviak
Yeah. I mean, I love the direction it's -- it looks extremely trustworthy.
Michelle Casbon
That's actually our brand new, newly unveiled site, because we've just announced our funding; we just closed our series A funding round, and part of that was unveiling the new website... So I'm glad you like it.
Jerod Santo
Congratulations on all that.
Adam Stacoviak
Why is it the first time we're hearing of Qordoba? Why do you think?
Michelle Casbon
\[34:00\] I've asked myself that question a lot. When I first met the co-founders and I first heard about what they were building, it was one of those times where I was just like "Lightbulb! How have I not thought of using machine learning for that purpose?" It's so well-suited, it just makes sense. But I think a lot of good ideas in the past are like that. They seem obvious once you've thought of them.
Adam Stacoviak
Right. "A wheel!" \[laughter\]
Michelle Casbon
Exactly!
Jerod Santo
"This circle is better than the square I was using..." \[laughter\]
Michelle Casbon
The thing about the localization field is that it just really hasn't changed much in 30-35 years, and we're really here to take a lot of the tools that work so well in other areas and apply it to this older, more traditional one. Why hasn't anyone done it before? I have no idea, because it makes so much sense, and it's really exciting to be a part of that so early in the game, at such an early stage of the startup. It's a fantastic experience.
Adam Stacoviak
Cool.
Jerod Santo
Well, Michelle, thanks so much for sitting down with us!
Michelle Casbon
Of course!
Adam Stacoviak
Any closing thoughts to share? Any words of wisdom to part on? For the data scientists out there, the data engineers out there, and the mathematicians not knowing you well enough, what's going on.
Michelle Casbon
Feel the love...! \[laughs\] I guess I feel very personally invested in that whole data science versus engineering thing because I have one foot in both sides.
Adam Stacoviak
Both worlds, yeah.
Jerod Santo
You're the hybrid.
Michelle Casbon
I am definitely a hybrid, and that's been a fantastic experience. I haven't encountered any animosity in my personal teams, and so I guess I just wanna see more of that... Just everyone be nice.
Jerod Santo
Everybody be nice.
Adam Stacoviak
Be nice! Please!
