00:11 Tobias Macey
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, a 40 gigabit public network, fast object storage, and a brand new managed Kubernetes platform, you get everything you need to run a fast, reliable, and bulletproof data platform. And for your machine learning workloads, they've got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to listen and learn from your peers, you don't want to miss out on some great conferences. We have partnered with organizations such as ODSC and Data Council, with upcoming events including the Observe 2020 virtual conference on April 6, and ODSC East, which has also gone virtual, starting April 16. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Just because you're stuck at home doesn't mean you can't still learn something. Your host is Tobias Macey, and today I'm interviewing Tyler Colby about his experiences working as a data professional in the nonprofit arena, most recently at the Natural Resources Defense Council. So Tyler, can you start by introducing yourself?
01:42 Tyler Colby
Absolutely. I'm Tyler Colby. I work at the Natural Resources Defense Council as the data infrastructure director, currently overseeing all of our data infrastructure needs, most prevalent among which are our Salesforce instance, which leads our fundraising and development efforts, and our Amazon Redshift, which sits at the middle of all of our disparate data platforms.
02:04 Tobias Macey
And do you remember how you first got involved in the area of data management?
02:07 Tyler Colby
Absolutely. I've had a twisting and turning pathway through my career. It started way back in 2005 with Time Warner Cable. I was working my way through college as a part-time telemarketer, and my manager at the time was fed up with the ridiculous order entry training. It was about a six-week process in order to get everybody onto the telephones, and he was looking at the team and saying, you know, these are people who signed up to be telemarketers, not order entry specialists. And so he went out and bought Salesforce. And as is the case with so many Salesforce professionals, I fell into what's called kind of an accidental admin role and was just put at the, you know, core of standing up the Salesforce instance because I knew computers, which was pretty much the only qualification at that time. We went through some really rapid transformations there and were able to take the team from eight part-time telemarketers up to 250 covering the entire Midwest in the matter of just a couple of years. And what that really showed me was, you know, when we start to look at these, at the time, new technologies, right, cloud was brand new at the time, just how powerful it was to not have to, you know, wait on an IT queue of things or stand up a whole bunch of servers in order to manage these different processes, that there was a new and better way to do these things. As it's progressed through the years, I've tackled everything from, you know, doing pro bono consulting for nonprofits. I've worked on the for-profit side, then at Salesforce itself. And then I did a lot of consulting before this as an integration architect as well, with a company called Cloud for Good, which dealt specifically with nonprofits working on Salesforce, but connecting them to usually legacy systems and making sure that their data could flow between legacy systems and Salesforce, so that they could, you know, still have that rapid transformation and the future with this, you know, better platform, but still keep some of those platforms that they had on site and on premises.
04:07 Tobias Macey
You mentioned that in your current role as the director of data infrastructure for the NRDC, you're working pretty heavily with Salesforce and Redshift. I'm curious if you can give a bit more of a description of the overall scope of your responsibilities there, and some of the challenges that you're facing in terms of being able to manage those elements of the data platform and any other components that you're using to support it.
04:30 Tyler Colby
Absolutely. So the NRDC started its rapid modernization of its data systems approximately four years ago. And if we look at the landscape at the NRDC four years ago, it was a lot of on-premise and very siloed systems, and a lot of them were starting to be, or just were being, deprecated or sunset by the vendors that supplied them. So looking at, you know, end-of-life scenarios on a lot of those platforms forced us into making some really rapid decisions. Specifically with Salesforce, we had a very tight turnaround on the platform and a very abbreviated implementation. So a lot of times, you know, in the last few years, we would have to go back to our old data systems to either pull through data or reconfigure a process in order to move that through. But today we are moving to a more traditional data warehouse model, specifically with Amazon Redshift sitting at the center of all of our platforms. So Salesforce, as you mentioned, is the key for our fundraising and development efforts. We also have ActionKit for advocacy work and Mobile Commons for SMS-based messaging, we use Tableau for dashboarding, and our direct mail efforts are run off of Heroku Postgres with Salesforce Connect. We have a lot of other systems that also feed into that centralized model, but those are our main platforms feeding in right now. I've been in the nonprofit space, and specifically the Salesforce nonprofit space, for quite a while, and what I've found unique about my time at the NRDC is really the scale of data for a nonprofit. You know, a lot of times when people think nonprofit, they think, you know, small amounts. And I know, you know, we're not looking into the billions and, you know, to the petabytes, but we are still working with the millions and tens of millions of records across our data sets. So as we look at, you know, how do we move this data around? How do we keep this data in sync?
And how do we make sure that we know which constituent is which across each of these disparate platforms? So as that conversation evolves across each of these different platforms, that we're able to know who this is, develop the metrics around them, make sure that things like their communication preferences are set correctly, all of that becomes a little bit more difficult, because we're not dealing with a small quantity of data, but starting to get into at least what Salesforce would consider a large data volume.
06:48 Tobias Macey
And also, for a lot of the organizations that are dealing with petabyte or exabyte scales of data, their primary challenge is just in terms of the actual volume of the data and being able to scale to process it in a timely fashion. Whereas for smaller organizations, and particularly, I imagine, in the nonprofit space, the challenge isn't so much in the volume of data and being able to scale to meet that. It's in the variety and the overall cleanliness of the data that you're dealing with, and being able to scale to those challenges, which is a much different type of scaling and a much different type of complexity. If you have these petabyte and exabyte scale data sets, then there's a high probability that you've already put in the effort to make sure that everything that's landing in that is already clean and fairly homogeneous, so you don't have to deal too much with doing a bunch of cleanup after the fact. And so again, it's just a matter of being able to process at scale, rather than being able to integrate all of the data from the disparate sources and the different ways of representing it, which I'm sure is probably more of what you're dealing with. And so I'm curious a bit about the types of data sources, and some of the ways that this manifests that you're having to deal with for your work.
07:59 Tyler Colby
That's a really insightful way to put it because, you know, we're looking not at, you know, petabytes of log data. We're going to be looking at gigabytes of very complex data, and data that has a number of rules attached to each of those records. And there's, you know, a lot of different scenarios that it can fall into. There's an old XKCD comic, it's number 1667. It's one of my favorites. And it describes the complexity of algorithms on a spectrum. It has left-pad and quicksort on the far left, and on the far right is a sprawling Excel sheet from a nonprofit. And, you know, in my experience that is so the case with nonprofits. You know, if we look at either the traditional ETL or ELT models, that T is so heavy, and there's so much complexity built into what exactly this data needs to do, and even getting the business analysis around what we need this data to do. You know, if we look internally to the NRDC, we have things called giving levels. So as giving levels increase, that may mean a different set of choices and a whole different rubric for where that data could go, what we're thinking of the data, what team that's being routed to. And that's just one of many decisions. And when you start to compound these issues on top of each other, it gets really difficult just to make sure that, you know, again, that T within either the ETL or ELT framework is correctly attributing those records and moving them to where they need to go. And moving beyond that, really, there's a large segment of nonprofits that are just limited by funding. So, you know, during my time consulting for nonprofits, especially as an integration architect, solutions really had to meet some very strict funding limitations.
So typically, you know, services would be covered by a grant or a specific donation, with very little left over for choosing the perfect solution or the most robust solution. So a lot of times, you know, we'd be left with tooling that is just as good as it could be, and then designing processes that were as good as they could be given the limitations. So when we look at the NRDC specifically in today's world, we are fortunate that we are able to spend a little extra time, and we have a great internal team, in order to build out some more robust data pipelines, especially because we have Redshift and a great analytics team that's assisting us. We work with Civis Analytics specifically to do a lot of our more complex data pipelines. But, you know, because we have those, we are able to get to more automated data flows, make sure that our data is being ingested, and, you know, do some more of this complex data work internally as well.
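As a rough illustration of the kind of rule-heavy "T" step described above, here is a minimal Python sketch of routing a record by giving level. The thresholds, level names, and team names are all hypothetical and are not taken from the NRDC's actual rubric; real rules would be far more numerous and would compound on one another.

```python
from dataclasses import dataclass

# Hypothetical rubric: (minimum total giving, level name, team routed to).
# Ordered from highest threshold to lowest so the first match wins.
GIVING_LEVELS = [
    (10_000, "major", "major_gifts_team"),
    (1_000, "mid", "mid_level_team"),
    (0, "general", "membership_team"),
]


@dataclass
class Donor:
    donor_id: str
    total_giving: float


def assign_giving_level(donor: Donor) -> dict:
    """Attach a giving level and a routing decision to a donor record."""
    for minimum, level, team in GIVING_LEVELS:
        if donor.total_giving >= minimum:
            return {
                "donor_id": donor.donor_id,
                "giving_level": level,
                "routed_to": team,
            }
    # Only reachable when total_giving is negative.
    raise ValueError("total_giving must be non-negative")
```

Keeping the rubric in a data table rather than in nested conditionals is one way to let the business rules change without rewriting the transformation itself.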
10:48 Tobias Macey
Another element of nonprofits that is less pronounced in for-profit institutions is the need to very closely align the work that you're doing with the value that it's going to produce, to ensure that you're not wasting cycles on something that might be technically elegant and useful, but isn't necessarily going to give the immediate impact that's necessary to ensure that you're meeting the mission and the specific financial needs of the organization that you're working with. And I'm curious how that manifests in terms of the ways that you approach the technical and design decisions of your work, and any of the aspects of the build versus buy dichotomy in terms of how you're building things out?
11:30 Tyler Colby
Build versus buy is a constant question, and it comes across daily. You know, I was on a call even this morning, and we were talking about specifically doing roll-ups within our instance and starting to look at more time series and snapshot data. And, you know, there's a lot of different ways that we could approach a situation, and we do have Tableau on site currently, but we were talking through just where do we store this data? Where does it move out to? And do we need something specific to solve that one specific issue? So, you know, as giving level changes, or as the primary team assigned changes, getting a grasp on what that population is that falls into that giving level, how much funding, how many activities, is a constant struggle. Again, at the NRDC, we have a lot of different options. And you know, we have a lot of different tools at our disposal in order to decide, you know, is this a build or a buy? Typically, we do end up on the buy, you know, more buy the best solution, and, you know, we can stand up and build if we need to. But with smaller nonprofits, it does become more of a challenge. So, you know, in my work consulting, a lot of times it would end up being, you know, a design limit. You'd end up designing around it, and it's less around the buy and more what can we do as far as almost like a minimum viable product just to make things work?
12:56 Tobias Macey
And so you mentioned that in your current state you have Salesforce and Redshift as the primary data sources that you're working with, and you have multiple other sources of information that feed into those. And you also mentioned that about four years ago is when this transformation started at NRDC. I'm curious what you saw to be the state of the ecosystem at the time, and any challenges that you see in terms of the available tooling and available systems as it pertains to the needs of nonprofit organizations, where a lot of these platforms were built out either from industry, with the needs of suiting the for-profit spectrum, or in academia, with a larger research bent, and then repurposed for use in industry. And so given that there are a lot of different influences going into the tools that are available, how well do you find that they meet the specific needs of nonprofits, and how did you approach the overall effort of navigating the landscape of what was available to determine what would best fit your needs at the NRDC?
14:03 Tyler Colby
Absolutely. There are a lot of vendors in the nonprofit space that really tailor their solutions and their platforms for nonprofits. So it was a long and lengthy decision before we decided on Salesforce. And when we looked at the entire, you know, landscape of possibilities and where we could go, there's a few main players within the nonprofit space, specifically when we look at something like Salesforce to lead our fundraising and development. So, you know, when we looked at competitors like EveryAction, and Blackbaud has a number of solutions, really, it came down to the ease with which we could implement. So, you know, looking at the partner network and looking at the robust support that we could get from Salesforce, and then also looking at, you know, what could we do off of this platform. So as we speak, we are talking in the middle of the COVID-19 crisis and self-isolating, and one of the projects that came down across all of our platforms is making sure that these platforms are mobile ready, so that as people are making the shift from working in an office to working from home, we don't have to go through and stand up these systems to make them ready wherever somebody is. And what's great about at least Salesforce as a platform is all of that was ready for us from day one. So we didn't have to go through and do a bunch of system configuration or, you know, even adjusting our page layouts for it. It was all ready for our users from day one, with all the security and everything else already built in. So having that box checked on day one makes crisis planning that much easier. And as we look through, you know, the rest of the other decisions, the time to get somebody from walking into Salesforce as just an admin to being proficient with the system is just a matter of years sometimes versus, you know, on some systems a matter of decades, including long periods of college training.
So Salesforce really allowed us to just move a lot faster as far as getting our system ready for users. As we start to look towards our data side, it's a very similar decision to why we chose Redshift. Redshift has allowed us to very quickly, you know, pull a lot of data in, and we're ingesting data from all of our systems into Redshift and really using that as our central hub to do a lot of the more ELT work, so that we are combining these things. We just had a significant release solving one of our main issues, which was identity resolution. So I think I mentioned this a little bit before, but, you know, who is who across each system. It's this problem that I believe has a lot of solutions on the for-profit side, but not a lot that has been done on the nonprofit side, until just very recently. Simply, we ingest Salesforce, ActionKit, Mobile Commons, and a few other data sources, and we resolve them all down to a single ID, which we call the NRDC ID. And we just had, again, a very significant release that allows us to do this, and we'll be building off of this. So our next big challenge is making sure that communication preferences are unified across all systems. So really making sure that we honor the data privacy legislation that's coming out, so GDPR and CCPA, across all these systems, and also making sure that our constituents are being contacted the way they want to be contacted. So, you know, because we are a large environmental nonprofit, some people don't like to receive direct mail from us, and they only want to receive email communications. So making sure that that message gets across all systems, so that we're not accidentally sending mail and, you know, upsetting donors as well, is something that we have to keep in mind. So, you know, some of that tooling and some of that work we are doing and building on, but it isn't readily available, and there are no out-of-the-box solutions for it today.
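To make the idea of resolving records from several systems down to a single ID more concrete, here is a minimal Python sketch. It is not the NRDC's implementation, which the conversation does not describe in detail. It assumes, purely for illustration, that records match when they share a normalized email or phone number, and it uses a union-find structure to group them. The field names and the `UID-` prefix are hypothetical.

```python
def normalize_email(email):
    return email.strip().lower()


def normalize_phone(phone):
    # Keep digits only so "555-0100" and "(555) 0100" compare equal.
    return "".join(ch for ch in phone if ch.isdigit())


def resolve_identities(records):
    """Map each (system, record_id) to one unified ID.

    records: iterable of dicts with 'system' and 'record_id', plus
    optional 'email' and 'phone'. Records sharing a normalized email
    or phone (directly or transitively) receive the same ID.
    """
    parent = {}

    def find(key):
        # Find the group representative, compressing the path as we go.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # match key -> first record that carried it
    for rec in records:
        key = (rec["system"], rec["record_id"])
        parent.setdefault(key, key)
        match_keys = []
        if rec.get("email"):
            match_keys.append(("email", normalize_email(rec["email"])))
        if rec.get("phone"):
            match_keys.append(("phone", normalize_phone(rec["phone"])))
        for match_key in match_keys:
            if match_key in first_seen:
                union(first_seen[match_key], key)
            else:
                first_seen[match_key] = key

    unified_ids = {}
    result = {}
    for key in parent:
        root = find(key)
        if root not in unified_ids:
            unified_ids[root] = f"UID-{len(unified_ids) + 1:06d}"
        result[key] = unified_ids[root]
    return result
```

Exact matching on email and phone is the simplest possible rule. Production identity resolution usually adds fuzzy name and address matching plus review of ambiguous merges, which this sketch leaves out.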
18:04 Tobias Macey
Going further on the subject of regulations and compliance, and as you mentioned, some of the less formally stated but still important aspects of user privacy, such as, you know, not accidentally exposing somebody's affiliation or support for some particular organization. What are some of the data challenges that exist in the nonprofit space for being able to comply with any of this increased scrutiny from regulations or different compliance regimes and things like that? And what are some of the strategies that you have found to be useful for approaching that?
18:43 Tyler Colby
You know, if you asked me that question six months ago, I wouldn't have had a good answer. The Salesforce platform has pushed out a number of updates, and we are able to tag each specific piece of data within Salesforce as far as what regulation it may fall under. So whether it's just tagging it as PII and making sure that it's masked in any system that's not Salesforce, or on any export, all the way up to tagging it for GDPR or any of the right-to-be-forgotten legislation, we're able to do that on the metadata level inside of Salesforce now, which has been a great help. Because as we started to work through the architecting of that solution, and how do we actually pattern this out, it would have been built on top of our identity resolution platform with custom rules. So we had started to build this on our own, but Salesforce did come up with a significant release which allows us to tag it there and then start to handle a lot of that more on the Salesforce side. Since most of our PII and constituent data is housed inside of Salesforce today, making sure that that is our main hub for most of those decisions, and tracking which of that data falls into that, is, you know, again, a great save of time. Moving beyond today, though, and, you know, where we go: we have gone through our system and tagged all of the field-level data with those metadata options, and we are moving that data now to our Amazon Redshift and starting to pattern that out using that NRDC ID. So making sure that each of these systems has the same column metadata as we do in Salesforce is our current effort, just to make sure that it's marked across the system. And then while that data is being ingested into Redshift, we'll make sure that it follows those same masking and identification rules, as well as delete policies. And that ripples even into our backups as well.
And making sure that if we do receive a delete request, that's being passed through to all of our systems and our backups as well. So currently, that effort is manual. We have not received many. I think in the last six months, I've only received two delete requests. But as these right-to-be-forgotten legislations become more prevalent, I'm assuming that there will be an uptick, especially as we see, you know, states adopt more, and most likely at some point a US federal legislation will come. We'll likely see that uptick.
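The field-level tagging and masking described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Salesforce's metadata API or the NRDC's pipeline: the tag names, the field names, and the choice of a truncated SHA-256 digest as the mask are all hypothetical.

```python
import hashlib

# Hypothetical field-level metadata: field name -> set of compliance tags.
FIELD_TAGS = {
    "email": {"PII", "GDPR"},
    "mailing_address": {"PII"},
    "giving_level": set(),
}


def mask_value(value):
    """Replace a value with a stable, non-reversible token."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return f"masked:{digest[:12]}"


def mask_record(record, field_tags=FIELD_TAGS, masked_tags=frozenset({"PII"})):
    """Return a copy of record that is safe to export to another system.

    Fields carrying any tag in masked_tags are masked. Fields with no
    metadata entry at all are also masked, as a conservative default.
    """
    exported = {}
    for field, value in record.items():
        tags = field_tags.get(field)
        if tags is None or tags & masked_tags:
            exported[field] = mask_value(value)
        else:
            exported[field] = value
    return exported
```

Because the mask is deterministic, the same email produces the same token in every export, so masked data sets can still be joined on that column. Note that an unsalted hash of a low-entropy value can be guessed by brute force, so a real system would likely want a keyed hash or a tokenization service instead.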
21:16 Tobias Macey
Digging further into Salesforce, I'm tangentially familiar with it as a CRM and sales platform, particularly from when it was first introduced several years ago. For anybody who isn't familiar with it, can you talk through a bit of the elements that are useful in it, and some of the overall workflow that's involved in being able to take advantage of Salesforce, particularly as a data professional?
21:40 Tyler Colby
So the reason that we like to use Salesforce, and what I've seen with Salesforce over the years, is a lot of work and focus being placed into the ability to export data and the ability to work with the data within their system. So a lot of work has been put into their Bulk API and the ability to move records in and out of Salesforce quickly, in the tens of thousands or sometimes even hundreds of thousands of records within a batch. They also have their Streaming API, which has enabled a lot more of those rapid, real-time changes, as well as Platform Events and Change Data Capture, which allow more of the modern integration design patterns. So a lot of work has been put into the platform from a data side that allows all of the new design patterns, and we're using a number of those different APIs today. So when we need to move records around in bulk, we're able to get records in and out very, very quickly, especially for processes that don't require a lot of those complex changes. So, you know, whether it's an update across millions of records, or taking data out to do roll-ups in an outside system, all of that is done extremely quickly. When we do get to some of those more complex data patterns, so, you know, things that do require multiple levels of transformation or a lot of rules within the transformation, we do need to look outside. We could put this inside of Salesforce. Their coding platform is very similar to Java. It's called Apex. And we can put those transformations into, you know, a trigger and allow things to work there. But a lot of times, we're able to handle this off platform and then push the data. And again, every nonprofit is going to be a little bit different when they're working with Salesforce.
So some will do all of those transformations right on platform if they're dealing with a lower data volume, but because of our data volume, and because of all the complexity in our system, a lot of times we'll do those transformations off platform and then use those different APIs to push everything in. But one of the key things with Salesforce is, again, that ease of adoption. It can be extremely difficult for a systems implementation to move from any legacy system over to Salesforce, but what I've found in my career is that, you know, when you talk to people about how they used to work with these older systems, so, you know, Oracle or Microsoft, those systems implementations were years long, where many times with Salesforce it's a months-long process. So it just takes that time down, because so many of the decisions have already been made for you.
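The pattern described here, transforming records off platform and then pushing them in large batches, can be sketched as follows in Python. The `send_batch` callable is a stand-in supplied by the caller and is not a real Salesforce Bulk API call. The `transform` function and the batch size of 10,000 are likewise assumptions for illustration.

```python
from itertools import islice


def batched(records, batch_size=10_000):
    """Yield lists of up to batch_size records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def transform(record):
    # Hypothetical off-platform transformation: normalize the email field.
    return {**record, "email": record["email"].strip().lower()}


def push_in_batches(records, send_batch, batch_size=10_000):
    """Transform records lazily and hand them to send_batch in chunks.

    send_batch is whatever function actually uploads one batch to the
    target system. Returns the total number of records sent.
    """
    sent = 0
    transformed = (transform(record) for record in records)
    for batch in batched(transformed, batch_size):
        send_batch(batch)
        sent += len(batch)
    return sent
```

Since both the transformation and the batching are lazy, only one batch is held in memory at a time, which matters once the source is in the millions of records.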
24:22 Tobias Macey
One of the roles that I've seen is a Salesforce architect, which points to the level of flexibility and customization that you mentioned. I'm wondering what you have found to be some of the common stumbling blocks or the innate complexities of the platform that users should be aware of as they're starting to either modify their existing implementation or onboard an organization onto Salesforce.
24:49 Tyler Colby
I think Salesforce, like any other platform, can really suffer from just technical debt. And that's because Salesforce has implemented so many tools that are very friendly for admins and very declarative. So instead of, you know, requiring a CS degree to make a process or make a trigger or make something scalable, they have put a lot of tools into the hands of admins. So point-and-click tools where you can stand up workflow rules or triggers, and a lot of these data processes that are, you know, seemingly very complex and would require code in other systems. The problem with that is when you get to, you know, pages upon pages of automation that have been put into place by an admin without oversight or without long-term thinking. So, hey, this business rule came up, and I made a workflow rule, and then it's in the system. Or they have Process Builder and Flow and all of these declarative tools that they've put into people's tool belts. And that's great, and that's, you know, really fantastic for, again, that ease of adoption. But when you start to look at year five or year ten on Salesforce, and when I would work with organizations that are in year five or year ten, I would see, you know, extremely slow system performance. I would even see, you know, sometimes an inability to use specific portions of the system or get access to records because so much automation had been put into place. So when we do look at system architects or technical architects or, you know, engaging with a consultant, a lot of times our first work with an existing implementation and an existing org is starting to take a look through what is already in place, and where are the pain points within the system? Where are things going slowly? As we look to, you know, organizations that are functioning and, you know, working well with an architect today, a lot of that work starts to become around, well, what can we do to make sure that page load times are working correctly?
So on the front end, making sure that things look correct and that these systems are architected for large data volumes. So again, if we build a Salesforce system with not a lot of automation in place and make it a more simple system, a lot of those design decisions don't have to be taken at the architect level. They could be taken at the admin level. But if we're starting to look across millions of records, we need to make sure that, you know, anything that goes into the system is well architected and has a design plan that makes sense. Little design decisions can make a big difference in things like record locking, user performance, ability to open records, and ability to search for the correct records, and all of that bubbles up and really falls into that role of an architect or a trusted consulting partner.
27:37 Tobias Macey
In addition to your current role at the NRDC, you have also taken it upon yourself to spin up a new nonprofit organization in light of the current global crisis that we're going through with the COVID-19 virus. I'm wondering if you can describe a bit of the nature of that mission and the organization that you're building up around it, and some of the goals that you have for that organization and how you're hoping to make an impact on the current state of affairs.
28:05 Tyler Colby
Absolutely. Yeah. So I was sitting around, as we record, about two weeks ago now, and just looking at kind of the state of affairs and the state of the crisis and where experts were saying it was going to trend to. And having been in the nonprofit vertical for so long, I knew that, you know, it was not only the virus that was going to have an impact, and, you know, the public health crisis. As we look to all of the other impacts of this crisis as it unfolds, the market crash is a very big impact as well, because usually the first thing to dry up during this time is funding to nonprofits. And it's these community-facing organizations that are really going to fill in those critical infrastructure gaps that are left as we go. And so making sure that our food banks, our animal shelters, our homeless populations, and so many other key areas of our society that are already served by these organizations still have a path to receive these services in this new and changing way that we interact with each other was really a focus. So when I took a look at, you know, all the things that are really unique and special about the Salesforce nonprofit community, one of the main things that we do is called a community sprint, and the Salesforce nonprofit team, called Salesforce.org, has been holding these community sprints for a number of years. Typically these are in-person, two-day events where a whole bunch of professionals, I think the last one that I was at had somewhere around 250 professionals, all fly in from around the country. We sit in a conference room, we break apart into small groups, similar to a hackathon, and we just work on issues, and then we donate all that code back to the community and back to Salesforce's nonprofit platform, which is called the Nonprofit Success Pack. So all of that is open and available to any nonprofit that's on the Salesforce platform.
So really, what I was looking to do was say, hey, we need to have a sprint, and we need to have multiple sprints that can be done virtually, so that we can start to bring these technology professionals together and start to give our time back to these community organizations during this time. So instead of, you know, taking these off the calendar, we need to ramp up our efforts at this time. But more than that, it's, you know, not just about the Salesforce community, and it's not just about, you know, what we can do on this side of the fence. It's really a call to all technology professionals to stop and say, you know, during this time of self-isolation, instead of just clearing out the Netflix queue, or, you know, getting to those video games that you're looking to spend a little bit of extra time on, or whatever it is that you're doing to, you know, kill all this extra time, turn your focus to the community. Who do you know in the community that could use a little bit of help? So whether that's helping somebody set up Zoom in an afternoon. You know, I've had multiple conversations where we're just setting up Zoom so that people can meet face to face and have a conference call. These are things that we take for granted on a daily basis. A lot of these technologies, and just allowing that business continuity, can mean the difference between the life and death of some of these organizations. So, you know, turn the focus around. Even if it's not specifically within the confines of, you know, what we're doing within the sprints, just take a look at the community and see what you can do, and don't take the technology background that you have for granted. Make sure that you're using it in this time. That is really our message.
31:33 Tobias Macey
One of the challenges with that type of approach, I imagine, is being able to scalably identify the specific needs of different organizations and then do some matchmaking with people who have an appropriate skill set to help fulfill that need. I'm wondering what your approach is for handling that matchmaking. And particularly since this is the Data Engineering Podcast, what do you see as some of the unique data challenges posed by this situation, and some of the needs for data professionals that might exist in the organizations and communities that you are currently working with?
32:12 Tyler Colby
Absolutely. That's a great question. When we take a look at things like case intake and volunteer skill matching, that's an issue that the nonprofit community has been focused on for a very long time. If you look at organizations like the United Way, or other organizations that do massive amounts of volunteerism and really rely on the community to come in and lend a helping hand, those are data challenges that we've been working on for years. So we have a lot of different case intake platforms and a lot of different case intake processes. Then there's matching those cases to the correct volunteers, especially in a pro bono environment: making sure that they're matched to vetted people with the right expertise, and having a QA person or a QA team that sits on top of that and can veto that decision as well. So it's not just people picking up a project, but really making sure that we have a QA team from the very get-go looking at who is being assigned and tasked to this initiative. What we're creating is a hub app that sits at the center of a lot of different groups. This hub would be responsible mostly for the complex issues that can come out of this unfolding crisis. So if it's a food bank, we can take a look at designing a solution that links to the popular on-site platforms that are currently in use. Microsoft has most of the food bank ecosystem currently, with a platform called Microsoft Ceres. So it could be developing an integration between the two that works out of the box and allows these custom designs to come over to Salesforce. Maybe it's inventory tracking, and with very little implementation, just click-through, we can start to move that inventory over to Salesforce and into the Salesforce world.
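[Editor's note: the intake, skill-matching, and QA-veto flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the hub app's actual logic: the `Volunteer` and `Case` classes, the skill-overlap score, and the `qa_approves` callback are all hypothetical names introduced here.]

```python
from dataclasses import dataclass


@dataclass
class Volunteer:
    name: str
    skills: set[str]
    hours_available: int


@dataclass
class Case:
    org: str
    needed_skills: set[str]
    hours_needed: int


def rank_volunteers(case, volunteers):
    """Rank volunteers by the fraction of the case's needed skills they cover."""
    if not case.needed_skills:
        return []
    ranked = []
    for volunteer in volunteers:
        if volunteer.hours_available < case.hours_needed:
            continue  # respect each volunteer's stated capacity
        overlap = len(case.needed_skills & volunteer.skills)
        if overlap == 0:
            continue  # no relevant expertise for this case
        ranked.append((volunteer, overlap / len(case.needed_skills)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def propose_assignment(case, volunteers, qa_approves):
    """Return the best-ranked volunteer that the QA reviewer does not veto."""
    for volunteer, score in rank_volunteers(case, volunteers):
        if qa_approves(case, volunteer, score):
            return volunteer
    return None  # nobody suitable; the case stays in the intake queue
```

[The `qa_approves` callback stands in for the human QA team: the ranking only proposes, and a reviewer can reject any proposal before it becomes an assignment.]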
The reason I bring up that example is that after Hurricane Harvey, I was in Houston and worked with the Houston Food Bank. One of the initiatives we did was starting to migrate their data from Microsoft Ceres over to Salesforce, and they were keeping their Microsoft Ceres instance. So this was a close-to-real-time integration; we ended up with a batch interval of about 15 minutes between the two so that we could keep them in sync. What that allowed them to do was then use Salesforce's power to scale out. They were able to go from a centralized distribution model to a more remote distribution model as well. So instead of just distributing from a single massive warehouse, they were able to set up different distribution sites and know exactly what food was needed at each site and who they were distributing to at each site. It just allowed them to scale a lot. They went up 5x on their distribution within the first few months after Hurricane Harvey hit. Without a technology that allows you to go off platform and do that, that transformation never would have happened. Again, this is on the complex side of all the issues. But a solution like that is one that a lot of data professionals can start to bring to the table, by saying: hey, if we need to make a very large-scale transformation, is this something that I can engage with so that people can migrate their data to a new platform in order to scale? Or is this something where we can build an integration between these two platforms and allow this rapid transformation of scale? It doesn't just have to be food banks. It can be a lot of different community organizations.
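[Editor's note: a common way to build the kind of 15-minute batch sync described above is an incremental load driven by a "last modified" watermark. The sketch below is an assumption about the general pattern, not the Houston Food Bank implementation: the row fields (`sku`, `quantity`, `site`, `modified_at`) are hypothetical, and a plain dictionary stands in for the target system.]

```python
from datetime import datetime, timedelta

BATCH_INTERVAL = timedelta(minutes=15)  # batch window mentioned in the interview


def next_run(last_run):
    """Time at which the next batch should start."""
    return last_run + BATCH_INTERVAL


def changed_since(source_rows, watermark):
    """Select source rows modified after the last successful sync."""
    return [row for row in source_rows if row["modified_at"] > watermark]


def sync_batch(source_rows, target, watermark):
    """Upsert changed rows into the target keyed by SKU; return the new watermark."""
    changed = changed_since(source_rows, watermark)
    for row in changed:
        # Upsert: insert new SKUs and overwrite existing ones.
        target[row["sku"]] = {"quantity": row["quantity"], "site": row["site"]}
    if not changed:
        return watermark  # nothing new; keep the old watermark
    return max(row["modified_at"] for row in changed)
```

[Because each run only picks up rows newer than the stored watermark, rerunning a batch with no new changes leaves the target untouched, which keeps the two systems in sync without reloading everything.]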
35:48 Tobias Macey
And in terms of the challenges of the situation above and beyond what you have seen in your career with nonprofits, what are some of the things that stand out to you that you are currently trying to find solutions for, and what support would be most valuable to you as you continue on this path and as the crisis continues somewhat unabated?
36:15 Tyler Colby
That's a really good question. And honestly, I've been trying to find the balance for my personal life as well, because as this crisis unfolds, I think it's impacting everybody in a different emotional way, outside of just technology systems. So I've been trying to build this really as a grassroots effort, allowing different leaders to speak and to step up and take action within my own initiative, and making sure that when there's a better solution, I get out of the way. Because I think as this unfolds, it will impact us all in a number of ways. I know personally, I've already had multiple phone conversations that, honestly, in a typical week or in typical times, I've just never had, whether it's people being laid off, or quarantine plans, or making sure that coworkers and family are safe and taking the appropriate precautions. So there's a lot as this unfolds, and it will impact a lot of us on a personal level. So one of the things I'll say is that this is, again, a pro bono effort. I urge all technology professionals to ask, what is my capacity to give? Don't give too much. Don't overpromise and say that you're going to give everything; see what you can give. If it's a couple of hours on a weekend when you can set up a Zoom session and offer your services, that can sometimes mean the difference between, again, business continuity, with your community organization being able to stay alive and fill that critical infrastructure need, and possibly not. But take a look at your own capacity first, before you decide to give.
38:00 Tobias Macey
And going back to the point of tools and platforms that are available off the shelf, either open source, where you can build to fit, or hosted platforms, what have you found to be some of the most useful or beneficial in the current landscape of data management systems and best practices? And what are the areas that you feel need to be addressed or improved, particularly for workers in the nonprofit sector?
38:27 Tyler Colby
It's a great question. We've had a lot of success with a number of different ETL platforms and a number of different solutions. Specific to Salesforce, there's a platform called Validity. They offer a number of different tools specifically around data management, whether it's merging records, identifying duplicates within your data set, or just keeping it clean on an automated basis. It's very similar to the old Google Refine (I believe they switched to OpenRefine a few years ago), so similar in functionality, but very robust specifically for Salesforce. We've been working with their platform for a long time, specifically around keeping our data clean on the Salesforce platform. And within our entire data architecture, Salesforce is where most of our data comes in. When we look at the rest of our data ecosystem, Amazon Redshift has allowed us to move at lightning speed compared to a lot of the other platforms that we had been considering, and especially compared to if we had tried to build this ourselves and host it internally on a SQL Server or something like that. The ability to ingest a ton of data, the compression around that, and the ability to manipulate that data inside and then push it out to the respective systems was something in my career that I didn't know I was missing until I had it. I wish that I had worked with it a lot sooner. Besides that, there are some off-the-shelf tools that I've worked with at a number of nonprofits that are very helpful. Jitterbit has mixed reviews within some of the community, but I've had great success with a number of nonprofits using it as an ETL platform.
And even as we go to the micro side, there's a great platform called Skyvia, S-K-Y-V-I-A, which offers some limited functionality. But when we just need some bare-bones batch functionality (typically the design pattern is moving a CSV from an FTP server over to Salesforce), it allows even an admin to come in and stand up a lot of that without a lot of cost overhead. So the technology has come down to where it is accessible to a lot of these nonprofits. And that has been a great help in making sure that this data is moving from system to system, or from place to place, no matter how complex it is.
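[Editor's note: the "CSV from an FTP server over to Salesforce" pattern mentioned above usually comes down to three steps: parse the file, rename columns to the target's field names, and send the records in small batches. The sketch below covers only the parse, map, and batch steps with the Python standard library. The column names, the `FIELD_MAP` mapping, and the batch size of 200 are assumptions for illustration; the FTP download and the actual load call are deliberately left out.]

```python
import csv
import io

# Hypothetical mapping from CSV headers to Salesforce-style field names.
FIELD_MAP = {"first_name": "FirstName", "last_name": "LastName", "email": "Email"}


def csv_to_records(csv_text, field_map=FIELD_MAP):
    """Parse CSV text and rename columns, skipping rows without an email."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {
            target: (row.get(source) or "").strip()
            for source, target in field_map.items()
        }
        if record.get("Email"):
            records.append(record)
    return records


def batches(records, size=200):
    """Yield fixed-size chunks so each load request stays small."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

[Dropping rows with no email is just one example of a validation rule; a real load would apply whatever rules the receiving organization needs before anything is written to the target.]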
40:59 Tobias Macey
Are there any other aspects of your work at the NRDC or the Tech Workers Task Force, or just your overall experience of working in nonprofits as a data professional, that we didn't discuss that you'd like to cover before we close out the show?
41:12 Tyler Colby
I think this is a good overview of those. I would just mention again that if you are looking at the Tech Worker Task Force, it is meant mostly for Salesforce professionals, so we could use more data professionals in there. But really, the focus is again on turning back to your community, reaching out to either local user groups or just local organizations, and asking, what help do you need at this time? Because you'll be shocked at how often it's a very simple issue that may not be 100% what you're doing today. It could be setting up a Slack community in an afternoon so that discussions can continue. So very easy things can happen, and very easy solutions can have a very large impact at this time.
42:02 Tobias Macey
Well, for anybody who wants to follow along with you, get in touch, or offer their help, I'll have you add your preferred contact information to the show notes. And as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
42:17 Tyler Colby
So I'll answer this specifically for nonprofits. Really, the biggest gap I see is just a standard, unified tool for nonprofits that is easy to use across all these different platforms. Right now there are some platforms, such as Workato and, again, Skyvia, which I mentioned before, that allow a more developer, or excuse me, a more admin experience: a more declarative, point-and-click integration experience. But on one side, you're hit by limitations with Skyvia. Because of its low cost, it doesn't have a very robust feature set. And on the other end, with Workato, a lot of nonprofits get hit by the price. So it's kind of a hard sell between the two. Finding a nice middle ground where we can start to automate some of these data platforms and data flows for smaller nonprofits and smaller data sets has just been difficult, especially given that they don't have the funding to hire an on-site developer and house the staff. So I would like to see some tooling that falls into that middle ground: easy to use and easy to stand up for an admin, but not hitting the budget in a very harsh way.
43:27 Tobias Macey
Well, thank you very much for taking the time today to share your experiences working with Salesforce and in the nonprofit community. It's definitely an interesting and valuable area of effort. So thank you for all your time and effort on that front and I hope you enjoy the rest of your day.
43:42 Tyler Colby
Thank you, Tobias. It's a pleasure being here.
43:49 Tobias Macey
Thank you for listening! Don't forget to check out our other show, Podcast.__init__, at pythonpodcast.com to learn about the Python language, its community, and the innovative ways it is being used. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it! Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.