What would the Django of data pipelines look like?

Room B

About this session

This talk introduces the phaser open-source library and teaches principles that can help you organize data transformations and data pipelines. It's hard to carve out time to build your own utilities for a data pipeline, and it's hard to even know which patterns are effective for modularizing once code gets complicated. Let's talk about why this investment is important and how it could be less costly.

Topics will include

  • Organizing data pipelines into more phases than just ETL, and why
  • Using checkpoints and logging to debug pipeline breakdowns after they occur
  • Making data transformation code testable and maintainable
  • Supporting a team or rotating contributors to data pipeline code
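
To make the topics above concrete, here is a minimal sketch in plain Python of a pipeline split into small named phases, with a log line and a saved checkpoint after each one. It does not use the phaser API; the function names (`drop_blank_names`, `normalize_names`, `run_pipeline`), the sample rows, and the checkpoint file naming are all hypothetical and chosen only for illustration.

```python
import json
import logging
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def drop_blank_names(rows):
    """Phase 1 (hypothetical): discard rows that have no usable name."""
    return [r for r in rows if r.get("name", "").strip()]


def normalize_names(rows):
    """Phase 2 (hypothetical): trim whitespace and title-case each name."""
    return [{**r, "name": r["name"].strip().title()} for r in rows]


def run_pipeline(rows, phases, checkpoint_dir):
    """Run phases in order, logging row counts and saving each phase's output."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    for index, phase in enumerate(phases, start=1):
        before = len(rows)
        rows = phase(rows)
        log.info("%s: %d rows in, %d rows out", phase.__name__, before, len(rows))
        # The checkpoint lets someone inspect intermediate data after a failure.
        path = checkpoint_dir / f"{index:02d}_{phase.__name__}.json"
        path.write_text(json.dumps(rows, indent=2))
    return rows


raw = [{"name": "  ada lovelace "}, {"name": ""}, {"name": "grace hopper"}]
with tempfile.TemporaryDirectory() as tmp:
    result = run_pipeline(raw, [drop_blank_names, normalize_names], tmp)
    checkpoints = sorted(p.name for p in Path(tmp).iterdir())
```

Because each phase is an ordinary function from rows to rows, it can be unit-tested in isolation, and a new contributor can read the list of phases to see what the pipeline does.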

Presented by

  • Lisa Dusseault

    Lisa Dusseault is the CTO of the non-profit Data Transfer Initiative, supporting consumer data portability across tech platforms. With a dual career in standards and startups, she brings both idealism and pragmatism. On the startup side, Lisa was CTO of Compaas and ShareTheVisit and VPEng of Klutch. On the standards side, she co-authored CalDAV, updated WebDAV, was chair of the XMPP and IMAPExt working groups, and spent four years as Area Director shepherding new Applications area work at the IETF.