August 16-18, 2015

San Francisco, CA

A new conference for data engineers, data scientists, developers, and data managers who use the Scala language to build big data pipelines. Brought to you by the organizers of Text By the Bay 2015 and Scala By The Bay 2015.

Conference News

The Conference starts Thursday 8/13!

The first day of Scala By the Bay is FinagleCon, held at Twitter in San Francisco on Thursday.

Scala By the Bay, Friday-Saturday, and Big Data Scala, Monday-Tuesday, are both held at Kaiser Center in Oakland.

And the Complete Pipeline Training (long sold out) is held at Galvanize in San Francisco on Sunday.

Ask Martin Anything!

We are crowd-sourcing closing panel questions for both conferences.

Scala By the Bay: Ask Martin Odersky and the SBTB Panel Anything!

Big Data Scala: Ask Big Data Scala CEOs and CTOs Anything!

The authors of the best questions for each panel will receive a special prize.

You can still register! Buy Tickets

Scala: Connecting the Dots in Big Data

There are two key themes for Big Data Scala:

  • Connecting the dots in Big Data: Scala is the pipeline that runs APIs, pumps data through Kafka into Spark, and mines actionable insights. End-to-end backends with analytics can be built entirely on open-source Scala stacks. We’ll show you how, through best practices and the teams who built them.
  • Big Data science: If you want a cluster, not just a single node, backing your prompt, with real-time responses and all the data held in the cluster’s overall memory, you should use Scala today. Leverage open-source libraries in Scala and Java for superior performance, safety, and natural Hadoop integration.

Scala is seeing a meteoric rise in developer adoption, taking over API services at Twitter and many other companies with Play, Akka, and Spray. Scala is the predominant functional programming language used in industry; it runs on the Java platform and is 100% interoperable with Java. Scala provides concise and readable abstractions, proven to express algorithms and data science workflows much more easily and flexibly than specialized languages like Pig, while outperforming scripting languages such as Python or Ruby by orders of magnitude.

The use of Scala for Big Data processing has skyrocketed in the last few years due to the ascendance of Apache Spark, the emerging in-memory computing platform. From data cleaning to JSON schema inference to Spark SQL to Spark Streaming to the Databricks Cloud UI, it promises a complete end-to-end Scala backend with analytics. Kafka, developed at LinkedIn, is rapidly gaining ground as an enterprise message bus and stream processing engine. Tranquility and Druid provide real-time analytics at web scale for the ad space. These are just a few of the topics recently covered at Scala By the Bay and SF Scala.

Keynote Speakers

Martin Odersky

Martin (@odersky) is a German computer scientist and professor of programming methods at EPFL in Switzerland. He specializes in code analysis and programming languages. He designed the Scala programming language and Generic Java, and built the current generation of javac, the Java compiler. In 2007 he was inducted as a Fellow of the Association for Computing Machinery. In 1989, he received his Ph.D. from ETH Zurich under the supervision of Niklaus Wirth, who is best known as the designer of several programming languages (including Pascal). He did postdoctoral work at IBM and Yale. In 2011, Martin founded Typesafe Inc., a company to support and promote Scala, and he currently serves as the chairman and chief architect. Martin teaches two courses on the massive open online course provider, Coursera, namely Functional Programming Principles in Scala and Principles of Reactive Programming.

Mike Olson

Mike (@mikeolson) co-founded Cloudera in 2008 and served as its CEO until 2013, when he took on his current role of chief strategy officer (CSO). As CSO, Mike is responsible for Cloudera’s product strategy, open source leadership, engineering alignment and direct engagement with customers. Prior to Cloudera, Mike was CEO of Sleepycat Software, makers of Berkeley DB, the open source embedded database engine. Mike spent two years at Oracle Corporation as vice president for Embedded Technologies after Oracle’s acquisition of Sleepycat in 2006. Prior to joining Sleepycat, Mike held technical and business positions at database vendors Britton Lee, Illustra Information Technologies and Informix Software. Mike has a Bachelor’s and a Master’s Degree in Computer Science from the University of California, Berkeley.

Debora Donato

Debora (@donabeb) is Sr. Director of Personalization and Principal Data Scientist at StumbleUpon. Before moving to StumbleUpon, Debora was Senior Scientist at Yahoo! Labs. Her research interests include User Behavior Analysis, Recommendation Systems, Web Information Retrieval, Link Analysis, Algorithms for the Characterization of the Web, Complex Networks and Social Networks.

Jay Kreps

Jay (@jaykreps) is co-founder and CEO at Confluent. Prior to Confluent, Jay was the initial developer on several open source projects, including Apache Kafka, Apache Samza, and Voldemort. He was the lead architect for data infrastructure at LinkedIn.

Matei Zaharia

Matei (@matei_zaharia) is an assistant professor at MIT and CTO of Databricks, the company commercializing Apache Spark. He started Spark as a research project at UC Berkeley and has been involved in the big data community since 2007, through projects including Hadoop, Mesos and Shark.

Big Data Scala follows Scala By the Bay

The key differences between Scala By the Bay (SBTB) and Big Data Scala (BDS) are as follows:

  • SBTB is the classic Scala conference, held for the 3rd year in a row
  • SBTB talks cover the full spectrum of Scala engineering, including key FP principles
  • BDS is a new conference where the Big Data community at large learns about the Scala advantage
  • BDS will have a significant Scala newbie attendance, especially data scientists
  • SBTB will cover real-time API scalability and reactive concerns to the degree that BDS will not
  • BDS will cover Hadoop integrations and analytics to the degree that SBTB will not
  • The two have separate CFPs, registrations, and training sessions, but share the Complete Pipeline Training

Scala By the Bay and Big Data Scala share the following:

  • Both are held at the same venue, Kaiser Center
  • BDS follows SBTB after the common training day
  • Complete Pipeline Training between the conferences is open to attendees of both (but fits half from each). It covers Mesos, Akka, Kafka, and Spark in one day.
  • We offer discounted packages for attending both conferences together

The Agenda

Come to Big Data Scala By the Bay well-rested and ready to meet your fellow Scala developers. We'll have two full days of talks (keynotes, full-length, and lightning) and multiple training tracks, including a Complete Pipeline Training the day before.

Contact Us

Stay informed with the Big Data Scala conference news and event updates.


Conference Tickets

Contact us about student discounts for the conference.

Our Sponsors

Be a supporting member of the world's first end-to-end Big Data pipeline and science conference. We want to hear from you! Contact us for a prospectus and sponsorship agreement, or to talk about how we can help you be a contributing sponsor for the Scala By The Bay family of conferences.