Last fall, we released the Google Patents Public Datasets on BigQuery. These datasets include a collection of publicly accessible, connected database tables that enable empirical analysis of the international patent system. This post is a tutorial on how to use that data, along with Apache Beam, Cloud Dataflow, TensorFlow, and Cloud ML Engine to create a machine learning model to estimate the ‘breadth’ of patent claims. You can find all of the associated code for this post on GitHub. Before diving into how to train that model, let’s first discuss what patent claim breadth is and how we might measure it.