Google BigQuery has been updated with a greater range of query and data types, more flexibility with table structure, and better tools for collaborative analysis.
Google BigQuery lets you run SQL-like queries against very large datasets. It works best for interactive analysis, typically over a small number of very large, append-only tables.
The new improvements start with the addition of Big Join and Big Group Aggregations. The Big Join feature lets you merge data from two large tables with a common key to produce a data set. The Big Join operator cuts out the intermediate data transformation step.
Big Group Aggregations increases the number of distinct values that can be grouped in a result set, so you can run queries on larger subsets of data. As Michael Manoochehri, Developer Programs Engineer, Cloud Platform, points out in the Google Developers Blog:
“Popular web applications produce user activity logs that can grow by billions of rows each week. Dividing users into smaller groups is a key step for analysis. However, each group of users can number in the millions. To handle this for such large volumes, we've enabled Big Group Aggregations.”
Both Big Join and Big Group Aggregations are enabled by adding the EACH modifier to the relevant clause, giving JOIN EACH and GROUP EACH BY.
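As a rough sketch of how the EACH modifier might appear in practice, the Python snippet below assembles two query strings in BigQuery's SQL dialect. The dataset, table, and column names (mydataset.users, mydataset.events, user_id, country, event_type) are hypothetical, and the snippet only builds the query text; submitting it to BigQuery is left out.

```python
# Hypothetical table names, used purely for illustration.
USERS_TABLE = "[mydataset.users]"
EVENTS_TABLE = "[mydataset.events]"


def big_join_query(users_table, events_table):
    """Build a query that joins two large tables on user_id with JOIN EACH."""
    return (
        "SELECT u.user_id, u.country, e.event_type "
        f"FROM {users_table} AS u "
        f"JOIN EACH {events_table} AS e "
        "ON u.user_id = e.user_id"
    )


def big_group_query(events_table):
    """Build a query that groups by a high-cardinality key with GROUP EACH BY."""
    return (
        "SELECT user_id, COUNT(*) AS event_count "
        f"FROM {events_table} "
        "GROUP EACH BY user_id"
    )


if __name__ == "__main__":
    print(big_join_query(USERS_TABLE, EVENTS_TABLE))
    print(big_group_query(EVENTS_TABLE))
```

The only difference from an ordinary join or aggregation is the EACH keyword, which signals that both sides of the join, or the set of distinct group keys, may be very large.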
Another change that’s minor in scope but will save developers a great deal of work is the addition of native support for the TIMESTAMP data type. This means you’ll be able to import date and time values from databases such as MySQL without losing the timezone offset information. There are new functions to convert TIMESTAMP fields into other formats, calculate time intervals, and extract components such as the hour, day of week, and quarter.
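To illustrate the component-extraction functions, here is a minimal sketch that builds a query pulling the hour, day of week, and quarter out of a TIMESTAMP column. It assumes the function names HOUR, DAYOFWEEK, and QUARTER from BigQuery's SQL dialect of this period; the table [mydataset.events] and the column event_time are hypothetical, so check the BigQuery function reference before relying on the exact names.

```python
def timestamp_parts_query(table, column):
    """Build a query that extracts hour, day of week, and quarter
    from a TIMESTAMP column. Function names are assumed, not verified
    against a live BigQuery project."""
    return (
        f"SELECT HOUR({column}) AS hour_of_day, "
        f"DAYOFWEEK({column}) AS day_of_week, "
        f"QUARTER({column}) AS quarter "
        f"FROM {table}"
    )


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(timestamp_parts_query("[mydataset.events]", "event_time"))
```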
The third area of improvement is the ability to add columns to existing BigQuery tables.
Finally, the BigQuery Web UI has been improved. It now shows direct links to individual datasets, which makes it easier to bookmark, share, and quickly return to a dataset. In addition, if you share a dataset with another user using the sharing control panel, BigQuery will send a notification email to the person you’ve shared it with containing a direct link to the dataset.
Once you have signed up to BigQuery, you can test the new features using BigQuery’s set of public datasets free of charge.