
blink_features.usage has null rank column #122

Open
tunetheweb opened this issue Jul 19, 2021 · 3 comments

Comments

@tunetheweb
Member

Since we have this column, can we populate it with the new CrUX ranking? Leaving it null is confusing and makes joins more difficult, because you need an extra join to the summary_pages table to get the ranking.

@rviscomi / @pmeenan I'm not sure what populates this table, so where would this change need to be made?
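For readers unfamiliar with the workaround, the extra join described above can be sketched roughly as follows. This is a minimal illustration that uses an in-memory SQLite database rather than BigQuery; the `features` table, its columns, and the sample URLs and rank values are assumptions made for the example, not the real schema.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the real tables.
# Table layouts, column names, and values are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE features (url TEXT, feature TEXT, rank INTEGER);
    CREATE TABLE summary_pages (url TEXT, rank INTEGER);
    INSERT INTO features VALUES
        ('https://a.example/', 'FeatureX', NULL),
        ('https://b.example/', 'FeatureX', NULL);
    INSERT INTO summary_pages VALUES
        ('https://a.example/', 1000),
        ('https://b.example/', 10000);
""")

# While the rank column is null, each consumer has to repeat this join
# to recover the rank of every page.
rows = conn.execute("""
    SELECT f.feature, f.url, p.rank
    FROM features AS f
    JOIN summary_pages AS p USING (url)
    ORDER BY p.rank
""").fetchall()

for row in rows:
    print(row)
```

If the rank were populated when the table is generated, the join and the second table would drop out of every downstream query.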

@rviscomi
Member

There's a pair of "materialize blink features" scheduled queries at the project level in BigQuery that generate the blink_features tables on the 1st of the month. We'd need to edit these queries to output the ranking info.

@tunetheweb
Member Author

Can't find where these are. Can you give me a pointer?

Also, why the first of the month? I thought these were based off the crawl, so why aren't they generated with the rest of them?

@rviscomi
Member

rviscomi commented Jul 30, 2021

> Can't find where these are. Can you give me a pointer?

In the BigQuery console, the left panel has a "Scheduled queries" entry:
https://console.cloud.google.com/bigquery/scheduled-queries?project=httparchive

> Also, why the first of the month? I thought these were based off the crawl, so why aren't they generated with the rest of them?

We could look into doing that, similar to how we generate the technologies table. But we would need to be more careful since this table is appended to each month and repeated Dataflow jobs might result in duplicate data.
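One common way to guard against the duplicate-data risk mentioned above is to make the monthly write idempotent: clear the rows for the month being written, then insert them, inside a single transaction. The sketch below shows that pattern with an in-memory SQLite table. It is not how the HTTP Archive pipeline works; the `usage` layout, the `append_month` helper, and the sample values are all hypothetical.

```python
import sqlite3


def append_month(conn, yyyymmdd, rows):
    """Replace one month's rows so that a rerun does not duplicate them.

    Hypothetical helper: the table layout is an assumption for illustration.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM usage WHERE yyyymmdd = ?", (yyyymmdd,))
        conn.executemany(
            "INSERT INTO usage (yyyymmdd, feature, num_urls) VALUES (?, ?, ?)",
            [(yyyymmdd, feature, num_urls) for feature, num_urls in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage (yyyymmdd TEXT, feature TEXT, num_urls INTEGER)")

july = [("FeatureX", 120), ("FeatureY", 45)]
append_month(conn, "20210701", july)
append_month(conn, "20210701", july)  # simulates a repeated job for the same month

row_count = conn.execute("SELECT COUNT(*) FROM usage").fetchone()[0]
print(row_count)
```

A plain append would leave four rows after the second call; the delete-then-insert version leaves two.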

I've updated the features table to include rank info starting with the July dataset, which will be processed in a couple of days. We would still need to aggregate pages by rank for the usage table and update consumers of the table accordingly. The most notable consumer is chromestatus.com, but there may be other queries floating around.
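As a rough illustration of what aggregating pages by rank for the usage table could involve, the sketch below counts, for each feature and rank bucket, how many pages use the feature and what share of the bucket that represents. The input rows, the rank values, and the `usage_by_rank` function are hypothetical, and the grouping is by exact rank bucket; a cumulative grouping (all pages at or below a rank) would be an equally plausible design.

```python
from collections import defaultdict

# Hypothetical per-page rows: (url, rank bucket, set of detected features).
pages = [
    ("https://a.example/", 1000, {"FeatureX", "FeatureY"}),
    ("https://b.example/", 1000, {"FeatureX"}),
    ("https://c.example/", 10000, {"FeatureY"}),
    ("https://d.example/", 10000, set()),
]


def usage_by_rank(pages):
    """Return {(feature, rank): (num_urls, total_urls, pct_urls)}."""
    total_urls = defaultdict(int)  # pages per rank bucket
    num_urls = defaultdict(int)    # pages per (feature, rank bucket)
    for _url, rank, features in pages:
        total_urls[rank] += 1
        for feature in features:
            num_urls[(feature, rank)] += 1
    return {
        (feature, rank): (count, total_urls[rank], count / total_urls[rank])
        for (feature, rank), count in num_urls.items()
    }


usage = usage_by_rank(pages)
for key in sorted(usage):
    print(key, usage[key])
```

Adding rank to the grouping key multiplies the number of rows per feature, which is why consumers that expect one row per feature and date would need updating.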
