Add lens ids by jmelot · Pull Request #28 · georgetown-cset/article-linking

jmelot · 2023-06-09T20:59:07Z

Closes #27

jmelot

@jamesdunham a few things I was thinking about

sql/all_match_pairs_with_um.sql

sql/lens_ids.sql

jmelot · 2023-06-09T21:05:16Z

sql/lens_metadata.sql

+  MAX(LOWER(id.value)) AS clean_doi,
+  MAX(year_published) as year,
+  ARRAY_AGG(author.last_name) AS last_names,
+  ARRAY_AGG(reference.lens_id) AS references


Again please sanity check me - they're unusual in that they provide both references and citations

Did this execute successfully for you? I get "Cannot query rows larger than 100MB limit.", which led me here, suggesting the issue is with the size of the values over which we're applying array_agg?

Ugh I must have somehow forgotten to run this one. Sorry about that - can you take a look at the updated version I've added in the latest commit?

(I've written the tables to tmp.lens_ids and tmp.lens_metadata, fwiw)

jmelot · 2023-06-09T21:08:31Z

sql/all_match_pairs_with_um.sql

+      AND lens_id in (select id from {{ staging_dataset }}.lens_ids)
+    )
+    (
+    SELECT


I'm guessing based on the name that the alias_lens_ids column maps lens ids to duplicates, so using those to match here

alias_lens_ids is undocumented so I'm not sure either. But from brief inspection, seems likely. We always have 1+ alias, and from spot checking, the array of aliases includes the lens_id.

select array_length(alias_lens_ids) as n_aliases, count(*) n_pubs from `gcp-cset-projects.lens.scholarly` group by 1 order by 1
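
A follow-up check along the same lines, as a rough sketch. It assumes lens_id is a top-level STRING column of lens.scholarly and that alias_lens_ids is an ARRAY<STRING>; neither is confirmed by documentation. It counts how many publications list their own lens_id among their aliases.

```sql
-- Sketch only: assumes lens_id is STRING and alias_lens_ids is ARRAY<STRING>.
SELECT
  COUNT(*) AS n_pubs,
  -- Rows whose alias array contains the row's own lens_id
  COUNTIF(lens_id IN UNNEST(alias_lens_ids)) AS n_self_aliased
FROM
  `gcp-cset-projects.lens.scholarly`
```

If n_self_aliased equals n_pubs, every record's alias array includes its own id, which would agree with the spot check.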

sql/all_match_pairs_with_um.sql

jamesdunham · 2023-06-15T14:09:16Z

Done here with the exception of comments and questions above.

jamesdunham · 2023-07-06T18:16:53Z

sql/lens_metadata.sql

+FROM
+  lens.scholarly
+LEFT JOIN
+  dois


Is it intentional to join 1:M here? We string_agg() the references and array_agg() the author last names, but not the unnested DOIs, so just checking.

Yeah, it's intentional (but a reasonable question!). If we have multiple versions of the metadata for an article, we record that on different rows for the same orig_id. We aggregate the refs and authors because they're the same for each version of the article's metadata.
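
To illustrate the row shape this produces, here is a toy sketch with inline data. The CTE contents, the DOI values, and the column layout are hypothetical and are not taken from sql/lens_metadata.sql; only the 1:M LEFT JOIN pattern is the point.

```sql
-- Hypothetical toy data: one article with two DOI variants.
WITH scholarly AS (
  SELECT 'lens-1' AS lens_id, ['Smith', 'Lee'] AS last_names
),
dois AS (
  SELECT 'lens-1' AS lens_id, '10.1000/abc' AS clean_doi
  UNION ALL
  SELECT 'lens-1', '10.1000/abc.v2'
)
SELECT
  s.lens_id AS orig_id,
  d.clean_doi,   -- differs per row: one row per metadata version
  s.last_names   -- identical on every row for the same orig_id
FROM
  scholarly AS s
LEFT JOIN
  dois AS d
  ON s.lens_id = d.lens_id
```

The join yields two rows for the same orig_id, one per DOI, each carrying the same author list.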

jmelot

ty!

jmelot · 2023-07-13T17:42:03Z

Add lens ids

6f53213

Closes #27

jmelot force-pushed the 27-integrate-lens branch from 7cef1e8 to 6f53213 Compare June 9, 2023 21:00

jmelot added 2 commits June 9, 2023 17:07

Deduplicate lens ids

c941711

Reference tables consistently

3f7beb8

jmelot commented Jun 9, 2023

jmelot requested a review from jamesdunham June 9, 2023 21:09

jamesdunham reviewed Jun 14, 2023

jmelot mentioned this pull request Jun 20, 2023

Rename all_match_pairs_with_um #30

Open

jmelot added 3 commits June 20, 2023 11:53

Fix lens metadata query

d2184cf

String agg, not array agg

8eaba2b

Add missing union

72f25eb

jamesdunham reviewed Jul 6, 2023

jmelot commented Jul 13, 2023

jmelot merged commit 63f0e23 into master Jul 13, 2023

jmelot deleted the 27-integrate-lens branch July 13, 2023 17:42
