Custom outputting for pupa data by doubleswirve · Pull Request #299 · opencivicdata/pupa

doubleswirve · 2017-12-06T21:43:17Z

Apologies if this is a little rough around the edges (not sure what the etiquette is on some of this stuff), but I wanted to submit this as a work-in-progress PR to get feedback.

This PR allows pupa data to be sent to targets other than a local file. It initially includes an output option for Google Cloud Pub/Sub (thanks @showerst for the initial implementation), but could also be extended to additional targets/services (e.g., Kafka).

The basic idea is to hook into the __init__ method of the Scraper class and set up an instance variable, output_target, which defaults to self (the default file-writing behavior). Depending on the OUTPUT_TARGET environment variable, save_object is then called on either the Scraper instance itself or an alternative output target instance (e.g., a Pub/Sub instance). The only requirement for an alternative output target class is that it implement a save_object method.
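To make the dispatch concrete, here is a minimal sketch of the idea, not the actual code in this PR. The InMemoryOutput class, the 'MEMORY' value for OUTPUT_TARGET, and the simplified Scraper constructor and file naming are all hypothetical stand-ins chosen so the example runs without pupa or any cloud credentials.

```python
import json
import os


class InMemoryOutput:
    """Hypothetical alternative target; a real one would publish to a service."""

    def __init__(self, scraper):
        self.scraper = scraper
        self.messages = []

    def save_object(self, obj):
        # The only method an alternative output target has to provide.
        self.messages.append(json.dumps(obj, sort_keys=True))


class Scraper:
    """Simplified stand-in for pupa's Scraper (assumed constructor)."""

    def __init__(self, datadir):
        self.datadir = datadir
        # Default: the scraper is its own output target and writes files.
        self.output_target = self
        # 'MEMORY' is an assumed value, used here only for illustration.
        if os.environ.get('OUTPUT_TARGET') == 'MEMORY':
            self.output_target = InMemoryOutput(self)

    def save_object(self, obj):
        # Default behavior: write the object to a JSON file in datadir.
        path = os.path.join(self.datadir, '{}.json'.format(obj['_id']))
        with open(path, 'w') as f:
            json.dump(obj, f)
```

Callers would then always go through scraper.output_target.save_object(obj), so the scraping code does not need to know which target is active.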

So far this works pretty well; however, a few steps are currently duplicated between the default file-writing path and each alternative output target:

obj.pre_save call
info/debug logging prior to writing to file/sending to service/etc
object validation (i.e., obj.validate())
obj._related iterating/saving

Seems like some of these could be moved to methods in the Scraper class so alternative output target classes wouldn't need to include them.
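One way the shared steps might be factored out is a base class that owns pre_save, logging, validation, and the _related traversal, leaving subclasses a single hook. This is only a rough sketch: the Output base class, the handle_output hook, the order of the steps, and the argument-free pre_save signature are assumptions for illustration, not pupa's actual API.

```python
import logging

logger = logging.getLogger('output_sketch')


class Output:
    """Hypothetical base class holding the steps every target repeats."""

    def handle_output(self, obj):
        # The one method each concrete target would implement.
        raise NotImplementedError

    def save_object(self, obj):
        obj.pre_save()                    # shared: pre-save hook
        logger.info('save %s', obj.name)  # shared: logging
        obj.validate()                    # shared: validation
        self.handle_output(obj)           # target-specific delivery
        for related in obj._related:      # shared: recurse into related objects
            self.save_object(related)


class ListOutput(Output):
    """Toy target that records object names instead of sending them."""

    def __init__(self):
        self.sent = []

    def handle_output(self, obj):
        self.sent.append(obj.name)


class FakeObj:
    """Minimal stand-in for a pupa object (assumed interface)."""

    def __init__(self, name, related=()):
        self.name = name
        self._related = list(related)
        self.calls = []

    def pre_save(self):
        self.calls.append('pre_save')

    def validate(self):
        self.calls.append('validate')
```

With this shape, a Pub/Sub or SQS target would only override handle_output, and the file-writing default could be expressed the same way.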

We probably need some unit testing in there as well. Anyway, open to ideas and look forward to getting your feedback. Thanks!

showerst and others added 30 commits December 4, 2017 15:47

Temp version change

4b7ae82

Add GCPS settings

8d5a2d5

pluggable backends initial commit

fd267ae

move settings from cmd line to env

7b38107

add google-cloud-pubsub install requirement

17d8a3a

try setting up conditional import of module based on env var

8090d65

tiny var name change since not really dealing w/ class

5253ffc

try to set up pub/sub for env var auth (since it will be difficult to pass json key file into build)

6685f30

small style changes to abide by flake8

e2d2889

tiny styling change

c95963f

add in obj pre_save method to match file writing save

48e215b

change env var for output switch and remove from settings

cbf13f0

bunch o name changes

6d5e4a8

remove unused import_lib

8d5622b

remove extra whitespace (causing diff change)

774a53b

ditto

1a1a1da

remove all just because hehe

9b8eca9

single quote consistency

2a6e749

add debuggering line

9e72ffb

more debug printing

fd6eb6a

remove debugging

9ca4c3d

(try) add scopes and tiny flake8 patch

4a39f4a

try another way for scopes

5c5c69a

oops, forgot to specify keyword for arg

a5bc525

try a tuple instead?

0d26c22

meh, trying this, maybe i missed something

6e42604

try another way

ef688c3

patch validation to reference caller

dad6321

one last shot, for this nite my friend

5e02928

try again

d9362d1

doubleswirve and others added 28 commits January 18, 2018 23:33

remove local file output init method

957a0bd

Merge pull request #1 from doubleswirve/custom-export--dry-it-up

6f8f8a4

Custom export update

patch scraper reference prop in google cloud output

ce3456c

add amazon sqs output

cd967b1

add amazon sqs output target to condition

757e907

add boto3 dep

99afa3e

do not encode for aws sqs

337beab

update pubsub filename

4165197

update initialization of publisher with new env var

f550a37

break some more helper methods out to the parent class and remove pubdate meta data from google pubsub as it is already part of the message object (during subscription)

c985014

try renewing the publisher client every publish...

4cf74ac

patch order so var error does not occur

71dc7b7

smore patches

a01dc61

try to debug http version

2e55902

remove http version testing and revert to instantiating the publisher client just once in __init__

1c832dd

add debug helper method to parent class and update improts

289f9a0

try upgrading the grpc lib version

e2c950b

remove grpc dep, no dice

c6e4d71

Merge pull request #2 from doubleswirve/custom-export--google-pubsub-env

0adc9a5

Google Pub/Sub env var adjustments and helper methods

Merge branch 'master' into custom-export

585dc90

Merge branch 'custom-export' into custom-export--amazon-sqs

0b5d707

update info logging

90bc055

port over some more helper to sqs

5ade2fe

add s3 handling for messages that are too large

838b36e

update s3 key format

9d5e9d2

update number formatting for py version

820d395

Merge pull request #3 from doubleswirve/custom-export--amazon-sqs

9db57ba

Custom export amazon sqs

resolve conflicts

81bb820

doubleswirve closed this Aug 15, 2018

doubleswirve mentioned this pull request Aug 15, 2018

Feature: custom outputs #315

Open

Custom outputting for pupa data #299
doubleswirve wants to merge 81 commits into opencivicdata:master from doubleswirve:custom-export

3 participants
