NYU-CDS-Capstone-Project / dahlia


Capstone Project: Instant visualization of Twitter data using an online dashboard


Team Dahlia

Instant visualization of Twitter data using an online dashboard

Team Members:

Meihao Chen
Yitong Wang

Advisor:

Pablo Barbera

Potential datasets

Tweets about Hillary Clinton's presidential announcement

All tweets mentioning "hillary", "hillary clinton" or "clinton" between April 12, 2015 at 17:00 UTC and April 14, 2015 at 17:00 UTC. Tweets are stored in JSON format by hour (each file is a different hour of data) and gzipped, inside a tar file. LINK
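The storage layout described above (a tar file containing one gzipped JSON file per hour) can be read without unpacking anything to disk. The following is a minimal sketch, assuming each gzipped file holds one tweet per line; the function name `iter_tweets` and the one-tweet-per-line layout are assumptions for illustration, not something this repository confirms.

```python
import gzip
import json
import tarfile


def iter_tweets(tar_path):
    """Yield one tweet (a dict) per JSON line from every gzipped member of the tar.

    Assumption: each member is a gzip file with one JSON object per line.
    """
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            raw = archive.extractfile(member)
            with gzip.open(raw, "rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        yield json.loads(line)
                    except json.JSONDecodeError:
                        # Skip truncated or malformed lines instead of aborting
                        continue
```

Because the function is a generator, the hourly files are streamed one tweet at a time rather than loaded into memory together.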

Analysis Tasks

General description of the dataset: total number of tweets; number of tweets over time (time series); counts of important words; number of retweets
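The descriptive statistics listed above can be computed in a single pass over the tweets. This is a rough sketch, assuming each tweet is a dict with the `created_at`, `text`, and `retweeted_status` fields of the Twitter REST API v1.1 tweet object; the function name `summarize` is hypothetical, and the word count here is a plain token count with no stop-word filtering.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp layout used by Twitter API v1.1, e.g. "Sun Apr 12 17:00:05 +0000 2015"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize(tweets):
    """Return total tweets, retweets, tweets per hour, and word counts."""
    total = 0
    retweets = 0
    per_hour = Counter()
    words = Counter()
    for tweet in tweets:
        total += 1
        # A tweet that is a retweet carries the original under "retweeted_status"
        if "retweeted_status" in tweet:
            retweets += 1
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        per_hour[created.strftime("%Y-%m-%d %H:00")] += 1
        words.update(re.findall(r"[a-z']+", tweet.get("text", "").lower()))
    return {
        "total": total,
        "retweets": retweets,
        "per_hour": dict(per_hour),
        "words": words,
    }
```

The `per_hour` mapping is already in a shape that can be plotted as a time series, and `words.most_common(20)` gives a first look at frequent terms.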

Tweets about the 2014 Oscars

All tweets mentioning "oscars", "oscar", "red carpet", "oscars2014", "academy", "award", "awards" between March 2nd, 23:00 UTC and March 3rd 06:00 UTC. Tweets are stored in JSON format by hour (each file is a different hour of data) and gzipped, inside a tar file. LINK

Analysis Tasks

General description of the dataset: hashtag counts; number of tweets over time (time series)
Named entity recognition research. LINK
Public opinion analysis and prediction of award. LINK
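For the hashtag count in the first task above, one possible approach is to read the hashtags Twitter has already parsed into each tweet's `entities` field. This is a sketch under that assumption (the `entities.hashtags[].text` layout of the Twitter REST API v1.1 tweet object); the function name `count_hashtags` is hypothetical.

```python
from collections import Counter


def count_hashtags(tweets):
    """Count hashtags case-insensitively across an iterable of tweet dicts."""
    counts = Counter()
    for tweet in tweets:
        # Tweets without parsed entities simply contribute nothing
        tags = tweet.get("entities", {}).get("hashtags", [])
        counts.update(tag["text"].lower() for tag in tags)
    return counts
```

Lowercasing merges variants such as `#Oscars` and `#oscars` into one key; drop the `.lower()` call if case should be preserved.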

Visualization Tools

We intend to use D3, JavaScript, and other tools to build interactive visualizations on a website.

Oct 5 2015

Pushed code for reading the JSON files and running a preliminary analysis on the Hillary dataset

Nov 8 2015: Re-structuring this repository

Data

Hillary

Preliminary: basic counts of fields (used for the exploratory data presentation)
dataForVis: processed data for d3 visualization

Oscar

OscarNameCount: data derived from running a named entity tagger on the tweet texts, giving the number of occurrences of each name
filteredData: Fields extracted from Oscar-related tweets
Rest: Counts of each field data file

Document

All reference files and project descriptions

Proc

bashFilter

Scripts for running lmr (local map reduce), jq, counting data, and generating data for d3 (hier_bund.sh)

countMapReduce

MapReduce scripts for processing the raw field data extracted using jq
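The counting pattern behind such scripts can be sketched as a mapper that emits `(value, 1)` pairs and a reducer that sums the pairs for each key after sorting. The code below is an illustrative stand-in, not the scripts in this directory; it assumes the jq output is one field value per line, and the names `mapper`, `reducer`, and `run_local` are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit (value, 1) for every non-empty input line."""
    for line in lines:
        value = line.strip()
        if value:
            yield value, 1


def reducer(sorted_pairs):
    """Sum the counts of consecutive pairs that share a key.

    The input must already be sorted by key, as it is after a shuffle/sort step.
    """
    for key, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Run map, sort, and reduce in one process and return a dict of counts."""
    return dict(reducer(sorted(mapper(lines))))
```

For example, `run_local(["en\n", "es\n", "en\n"])` returns `{"en": 2, "es": 1}`.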

nameRecognition

Scripts for extracting named entities from the tweets
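To illustrate the shape of the resulting name counts, here is a much simpler stand-in: a dictionary lookup that counts mentions of a known list of names. This is not a named entity tagger and is not the method used by the scripts here; `count_names` and the candidate list are hypothetical.

```python
import re
from collections import Counter


def count_names(texts, candidate_names):
    """Count case-insensitive, whole-word mentions of each candidate name."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in candidate_names
    }
    counts = Counter()
    for text in texts:
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(text))
    return counts
```

Unlike a tagger, this approach only finds names that are supplied in advance, so it cannot discover unexpected people mentioned in the tweets.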

proc_d3

Scripts for processing data into a format that d3 can use for visualization
Normally takes data already processed by jq as input
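As an illustration of this kind of conversion, the sketch below turns a mapping of counts into the nested `name`/`children` structure that many d3 hierarchy examples read. The exact format the scripts here produce is not documented in this README, so the field names (`name`, `children`, `size`) and the function `counts_to_hierarchy` are assumptions.

```python
import json


def counts_to_hierarchy(counts, root_name="root", top_n=None):
    """Convert {label: count} into a one-level hierarchy, largest counts first."""
    # Sort by descending count, breaking ties alphabetically for stable output
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if top_n is not None:
        ranked = ranked[:top_n]
    return {
        "name": root_name,
        "children": [{"name": label, "size": count} for label, count in ranked],
    }


def write_hierarchy(counts, path, root_name="root", top_n=None):
    """Write the hierarchy as JSON so a web page can load it."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(counts_to_hierarchy(counts, root_name, top_n), handle)
```

Limiting the output with `top_n` keeps the JSON small enough for the browser to render quickly.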

Vis

Contains all the files needed for constructing the webpage
