Skip to content

Capstone Project: Instant visualization of Twitter data using an online dashboard

Notifications You must be signed in to change notification settings

NYU-CDS-Capstone-Project/dahlia

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

144 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Team Dahlia

Instant visualization of Twitter data using an online dashboard

Team Member:

  • Meihao Chen
  • Yitong Wang

Advisor:

  • Pablo Barbera

Potential datasets

Tweets about Hillary Clinton's presidential announcement

  • All tweets mentioning "hillary", "hillary clinton" or "clinton" between April 12, 2015 at 17:00 UTC and April 14, 2015 at 17:00 UTC. Tweets are stored in JSON format by hour (each file is a different hour of data) and gzipped, inside a tar file. LINK

Analysis Tasks

  • General description of the dataset: number of tweets in total; number of tweets in a time series; important word count; number of retweets

Tweets about the 2014 Oscars

  • All tweets mentioning "oscars", "oscar", "red carpet", "oscars2014", "academy", "award", "awards" between March 2nd, 23:00 UTC and March 3rd 06:00 UTC. Tweets are stored in JSON format by hour (each file is a different hour of data) and gzipped, inside a tar file. LINK

Analysis Tasks

  • General decription of the dataset: count hashtags, number of tweets in a time series
  • Name entity recognition research. LINK
  • Public opinion analysis and prediction of award. LINK

Visualization Tools

  • We intent to use D3, javascript, and other tools to build interactive visualization on website.

Oct 5 2015

  • Pushed code for reading json file and running preliminary analysis on the hillary dataset

Nov 8 2015: Re-structuring this repository

Data

Hillary

  • Preliminary: basic counts of fields (used for the exploratory data presentation)
  • dataForVis: processed data for d3 visualization

Oscar

  • OscarNameCount: data derived from name entity tagger on the tweet texts, which gives the number of occurrences of names
  • filteredData: Fields extracted from Oscar-related tweets
  • Rest: Counts of each field data file

Document

  • All the references file and project descriptions

Proc

bashFilter

  • Scripts for running lmr (local map reduce), jq, counting data, and generating data for d3 (hier_bund.sh)

countMapReduce

  • MapReduce scripts for processing the raw field data extracted using jq

nameRecognition

  • Scripts for generating the name entity from the tweets

proc_d3

  • Scripts for processing data into the format that can be used for d3 visualization
  • Normally takes the data processed by jq

Vis

  • Contains all the files needed for constructing the webpage

About

Capstone Project: Instant visualization of Twitter data using an online dashboard

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •