This project was done for the undergraduate course Principles of Information and Data Management which covered SQL and relational database design.
Our project is aimed to make drinking in NJ bars safer. We keep track of sex offenders and use that information to determine which bars are the safest to visit and which sex offenders to watch out for. We calculate safety of a bar based on the number of sex offenders/offendees visit, how dangerous they are, and the number of illegal beers they sell. Users of our application can search based on their location in New Jersey to find the safest bars near them.
The data used in this project was generated by us (we gave credit to any tools we used in our code). We generated realistic data an implanted verifiable patterns into the data.
To first get started with data generation you can import one of the following into your SQL database:
- sql/db_structure.sql this has only the tables of our database and not the actual data. Use this if you want to start from scratch.
- sql/db_structure_and_data.sql this has both the tables and data
After this, you can modify/use the scripts we used to generate data in the data-gen/ directory.
You can also see the front-end in public_html/.
The data is for one month of October 2013 with the following assumptions:
- ~5k drinkers
- ~3k bars
- ~250 sex offenders
- Each drinker goes to bar on average 3-4 times
- Each drinker frequents 2-3 different bars
We need to add a couple of patterns to the data. It might be easier to add all of the data initially and then implement the patterns by deleting records which go against the pattern. This way we can just worry about making a shit ton of data for now.
This pattern states that if an older person left with an underage drinker the older person is a sex offender.
After running this is the SQL verification of the pattern:
Logic behind this: Drinkers who left with underage drinkers are sex offenders Find a drinker who did leave with an underage drinker but is not a sex offender I have to check both drinker1 and drinker2 in LeftWith
SELECT CASE WHEN (SELECT COUNT(*) FROM (
SELECT d1.name
FROM LeftWith lw, Drinker d1, Drinker d2
WHERE lw.drinker1 = d1.name AND d1.age >= 21 AND lw.drinker2 = d2.name and d2.age < 21 AND NOT EXISTS(SELECT * from SexOffender s WHERE s.name = d1.name)
UNION
SELECT d1.name
FROM LeftWith lw, Drinker d1, Drinker d2
WHERE lw.drinker2 = d1.name AND d1.age >= 21 AND lw.drinker1 = d2.name and d2.age < 21 AND NOT EXISTS(SELECT * from SexOffender s WHERE s.name = d1.name)
) A) = 0 THEN 'Yes'
ELSE 'No'
END AS isTrueHere is the verification of the pattern:
SELECT CASE WHEN
(SELECT COUNT(*)
FROM Sells s
WHERE s.beer IN (SELECT name FROM Beer where manf in (SELECT m.name FROM Manufacturer m,Country c Where m.country = c.name AND prohibition=1))
AND NOT EXISTS (SELECT * FROM SexOffender WHERE bar = s.bar)) = 0 THEN 'Yes'
ELSE 'No'
END AS isTrueSELECT IF
((SELECT COUNT(f.drinker) FROM Frequents f, Bar b
WHERE f.bar = b.name AND b.international='1' AND
(SELECT COUNT(*) FROM Likes l WHERE l.drinker = f.drinker AND l.beer IN(
SELECT b.name FROM Beer b, Manufacturer m
WHERE b.manf = m.name AND m.country = 'Poland')) = 0) = 0, 'Yes', 'No') AS verificationOk, so this query gets the number of sex offenders all of the bars have:
SELECT b.name, COUNT(A.name) AS cnt
FROM Bar b LEFT JOIN (SELECT f.bar AS bar, s.name AS name FROM SexOffender s, Frequents f WHERE s.name = f.drinker) A ON b.name = A.bar GROUP BY b.name ORDER By cnt;Newer version (get illegal beers and number of sex offenders)
SELECT b.name, COUNT(A.name) AS offCnt, (SELECT COUNT(*) FROM Sells s WHERE s.bar = b.name AND s.beer IN (SELECT name FROM Beer where manf in (SELECT m.name FROM Manufacturer m,Country c Where m.country = c.name AND prohibition=1))) AS numIllegal
FROM Bar b
LEFT JOIN (SELECT c.bar AS bar, s.name AS name FROM SexOffender s, Consumed c WHERE s.name = c.drinker) A ON b.name = A.bar GROUP BY b.name ORDER BY numIllegal;Newer version (using victims)
SELECT b.name, round((10 - (COUNT(A.name) + (SELECT COUNT(*) FROM Sells s WHERE s.bar = b.name AND s.beer IN (SELECT name FROM Beer where manf in (SELECT m.name FROM Manufacturer m,Country c Where m.country = c.name AND prohibition=1)))) * (10/13)),1) AS rating
FROM Bar b LEFT JOIN (SELECT c.bar AS bar, s.name AS name FROM SexOffender s, Frequents c WHERE s.name = c.drinker OR s.victim = c.drinker) A ON b.name = A.bar
GROUP BY b.name;Yet another (actually this is slower)
SELECT off.name, round(10 - (off.numOff + ill.numIllegal) * (10/13), 1) as rating
FROM (SELECT b.name AS name, COUNT(A.name) AS numOff FROM Bar b
LEFT JOIN (SELECT c.bar AS bar, s.name AS name FROM SexOffender s, Frequents c WHERE s.name = c.drinker OR s.victim = c.drinker) A ON b.name = A.bar GROUP BY b.name) off
LEFT JOIN
(SELECT s.bar as name, COUNT(illBeers.name) as numIllegal
FROM Sells s
LEFT JOIN
(SELECT name FROM Beer WHERE manf in (SELECT m.name FROM Manufacturer m,Country c Where m.country = c.name AND prohibition=1)) illBeers
ON illBeers.name = s.beer GROUP BY s.bar) ill ON ill.name = off.nameI think this get's the maximum radius for a sex offender:
SELECT so.name, MAX(SQRT(POW(d.latitude - b.latitude,2) + POW(d.longitude - b.longitude, 2)))
FROM SexOffender so, Drinker d, Frequents f, Bar b
WHERE so.name = f.drinker AND f.bar = b.name AND d.name=so.name GROUP BY so.name;Some possibilities:
- bars with sex offenders have underage drinkers [done 11/17]
- bars with illegal beers have sex offenders [done 11/17]
- the higher the alchohol content in the beers which a bar serves correlates to how many sex offenders frequent that bar
- everybody who frequents an international bar likes Zywiec (we could add a trigger to make sure anyone added into Frequents with an international bar has a corresponding row in the Likes table)
- bars which serve beers with higher alchohol content are less safe
- generate a safety rating for a bar (possibly using the time too)
- Our ranking can be on a 1-10 scale:
- 8-10 the bar is safe. There is low risk of visiting this bar
- 5-7 There is some risk.
- 3-4 There is a moderate risk.
- 1-2 There is high risk.
- Our ranking can be on a 1-10 scale:
- add a price to sells in beers and maybe think of a pattern with the price
- get bar picture
- get bar review comments [bad and good] maybe write them ourselves
- remove people who were offended more than 2 times
- add indexes to tables