Mention yt-dlp in README and start recording view counts #10

Merged
gg merged 2 commits from pta/WeatherNews.jl:main into main 2024-04-23 02:51:26 +00:00
Contributor
No description provided.
pta added 2 commits 2024-04-19 23:22:34 +00:00
Author
Contributor

next I'll work on a run-once script to retroactively add view counts to table rows with a video ID

gg merged commit 4fe3e10656 into main 2024-04-23 02:51:26 +00:00
Owner

I added a view column to the schedule display, and I think there might be a little bug. Notice uozumi on 2024-04-23 showing 0 views. The link also goes to her future show on the 24th.

![image](/attachments/814b54ff-6778-4c29-988a-bb67838cbeb6)

I haven't dug that deeply into it, but I wanted to drop the screenshot here first.
Author
Contributor

I noticed that bug while testing. I think it happens on the first run because staff privatizes the video right after broadcast to remove the music segments. On future videos it should keep the "is_live" view count while they're privatized. And it's pointing to the future stream because staff fucked up the stream title again. Both should get fixed after several hours once the video is re-uploaded and staff revises the title. Maybe staff will be forced to fix the title on VOD re-upload so the titles don't collide, though it's probably not a problem for YouTube. I should change the code to derive dates from premiere time rather than the title anyway.
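Roughly something like this is what I have in mind (a sketch only; it assumes yt-dlp's JSON exposes the premiere time as a Unix-epoch `release_timestamp` field, with `vid` being the parsed yt-dlp JSON for the video):

```julia
using Dates, TimeZones

# Assumed field name: `release_timestamp` (Unix epoch) for premiered/live videos.
release_ts = vid[:release_timestamp]
premiere_jst = astimezone(ZonedDateTime(unix2datetime(release_ts), tz"UTC"), tz"Asia/Tokyo")
```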
Author
Contributor

I have the retroactive views script ready too. I'll test it, maybe today. The only problem is that the yt-dlp request per video takes a good 2-4 seconds, even though I tried different command options to get it to fetch minimal data and run faster. I'll see how long it takes to update the whole table.
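Something along these lines is what I mean by minimal data, though it's just a sketch (assuming yt-dlp's `--print` template fields behave as documented; the network round trip per video still dominates):

```julia
# `--print` implies quiet/simulate and emits only the requested fields,
# so there is no full JSON document to fetch and parse.
video_id = "dQw4w9WgXcQ"  # placeholder ID for illustration
cmd = `yt-dlp --print "%(live_status)s %(view_count)s" "https://youtube.com/watch?v=$(video_id)"`
live_status, view_count = split(readchomp(cmd))
```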
Owner

> And it's pointing to the future stream because staff fucked up the stream title again.

Staffさん!!!!!

> I have the retroactive views script ready too.

Nice.

> The only problem is that the yt-dlp request per video takes a good 2-4 seconds

That's a little on the slow side, but I can let it run for as long as it needs to go. With the number of YouTube videos that are currently stored, it would take a little over 2 hours. That's reasonable to me.

```
sqlite> SELECT COUNT(*) FROM schedule WHERE segment_id != 8 AND video_id IS NOT NULL;
┌──────────┐
│ COUNT(*) │
├──────────┤
│ 1997     │
└──────────┘
sqlite> SELECT 1997 * 4;
┌──────────┐
│ 1997 * 4 │
├──────────┤
│ 7988     │
└──────────┘
```

```julia
Second(7988) |> canonicalize
# 2 hours, 13 minutes, 8 seconds
```
Author
Contributor

Mayu's stream got re-uploaded and tomorrow's title got fixed, so all the data is accurate now.

Author
Contributor

Here's the script. Just started running it now. I downloaded the database to /tmp for testing and am running it in the project root with `time julia --project add_view_counts.jl`.

```julia
using SQLite, JSON3

db = SQLite.DB("/tmp/wn.db")
rows = DBInterface.execute(db,
                           """
                           SELECT id, video_id FROM schedule
                           WHERE segment_id != 8 AND video_id IS NOT NULL AND view_count IS NULL;
                           """)

progress = 0
for row in rows
    cmd = `yt-dlp --flat-playlist
                  --extractor-args "youtube:player-client=web"
                  --dump-json
                  "https://youtube.com/watch?v=$(row[:video_id])"`
    vid = readchomp(cmd) |> JSON3.read
    if vid[:live_status] == "was_live"
        try
            sql_update = "UPDATE schedule SET view_count = ? WHERE id = ?;"
            DBInterface.execute(db, sql_update, [vid[:view_count], row[:id]])
        catch e
            @debug e
        finally
            global progress += 1
            println("progress: $(progress)")
        end
    end
end
```
Author
Contributor

it's actually taking max 2 seconds per update so far

Author
Contributor

so far so good

`sqlite3 /tmp/wn.db 'select * from schedule WHERE segment_id != 8 AND video_id IS NOT NULL AND view_count IS NOT NULL;'`
Author
Contributor

finally finished

```
progress: 1980

real    94m36.556s
user    60m30.666s
sys     5m34.231s
```
Author
Contributor
```
1088  11         4           2023-11-26T14:00:00+09:00  iUHVlwj7Ivo  42068
1089  3          5           2023-11-26T17:00:00+09:00  hWp6FPfTT9A  43370
1090  8          6           2023-11-26T20:00:00+09:00  SwVz3Oni58M
1091             7           2023-11-26T23:00:00+09:00  mlSsPoDy6pA  12296
1092             7           2023-11-27T00:00:00+09:00  mlSsPoDy6pA  12296
```

This is weird. Row 1090 didn't get its view count, even though the JSON does have it. Maybe add a `@debug vid[:view_count] row[:id]` under the UPDATE statement before running it. I'm not sure whether a comma or something is needed in that debug statement.
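I believe no comma is needed: the logging macros take a message followed by space-separated expressions, each logged as `name = value`. A tiny untested example of what I mean:

```julia
# Assumed usage: message string first, then bare expressions as extra fields.
@debug "updated row" vid[:view_count] row[:id]
```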
Author
Contributor

the @debug under catch didn't output anything, so I guess the UPDATE "succeeded"

Author
Contributor

https://www.youtube.com/watch?v=SwVz3Oni58M

```
"live_status": "not_live",
```

This is the problem. That URL points to the second calendar-making video, not Yui's Moon that day, so it was never "live". My script is fine and correctly didn't even try adding a view count. Glad I added that condition for redundancy. I didn't expect that a non-WNL YouTube video had sneaked into the schedule. Fucking staff in charge of list.json. I'm glad we moved on to getting video data from YouTube now.
Author
Contributor

I should make the script more elaborate by making it check that the video title passes my `iswnl` function and that the date and segment in the title match what's stored in the database, and just print a warning if they don't so you can check. There are some valid repeat video IDs, like January 1st when the earthquake stream lasted 10 hours. It's also the most viewed video.

```
1382  13         4           2024-01-01T14:00:00+09:00  zfDfEC6Vyws  5970556
1383  3          5           2024-01-01T17:00:00+09:00  zfDfEC6Vyws  5970556
1384  8          6           2024-01-01T20:00:00+09:00  zfDfEC6Vyws  5970556
1385             7           2024-01-01T23:00:00+09:00  zfDfEC6Vyws  5970556
```
Author
Contributor

I should have cached the yt-dlp JSON responses so I could test my updated script without spamming YouTube's servers again. It would run way faster too, of course.
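Something like this would probably have done it (a sketch; the helper name and cache directory are made up):

```julia
using JSON3

# Cache each video's raw yt-dlp JSON on disk so repeat test runs read from
# /tmp instead of hitting YouTube again.
function fetch_video_json(video_id; cachedir="/tmp/ytdlp-cache")
    mkpath(cachedir)
    path = joinpath(cachedir, "$(video_id).json")
    if !isfile(path)
        cmd = `yt-dlp --flat-playlist --dump-json "https://youtube.com/watch?v=$(video_id)"`
        write(path, readchomp(cmd))
    end
    return JSON3.read(read(path, String))
end
```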
Owner

I've started running the script on the live database.

Author
Contributor
```julia
using SQLite, JSON3
using WeatherNews

db = SQLite.DB("/tmp/wn.db")
rows = DBInterface.execute(db,
                           """
                           SELECT id, video_id, jst, segment_id FROM schedule
                           WHERE segment_id != 8 AND video_id IS NOT NULL AND view_count IS NULL;
                           """)

progress = 0
for row in rows
    cmd = `yt-dlp --flat-playlist
                  --extractor-args "youtube:player-client=web"
                  --dump-json
                  "https://youtube.com/watch?v=$(row[:video_id])"`
    vid = readchomp(cmd) |> JSON3.read
    if vid[:live_status] == "was_live" && iswnl(vid[:title])
        try
            sql_update = "UPDATE schedule SET view_count = ? WHERE id = ?;"
            DBInterface.execute(db, sql_update, [vid[:view_count], row[:id]])
        catch e
            @debug e
        end
        local titlejst = string(wnl_title_to_jst(vid[:title]))
        if titlejst != row[:jst]
            @warn """
                  Expected jst $(titlejst) does not match database's $(row[:jst])
                  Row: $(row[:id]) Segment: $(row[:segment_id])
                  https://www.youtube.com/watch?v=$(row[:video_id])
                  """
        end
        global progress += 1
        println("progress: $(progress)")
    else
        @warn """
              Probable non-WNL video at row $(row[:id]). Skipping...
              https://www.youtube.com/watch?v=$(row[:video_id])
              """
    end
end
```

This is the improved, for-debugging version. I suspect it would catch some invalid repeat video IDs from staff being lazy.
Author
Contributor

I didn't test this new version. For some reason it's not finding the functions in WeatherNews.jl, even when I move the script to bin/ with the others.
Owner

Here are a few queries you can play with. Maybe you've already written something similar.

```sql
-- Average views for casters
SELECT c.n AS name,
       AVG(s.view_count) AS avg_views,
       COUNT(s.view_count) AS appearances
  FROM schedule s
       JOIN caster c ON c.id = s.caster_id
 WHERE view_count IS NOT NULL
   AND view_count != 0
 GROUP BY c.n
 ORDER BY avg_views DESC;

-- Average views for segments
SELECT seg.n AS name,
       AVG(s.view_count) AS avg_views,
       COUNT(s.view_count) AS appearances
  FROM schedule s
       JOIN segment seg ON seg.id = s.segment_id
 WHERE view_count IS NOT NULL
   AND view_count != 0
 GROUP BY seg.n
 ORDER BY avg_views DESC;
```

I feel like using `AVG` is vulnerable to getting skewed by extreme outliers, but it's an interesting first pass.

```
┌─────────────┬──────────────────┬─────────────┐
│    name     │    avg_views     │ appearances │
├─────────────┼──────────────────┼─────────────┤
│ ailin       │ 93026.3333333333 │ 48          │
│ komaki2018  │ 82938.3728813559 │ 59          │
│ ohshima     │ 79741.6140350877 │ 57          │
│ ogawa       │ 71577.5084745763 │ 59          │
│ kobayashi   │ 65792.8524590164 │ 61          │
│ tokita      │ 60463.4444444444 │ 54          │
│ takayama    │ 59841.3333333333 │ 24          │
│ uozumi      │ 58126.8333333333 │ 60          │
│ hiyama2018  │ 56675.16         │ 50          │
│ matsu       │ 55527.6734693878 │ 49          │
│ okamoto2023 │ 50747.8          │ 10          │
│ aohara2023  │ 40025.05         │ 20          │
│ kawabata    │ 36004.5263157895 │ 19          │
└─────────────┴──────────────────┴─────────────┘

┌─────────────┬──────────────────┬─────────────┐
│    name     │    avg_views     │ appearances │
├─────────────┼──────────────────┼─────────────┤
│ Moon        │ 95436.5          │ 98          │
│ Morning     │ 77436.1770833333 │ 96          │
│ Coffee Time │ 65027.40625      │ 96          │
│ Sunshine    │ 57757.4791666667 │ 96          │
│ Afternoon   │ 54250.3195876289 │ 97          │
│ Evening     │ 48831.4536082474 │ 97          │
│ _           │ 35675.7329842932 │ 191         │
└─────────────┴──────────────────┴─────────────┘
```

The backfill of the data is still in progress too.
Owner

> For some reason it's not finding the functions in WeatherNews.jl.

There aren't any exported functions in the WeatherNews module. When you're not inside the defining module:

- you either have to fully qualify the function when you call it, or
- you can import the function with `using`.

```julia
# fully qualified
WeatherNews.wnl_title_to_jst(vid[:title])

# explicitly imported
using WeatherNews
using WeatherNews: wnl_title_to_jst, iswnl
```
Owner

@pta - Heads up! I just added `DataFramesMeta` as a new dependency. There isn't any deployed code that uses it yet, but I wanted to have it so that I could explore the data with Julia instead of SQLite for when SQL isn't enough.

Here's a little thing I came up with to find the median instead of the average, because it'll be able to ignore the extreme outliers a bit better.

```julia
using WeatherNews
using WeatherNews: API, DB
using DataFrames
using DataFramesMeta
using SQLite
using Statistics

db = SQLite.DB("sql/wn.db")
s = DB.load_schedule_joined(db)

# Try finding the median instead of the average.
rs = @chain s begin
    @subset(:title .!= "au PAY")
    dropmissing(:view_count)
    dropmissing(:caster)
    @subset(:view_count .!= 0)
    @select(:caster, :view_count)
    @groupby(:caster)
    @combine begin
        :median_views = median(:view_count)
        :average_views = mean(:view_count)
    end
    sort(:median_views, rev=true)
end
```

You can try it in the REPL. Here are my current results.

```
13×3 DataFrame
 Row │ caster       median_views  average_views
     │ String       Float64       Float64
─────┼──────────────────────────────────────────
   1 │ ohshima           65981.5     77756.8
   2 │ komaki2018        65315.0    134937.0
   3 │ kobayashi         57125.0     66928.2
   4 │ hiyama2018        55823.0     64109.5
   5 │ tokita            55437.0         1.09e5
   6 │ takayama          54266.5     59841.3
   7 │ ogawa             52664.5     69393.0
   8 │ uozumi            49764.0     63084.5
   9 │ ailin             49541.0     78551.9
  10 │ aohara2023        41737.0     98628.1
  11 │ okamoto2023       39561.0     44772.0
  12 │ matsu             34218.0     47349.7
  13 │ kawabata          27279.0     42895.5
```
Author
Contributor

Great. I've been wanting to collect enough data to start doing some fun statistics. What I really want to do is get graphs on the site to make it easy for the guys to see this analysis. I'm not sure how we can integrate something like pyplot into the current site. Julia has a plotting library, but I'm not sure how to integrate it into the Perl site. Maybe Perl can call a plot generator and embed graph PNGs in the webpage. Or maybe feed the data to some JavaScript plotting library and do the rendering client-side. Lots of new stuff for me to learn.

A graph for each caster that plots views for every stream chronologically would be a really interesting, stock-like visualization.
Author
Contributor

*pyplot
I meant gnuplot, which lots of languages have bindings to, I think.

Author
Contributor

I think the simplest thing to do would be to have a cron job generate plot PNGs with every database update:

1. a single plot on the main page with all rows' views
2. a plot on the caster page tracking median views
3. per-caster view and shift-frequency plots on each caster's respective page
4. one on each segment's respective page
Owner

Please bear with me as I am a beginner when it comes to data visualization.

I tried doing 2) using [Plots.jl](https://docs.juliaplots.org/stable/api/#Plots.bar-Tuple). I had to read documentation for a while and experiment a lot to end up with this result.

```julia
using Plots # I added this as a new dependency.

default(size=(800,600))
colors = cgrad(:thermal, 13; categorical=true, rev=true)

# Note that `rs` comes from my previous code.
b = bar(rs.caster, rs.median_views; formatter=:plain, xrotation=60, legend=false, fillcolor=colors[1:13])
png(b, "b") # save as b.png
```

![median views by caster](/attachments/01c0b3b0-db87-4864-9879-3baf295d3f27)

I'm also a bit unclear on how to interpret 1), 3), and 4). This is how I interpreted it, and you can tell me where I'm wrong.

1. I'm assuming the x-axis would be DateTimes, the y-axis would be view_count, and the plotted line wouldn't care about who the caster is: it's just a single line that shows WN views over time.
2. See the code and bar chart above.
3. Would the per-caster view be another single line over time, but this time caster-specific? Would shift frequency be another bar chart like the one I attached here, except the x-axis would be segments and the y-axis would be the number of appearances?
4. On the segment pages, the bar chart's x-axis would be casters and the y-axis would be appearances.

PS: Using [julia-snail](https://github.com/gcv/julia-snail) in Emacs, the plots were rendered inside an Emacs window, which was convenient while experimenting. See the [Multimedia and plotting](https://github.com/gcv/julia-snail#multimedia-and-plotting) section in their README for more details.

PPS: I discovered a bug in the julia-snail REPL when it tries (and fails) to print the unicode-rich JSON that API.video_ids() returns. If you're using a REPL, use one outside of Emacs for that data, or Emacs may become unresponsive.
Owner

## Less Noisy Views

In an attempt to make the plot of views look less noisy, I tried summarizing the view data by day. I took all the views that happened in a day and summed them into one number, and I did that for each day.

```julia
tf = Day(1)
rs = @chain s begin
    dropmissing(:view_count)
    @subset(:view_count .!= 0)
    @subset(hour.(:jst) .!= 23)
    @transform(:jst2 = floor.(:jst, tf))
    @groupby(:jst2)
    @combine begin
        :views = sum(:view_count)
    end
    @subset(:views .> 100000)
    @select(:jst = :jst2, :views)
end

p = plot(rs.jst[1:end], rs.views[1:end]; formatter=:plain, xrotation=60,
    yscale=:identity, linewidth=2, label="views/day", ylims=(0,2000000))
```

![daily-views.png](/attachments/ffb1c6a9-cb09-41c7-9687-b036ec1b9104)

PS: I don't know if you were using it, but `DB.load_schedule_joined(db)` changed in a potentially breaking way (see the sketch after this list):

- `title` was renamed to `segment` (to be consistent with the `caster` field)
- It's sorted by `jst` ascending instead of descending.
- The `jst` field is transformed into a `ZonedDateTime` instead of just being a string.
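If any of your code still expects the old shape, something along these lines should map it back (an untested sketch; assumes DataFrames is loaded):

```julia
using DataFrames

s = DB.load_schedule_joined(db)
rename!(s, :segment => :title)   # restore the old column name
sort!(s, :jst, rev=true)         # back to descending jst
s.jst = string.(s.jst)           # back to plain strings
```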