Mention yt-dlp in README and start recording view counts #10
Next I'll work on a run-once script to retroactively add view counts to table rows that have a video ID.
I added a view column to the schedule display, and I think there might be a little bug. Notice Uozumi on 2024-04-23 having 0 views. The link is also going to her future show on the 24th.
I haven't dug that deeply into it, but I wanted to drop the screenshot here first.
I noticed that bug while testing. I think it happens on the first run because staff privatizes the video right after broadcast to remove the music segments. For future videos it should keep the "is_live" view count while they're privatized. It's pointing to the future stream because staff fucked up the stream title again. Both should get fixed after several hours, once the video is re-uploaded and staff revises the title. Maybe staff will be forced to fix the title on VOD re-upload so the titles don't collide, though it's probably not a problem for YouTube. I should change the code to derive dates from the premiere time rather than the title anyway.
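Deriving the date from the premiere time could be as small as this sketch, using only the Dates stdlib. It assumes yt-dlp's `release_timestamp` field (epoch seconds, UTC); since JST has no DST, a fixed +9h offset stands in for a real timezone conversion:

```julia
using Dates

# Sketch: derive the broadcast date in JST from yt-dlp's release_timestamp
# (epoch seconds, UTC) instead of parsing it out of the stream title.
jst_date(release_timestamp::Real) =
    Date(unix2datetime(release_timestamp) + Hour(9))
```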
I have the retroactive views script ready too; I'll test it, maybe today. The only problem is that each yt-dlp request takes a good 2-4 seconds per video, even after I tried different command options to get it to fetch minimal data and run faster. I'll see how long it takes to update the whole table.
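One option for the minimal-data case (an untested sketch, not what the script currently does) is yt-dlp's `--print`, which skips the download and prints just the requested field instead of the full `-J` dump:

```julia
# Sketch: build a yt-dlp command that prints only the view count.
# --print implies skipping the download entirely.
view_count_cmd(video_id::AbstractString) =
    `yt-dlp --print view_count -- https://www.youtube.com/watch?v=$(video_id)`

# Run it and parse the single printed number.
fetch_view_count(video_id) = parse(Int, readchomp(view_count_cmd(video_id)))
```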
Staff-san!!!!!
Nice.
That's a little on the slow side, but I can let it run for as long as it needs to go. With the number of YouTube videos that are currently stored, it would take a little over 2 hours. That's reasonable to me.
Mayu's stream got re-uploaded and tomorrow's title got fixed, so all the data is accurate now.
Here's the script. Just started running it now. I downloaded the database to /tmp for testing and am running it in the project root with `time julia --project add_view_counts.jl`.
it's actually taking max 2 seconds per update so far
so far so good
`sqlite3 /tmp/wn.db 'select * from schedule WHERE segment_id != 8 AND video_id IS NOT NULL AND view_count IS NOT NULL;'`
finally finished
This is weird. Row 1090 didn't get its view count, and the JSON does have it. Maybe add a `@debug vid[:view_count] row[:id]` under the UPDATE statement before running it. I'm not sure whether a comma or something is needed in that debug statement.
The `@debug` under the catch didn't output anything, so I guess the UPDATE "succeeded". https://www.youtube.com/watch?v=SwVz3Oni58M
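On the comma question: `@debug` takes space-separated expressions after the message, no commas; each expression becomes a `key = value` pair in the log record. A minimal sketch (debug-level records only show up if the active logger's level allows them):

```julia
using Logging

vid = Dict(:view_count => 1234)
row = Dict(:id => 1090)

# Space-separated expressions, no commas needed:
with_logger(ConsoleLogger(stderr, Logging.Debug)) do
    @debug "after UPDATE" vid[:view_count] row[:id]
end
```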
This is the problem. That URL points to the second calendar-making video, not Yui's Moon that day, so it was never "live". My script is fine and correctly didn't even try adding a view count. Glad I added that condition for redundancy. I didn't expect a non-WNL YouTube video to get sneaked into the schedule. Fucking staff in charge of list.json. I'm glad we've moved on to getting video data from YouTube now.
I should make the script more elaborate by having it check that the video title passes my `iswnl` function and that the date and segment in the title match what's stored in the database, printing a warning when they don't so you can check. There are some valid repeat video IDs, like January 1st, when the earthquake stream lasted 10 hours. It's also the most viewed video.
I should have cached the yt-dlp JSON responses so I could test my updated script without spamming YouTube's servers again. It would run way faster too, of course.
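The caching idea could look something like this sketch (function name and cache layout are illustrative, not the project's actual API): one raw JSON file per video ID, returned as a string so existing parsing code stays unchanged:

```julia
# Sketch: disk cache for yt-dlp JSON so repeat test runs don't hit YouTube.
# Cache hit: read the stored file. Miss: run yt-dlp -J and store the output.
function fetch_video_json(video_id::AbstractString; cache_dir::AbstractString="yt_cache")
    mkpath(cache_dir)
    path = joinpath(cache_dir, video_id * ".json")
    isfile(path) && return read(path, String)
    raw = read(`yt-dlp -J -- https://www.youtube.com/watch?v=$(video_id)`, String)
    write(path, raw)
    return raw
end
```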
I've started running the script on the live database.
This is the version improved for debugging. I suspect it would catch some invalid repeat video IDs from staff being lazy.
I didn't test this new version. For some reason it's not finding the functions in WeatherNews.jl, even if I move the script to bin/ with the others.
Here are a few queries you can play with. Maybe you've already written something similar.
I feel like using `AVG` is vulnerable to getting skewed by extreme outliers, but it's an interesting first pass.
The backfill of the data is still in progress too.
There aren't any exported functions in the WeatherNews module, so when you're not inside the defining module you have to bring the module into scope with `using`/`import` and qualify every call with the module name.
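Since the module exports nothing, calls from outside need module-qualified names. A tiny self-contained illustration (a stand-in module with an illustrative body, not the real WeatherNews):

```julia
# With no exported names, `using` brings only the module name into scope;
# every call must be qualified with the module prefix.
module DemoWN                                      # stand-in for WeatherNews
    iswnl(title) = startswith(title, "【LIVE】")    # illustrative body only
end

using .DemoWN
DemoWN.iswnl("【LIVE】ウェザーニュース")   # works, qualified
# iswnl("...")                           # UndefVarError: not exported
```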
@pta - Heads up! I just added `DataFramesMeta` as a new dependency. There isn't any deployed code that uses it yet, but I wanted to have it so that I could explore the data with Julia instead of SQLite for when SQL isn't enough.
Here's a little thing I came up with to find the median instead of the average, because it'll be able to ignore the extreme outliers a bit better.
You can try it in the REPL. Here are my current results.
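For comparison, the same median-per-caster idea can be sketched with just the Statistics stdlib (no DataFramesMeta; the `(caster, views)` pair shape is an assumption for illustration, not the project's actual query result):

```julia
using Statistics

# Median views per caster; the median shrugs off extreme outliers
# (like the 10-hour earthquake stream) that would skew AVG.
function median_views(rows)
    by_caster = Dict{String,Vector{Int}}()
    for (caster, views) in rows
        push!(get!(by_caster, caster, Int[]), views)
    end
    Dict(c => median(v) for (c, v) in by_caster)
end
```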
Great. I've been wanting to collect enough data to start doing some fun statistics. What I really want to do is get graphs on the site to make this analysis easy for the guys to see. I'm not sure how we can integrate something like pyplot into the current site. Julia has plotting libraries, but how would we integrate one into the Perl site? Maybe Perl can call a plot generator and embed the graph PNGs in the webpage. Or maybe we feed the data to some JavaScript plotting library and do the rendering client-side. Lots of new stuff for me to learn.
A graph for each caster that plots views for every stream chronologically would be a really interesting, stock-like visualization.
*gnuplot
I meant gnuplot, which lots of languages have bindings to, I think.
I think the simplest thing to do would be to have a cron job generate plot pngs with every database update:
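Whatever the plotting backend ends up being, the cron side could look like this (paths and the `make_plots.jl` script name are hypothetical):

```
# min hour dom mon dow   command
15   *    *   *   *      cd /srv/weathernews && julia --project bin/make_plots.jl /srv/site/img
```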
Please bear with me as I am a beginner when it comes to data visualization.
I tried doing 2) using Plots.jl. I had to read documentation for a while and experiment a lot to end up with this result.
I'm also a bit unclear on how to interpret 1), 3), and 4). This is how I interpreted it, and you can tell me where I'm wrong.
PS: Using julia-snail in Emacs, the plots were rendered inside an Emacs window which was convenient while experimenting. See the Multimedia and plotting section in their README for more details.
PPS: I discovered a bug in the julia-snail REPL when it tries (and fails) to print the unicode-rich JSON that API.video_ids() returns. If you're using a REPL, use one outside of Emacs for that data or Emacs may become unresponsive.
Less Noisy Views
In an attempt to make the plot of views look less noisy, I tried summarizing the view data by day: for each day, I summed all the views that happened that day into a single number.
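The per-day summing can be sketched with just the Dates stdlib (the `(date, views)` pair shape is assumed for illustration):

```julia
using Dates

# Collapse individual (date, views) samples into one summed total per day,
# returned as date => total pairs in chronological order.
function views_per_day(samples)
    totals = Dict{Date,Int}()
    for (day, views) in samples
        totals[day] = get(totals, day, 0) + views
    end
    sort(collect(totals); by = first)
end
```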
PS: I don't know if you were using it, but `DB.load_schedule_joined(db)` changed in a potentially breaking way:
- `title` was renamed to `segment` (to be consistent with the `caster` field).
- The `jst` field is transformed into a `ZonedDateTime` instead of just being a string.