The Wiert Corner – irregular stream of stuff

✨ Check out this awesome post from Hacker News 📖

📂 **Category**:

💡 **What You’ll Learn**:

Interesting article: [Wayback/Archive] Stack Overflow’s forum is dead thanks to AI, but the company’s still kicking… thanks to AI – Sherwood News. with this important quote:

The complex questions still get asked on Stack because there’s no other place. If the LLMs are only as good as the data, which is typically human curated, we’re one of the best places for that, if not the best for technology.

I wonder about how far it has declined now, and also think these are reasons for the decline as well:

Anyway, the graph in that post is just a sexy version of a query you can create yourself on the SEDE (Stack Exchange Data Explorer). That’s why I included both below.

Graph: Stack Overflow's Forum's Decline Started Years Ago, But Al Killed The Platform For Good; Horizontal data: year; Vertical data: Number Of Questions Asked [Monthly, including deleted questions]; Rise from 2008-24, peak during Covid-19 in 2021/2022, decline from 2024 when ChatGPT got launched

Graph: Stack Overflow’s Forum’s Decline Started Years Ago, But Al Killed The Platform For Good; Horizontal data: year; Vertical data: Number Of Questions Asked [Monthly, including deleted questions]; Rise from 2008-24, peak during Covid-19 in 2021/2022, decline from 2024 when ChatGPT got launched

If you are like me, you also want other measures, like seeing answers and comments: are these also dropping, if so do they drop at the same rate, and what’s the ratio over time of answers over questions, and comments over questions and answers?

Generating graphs like these yourself

If you want to regenerate a very similar graph, check out the query [Archive] Questions per month including deleted – Stack Exchange Data Explorer

SELECT
  DATEFROMPARTS(YEAR(CreationDate), MONTH(CreationDate), 1) AS [Month],
  COUNT(*) AS [Questions]
FROM PostsWithDeleted
WHERE 
  PostTypeId = 1
GROUP BY 
  DATEFROMPARTS(YEAR(CreationDate), MONTH(CreationDate), 1)
ORDER BY 
  [Month] ASC

That was my starting point after a quick search, and the base of all my queries below.

The above query resulted in this graph: [Archive] Questions per month including deleted – Stack Exchange Data Explorer (graph)

[Wayback/Archive] 536362203-4db75341-a0c7-4b64-a6ec-e67f80575005.png (1003×535)

Answers per month including deleted

The is a similar graph as the above one, but now showing answers instead of questions derived from the above Questions query: Answers per month including deleted – Stack Exchange Data Explorer:

[Wayback/Archive] 537280204-e8929110-74f7-4fd5-a327-aaf32d270914.png (983×566)

The query is different in just two places: the count(*) alias and the PostTypeId value.

SELECT
  DATEFROMPARTS(YEAR(CreationDate), MONTH(CreationDate), 1) AS [Month],
  COUNT(*) AS [Answers]
FROM PostsWithDeleted
WHERE 
  PostTypeId = 2
GROUP BY 
  DATEFROMPARTS(YEAR(CreationDate), MONTH(CreationDate), 1)
ORDER BY 
  [Month] ASC

That opened a quest to get this graph:

Combined graph adding all post types

What I needed for this is aggregate the various post types (question, answer) and group by month. This is a tad more complex than Count(*) where PostTypeId=# queries above.

So I did some digging, where the results… were NOT from the Stack Exchange network (:

I started looking for transaction types and months because I know that is a common scenario in databases. Google Search returned relevant results, but Duck Duck Go  didn’t.

  • [WaybackFailed/ArchiveSave] (Cannot be archived) sql count transaction types group by month – Google Search
  • [Wayback/Archive] sql count transaction types group by month at DuckDuckGo had no good results
  • [Wayback/Archive] sql query to generate counts by month (no more recent Wayback Machine archivals because of #ThisIsWhyWeCantHaveNiceThings with JavaScript and URL rewrites)

    You can do some things with “GROUPING” (not included below) to make the subtotals and totals more apparent but, considering you posted no schema nor data, this should get you started once you make the appropriate substitutions for table and column names as required…

     SELECT d.Company,
            d.[Year],
            SUM(CASE WHEN [Month] = 01 THEN Transactions ELSE 0 END) AS Jan,
            SUM(CASE WHEN [Month] = 02 THEN Transactions ELSE 0 END) AS Feb,
            SUM(CASE WHEN [Month] = 03 THEN Transactions ELSE 0 END) AS Mar,
            SUM(CASE WHEN [Month] = 04 THEN Transactions ELSE 0 END) AS Apr,
            SUM(CASE WHEN [Month] = 05 THEN Transactions ELSE 0 END) AS May,
            SUM(CASE WHEN [Month] = 06 THEN Transactions ELSE 0 END) AS Jun,
            SUM(CASE WHEN [Month] = 07 THEN Transactions ELSE 0 END) AS Jul,
            SUM(CASE WHEN [Month] = 08 THEN Transactions ELSE 0 END) AS Aug,
            SUM(CASE WHEN [Month] = 09 THEN Transactions ELSE 0 END) AS Sep,
            SUM(CASE WHEN [Month] = 10 THEN Transactions ELSE 0 END) AS Oct,
            SUM(CASE WHEN [Month] = 11 THEN Transactions ELSE 0 END) AS Nov,
            SUM(CASE WHEN [Month] = 12 THEN Transactions ELSE 0 END) AS [Dec],
            SUM(Transactions) AS Total,
            SUM(Transactions)/24 AS Average
       FROM (--Derived table "d" finds count for year and month
             SELECT Company,
                    [Year] = DATEPART(yy,TransactionDate),
                    [Month] = DATEPART(mm,TransactionDate)
                    Transactions = COUNT(*)
               FROM yourtable
              GROUP BY Company,
                       DATEPART(yy,TransactionDate),
                       DATEPART(mm,TransactionDate)
            ) d
      GROUP BY d.Company,
               d.[Year]
       WITH ROLLUP
      ORDER BY d.Company,
               d.[Year]

    …and, in a properly indexed environment, will process millions of rows in scant seconds (ie. 4 million rows in about 5-7 seconds).

    –Jeff Moden

  • [Wayback/Archive] How to Summarize Monthly Transactions by Country in SQL | Step-by-Step Guide | by Data Science Wallah | Medium

    In this tutorial, we will solve the SQL problem “Monthly Transactions I” from LeetCode. This problem will test your skills in grouping data by date and country, as well as filtering and aggregating transaction data based on specific conditions. This is an excellent problem for practicing SQL aggregation functions like COUNT()SUM(), and GROUP BY.

    Here’s the complete SQL query to solve the problem:

    SELECT DATE_FORMAT(trans_date, '%Y-%m') AS month,
           country,
           COUNT(*) AS trans_count,
           SUM(CASE WHEN state="approved" THEN 1 ELSE 0 END) AS approved_count,
           SUM(amount) AS trans_total_amount,
           SUM(CASE WHEN state="approved" THEN amount ELSE 0 END) AS approved_total_amount
    FROM Transactions
    GROUP BY DATE_FORMAT(trans_date, '%Y-%m'), country;

The tricks used/knowledge needed:

  • Combine SUM and CASE WHEN to count relevant values as there is nou COUNT(WHEN …)
  • GROUP BY cannot use aliases from the SELECT portion

So in order to get the counts for the various PostTypes, we need to add a column that counts for each PostTypes value.

The PostTypes are these (thanks [Wayback/Archive] Show all types – Stack Exchange Data Explorer, see more about it further below):

Id   PostType           
---- ------------------- 
1    Question            
2    Answer              
3    Wiki                
4    TagWikiExcerpt      
5    TagWiki             
6    ModeratorNomination 
7    WikiPlaceholder     
8    PrivilegeWiki       

That resulted in this query: Questions, answers and other post types per month including deleted – Stack Exchange Data Explorer (3 seconds)

SELECT
    DATEFROMPARTS(YEAR(CreationDate), MONTH(CreationDate), 1) AS [Month],
    SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END) AS [Questions],
    SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END) AS [Answers],
    SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END) AS [Wikis],
    SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END) AS [TagWikiExcerpts],
    SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END) AS [TagWikis],
    SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END) AS [ModeratorNominations],
    SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END) AS [WikiPlaceholders],
    SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END) AS [PrivilegeWikis],
    COUNT(*) AS [AllPostTypes]
FROM PostsWithDeleted
GROUP BY 
    DATEFROMPARTS(YEAR(CreationDate), MONTH(CreationDate), 1)
ORDER BY 
    [Month] ASC

with this graph:

Questions, Answers, Wikis TagWikiExcerpts, TagWikis, ModerationNominations, WikiPlaceholders, PrivilegeWikis, AllPostTypes including deleted ones per month

Questions, Answers, Wikis TagWikiExcerpts, TagWikis, ModerationNominations, WikiPlaceholders, PrivilegeWikis, AllPostTypes including deleted ones per month

[Wayback/Archive] 539268768-c1955f6e-70b2-4025-8a2b-02534b08f720.png (979×525)

This also gives a rough view of answers per question: until roughly 2018, that was above 1, and from 2020 on it got below 1.

Comments by month graph

In created this to try creating a second “combined” graph below.

The query is this: Comments count by Month – Stack Exchange Data Explorer

SELECT     DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1) 
                             AS [Month],
           COUNT(Id)         AS CommentIdCount
           
FROM       Comments 
GROUP BY   DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1)
ORDER BY   [Month] ASC

and the graph:

CommentIdCount by month

CommentIdCount by month

[Wayback/Archive] 537654387-b7af32a9-68ef-4b06-be89-a36dc620f105.png (979×523)

Combined graph adding comments to all post types

The above question/answer ratio estimate got me thinking: what about comments and the ratio to posts – how did these evolve over time?

The last query above took about 3 seconds. The final one about 7 seconds as it needs two more queries, as you can see in Comments for questions, answers and other posts per month including deleted – Stack Exchange Data Explorer:

WITH       PostCommentCounts AS
(
SELECT     COUNT(Id)         AS CommentIdCount, 
           PostId
FROM       Comments 
GROUP BY   PostId
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 
                                                           AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END) AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END) AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END) AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END) AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END) AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END) AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END) AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END) AS [PrivilegeWikis],
           SUM(PostCommentCounts.CommentIdCount)           AS [Comments],
           COUNT(PostsWithDeleted.Id)                      AS [AllPostTypes]
           
FROM       PostsWithDeleted
LEFT JOIN  PostCommentCounts
       ON  PostCommentCounts.PostId = PostsWithDeleted.Id

GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

with this graph:

Comments for the Questions, Answers, Wikis TagWikiExcerpts, TagWikis, ModerationNominations, WikiPlaceholders, PrivilegeWikis, AllPostTypes including deleted ones per month (correlated by post)

Comments for the Questions, Answers, Wikis TagWikiExcerpts, TagWikis, ModerationNominations, WikiPlaceholders, PrivilegeWikis, AllPostTypes including deleted ones per month (correlated by post)

[Wayback/Archive] 537322714-a43a36be-4a30-421a-8235-74d7d865a0cb.png (978×523)

I wasn’t really sure what the best way was to add the count of comments (which are in a different table) into the results. The above query correlates them on PostId. This could also be done by [Month]. Since they are correlated, part of the aggregation needs to be done in a separate query. Initially I thought this could be done with an inner select (officially called subquery), but didn’t get that to work quickly enough. So I opted for using a WITH (officially called common table expressions or CTE).

Since I hadn’t used SQL regularly for quite a while, I used these to get me going:

  1. [WaybackFail] sql sum inner select group by – Google Search
  2. [Wayback/Archive] sql sum inner select join group by at DuckDuckGo which gave me this part of the above solution:
  3. [Wayback/Archive] select – Trying to use INNER JOIN and GROUP BY SQL with SUM Function, Not Working – Stack Overflow (thanks [Wayback/Archive] Rob4236 and [Wayback/Archive] rory.ap)

    Q

    RES_DATA Contains my Customer as below

    CUSTOMER ID | NAME
    
    1, Robert
    2, John
    3, Peter
    

    INV_DATA Contains their INVOICES as Below

    INVOICE ID | CUSTOMER ID | AMOUNT
    
    100, 1, £49.95
    200, 1, £105.95
    300, 2, £400.00
    400, 3, £150.00
    500, 1, £25.00
    

    I am Trying to write a SELECT STATEMENT Which will give me the results as Below.

    CUSTOMER ID | NAME | TOTAL AMOUNT
    
    1, Robert, £180.90
    2, John, £400.00
    3, Peter, £150.00

    A

    SELECT a.[CUSTOMER ID], a.[NAME], SUM(b.[AMOUNT]) AS [TOTAL AMOUNT]
    FROM RES_DATA a INNER JOIN INV_DATA b
    ON a.[CUSTOMER ID]=b.[CUSTOMER ID]
    GROUP BY a.[CUSTOMER ID], a.[NAME]
    
  4. [Wayback/Archive] sql sum CTE join group by at DuckDuckGo
  5. [Wayback/Archive] sql – Using GROUP BY Inside a CTE for Aggregation – Stack Overflow (thanks [Wayback/Archive] leilanihagen and [Wayback/Archive] sacse) which confirmed a CTE solution works:
    WITH SumPerOrder (SalesOrderID, CalculatedSubTotalFromDetail)
    AS
    (
    SELECT
        SalesOrderID
        ,SUM(LineTotal)
    FROM AdventureWorks2014.Sales.SalesOrderDetail
    GROUP BY SalesOrderID
    )
    SELECT
        soh.SalesOrderID
        ,soh.SalesOrderNumber
        ,soh.SubTotal AS OriginalSubTotal
        ,spo.CalculatedSubTotalFromDetail
        ,(soh.SubTotal - spo.CalculatedSubTotalFromDetail) AS Difference
    FROM AdventureWorks2014.Sales.SalesOrderHeader soh
        INNER JOIN SumPerOrder spo
            ON soh.SalesOrderID = spo.SalesOrderID
  6. [Wayback/Archive] Sum colums in one table and join with another table : SQL was referenced from the first query and shows the same solution as the prior link:
    WITH sums AS
         ( SELECT consignmentid 
                , SUM(quantity) AS sum_qty
             FROM consignmentitemstbl
           GROUP
               BY consignmentid )
    SELECT c.consignmentid
         , c.[total volume]
         , c.[total weight]
         , COALESCE(sums.sum_qty,0) AS [total quantity]
      FROM consignmentstbl AS c
    LEFT OUTER
      JOIN sums
        ON sums.consignmentid = c.consignmentid
    

Combined graph adding comments to all post types

This took me a few tries as the first try failed with “Line 25: Invalid column name 'Month'.“:

WITH       PostCommentCounts AS
(
SELECT     DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1) 
                             AS [Month],
           COUNT(Id)         AS CommentIdCount
           
FROM       Comments 
GROUP BY   DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1)
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 
                                                           AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END) AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END) AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END) AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END) AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END) AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END) AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END) AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END) AS [PrivilegeWikis],
           SUM(PostCommentCounts.CommentIdCount)           AS [Comments],
           COUNT(PostsWithDeleted.Id)                      AS [AllPostTypes]
           
FROM       PostsWithDeleted
LEFT JOIN  PostCommentCounts
       ON  PostCommentCounts.[Month] = PostsWithDeleted.[Month]

GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

That happened to be the PostsWithDeleted.[Month] bit.

A new try failed as well, but with a “Line 1: Arithmetic overflow error converting expression to data type int.“:

WITH       PostCommentCounts AS
(
SELECT     DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1) 
                             AS [Month],
           COUNT(Id)         AS CommentIdCount
           
FROM       Comments 
GROUP BY   DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1)
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 
                                                           AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END) AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END) AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END) AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END) AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END) AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END) AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END) AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END) AS [PrivilegeWikis],
           SUM(PostCommentCounts.CommentIdCount)           AS [Comments],
           COUNT(PostsWithDeleted.Id)                      AS [AllPostTypes]
           
FROM       PostsWithDeleted
LEFT JOIN  PostCommentCounts
       ON  PostCommentCounts.[Month] = DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 

GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

That I could not pinpoint when reading the error message (as it indicated “Line 1”), but my gut feeling was either of these:

  1. the query inside PostCommentCounts CTE at the start with the most likely place the DATEFROMPARTS
  2. the SUM at “Line 12” which was confirmed by a quick search where I learned COUNT_BIG exists, but SUM_BIG does not. Thinking about that twice that made sense: COUNT_BIG is about counting so there cannot be a cast inside the COUNT expression, but SUM is about summing the inside of the expression and that’s where a CAST needs to take place.

The learning:

  1. [Wayback/Archive] Line 1: Arithmetic overflow error converting expression to data type int. at DuckDuckGo
  2. [Wayback/Archive] SQL Server : Arithmetic overflow error converting expression to data type int – Stack Overflow (thanks [Wayback/Archive] user2270544 – Stack Overflow , [Wayback/Archive] Jeff Johnston and [Wayback/Archive] John G)

    A

    Is the problem with SUM(billableDuration)? To find out, try commenting out that line and see if it works.

    It could be that the sum is exceeding the maximum int. If so, try replacing it with SUM(CAST(billableDuration AS BIGINT)).

    A

    Very simple:

    Use

    COUNT_BIG(*) AS NumStreams
  3. [Wayback/Archive] COUNT_BIG at DuckDuckGo
  4. [Wayback/Archive] COUNT_BIG (Transact-SQL) – SQL Server | Microsoft Learn
  5. [Wayback/Archive] COUNT() vs COUNT_BIG() in SQL Server: What’s the Difference?

    The difference is that COUNT() returns its result as an int, whereas COUNT_BIG() returns its result as a bigint.

    In other words, you’ll need to use COUNT_BIG() if you expect its results to be larger than 2,147,483,647 (i.e. if the query returns more than 2,147,483,647 rows).

    We can use the sp_describe_first_result_set stored procedure to check the return data type each of these functions.

    Check the Data Type for COUNT()

    EXEC sp_describe_first_result_set N'SELECT COUNT(*) FROM Fact.[Order]', null, 0;
    

    Result (using vertical output):

    is_hidden                    | 0
    column_ordinal               | 1
    name                         | NULL
    is_nullable                  | 1
    system_type_id               | 56
    system_type_name             | int
    max_length                   | 4
    precision                    | 10
    scale                        | 0
    …

    Check the Data Type for COUNT_BIG()

    For this example, all we need to do is replace COUNT(*) with COUNT_BIG(*):

    EXEC sp_describe_first_result_set N'SELECT COUNT_BIG(*) FROM Fact.[Order]', null, 0;
    

    Result (using vertical output):

    is_hidden                    | 0
    column_ordinal               | 1
    name                         | NULL
    is_nullable                  | 1
    system_type_id               | 127
    system_type_name             | bigint
    max_length                   | 8
    precision                    | 19
    scale                        | 0
    …

    By the way, a quicker way of doing the above is to combine both functions into query when calling the stored procedure.

    Like this:

    EXEC sp_describe_first_result_set N'SELECT COUNT(*), COUNT_BIG(*) FROM Fact.[Order]', null, 0;
    

    That taught me about a I think I knew would exist, but not its name:

  6. [Wayback/Archive] sp_describe_first_result_set at DuckDuckGo
  7. [Wayback/Archive] sp_describe_first_result_set (Transact-SQL) – SQL Server | Microsoft Learn

    Returns the metadata for the first possible result set of the Transact-SQL batch. Returns an empty result set if the batch returns no results. Raises an error if the Database Engine can’t determine the metadata for the first query that will be executed by performing a static analysis.

So I tried casting which failed with a much more specific location – “Line 3: Explicit conversion from data type date to bigint is not allowed.” indicating modifying the CTE wasn’t the solution:

WITH       PostCommentCounts AS
(
SELECT     CAST(DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1) AS BIGINT)
                             AS [MonthBIGINT],
           COUNT(Id)         AS CommentIdCount
           
FROM       Comments 
GROUP BY   DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1)
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 
                                                           AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END) AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END) AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END) AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END) AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END) AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END) AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END) AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END) AS [PrivilegeWikis],
           SUM(PostCommentCounts.CommentIdCount)           AS [Comments],
           COUNT(PostsWithDeleted.Id)                      AS [AllPostTypes]
           
FROM       PostsWithDeleted
LEFT JOIN  PostCommentCounts
       ON  PostCommentCounts.[MonthBIGINT] = CAST(DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) AS BIGINT)

GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

to split the grouping into [Year] and [Month] portions, then and cast the SUM parameter to use BIGINT:

WITH       PostCommentCounts AS
(
SELECT     YEAR(Comments.CreationDate)  AS [Year],
           MONTH(Comments.CreationDate) AS [Month],
           COUNT(Id)                    AS CommentIdCount
           
FROM       Comments 
GROUP BY   YEAR(Comments.CreationDate),
           MONTH(Comments.CreationDate)
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 
                                                                 AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END)       AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END)       AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END)       AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END)       AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END)       AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END)       AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END)       AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END)       AS [PrivilegeWikis],
           SUM(CAST(PostCommentCounts.CommentIdCount AS BIGINT)) AS [Comments],
           COUNT(PostsWithDeleted.Id)                            AS [AllPostTypes]
           
FROM       PostsWithDeleted
LEFT JOIN  PostCommentCounts
       ON  PostCommentCounts.[Year] = YEAR(PostsWithDeleted.CreationDate)
       AND PostCommentCounts.[Month] = MONTH(PostsWithDeleted.CreationDate)

GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

The resulting graph (and to a lesser extend the 7 second query duration) then made me slap my head: I should have used INNER JOIN instead of LEFT JOIN,which is short for LEFT OUTER JOIN, explaining the explosion of the count of Comments:

Explosion of Comments values (not rows) because of the LEFT JOIN

Explosion of Comments values (not rows) because of the LEFT JOIN

[Wayback/Archive] 537711393-9eeb1895-2536-434c-9274-ff0ea6245ce9.png (979×523)

This meant I could try to go back to the first query, the one resulting in “Line 1: Arithmetic overflow error converting expression to data type int.“, and replace the LEFT JOIN with INNER JOIN:

WITH       PostCommentCounts AS
(
SELECT     DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1) 
                             AS [Month],
           COUNT(Id)         AS CommentIdCount
           
FROM       Comments 
GROUP BY   DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1)
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 
                                                                 AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END)       AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END)       AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END)       AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END)       AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END)       AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END)       AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END)       AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END)       AS [PrivilegeWikis],
           SUM(CAST(PostCommentCounts.CommentIdCount AS BIGINT)) AS [Comments],
           COUNT(PostCommentCounts.CommentIdCount)               AS [PostCommentCountsCount],
           COUNT(PostsWithDeleted.Id)                            AS [AllPostTypes]
           
FROM       PostsWithDeleted
INNER JOIN PostCommentCounts
        ON PostCommentCounts.[Month] = DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 

GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

But the resulting graph also returned way too high values for Comments:

INNER JOIN also has way too high values for Comments.

INNER JOIN also has way too high values for Comments.

[Wayback/Archive] 537733862-c2f47291-107b-42fe-94ca-a3f53a2a88e3.png (980×495)

Back to the drawing board

My next try was to use a SUM on a subquery, but that failed with a “Line 22: Incorrect syntax near the keyword 'SELECT'. Incorrect syntax near ')'.“:

WITH       PostCommentCounts AS
(
SELECT     DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1) 
                             AS [Month],
           COUNT(Id)         AS CommentIdCount
           
FROM       Comments 
GROUP BY   DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1)
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 
                                                                 AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END)       AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END)       AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END)       AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END)       AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END)       AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END)       AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END)       AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END)       AS [PrivilegeWikis],
           SUM
           (
               SELECT PostCommentCounts.CommentIdCount
               FROM   PostCommentCounts
               WHERE  PostCommentCounts.[Month] = PostsWithDeleted.[Month]
           )
                                                                 AS [Comments],
           COUNT(PostsWithDeleted.Id)                            AS [AllPostTypes]
           
FROM       PostsWithDeleted
GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

Back then, I confused the order of SUM and SELECT, and I didn’t understand the ')' part of that error, but did some digging:

  • [Wayback/Archive] sql server subquery inside sum at DuckDuckGo (because SEDE is based on SQL Azure, which is based on SQL Server)
  • [Wayback/Archive] “sql” “subquery” inside “sum” at DuckDuckGo (going more generic as the above query didn’t get useful results)
  • [Wayback/Archive] Why SQL subquery doesn’t work inside SUM function? – Stack Overflow (thanks [Wayback/Archive] Trinh Cuong, [Wayback/Archive] Thorsten Kettner and [Wayback/Archive] Gordon Linoff)

    Q

    I’m learning SQL, and the lesson is subquery. My query is:

    select sum (select tientra from thang7_8714 where tientra > 0) as tmp;
    

    But Postgres notice me the message:

    ERROR:  syntax error at or near "select" LINE 1: select sum (select
      tientra from thang7_8714 where tientra > ...
      SQL state: 42601 Character: 13

    A

    SUM with a subquery

    SUM wants one parameter. This can be a fixed value (e.g. 123) or a column (e.g. mycolumn) or an expression (e.g. 123 * mycolumn) or a subquery. But this subquery would have to be scalar, which means it returns only one value.

    A subquery is surrounded by parentheses, so such SUM with a subquery would look like this:

    select sum( (select t2.value from t2 where t2.id = t1.id_t1) )
    from t1;
    

    But subqueries inside aggregate functions are extremely rare, because we can achieve the same with a join (e.g. select sum(t2.value) from t1 join t2 on t2.id = t1.id_t2;).

    Typical places for subqueries:

    that answer finished with a list (including examples) of places where subqueries usually are used, and was followed by this answer:

    A

    In general, SQL does not allow aggregation functions to have arguments that are subqueries. The generic solution is to move the aggregation inside the subquery:

    select (select sum(tientra) from thang7_8714 where tientra > 0) as tmp;
    

    Presumably, you know that the more canonical method for solving this is:

    select sum(tientra) as tmp
    from thang7_8714
    where tientra > 0;
    

    This shows:

    1. that PostgreSQL has a better error message than the SEDE (which is based on SQL Azure, which simplified is the a very recent SQL Server version running on Microsoft Azure)
    2. the conditions an aggregate expression needs to meet
    3. an alternative

In retrospect, it is obviously why a SUM needs a single value in its expression: it will aggregate (in this case summate) the single value for each occurance in the current GROUP BY.

Even adding an extra set of parenthesis as suggested in the above answer does not resolve the “single value” problem: it just returns either a less specific answer “Something unexpected went wrong while running your query.” or “Line 25: Cannot perform an aggregate function on an expression containing an aggregate or a subquery.“:

WITH       PostCommentCounts AS
(
SELECT     DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1) 
                             AS [Month],
           COUNT(Id)         AS CommentIdCount
           
FROM       Comments 
GROUP BY   DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1)
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1) 
                                                                 AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END)       AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END)       AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END)       AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END)       AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END)       AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END)       AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END)       AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END)       AS [PrivilegeWikis],
           SUM
           ((
               SELECT PostCommentCounts.CommentIdCount
               FROM   PostCommentCounts
               WHERE  PostCommentCounts.[Month] = PostsWithDeleted.[Month]
           ))
                                                                 AS [Comments],
           COUNT(PostsWithDeleted.Id)                            AS [AllPostTypes]
           
FROM       PostsWithDeleted
GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

That means the SUM needs to be inside the SELECT, so I basically swapped the SUM and SELECT parts of it and this worked (In addition, as aliases apparently of the superquery are not allowed in the subquery causing the ambiguous error message “Line 23: Invalid column name 'Month'.“, I needed to replace PostsWithDeleted.[Month] with DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1):

WITH       PostCommentCounts AS
(
SELECT     DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1) 
                             AS [Month],
           COUNT(Id)         AS CommentIdCount
           
FROM       Comments 
GROUP BY   DATEFROMPARTS(YEAR(Comments.CreationDate), MONTH(Comments.CreationDate), 1)
)

SELECT     DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
                                                                 AS [Month],
           SUM(CASE WHEN PostTypeId = 1 THEN 1 ELSE 0 END)       AS [Questions],
           SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END)       AS [Answers],
           SUM(CASE WHEN PostTypeId = 3 THEN 1 ELSE 0 END)       AS [Wikis],
           SUM(CASE WHEN PostTypeId = 4 THEN 1 ELSE 0 END)       AS [TagWikiExcerpts],
           SUM(CASE WHEN PostTypeId = 5 THEN 1 ELSE 0 END)       AS [TagWikis],
           SUM(CASE WHEN PostTypeId = 6 THEN 1 ELSE 0 END)       AS [ModeratorNominations],
           SUM(CASE WHEN PostTypeId = 7 THEN 1 ELSE 0 END)       AS [WikiPlaceholders],
           SUM(CASE WHEN PostTypeId = 8 THEN 1 ELSE 0 END)       AS [PrivilegeWikis],
           (
               SELECT SUM(PostCommentCounts.CommentIdCount)
               FROM   PostCommentCounts
               WHERE  PostCommentCounts.[Month] = DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
           )
                                                                 AS [Comments],
           COUNT(PostsWithDeleted.Id)                            AS [AllPostTypes]
           
FROM       PostsWithDeleted
GROUP BY   DATEFROMPARTS(YEAR(PostsWithDeleted.CreationDate), MONTH(PostsWithDeleted.CreationDate), 1)
ORDER BY   [Month] ASC

More importantly, the graph was also correct:

Comments for the Questions, Answers, Wikis TagWikiExcerpts, TagWikis, ModerationNominations, WikiPlaceholders, PrivilegeWikis, AllPostTypes including deleted ones per month (correlated by month)

Comments for the Questions, Answers, Wikis TagWikiExcerpts, TagWikis, ModerationNominations, WikiPlaceholders, PrivilegeWikis, AllPostTypes including deleted ones per month (correlated by month)

[Wayback/Archive] 537739041-0bbe4aef-977d-48f6-87ce-95e646b08b1b.png (981×520)

And the graph worries me of the future: besides questions and answers dropping, also the ratio of answers/question and comments/post are dropping. Bo the ratios are important measures on interaction, and I think less interaction means less quality.

Comparing the queries

Note that I intentionally left the order of the columns the same in all queries so – sorry if you are not colorblind – you can compare the data across the graphs except for the ones in the graph “CommentIdCount by month”.

I wish the SEDE would offer other options than colours to distinguish column data, but they don’t.

If you want build upon these queries multiple times in various directions, then the SEDE by default (because it prefers linear history) it allows just one fork, but in practice you can by realising that the Query Stack Overflow – Stack Exchange Data Explorer: fork “Questions per month including deleted” has this URL

data.stackexchange.com/stackoverflow/query/fork/1932522

In a similar way, editing a query has a URL like this:

data.stackexchange.com/stackoverflow/query/edit/1933447

In fact there are at least 6 URL forms for query:

All of these can have various fragments to switch the results view into various modes or direct focus to various inputs:

Type of SEDE query fragments (including their #)
Fragment Meaning
#result Results view
#resultSets Results view
#messages Messages view
#graph Graph view
#executionPlan Execution plan view (only visible when “Include execution plan” is checked before running the query)
#edit-query-description Query description text area
#sql SQL text area
#query-params Query parameters edits (only visible when there are query parameters)
#site-selector Site selector pane
#switch-sites Site selector query input

There are more fragments, but these do not set focus.

I got to the above tables because these were in my query history:

The above URL table also means that you can edit anyone elses queries (in essence creating a fork of it) like data.stackexchange.com/stackoverflow/query/edit/1932522

I did and ended up with the above four forks all stemming from query 1932522: Questions per month including deleted – Stack Exchange Data Explorer:

  1. 1933385: Answers per month including deleted – Stack Exchange Data Explorer
  2. 1933447: Questions, answers and other post types per month including deleted – Stack Exchange Data Explorer
    1. 1933489: Comments for questions, answers and other posts per month including deleted – Stack Exchange Data Explorer
    2. 1933447: Comments for questions, answers and other post types per month incl. deleted (correlated by month) – Stack Exchange Data Explorer

Of course, I could have forked 2.2. from 2.1. but then I could not have shown you can fork your own queries.

What I did was forking Questions, answers and other post types per month including deleted – Stack Exchange Data Explorer to create a second history tree by editing

data.stackexchange.com/stackoverflow/revision/1933447/2375006/questions-answers-and-other-post-types-per-month-including-deleted

into

data.stackexchange.com/stackoverflow/query/fork/1933447

So if I want to fork Comments for questions, answers and other posts per month including deleted – Stack Exchange Data Explorer as well, now i know how!

Stack Exchange Data Explorer (SEDE)

The above graphs were made using the Stack Exchange Data Explorer (SEDE).

Some links on it’s basics:

SEDE used to be a great playground to learn SQL, but the current rate limiting makes it less useful for that. That means that by now, the best way to use it is with good enough SQL knowledge and a grasp of the Stack Exchange data model. For the last part, these links will help:

A few more useful links on SEDE:

Note that Stack Exchange is giving both the Wayback Machine and Archive.is a hard time archiving SEDE links, see for instance the [Wayback] of the last linked query above. That’s a pity, as with the decline of Stack Exchange, it might go off-line one day taking all kinds of historically relevant data with it.

Via

  1. [Wayback/Archive] Post by @wiert.bsky.social — Bluesky: Source
  2. [Wayback/Archive] Sherwood News Stack Overflow’s Forum’s Decline Started Years Ago, But Al Killed The Platform For Good at DuckDuckGo

--jeroen

PS: My favorite queries are listed at User Jeroen.Wiert.Pluimers – Stack Exchange Data Explorer. Regrettably, this long cannot be archived, so I quoted the list at the time of finishing this blog post:

🔥 **What’s your take?**
Share your thoughts in the comments below!

#️⃣ **#Wiert #Corner #irregular #stream #stuff**

🕒 **Posted on**: 1779820774

🌟 **Want more?** Click here for more info! 🌟

By

Leave a Reply

Your email address will not be published. Required fields are marked *