this post was submitted on 04 May 2025

Linguistics


(Cross posting from !stupid_questions@lemmings.world https://lemmings.world/post/25535229)

A brand new sentence is one that we consider to have never been spoken, written, or thought of before (at least as far as any record shows). Just swapping one noun for another (for example, the name of a place or person) may technically produce a new sentence, but it does not really match the spirit of a brand new sentence.

For the linguists: can you try to come up with a better estimate (better than just taking (all the words)^(average sentence length))? Maybe by describing the different forms of verbs (like we do in NLP; verbs which take a DP, a CP), then adding standard adjectives, and finishing with the remaining grammar (sorry if I am getting it all wrong, it has been a while since I took my intro to linguistics class). Also, consider a morpheme-less form. This exercise is for a more realistic guess.
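For reference, here is a minimal sketch of the naive baseline mentioned above, (all the words)^(average sentence length). The vocabulary size and sentence length are placeholder assumptions, not measured values, and `log10_naive_count` is a made-up helper name. It works in log10 because the raw number is too large to read comfortably.

```python
import math


def log10_naive_count(vocab_size: int, sentence_length: int) -> float:
    """Return log10 of vocab_size ** sentence_length.

    This counts every string of words of the given length, grammatical or not,
    so it is only a crude upper bound on the number of possible sentences.
    """
    return sentence_length * math.log10(vocab_size)


VOCAB_SIZE = 20_000    # assumption: illustrative vocabulary size
SENTENCE_LENGTH = 10   # assumption: illustrative sentence length in words

exponent = log10_naive_count(VOCAB_SIZE, SENTENCE_LENGTH)
print(f"roughly 10^{exponent:.0f} word strings of length {SENTENCE_LENGTH}")
```

Almost all of those strings are word salad, which is why a grammar-aware estimate should come out far smaller.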

[–] lvxferre@mander.xyz 1 points 2 weeks ago (1 children)

Poisson does make more sense, and it would be easier to work with. In that case the probability of a single sentence having a specific length n would be p = (λ^n)*[e^(-λ)] / n!; for English λ should be around 18 words/sentence.
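A quick sketch of that formula in Python, assuming λ = 18 as above (`poisson_pmf` is just an illustrative name, and the lengths printed are arbitrary picks):

```python
import math


def poisson_pmf(n: int, lam: float) -> float:
    """Probability that a Poisson(lam) variable equals n: lam^n * e^(-lam) / n!."""
    if n < 0:
        return 0.0
    # Compute in log space; lgamma(n + 1) equals log(n!) and avoids huge factorials.
    log_p = n * math.log(lam) - lam - math.lgamma(n + 1)
    return math.exp(log_p)


LAMBDA = 18.0  # assumed mean sentence length in words

for n in (5, 10, 18, 30):
    print(f"P(length = {n}) = {poisson_pmf(n, LAMBDA):.4f}")
```

Under this model, very short sentences come out as quite unlikely, so it is only as good as the assumed λ.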

English even has a loophole in ';', which is kind of like a full stop but does not really count as one. (I do not really know how this is classified properly in linguistics; my guess is that it would be a conjunction, but some overpowered kind that allows breaking the regular grammar rules.)

The semicolon is simply punctuation; a conjunction would be a word, like "and". Since the semicolon is mostly used to connect related albeit independent sentences, I think it's fair to treat it like a full stop.

Please correct me on stuff I got wrong, I am very new to this stuff.

So am I - my main area of interest is Historical Linguistics, so I'm completely clueless about this stuff. I never thought the statistics classes I got 20y ago in a Chemistry grad would help me with this, but here we are.

[–] sga 2 points 2 weeks ago (1 children)

English λ should be around 18 words/sentence

Isn't that really huge? Does an average sentence really have 18 words? I would love a source.

I never thought the statistics classes I got 20y ago in a Chemistry grad would help me with this, but here we are.

My statistics comes from QM 1 and 2 and optics classes.

[–] lvxferre@mander.xyz 2 points 2 weeks ago (1 children)

I remember reading this number from style manuals, but the sources I've found online are actually consistent with this number - this one for example claiming 15~20 words. It seems to vary an awful lot depending on the topic and the author, though; plus the source above is mostly prescriptive, so take it with a grain of salt.

[–] sga 2 points 2 weeks ago

My guess would have been something like 5-10 words (maybe 7). Maybe in literature it would be much higher, since the writing ability of people who write literature (technical or not) is much better than what an average person says. Averages have to include children under 10, even 5-year-olds, who might have a hard time stringing 10 words together in a logical manner. It still seems like a crazy fact to me.