Native speakers don't always talk the same way. The register they use varies dramatically depending on the context, and understanding all registers is important for full fluency.
In sociolinguistics, a register is a variety of language used for a particular purpose or particular communicative situation. For example, when speaking officially or in a public setting, an English speaker may be more likely to follow prescriptive norms for formal usage than in a casual setting, for example, by pronouncing words ending in -ing rather than -in, which is common in casual contexts. - Wikipedia
A few common examples:
Careful speech is the clearest, most deliberate way of speaking. Examples: news broadcasts, formal presentations, reading aloud, speaking to a non-native speaker.
Words are fully pronounced, grammar is standard, and the pace is moderate. This is the easiest register to understand, but also the most likely to contain embellished vocabulary.
Scripted speech is written dialogue that actors perform. Examples: TV shows, movies, audiobooks, radio dramas.
Scripted speech sounds more natural than careful speech, but it's still cleaner and more predictable than real conversation. Words are mostly clear, pacing is designed for the audience, and the vocabulary is chosen for entertainment value.
Casual speech is how native speakers actually talk to each other in real life. Examples: conversations with friends, vlogs, unscripted podcasts, overheard street conversations.
Casual speech is fast, words blend together, sounds get dropped, grammar bends, slang appears, and people interrupt each other, trail off, and change direction mid-sentence.
Casual speech is the hardest register to understand, but it's also the register you'll encounter most in real life.
Many learners who can understand TV shows are shocked when they can't understand a native speaker talking normally. This is because they've mostly been exposed to scripted speech. Phase 3 deliberately introduces progressively more casual content to bridge this gap, and Phase 3D (Understand Native Conversation) specifically targets casual, conversational speech through crosstalk and unscripted content.