Proofreading task: find all duplicate words in a multilingual text (retired)

Description:

One of a proofreader’s basic tasks is to catch words that have been inadvertently repeated in the text being edited. Almost all word-processing software provides a tool for this: for instance, Word for Windows has the ‘Find’ command, which can match a wide range of letter characters in various languages.

Your task is to create a simpler tool that captures all duplicate words in a text containing characters from the English, French, German, Danish, Swedish and Spanish alphabets, and returns every identified duplicate as a pair (e.g. ‘Søren Søren’) in a single array. You are provided with a short text that contains a number of duplicate words, so, essentially, you’re given a bunch of test cases to try out.

RESTRICTIONS AND THINGS TO NOTE:

  1. Your tool must be capable of capturing all duplicate words, so long as the repeated words are separated by one or more spaces. The spaces will be preserved in the results.
  2. Your proofing tool must capture only entire words, so you must make sure that it doesn’t capture repeated blocks of numbers (such as ‘2225 2225’).
  3. Your proofing tool must be capable of capturing duplicate words regardless of case (e.g. it should catch both ‘Two two’ and ‘two two’) and regardless of the number of spaces that separate them.
  4. As in real life, whether the duplicate words should be deleted or not is up to the proofreader to decide. So your proofing tool should neither delete the duplicate words it captures nor replace them with something else (for instance, a proofreader would not delete the second ‘really’ in ‘He’s really really nice’).
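
The restrictions above can be sketched as a single regular expression. This is a minimal sketch and not the kata's reference solution: the function name `findDuplicateWords` is hypothetical, and the code assumes an ES2018 runtime, because it relies on Unicode property escapes (`\p{L}`, `\p{N}`) and lookbehind assertions.

```javascript
// Hypothetical helper; the name and signature are not prescribed by the kata.
function findDuplicateWords(text) {
  // (?<![\p{L}\p{N}])  the word must not continue to the left
  // (\p{L}+)           letters only, so '2225 2225' is never captured
  //  +\1               one or more spaces, then the same word (the i flag ignores case)
  // (?![\p{L}\p{N}])   the repeated word must not continue to the right
  const pattern = /(?<![\p{L}\p{N}])(\p{L}+) +\1(?![\p{L}\p{N}])/giu;
  // With the g flag, match() returns the whole matches (spaces included)
  // and leaves the input text untouched.
  return text.match(pattern) || [];
}

const sample = "Two two words, 2225 2225, Søren  Søren and never ever ever.";
console.log(findDuplicateWords(sample));
// [ 'Two two', 'Søren  Søren', 'ever ever' ]
```

The function only reports what it finds, so the decision to delete a repetition stays with the proofreader. The sketch also assumes precomposed characters (for example ‘é’ as one code point); text that uses combining marks would need `\p{M}` added to the letter class.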

TIPS:

  1. JavaScript still has a strong Anglo-Saxon bias, which means that capturing characters from non-English alphabets isn’t always as straightforward as it ought to be. So you may have to think of a workaround for some pesky foreign characters.
  2. The proofing tool has real-life practical uses: it represents a (basic) solution to a real-life problem that proofreaders who work online often face. Luckily, they’re unlikely to come across texts with as many repetitions as the text you’re invited to check here.
  3. The proofing tool is expandable and thus versatile: once you've found the solution, you may want to experiment with different samples of text. Note, however, that if you do change the sample text, you may (or may not) have to modify your solution for it to work.
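
To illustrate the first tip: in JavaScript, `\w` and `\b` only know the ASCII word characters, so a naive pattern such as `/\b(\w+) +\1\b/gi` splits ‘Søren’ at the ‘ø’ and misses the pair. One possible workaround, sketched below, spells out the letter ranges by hand. The names `LETTERS` and `findDuplicatesLegacy` are hypothetical, and the ranges rest on the assumption that every letter in the text is a precomposed character from Basic Latin or the Latin-1 Supplement block.

```javascript
// Assumption: A-Z, a-z plus the Latin-1 Supplement letters (U+00C0 to U+00FF,
// skipping the multiplication and division signs) cover the text being checked.
const LETTERS = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF";

// Hypothetical helper that avoids \b, \w, lookbehind and the u flag.
function findDuplicatesLegacy(text) {
  const pattern = new RegExp(
    "(^|[^" + LETTERS + "0-9])" +       // group 1: start of text or a non-word character
    "(([" + LETTERS + "]+) +\\3)" +     // group 2: the pair; group 3: the repeated word
    "(?![" + LETTERS + "0-9])",         // the second word must end here
    "gi"
  );
  const found = [];
  let m;
  // Collect group 2 only, so the leading boundary character is not reported.
  while ((m = pattern.exec(text)) !== null) {
    found.push(m[2]);
  }
  return found;
}

console.log("Søren Søren".match(/\b(\w+) +\1\b/gi)); // null: the naive pattern misses it
console.log(findDuplicatesLegacy('Weiß Weiß told Søren Søren: "La côte côte des 3579 3579 esprits"'));
// [ 'Weiß Weiß', 'Søren Søren', 'côte côte' ]
```

If the sample text is changed to include letters outside these ranges (for example Polish or Czech characters), the `LETTERS` string would have to be extended accordingly.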

The text you're invited to check is included in the solution, but I've pasted it below in a more readable format.

Good luck, Viel Glück!

TEXT:

'Prof. Süsskind Süsskind and fast panting Jin-Jin, his tail tense, stepped forward with trepidation. The sudden rumble and roar of of the heaving ground made Salomé Weiß Weiß duck instinctively - and just in time to avoid the boulder that crashed with a thud inches above her head. "That was close", whispered Søren Søren Årndt to Jan Lærke Lærke, but the latter stood transfixed: the collapsed rocks had revealed a mysterious inscription: " La côte côte des 3579 3579 esprits oubliés oubliés sont en Åland Åland" stuttered Señor François, as was his wont. Aríaðna Aríaðna cast an annoyed glance at him. "Indeed, Señor Señor François François", she mocked. "I have never ever ever seen anything like it", murmured Baron Pitoëff Pitoëff (that was actually his name).'

Regular Expressions
Fundamentals

Stats:

Created: Apr 6, 2018
Warriors Trained: 65
Total Skips: 1
Total Code Submissions: 204
Total Times Completed: 13
JavaScript Completions: 13
Total Stars: 3
% of votes with a positive feedback rating: 29% of 7
Total "Very Satisfied" Votes: 0
Total "Somewhat Satisfied" Votes: 4
Total "Not Satisfied" Votes: 3
Total Rank Assessments: 7
Average Assessed Rank: 6 kyu
Highest Assessed Rank: 5 kyu
Lowest Assessed Rank: 8 kyu
Contributors
  • Art67