
[Coding Test] Finding the Most Common Word in a Paragraph - re.sub, Counter object

주의 🐱 2024. 1. 13. 16:34

[Problem]

Ignoring case and punctuation such as commas in paragraph, return the most frequent word that is not in the banned list.

Example 1:

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.

Example 2:

Input: paragraph = "a.", banned = []
Output: "a"

 

[Approach]

1. Preprocess the input (re)

2. Count the occurrences (Counter)

3. return

 

[Explanation]

Replacing parts of a string with re.sub

re.sub('pattern', 'replacement', 'string', count)
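As a minimal sketch of the signature above: the sample string and variable names are illustrative assumptions, not from the original post. The optional count argument caps the number of replacements. It is passed by keyword here because recent Python versions deprecate passing it positionally.

```python
import re

text = "a-b-c-d"  # hypothetical sample input

# Replace every hyphen with a space (count defaults to 0, meaning "all").
all_replaced = re.sub(r"-", " ", text)

# Replace only the first two hyphens.
first_two = re.sub(r"-", " ", text, count=2)

print(all_replaced)  # a b c d
print(first_two)     # a b c-d
```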

A wide range of patterns can be built with regular expression syntax.

- denotes a range: [a-z] matches lowercase English letters.

^ at the start of a character class means "not": [^1] matches any character except '1', while [1^] matches '1' or '^'.

\w: any alphanumeric character or underscore,

\d: a digit ([0-9]),

\D: a non-digit character ([^0-9]),

. matches any character (except a newline by default),

\s: a whitespace character.

These can also be combined: [\s,.] matches any whitespace character, ',' or '.'.

\Aword: matches 'word' at the start of the string.
end\Z: matches 'end' at the end of the string.
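The character classes and anchors above can be tried out with a short sketch. All sample strings and variable names are hypothetical and chosen only for illustration.

```python
import re

# \d matches each digit.
digits = re.findall(r"\d", "Room 101, floor 3.")

# \w matches letters, digits, and the underscore.
word_chars = re.findall(r"\w", "a_1-")

# [^1] removes everything except '1'; [1^] removes '1' and '^'.
only_ones = re.sub(r"[^1]", "", "121a1")
without_one_or_caret = re.sub(r"[1^]", "", "1^2")

# [\s,.] matches a whitespace character, a comma, or a period.
underscored = re.sub(r"[\s,.]", "_", "a, b.")

# \A anchors at the start of the string, \Z at the end.
starts_with_word = re.search(r"\Aword", "word up") is not None
not_at_start = re.search(r"\Aword", "a word") is not None
ends_with_end = re.search(r"end\Z", "the end") is not None

print(digits)                # ['1', '0', '1', '3']
print(word_chars)            # ['a', '_', '1']
print(only_ones)             # 111
print(without_one_or_caret)  # 2
print(underscored)           # a__b_
print(starts_with_word, not_at_start, ends_with_end)  # True False True
```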

Repetition

* : ca*t matches 'ct' (zero 'a' characters), 'cat' (one 'a'), 'caaat' (three 'a' characters), and so on.
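A small sketch of the ca*t example, using re.fullmatch so the whole string has to match. The candidate list, including the non-matching 'cbt', is an assumption added for contrast.

```python
import re

pattern = re.compile(r"ca*t")

# 'cbt' is a hypothetical counterexample: 'b' is not allowed by the pattern.
candidates = ["ct", "cat", "caaat", "cbt"]

# Keep only the strings that the pattern matches in full.
matches = [s for s in candidates if pattern.fullmatch(s)]

print(matches)  # ['ct', 'cat', 'caaat']
```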

 

List comprehension

- One of the concise ways to build a list in Python

new_list = [expression for item in iterable if condition]

  • expression: the calculation or expression evaluated for each item.
  • item: the variable name bound to each element during iteration.
  • iterable: an iterable object (list, tuple, string, etc.).
  • condition (optional): a conditional expression that filters the items.

For example, to get the length of each line in a multi-line text:

text = """This is a
multiline
text."""

line_lengths = [len(line) for line in text.split('\n')]
print(line_lengths)  # [9, 9, 5]
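The example above leaves out the optional condition. As a hedged variation, the sketch below adds a blank line to the text (an assumption made for illustration) and uses the condition to skip it.

```python
# Hypothetical variant of the text with an empty second line.
text_with_blank = """This is a

multiline
text."""

# "if line" filters out empty strings before len() is applied.
non_empty_lengths = [len(line) for line in text_with_blank.split("\n") if line]

print(non_empty_lengths)  # [9, 9, 5]
```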

๋”ฐ๋ผ์„œ ์ด๋ฅผ ์ด์šฉํ•ด ๊ฐ ๋‹จ์–ด ๋ฆฌ์ŠคํŠธ ์ƒ์„ฑ

words = [word for word in re.sub(r'[^\w]', ' ', paragraph)
         .lower().split() if word not in banned]

['bob', 'a', 'ball', 'the', 'ball', 'flew', 'far', 'after', 'it', 'was']
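To make the one-liner easier to follow, here is a sketch that runs the same steps one at a time on Example 1. The intermediate variable names (cleaned, lowered, tokens) are assumptions introduced only for this breakdown.

```python
import re

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]

# Step 1: replace every non-word character (here ',' and '.') with a space.
cleaned = re.sub(r"[^\w]", " ", paragraph)

# Step 2: lowercase so that 'BALL' and 'ball' count as the same word.
lowered = cleaned.lower()

# Step 3: split on runs of whitespace; extra spaces produce no empty tokens.
tokens = lowered.split()

# Step 4: drop the banned words.
words = [word for word in tokens if word not in banned]

print(words)
# ['bob', 'a', 'ball', 'the', 'ball', 'flew', 'far', 'after', 'it', 'was']
```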

The Counter class in the collections module is a handy tool for counting elements in Python. A Counter object counts the occurrences of each element in an iterable (list, tuple, etc.).

from collections import Counter

my_list = [1, 2, 3, 1, 2, 3, 1, 2, 1, 1]
my_counter = Counter(my_list)

print(my_counter)  # Counter({1: 5, 2: 3, 3: 2})

most_common([n]): returns the elements and their counts, ordered from most to least frequent. If n is given, only the top n are returned.

print(my_counter.most_common(2))  # [(1, 5), (2, 3)]
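The same calls work on a list of words, which is how the problem uses them. In this sketch the shortened word list is a hypothetical sample. Note that elements with equal counts come back in the order they were first encountered.

```python
from collections import Counter

# Hypothetical, shortened word list for illustration.
sample_words = ["bob", "a", "ball", "the", "ball", "flew"]
word_counts = Counter(sample_words)

# most_common(1) returns a one-element list of (word, count) tuples.
top_word, top_count = word_counts.most_common(1)[0]

print(top_word, top_count)         # ball 2
print(word_counts.most_common(2))  # [('ball', 2), ('bob', 1)]
```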

 

 

[Answer]

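import collections
import re
from typing import List

class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # Replace non-word characters with spaces, lowercase, split, drop banned words.
        words = [word for word in re.sub(r'[^\w]', ' ', paragraph)
                 .lower().split() if word not in banned]
        # Count the remaining words and return the most frequent one.
        counts = collections.Counter(words)
        return counts.most_common(1)[0][0]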
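To check the answer outside the judge, the same logic can be wrapped in a standalone function and run on both examples. This is a sketch, not the original submission: the function name most_common_word is hypothetical, and converting banned to a set is a small change made here for faster membership tests.

```python
import re
from collections import Counter
from typing import List


def most_common_word(paragraph: str, banned: List[str]) -> str:
    # Assumption: a set is used instead of the list for O(1) lookups.
    banned_set = set(banned)
    words = [
        word
        for word in re.sub(r"[^\w]", " ", paragraph).lower().split()
        if word not in banned_set
    ]
    return Counter(words).most_common(1)[0][0]


result_1 = most_common_word(
    "Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]
)
result_2 = most_common_word("a.", [])

print(result_1)  # ball
print(result_2)  # a
```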
๋ฐ˜์‘ํ˜•