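Some time ago I tried Infinispan's feature that serves as an implementation of Lucene's Directory. I'm not very familiar with Lucene itself, though, so I decided to take this opportunity to study it a bit.
I also use Solr at work, albeit indirectly.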
Apache Lucene
http://lucene.apache.org/core/index.html
So I'll start by experimenting with Analyzers. The programs are written in Scala.
First, build.sbt.
name := "lucene-analyzers"

version := "0.0.1-SNAPSHOT"

scalaVersion := "2.10.1"

organization := "littlewings"

scalacOptions += "-deprecation"

libraryDependencies ++= Seq(
  "org.apache.lucene" % "lucene-core" % "4.3.0",
  "org.apache.lucene" % "lucene-analyzers-common" % "4.3.0",
  "org.apache.lucene" % "lucene-analyzers-kuromoji" % "4.3.0"
)
The Lucene version is 4.3, and the Kuromoji analyzer module is included as well.
Now, the sample code.
src/main/scala/LuceneAnalyzers.scala
import java.io.StringReader

import org.apache.lucene.analysis.{Analyzer, TokenStream}
import org.apache.lucene.analysis.cjk.CJKAnalyzer
import org.apache.lucene.analysis.core.WhitespaceAnalyzer
import org.apache.lucene.analysis.core.KeywordAnalyzer
import org.apache.lucene.analysis.ja.{JapaneseAnalyzer, JapaneseTokenizer}
import org.apache.lucene.analysis.ja.tokenattributes.{BaseFormAttribute, PartOfSpeechAttribute, ReadingAttribute, InflectionAttribute}
import org.apache.lucene.analysis.ja.dict.UserDictionary
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.analysis.tokenattributes.{CharTermAttribute, OffsetAttribute, PositionIncrementAttribute, TypeAttribute}
import org.apache.lucene.util.Version

object LuceneAnalyzers {
  def main(args: Array[String]): Unit = {
    val luceneVersion = Version.LUCENE_43

    val texts = List(
      "すもももももももものうち。",
      "メガネは顔の一部です。",
      "日本経済新聞でモバゲーの記事を読んだ。",
      "Java, Scala, Groovy, Clojure",
      "LUCENE、SOLR、Lucene, Solr",
      "アイウエオカキクケコさしすせそABCXYZ123456",
      "Lucene is a full-featured text search engine library written in Java."
    )

    // Create the Analyzer you want to try with new and pass it here
    // (StandardAnalyzer is just one example)
    usingTokenStream(new StandardAnalyzer(luceneVersion), texts: _*)(displayTokens)
  }

  def usingTokenStream(analyzer: Analyzer, texts: String*)(body: (String, TokenStream) => Unit): Unit = {
    println(s"Analyzer => ${analyzer.getClass.getName} Start")

    for (text <- texts) {
      val reader = new StringReader(text)
      val tokenStream = analyzer.tokenStream("", reader)

      try {
        body(text, tokenStream)
      } finally {
        tokenStream.close()
      }
    }

    println(s"Analyzer => ${analyzer.getClass.getName} End")
    println()
  }

  def displayTokens(text: String, tokenStream: TokenStream): Unit = {
    val charTermAttr = tokenStream.addAttribute(classOf[CharTermAttribute])
    val offsetAttr = tokenStream.addAttribute(classOf[OffsetAttribute])
    val positionIncrementAttr = tokenStream.addAttribute(classOf[PositionIncrementAttribute])
    val typeAttr = tokenStream.addAttribute(classOf[TypeAttribute])

    // JapaneseAnalyzer's extra information can't be retrieved without these
    // Kuromoji Additional Attributes
    val baseFormAttr = tokenStream.addAttribute(classOf[BaseFormAttribute])
    val partOfSpeechAttr = tokenStream.addAttribute(classOf[PartOfSpeechAttribute])
    val readingAttr = tokenStream.addAttribute(classOf[ReadingAttribute])
    val inflectionAttr = tokenStream.addAttribute(classOf[InflectionAttribute])

    println("<<==========================================")
    println(s"input text => $text")
    println("============================================")

    tokenStream.reset()

    while (tokenStream.incrementToken()) {
      val startOffset = offsetAttr.startOffset
      val endOffset = offsetAttr.endOffset
      val token = charTermAttr.toString
      val posInc = positionIncrementAttr.getPositionIncrement
      val tpe = typeAttr.`type`

      // Kuromoji Additional Attributes
      val baseForm = baseFormAttr.getBaseForm
      val partOfSpeech = partOfSpeechAttr.getPartOfSpeech
      val reading = readingAttr.getReading
      val pronunciation = readingAttr.getPronunciation
      val inflectionForm = inflectionAttr.getInflectionForm
      val inflectionType = inflectionAttr.getInflectionType

      println(s"token: $token, startOffset: $startOffset, endOffset: $endOffset, posInc: $posInc, type: $tpe")

      // Only Kuromoji sets a part of speech, so print the extra line only when it exists
      if (partOfSpeech != null) {
        println(s"baseForm: $baseForm, partOfSpeech: $partOfSpeech, reading: $reading, pronunciation: $pronunciation, inflectionForm: $inflectionForm, inflectionType: $inflectionType")
      }
    }

    tokenStream.end()

    println("==========================================>>")
  }
}
You get a TokenStream from an Analyzer by calling its tokenStream method with a Reader.
val tokenStream = analyzer.tokenStream("", reader)

try {
  body(text, tokenStream)
} finally {
  tokenStream.close()
}
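Closing a TokenStream once you are done with it seems to be the rule; the TokenStream Javadoc lists close as the last step of its workflow, to release resources.
The flow appears to be: call reset on the TokenStream, then advance through the tokens one at a time with incrementToken.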
tokenStream.reset()

while (tokenStream.incrementToken()) {
  // per-token processing
}

tokenStream.end()
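When incrementToken returns false, call TokenStream#end. In this sample, the information available from the various Attributes is printed to the console.
Now, with the following strings as input,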
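To make the reset, incrementToken, end, close order concrete without pulling in Lucene, here is a toy model in plain Scala. `ToyTokenStream`, `ToyWhitespaceStream` and `ToyConsumer` are hypothetical names made up for this sketch, not Lucene classes; the real TokenStream exposes its data through Attributes rather than a `term` method, and the "must reset first" check here is only a simplified stand-in for Lucene's own contract enforcement.

```scala
// Toy model of the TokenStream consumption contract (hypothetical, not Lucene's API).
trait ToyTokenStream {
  def reset(): Unit
  def incrementToken(): Boolean
  def term: String
  def end(): Unit
  def close(): Unit
}

// Hypothetical whitespace tokenizer that follows the toy contract.
class ToyWhitespaceStream(text: String) extends ToyTokenStream {
  private var tokens: List[String] = Nil
  private var current: String = null
  private var wasReset = false

  def reset(): Unit = {
    tokens = text.split("\\s+").filter(_.nonEmpty).toList
    wasReset = true
  }

  def incrementToken(): Boolean = {
    require(wasReset, "reset() must be called before incrementToken()")
    tokens match {
      case head :: tail => current = head; tokens = tail; true
      case Nil          => false
    }
  }

  def term: String = current
  def end(): Unit = current = null
  def close(): Unit = wasReset = false
}

object ToyConsumer {
  // Same shape as displayTokens: reset, loop, end, and close in finally.
  def collect(ts: ToyTokenStream): List[String] =
    try {
      ts.reset()
      val buf = scala.collection.mutable.ListBuffer[String]()
      while (ts.incrementToken()) buf += ts.term
      ts.end()
      buf.toList
    } finally ts.close()
}
```

For example, `ToyConsumer.collect(new ToyWhitespaceStream("Lucene is a full-featured"))` yields the four whitespace-separated words, while calling `incrementToken()` on a fresh stream without `reset()` fails.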
val texts = List(
  "すもももももももものうち。",
  "メガネは顔の一部です。",
  "日本経済新聞でモバゲーの記事を読んだ。",
  "Java, Scala, Groovy, Clojure",
  "LUCENE、SOLR、Lucene, Solr",
  "アイウエオカキクケコさしすせそABCXYZ123456",
  "Lucene is a full-featured text search engine library written in Java."
)
As the comment says, create an instance of the Analyzer you want to try with new and pass it in.
usingTokenStream(/* create the Analyzer here with new and pass it in */, texts: _*)(displayTokens)
StandardAnalyzer
As the name says, this is the standard Analyzer: a StandardTokenizer combined with StandardFilter, LowerCaseFilter, and StopFilter.
StandardAnalyzer
http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html
usingTokenStream(new StandardAnalyzer(luceneVersion), texts: _*)(displayTokens)
Running it with this sample gives the following result.
Analyzer => org.apache.lucene.analysis.standard.StandardAnalyzer Start
<<==========================================
input text => すもももももももものうち。
============================================
token: す, startOffset: 0, endOffset: 1, posInc: 1, type: <HIRAGANA>
token: も, startOffset: 1, endOffset: 2, posInc: 1, type: <HIRAGANA>
token: も, startOffset: 2, endOffset: 3, posInc: 1, type: <HIRAGANA>
token: も, startOffset: 3, endOffset: 4, posInc: 1, type: <HIRAGANA>
token: も, startOffset: 4, endOffset: 5, posInc: 1, type: <HIRAGANA>
token: も, startOffset: 5, endOffset: 6, posInc: 1, type: <HIRAGANA>
token: も, startOffset: 6, endOffset: 7, posInc: 1, type: <HIRAGANA>
token: も, startOffset: 7, endOffset: 8, posInc: 1, type: <HIRAGANA>
token: も, startOffset: 8, endOffset: 9, posInc: 1, type: <HIRAGANA>
token: の, startOffset: 9, endOffset: 10, posInc: 1, type: <HIRAGANA>
token: う, startOffset: 10, endOffset: 11, posInc: 1, type: <HIRAGANA>
token: ち, startOffset: 11, endOffset: 12, posInc: 1, type: <HIRAGANA>
==========================================>>
<<==========================================
input text => メガネは顔の一部です。
============================================
token: メガネ, startOffset: 0, endOffset: 3, posInc: 1, type: <KATAKANA>
token: は, startOffset: 3, endOffset: 4, posInc: 1, type: <HIRAGANA>
token: 顔, startOffset: 4, endOffset: 5, posInc: 1, type: <IDEOGRAPHIC>
token: の, startOffset: 5, endOffset: 6, posInc: 1, type: <HIRAGANA>
token: 一, startOffset: 6, endOffset: 7, posInc: 1, type: <IDEOGRAPHIC>
token: 部, startOffset: 7, endOffset: 8, posInc: 1, type: <IDEOGRAPHIC>
token: で, startOffset: 8, endOffset: 9, posInc: 1, type: <HIRAGANA>
token: す, startOffset: 9, endOffset: 10, posInc: 1, type: <HIRAGANA>
==========================================>>
<<==========================================
input text => 日本経済新聞でモバゲーの記事を読んだ。
============================================
token: 日, startOffset: 0, endOffset: 1, posInc: 1, type: <IDEOGRAPHIC>
token: 本, startOffset: 1, endOffset: 2, posInc: 1, type: <IDEOGRAPHIC>
token: 経, startOffset: 2, endOffset: 3, posInc: 1, type: <IDEOGRAPHIC>
token: 済, startOffset: 3, endOffset: 4, posInc: 1, type: <IDEOGRAPHIC>
token: 新, startOffset: 4, endOffset: 5, posInc: 1, type: <IDEOGRAPHIC>
token: 聞, startOffset: 5, endOffset: 6, posInc: 1, type: <IDEOGRAPHIC>
token: で, startOffset: 6, endOffset: 7, posInc: 1, type: <HIRAGANA>
token: モバゲー, startOffset: 7, endOffset: 11, posInc: 1, type: <KATAKANA>
token: の, startOffset: 11, endOffset: 12, posInc: 1, type: <HIRAGANA>
token: 記, startOffset: 12, endOffset: 13, posInc: 1, type: <IDEOGRAPHIC>
token: 事, startOffset: 13, endOffset: 14, posInc: 1, type: <IDEOGRAPHIC>
token: を, startOffset: 14, endOffset: 15, posInc: 1, type: <HIRAGANA>
token: 読, startOffset: 15, endOffset: 16, posInc: 1, type: <IDEOGRAPHIC>
token: ん, startOffset: 16, endOffset: 17, posInc: 1, type: <HIRAGANA>
token: だ, startOffset: 17, endOffset: 18, posInc: 1, type: <HIRAGANA>
==========================================>>
<<==========================================
input text => Java, Scala, Groovy, Clojure
============================================
token: java, startOffset: 0, endOffset: 4, posInc: 1, type: <ALPHANUM>
token: scala, startOffset: 6, endOffset: 11, posInc: 1, type: <ALPHANUM>
token: groovy, startOffset: 13, endOffset: 19, posInc: 1, type: <ALPHANUM>
token: clojure, startOffset: 21, endOffset: 28, posInc: 1, type: <ALPHANUM>
==========================================>>
<<==========================================
input text => LUCENE、SOLR、Lucene, Solr
============================================
token: lucene, startOffset: 0, endOffset: 6, posInc: 1, type: <ALPHANUM>
token: solr, startOffset: 7, endOffset: 11, posInc: 1, type: <ALPHANUM>
token: lucene, startOffset: 12, endOffset: 18, posInc: 1, type: <ALPHANUM>
token: solr, startOffset: 20, endOffset: 24, posInc: 1, type: <ALPHANUM>
==========================================>>
<<==========================================
input text => アイウエオカキクケコさしすせそABCXYZ123456
============================================
token: アイウエオカキクケコ, startOffset: 0, endOffset: 10, posInc: 1, type: <KATAKANA>
token: さ, startOffset: 10, endOffset: 11, posInc: 1, type: <HIRAGANA>
token: し, startOffset: 11, endOffset: 12, posInc: 1, type: <HIRAGANA>
token: す, startOffset: 12, endOffset: 13, posInc: 1, type: <HIRAGANA>
token: せ, startOffset: 13, endOffset: 14, posInc: 1, type: <HIRAGANA>
token: そ, startOffset: 14, endOffset: 15, posInc: 1, type: <HIRAGANA>
token: abcxyz123456, startOffset: 15, endOffset: 27, posInc: 1, type: <ALPHANUM>
==========================================>>
<<==========================================
input text => Lucene is a full-featured text search engine library written in Java.
============================================
token: lucene, startOffset: 0, endOffset: 6, posInc: 1, type: <ALPHANUM>
token: full, startOffset: 12, endOffset: 16, posInc: 3, type: <ALPHANUM>
token: featured, startOffset: 17, endOffset: 25, posInc: 1, type: <ALPHANUM>
token: text, startOffset: 26, endOffset: 30, posInc: 1, type: <ALPHANUM>
token: search, startOffset: 31, endOffset: 37, posInc: 1, type: <ALPHANUM>
token: engine, startOffset: 38, endOffset: 44, posInc: 1, type: <ALPHANUM>
token: library, startOffset: 45, endOffset: 52, posInc: 1, type: <ALPHANUM>
token: written, startOffset: 53, endOffset: 60, posInc: 1, type: <ALPHANUM>
token: java, startOffset: 64, endOffset: 68, posInc: 2, type: <ALPHANUM>
==========================================>>
Analyzer => org.apache.lucene.analysis.standard.StandardAnalyzer End
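For Japanese text, kanji and hiragana are split into single characters (uni-grams), while runs of katakana such as メガネ and モバゲー stay together as one token, and punctuation such as 。 is dropped. English words are all lowercased, and stop words such as "is", "a" and "in" disappear. Interestingly, hiragana is recognized as its own token type (<HIRAGANA>); this apparently dates from Lucene 3.4.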
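In the last StandardAnalyzer result, "full" has posInc 3 and "java" has posInc 2 because the stop words "is", "a" and "in" were removed in front of them: each removed word's position is added to the next surviving token's increment. A minimal sketch of that bookkeeping in plain Scala, assuming the input is already tokenized and lowercased; `StopPositions.filter` is a hypothetical helper, not Lucene's StopFilter, and the stop set below holds only the three words relevant to this sentence.

```scala
object StopPositions {
  // Drops stop words and carries the number of skipped positions
  // into the position increment of the next kept token.
  def filter(tokens: List[String], stopWords: Set[String]): List[(String, Int)] = {
    var skipped = 0
    val out = scala.collection.mutable.ListBuffer[(String, Int)]()
    for (t <- tokens) {
      if (stopWords.contains(t)) skipped += 1
      else {
        out += ((t, 1 + skipped))
        skipped = 0
      }
    }
    out.toList
  }
}
```

Feeding it the lowercased words of "Lucene is a full-featured text search engine library written in Java." with the stop set `Set("is", "a", "in")` reproduces the posInc column of the result above.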
WhitespaceAnalyzer
An Analyzer that splits words on whitespace such as spaces and tabs.
WhitespaceAnalyzer
http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/core/WhitespaceAnalyzer.html
usingTokenStream(new WhitespaceAnalyzer(luceneVersion), texts: _*)(displayTokens)
Result.
Analyzer => org.apache.lucene.analysis.core.WhitespaceAnalyzer Start
<<==========================================
input text => すもももももももものうち。
============================================
token: すもももももももものうち。, startOffset: 0, endOffset: 13, posInc: 1, type: word
==========================================>>
<<==========================================
input text => メガネは顔の一部です。
============================================
token: メガネは顔の一部です。, startOffset: 0, endOffset: 11, posInc: 1, type: word
==========================================>>
<<==========================================
input text => 日本経済新聞でモバゲーの記事を読んだ。
============================================
token: 日本経済新聞でモバゲーの記事を読んだ。, startOffset: 0, endOffset: 19, posInc: 1, type: word
==========================================>>
<<==========================================
input text => Java, Scala, Groovy, Clojure
============================================
token: Java,, startOffset: 0, endOffset: 5, posInc: 1, type: word
token: Scala,, startOffset: 6, endOffset: 12, posInc: 1, type: word
token: Groovy,, startOffset: 13, endOffset: 20, posInc: 1, type: word
token: Clojure, startOffset: 21, endOffset: 28, posInc: 1, type: word
==========================================>>
<<==========================================
input text => LUCENE、SOLR、Lucene, Solr
============================================
token: LUCENE、SOLR、Lucene,, startOffset: 0, endOffset: 19, posInc: 1, type: word
token: Solr, startOffset: 20, endOffset: 24, posInc: 1, type: word
==========================================>>
<<==========================================
input text => アイウエオカキクケコさしすせそABCXYZ123456
============================================
token: アイウエオカキクケコさしすせそABCXYZ123456, startOffset: 0, endOffset: 27, posInc: 1, type: word
==========================================>>
<<==========================================
input text => Lucene is a full-featured text search engine library written in Java.
============================================
token: Lucene, startOffset: 0, endOffset: 6, posInc: 1, type: word
token: is, startOffset: 7, endOffset: 9, posInc: 1, type: word
token: a, startOffset: 10, endOffset: 11, posInc: 1, type: word
token: full-featured, startOffset: 12, endOffset: 25, posInc: 1, type: word
token: text, startOffset: 26, endOffset: 30, posInc: 1, type: word
token: search, startOffset: 31, endOffset: 37, posInc: 1, type: word
token: engine, startOffset: 38, endOffset: 44, posInc: 1, type: word
token: library, startOffset: 45, endOffset: 52, posInc: 1, type: word
token: written, startOffset: 53, endOffset: 60, posInc: 1, type: word
token: in, startOffset: 61, endOffset: 63, posInc: 1, type: word
token: Java., startOffset: 64, endOffset: 69, posInc: 1, type: word
==========================================>>
Analyzer => org.apache.lucene.analysis.core.WhitespaceAnalyzer End
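It behaves reasonably for English, although punctuation stays attached ("Java,") and nothing is lowercased, and Japanese text without spaces comes out as one token. Perhaps it's not something you'd use explicitly very often?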
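The splitting rule can be sketched in a few lines, assuming a token boundary is any character for which Java's `Character.isWhitespace` returns true and that nothing else is done to the text (no lowercasing, no punctuation removal). `WhitespaceSplit` is a hypothetical helper, not Lucene's WhitespaceTokenizer; it simply records each token with its [start, end) character offsets.

```scala
object WhitespaceSplit {
  // Splits on whitespace and keeps [start, end) character offsets for each token.
  def tokenize(text: String): List[(String, Int, Int)] = {
    val out = scala.collection.mutable.ListBuffer[(String, Int, Int)]()
    var i = 0
    while (i < text.length) {
      // skip a run of whitespace
      while (i < text.length && Character.isWhitespace(text.charAt(i))) i += 1
      val start = i
      // consume a run of non-whitespace
      while (i < text.length && !Character.isWhitespace(text.charAt(i))) i += 1
      if (i > start) out += ((text.substring(start, i), start, i))
    }
    out.toList
  }
}
```

For "Java, Scala, Groovy, Clojure" and "LUCENE、SOLR、Lucene, Solr" this gives the same tokens and offsets as the WhitespaceAnalyzer result above; the ideographic comma 、 is not whitespace, so it stays inside the token.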
KeywordAnalyzer
An Analyzer that treats the entire input as a single token. Presumably you'd use it for things like IDs, where you specifically don't want the value split into words.
KeywordAnalyzer
http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html
usingTokenStream(new KeywordAnalyzer, texts: _*)(displayTokens)
Result.
Analyzer => org.apache.lucene.analysis.core.KeywordAnalyzer Start
<<==========================================
input text => すもももももももものうち。
============================================
token: すもももももももものうち。, startOffset: 0, endOffset: 13, posInc: 1, type: word
==========================================>>
<<==========================================
input text => メガネは顔の一部です。
============================================
token: メガネは顔の一部です。, startOffset: 0, endOffset: 11, posInc: 1, type: word
==========================================>>
<<==========================================
input text => 日本経済新聞でモバゲーの記事を読んだ。
============================================
token: 日本経済新聞でモバゲーの記事を読んだ。, startOffset: 0, endOffset: 19, posInc: 1, type: word
==========================================>>
<<==========================================
input text => Java, Scala, Groovy, Clojure
============================================
token: Java, Scala, Groovy, Clojure, startOffset: 0, endOffset: 28, posInc: 1, type: word
==========================================>>
<<==========================================
input text => LUCENE、SOLR、Lucene, Solr
============================================
token: LUCENE、SOLR、Lucene, Solr, startOffset: 0, endOffset: 24, posInc: 1, type: word
==========================================>>
<<==========================================
input text => アイウエオカキクケコさしすせそABCXYZ123456
============================================
token: アイウエオカキクケコさしすせそABCXYZ123456, startOffset: 0, endOffset: 27, posInc: 1, type: word
==========================================>>
<<==========================================
input text => Lucene is a full-featured text search engine library written in Java.
============================================
token: Lucene is a full-featured text search engine library written in Java., startOffset: 0, endOffset: 69, posInc: 1, type: word
==========================================>>
Analyzer => org.apache.lucene.analysis.core.KeywordAnalyzer End
CJKAnalyzer
A bi-gram Analyzer. CJK text is split into overlapping two-character tokens, while English words are recognized and tokenized as whole (lowercased) words.
CJKAnalyzer
http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/cjk/CJKAnalyzer.html
These days it appears to be built from a StandardTokenizer followed by CJKWidthFilter, LowerCaseFilter, CJKBigramFilter, and StopFilter.
usingTokenStream(new CJKAnalyzer(luceneVersion), texts: _*)(displayTokens)
Result.
Analyzer => org.apache.lucene.analysis.cjk.CJKAnalyzer Start
<<==========================================
input text => すもももももももものうち。
============================================
token: すも, startOffset: 0, endOffset: 2, posInc: 1, type: <DOUBLE>
token: もも, startOffset: 1, endOffset: 3, posInc: 1, type: <DOUBLE>
token: もも, startOffset: 2, endOffset: 4, posInc: 1, type: <DOUBLE>
token: もも, startOffset: 3, endOffset: 5, posInc: 1, type: <DOUBLE>
token: もも, startOffset: 4, endOffset: 6, posInc: 1, type: <DOUBLE>
token: もも, startOffset: 5, endOffset: 7, posInc: 1, type: <DOUBLE>
token: もも, startOffset: 6, endOffset: 8, posInc: 1, type: <DOUBLE>
token: もも, startOffset: 7, endOffset: 9, posInc: 1, type: <DOUBLE>
token: もの, startOffset: 8, endOffset: 10, posInc: 1, type: <DOUBLE>
token: のう, startOffset: 9, endOffset: 11, posInc: 1, type: <DOUBLE>
token: うち, startOffset: 10, endOffset: 12, posInc: 1, type: <DOUBLE>
==========================================>>
<<==========================================
input text => メガネは顔の一部です。
============================================
token: メガ, startOffset: 0, endOffset: 2, posInc: 1, type: <DOUBLE>
token: ガネ, startOffset: 1, endOffset: 3, posInc: 1, type: <DOUBLE>
token: ネは, startOffset: 2, endOffset: 4, posInc: 1, type: <DOUBLE>
token: は顔, startOffset: 3, endOffset: 5, posInc: 1, type: <DOUBLE>
token: 顔の, startOffset: 4, endOffset: 6, posInc: 1, type: <DOUBLE>
token: の一, startOffset: 5, endOffset: 7, posInc: 1, type: <DOUBLE>
token: 一部, startOffset: 6, endOffset: 8, posInc: 1, type: <DOUBLE>
token: 部で, startOffset: 7, endOffset: 9, posInc: 1, type: <DOUBLE>
token: です, startOffset: 8, endOffset: 10, posInc: 1, type: <DOUBLE>
==========================================>>
<<==========================================
input text => 日本経済新聞でモバゲーの記事を読んだ。
============================================
token: 日本, startOffset: 0, endOffset: 2, posInc: 1, type: <DOUBLE>
token: 本経, startOffset: 1, endOffset: 3, posInc: 1, type: <DOUBLE>
token: 経済, startOffset: 2, endOffset: 4, posInc: 1, type: <DOUBLE>
token: 済新, startOffset: 3, endOffset: 5, posInc: 1, type: <DOUBLE>
token: 新聞, startOffset: 4, endOffset: 6, posInc: 1, type: <DOUBLE>
token: 聞で, startOffset: 5, endOffset: 7, posInc: 1, type: <DOUBLE>
token: でモ, startOffset: 6, endOffset: 8, posInc: 1, type: <DOUBLE>
token: モバ, startOffset: 7, endOffset: 9, posInc: 1, type: <DOUBLE>
token: バゲ, startOffset: 8, endOffset: 10, posInc: 1, type: <DOUBLE>
token: ゲー, startOffset: 9, endOffset: 11, posInc: 1, type: <DOUBLE>
token: ーの, startOffset: 10, endOffset: 12, posInc: 1, type: <DOUBLE>
token: の記, startOffset: 11, endOffset: 13, posInc: 1, type: <DOUBLE>
token: 記事, startOffset: 12, endOffset: 14, posInc: 1, type: <DOUBLE>
token: 事を, startOffset: 13, endOffset: 15, posInc: 1, type: <DOUBLE>
token: を読, startOffset: 14, endOffset: 16, posInc: 1, type: <DOUBLE>
token: 読ん, startOffset: 15, endOffset: 17, posInc: 1, type: <DOUBLE>
token: んだ, startOffset: 16, endOffset: 18, posInc: 1, type: <DOUBLE>
==========================================>>
<<==========================================
input text => Java, Scala, Groovy, Clojure
============================================
token: java, startOffset: 0, endOffset: 4, posInc: 1, type: <ALPHANUM>
token: scala, startOffset: 6, endOffset: 11, posInc: 1, type: <ALPHANUM>
token: groovy, startOffset: 13, endOffset: 19, posInc: 1, type: <ALPHANUM>
token: clojure, startOffset: 21, endOffset: 28, posInc: 1, type: <ALPHANUM>
==========================================>>
<<==========================================
input text => LUCENE、SOLR、Lucene, Solr
============================================
token: lucene, startOffset: 0, endOffset: 6, posInc: 1, type: <ALPHANUM>
token: solr, startOffset: 7, endOffset: 11, posInc: 1, type: <ALPHANUM>
token: lucene, startOffset: 12, endOffset: 18, posInc: 1, type: <ALPHANUM>
token: solr, startOffset: 20, endOffset: 24, posInc: 1, type: <ALPHANUM>
==========================================>>
<<==========================================
input text => アイウエオカキクケコさしすせそABCXYZ123456
============================================
token: アイ, startOffset: 0, endOffset: 2, posInc: 1, type: <DOUBLE>
token: イウ, startOffset: 1, endOffset: 3, posInc: 1, type: <DOUBLE>
token: ウエ, startOffset: 2, endOffset: 4, posInc: 1, type: <DOUBLE>
token: エオ, startOffset: 3, endOffset: 5, posInc: 1, type: <DOUBLE>
token: オカ, startOffset: 4, endOffset: 6, posInc: 1, type: <DOUBLE>
token: カキ, startOffset: 5, endOffset: 7, posInc: 1, type: <DOUBLE>
token: キク, startOffset: 6, endOffset: 8, posInc: 1, type: <DOUBLE>
token: クケ, startOffset: 7, endOffset: 9, posInc: 1, type: <DOUBLE>
token: ケコ, startOffset: 8, endOffset: 10, posInc: 1, type: <DOUBLE>
token: コさ, startOffset: 9, endOffset: 11, posInc: 1, type: <DOUBLE>
token: さし, startOffset: 10, endOffset: 12, posInc: 1, type: <DOUBLE>
token: しす, startOffset: 11, endOffset: 13, posInc: 1, type: <DOUBLE>
token: すせ, startOffset: 12, endOffset: 14, posInc: 1, type: <DOUBLE>
token: せそ, startOffset: 13, endOffset: 15, posInc: 1, type: <DOUBLE>
token: abcxyz123456, startOffset: 15, endOffset: 27, posInc: 1, type: <ALPHANUM>
==========================================>>
<<==========================================
input text => Lucene is a full-featured text search engine library written in Java.
============================================
token: lucene, startOffset: 0, endOffset: 6, posInc: 1, type: <ALPHANUM>
token: full, startOffset: 12, endOffset: 16, posInc: 3, type: <ALPHANUM>
token: featured, startOffset: 17, endOffset: 25, posInc: 1, type: <ALPHANUM>
token: text, startOffset: 26, endOffset: 30, posInc: 1, type: <ALPHANUM>
token: search, startOffset: 31, endOffset: 37, posInc: 1, type: <ALPHANUM>
token: engine, startOffset: 38, endOffset: 44, posInc: 1, type: <ALPHANUM>
token: library, startOffset: 45, endOffset: 52, posInc: 1, type: <ALPHANUM>
token: written, startOffset: 53, endOffset: 60, posInc: 1, type: <ALPHANUM>
token: java, startOffset: 64, endOffset: 68, posInc: 2, type: <ALPHANUM>
==========================================>>
Analyzer => org.apache.lucene.analysis.cjk.CJKAnalyzer End
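The overlapping bi-grams in the CJKAnalyzer result can be sketched as below, assuming the input has already been cut into a single run of CJK characters (punctuation and Latin text removed) and that every character is in the Basic Multilingual Plane, so one `Char` is one character. `Bigrams.bigrams` is a hypothetical helper, not Lucene's CJKBigramFilter; emitting a lone character as a unigram is an assumption of this sketch.

```scala
object Bigrams {
  // Overlapping two-character windows over one CJK run, with offsets shifted by `base`.
  // Assumes BMP characters only (no surrogate pairs).
  def bigrams(run: String, base: Int = 0): List[(String, Int, Int)] =
    if (run.isEmpty) Nil
    else if (run.length == 1) List((run, base, base + 1)) // lone character: unigram
    else (0 until run.length - 1).toList.map { i =>
      (run.substring(i, i + 2), base + i, base + i + 2)
    }
}
```

Applied to すもももももももものうち (the first input without its 。), it yields eleven tokens from すも at offsets 0 to 2 up to うち at offsets 10 to 12, matching the first block of the result above.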
JapaneseAnalyzer
This is apparently Kuromoji, an open-source Japanese morphological analyzer, integrated into Lucene. It has been part of Lucene since 3.6 and 4.0.
Kuromoji
http://www.atilika.org/
JapaneseAnalyzer
http://lucene.apache.org/core/4_3_0/analyzers-kuromoji/org/apache/lucene/analysis/ja/JapaneseAnalyzer.html
References:
http://www.mwsoft.jp/programming/lucene/kuromoji.html
http://www.rondhuit.com/solr%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E5%AF%BE%E5%BF%9C.html
usingTokenStream(new JapaneseAnalyzer(luceneVersion), texts: _*)(displayTokens)
Result.
Analyzer => org.apache.lucene.analysis.ja.JapaneseAnalyzer Start
<<==========================================
input text => すもももももももものうち。
============================================
token: すもも, startOffset: 0, endOffset: 3, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: スモモ, pronunciation: スモモ, inflectionForm: null, inflectionType: null
token: もも, startOffset: 4, endOffset: 6, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: モモ, pronunciation: モモ, inflectionForm: null, inflectionType: null
token: もも, startOffset: 7, endOffset: 9, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: モモ, pronunciation: モモ, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => メガネは顔の一部です。
============================================
token: メガネ, startOffset: 0, endOffset: 3, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: メガネ, pronunciation: メガネ, inflectionForm: null, inflectionType: null
token: 顔, startOffset: 4, endOffset: 5, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: カオ, pronunciation: カオ, inflectionForm: null, inflectionType: null
token: 一部, startOffset: 6, endOffset: 8, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-副詞可能, reading: イチブ, pronunciation: イチブ, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => 日本経済新聞でモバゲーの記事を読んだ。
============================================
token: 日本, startOffset: 0, endOffset: 2, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-地域-国, reading: ニッポン, pronunciation: ニッポン, inflectionForm: null, inflectionType: null
token: 日本経済新聞, startOffset: 0, endOffset: 6, posInc: 0, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: ニホンケイザイシンブン, pronunciation: ニホンケイザイシンブン, inflectionForm: null, inflectionType: null
token: 経済, startOffset: 2, endOffset: 4, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: ケイザイ, pronunciation: ケイザイ, inflectionForm: null, inflectionType: null
token: 新聞, startOffset: 4, endOffset: 6, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: シンブン, pronunciation: シンブン, inflectionForm: null, inflectionType: null
token: モバゲ, startOffset: 7, endOffset: 11, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: 記事, startOffset: 12, endOffset: 14, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: キジ, pronunciation: キジ, inflectionForm: null, inflectionType: null
token: 読む, startOffset: 15, endOffset: 17, posInc: 2, type: word
baseForm: 読む, partOfSpeech: 動詞-自立, reading: ヨン, pronunciation: ヨン, inflectionForm: 連用タ接続, inflectionType: 五段・マ行
==========================================>>
<<==========================================
input text => Java, Scala, Groovy, Clojure
============================================
token: java, startOffset: 0, endOffset: 4, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: scala, startOffset: 6, endOffset: 11, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: groovy, startOffset: 13, endOffset: 19, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: clojure, startOffset: 21, endOffset: 28, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => LUCENE、SOLR、Lucene, Solr
============================================
token: lucene, startOffset: 0, endOffset: 6, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: solr, startOffset: 7, endOffset: 11, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: lucene, startOffset: 12, endOffset: 18, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: solr, startOffset: 20, endOffset: 24, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => アイウエオカキクケコさしすせそABCXYZ123456
============================================
token: アイウエオカキクケコ, startOffset: 0, endOffset: 10, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-一般, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: しす, startOffset: 11, endOffset: 13, posInc: 2, type: word
baseForm: null, partOfSpeech: 動詞-自立, reading: シス, pronunciation: シス, inflectionForm: 基本形, inflectionType: 五段・サ行
token: そ, startOffset: 14, endOffset: 15, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-接尾-助動詞語幹, reading: ソ, pronunciation: ソ, inflectionForm: null, inflectionType: null
token: abcxyz, startOffset: 15, endOffset: 21, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: 123456, startOffset: 21, endOffset: 27, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-数, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => Lucene is a full-featured text search engine library written in Java.
============================================
token: lucene, startOffset: 0, endOffset: 6, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: is, startOffset: 7, endOffset: 9, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: a, startOffset: 10, endOffset: 11, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: full, startOffset: 12, endOffset: 16, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: featured, startOffset: 17, endOffset: 25, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: text, startOffset: 26, endOffset: 30, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: search, startOffset: 31, endOffset: 37, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: engine, startOffset: 38, endOffset: 44, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: library, startOffset: 45, endOffset: 52, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: written, startOffset: 53, endOffset: 60, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: in, startOffset: 61, endOffset: 63, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: java, startOffset: 64, endOffset: 68, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
==========================================>>
Analyzer => org.apache.lucene.analysis.ja.JapaneseAnalyzer End
With JapaneseAnalyzer, several more Attributes are available (base form, part of speech, reading/pronunciation, and inflection), so I retrieved those as well.
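One detail in the JapaneseAnalyzer result is that モバゲー comes out as モバゲ. As far as I understand, JapaneseAnalyzer's filter chain includes JapaneseKatakanaStemFilter, which by default removes a trailing prolonged sound mark (ー) from katakana terms of at least four characters. The following is a plain-Scala sketch of that rule only, not Lucene's code; `KatakanaStem` is a hypothetical name, and treating the Unicode Katakana block U+30A0 to U+30FF as "katakana" is an assumption of this sketch.

```scala
object KatakanaStem {
  private val ProlongedSoundMark = 'ー' // U+30FC

  // Assumption: "katakana" means the Unicode Katakana block U+30A0..U+30FF (includes ー).
  private def isKatakana(c: Char): Boolean = c >= 0x30A0 && c <= 0x30FF

  // Drops one trailing ー from an all-katakana term of at least minLength characters.
  def stem(term: String, minLength: Int = 4): String =
    if (term.length >= minLength && term.forall(isKatakana) && term.last == ProlongedSoundMark)
      term.dropRight(1)
    else term
}
```

With this rule, モバゲー (four characters) becomes モバゲ, while a short word like キー and a word without a trailing ー like メガネ are left alone.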
Kuromoji also has something called Mode, and there appear to be three of them:
- SEARCH (default)
- NORMAL
- EXTENDED
JapaneseTokenizer.Mode
http://lucene.apache.org/core/4_3_0/analyzers-kuromoji/org/apache/lucene/analysis/ja/JapaneseTokenizer.Mode.html
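In SEARCH mode, the JapaneseAnalyzer result above emits 日本経済新聞 with posInc 0 alongside 日本, 経済 and 新聞. A position increment of 0 puts a token at the same position as the previous one, so the compound is stacked on top of its first part. The small sketch below turns position increments into absolute positions (first token at position 0) to make that visible; `Positions.absolute` is a hypothetical helper for illustration, not a Lucene API.

```scala
object Positions {
  // Converts position increments into absolute positions, starting at 0.
  // An increment of 0 stacks a token on the previous token's position.
  def absolute(tokens: List[(String, Int)]): List[(String, Int)] = {
    var pos = -1
    tokens.map { case (t, inc) => pos += inc; (t, pos) }
  }
}
```

Feeding it 日本 (1), 日本経済新聞 (0), 経済 (1), 新聞 (1), モバゲ (2) places the compound at position 0 together with 日本, and モバゲ at position 4, since the removed particle で occupied position 3.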
The code below passes the same arguments as the constructor that takes only the Lucene Version. From there, all that's left is to switch the Mode.
// Same arguments that new JapaneseAnalyzer(luceneVersion) uses; only the mode is switched by hand
val userDictionary: UserDictionary = null // no user dictionary
val mode = JapaneseTokenizer.Mode.SEARCH
//val mode = JapaneseTokenizer.Mode.NORMAL
//val mode = JapaneseTokenizer.Mode.EXTENDED
val stopwords = JapaneseAnalyzer.getDefaultStopSet // default Japanese stop words
val stoptags = JapaneseAnalyzer.getDefaultStopTags // default part-of-speech tags to drop
usingTokenStream(new JapaneseAnalyzer(luceneVersion, userDictionary, mode, stopwords, stoptags), texts: _*)(displayTokens)
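Instead of editing the commented-out lines and rerunning, a small sketch like the following could push one sentence through all three modes in a single run. It assumes the same Lucene 4.3 dependencies as the build.sbt above; the object name `CompareKuromojiModes` and the helper `tokens` are hypothetical, and the helper only collects the term text rather than every attribute that `displayTokens` prints.

```scala
import java.io.StringReader

import scala.collection.mutable.ListBuffer

import org.apache.lucene.analysis.ja.{JapaneseAnalyzer, JapaneseTokenizer}
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute
import org.apache.lucene.util.Version

// Hypothetical helper object: runs one text through every Kuromoji Mode.
object CompareKuromojiModes {
  // Builds a JapaneseAnalyzer for the given mode (default stop words and
  // stop tags, no user dictionary) and returns the surviving token texts.
  def tokens(mode: JapaneseTokenizer.Mode, text: String): List[String] = {
    val analyzer = new JapaneseAnalyzer(
      Version.LUCENE_43,
      null, // no user dictionary
      mode,
      JapaneseAnalyzer.getDefaultStopSet,
      JapaneseAnalyzer.getDefaultStopTags)
    val stream = analyzer.tokenStream("", new StringReader(text))
    val term = stream.addAttribute(classOf[CharTermAttribute])
    try {
      stream.reset()
      val buf = ListBuffer.empty[String]
      while (stream.incrementToken()) buf += term.toString
      stream.end()
      buf.toList
    } finally {
      stream.close()
      analyzer.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val text = "日本経済新聞でモバゲーの記事を読んだ。"
    // Mode.values is the Java enum's values(), so this covers SEARCH, NORMAL and EXTENDED
    for (mode <- JapaneseTokenizer.Mode.values)
      println(s"$mode => ${tokens(mode, text).mkString(" | ")}")
  }
}
```

Printing one line per mode makes it easier to spot which inputs actually differ between modes than scrolling through the full attribute dumps.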
Next, let's switch to NORMAL.
Analyzer => org.apache.lucene.analysis.ja.JapaneseAnalyzer Start
<<==========================================
input text => すもももももももものうち。
============================================
token: すもも, startOffset: 0, endOffset: 3, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: スモモ, pronunciation: スモモ, inflectionForm: null, inflectionType: null
token: もも, startOffset: 4, endOffset: 6, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: モモ, pronunciation: モモ, inflectionForm: null, inflectionType: null
token: もも, startOffset: 7, endOffset: 9, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: モモ, pronunciation: モモ, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => メガネは顔の一部です。
============================================
token: メガネ, startOffset: 0, endOffset: 3, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: メガネ, pronunciation: メガネ, inflectionForm: null, inflectionType: null
token: 顔, startOffset: 4, endOffset: 5, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: カオ, pronunciation: カオ, inflectionForm: null, inflectionType: null
token: 一部, startOffset: 6, endOffset: 8, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-副詞可能, reading: イチブ, pronunciation: イチブ, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => 日本経済新聞でモバゲーの記事を読んだ。
============================================
token: 日本経済新聞, startOffset: 0, endOffset: 6, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: ニホンケイザイシンブン, pronunciation: ニホンケイザイシンブン, inflectionForm: null, inflectionType: null
token: モバゲ, startOffset: 7, endOffset: 11, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: 記事, startOffset: 12, endOffset: 14, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: キジ, pronunciation: キジ, inflectionForm: null, inflectionType: null
token: 読む, startOffset: 15, endOffset: 17, posInc: 2, type: word
baseForm: 読む, partOfSpeech: 動詞-自立, reading: ヨン, pronunciation: ヨン, inflectionForm: 連用タ接続, inflectionType: 五段・マ行
==========================================>>
<<==========================================
input text => Java, Scala, Groovy, Clojure
============================================
token: java, startOffset: 0, endOffset: 4, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: scala, startOffset: 6, endOffset: 11, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: groovy, startOffset: 13, endOffset: 19, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: clojure, startOffset: 21, endOffset: 28, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => LUCENE、SOLR、Lucene, Solr
============================================
token: lucene, startOffset: 0, endOffset: 6, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: solr, startOffset: 7, endOffset: 11, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: lucene, startOffset: 12, endOffset: 18, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: solr, startOffset: 20, endOffset: 24, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => アイウエオカキクケコさしすせそABCXYZ123456
============================================
token: アイウエオカキクケコ, startOffset: 0, endOffset: 10, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-一般, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: しす, startOffset: 11, endOffset: 13, posInc: 2, type: word
baseForm: null, partOfSpeech: 動詞-自立, reading: シス, pronunciation: シス, inflectionForm: 基本形, inflectionType: 五段・サ行
token: そ, startOffset: 14, endOffset: 15, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-接尾-助動詞語幹, reading: ソ, pronunciation: ソ, inflectionForm: null, inflectionType: null
token: abcxyz, startOffset: 15, endOffset: 21, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: 123456, startOffset: 21, endOffset: 27, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-数, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => Lucene is a full-featured text search engine library written in Java.
============================================
token: lucene, startOffset: 0, endOffset: 6, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: is, startOffset: 7, endOffset: 9, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: a, startOffset: 10, endOffset: 11, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: full, startOffset: 12, endOffset: 16, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: featured, startOffset: 17, endOffset: 25, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: text, startOffset: 26, endOffset: 30, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: search, startOffset: 31, endOffset: 37, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: engine, startOffset: 38, endOffset: 44, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: library, startOffset: 45, endOffset: 52, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: written, startOffset: 53, endOffset: 60, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: in, startOffset: 61, endOffset: 63, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
token: java, startOffset: 64, endOffset: 68, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-組織, reading: null, pronunciation: null, inflectionForm: null, inflectionType: null
==========================================>>
Analyzer => org.apache.lucene.analysis.ja.JapaneseAnalyzer End
If I'm reading it right, this input is about the only one whose result changed...?
input text => 日本経済新聞でモバゲーの記事を読んだ。
Finally, EXTENDED.
Analyzer => org.apache.lucene.analysis.ja.JapaneseAnalyzer Start
<<==========================================
input text => すもももももももものうち。
============================================
token: すもも, startOffset: 0, endOffset: 3, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: スモモ, pronunciation: スモモ, inflectionForm: null, inflectionType: null
token: もも, startOffset: 4, endOffset: 6, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: モモ, pronunciation: モモ, inflectionForm: null, inflectionType: null
token: もも, startOffset: 7, endOffset: 9, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: モモ, pronunciation: モモ, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => メガネは顔の一部です。
============================================
token: メガネ, startOffset: 0, endOffset: 3, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: メガネ, pronunciation: メガネ, inflectionForm: null, inflectionType: null
token: 顔, startOffset: 4, endOffset: 5, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: カオ, pronunciation: カオ, inflectionForm: null, inflectionType: null
token: 一部, startOffset: 6, endOffset: 8, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-副詞可能, reading: イチブ, pronunciation: イチブ, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => 日本経済新聞でモバゲーの記事を読んだ。
============================================
token: 日本, startOffset: 0, endOffset: 2, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-固有名詞-地域-国, reading: ニッポン, pronunciation: ニッポン, inflectionForm: null, inflectionType: null
token: 経済, startOffset: 2, endOffset: 4, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: ケイザイ, pronunciation: ケイザイ, inflectionForm: null, inflectionType: null
token: 新聞, startOffset: 4, endOffset: 6, posInc: 1, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: シンブン, pronunciation: シンブン, inflectionForm: null, inflectionType: null
token: 記事, startOffset: 12, endOffset: 14, posInc: 7, type: word
baseForm: null, partOfSpeech: 名詞-一般, reading: キジ, pronunciation: キジ, inflectionForm: null, inflectionType: null
token: 読む, startOffset: 15, endOffset: 17, posInc: 2, type: word
baseForm: 読む, partOfSpeech: 動詞-自立, reading: ヨン, pronunciation: ヨン, inflectionForm: 連用タ接続, inflectionType: 五段・マ行
==========================================>>
<<==========================================
input text => Java, Scala, Groovy, Clojure
============================================
==========================================>>
<<==========================================
input text => LUCENE、SOLR、Lucene, Solr
============================================
==========================================>>
<<==========================================
input text => アイウエオカキクケコさしすせそABCXYZ123456
============================================
token: しす, startOffset: 11, endOffset: 13, posInc: 12, type: word
baseForm: null, partOfSpeech: 動詞-自立, reading: シス, pronunciation: シス, inflectionForm: 基本形, inflectionType: 五段・サ行
token: そ, startOffset: 14, endOffset: 15, posInc: 2, type: word
baseForm: null, partOfSpeech: 名詞-接尾-助動詞語幹, reading: ソ, pronunciation: ソ, inflectionForm: null, inflectionType: null
==========================================>>
<<==========================================
input text => Lucene is a full-featured text search engine library written in Java.
============================================
==========================================>>
Analyzer => org.apache.lucene.analysis.ja.JapaneseAnalyzer End
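Huh...? This isn't the result I expected. Actually, aren't there suspiciously fewer tokens?? The Java, Scala, Groovy, Clojure line, the LUCENE、SOLR line and the English sentence produce no tokens at all, and モバゲー, アイウエオカキクケコ, ABCXYZ and 123456 have disappeared as well.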
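One clue is in posInc. A posInc greater than 1 means positions were left empty before that token, which typically happens when a filter removes tokens but keeps their position increments. The sketch below is plain Scala with no Lucene dependency: it takes the (token, posInc) pairs copied from the EXTENDED output for 日本経済新聞でモバゲーの記事を読んだ。 and turns them into absolute positions. The object and method names are made up for illustration.

```scala
// Hypothetical helper: reconstructs absolute positions from posInc values.
object PositionGaps {
  // (token, posInc) pairs copied from the EXTENDED-mode output for
  // "日本経済新聞でモバゲーの記事を読んだ。"
  val extended: List[(String, Int)] =
    List("日本" -> 1, "経済" -> 1, "新聞" -> 1, "記事" -> 7, "読む" -> 2)

  // Running sum of posInc, started one below zero so that a first token
  // with posInc 1 lands on position 0.
  def positions(tokens: List[(String, Int)]): List[(String, Int)] = {
    val absolute = tokens.map(_._2).scanLeft(-1)(_ + _).tail
    tokens.map(_._1).zip(absolute)
  }

  // Number of empty positions right before each token (posInc - 1),
  // listed only for tokens that have at least one.
  def gaps(tokens: List[(String, Int)]): List[(String, Int)] =
    tokens.collect { case (token, inc) if inc > 1 => token -> (inc - 1) }

  def main(args: Array[String]): Unit = {
    for ((token, pos) <- positions(extended)) println(s"$token -> position $pos")
    println(s"empty positions: ${gaps(extended).map(_._2).sum}")
  }
}
```

With these values 新聞 sits at position 2 and 記事 at 9, so six positions are empty between them, where NORMAL mode had only で, モバゲ and の in the same stretch. Likewise しす arrives with posInc 12 (eleven empty positions) instead of 2. That is at least consistent with the tokens being generated and then dropped by a filter rather than never produced, but it is an inference from the numbers, not something I verified.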
Maybe I'm using it wrong somehow...
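One way to narrow this down is to rerun EXTENDED with stop filtering switched off. The Mode Javadoc linked above describes EXTENDED as outputting unigrams for unknown words, and my guess (unverified) is that those unigrams are then removed by the default stop tags. The sketch below passes `CharArraySet.EMPTY_SET` and an empty tag set to the same five-argument `JapaneseAnalyzer` constructor used above and prints each token with its part of speech; the object and method names are hypothetical. If tokens reappear for the Java, Scala, Groovy, Clojure line, the stop filtering is what removed them; if the output is still empty, the cause lies elsewhere.

```scala
import java.io.StringReader
import java.util.Collections

import scala.collection.mutable.ListBuffer

import org.apache.lucene.analysis.ja.{JapaneseAnalyzer, JapaneseTokenizer}
import org.apache.lucene.analysis.ja.tokenattributes.PartOfSpeechAttribute
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute
import org.apache.lucene.analysis.util.CharArraySet
import org.apache.lucene.util.Version

// Hypothetical diagnostic: EXTENDED mode with no stop words and no stop tags.
object ExtendedWithoutStopFilters {
  // Returns (token, partOfSpeech) for every token; nothing is removed by
  // JapaneseAnalyzer's stop-word or part-of-speech stop filtering here.
  def tokensWithPos(text: String): List[(String, String)] = {
    val analyzer = new JapaneseAnalyzer(
      Version.LUCENE_43,
      null, // no user dictionary
      JapaneseTokenizer.Mode.EXTENDED,
      CharArraySet.EMPTY_SET,          // no stop words
      Collections.emptySet[String]())  // no stop tags
    val stream = analyzer.tokenStream("", new StringReader(text))
    val term = stream.addAttribute(classOf[CharTermAttribute])
    val pos = stream.addAttribute(classOf[PartOfSpeechAttribute])
    try {
      stream.reset()
      val buf = ListBuffer.empty[(String, String)]
      while (stream.incrementToken()) buf += (term.toString -> pos.getPartOfSpeech)
      stream.end()
      buf.toList
    } finally {
      stream.close()
      analyzer.close()
    }
  }

  def main(args: Array[String]): Unit =
    for ((token, partOfSpeech) <- tokensWithPos("Java, Scala, Groovy, Clojure"))
      println(s"$token\t$partOfSpeech")
}
```

If the tokens do come back, the part of speech printed next to them would show which entry in the default stop tags is responsible.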