Upload all models and assets for ady (latest)
- README.md +93 -92
- models/embeddings/aligned/ady_128d.bin +1 -1
- models/embeddings/aligned/ady_128d.projection.npy +1 -1
- models/embeddings/aligned/ady_32d.bin +1 -1
- models/embeddings/aligned/ady_32d.projection.npy +1 -1
- models/embeddings/aligned/ady_64d.bin +1 -1
- models/embeddings/aligned/ady_64d.projection.npy +1 -1
- models/embeddings/monolingual/ady_128d.bin +1 -1
- models/embeddings/monolingual/ady_32d.bin +1 -1
- models/embeddings/monolingual/ady_64d.bin +1 -1
- models/tokenizer/ady_tokenizer_16k.model +1 -1
- models/tokenizer/ady_tokenizer_32k.model +1 -1
- models/tokenizer/ady_tokenizer_8k.model +1 -1
- visualizations/embedding_alignment_quality.png +0 -0
- visualizations/embedding_isotropy.png +0 -0
- visualizations/embedding_norms.png +0 -0
- visualizations/embedding_similarity.png +2 -2
- visualizations/embedding_tsne_multilingual.png +2 -2
- visualizations/performance_dashboard.png +2 -2
- visualizations/position_encoding_comparison.png +2 -2
- visualizations/tsne_sentences.png +2 -2
- visualizations/tsne_words.png +2 -2
README.md
CHANGED
@@ -36,7 +36,7 @@ metrics:
 value: 4.197
 - name: best_isotropy
 type: isotropy
-value: 0.
+value: 0.4880
 - name: vocabulary_size
 type: vocab
 value: 0
@@ -98,29 +98,29 @@ We analyze tokenizers, n-gram models, Markov chains, vocabulary statistics, and
 
 Below are sample sentences tokenized with each vocabulary size:
 
-**Sample 1:**
+**Sample 1:** `Ермэлхэр — Кавказым ыкӏи дунаем тет лъэпкъ жъыдэдэмэ ащыщых. Армение`
 
 | Vocab | Tokens | Count |
 |-------|--------|-------|
-| 8k |
-| 16k |
-| 32k |
+| 8k | `▁ермэлхэр ▁— ▁кавказым ▁ыкӏи ▁дунаем ▁тет ▁лъэпкъ ▁жъыдэдэмэ ▁ащыщых . ... (+1 more)` | 11 |
+| 16k | `▁ермэлхэр ▁— ▁кавказым ▁ыкӏи ▁дунаем ▁тет ▁лъэпкъ ▁жъыдэдэмэ ▁ащыщых . ... (+1 more)` | 11 |
+| 32k | `▁ермэлхэр ▁— ▁кавказым ▁ыкӏи ▁дунаем ▁тет ▁лъэпкъ ▁жъыдэдэмэ ▁ащыщых . ... (+1 more)` | 11 |
 
-**Sample 2:**
+**Sample 2:** `ТӀэшъу Светлан (УрысыбзэкӀэ: Светлана Тешева) Адыгэ журналист Адыгеим щыщ.`
 
 | Vocab | Tokens | Count |
 |-------|--------|-------|
-| 8k |
-| 16k |
-| 32k |
+| 8k | `▁тӏэ шъу ▁светлан ▁( урысыбзэкӏэ : ▁светлан а ▁те ше ... (+7 more)` | 17 |
+| 16k | `▁тӏэ шъу ▁светлан ▁( урысыбзэкӏэ : ▁светлана ▁тешева ) ▁адыгэ ... (+4 more)` | 14 |
+| 32k | `▁тӏэшъу ▁светлан ▁( урысыбзэкӏэ : ▁светлана ▁тешева ) ▁адыгэ ▁журналист ... (+3 more)` | 13 |
 
-**Sample 3:**
+**Sample 3:** `Ашрай - быслъымэнмэ къурмэным ыуж мэфэ гъэнэфагъэм щагъэжъорэ стырыпс. category`
 
 | Vocab | Tokens | Count |
 |-------|--------|-------|
-| 8k |
-| 16k |
-| 32k |
+| 8k | `▁аш рай ▁- ▁быслъымэн мэ ▁къур мэным ▁ыуж ▁мэфэ ▁гъэнэф ... (+9 more)` | 19 |
+| 16k | `▁аш рай ▁- ▁быслъымэн мэ ▁къурмэным ▁ыуж ▁мэфэ ▁гъэнэфагъэм ▁щагъэ ... (+4 more)` | 14 |
+| 32k | `▁ашрай ▁- ▁быслъымэнмэ ▁къурмэным ▁ыуж ▁мэфэ ▁гъэнэфагъэм ▁щагъэжъорэ ▁стырыпс . ... (+1 more)` | 11 |
 
 
 ### Key Findings
@@ -270,27 +270,27 @@ Below are text samples generated from each word-based Markov chain model:
 
 **Context Size 1:**
 
-1. `и
-2. `адыгэ
-3. `м
+1. `и дгъэпсыфынущ адыгэ лъэпкъым и 29 м н ф ф ф ф х х х хъ`
+2. `адыгэ хэхэсхэм ащыухъумэн ылъэкӏыгъ мыхъугъэ мышӏагъэхэр ыгу ит тарихъ лъапсэ иӏэу кӏэхьапӏэр ӏатау ...`
+3. `м ахахьэ хэгъэгу тхьаматэр халед бахах географие еуропэм ыгу рихь римыхьмэ тетэу къуаджэм ис цӏыфхэр...`
 
 **Context Size 2:**
 
-1. `нэбгырэ млн
-2. `къехъу щэпсэу
-3. `м къехъу щэпсэу хэгэгум
+1. `нэбгырэ млн 1 3 фэдиз ц1ыфэу дэс ау хьанэгъунэр ибгъэгъусэжьмэ млн 18 фэдиз мэхъу щыпсэухэрэм ромэ к...`
+2. `къехъу щэпсэу я 67 норвегыбз дло м ахахьэ хэгъэгу эдгар ринкевичс къэрал тхьаматэр ульф кристерссон ...`
+3. `м къехъу щэпсэу хэгэгум 1 240 192 км францыбзэ къэрал яйи бони хэгъэгу тхьаматэр халифа бен салман`
 
 **Context Size 3:**
 
-1. `м къехъу щэпсэу хэгэгум
-2. `къехъу щэпсэу хэгэгум
-3. `адыгэ республикэм и
+1. `м къехъу щэпсэу хэгэгум 147 570 км бенгалыбзэ дло м хахьэ хэгъэгу абдель азиз бутефлика къэрал тхьэм...`
+2. `къехъу щэпсэу хэгэгум 140 800 км непали дло м хахьэ ез м хэхьанэу унашъо щыт ез м и`
+3. `адыгэ республикэм и псыхъу а псыхъом пэблагъэу щыт къуажэ`
 
 **Context Size 4:**
 
-1. `м къехъу щэпсэу хэгэгум чӏырэу иӏэр
-2. `дло м хахьэ хэгъэгу
-3. `еуропэм хэт къэралыгъу къэлэ
+1. `м къехъу щэпсэу хэгэгум чӏырэу иӏэр 322 460 км бзэшъхьаӏэхэр францыбзэ къэрал лӏышъхьэр алассан уатт...`
+2. `дло м хахьэ хэгъэгу султанэу кабоос бин саид аль саид хэгъэгу тхьаматэр фахд бин махьмуд географие а...`
+3. `еуропэм хэт къэралыгъу къэлэ тирана нэбгырэ млн 3 м къехъу щэпсэу хэгэгум 9 984 670 км я 2 англыбзэ`
 
 
 ### Generated Text Samples (Subword-based)
@@ -299,27 +299,27 @@ Below are text samples generated from each subword-based Markov chain model:
 
 **Context Size 1:**
 
-1. `_
-2.
-3.
+1. `_шхажъырэм_ащтем`
+2. `эгекъэсхэ_ари_пч`
+3. `ыгу,_цинащырыхэ_`
 
 **Context Size 2:**
 
-1.
-2.
-3. `э_
+1. `гъэпсыр_зэрэ_ӏуад`
+2. `ъэп_ву_адыгъэхьын`
+3. `э_зыгэ_ж_дангьэ_т`
 
 **Context Size 3:**
 
-1.
-2. `_
-3. `эм_
+1. `гъэкъхэр,_кӏэ,_гум`
+2. `_къэралыгъэдунэжъы`
+3. `эм_и_–_зэрал_нэхэр`
 
 **Context Size 4:**
 
-1. `ыгъэ_
-2. `хэр_
-3.
+1. `ыгъэ_гъэмрэ_приручи`
+2. `хэр_бжъэдыгъуапэ_зэ`
+3. `агъэхьан_хуейщ,_ахэ`
 
 
 ### Key Findings
@@ -424,18 +424,18 @@ Below are text samples generated from each subword-based Markov chain model:
 
 | Model | Dimension | Isotropy | Semantic Density | Alignment R@1 | Alignment R@10 |
 |-------|-----------|----------|------------------|---------------|----------------|
-| **mono_32d** | 32 | 0.
-| **mono_64d** | 64 | 0.
-| **mono_128d** | 128 | 0.
-| **aligned_32d** | 32 | 0.
-| **aligned_64d** | 64 | 0.
-| **aligned_128d** | 128 | 0.
+| **mono_32d** | 32 | 0.4880 | 0.4410 | N/A | N/A |
+| **mono_64d** | 64 | 0.2186 | 0.3951 | N/A | N/A |
+| **mono_128d** | 128 | 0.0372 | 0.3901 | N/A | N/A |
+| **aligned_32d** | 32 | 0.4880 🏆 | 0.4477 | 0.0460 | 0.3851 |
+| **aligned_64d** | 64 | 0.2186 | 0.3901 | 0.2011 | 0.7701 |
+| **aligned_128d** | 128 | 0.0372 | 0.3927 | 0.2759 | 0.8103 |
 
 ### Key Findings
 
-- **Best Isotropy:**
-- **Semantic Density:** Average pairwise similarity of 0.
-- **Alignment Quality:** Aligned models achieve up to 27.
+- **Best Isotropy:** aligned_32d with 0.4880 (more uniform distribution)
+- **Semantic Density:** Average pairwise similarity of 0.4094. Lower values indicate better semantic separation.
+- **Alignment Quality:** Aligned models achieve up to 27.6% R@1 in cross-lingual retrieval.
 - **Recommendation:** 128d aligned for best cross-lingual performance
 
 ---
@@ -457,20 +457,21 @@ These are the most productive prefixes and suffixes identified by sampling the v
 #### Productive Prefixes
 | Prefix | Examples |
 |--------|----------|
-| `-къ` |
-| `-зэ` |
+| `-къ` | къчр, къэлэшъо, къо |
+| `-зэ` | зэрэхъугъэхэм, зэфэшъхьаф, зэхигъэуцогъэгъэ |
+| `-къы` | къыӏуагъ, къыщыфэфедэщтхэу, къыгъэуцугъэ |
 
 #### Productive Suffixes
 | Suffix | Examples |
 |--------|----------|
-| `-э` |
-|
-|
-| `-эр` |
-| `-эм` |
-| `-эу` |
-| `-хэр` |
-| `-рэ` |
+| `-э` | литературоведческэ, уиджыбэ, лъымрэ |
+| `-м` | заповедникым, хъуагъэм, ипэм |
+| `-р` | тхэныр, хунгариер, къчр |
+| `-эр` | алъытэщтыгъэр, тхыбзэр, ылъэгъурэр |
+| `-эм` | хъуагъэм, ипэм, псалъэжьхэм |
+| `-эу` | цӏэу, дэлъэу, щысэу |
+| `-хэр` | тыркухэр, ежьхэр, ахэр |
+| `-рэ` | лъымрэ, цӏэмрэ, зыфиӏорэ |
 
 ### 6.3 Bound Stems (Lexical Roots)
 
@@ -478,18 +479,18 @@ Bound stems are high-frequency subword units that are semantically cohesive but
 
 | Stem | Cohesion | Substitutability | Examples |
 |------|----------|------------------|----------|
-| `тыгъ` | 1.
-|
-|
-| `агъэ` | 1.
-|
-|
-|
-|
-|
-|
-|
-| `гъэх` | 1.
+| `тыгъ` | 1.84x | 28 contexts | тыгъэ, тыгъу, итыгъ |
+| `эпкъ` | 1.90x | 25 contexts | нэпкъ, тхэпкъ, лъэпкъ |
+| `ъагъ` | 2.25x | 14 contexts | лъагъо, пчъагъ, пчъагъэ |
+| `агъэ` | 1.63x | 39 contexts | благъэ, тхагъэ, пчагъэ |
+| `дыгэ` | 2.03x | 14 contexts | адыгэ, адыгэу, адыгэм |
+| `къуа` | 2.23x | 10 contexts | къуае, къуажэ, къуадж |
+| `эхэр` | 1.72x | 20 contexts | бэхэр, дзэхэр, усэхэр |
+| `ъхьэ` | 1.84x | 16 contexts | шъхьэ, пшъхьэ, шъхьэм |
+| `псэу` | 1.70x | 20 contexts | упсэу, щэпсэу, щыпсэу |
+| `шъхь` | 1.61x | 23 contexts | шъхьэ, пшъхьэ, шъхьэм |
+| `ыгъо` | 1.66x | 19 contexts | цыгъо, мыгъо, цыгъор |
+| `гъэх` | 1.79x | 14 contexts | багъэх, яӏагъэх, тхыгъэх |
 
 ### 6.4 Affix Compatibility (Co-occurrence)
 
@@ -497,16 +498,16 @@ This table shows which prefixes and suffixes most frequently co-occur on the sam
 
 | Prefix | Suffix | Frequency | Examples |
 |--------|--------|-----------|----------|
-| `-къ` | `-э` | 94 words |
-| `-къ` | `-р` | 64 words |
-| `-къ` | `-м` | 56 words |
-| `-къ` | `-эр` | 52 words |
-| `-зэ` | `-р` | 43 words |
-| `-зэ` | `-м` | 41 words |
-| `-къ` | `-эм` | 36 words |
-| `-зэ` | `-эр` | 34 words |
-| `-къ` | `-эу` | 33 words |
-| `-зэ` | `-э` | 31 words |
+| `-къ` | `-э` | 94 words | къохьапӏэ, къыхаутыгъэ |
+| `-къ` | `-р` | 64 words | къабзэр, къызэдыхэфэныр |
+| `-къ` | `-м` | 56 words | къэралыгъуэм, къунетрэм |
+| `-къ` | `-эр` | 52 words | къабзэр, къуаджэхэр |
+| `-зэ` | `-р` | 43 words | зэреджэхэр, зэрар |
+| `-зэ` | `-м` | 41 words | зэблэтхъуным, зэрагъэтэрэзыжьыгъэм |
+| `-къ` | `-эм` | 36 words | къэралыгъуэм, къунетрэм |
+| `-зэ` | `-эр` | 34 words | зэреджэхэр, зэпырыбгъэзэжьынхэр |
+| `-къ` | `-эу` | 33 words | къыщегъэжьагъэу, къинэу |
+| `-зэ` | `-э` | 31 words | зэкъотыныгъэ, зэралэжьырэ |
 
 ### 6.5 Recursive Morpheme Segmentation
 
@@ -514,21 +515,21 @@ Using **Recursive Hierarchical Substitutability**, we decompose complex words in
 
 | Word | Suggested Split | Confidence | Stem |
 |------|-----------------|------------|------|
-|
+| республикэмрэ | **`республик-эм-рэ`** | 6.0 | `республик` |
+| макъэхэмрэ | **`макъэ-хэм-рэ`** | 6.0 | `макъэ` |
 | литературэмрэ | **`литератур-эм-рэ`** | 6.0 | `литератур` |
-|
-|
-| тхьаматэр | **`тхьамат-эр`** | 4.5 | `тхьамат` |
-| фэхъугъэм | **`фэхъугъ-эм`** | 4.5 | `фэхъугъ` |
-| игъунэгъухэр | **`игъунэгъу-хэр`** | 4.5 | `игъунэгъу` |
-| нэмыкӏхэр | **`нэмыкӏ-хэр`** | 4.5 | `нэмыкӏ` |
-| зэкъотыныгъэм | **`зэ-къ-отыныгъ-эм`** | 4.5 | `отыныгъ` |
-| ипрезидентэу | **`ипрезидент-эу`** | 4.5 | `ипрезидент` |
+| благъохэмрэ | **`благъо-хэм-рэ`** | 6.0 | `благъо` |
+| бзылъфыгъэмрэ | **`бзылъфыгъ-эм-рэ`** | 6.0 | `бзылъфыгъ` |
 | литературэр | **`литератур-эр`** | 4.5 | `литератур` |
-|
-|
-|
-|
+| диалектэу | **`диалект-эу`** | 4.5 | `диалект` |
+| агъэфедэрэ | **`агъэфедэ-рэ`** | 4.5 | `агъэфедэ` |
+| шъхьафитэу | **`шъхьафит-эу`** | 4.5 | `шъхьафит` |
+| зыкъэзыӏэтыгъэм | **`зыкъэзыӏэтыгъ-эм`** | 4.5 | `зыкъэзыӏэтыгъ` |
+| ишъхъэрэмрэ | **`ишъхъ-эр-эм-рэ`** | 4.5 | `ишъхъ` |
+| адэмыехэр | **`адэмые-хэр`** | 4.5 | `адэмые` |
+| зэкъоуцохэу | **`зэ-къ-оуцох-эу`** | 4.5 | `оуцох` |
+| ыгузэгухэм | **`ыгузэгу-хэм`** | 4.5 | `ыгузэгу` |
+| беслъэнейхэр | **`беслъэней-хэр`** | 4.5 | `беслъэней` |
 
 ### 6.6 Linguistic Interpretation
 
@@ -762,4 +763,4 @@ MIT License - Free for academic and commercial use.
 ---
 *Generated by Wikilangs Models Pipeline*
 
-*Report Date: 2026-01-03
+*Report Date: 2026-01-03 18:25:02*

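The §6.5 table pairs each word with a split such as `литератур-эм-рэ` or `зэ-къ-оуцох-эу`. The README credits these splits to Recursive Hierarchical Substitutability, which this diff does not describe. Purely to illustrate what such splits look like, the sketch below reproduces several of the listed rows with plain greedy affix stripping instead; the affix lists are hand-picked from the Productive Prefixes/Suffixes tables, and `segment` and `MIN_STEM` are hypothetical names, not part of the Wikilangs pipeline. It does not compute the Confidence column.

```python
# Greedy affix stripping: an illustrative stand-in, NOT the pipeline's
# Recursive Hierarchical Substitutability method. Affix lists are
# hand-picked from the README's affix tables; MIN_STEM is an assumption.
PREFIXES = ["зэ", "къ"]
SUFFIXES = ["хэм", "хэр", "эм", "эр", "эу", "рэ"]  # longest first
MIN_STEM = 3  # never strip an affix if fewer stem characters would remain


def segment(word: str) -> list[str]:
    """Split a word into prefixes + stem + suffixes by repeated stripping."""
    stem = word
    suffixes: list[str] = []
    stripped = True
    while stripped:  # peel suffixes from the right, outermost first
        stripped = False
        for suf in SUFFIXES:
            if stem.endswith(suf) and len(stem) - len(suf) >= MIN_STEM:
                suffixes.insert(0, suf)  # keep surface order
                stem = stem[: -len(suf)]
                stripped = True
                break
    prefixes: list[str] = []
    stripped = True
    while stripped:  # then peel prefixes from the left
        stripped = False
        for pre in PREFIXES:
            if stem.startswith(pre) and len(stem) - len(pre) >= MIN_STEM:
                prefixes.append(pre)
                stem = stem[len(pre):]
                stripped = True
                break
    return prefixes + [stem] + suffixes


print(segment("ишъхъэрэмрэ"))
```

For example, `segment("макъэхэмрэ")` yields `["макъэ", "хэм", "рэ"]`, matching the table. Words whose affixes are not in the two lists come back unsplit, and a purely greedy order can over-strip stems that happen to end in an affix-like string.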
models/embeddings/aligned/ady_128d.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:54fa8beb790c99fb8dbceffed895e361fc95177fa65b6c2a2f24770283d988e7
 size 1025644289

models/embeddings/aligned/ady_128d.projection.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7bfa78b32e0f6f62daba66fcb84c2a3e87517a04646dc0a0af6ccf324dc5a075
 size 65664

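The three `*.projection.npy` pointers in this commit record sizes of 4224, 16512 and 65664 bytes. Assuming each file is a square float32 matrix written with `numpy.save` (dtype and shape are an inference from the sizes, not stated in the commit), every size equals a 128-byte `.npy` header plus d × d × 4 bytes of data. The sketch checks that arithmetic by serializing zero matrices to memory; `npy_size` and `LFS_SIZES` are hypothetical names.

```python
import io

import numpy as np


def npy_size(dim: int, dtype=np.float32) -> int:
    """Byte size of a (dim, dim) array serialized with np.save."""
    buf = io.BytesIO()
    np.save(buf, np.zeros((dim, dim), dtype=dtype))
    return len(buf.getvalue())


# Sizes recorded in the Git LFS pointers for aligned/ady_{32,64,128}d.projection.npy.
LFS_SIZES = {32: 4224, 64: 16512, 128: 65664}

for dim, recorded in LFS_SIZES.items():
    data_bytes = dim * dim * np.dtype(np.float32).itemsize
    # dim, size produced by np.save, size in the pointer, implied header bytes
    print(dim, npy_size(dim), recorded, recorded - data_bytes)
```

A float64 matrix of shape d × d/2 would give the same byte count, so the sizes alone do not pin down the dtype.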
models/embeddings/aligned/ady_32d.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3ad9dd33c4ed4c8edb30c3d64131a678ef7ebbdd54ba3dd558250822024bf1e2
 size 256436225

models/embeddings/aligned/ady_32d.projection.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:031e3bbfd2a301b3bc3e8ce43f500c590b33ec69cb561392f3d213720f3a1d27
 size 4224

models/embeddings/aligned/ady_64d.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4aa856abed6e473095ebd9f0f03d2e187d2185df3ec2d0b3a05a40e5f88a358e
 size 512838913

models/embeddings/aligned/ady_64d.projection.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9bc27a15f027d3d8ab1e3bc89651f96b2b4360385a87f809c730b70723c72f91
 size 16512

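The aligned models report Alignment R@1 and R@10 (for instance 0.2759 R@1 for aligned_128d, the "27.6%" in Key Findings). The evaluation protocol is not part of this diff. A common setup, assumed here, maps each source vector through a projection matrix `W`, ranks all target vectors by cosine similarity, and counts a hit when the gold translation lands in the top k. `recall_at_k` is a hypothetical helper, and applying the projection as `src @ W` is an assumption about how the `.projection.npy` matrices are used.

```python
import numpy as np


def recall_at_k(src: np.ndarray, tgt: np.ndarray, W: np.ndarray, k: int) -> float:
    """Fraction of rows i whose gold target tgt[i] ranks in the top k.

    src: (n, d) source vectors; tgt: (n, d) target vectors where row i is
    the gold translation of src[i]; W: (d, d) projection, applied as src @ W.
    """
    mapped = src @ W
    # L2-normalize both sides so dot products are cosine similarities.
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = mapped @ tgt_n.T                    # (n, n) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]    # best k target indices per row
    hits = (topk == np.arange(len(src))[:, None]).any(axis=1)
    return float(hits.mean())


# Toy check: the source space is a random rotation of the target space,
# so the inverse rotation Q (orthogonal, Q.T @ Q = I) recovers it exactly.
rng = np.random.default_rng(0)
tgt = rng.normal(size=(50, 32))
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))
src = tgt @ Q.T
print(recall_at_k(src, tgt, Q, k=1))  # 1.0
```

With real embeddings the gold pairs would come from a bilingual dictionary, and R@10 is always at least R@1 because the top-10 list contains the top-1 candidate.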
models/embeddings/monolingual/ady_128d.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:54fa8beb790c99fb8dbceffed895e361fc95177fa65b6c2a2f24770283d988e7
 size 1025644289

models/embeddings/monolingual/ady_32d.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3ad9dd33c4ed4c8edb30c3d64131a678ef7ebbdd54ba3dd558250822024bf1e2
 size 256436225

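The README's isotropy column drops from 0.4880 (32d) to 0.0372 (128d), and the `best_isotropy` metadata value changes to 0.4880 in this commit. The diff does not define the score. One widely used definition, assumed here for illustration only, is the partition-function ratio popularised by Mu and Viswanath: I(W) = min_c Z(c) / max_c Z(c), with Z(c) = Σ_i exp(c·w_i) and c ranging over the eigenvectors of WᵀW. Values near 1 mean no direction dominates. The pipeline may use a different measure, so this sketch is not expected to reproduce the table.

```python
import numpy as np


def isotropy(W: np.ndarray) -> float:
    """Partition-function isotropy min_c Z(c) / max_c Z(c), in (0, 1].

    W: (n, d) matrix of word vectors. c ranges over the unit eigenvectors
    of W^T W, and Z(c) = sum_i exp(c . w_i).
    """
    _, eigvecs = np.linalg.eigh(W.T @ W)  # columns are unit eigenvectors
    Z = np.exp(W @ eigvecs).sum(axis=0)   # Z(c) for every eigenvector c
    return float(Z.min() / Z.max())


rng = np.random.default_rng(0)
centered = rng.normal(size=(2000, 32))
shifted = centered + 3.0  # a shared offset makes one direction dominate
print(isotropy(centered), isotropy(shifted))
```

The shifted cloud scores far lower because one eigenvector aligns with the common offset, which is the kind of anisotropy a low isotropy value signals.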
models/embeddings/monolingual/ady_64d.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4aa856abed6e473095ebd9f0f03d2e187d2185df3ec2d0b3a05a40e5f88a358e
 size 512838913

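The Key Findings line reports an average pairwise similarity of 0.4094. The mean of the six per-model Semantic Density values in the new table is (0.4410 + 0.3951 + 0.3901 + 0.4477 + 0.3901 + 0.3927) / 6 = 0.40945, consistent with that figure up to rounding. How each per-model density is computed is not shown; assuming it is the mean cosine similarity over distinct pairs of word vectors (possibly on a sample of words), a minimal sketch follows. `semantic_density` is a hypothetical name.

```python
import numpy as np


def semantic_density(vectors: np.ndarray) -> float:
    """Mean cosine similarity over all distinct pairs of rows (needs n >= 2)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    # Drop the diagonal (each row's similarity with itself) before averaging.
    off_diag_sum = sims.sum() - np.trace(sims)
    return float(off_diag_sum / (n * (n - 1)))


# Per-model Semantic Density values from the new README table.
table = [0.4410, 0.3951, 0.3901, 0.4477, 0.3901, 0.3927]
print(sum(table) / len(table))
```

Orthogonal vectors give a density of 0 and identical directions give 1, which is why the README reads lower values as better semantic separation.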
models/tokenizer/ady_tokenizer_16k.model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c1630ce7b27e0908e5b8246fc0edf09e4cbf03dec6806f8d56bd743aedabbe72
 size 579551

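The tokenized samples in the README mark the start of each word with `▁` (U+2581), SentencePiece's whitespace symbol, so a piece without it continues the previous word. In Sample 3 the 8k model emits `▁аш rай`-style splits (`▁аш` + `рай`, `▁быслъымэн` + `мэ`), while the 32k model keeps `▁ашрай` and `▁быслъымэнмэ` whole; both decode to the same text. The pieces are also lowercased relative to the input, which suggests case normalization (an inference from the samples). The sketch rebuilds text from pieces without the `sentencepiece` library; `pieces_to_text` and `word_count` are hypothetical helpers.

```python
WORD_START = "\u2581"  # "▁": SentencePiece's marker for a preceding space


def pieces_to_text(pieces: list[str]) -> str:
    """Concatenate pieces, then turn word-start markers back into spaces."""
    return "".join(pieces).replace(WORD_START, " ").strip()


def word_count(pieces: list[str]) -> int:
    """Number of words = number of pieces that open a new word."""
    return sum(p.startswith(WORD_START) for p in pieces)


# First pieces of Sample 3 from the README, 8k vs 32k vocabulary.
pieces_8k = ["▁аш", "рай", "▁-", "▁быслъымэн", "мэ"]
pieces_32k = ["▁ашрай", "▁-", "▁быслъымэнмэ"]
print(pieces_to_text(pieces_8k) == pieces_to_text(pieces_32k))  # True
```

The Count column therefore measures pieces, not words: the same three words cost five pieces at 8k and three at 32k.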
models/tokenizer/ady_tokenizer_32k.model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:94c1932af3b48ee1e7a378d1c96e4111433905d74e509e9df034526399e2d9b6
 size 926359

models/tokenizer/ady_tokenizer_8k.model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6895f68eb474a2e19a0debf47aac519afd50b5c6357bf706aa9d25d32f54cd66
 size 395183

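Every model file in this commit is stored through Git LFS, so the repository tracks only a three-line pointer (`version`, `oid sha256:...`, `size`). A downloaded file can be checked against its pointer by comparing its byte count and SHA-256 digest, as in the sketch below; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers. Comparing pointers also shows that each `aligned/*.bin` file carries the same oid and size as its `monolingual/*.bin` counterpart, so the aligned and monolingual vector files in this commit appear to be byte-identical, with only the projection matrices differing.

```python
import hashlib
import io
from typing import BinaryIO


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(stream: BinaryIO, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the content and compare its size and SHA-256 with the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    digest = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)  # hash incrementally; GB-sized files never load whole
        size += len(chunk)
    return size == int(fields["size"]) and fields["oid"] == "sha256:" + digest.hexdigest()


# Usage with an in-memory file; for a real download pass open(path, "rb").
content = b"example bytes"
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(content).hexdigest()}\n"
    f"size {len(content)}\n"
)
print(matches_pointer(io.BytesIO(content), pointer))  # True
```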
visualizations/embedding_alignment_quality.png
CHANGED
visualizations/embedding_isotropy.png
CHANGED
visualizations/embedding_norms.png
CHANGED
visualizations/embedding_similarity.png
CHANGED
visualizations/embedding_tsne_multilingual.png
CHANGED
visualizations/performance_dashboard.png
CHANGED
visualizations/position_encoding_comparison.png
CHANGED
visualizations/tsne_sentences.png
CHANGED
visualizations/tsne_words.png
CHANGED