[{"categories":null,"contents":"Linux Install Varnam\nAfter installing Varnam, you will need to enable the Varnam IBus Engine to start typing easily in any app system-wide.\nSetting Up IBus Most GNOME systems are well-integrated with IBus by default.\nIf you\u0026rsquo;re not using GNOME, you will need to set up IBus. ArchLinux Wiki has good information on setting up IBus: https://wiki.archlinux.org/title/IBus\nIBus setup might be a bit tricky in KDE. It can be easily set up in ArchLinux by installing ibus-input-support.\nEnabling Varnam IBus Engine GNOME Go to System Settings -\u0026gt; Region \u0026amp; Language See the section Input Sources Click on the + button Search for \u0026ldquo;Varnam\u0026rdquo; Choose your language to add Others Go to IBus Settings -\u0026gt; Input Method tab. Click the \u0026ldquo;Add\u0026rdquo; button Choose your language and click the \u0026ldquo;Add\u0026rdquo; button That\u0026rsquo;s it! You can now switch input methods with the mouse by clicking on the icon in the system tray.\nYou can also choose a keyboard shortcut to easily switch input methods. The default key-combo is Meta Key (Windows Key) + Space\nYou can find tips and more information about Varnam IBus Engine here.\nMac Install for Mac\nWindows Coming Soon\u0026hellip;\n","permalink":"/docs/getting-started/","tags":null,"title":"Getting Started"},{"categories":null,"contents":"Input Method Engine An input method (IME) is an operating system component or program that enables users to generate characters not natively available on their input devices by using sequences of characters (or mouse operations) that are natively available on their input devices. 
More in Wikipedia.\nIMEs are mostly used to type non-English languages on a computer using an English keyboard.\nIBus Engine IBus (Intelligent Input Bus) is an input method (IM) framework for multilingual input in GNU/Linux operating systems.\nVarnam IBus Engine Varnam has an IBus Engine to type Indian languages on GNU/Linux systems.\nFeatures Customizable Fast Integrates with any GNU/Linux OS Tips Use number keys for selecting a suggestion quickly Use UP/DOWN arrow keys to move over suggestions Use ALT + ARROW UP/DOWN for moving between suggestion pages Highlight a suggestion, press CTRL + DEL to unlearn that suggestion if it is in the dictionary Learning Words Varnam IBus Engine has in-built learning while you type. Here are the situations where Varnam IBus Engine will learn a word:\n When you type a word and press spacebar When you select a word from the suggestions box When you press a number key to select a suggestion ","permalink":"/docs/varnam-ibus-engine/","tags":null,"title":"Varnam IBus Engine"},{"categories":null,"contents":"Learning Words Varnam Dictionary Varnam uses a single dictionary for a language. This means there can be multiple schemes for the same language, but all of them share the same words.\nThe storage location of the dictionary is:\n GNU/Linux: HOME_DIRECTORY/.local/share/varnam/learnings Mac: HOME_DIRECTORY/.local/share/varnam/learnings Learn A Word Do this in a terminal:\nvarnamcli -s ml -learn വാക്കിവിടെ The -s ml option specifies the scheme ID. 
This word will then be added to the Malayalam dictionary.\nThis will also add the word to the Malayalam dictionary :\nvarnamcli -s ml-inscript -learn വാക്കിവിടെ This is because both the ml and ml-inscript schemes are for the Malayalam language.\nLearn From Files Varnam has the ability to learn words from any text file.\nOrdinary File Varnam can import Indic language words from a file :\nvarnamcli -s ml -learn-from-file filename.txt filename.txt can be any kind of file, an HTML web page or any digital file where there are words to learn. Varnam finds the words, calculates the frequency of each word in the file and learns from it. The frequency determines the confidence of that word.\nThat is, if the word \u0026ldquo;ഒരിക്കൽ\u0026rdquo; (orikkal) is found in the file 100 times and \u0026ldquo;ഒരിടത്ത്\u0026rdquo; (oriTathth) is found 50 times, then after learning, varnamcli -s ml -t ori gives :\nഒരിക്കൽ ഒരിടത്ത് ഒരി Frequency File A frequency file looks like this (\u0026lt;word\u0026gt; \u0026lt;frequency\u0026gt;) :\nമലയാളം 120 ഒരിക്കൽ 100 ഒരിടത്ത് 50 This file can be imported to Varnam with the same command :\nvarnamcli -s ml -learn-from-file word-frequency.txt What\u0026rsquo;s different here is that the frequency value is given explicitly. Varnam will learn the words and use each frequency as the confidence value.\nSome places to find such frequency files, or data to make frequency files on your own:\n https://github.com/AI4Bharat/indicnlp_corpus#text-corpora https://github.com/AI4Bharat/indicnlp_catalog#monolingual-corpus IME Learning If you use Varnam IBus Engine, there is in-built learning while you type. 
See this page.\nTraining Words You can make Varnam use a specific pattern for words.\nTrain A Word To train a single word:\nvarnamcli -s ml -train firefox=ഫയർഫോക്സ് Now if we give the input \u0026ldquo;firefox\u0026rdquo; for transliteration:\nvarnamcli -s ml firefox // ഫയർഫോക്സ് Train From File Varnam can train patterns from a file like this:\nacademy അക്കാഡമി access ആക്സസ് accident ആക്സിഡന്റ് accord അക്കോർഡ് account അക്കൗണ്ട് Do this to train each pattern from the file:\nvarnamcli -s ml -train-from-file my-file.txt Export Learnings You can export your local Varnam Learnings data to use on another system or share with others.\nSimply run this to do an export:\nvarnamcli -s ml -export my-export The export file will have the extension .VLF (Varnam Learnings File).\nDepending on the number of words in your dictionary, the number of files generated will vary. Example:\n my-export-1.vlf my-export-2.vlf my-export-3.vlf By default, the maximum number of words in a file will be 30,000. The words are exported in descending order of confidence.\nThe export format is JSON. Here\u0026rsquo;s a tidied-up sample:\n{ \u0026#34;words\u0026#34;: [ { \u0026#34;w\u0026#34;: \u0026#34;എന്നു\u0026#34;, // Word \u0026#34;c\u0026#34;: 210, // Weight/Confidence \u0026#34;l\u0026#34;: 1631129802 // Learned On }, { \u0026#34;w\u0026#34;: \u0026#34;സമാനപദങ്ങൾ\u0026#34;, // Word \u0026#34;c\u0026#34;: 25, // Weight/Confidence \u0026#34;l\u0026#34;: 1631129802 // Learned On }, {\u0026#34;w\u0026#34;: \u0026#34;അക്കാഡമി\u0026#34;, \u0026#34;c\u0026#34;: 215, \u0026#34;l\u0026#34;: 1631129802}, {\u0026#34;w\u0026#34;: \u0026#34;ആക്സസ്\u0026#34;, \u0026#34;c\u0026#34;: 219, \u0026#34;l\u0026#34;: 1631129802}, ... ], \u0026#34;patterns\u0026#34;: [ {\u0026#34;p\u0026#34;:\u0026#34;academy\u0026#34;, \u0026#34;w\u0026#34;:\u0026#34;അക്കാഡമി\u0026#34;}, {\u0026#34;p\u0026#34;:\u0026#34;access\u0026#34;, \u0026#34;w\u0026#34;:\u0026#34;ആക്സസ്\u0026#34;}, ... 
] } Import Learnings You can import any .vlf file with:\nvarnamcli -s ml -import my-export-1.vlf The above will import the single file my-export-1.vlf.\nTo import all the .vlf files in the folder, do this:\nvarnamcli -s ml -import \u0026#34;*.vlf\u0026#34; The quotes (\u0026#34;) are important above. Without them the wildcard file matching won\u0026rsquo;t work.\n","permalink":"/docs/learning/","tags":null,"title":"Learning Words"},{"categories":null,"contents":"A language pack is a set of Varnam Learnings Files (.VLF) that can be imported into any Varnam instance. After importing a VLF into a Varnam instance, Varnam can then give better suggestions.\nA VLF file is basically a dictionary of words. Any export from Varnam is a VLF. Varnam learns each word you type locally. But we don\u0026rsquo;t need user-typed custom words to be in a VLF that will be shared publicly. A public language pack should only have general word data.\nLet\u0026rsquo;s see how to make a language pack from different sources of word data.\nFor tutorial purposes we will be making a language pack for Malayalam (ml).\nTerms Varnam Learnings File (.VLF): Words/Learnings can be exported from Varnam to be used in another Varnam instance. This export file format is the VLF. Corpus: A corpus is a language resource consisting of a large and structured set of texts. ml: Malayalam scheme identifier. A Varnam scheme identifier doesn\u0026rsquo;t necessarily need to be a language code. Example: ml-inscript scheme exists too (for Malayalam with inscript layout). Frequency: The number of times a certain word occurs Confidence/Weight: When Varnam predicts, it picks words sorted by the highest confidence/weight. 
This integer value of a word increases as a user chooses it more Prerequisites Install GoVarnam with language support Ideally a GNU/Linux system Steps In Making A Pack Find words data Make Varnam Learn words Maybe train some patterns Export If you want to upload this pack to Varnam, then\n Make pack.json file Zip it, send it Add Words See Learning docs to see all the ways to learn words in Varnam.\nGather Data You will need to find ways to scrape data from websites (Python is mostly used for this) or use public datasets. Some scrapers :\n luca.co.in scraper Newspaper website scraper Instagram scraper Public datasets:\n AI4Bharat-IndicNLP Dataset IndicNLP SMC Corpus Making A Frequency File A Frequency file shows the frequency of words in a word corpus. A sample frequency file :\nമലയാളം 120 ഒരിക്കൽ 100 ഒരിടത്ത് 50 This means the word മലയാളം was found 120 times, ഒരിക്കൽ was found 100 times etc.\nVarnam can learn words from such files setting the frequency as confidence value.\nLet\u0026rsquo;s make one:\n Gather data from websites, social media etc. into .txt files. Sample : ml-wikipedia-article-india.txt ml-wikipedia-article-kerala.txt How to gather data ? See section above\n Use this instruction to make a frequency report file from the previous .txt files. Cleanup of .txt files is required to make the frequency output better. This can include fixing Unicode problems, common errors like : Using 0 instead of anusvaram ( ം) Using Malayalam numeral ൪ (4) instead of ർ (both are visually similar BUT very different meanings, പയർ != പയ൪). The frequency calculation is a time consuming process depending on number of files and content, so make sure input data is alright before doing it. 
What that script does : Remove all characters except Malayalam Unicode block characters \u0026amp; ZWJ \u0026amp; ZWNJ Remove unwanted newlines, whitespaces Make each line have a single word Sort Count unique occurrences (frequency) of words Sort by the occurrence number (frequency) Use a text editor to display \u0026amp; edit large files. The report will be big and might have errors. You will have to fix these. Some suggestions : Remove words that have a low frequency (Perhaps lower than 40 or 50 ?) Some words that have spelling mistakes might reach a higher frequency, find them and remove them Remember that erroneous words will later turn up as suggestions in Varnam, so be cautious ! :) Remove words that are already collected. See below The output report.txt will have the format we need to import to Varnam. This file can then be shared with anyone. But, since the learning process is slow, it\u0026rsquo;s better to share trained files.\nDepending on where the data is collected from, we can divide these frequency report files. Examples:\n wikipedia-frequency-report.txt (Words sourced from Wikipedia) luca-frequency-report.txt (Science related words sourced from luca.co.in) Useful Tips To remove words that already appear in the first frequency report :\nawk 'NR==FNR{a[$1]=1;next}!a[$1]' wikipedia-frequency-report.txt luca-frequency-report.txt \u0026gt; luca.txt Now luca.txt will only have words from luca-frequency-report.txt that are not in wikipedia-frequency-report.txt. This file can then be renamed ml-luca-pack and shared. Doing so, we can assume that ml-luca-pack will only have Science related words.\n To combine words from two frequency reports (report1.txt \u0026amp; report2.txt) :\nawk '{ count[$1] += $2 } END { for(elem in count) print elem, count[elem] }' report1.txt report2.txt | sort -gr -t \u0026quot; \u0026quot; -k 2 \u0026gt; report-combined.txt Now, report-combined.txt will have the combined result. 
The frequency count will be summed up too.\n Making Pack Doing a Varnam export will export everything in the database. Instead, we\u0026rsquo;re going to make separate trained files. These files are then made into packs.\nExample packs :\n ml-basic pack: https://github.com/varnamproject/schemes/tree/master/schemes/ml/ml-basic ml-english pack: https://github.com/varnamproject/schemes/tree/master/schemes/ml/ml-english Let\u0026rsquo;s try making a pack called ml-basic for basic Malayalam words.\n Make a folder for our pack mkdir ml-basic Copy the frequency report file : cp \u0026lt;path-to-frequency-report.txt\u0026gt; ./report.txt Set Varnam environment variables : export VARNAM_LEARNINGS_DIR=$(realpath .) This makes Varnam store the learnings in the current folder. Any further varnam commands won\u0026rsquo;t affect your system\u0026rsquo;s varnam installation.\nIf you do varnamcli -s ml -t hello, you will get only 1 suggestion because the current folder\u0026rsquo;s varnam learnings file doesn\u0026rsquo;t have anything in it. Let\u0026rsquo;s teach some words in the next step\n Make Varnam learn from the report : varnamcli -s ml -learn-from-file report.txt This will make a SQLite database file in the current folder called ml.vst.learnings\n You can also make Varnam use a specific pattern for a word : varnamcli -s ml -train firefox=ഫയർഫോക്സ് This also gets stored in the ml.vst.learnings file. See more on this here\n Export the learnings from the file : varnamcli -s ml -export \u0026quot;ml-basic\u0026quot; There will sometimes be more than one output file (Read Export docs). These outputs will be different versions of our pack. The outputs are sorted according to confidence. Less important words end up in the last files. 
So, ml-basic-1.vlf will have the words of most importance.\n Make a file pack.json containing information about the pack : { \u0026quot;identifier\u0026quot;: \u0026quot;ml-basic\u0026quot;, \u0026quot;name\u0026quot;: \u0026quot;Malayalam Basic\u0026quot;, \u0026quot;description\u0026quot;: \u0026quot;Collection of Malayalam words sourced from Wikipedia\u0026quot;, \u0026quot;lang\u0026quot;: \u0026quot;ml\u0026quot;, \u0026quot;versions\u0026quot;: [ { \u0026quot;identifier\u0026quot;: \u0026quot;ml-basic-1\u0026quot;, \u0026quot;version\u0026quot;: 1, \u0026quot;description\u0026quot;: \u0026quot;Words with highest confidence\u0026quot;, \u0026quot;size\u0026quot;: 12481353 }, { \u0026quot;identifier\u0026quot;: \u0026quot;ml-basic-2\u0026quot;, \u0026quot;version\u0026quot;: 2, \u0026quot;description\u0026quot;: \u0026quot;Words with confidence lesser than 95\u0026quot;, \u0026quot;size\u0026quot;: 11814988 }, { \u0026quot;identifier\u0026quot;: \u0026quot;ml-basic-3\u0026quot;, \u0026quot;version\u0026quot;: 3, \u0026quot;description\u0026quot;: \u0026quot;Words with confidence lesser than 3\u0026quot;, \u0026quot;size\u0026quot;: 10461558 }, { \u0026quot;identifier\u0026quot;: \u0026quot;ml-basic-4\u0026quot;, \u0026quot;version\u0026quot;: 4, \u0026quot;description\u0026quot;: \u0026quot;Words with confidence 1\u0026quot;, \u0026quot;size\u0026quot;: 7556327 } ] } The value of size is in bytes (Use ls -la to get size in bytes). The description for each version can be made according to the file\u0026rsquo;s contents. Open the file and find the highest confidence value (will be in the first line itself).\n That\u0026rsquo;s it! You\u0026rsquo;ve successfully made a pack, now zip the files and send it to Varnam maintainer : zip pack.zip pack.json report ml-* ","permalink":"/docs/making-language-pack/","tags":null,"title":"Making A Language Pack"},{"categories":null,"contents":"How does Varnam work ? 
See the second half of this video: https://www.youtube.com/watch?v=pJpOWlD_7OI\nPeerTube: https://peertube.debian.social/w/vWwMGcmTZG9n1UWv8ZdimB?s=1\nDoes Varnam know what I\u0026rsquo;m typing ? Varnam runs entirely on your device. There is NO DATA-SHARING with anyone or any servers. There is also NO TRACKING and NO ANALYTICS. In fact there is no internet communication at all.\nVarnam learns the words you type and stores them on your device only.\nWhere can I file issues or get help ? We have public Telegram \u0026amp; Matrix groups. See about page.\nIf you know GitHub, you can file issues directly at https://github.com/varnamproject/govarnam/issues\nOr you can submit them at https://community.smc.org.in/c/dev/varnam\n","permalink":"/docs/faq/","tags":null,"title":"FAQ"},{"categories":null,"contents":"The Alappuzha District Collector’s Online Public Grievance System (സ്നേഹപൂർവ്വം കളക്ടർ ) is an initiative by the district administration, developed with the support of KSITM Alappuzha, to streamline the grievance redressal process. This digital platform enables residents to submit grievances and track their resolution status online, ensuring transparency and efficiency. By leveraging technology, the system enhances public service delivery, allowing authorities to address issues more effectively and improve governance in the district.\nSpecial Thanks KSITM Database Admin Team KSITM Malappuram Team ","permalink":"/about/","tags":null,"title":"About "},{"categories":null,"contents":null,"permalink":"/gallery/","tags":null,"title":"Gallery "},{"categories":null,"contents":"The implementation of the online PG module ensures a structured and efficient workflow for grievance redressal while integrating traditional public visits into the digital process. The key steps are as follows:\n Public Visit to Collector’s Office: Citizens who visit the District Collector’s office can directly submit their grievances to the designated grievance cell or an assigned officer. 
Collector’s Review and Assignment: The District Collector reviews the grievance and either assigns it to a specific department or adds a note with instructions for further action. In cases where immediate guidance is required, the Collector provides directions to the grievance cell for registration. Grievance Registration by PG Cell: The grievance cell registers the complaint in the online system (https://edistrict.kerala.gov.in/), ensuring it is documented with all necessary details such as applicant information, grievance description, and supporting documents. Once registered, the grievance receives a unique application number for tracking. Submission and Forwarding: The registered grievance is submitted to the corresponding department or official responsible for resolution. The system notifies the assigned department, and the status is updated for tracking by the complainant and administration. Resolution and Feedback: The concerned department processes the grievance and updates the system with the resolution details. Citizens can track the progress online and provide feedback upon resolution. ","permalink":"/workflow/","tags":null,"title":"Work Flow "},{"categories":null,"contents":null,"permalink":"/","tags":null,"title":"Snehapoorvam Collector"},{"categories":null,"contents":null,"permalink":"/categories/","tags":null,"title":"Categories"},{"categories":null,"contents":null,"permalink":"/docs/","tags":null,"title":"Docs"},{"categories":null,"contents":null,"permalink":"/tags/","tags":null,"title":"Tags"}]