{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# K ta yaqin qo'shni usuli\n", "\n", "Ushbu qismda o'rganmoqchi bo'lgan algoritmimiz bu *k ta yaqin qo'shni* deb nomlanadi. Ushbu usul biz rasmlardagi raqamlarni aniqlashda ishlatilgan usulning umumiyroq ko'rinishi hisoblanadi. Hamda bu usulni ham biz umumiy holda \"Bayes\" usuli deyishimiz mumkin. Chunki ushbu usul ham oldingi mavjud to'plamdan foydalanib obyektga baho beradi. *Yaqin qo'shni* usuli asosida rasmidagi raqamni topayotganimizda bitta yaqin qo'shinisi qaragan edik. Endi shuni kengaytirgan holda bitta emas *k* ta qo'shni e'tiborga olamiz. Undan tashqari ushbu bobda biz ushbu usulni umumlashtirib regressiya masalalari uchun ham qo'llaymiz.\n", "\n", "## " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data = np.loadtxt('/home/tqqt1/AI/teachings/online-courses/ai_intro/datasets/housing.csv',\n", " skiprows=1,\n", " delimiter=',')\n", "X = data[:, 1:]\n", "y = data[:, 0]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "num_train = 300\n", "num_val = 145\n", "num_test = 100" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "indices = np.arange(data.shape[0])\n", "np.random.seed(42)\n", "np.random.shuffle(indices)\n", "\n", "X_tr = X[indices[:num_train]]\n", "y_tr = y[indices[:num_train]]\n", "\n", "X_va = X[indices[num_train:num_train+num_val]]\n", "y_va = y[indices[num_train:num_train+num_val]]\n", "\n", "X_te = X[indices[num_train+num_val:]]\n", "y_te = y[indices[num_train+num_val:]]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "x_tr_max = X_tr.max(axis=0)\n", "\n", "X_tr = X_tr / x_tr_max\n", "X_va = X_va / x_tr_max\n", "X_te = X_te / x_tr_max" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "D = np.zeros((num_val, num_train))\n", "# val to'plam bo'yicha loop\n", "i = 0\n", "while i < num_val:\n", " # train to'plam bo'yicha loop\n", " j = 0\n", " while j < num_train:\n", " # Manhattan\n", " # D[i, j] = np.sum(np.abs(X_va[i] - X_tr[j]))\n", " # Euclidean\n", " D[i, j] = np.sqrt(np.sum((X_va[i] - X_tr[j])**2))\n", " j += 1 # j = j + 1\n", " i += 1 # i = i + 1\n", "D_argsort = np.argsort(D, axis=1)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "k=8, E=835205.0000\n" ] } ], "source": [ "k = 1\n", "k_best = 2\n", "E_best = 1e10\n", "while k <= 50:\n", " y_pred = y_tr[D_argsort[:, :k]]\n", " y_pred = np.mean(y_pred, axis=1)\n", " E = np.mean(np.abs(y_pred - y_va))\n", " \n", " if E < E_best:\n", " E_best = E\n", " k_best = k\n", "\n", " k += 1 # k = k + 1\n", "\n", "print(f\"k={k_best}, E={E:.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "k=11, E=784745.0897\n", "k=8, E=835205.0000" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "D = np.zeros((num_test, num_train))\n", "# val to'plam bo'yicha loop\n", "i = 0\n", "while i < num_test:\n", " # train to'plam bo'yicha loop\n", " j = 0\n", " while j < num_train:\n", " # D[i, j] = np.sum(np.abs(X_te[i] - X_tr[j]))\n", " D[i, j] = np.sqrt(np.sum((X_te[i] - X_tr[j])**2))\n", " j += 1 # j = j + 1\n", " i += 1 # i = i + 1\n", "D_argsort = np.argsort(D, axis=1)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "k=8, Test xatoligi: 920443.5625\n" ] } ], "source": [ "y_pred = y_tr[D_argsort[:, :k_best]]\n", "y_pred = np.mean(y_pred, axis=1)\n", "E = np.mean(np.abs(y_pred - y_te))\n", "print(f\"k={k_best}, Test xatoligi: {E:.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vazinlashgan KNN" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "D = np.zeros((num_val, num_train))\n", "# val to'plam bo'yicha loop\n", "i = 0\n", "while i < num_val:\n", " # train to'plam bo'yicha loop\n", " j = 0\n", " while j < num_train:\n", " # Manhattan\n", " # D[i, j] = np.sum(np.abs(X_va[i] - X_tr[j]))\n", " # Euclidean\n", " D[i, j] = np.sqrt(np.sum((X_va[i] - X_tr[j])**2))\n", " j += 1 # j = j + 1\n", " i += 1 # i = i + 1\n", "D_argsort = np.argsort(D, axis=1)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "k=8, E=834687.4107\n" ] } ], "source": [ "k_best = 2\n", "E_best = 1e10\n", "k = 2\n", "while k <= 50:\n", " D_k = np.take_along_axis(D, D_argsort, axis=1)[:, :k]\n", " D_k_sum = np.sum(D_k, axis=1, keepdims=True)\n", " D_k_1 = D_k / D_k_sum\n", " D_k_2 = 1 - D_k_1\n", " if k > 1:\n", " w = D_k_2 / (k - 1)\n", " else:\n", " w = D_k_2\n", "\n", " y_pred = y_tr[D_argsort[:, :k]]\n", " y_pred = y_pred * w\n", " y_pred = np.sum(y_pred, axis=1)\n", " E = np.mean(np.abs(y_pred - y_va))\n", " \n", " if E < E_best:\n", " E_best = E\n", " k_best = k\n", "\n", " k += 1 # k = k + 1\n", "\n", "print(f\"k={k_best}, E={E:.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "k=8, E=784162.1482\n", "k=8, E=834687.4107\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "D = np.zeros((num_test, num_train))\n", "# val to'plam bo'yicha loop\n", "i = 0\n", "while i < num_test:\n", " # train to'plam bo'yicha loop\n", " j = 0\n", " while j < num_train:\n", " # Manhattan\n", " # D[i, j] = np.sum(np.abs(X_te[i] - X_tr[j]))\n", " # Euclidean\n", " D[i, j] = np.sqrt(np.sum((X_te[i] - X_tr[j])**2))\n", " j += 1 # j = j + 1\n", " i += 1 # i = i + 1\n", "D_argsort = np.argsort(D, axis=1)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "k=8, Test xatoligi: 3916707.8565\n" ] } ], "source": [ "D_k = np.take_along_axis(D, D_argsort, axis=1)[:, :k_best]\n", "D_k_sum = np.sum(D_k, axis=1, keepdims=True)\n", "D_k_1 = D_k / D_k_sum\n", "D_k_2 = 1 - D_k_1\n", "w = D_k_2 / (k - 1)\n", "\n", "y_pred = y_tr[D_argsort[:, :k_best]]\n", "y_pred = y_pred * w\n", "y_pred = np.sum(y_pred, axis=1)\n", "E = np.mean(np.abs(y_pred - y_te))\n", "\n", "print(f\"k={k_best}, Test xatoligi: {E:.4f}\")" ] } ], "metadata": { "kernelspec": { "display_name": "ai-courses", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 2 }