Ada Hsu's Random Musings: 2019

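April 28, 2019

Enabling Intel CPU Instruction Set Extensions in TensorFlow

Yesterday in class our teacher demonstrated a sample program that uses Keras with TensorFlow as the backend to train image recognition on the 60,000 digit images of the MNIST dataset. The program uses Keras to build 2 hidden layers, trains in batches of 100 samples, and runs for 20 epochs. The Keras summary of the Dense layers is: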

Layer (type)	Output Shape	Param #
dense_1 (Dense)	(None, 689)	540865
dense_2 (Dense)	(None, 689)	475410
dense_3 (Dense)	(None, 689)	475410
dense_4 (Dense)	(None, 10)	6900
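
The parameter counts in the summary can be checked by hand: a Dense layer holds one weight per (input, unit) pair plus one bias per unit. The sketch below assumes each MNIST image is flattened to 28 × 28 = 784 inputs. The summary does not show the input shape, so that figure is an assumption, although it is the value that reproduces 540865. `denseParams` is a hypothetical helper name, not part of Keras.

```javascript
// Parameters of a fully connected layer: a weight for every (input, unit)
// pair plus one bias per unit.
function denseParams(inputs, units) {
  return inputs * units + units;
}

// Assumption: 28 x 28 MNIST images flattened to 784 values, followed by the
// unit counts shown in the summary table.
const layerSizes = [784, 689, 689, 689, 10];

const counts = [];
for (let i = 1; i < layerSizes.length; i++) {
  counts.push(denseParams(layerSizes[i - 1], layerSizes[i]));
}

console.log(counts); // [ 540865, 475410, 475410, 6900 ]
console.log(counts.reduce((a, b) => a + b, 0)); // 1498585
```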

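I first noticed the message below in the Jupyter console. It shows that TensorFlow is not actually using the CPU's instruction set extensions. A pip search then turned up a suspicious-looking package, intel-tensorflow, and I had only just enabled Keras support for the AMD Radeon 560X on macOS through PlaidML, so I figured I might as well see how these combinations affect Keras training.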

I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

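Conclusion first

You don't necessarily need to spend money on a powerful GPU. On most computers, simply installing the intel-tensorflow package is enough to cut training time noticeably.

Additional notes:

I asked a classmate to test on Windows; their impression was that training is equally slow with or without intel-tensorflow.
In my own tests, TensorFlow with all CPU extensions enabled was even faster than the AMD Radeon 560X GPU version.
However, the article "Evaluating PlaidML and GPU Support for Deep Learning on a Windows 10 Notebook" seems to show the GPU with a decisive advantage, but I couldn't run its sample program... it fails with the following message: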
```
ValueError: `steps_per_epoch=None` is only valid for a generator based on the `keras.utils.Sequence` class. Please specify `steps_per_epoch` or use the `keras.utils.Sequence` class.
```

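Additional notes, 2019-08-06:

As of tensorflow 1.14.0 the Intel-optimized build appears to have been merged into the official release. The intel-tensorflow package on pip is stuck at 1.13.1, and the 1.14 startup notice also lists only AVX2 and FMA as unused.
I also found that when PlaidML is the Keras backend on a Mac, Apple's own Metal API is about 1/3 more efficient than OpenCL (94 s → 68 s).
There is not yet a macOS custom build with all CPU extensions enabled that pairs with tensorflow 1.14.0, so you would probably have to build it yourself.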

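Installing the various TensorFlow builds

Generic standard build

Unless you have configured something special, this is the TensorFlow you get, that is, the one installed by the command below. At runtime it detects which instruction set extensions the CPU offers and prints them to the console.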

pip install tensorflow

TensorFlow's notice about CPU instruction set extensions looks like this:

I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

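Intel-optimized build

As the name suggests, this build has been tuned by Intel. Its main effect is to enable three Intel CPU extensions: SSE4.1, SSE4.2, and AVX. The package metadata does not list its dependencies, but I suspect it relies on the official tensorflow package.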

$ pip show intel-tensorflow
Name: intel-tensorflow
Version: 0.0.1
Summary: Intel Optimized Tensorflow with MKL
Home-page: https://github.com/IntelAI
Author: Intel Tensorflow optimization team
Author-email: [email protected]
License: UNKNOWN
Location: /usr/local/anaconda3/envs/ai/lib/python3.6/site-packages
Requires: pip
Required-by:

After installing this package, TensorFlow's notice about CPU instruction set extensions becomes:

I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
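
To see exactly which extensions the Intel build picked up, the two notices can be compared mechanically. This is only a small sketch: `unusedExtensions` is a hypothetical helper, and it assumes the notice always ends with a space-separated list after "not compiled to use:", as in the two messages quoted in this post.

```javascript
// Extract the instruction sets named in a cpu_feature_guard notice.
function unusedExtensions(logLine) {
  const match = /not compiled to use:\s*(.*)$/.exec(logLine);
  return match ? match[1].trim().split(/\s+/).filter(Boolean) : [];
}

const prefix = 'I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU ' +
  'supports instructions that this TensorFlow binary was not compiled to use: ';
const stockNotice = prefix + 'SSE4.1 SSE4.2 AVX AVX2 FMA';
const intelNotice = prefix + 'AVX2 FMA';

// Extensions reported as unused by the stock build but no longer reported
// by the Intel build.
const stillUnused = new Set(unusedExtensions(intelNotice));
const gained = unusedExtensions(stockNotice).filter(ext => !stillUnused.has(ext));

console.log(gained); // [ 'SSE4.1', 'SSE4.2', 'AVX' ]
```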

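Custom builds

Even the Intel-optimized TensorFlow does not enable the full set of CPU extensions, so some capable people on GitHub publish prebuilt TensorFlow packages compiled with all of them turned on, such as tensorflow-build and tensorflow-windows-wheel.
Installation only takes the pip command below, but finding the right wheel is a hassle.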

pip install --ignore-installed --upgrade "full path to the downloaded .whl file"

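Training results under each configuration

Each configuration was trained on the scenario described at the start of this post. Before every run I shut down Jupyter (because packages had to be installed), and the numbers come from running Restart & Run All directly on the Jupyter kernel.
Hardware: MacBook Pro (15-inch, 2018), 2.6 GHz Intel Core i7, Radeon Pro 560X 4 GB discrete GPU.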

Backend	Time (s)	CPU usage	Fan
tensorflow (1.13.1)	211.579	1100% ↑	Full speed
intel-tensorflow (1.13.1)	113.840	800% ↓	Low speed
Custom build 1.13.1	97.193	700% ↑	Low speed
PlaidML w/ AMD Radeon Pro 560X	91.528	250% ↓	Full speed
The test code is in keras_MNIST.ipynb. If you want to verify the results yourself, remember to switch the backend back to tensorflow.

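January 31, 2019

End-to-end testing with nightwatch.js

I've spent the last two days wrestling with nightwatch.js to validate a website, so here are a few small notes. What follows is not necessarily the proper way to do things; if you know a better approach, please leave a comment. Thanks.

nightwatch.js does not yet support data-driven testing

The scenario: for a known set of records (which may grow or shrink), log in to the website and verify each one. The records live in a .txt file, and ideally nightwatch.js would open the file and run the given test case for each record in turn.
But it has no notion of a loop, so it only ever runs the first record...
Googling suggests its mocha runner can do this, but... I couldn't make sense of it... XD
No matter. macOS has a shell, so surely I can run the loop outside and feed the data to nightwatch.js?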

Odd command-line argument passing

The data under test is numeric, and passing it straight to nightwatch.js throws an error...

node_modules/nightwatch/bin/nightwatch  tests/crawler.js 1234345678
   The "path" argument must be of type string. Received type number
       at validateString (internal/validators.js:125:11)
       at Object.resolve (path.js:1080:7)

In the end, the arguments had to be passed in this rather odd format:

node_modules/nightwatch/bin/nightwatch  tests/crawler.js -ABCDEFG -12345678
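
The outer loop does not have to be a shell script. As a rough sketch, the same idea can be written as a small Node driver that builds this dash-prefixed argument list for every record. The data file layout (one record per line, fields separated by whitespace) and the names `parseRecords`, `toNightwatchArgs`, and `runAll` are assumptions for illustration only.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');

// Split the data file into records, ignoring blank lines and stray CR/LF.
function parseRecords(text) {
  return text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => line.split(/\s+/));
}

// Prefix every field with "-" so numeric values are not passed bare.
function toNightwatchArgs(testFile, fields) {
  return [testFile, ...fields.map(field => '-' + field)];
}

// Run the test file once per record; returns the exit status of each run.
function runAll(dataFile, testFile) {
  const records = parseRecords(fs.readFileSync(dataFile, 'utf8'));
  return records.map(fields =>
    spawnSync('node_modules/nightwatch/bin/nightwatch',
      toNightwatchArgs(testFile, fields), { stdio: 'inherit' }).status);
}
```

With a hypothetical data file, a call such as `runAll('data.txt', 'tests/crawler.js')` would start one nightwatch run per line of the file.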

Reading the arguments inside nightwatch.js is not hard, though: just strip the leading - sign and any \r or \n that may have tagged along from the file.

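// process.argv[0..2] are node, the nightwatch script, and the test file.
const args = process.argv.slice(3);
// Strip the leading "-" and every stray CR/LF carried over from the data file.
const clean = (s) => s.replace(/^-/, '').replace(/[\r\n]/g, '');
const param1 = clean(args[0]);
const param2 = clean(args[1]);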

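Clicking a particular radio button

Normally you would use the .click() API to click an element on the page, but the damn thing just wouldn't budge... In the end, .execute() combined with document.querySelector() finally got the click through... = =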

browser
    .execute(function(){
        document.querySelector( '#A > span > input[type="radio"]' ).click();
        document.querySelector( '#B > div.section_wrap > form > ul > li.ht > div.ipt_wrap_2 > label:nth-child(1) > span > input[type="radio"]' ).click();
    })

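Checkboxes also need a visibility check

Again, use .execute() and document.querySelector() to get hold of the checkbox itself, then check whether its offsetParent property is null. If it is null, the user cannot see that checkbox and you can ignore it.
Also, don't forget to check the checked property; clicking blindly will get you into trouble...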

let box = document.querySelector( '#A input[type="checkbox"]' );
if( box.offsetParent != null && box.checked == false ) {
    box.click();
}
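
The decision itself can be pulled into a small function, which makes it easy to try outside the browser. This is only a sketch: `shouldClick` is a hypothetical name, and the extra guard for a missing element is my own addition, since document.querySelector() returns null when nothing matches.

```javascript
// Decide whether a checkbox should be clicked: it must exist, be rendered,
// and not be checked already.
function shouldClick(box) {
  if (!box) {
    return false; // querySelector() found nothing
  }
  const visible = box.offsetParent !== null;
  return visible && box.checked === false;
}

// Plain objects stand in for DOM elements so the logic can be tried in Node.
console.log(shouldClick({ offsetParent: {}, checked: false }));   // true
console.log(shouldClick({ offsetParent: null, checked: false })); // false
console.log(shouldClick({ offsetParent: {}, checked: true }));    // false
console.log(shouldClick(null));                                   // false
```

Inside .execute() the function would have to be defined within the injected function body, because that code runs in the browser. One caveat: offsetParent is also null for elements with position: fixed, so this check would treat a fixed-position checkbox as invisible.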

About the elements() API

If you simply inspect the return value inside the elements() callback, you will see something like this:


{ sessionId: '31bf7adb53f4f9a93a0c4404e41d303f',
  status: 0,
  value:
   [ { ELEMENT: '0.7637505561737095-2' },
     { ELEMENT: '0.7637505561737095-3' },
     { ELEMENT: '0.7637505561737095-4' } ] }

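These appear to be the IDs Selenium uses internally to keep track of elements. With APIs along the lines of .elementIdXXXXXX() you can fetch a given attribute of the real element (yes, the requested attribute comes back in the .value property; the element itself is not returned), so information can be pulled off the page like this: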

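// Assumption: planList is declared earlier in the test; shown here for completeness.
const planList = [];

browser
    .elements('css selector', 'div.plan_items', function (result) {
        // result.value is a list of element handles, not DOM elements.
        result.value.forEach(element => {
            browser.elementIdAttribute(element.ELEMENT, 'id', function (attr) {
                console.log(attr.value);
                // Keep only the part after the "plan_items_" prefix.
                planList.push(attr.value.replace(/^plan_items_/, ''));
            });
        });
    });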
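
The shape of that result object can be handled by a small helper, sketched below with the sample output from above. `elementHandles` and `planIdFromAttribute` are hypothetical names. The second key is an assumption about other setups: drivers that follow the W3C WebDriver specification identify elements with the key `element-6066-11e4-a52e-4f735466cecf` instead of `ELEMENT`, so the helper accepts either.

```javascript
// Key used for element references by W3C WebDriver-compliant drivers.
const W3C_ELEMENT_KEY = 'element-6066-11e4-a52e-4f735466cecf';

// Pull the element handles out of an elements() result, whichever key is used.
function elementHandles(result) {
  if (!result || !Array.isArray(result.value)) {
    return [];
  }
  return result.value
    .map(entry => entry.ELEMENT || entry[W3C_ELEMENT_KEY])
    .filter(Boolean);
}

// Strip the "plan_items_" prefix from an id attribute value.
function planIdFromAttribute(id) {
  return id.replace(/^plan_items_/, '');
}

// The sample result shown earlier in this post.
const sample = {
  sessionId: '31bf7adb53f4f9a93a0c4404e41d303f',
  status: 0,
  value: [
    { ELEMENT: '0.7637505561737095-2' },
    { ELEMENT: '0.7637505561737095-3' },
    { ELEMENT: '0.7637505561737095-4' },
  ],
};

console.log(elementHandles(sample).length);       // 3
console.log(planIdFromAttribute('plan_items_42')); // 42
```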

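Conclusion

These takeaways are too minor to be worth telling anyone about, but the damn things cost me 2 days....
And a curse for good measure!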
