Crawl News Archive Using Scrapy

This is a short guide to crawling news content from an authenticated ASP.NET website using Scrapy.

Scrapy is an application framework for crawling websites and extracting structured data, which can be used for a wide range of applications such as data mining, information processing, or historical archival.
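As a rough illustration of what logging in to an ASP.NET site involves, the sketch below collects the hidden form fields (such as `__VIEWSTATE`) that a Web Forms login page typically expects to be posted back along with the credentials. It uses only the Python standard library. The sample page and the field names `txtUsername` and `txtPassword` are assumptions made for the example, not details from the actual site. In Scrapy itself, `FormRequest.from_response` pre-populates a request with a form's existing fields in a similar way.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Keep only hidden inputs that carry a name.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(html, username, password):
    """Merge the page's hidden fields with the login credentials."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    # Hypothetical field names; inspect the real login form to find them.
    payload["txtUsername"] = username
    payload["txtPassword"] = password
    return payload


# Hypothetical login page, standing in for the real site's HTML.
SAMPLE_LOGIN_PAGE = """
<form method="post" action="Login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgK" />
  <input type="text" name="txtUsername" />
  <input type="password" name="txtPassword" />
</form>
"""

if __name__ == "__main__":
    print(build_login_payload(SAMPLE_LOGIN_PAGE, "alice", "secret"))
```

The resulting dictionary is what would be sent as the body of the login POST request; the crawler would then reuse the session cookies from the response to fetch the news pages.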

Deploy Jekyll Engine To Heroku

Install Bundler

Make sure Ruby and RubyGems (the gem command) are installed on the system. Heroku requires Bundler to manage gem dependencies. See Heroku's sample app tutorial for details. To install Bundler, run one of the following commands:

gem install bundler
# or
sudo gem install bundler

Scrapy Error On Cryptography Under MacOS

Possible Problem When Installing Scrapy Using Pip

On MacOS, if the Scrapy package is installed using pip, the following error may be raised when trying to run scrapy:

cffi.ffiplatform.VerificationError: importing '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cryptography/_Cryptography_cffi_a269d620xd5c405b7.so': dlopen(/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cryptography/_Cryptography_cffi_a269d620xd5c405b7.so, 2): Symbol not found: _CRYPTO_malloc_debug_init

Integrate Blog Content With MathJax

MathJax is a JavaScript library that displays mathematical formulas in your web pages.

To integrate MathJax into your Jekyll site, include the following code in your <head>...</head> HTML block.

<script type="text/javascript"
  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>