Add Shine to your Data Science Resume with these 8 Ambitious Projects on GitHub

Source: 分析大师 | 2019-10-08 | Published by: BOB体育娱乐平台之家

There are multiple ways of learning data science. We can go through courses, pore over books, or sift through articles. All of these lack one fundamental thing, however – practice.

It all comes down to how much conceptual knowledge you are applying on a daily basis. That is what will improve, enhance and build your data science career (and consequently your chances of landing a data science role).

Did you know that top tech behemoths open source a lot of their code on GitHub? It’s a brilliant way of applying and learning data science – pick up the open-source code, understand it, play around with it, and build your own model!

So in this article, I have put together eight ambitious data science projects for you to immediately get your hands on. I have broadly divided them into three categories – Natural Language Processing (NLP), Computer Vision, and others that don’t fall into the above two sections.

This article is part of the monthly GitHub project series we host on Analytics Vidhya. Here’s the full list for 2019 in case you missed out on some mind-blowing projects:

NLP is booming right now. It is the hottest field in data science, with breakthrough after breakthrough happening on a regular basis. I feel like I’m barely getting to grips with a new framework when another one comes along.

That’s not a bad thing though! It just means there’s more to learn and experiment with. So in that spirit, here are four cool projects on Natural Language Processing that will definitely get you excited!

Pretrained models are all the rage these days. Most of us don’t have a GPU sitting idle at home (let alone several of them), so training deep neural network models from scratch is simply not practical.

Enter pretrained models. These have become ubiquitous with the advent of transfer learning – the ability to train a model on one dataset and then adapt that model to perform different NLP tasks on a different dataset. Pretrained models enable us to use an existing model and play around with it.

This GitHub repository is a collection of over 60 pretrained language models. These include BERT, XLNet, ERNIE, ELMo and ULMFiT, among others. Here’s a diagrammatic illustration of the papers you’ll find in this repository:

This is a jackpot of a repository in my opinion and one you should readily bookmark (or star) if you’re an NLP enthusiast. Here are a few resources and excellent in-depth tutorials on some of these language models:

I really like this project because it shows how a simple idea can produce powerful results. The Mexican government released its annual report on September 1st and the creator of this project decided to use simple NLP text mining techniques to unearth patterns and insights.

The first challenge, as the author has highlighted, was to extract all the text from the PDF file where the report was housed. He used a library called PyPDF2 to do this. The entire process is well documented in this project along with a step-by-step explanation plus Python code.

Check out this visualization generated using seaborn:

It’s simple yet powerful – it shows the number of
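
To make the text-mining idea concrete, here is a minimal sketch of a word-frequency count, one of the simplest techniques of this kind. It assumes the report text has already been extracted from the PDF (the author used PyPDF2 for that step, which is not shown here). The function name `top_words`, the tiny stop-word list and the sample string are hypothetical and are not taken from the project.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on"}


def top_words(text, n=5):
    """Return the n most frequent tokens in text, ignoring stop words."""
    # Lowercase the text and keep runs of letters only; the pattern is
    # Unicode-aware, so accented words (as in a Spanish report) stay whole.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(n)


# Placeholder string standing in for the text extracted from the report.
sample = "The health budget and the education budget: the budget for health grew."
print(top_words(sample, 2))
```

Counts like these are the kind of table that can then be passed to a plotting library such as seaborn to draw a bar chart.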
This article has been optimized for display. To read the original, follow the link below:
Original article: https://www.analyticsvidhya.com/blog/2019/10/8-ambitious-data-science-projects-github/