Python for Data Analysis(full pdf download)

c#小王子 c#小王子 2021-04-20 1810 programming,Python


Python for Data Analysis(full pdf download)


What Is This Book About?


This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods usingPython as the implementation language.


When I say “data”, what am I referring to exactly? The primary focus is on structured data, a deliberately vague term that encompasses many different common forms of data, such as


• Multidimensional arrays (matrices)

Tabular or spreadsheet-like data in which each column may be a different type(string, numeric, date, or otherwise). This includes most kinds of data commonly stored in relational databases or tab- or comma-delimited text files

• Multiple tables of data interrelated by key columns (what would be primary or foreign keys for a SQL user)

• Evenly or unevenly spaced time series


This is by no means a complete list. Even though it may not always be obvious, a large percentage of data sets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a data set into a structured form. As an example, a collection of news articles could be processed into a word frequency table which could then be used to perform sentiment analysis.


Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data.


Why Python for Data Analysis?


For many people (myself among them), the Python language is easy to fall in love with. Since its first appearance in 1991, Python has become one of the most popular dynamic, programming languages, along with Perl, Ruby, and others. Python and Ruby have become especially popular in recent years for building websites using their numerous web frameworks, like Rails (Ruby) and Django (Python). Such languages are often called scripting languages as they can be used to write quick-and-dirty small programs, or scripts. I don’t like the term “scripting language” as it carries a connotation that they cannot be used for building mission-critical software. Among interpreted languages Python is distinguished by its large and active scientific computing community. Adoption of Python for scientific computing in both industry applications and academic research has increased significantly since the early 2000s.


For data analysis and interactive, exploratory computing and data visualization, Python will inevitably draw comparisons with the many other domain-specific open source and commercial programming languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent years, Python’s improved library support (primarily pandas) has made it a strong alternative for data manipulation tasks. Combined with Python’s strength in general purpose programming, it is an excellent choice as a single language for building data-centric applications.


Python as Glue


Part of Python’s success as a scientific computing platform is the ease of integrating C, C++, and FORTRAN code. Most modern computing environments share a similar set of legacy FORTRAN and C libraries for doing linear algebra, optimization, integration, fast fourier transforms, and other such algorithms. The same story has held true for many companies and national labs that have used Python to glue together 30 years’

worth of legacy software.


Most programs consist of small portions of code where most of the time is spent, with large amounts of “glue code” that doesn’t run often. In many cases, the execution time of the glue code is insignificant; effort is most fruitfully invested in optimizing the computational bottlenecks, sometimes by moving the code to a lower-level language like C.


In the last few years, the Cython project (http://cython.org) has become one of the preferred ways of both creating fast compiled extensions for Python and also interfacing with C and C++ code.


Solving the “Two-Language” Problem


In many organizations, it is common to research, prototype, and test new ideas using a more domain-specific computing language like MATLAB or R then later port those ideas to be part of a larger production system written in, say, Java, C#, or C++. What people are increasingly finding is that Python is a suitable language not only for doing research and prototyping but also building the production systems, too. I believe that more and more companies will go down this path as there are often significant organizational benefits to having both scientists and technologists using the same set of programmatic tools.


Why Not Python?


While Python is an excellent environment for building computationally-intensive scientific applications and building most kinds of general purpose systems, there are a number of uses for which Python may be less suitable.


As Python is an interpreted programming language, in general most Python code will run substantially slower than code written in a compiled language like Java or C++. As programmer time is typically more valuable than CPU time, many are happy to make this tradeoff. However, in an application with very low latency requirements (for example, a high frequency trading system), the time spent programming in a lower-level,

lower-productivity language like C++ to achieve the maximum possible performance might be time well spent.


Python is not an ideal language for highly concurrent, multithreaded applications, particularly applications with many CPU-bound threads. The reason for this is that it has what is known as the global interpreter lock (GIL), a mechanism which prevents the interpreter from executing more than one Python bytecode instruction at a time. The technical reasons for why the GIL exists are beyond the scope of this book, but as of this writing it does not seem likely that the GIL will disappear anytime soon. While it is true that in many big data processing applications, a cluster of computers may be required to process a data set in a reasonable amount of time, there are still situations where a single-process, multithreaded system is desirable.


This is not to say that Python cannot execute truly multithreaded, parallel code; that code just cannot be executed in a single Python process. As an example, the Cython project features easy integration with OpenMP, a C framework for parallel computing, in order to to parallelize loops and thus significantly speed up numerical algorithms.


Essential Python Libraries


For those who are less familiar with the scientific Python ecosystem and the libraries used throughout the book, I present the following overview of each library.


NumPy


NumPy, short for Numerical Python, is the foundational package for scientific computing in Python. The majority of this book will be based on NumPy and libraries built on top of NumPy. It provides, among other things:


• A fast and efficient multidimensional array object ndarray

• Functions for performing element-wise computations with arrays or mathematical operations between arrays

• Tools for reading and writing array-based data sets to disk

• Linear algebra operations, Fourier transform, and random number generation

• Tools for integrating connecting C, C++, and Fortran code to Python


Beyond the fast array-processing capabilities that NumPy adds to Python, one of its primary purposes with regards to data analysis is as the primary container for data to be passed between algorithms. For numerical data, NumPy arrays are a much more efficient way of storing and manipulating data than the other built-in Python data structures. Also, libraries written in a lower-level language, such as C or Fortran, can operate on the data stored in a NumPy array without copying any data.


【Download link】

link: https://pan.baidu.com/s/16MppRHeN28d9o8RRqOLXlA

Extraction code:3r69


相关文章


Python for Data Analysis(full pdf download)

This book is concerned with the nuts and bolts of manipulating, proces

Mining the SocialWeb(HD full text Download)

how toacquire.analyze and summarize data from all corners of the socia


文章热度: 166291
文章数量: 333
推荐阅读

FlashFXP绿色版网盘下载,附激活教程 1926

FlashFxp百度网盘下载链接:https://pan.baidu.com/s/1MBQ5gkZY1TCFY8A7fnZCfQ。FlashFxp是功能强大的FTP工具

Adobe Fireworks CS6 Ansifa绿色精简版网盘下载 1665

firework可以制作精美或是可以闪瞎眼的gif,这在广告领域是需要常用的,还有firework制作下logo,一些原创的图片还是很便捷的,而且fireworks用法简单,配合dw在做网站这一块往往会发挥出很强大的效果。百度网盘下载链接:https://pan.baidu.com/s/1fzIZszfy8VX6VzQBM_bdZQ

navicat for mysql中文绿色版网盘下载 1700

Navicat for Mysql是用于Mysql数据库管理的一款图形化管理软件,非常的便捷和好用,可以方便的增删改查数据库、数据表、字段、支持mysql命令,视图等等。百度网盘下载链接:https://pan.baidu.com/s/1T_tlgxzdQLtDr9TzptoWQw 提取码:y2yq

火车头采集器(旗舰版)绿色版网盘下载 1793

火车头采集器是站长常用的工具,相比于八爪鱼,简洁好用,易于配置。火车头能够轻松的抓取网页内容,并通过自带的工具对内容进行处理。站长圈想要做网站,火车头采集器是必不可少的。百度网盘链接:https://pan.baidu.com/s/1u8wUqS901HgOmucMBBOvEA

Photoshop(CS-2015-2023)绿色中文版软件下载 1916

安装文件清单(共46G)包含Window和Mac OS各个版本的安装包,从cs到cc,从绿色版到破解版,从安装文件激活工具,应有尽有,一次性打包。 Photoshop CC绿色精简版 Photoshop CS6 Mac版 Photoshop CC 2015 32位 Photoshop CC 2015 64位 Photoshop CC 2015 MAC版 Photoshop CC 2017 64位 Adobe Photoshop CC 2018 Adobe_Photoshop_CC_2018 Photoshop CC 2018 Win32 Photoshop CC 2018 Win64

知之

知之平台是全球领先的知识付费平台。提供各个领域的项目实战经验分享,提供优质的行业解决方案信息,来帮助您的工作和学习

使用指南 建议意见 用户协议 友情链接 隐私政策 Powered by NOOU ©2020 知之