Python pandas库｜任凭弱水三千，我只取一瓢饮（7）

上一篇链接：

Python pandas库｜任凭弱水三千，我只取一瓢饮（6）_Hann Yang的博客-CSDN博客

to_系列函数：22个（12~22）

Function12

to_numpy(self, dtype: 'NpDtype | None' = None, copy: 'bool' = False, na_value=<no_default>) -> 'np.ndarray'

Help on function to_numpy in module pandas.core.frame:
to_numpy(self, dtype: 'NpDtype | None' = None, copy: 'bool' = False, na_value=<no_default>) -> 'np.ndarray'
Convert the DataFrame to a NumPy array.

By default, the dtype of the returned array will be the common NumPy
dtype of all types in the DataFrame. For example, if the dtypes are
“float16“ and “float32“, the results dtype will be “float32“.
This may require copying data and coercing values, which may be
expensive.

Parameters
———-
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that “copy=False“ does not *ensure* that
“to_numpy()“ is no-copy. Rather, “copy=True“ ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the dtypes of the DataFrame columns.

.. versionadded:: 1.1.0

Returns
——-
numpy.ndarray

See Also
——–
Series.to_numpy : Similar method for Series.

Examples
——–
>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
[2, 4]])

With heterogeneous data, the lowest common type will have to
be used.

>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
[2. , 4.5]])

For a mix of numeric and non-numeric types, the output array will
have object dtype.

>>> df['C'] = pd.date_range('2000', periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
[2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)

Function13

to_parquet(self, path: 'FilePathOrBuffer | None' = None, engine: 'str' = 'auto', compression: 'str | None' = 'snappy', index: 'bool | None' = None, partition_cols: 'list[str] | None' = None, storage_options: 'StorageOptions' = None, **kwargs) -> 'bytes | None'

Help on function to_parquet in module pandas.core.frame:
to_parquet(self, path: 'FilePathOrBuffer | None' = None, engine: 'str' = 'auto', compression: 'str | None' = 'snappy', index: 'bool | None' = None, partition_cols: 'list[str] | None' = None, storage_options: 'StorageOptions' = None, **kwargs) -> 'bytes | None'
Write a DataFrame to the binary parquet format.

This function writes the dataframe as a `parquet file
<https://parquet.apache.org/>`_. You can choose different parquet
backends, and have the option of compression. See
:ref:`the user guide <io.parquet>` for more details.

Parameters
———-
path : str or file-like object, default None
If a string, it will be used as Root Directory path
when writing a partitioned dataset. By file-like object,
we refer to objects with a write() method, such as a file handle
(e.g. via builtin open function) or io.BytesIO. The engine
fastparquet does not accept file-like objects. If path is None,
a bytes object is returned.

.. versionchanged:: 1.2.0

Previously this was "fname"

engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
Parquet library to use. If 'auto', then the option
“io.parquet.engine“ is used. The default “io.parquet.engine“
behavior is to try 'pyarrow', falling back to 'fastparquet' if
'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
Name of the compression to use. Use “None“ for no compression.
index : bool, default None
If “True“, include the dataframe's index(es) in the file output.
If “False“, they will not be written to the file.
If “None“, similar to “True“ the dataframe's index(es)
will be saved. However, instead of being saved as values,
the RangeIndex will be stored as a range in the metadata so it
doesn't require much space and is faster. Other indexes will
be included as columns in the file output.
partition_cols : list, optional, default None
Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
Must be None if path is not a string.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to “urllib“ as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
“fsspec“. Please see “fsspec“ and “urllib“ for more details.

.. versionadded:: 1.2.0

**kwargs
Additional arguments passed to the parquet library. See
:ref:`pandas io <io.parquet>` for more details.

Returns
——-
bytes if no path argument is provided else None

See Also
——–
read_parquet : Read a parquet file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.

Notes
—–
This function requires either the `fastparquet
<https://pypi.org/project/fastparquet>`_ or `pyarrow
<https://arrow.apache.org/docs/python/>`_ library.

Examples
——–
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
… compression='gzip') # doctest: +SKIP
>>> pd.read_parquet('df.parquet.gzip') # doctest: +SKIP
col1 col2
0 1 3
1 2 4

If you want to get a buffer to the parquet content you can use a io.BytesIO
object, as long as you don't use partition_cols, which creates multiple files.

>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()

Function14

to_period(self, freq: 'Frequency | None' = None, axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'

Help on function to_period in module pandas.core.frame:
to_period(self, freq: 'Frequency | None' = None, axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'
Convert DataFrame from DatetimeIndex to PeriodIndex.

Convert DataFrame from DatetimeIndex to PeriodIndex with desired
frequency (inferred from index if not passed).

Parameters
———-
freq : str, default
Frequency of the PeriodIndex.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to convert (the index by default).
copy : bool, default True
If False then underlying input data is not copied.

Returns
——-
DataFrame with PeriodIndex

Function15

to_pickle(self, path, compression: 'CompressionOptions' = 'infer', protocol: 'int' = 5, storage_options: 'StorageOptions' = None) -> 'None'

Help on function to_pickle in module pandas.core.generic:
to_pickle(self, path, compression: 'CompressionOptions' = 'infer', protocol: 'int' = 5, storage_options: 'StorageOptions' = None) -> 'None'
Pickle (serialize) object to file.

Parameters
———-
path : str
File path where the pickled object will be stored.
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
A string representing the compression to use in the output file. By
default, infers from the file extension in specified path.
Compression mode may be any of the following possible
values: {¡®infer¡¯, ¡®gzip¡¯, ¡®bz2¡¯, ¡®zip¡¯, ¡®xz¡¯, None}. If compression
mode is ¡®infer¡¯ and path_or_buf is path-like, then detect
compression mode from the following extensions:
¡®.gz¡¯, ¡®.bz2¡¯, ¡®.zip¡¯ or ¡®.xz¡¯. (otherwise no compression).
If dict given and mode is ¡®zip¡¯ or inferred as ¡®zip¡¯, other entries
passed as additional compression options.
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible
values are 0, 1, 2, 3, 4, 5. A negative value for the protocol
parameter is equivalent to setting its value to HIGHEST_PROTOCOL.

.. [1] https://docs.python.org/3/library/pickle.html.

storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to “urllib“ as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
“fsspec“. Please see “fsspec“ and “urllib“ for more details.

.. versionadded:: 1.2.0

See Also
——–
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_sql : Write DataFrame to a SQL database.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.

Examples
——–
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df.to_pickle("./dummy.pkl")

>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9

>>> import os
>>> os.remove("./dummy.pkl")

Function16

to_records(self, index=True, column_dtypes=None, index_dtypes=None) -> 'np.recarray'

Help on function to_records in module pandas.core.frame:
to_records(self, index=True, column_dtypes=None, index_dtypes=None) -> 'np.recarray'
Convert DataFrame to a NumPy record array.

Index will be included as the first field of the record array if
requested.

Parameters
———-
index : bool, default True
Include index in resulting record array, stored in 'index'
field or using the index label, if set.
column_dtypes : str, type, dict, default None
If a string or type, the data type to store all columns. If
a dictionary, a mapping of column names and indices (zero-indexed)
to specific data types.
index_dtypes : str, type, dict, default None
If a string or type, the data type to store all index levels. If
a dictionary, a mapping of index level names and indices
(zero-indexed) to specific data types.

This mapping is applied only if `index=True`.

Returns
——-
numpy.recarray
NumPy ndarray with the DataFrame labels as fields and each row
of the DataFrame as entries.

See Also
——–
DataFrame.from_records: Convert structured or record ndarray
to DataFrame.
numpy.recarray: An ndarray that allows field access using
attributes, analogous to typed columns in a
spreadsheet.

Examples
——–
>>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
… index=['a', 'b'])
>>> df
A B
a 1 0.50
b 2 0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

If the DataFrame index has no label then the recarray field name
is set to 'index'. If the index has a label then this is used as the
field name:

>>> df.index = df.index.rename("I")
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])

The index can be excluded from the record array:

>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
dtype=[('A', '<i8'), ('B', '<f8')])

Data types can be specified for the columns:

>>> df.to_records(column_dtypes={"A": "int32"})
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])

As well as for the index:

>>> df.to_records(index_dtypes="<S2")
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])

>>> index_dtypes = f"<S{df.index.str.len().max()}"
>>> df.to_records(index_dtypes=index_dtypes)
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
dtype=[('I', 'S1'), ('A', '<i8'), ('B', '<f8')])

Function17

to_sql(self, name: 'str', con, schema=None, if_exists: 'str' = 'fail', index: 'bool_t' = True, index_label=None, chunksize=None, dtype: 'DtypeArg | None' = None, method=None) -> 'None'

Help on function to_sql in module pandas.core.generic:
to_sql(self, name: 'str', con, schema=None, if_exists: 'str' = 'fail', index: 'bool_t' = True, index_label=None, chunksize=None, dtype: 'DtypeArg | None' = None, method=None) -> 'None'
Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy [1]_ are supported. Tables can be
newly created, appended to, or overwritten.

Parameters
———-
name : str
Name of SQL table.
con : sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_.

schema : str, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
How to behave if the table already exists.

* fail: Raise a ValueError.
* replace: Drop the table before inserting new values.
* append: Insert new values to the existing table.

index : bool, default True
Write DataFrame index as a column. Uses `index_label` as the column
name in the table.
index_label : str or sequence, default None
Column label for index column(s). If None is given (default) and
`index` is True, then the index names are used.
A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, optional
Specify the number of rows in each batch to be written at a time.
By default, all rows will be written at once.
dtype : dict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the
keys should be the column names and the values should be the
SQLAlchemy types or strings for the sqlite3 legacy mode. If a
scalar is provided, it will be applied to all columns.
method : {None, 'multi', callable}, optional
Controls the SQL insertion clause used:

* None : Uses standard SQL “INSERT“ clause (one per row).
* 'multi': Pass multiple values in a single “INSERT“ clause.
* callable with signature “(pd_table, conn, keys, data_iter)“.

Details and a sample callable implementation can be found in the
section :ref:`insert method <io.sql.method>`.

Raises
——
ValueError
When the table already exists and `if_exists` is 'fail' (the
default).

See Also
——–
read_sql : Read a DataFrame from a table.

Notes
—–
Timezone aware datetime columns will be written as
“Timestamp with timezone“ type with SQLAlchemy if supported by the
database. Otherwise, the datetimes will be stored as timezone unaware
timestamps local to the original timezone.

References
———-
.. [1] https://docs.sqlalchemy.org
.. [2] https://www.python.org/dev/peps/pep-0249/

Examples
——–
Create an in-memory SQLite database.

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)

Create a table from scratch with 3 rows.

>>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
>>> df
name
0 User 1
1 User 2
2 User 3

>>> df.to_sql('users', con=engine)
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]

An `sqlalchemy.engine.Connection` can also be passed to `con`:

>>> with engine.begin() as connection:
… df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
… df1.to_sql('users', con=connection, if_exists='append')

This is allowed to support operations that require that the same
DBAPI connection is used for the entire operation.

>>> df2 = pd.DataFrame({'name' : ['User 6', 'User 7']})
>>> df2.to_sql('users', con=engine, if_exists='append')
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3'),
(0, 'User 4'), (1, 'User 5'), (0, 'User 6'),
(1, 'User 7')]

Overwrite the table with just “df2“.

>>> df2.to_sql('users', con=engine, if_exists='replace',
… index_label='id')
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 6'), (1, 'User 7')]

Specify the dtype (especially useful for integers with missing values).
Notice that while pandas is forced to store the data as floating point,
the database supports nullable integers. When fetching the data with
Python, we get back integer scalars.

>>> df = pd.DataFrame({"A": [1, None, 2]})
>>> df
A
0 1.0
1 NaN
2 2.0

>>> from sqlalchemy.types import Integer
>>> df.to_sql('integers', con=engine, index=False,
… dtype={"A": Integer()})

>>> engine.execute("SELECT * FROM integers").fetchall()
[(1,), (None,), (2,)]

Function18

to_stata(self, path: 'FilePathOrBuffer', convert_dates: 'dict[Hashable, str] | None' = None, write_index: 'bool' = True, byteorder: 'str | None' = None, time_stamp: 'datetime.datetime | None' = None, data_label: 'str | None' = None, variable_labels: 'dict[Hashable, str] | None' = None, version: 'int | None' = 114, convert_strl: 'Sequence[Hashable] | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'None'

Help on function to_stata in module pandas.core.frame:
to_stata(self, path: 'FilePathOrBuffer', convert_dates: 'dict[Hashable, str] | None' = None, write_index: 'bool' = True, byteorder: 'str | None' = None, time_stamp: 'datetime.datetime | None' = None, data_label: 'str | None' = None, variable_labels: 'dict[Hashable, str] | None' = None, version: 'int | None' = 114, convert_strl: 'Sequence[Hashable] | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'None'
Export DataFrame object to Stata dta format.

Writes the DataFrame to a Stata dataset file.
"dta" files contain a Stata dataset.

Parameters
———-
path : str, buffer or path object
String, path object (pathlib.Path or py._path.local.LocalPath) or
object implementing a binary write() function. If using a buffer
then the buffer will not be automatically closed after the file
data has been written.

.. versionchanged:: 1.0.0

Previously this was "fname"

convert_dates : dict
Dictionary mapping columns containing datetime types to stata
internal format to use when writing the dates. Options are 'tc',
'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer
or a name. Datetime columns that do not have a conversion type
specified will be converted to 'tc'. Raises NotImplementedError if
a datetime column has timezone information.
write_index : bool
Write the index to Stata dataset.
byteorder : str
Can be ">", "<", "little", or "big". default is `sys.byteorder`.
time_stamp : datetime
A datetime to use as file creation date. Default is the current
time.
data_label : str, optional
A label for the data set. Must be 80 characters or smaller.
variable_labels : dict
Dictionary containing columns as keys and variable labels as
values. Each label must be 80 characters or smaller.
version : {114, 117, 118, 119, None}, default 114
Version to use in the output dta file. Set to None to let pandas
decide between 118 or 119 formats depending on the number of
columns in the frame. Version 114 can be read by Stata 10 and
later. Version 117 can be read by Stata 13 or later. Version 118
is supported in Stata 14 and later. Version 119 is supported in
Stata 15 and later. Version 114 limits string variables to 244
characters or fewer while versions 117 and later allow strings
with lengths up to 2,000,000 characters. Versions 118 and 119
support Unicode characters, and version 119 supports more than
32,767 variables.

Version 119 should usually only be used when the number of
variables exceeds the capacity of dta format 118. Exporting
smaller datasets in format 119 may have unintended consequences,
and, as of November 2020, Stata SE cannot read version 119 files.

.. versionchanged:: 1.0.0

Added support for formats 118 and 119.

convert_strl : list, optional
List of column names to convert to string columns to Stata StrL
format. Only available if version is 117. Storing strings in the
StrL format can produce smaller dta files if strings have more than
8 characters and values are repeated.
compression : str or dict, default 'infer'
For on-the-fly compression of the output dta. If string, specifies
compression mode. If dict, value at key 'method' specifies
compression mode. Compression mode must be one of {'infer', 'gzip',
'bz2', 'zip', 'xz', None}. If compression mode is 'infer' and
`fname` is path-like, then detect compression from the following
extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no
compression). If dict and compression mode is one of {'zip',
'gzip', 'bz2'}, or inferred as one of the above, other entries
passed as additional compression options.

.. versionadded:: 1.1.0

storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to “urllib“ as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
“fsspec“. Please see “fsspec“ and “urllib“ for more details.

.. versionadded:: 1.2.0

Raises
——
NotImplementedError
* If datetimes contain timezone information
* Column dtype is not representable in Stata
ValueError
* Columns listed in convert_dates are neither datetime64[ns]
or datetime.datetime
* Column listed in convert_dates is not in DataFrame
* Categorical label contains more than 32,000 characters

See Also
——–
read_stata : Import Stata data files.
io.stata.StataWriter : Low-level writer for Stata data files.
io.stata.StataWriter117 : Low-level writer for version 117 files.

Examples
——–
>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon',
… 'parrot'],
… 'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta') # doctest: +SKIP

Function19

Help on function to_string in module pandas.core.frame:
to_string(self, buf: 'FilePathOrBuffer[str] | None' = None, columns: 'Sequence[str] | None' = None, col_space: 'int | None' = None, header: 'bool | Sequence[str]' = True, index: 'bool' = True, na_rep: 'str' = 'NaN', formatters: 'fmt.FormattersType | None' = None, float_format: 'fmt.FloatFormatType | None' = None, sparsify: 'bool | None' = None, index_names: 'bool' = True, justify: 'str | None' = None, max_rows: 'int | None' = None, min_rows: 'int | None' = None, max_cols: 'int | None' = None, show_dimensions: 'bool' = False, decimal: 'str' = '.', line_width: 'int | None' = None, max_colwidth: 'int | None' = None, encoding: 'str | None' = None) -> 'str | None'
Render a DataFrame to a console-friendly tabular output.

Parameters
———-
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : sequence, optional, default None
The subset of columns to write. Writes all columns by default.
col_space : int, list or dict of int, optional
The minimum width of each column.
header : bool or sequence, optional
Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index : bool, optional, default True
Whether to print index (row) labels.
na_rep : str, optional, default 'NaN'
String representation of “NaN“ to use.
formatters : list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns' elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
float_format : one-parameter function, optional, default None
Formatter function to apply to columns' elements if they are
floats. This function must return a unicode string and will be
applied only to the non-“NaN“ elements, with “NaN“ being
handled by “na_rep“.

.. versionchanged:: 1.2.0

sparsify : bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names : bool, optional, default True
Prints the names of the indexes.
justify : str, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), 'right' out
of the box. Valid values are

* left
* right
* center
* justify
* justify-all
* start
* end
* inherit
* match-parent
* initial
* unset.
max_rows : int, optional
Maximum number of rows to display in the console.
min_rows : int, optional
The number of rows to display in the console in a truncated repr
(when number of rows is above `max_rows`).
max_cols : int, optional
Maximum number of columns to display in the console.
show_dimensions : bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.

line_width : int, optional
Width to wrap a line in characters.
max_colwidth : int, optional
Max width to truncate each column in characters. By default, no limit.

.. versionadded:: 1.0.0
encoding : str, default "utf-8"
Set character encoding.

.. versionadded:: 1.0

Returns
——-
str or None
If buf is None, returns the result as a string. Otherwise returns
None.

See Also
——–
to_html : Convert DataFrame to HTML.

Examples
——–
>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
col1 col2
0 1 4
1 2 5
2 3 6

Function20

to_timestamp(self, freq: 'Frequency | None' = None, how: 'str' = 'start', axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'

Help on function to_timestamp in module pandas.core.frame:
to_timestamp(self, freq: 'Frequency | None' = None, how: 'str' = 'start', axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'
Cast to DatetimeIndex of timestamps, at *beginning* of period.

Parameters
———-
freq : str, default frequency of PeriodIndex
Desired frequency.
how : {'s', 'e', 'start', 'end'}
Convention for converting period to timestamp; start of period
vs. end.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to convert (the index by default).
copy : bool, default True
If False then underlying input data is not copied.

Returns
——-
DataFrame with DatetimeIndex

Function21

to_xarray(self)

Help on function to_xarray in module pandas.core.generic:
to_xarray(self)
Return an xarray object from the pandas object.

Returns
——-
xarray.DataArray or xarray.Dataset
Data in the pandas structure converted to Dataset if the object is
a DataFrame, or a DataArray if the object is a Series.

See Also
——–
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.

Notes
—–
See the `xarray docs <https://xarray.pydata.org/en/stable/>`__

Examples
——–
>>> df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
… ('parrot', 'bird', 24.0, 2),
… ('lion', 'mammal', 80.5, 4),
… ('monkey', 'mammal', np.nan, 4)],
… columns=['name', 'class', 'max_speed',
… 'num_legs'])
>>> df
name class max_speed num_legs
0 falcon bird 389.0 2
1 parrot bird 24.0 2
2 lion mammal 80.5 4
3 monkey mammal NaN 4

>>> df.to_xarray()
<xarray.Dataset>
Dimensions: (index: 4)
Coordinates:
* index (index) int64 0 1 2 3
Data variables:
name (index) object 'falcon' 'parrot' 'lion' 'monkey'
class (index) object 'bird' 'bird' 'mammal' 'mammal'
max_speed (index) float64 389.0 24.0 80.5 nan
num_legs (index) int64 2 2 4 4

>>> df['max_speed'].to_xarray()
<xarray.DataArray 'max_speed' (index: 4)>
array([389. , 24. , 80.5, nan])
Coordinates:
* index (index) int64 0 1 2 3

>>> dates = pd.to_datetime(['2018-01-01', '2018-01-01',
… '2018-01-02', '2018-01-02'])
>>> df_multiindex = pd.DataFrame({'date': dates,
… 'animal': ['falcon', 'parrot',
… 'falcon', 'parrot'],
… 'speed': [350, 18, 361, 15]})
>>> df_multiindex = df_multiindex.set_index(['date', 'animal'])

>>> df_multiindex
speed
date animal
2018-01-01 falcon 350
parrot 18
2018-01-02 falcon 361
parrot 15

>>> df_multiindex.to_xarray()
<xarray.Dataset>
Dimensions: (animal: 2, date: 2)
Coordinates:
* date (date) datetime64[ns] 2018-01-01 2018-01-02
* animal (animal) object 'falcon' 'parrot'
Data variables:
speed (date, animal) int64 350 18 361 15

Function22

Help on function to_xml in module pandas.core.frame:
to_xml(self, path_or_buffer: 'FilePathOrBuffer | None' = None, index: 'bool' = True, root_name: 'str | None' = 'data', row_name: 'str | None' = 'row', na_rep: 'str | None' = None, attr_cols: 'str | list[str] | None' = None, elem_cols: 'str | list[str] | None' = None, namespaces: 'dict[str | None, str] | None' = None, prefix: 'str | None' = None, encoding: 'str' = 'utf-8', xml_declaration: 'bool | None' = True, pretty_print: 'bool | None' = True, parser: 'str | None' = 'lxml', stylesheet: 'FilePathOrBuffer | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'str | None'
Render a DataFrame to an XML document.

.. versionadded:: 1.3.0

Parameters
———-
path_or_buffer : str, path object or file-like object, optional
File to write output to. If None, the output is returned as a
string.
index : bool, default True
Whether to include index in XML document.
root_name : str, default 'data'
The name of root element in XML document.
row_name : str, default 'row'
The name of row element in XML document.
na_rep : str, optional
Missing data representation.
attr_cols : list-like, optional
List of columns to write as attributes in row element.
Hierarchical columns will be flattened with underscore
delimiting the different levels.
elem_cols : list-like, optional
List of columns to write as children in row element. By default,
all columns output as children of row element. Hierarchical
columns will be flattened with underscore delimiting the
different levels.
namespaces : dict, optional
All namespaces to be defined in root element. Keys of dict
should be prefix names and values of dict corresponding URIs.
Default namespaces should be given empty string key. For
example, ::

namespaces = {"": "https://example.com"}

prefix : str, optional
Namespace prefix to be used for every element and/or attribute
in document. This should be one of the keys in “namespaces“
dict.
encoding : str, default 'utf-8'
Encoding of the resulting document.
xml_declaration : bool, default True
Whether to include the XML declaration at start of document.
pretty_print : bool, default True
Whether output should be pretty printed with indentation and
line breaks.
parser : {'lxml','etree'}, default 'lxml'
Parser module to use for building of tree. Only 'lxml' and
'etree' are supported. With 'lxml', the ability to use XSLT
stylesheet is supported.
stylesheet : str, path object or file-like object, optional
A URL, file-like object, or a raw string containing an XSLT
script used to transform the raw XML output. Script should use
layout of elements and attributes from original output. This
argument requires “lxml“ to be installed. Only XSLT 1.0
scripts and not later versions is currently supported.
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use
gzip, bz2, zip or xz if path_or_buffer is a string ending in
'.gz', '.bz2', '.zip', or 'xz', respectively, and no decompression
otherwise. If using 'zip', the ZIP file must contain only one data
file to be read in. Set to None for no decompression.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to “urllib“ as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
“fsspec“. Please see “fsspec“ and “urllib“ for more details.

Returns
——-
None or str
If “io“ is None, returns the resulting XML format as a
string. Otherwise returns None.

See Also
——–
to_json : Convert the pandas object to a JSON string.
to_html : Convert DataFrame to a html.

Examples
——–
>>> df = pd.DataFrame({'shape': ['square', 'circle', 'triangle'],
… 'degrees': [360, 360, 180],
… 'sides': [4, np.nan, 3]})

>>> df.to_xml() # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<index>0</index>
<shape>square</shape>
<degrees>360</degrees>
<sides>4.0</sides>
</row>
<row>
<index>1</index>
<shape>circle</shape>
<degrees>360</degrees>
<sides/>
</row>
<row>
<index>2</index>
<shape>triangle</shape>
<degrees>180</degrees>
<sides>3.0</sides>
</row>
</data>

>>> df.to_xml(attr_cols=[
… 'index', 'shape', 'degrees', 'sides'
… ]) # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<data>
<row index="0" shape="square" degrees="360" sides="4.0"/>
<row index="1" shape="circle" degrees="360"/>
<row index="2" shape="triangle" degrees="180" sides="3.0"/>
</data>

>>> df.to_xml(namespaces={"doc": "https://example.com"},
… prefix="doc") # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<doc:data xmlns:doc="https://example.com">
<doc:row>
<doc:index>0</doc:index>
<doc:shape>square</doc:shape>
<doc:degrees>360</doc:degrees>
<doc:sides>4.0</doc:sides>
</doc:row>
<doc:row>
<doc:index>1</doc:index>
<doc:shape>circle</doc:shape>
<doc:degrees>360</doc:degrees>
<doc:sides/>
</doc:row>
<doc:row>
<doc:index>2</doc:index>
<doc:shape>triangle</doc:shape>
<doc:degrees>180</doc:degrees>
<doc:sides>3.0</doc:sides>
</doc:row>
</doc:data>

df.to_excel

与pd.read_excel()对应，找这个pd.DataFrame.to_excel()扩展学习：

参数表及对应默认值：

to_excel(
	excel_writer,
	sheet_name: 'str' = 'Sheet1',
	na_rep: 'str' = '',
	float_format: 'Optional[str]' = None,
	columns=None,
	header=True,
	index=True,
	index_label=None,
	startrow=0,
	startcol=0,
	engine=None,
	merge_cells=True,
	encoding=None,
	inf_rep='inf',
	verbose=True,
	freeze_panes=None,
	storage_options: 'StorageOptions' = None)

注意点：

要将单个对象写入Excel的.xlsx文件，只需指定目标文件名。但要写入多个工作表，需要创建一个具有目标文件名的“ExcelWriter”对象，并在文件中指定要写入的工作表。

通过指定唯一的“sheet_name”，可以写入多个工作表。

将所有数据写入文件后，需要保存更改。

请注意，使用已存在的文件名创建“ExcelWriter”对象将导致删除现有文件的内容。

参数详解：

01. excel_writer

excel_writer:path like、file like或ExcelWriter对象文件路径或现有ExcelWriter。

02. sheet_name: 'str' = 'Sheet1'

sheet_name:str，默认为“Sheet1”

将包含DataFrame的工作表的名称。

03. na_rep: 'str' = ''

na_rep:str，默认“”

缺少数据表示。

04. float_format: 'Optional[str]' = None

float_format:str，可选

设置浮点数的字符串格式。例如

“float_format=“%.2f”“将设置0.1234到0.12的格式。

05. columns=None

str的序列或列表，可选

要写入的列。

06. header=True

header：bool或str列表，默认为True

写出列名。如果给定了字符串列表，则假定为列名的别名。

07. index=True

index:bool，默认为True

写入行名称（索引）。

08. index_label=None

index_label:str或sequence，可选

索引列的列标签（如果需要）。如果未指定，并且“header”和“index”为True，则使用索引名称。如果DataFrame使用MultiIndex，则应给出序列。

09. startrow=0

startrow:int，默认值0

要转储数据帧的左上单元格行。

10. startcol=0

startcol:int，默认值0

要转储数据帧的左上角单元格列。

11. engine=None

引擎：str，可选

要使用的写入引擎“openpyxl”或“xlsxwriter”。您也可以通过选项`io.exel.xlsx.writer“、`io.excel.xls.writer“和`io.exex.xlsm.writer`设置此选项。

..已弃用：：1.2.0

作为`xlwt<https://pypi.org/project/xlwt/>`__包不再维护，“xlwt”引擎将在未来版本的panda中删除。

merge_cells=True

merge_cells:bool，默认为True

将多索引和分层行写入合并单元格。

12. encoding=None

编码：str，可选

生成的excel文件的编码。只有xlwt才需要，其他编写器本机支持unicode。

13. inf_rep='inf'

inf_rep:str，默认为“inf”

无限表示（Excel中没有无限的原生表示）。

14. verbose=True

verbose:bool，默认为True

在错误日志中显示更多信息。

15. freeze_panes=None

冷冻路径：整数元组（长度2），可选

指定要冻结的最底行和最右列。

16. storage_options: 'StorageOptions' = None

storage_options:dict，可选

对特定存储连接有意义的额外选项，例如。

主机、端口、用户名、密码等，如果使用将由`fsspec“解析的URL，例如，开始“s3://”、“gcs://”。如果使用非fsspec URL提供此参数，将引发错误。

请参阅fsspec和后端存储实现文档，以获取一组允许的键和值。

待续……

文章出处登录后可见！

已经登录？立即刷新

Python pandas库｜任凭弱水三千，我只取一瓢饮（7）

相关推荐